Class CDXUtils


  • public class CDXUtils
    extends Object
    Utility class for creating CDX files. The CDX format is described at http://www.archive.org/web/researcher/cdx_file_format.php
    • Constructor Detail

      • CDXUtils

        public CDXUtils()
    • Method Detail

      • writeCDXInfo

        public static void writeCDXInfo(File archivefile,
                                        OutputStream cdxstream)
        Adds CDX info for a given archive file to a given OutputStream. Note: any exceptions are logged at level FINE but otherwise ignored.
        Parameters:
        archivefile - A file with archive records
        cdxstream - An output stream to add CDX lines to
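Because writeCDXInfo logs exceptions at level FINE and otherwise ignores them, the caller gets no direct signal when nothing was written. The following is a minimal sketch of one way to notice that after the call: wrap the target stream in a byte counter. CountingOutputStream is a hypothetical helper, not part of CDXUtils, and the call to writeCDXInfo appears only as a comment, since the sketch does not depend on the library.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of CDXUtils): counts bytes passed to the wrapped stream. */
final class CountingOutputStream extends FilterOutputStream {
    private long count = 0;

    CountingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Delegate in bulk; the inherited version would call write(int) once per byte.
        out.write(b, off, len);
        count += len;
    }

    /** Number of bytes written through this wrapper so far. */
    long getCount() {
        return count;
    }

    public static void main(String[] args) throws IOException {
        CountingOutputStream cdxstream = new CountingOutputStream(new ByteArrayOutputStream());

        // In real use the wrapper would be handed to the documented method:
        //     CDXUtils.writeCDXInfo(archivefile, cdxstream);
        // Here a placeholder line stands in for whatever the method would write.
        cdxstream.write("placeholder line\n".getBytes(StandardCharsets.UTF_8));

        if (cdxstream.getCount() == 0) {
            System.out.println("Nothing was written; check the FINE-level log.");
        } else {
            System.out.println("Bytes written: " + cdxstream.getCount());
        }
    }
}
```

A count of zero does not say why nothing was written; it only tells the caller to look at the log.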
      • generateCDX

        public static void generateCDX(ArchiveProfile archiveProfile,
                                       File archiveFileDirectory,
                                       File cdxFileDirectory)
                                throws ArgumentNotValid
        Applies createCDXRecord() to all ARC/WARC files in a directory, creating one CDX file per ARC/WARC file. Note: any exceptions during index generation are logged at level FINE but otherwise ignored. Exceptions while creating a CDX file are logged at level WARNING but otherwise ignored. Each CDX file is named after its ARC/WARC file, except that ".(w)arc" or ".(w)arc.gz" is extended with ".cdx".
        Parameters:
        archiveProfile - archive profile including filters, patterns, etc.
        archiveFileDirectory - A directory with archive files to generate index for
        cdxFileDirectory - A directory to generate CDX files in
        Throws:
        ArgumentNotValid - if either directory is null or is not an existing directory, or if cdxFileDirectory is not writable.
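The conditions listed under Throws can be checked before calling generateCDX, which may help produce a clearer error message for the caller. The sketch below is an illustration only: CdxDirectoryCheck and findProblem are hypothetical names, the checks are inferred from the description above rather than taken from the library's implementation, and the order in which problems are reported is an assumption.

```java
import java.io.File;

/** Hypothetical pre-flight check mirroring the documented ArgumentNotValid conditions. */
final class CdxDirectoryCheck {
    private CdxDirectoryCheck() {
    }

    /**
     * Returns null if both directories satisfy the documented conditions,
     * otherwise a short description of the first problem found.
     */
    static String findProblem(File archiveFileDirectory, File cdxFileDirectory) {
        if (archiveFileDirectory == null) {
            return "archiveFileDirectory is null";
        }
        if (cdxFileDirectory == null) {
            return "cdxFileDirectory is null";
        }
        if (!archiveFileDirectory.isDirectory()) {
            return "archiveFileDirectory is not an existing directory";
        }
        if (!cdxFileDirectory.isDirectory()) {
            return "cdxFileDirectory is not an existing directory";
        }
        if (!cdxFileDirectory.canWrite()) {
            return "cdxFileDirectory is not writable";
        }
        return null;
    }

    public static void main(String[] args) {
        // The system temp directory stands in for both arguments in this demo.
        File dir = new File(System.getProperty("java.io.tmpdir"));
        String problem = findProblem(dir, dir);
        System.out.println(problem == null ? "arguments look valid" : problem);
    }
}
```

Passing this check does not guarantee that generateCDX succeeds, since the directories can change between the check and the call.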