Class CDXUtils


  • public class CDXUtils
    extends java.lang.Object
    Utility class for creating CDX files. The CDX format is described at http://www.archive.org/web/researcher/cdx_file_format.php
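    As a rough illustration of what a CDX file contains, the sketch below assumes the layout from the format description linked above: a header line made of " CDX" followed by one-letter field codes, then one space-separated line per archived record. The class name CdxLineSketch, the particular fields chosen (A, e, b, m, n, g, v), and the "-" placeholder for a missing value are assumptions for illustration. This page does not say which fields CDXUtils actually writes, so check the format description before relying on any of them.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Sketch: formats a CDX header and one CDX record line.
 * The field choice and the "-" placeholder are assumptions, not taken from CDXUtils.
 */
public class CdxLineSketch {

    // Assumed field order. Legend letters as read from the format description
    // (verify against the spec): A = canonized URL, e = IP, b = date,
    // m = MIME type, n = document length, g = file name, v = uncompressed offset.
    public static final List<String> FIELDS = List.of("A", "e", "b", "m", "n", "g", "v");

    /** Header line: " CDX" followed by the field letters, separated by spaces. */
    public static String header(List<String> fields) {
        return " CDX " + String.join(" ", fields);
    }

    /** One record line: values in header order, space-separated; "-" stands in for a missing value. */
    public static String line(List<String> fields, Map<String, String> values) {
        StringJoiner joiner = new StringJoiner(" ");
        for (String field : fields) {
            joiner.add(values.getOrDefault(field, "-"));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Hypothetical record values, for illustration only ("n" is deliberately missing).
        Map<String, String> record = Map.of(
                "A", "example.org/",
                "e", "192.0.2.1",
                "b", "20240101120000",
                "m", "text/html",
                "g", "example.warc.gz",
                "v", "0");
        System.out.println(header(FIELDS));
        System.out.println(line(FIELDS, record));
    }
}
```

    Keeping the field order in a single list means the header and every record line are derived from the same source, so they cannot drift apart.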
    • Constructor Summary

      Constructors 
      Constructor and Description
      CDXUtils()
    • Method Summary

      Modifier and Type   Method and Description
      static void         generateCDX(ArchiveProfile archiveProfile, java.io.File archiveFileDirectory, java.io.File cdxFileDirectory)
                          Applies createCDXRecord() to all ARC/WARC files in a directory, creating one CDX file per ARC/WARC file.
      static void         writeCDXInfo(java.io.File archivefile, java.io.OutputStream cdxstream)
                          Add CDX info for a given archive file to a given OutputStream.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • writeCDXInfo

        public static void writeCDXInfo(java.io.File archivefile,
                                        java.io.OutputStream cdxstream)
        Add CDX info for a given archive file to a given OutputStream. Note: any exceptions are logged at level FINE but otherwise ignored.
        Parameters:
        archivefile - A file with archive records
        cdxstream - An output stream to add CDX lines to
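        The error-handling contract above (log at FINE, never propagate) can be sketched as follows. This is not the NetarchiveSuite implementation: the class WriteCdxInfoSketch and the CdxIndexer interface are hypothetical, and the indexer stands in for the real ARC/WARC record parsing, which this page does not describe. The newline line separator is also an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Sketch of the documented writeCDXInfo contract; not the NetarchiveSuite implementation. */
public class WriteCdxInfoSketch {

    private static final Logger LOG = Logger.getLogger(WriteCdxInfoSketch.class.getName());

    /** Hypothetical indexer standing in for the real ARC/WARC record parsing. */
    public interface CdxIndexer {
        List<String> index(File archiveFile) throws IOException;
    }

    /** Appends one CDX line per record to cdxStream; failures are logged at FINE and swallowed. */
    public static void writeCdxInfo(File archiveFile, OutputStream cdxStream, CdxIndexer indexer) {
        try {
            for (String line : indexer.index(archiveFile)) {
                cdxStream.write((line + "\n").getBytes(StandardCharsets.UTF_8));
            }
        } catch (IOException | RuntimeException e) {
            // Documented behaviour: log at FINE, otherwise ignore.
            // Lines written before the failure remain in the stream.
            LOG.log(Level.FINE, "Could not write CDX info for " + archiveFile, e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeCdxInfo(new File("example.warc.gz"), out, f -> List.of("line-1", "line-2"));
        // A failing indexer neither throws nor adds output.
        writeCdxInfo(new File("broken.warc.gz"), out, f -> {
            throw new IOException("corrupt record");
        });
        System.out.print(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

        Because errors are swallowed, a caller cannot tell from the return value whether indexing succeeded. Checking that the stream actually received data, or raising the logger's level to FINE while debugging, are the only signals left.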
      • generateCDX

        public static void generateCDX(ArchiveProfile archiveProfile,
                                       java.io.File archiveFileDirectory,
                                       java.io.File cdxFileDirectory)
                                throws ArgumentNotValid
        Applies createCDXRecord() to all ARC/WARC files in a directory, creating one CDX file per ARC/WARC file. Note: any exceptions during index generation are logged at level FINE but otherwise ignored. Exceptions raised while creating a CDX file are logged at level WARNING but otherwise ignored. Each CDX file is named after its ARC/WARC file, with ".cdx" appended to the ".(w)arc" or ".(w)arc.gz" extension.
        Parameters:
        archiveProfile - archive profile including filters, patterns, etc.
        archiveFileDirectory - A directory with archive files to generate index for
        cdxFileDirectory - A directory to generate CDX files in
        Throws:
        ArgumentNotValid - if either directory is null or is not an existing directory, or if cdxFileDirectory is not writable.
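        The control flow described for generateCDX (validate both directories, select archive files, write one CDX file per archive, log per-file failures at WARNING) can be sketched as below. Several pieces are assumptions: the regular expression stands in for the ArchiveProfile filename filter, IllegalArgumentException stands in for ArgumentNotValid, the hypothetical CdxWriter interface stands in for writeCDXInfo, and the naming rule is read as ".cdx" being appended to the full archive file name. Unlike the documented method, the sketch returns the files it created so the result can be checked.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.regex.Pattern;

/** Sketch of the documented generateCDX contract; not the NetarchiveSuite implementation. */
public class GenerateCdxSketch {

    private static final Logger LOG = Logger.getLogger(GenerateCdxSketch.class.getName());

    // Assumed stand-in for the ArchiveProfile filename filter: *.arc, *.warc, optionally gzipped.
    public static final Pattern ARCHIVE_NAME = Pattern.compile(".*\\.w?arc(\\.gz)?");

    /** Hypothetical per-file writer standing in for writeCDXInfo (which swallows its own errors). */
    public interface CdxWriter {
        void write(File archiveFile, OutputStream out);
    }

    /** Naming rule as read from the description: ".cdx" appended to the full archive file name. */
    public static File cdxFileFor(File archiveFile, File cdxDir) {
        return new File(cdxDir, archiveFile.getName() + ".cdx");
    }

    /** Returns the CDX files it created; the documented method returns void. */
    public static List<File> generateCdx(File archiveDir, File cdxDir, CdxWriter writer) {
        // IllegalArgumentException stands in for ArgumentNotValid.
        if (archiveDir == null || !archiveDir.isDirectory()) {
            throw new IllegalArgumentException("archiveFileDirectory must be an existing directory");
        }
        if (cdxDir == null || !cdxDir.isDirectory() || !cdxDir.canWrite()) {
            throw new IllegalArgumentException("cdxFileDirectory must be an existing, writable directory");
        }
        List<File> created = new ArrayList<>();
        File[] archives = archiveDir.listFiles((dir, name) -> ARCHIVE_NAME.matcher(name).matches());
        if (archives == null) {
            return created; // listing failed with an I/O error
        }
        for (File archive : archives) {
            File cdxFile = cdxFileFor(archive, cdxDir);
            try (OutputStream out = new FileOutputStream(cdxFile)) {
                writer.write(archive, out);
                created.add(cdxFile);
            } catch (IOException e) {
                // Documented behaviour: failures creating a CDX file are logged at WARNING and skipped.
                LOG.log(Level.WARNING, "Could not create " + cdxFile, e);
            }
        }
        return created;
    }

    public static void main(String[] args) throws IOException {
        File in = Files.createTempDirectory("arcs").toFile();
        File out = Files.createTempDirectory("cdx").toFile();
        new File(in, "a.warc.gz").createNewFile();
        new File(in, "b.arc").createNewFile();
        new File(in, "notes.txt").createNewFile(); // not matched by the filter
        List<File> created = generateCdx(in, out, (archive, os) -> { });
        created.forEach(f -> System.out.println(f.getName()));
    }
}
```

        Validating both directories up front lets the method fail fast with a single exception, while everything after that point is best-effort: one unreadable or unwritable file does not stop the remaining archives from being indexed.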