Class ArchiveExtractCDXJob

  • All Implemented Interfaces:
    Serializable

    public class ArchiveExtractCDXJob
    extends ArchiveBatchJob
    Batch job that extracts information to create a CDX file.

    A CDX file contains sorted lines of metadata from the ARC/WARC files. On each line, the metadata is followed by the name of the file and the offset at which the record was found, and optionally by a checksum. This job times out after 7 days. For the file format, see http://www.archive.org/web/researcher/cdx_file_format.php
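
    The line layout described above can be illustrated with a minimal, self-contained sketch. It is not this class's implementation: the class CdxLineSketch, its methods, the choice of metadata fields (URL, timestamp, MIME type), and the space separator are all assumptions made for illustration. The actual fields and their order are defined by the CDX file format linked above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the documented API.
class CdxLineSketch {

    // Builds one line: metadata fields first, then the archive file name and
    // the record offset, then the checksum if one is supplied.
    // The metadata fields chosen here are an assumption for illustration.
    static String formatLine(String url, String timestamp, String mimeType,
                             String fileName, long offset, String checksum) {
        StringBuilder sb = new StringBuilder();
        sb.append(url).append(' ').append(timestamp).append(' ').append(mimeType);
        sb.append(' ').append(fileName).append(' ').append(offset);
        if (checksum != null) {
            // The checksum is optional, so it is appended only when present.
            sb.append(' ').append(checksum);
        }
        return sb.toString();
    }

    // Returns a sorted copy of the lines, leaving the input list untouched.
    static List<String> sortedLines(List<String> lines) {
        List<String> copy = new ArrayList<>(lines);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add(formatLine("http://b.example/", "20240101000000", "text/html",
                "sample.warc", 2048L, "abc123"));
        lines.add(formatLine("http://a.example/", "20240101000000", "text/html",
                "sample.warc", 512L, null));
        for (String line : sortedLines(lines)) {
            System.out.println(line);
        }
    }
}
```

    In this sketch the lines are sorted as plain strings after they have been formatted, so the leading metadata field determines the order.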

    See Also:
    Serialized Form