Class WARCExtractCDXJob

  • All Implemented Interfaces:
    java.io.Serializable

    public class WARCExtractCDXJob
    extends WARCBatchJob
    Batch job that extracts information to create a CDX file.

    A CDX file contains sorted lines of metadata from the WARC files. Each line ends with the name of the file and the offset at which the record was found, optionally followed by a checksum. This job has a timeout of 7 days. For the CDX file format, see http://www.archive.org/web/researcher/cdx_file_format.php
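
    The line layout described above can be illustrated with a minimal sketch. It is not the code of this job: the class CdxLineSketch, its methods, the space separator, and the sample field values are all hypothetical, chosen only to show metadata fields followed by file name, offset, and an optional checksum, with the resulting lines sorted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not part of the documented API.
class CdxLineSketch {

    /**
     * Builds one line: the metadata fields first, then the file name and
     * the record offset, then the checksum if one is given (null = none).
     * The single-space separator is an assumption made for this sketch.
     */
    static String formatLine(List<String> metadataFields, String fileName,
                             long offset, String checksum) {
        StringBuilder line = new StringBuilder(String.join(" ", metadataFields));
        line.append(' ').append(fileName).append(' ').append(offset);
        if (checksum != null) {
            line.append(' ').append(checksum);
        }
        return line.toString();
    }

    /** Returns a sorted copy of the lines (plain lexicographic order). */
    static List<String> sortedLines(List<String> lines) {
        List<String> copy = new ArrayList<>(lines);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Sample values are made up for the example.
        List<String> lines = new ArrayList<>();
        lines.add(formatLine(Arrays.asList("http://example.org/b", "text/html"),
                "sample.warc", 2048L, null));
        lines.add(formatLine(Arrays.asList("http://example.org/a", "text/html"),
                "sample.warc", 512L, "abc123"));
        for (String line : sortedLines(lines)) {
            System.out.println(line);
        }
    }
}
```

    The actual fields and their order are determined by the job and by the CDX format description linked above; the sketch fixes only the relative position of file name, offset, and checksum.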

    See Also:
    Serialized Form