Class ExtractCDXJob

  • All Implemented Interfaces:
    Serializable

    public class ExtractCDXJob
    extends ARCBatchJob
    Batch job that extracts information to create a CDX file.

    A CDX file contains sorted lines of metadata from the ARC files. Each line of metadata is followed by the name of the file and the offset at which the record was found, and optionally by a checksum. The timeout of this job is 7 days. For the CDX file format, see http://www.archive.org/web/researcher/cdx_file_format.php
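
The line layout described above can be illustrated with a minimal, self-contained sketch. It is not the NetarchiveSuite implementation: the class CdxLineSketch, the methods md5Hex and cdxLine, the space separator, and the sample metadata fields are all assumptions made for illustration. It only shows metadata followed by file name and offset, with an MD5 checksum appended when requested.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; not part of NetarchiveSuite.
public class CdxLineSketch {

    /** Returns the MD5 digest of the given bytes as lowercase hex. */
    public static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                // Mask to 0..255 so every byte becomes exactly two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /**
     * Builds one line: metadata, then file name and offset, then an
     * optional checksum. The space-separated layout is an assumption.
     */
    public static String cdxLine(String metadata, String arcFile, long offset,
                                 byte[] content, boolean includeChecksum) {
        StringBuilder line = new StringBuilder(metadata)
                .append(' ').append(arcFile)
                .append(' ').append(offset);
        if (includeChecksum) {
            line.append(' ').append(md5Hex(content));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        byte[] record = "hello".getBytes(StandardCharsets.UTF_8);
        String meta = "http://example.org/ 20240101000000"; // made-up sample metadata
        System.out.println(cdxLine(meta, "example.arc", 0L, record, true));
        System.out.println(cdxLine(meta, "example.arc", 0L, record, false));
    }
}
```

Running main prints the same line twice, once with the trailing checksum and once without it.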

    See Also:
    Serialized Form
    • Constructor Detail

      • ExtractCDXJob

        public ExtractCDXJob(boolean includeChecksum)
        Constructs a new job for extracting CDX indexes.
        Parameters:
        includeChecksum - If true, an MD5 checksum is also written for each record; if false, no checksum is written.
      • ExtractCDXJob

        public ExtractCDXJob()
        Equivalent to ExtractCDXJob(true).
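
The relationship between the two constructors can be sketched with a stand-in class. ChecksumJobSketch and its includesChecksum accessor are hypothetical names, not the actual ExtractCDXJob source; the sketch only assumes that the no-argument constructor delegates to the boolean one with true, which is what "equivalent to ExtractCDXJob(true)" implies.

```java
// Hypothetical stand-in for illustration; not the NetarchiveSuite class.
public class ChecksumJobSketch {

    // Whether a checksum should be written for each record.
    private final boolean includeChecksum;

    /** Mirrors the documented ExtractCDXJob(boolean includeChecksum). */
    public ChecksumJobSketch(boolean includeChecksum) {
        this.includeChecksum = includeChecksum;
    }

    /** Mirrors the documented no-argument form: checksums enabled by default. */
    public ChecksumJobSketch() {
        this(true);
    }

    /** Hypothetical accessor used only to observe the setting. */
    public boolean includesChecksum() {
        return includeChecksum;
    }

    public static void main(String[] args) {
        System.out.println(new ChecksumJobSketch().includesChecksum());      // true
        System.out.println(new ChecksumJobSketch(false).includesChecksum()); // false
    }
}
```

Under this assumption, callers who do not want checksums must pass false explicitly.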