Class WaybackCDXExtractionARCBatchJob

    • Method Detail

      • initialize

        public void initialize(java.io.OutputStream os)
        Initializes the private fields of this class. Some of these are relatively heavy objects, so it is important that they are initialized only once.
        Specified by:
        initialize in class ARCBatchJob
        Parameters:
        os - unused argument.
      • finish

        public void finish(java.io.OutputStream os)
        Does nothing except log the end of the job.
        Specified by:
        finish in class ARCBatchJob
        Parameters:
        os - unused argument.
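
The initialize/finish contract described above can be illustrated with a minimal, self-contained sketch. It does not use the real ARCBatchJob class: `LifecycleSketch`, `HeavyHelper`, and the `String` record type are hypothetical stand-ins, and the assumption is only that a batch framework calls initialize once, then processRecord per record, then finish.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.logging.Logger;

public class LifecycleSketch {
    private static final Logger LOG = Logger.getLogger(LifecycleSketch.class.getName());

    /** Hypothetical stand-in for an expensive-to-build helper object. */
    public static final class HeavyHelper {
        public HeavyHelper() { }
    }

    private HeavyHelper helper;
    /** Number of times the helper was constructed (should stay at 1). */
    public int helperBuilds = 0;
    /** Number of records seen by processRecord. */
    public int processed = 0;

    /** Builds the heavy helper once; the stream argument is unused. */
    public void initialize(OutputStream os) {
        helper = new HeavyHelper();
        helperBuilds++;
    }

    /** Reuses the helper built in initialize instead of rebuilding it. */
    public void processRecord(String record, OutputStream os) {
        if (helper == null) {
            throw new IllegalStateException("initialize() was not called");
        }
        processed++;
    }

    /** Only logs the end of the job; the stream argument is unused. */
    public void finish(OutputStream os) {
        LOG.info("Job finished after " + processed + " records");
    }

    public static void main(String[] args) {
        LifecycleSketch job = new LifecycleSketch();
        OutputStream out = new ByteArrayOutputStream();
        job.initialize(out);
        for (String record : new String[] {"a", "b", "c"}) {
            job.processRecord(record, out);
        }
        job.finish(out);
        System.out.println("helper builds: " + job.helperBuilds);
    }
}
```

Building the helper in initialize rather than in processRecord keeps the per-record cost low, which is presumably the reason the documentation stresses initializing only once.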
      • getFilter

        public ARCBatchFilter getFilter()
        Description copied from class: ARCBatchJob
        Returns a BatchFilter object that restricts the set of ARC records in the archive on which this batch job is performed. The default value is a neutral filter that allows all records.
        Overrides:
        getFilter in class ARCBatchJob
        Returns:
        A filter telling which records should be given to processRecord().
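
As a rough illustration of the filter idea, the sketch below shows a neutral filter that accepts every record next to a restrictive one. `FilterSketch`, `RecordFilter`, `NO_FILTER`, and `select` are hypothetical names invented for this example; they are not the ARCBatchFilter API, and records are reduced to plain URL strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilterSketch {
    /** Hypothetical filter type: decides whether a record is passed on. */
    public interface RecordFilter {
        boolean accept(String recordUrl);
    }

    /** Neutral filter: every record is accepted. */
    public static final RecordFilter NO_FILTER = url -> true;

    /** Returns the records that the given filter lets through, in order. */
    public static List<String> select(List<String> urls, RecordFilter filter) {
        List<String> accepted = new ArrayList<>();
        for (String url : urls) {
            if (filter.accept(url)) {
                accepted.add(url);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("filedesc://example.arc", "http://example.org/");
        // Neutral filter keeps both records.
        System.out.println(select(urls, NO_FILTER));
        // A restrictive filter that skips file-description records.
        System.out.println(select(urls, url -> !url.startsWith("filedesc:")));
    }
}
```

Only the records that pass the filter would reach the per-record processing step.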
      • processRecord

        public void processRecord(org.archive.io.arc.ARCRecord record,
                                  java.io.OutputStream os)
        Writes one CDX line (including the trailing newline) to the output for each ARCRecord. If an ARC record cannot be converted to a CDX record for any reason, the resulting exception is caught and logged.
        Specified by:
        processRecord in class ARCBatchJob
        Parameters:
        record - the ARCRecord to be indexed.
        os - the OutputStream to which output is written.
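
The write-one-line-or-log behaviour described for processRecord can be sketched as follows. This is not the actual implementation: `CdxLineSketch`, `FakeRecord`, and `toIndexLine` are hypothetical, the record type is a stand-in for org.archive.io.arc.ARCRecord, and the line layout is a simplified placeholder rather than the real CDX format.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.Logger;

public class CdxLineSketch {
    private static final Logger LOG = Logger.getLogger(CdxLineSketch.class.getName());

    /** Hypothetical stand-in for an ARC record. */
    public static final class FakeRecord {
        public final String url;
        public final String timestamp;
        public final long offset;

        public FakeRecord(String url, String timestamp, long offset) {
            this.url = url;
            this.timestamp = timestamp;
            this.offset = offset;
        }
    }

    /** Hypothetical formatter; the field layout is NOT the real CDX format. */
    public static String toIndexLine(FakeRecord r) {
        if (r.url == null) {
            throw new IllegalArgumentException("record has no URL");
        }
        return r.url + " " + r.timestamp + " " + r.offset;
    }

    /** Writes one line plus newline per record; failures are logged, not rethrown. */
    public static void processRecord(FakeRecord record, OutputStream os) {
        try {
            String line = toIndexLine(record) + "\n";
            os.write(line.getBytes(StandardCharsets.UTF_8));
        } catch (Exception e) {
            LOG.log(Level.WARNING, "Could not index record", e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        processRecord(new FakeRecord("http://example.org/", "20240101000000", 42L), out);
        // This record cannot be converted; it is logged and skipped.
        processRecord(new FakeRecord(null, "20240101000001", 99L), out);
        System.out.print(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Catching the exception per record means a single malformed record does not abort the whole batch job; the output simply contains no line for it.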