dk.netarkivet.common.utils.cdx
Class ArchiveExtractCDXJob
java.lang.Object
dk.netarkivet.common.utils.batch.FileBatchJob
dk.netarkivet.common.utils.archive.ArchiveBatchJobBase
dk.netarkivet.common.utils.archive.ArchiveBatchJob
dk.netarkivet.common.utils.cdx.ArchiveExtractCDXJob
- All Implemented Interfaces:
- java.io.Serializable
public class ArchiveExtractCDXJob
- extends ArchiveBatchJob
Batch job that extracts information to create a CDX file.
A CDX file contains sorted lines of metadata from ARC/WARC files. Each line
gives a record's metadata followed by the file and offset at which the record
was found, and optionally a checksum.
The timeout of this job is 7 days.
See http://www.archive.org/web/researcher/cdx_file_format.php
- See Also:
- Serialized Form
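As a rough illustration of the line layout described above, the sketch below joins some record metadata with the archive file name and offset, and appends a checksum only when one is supplied. The field set (URL, IP, date, MIME type, length), the single-space separator, and the class name `CdxLineSketch` are assumptions for illustration only. The exact fields and their order written by `ArchiveExtractCDXJob` follow the CDX format linked above and are not specified on this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that builds one CDX-style line. Not part of the NetarchiveSuite API. */
final class CdxLineSketch {
    private CdxLineSketch() {}

    /**
     * Joins the metadata fields, then the archive file name and record offset,
     * then (optionally) a checksum, separated by single spaces.
     *
     * @param metadata    assumed metadata fields, e.g. URL, IP, date, MIME type, length
     * @param archiveFile name of the ARC/WARC file the record was found in
     * @param offset      byte offset of the record within that file
     * @param checksum    checksum string, or null to omit it
     */
    static String format(List<String> metadata, String archiveFile, long offset, String checksum) {
        List<String> fields = new ArrayList<>(metadata);
        fields.add(archiveFile);
        fields.add(Long.toString(offset));
        if (checksum != null) {
            fields.add(checksum); // only present when checksums are requested
        }
        return String.join(" ", fields);
    }
}
```

Sorting the resulting lines, which the description says a CDX file contains, is not shown here.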
Methods inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
addException, addFinishException, addInitializeException, getBatchJobTimeout, getExceptions, getFilenamePattern, getFilesFailed, getNoOfFilesProcessed, maxExceptionsReached, postProcess, processOnlyFileNamed, processOnlyFilesMatching, processOnlyFilesMatching, processOnlyFilesNamed, setBatchJobTimeout
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
ArchiveExtractCDXJob
public ArchiveExtractCDXJob(boolean includeChecksum)
- Constructs a new job for extracting CDX indexes.
- Parameters:
includeChecksum
- If true, an MD5 checksum is also
written for each record. If false, it is not.
ArchiveExtractCDXJob
public ArchiveExtractCDXJob()
- Equivalent to ArchiveExtractCDXJob(true).
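When `includeChecksum` is true (the default for the no-argument constructor), an MD5 checksum is written for each record. This page does not say which bytes are hashed or how the digest is encoded; the sketch below assumes the record's bytes and a lowercase hexadecimal encoding. `Md5Sketch` is a hypothetical name, not NetarchiveSuite code.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: MD5 of a record's bytes as lowercase hex. */
final class Md5Sketch {
    private Md5Sketch() {}

    static String md5Hex(byte[] data) {
        try {
            // Every Java platform is required to provide MD5.
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff)); // two hex digits per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```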
getFilter
public ArchiveBatchFilter getFilter()
- Filters out non-response records.
- Overrides:
getFilter
in class ArchiveBatchJob
- Returns:
- The filter that defines which ARC/WARC records are wanted
in the output CDX file.
- See Also:
ArchiveBatchJob.getFilter()
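The filtering idea can be sketched as a predicate that keeps only records whose WARC record type is `response`. `SketchRecord` and `ResponseFilterSketch` are hypothetical stand-ins, not the NetarchiveSuite `ArchiveRecordBase` or `ArchiveBatchFilter` API, and the sketch does not model how ARC records (which carry no WARC-Type header) are classified by the real filter.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical stand-in for an archive record. */
final class SketchRecord {
    final String url;
    final String warcType; // value of the WARC-Type header, e.g. "response", "request", "warcinfo"

    SketchRecord(String url, String warcType) {
        this.url = url;
        this.warcType = warcType;
    }
}

/** Hypothetical filter sketch: keeps response records and drops everything else. */
final class ResponseFilterSketch {
    private ResponseFilterSketch() {}

    static final Predicate<SketchRecord> RESPONSES_ONLY =
            r -> "response".equals(r.warcType);

    static List<SketchRecord> keepResponses(List<SketchRecord> records) {
        return records.stream().filter(RESPONSES_ONLY).collect(Collectors.toList());
    }
}
```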
initialize
public void initialize(java.io.OutputStream os)
- Initializes any data needed (this job needs none).
- Specified by:
initialize
in class ArchiveBatchJobBase
- Parameters:
os
- The OutputStream to which output data is written
- See Also:
ArchiveBatchJobBase.initialize(OutputStream)
processRecord
public void processRecord(ArchiveRecordBase record,
java.io.OutputStream os)
- Processes this record, writing its metadata to the output stream.
- Specified by:
processRecord
in class ArchiveBatchJob
- Parameters:
record
- The record to be processed.
os
- The OutputStream to which output data is written
- Throws:
IOFailure
- If the archive record data cannot be read.
- See Also:
ArchiveBatchJob.processRecord(ArchiveRecordBase, OutputStream)
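Writing one CDX line per record to the job's `OutputStream` can be sketched as below. The UTF-8 encoding, the `\n` line terminator, and the class name `ProcessRecordSketch` are assumptions. The real method documents `IOFailure` for problems reading record data; this sketch only covers writing, and uses the JDK's `UncheckedIOException` as a stand-in for write errors.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not NetarchiveSuite code. */
final class ProcessRecordSketch {
    private ProcessRecordSketch() {}

    /** Writes one CDX-style line followed by a newline to the job's output stream. */
    static void writeLine(String cdxLine, OutputStream os) {
        try {
            os.write((cdxLine + "\n").getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            // Stand-in for the project's own exception type.
            throw new UncheckedIOException("Failed to write CDX line", e);
        }
    }
}
```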
finish
public void finish(java.io.OutputStream os)
- Finishes the batch job.
- Specified by:
finish
in class ArchiveBatchJobBase
- Parameters:
os
- The OutputStream to which output data is written
- See Also:
ARCBatchJob.finish(OutputStream)
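Taken together, `initialize`, `processRecord`, and `finish` suggest a simple lifecycle: set up once, handle each record, then wrap up once. The sketch below drives that order over a list of records. `SketchJob` and `LifecycleSketch` are hypothetical, simplified stand-ins inferred from the method descriptions on this page; the actual NetarchiveSuite batch framework (file iteration, filtering, exception collection) is not modelled.

```java
import java.io.OutputStream;
import java.util.List;

/** Hypothetical, simplified job contract modelled on the three methods documented above. */
interface SketchJob<R> {
    void initialize(OutputStream os);
    void processRecord(R record, OutputStream os);
    void finish(OutputStream os);
}

final class LifecycleSketch {
    private LifecycleSketch() {}

    /** Runs a job over records in order: initialize once, process each record, finish once. */
    static <R> void run(SketchJob<R> job, List<R> records, OutputStream os) {
        job.initialize(os);
        for (R record : records) {
            job.processRecord(record, os);
        }
        job.finish(os);
    }
}
```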
toString
public java.lang.String toString()
- Overrides:
toString
in class java.lang.Object
- Returns:
- A human-readable description of this instance.