public class ArchiveExtractCDXJob extends ArchiveBatchJob
A CDX file contains sorted lines of metadata from the ARC/WARC files, with each line followed by the file and offset the record was found at, and optionally a checksum. The timeout of this job is 7 days. See http://www.archive.org/web/researcher/cdx_file_format.php
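The line layout described above can be sketched as follows. This is a minimal illustration, not the NetarchiveSuite implementation: the class name `CdxLineSketch`, the `formatLine` helper, the single-space separator, and the sample field values are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: assembles one CDX-style line from record metadata. */
class CdxLineSketch {

    /**
     * Joins the metadata fields, then the archive file name and offset,
     * then the checksum if one is supplied (null means "no checksum").
     */
    static String formatLine(List<String> metadata, String archiveFile,
                             long offset, String checksum) {
        List<String> fields = new ArrayList<>(metadata);
        fields.add(archiveFile);
        fields.add(Long.toString(offset));
        if (checksum != null) {
            fields.add(checksum);
        }
        // Assumed separator: a single space between fields.
        return String.join(" ", fields);
    }

    public static void main(String[] args) {
        // Hypothetical metadata values, chosen only for demonstration.
        List<String> meta = Arrays.asList("http://example.org/", "text/html");
        System.out.println(formatLine(meta, "example.arc", 512L, null));
    }
}
```

Passing `null` as the checksum models the "optionally a checksum" part of the description: the line then ends with the file name and offset.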
FileBatchJob.ExceptionOccurrence
noOfRecordsProcessed
batchJobTimeout, exceptions, filesFailed, noOfFilesProcessed
Constructor and Description |
---|
ArchiveExtractCDXJob() Equivalent to ArchiveExtractCDXJob(true). |
ArchiveExtractCDXJob(boolean includeChecksum) Constructs a new job for extracting CDX indexes. |
Modifier and Type | Method and Description |
---|---|
void | finish(OutputStream os) End of the batch job. |
ArchiveBatchFilter | getFilter() Filters out the NON-RESPONSE records. |
void | initialize(OutputStream os) Initialize any data needed (none). |
void | processRecord(ArchiveRecordBase record, OutputStream os) Process this entry, reading metadata into the output stream. |
String | toString() |
processFile
getExceptionArray, handleException, handleOurException, noOfRecordsProcessed
addException, addFinishException, addInitializeException, getBatchJobTimeout, getExceptions, getFilenamePattern, getFilesFailed, getNoOfFilesProcessed, maxExceptionsReached, postProcess, processOnlyFileNamed, processOnlyFilesMatching, processOnlyFilesMatching, processOnlyFilesNamed, setBatchJobTimeout
public ArchiveExtractCDXJob(boolean includeChecksum)
includeChecksum - If true, an MD5 checksum is also written for each record. If false, it is not.
public ArchiveExtractCDXJob()
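As a rough illustration of what the `includeChecksum` flag controls, the sketch below computes an MD5 hex digest with the standard `java.security.MessageDigest` API and returns it only when the flag is set. The class `ChecksumSketch`, the helper names, and the choice to digest the raw payload bytes are assumptions; the source does not say which bytes the real job digests.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative only: optional MD5 checksum for a record payload. */
class ChecksumSketch {

    /** Returns the MD5 digest of the given bytes as lowercase hex. */
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes format as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Returns the checksum when requested, or null when it is omitted. */
    static String checksumField(byte[] payload, boolean includeChecksum) {
        return includeChecksum ? md5Hex(payload) : null;
    }
}
```

With the flag set to false the helper returns `null`, which corresponds to a CDX line written without the trailing checksum.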
public ArchiveBatchFilter getFilter()
getFilter in class ArchiveBatchJob
ArchiveBatchJob.getFilter()
public void initialize(OutputStream os)
initialize in class ArchiveBatchJobBase
os - The OutputStream to which output data is written
ArchiveBatchJobBase.initialize(OutputStream)
public void processRecord(ArchiveRecordBase record, OutputStream os)
processRecord in class ArchiveBatchJob
record - the object to be processed.
os - The OutputStream to which output data is written
IOFailure - on trouble reading ARC record data
ArchiveBatchJob.processRecord(ArchiveRecordBase, OutputStream)
public void finish(OutputStream os)
finish in class ArchiveBatchJobBase
os - The OutputStream to which output data is written
ARCBatchJob.finish(OutputStream)
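The four methods above suggest a simple lifecycle: set up, filter and process each record, then finish. The sketch below models that flow with stand-in types only. `BatchLifecycleSketch`, `Rec`, and `run` are hypothetical names, the call order is inferred from the method names rather than confirmed by the source, and none of this is the NetarchiveSuite API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Predicate;

/** Illustrative only: a toy batch job with the same four-step shape. */
class BatchLifecycleSketch {

    /** Hypothetical stand-in for an archive record. */
    static final class Rec {
        final String type;
        final String url;

        Rec(String type, String url) {
            this.type = type;
            this.url = url;
        }
    }

    /** Keeps response records and drops every other record type. */
    static Predicate<Rec> getFilter() {
        return r -> "response".equals(r.type);
    }

    /** Nothing to set up in this sketch. */
    static void initialize(OutputStream os) {
    }

    /** Writes one line of metadata (here just the URL) per record. */
    static void processRecord(Rec record, OutputStream os) throws IOException {
        os.write((record.url + "\n").getBytes(StandardCharsets.UTF_8));
    }

    /** Nothing to flush or summarize in this sketch. */
    static void finish(OutputStream os) {
    }

    /** Assumed driver loop: initialize, process filtered records, finish. */
    static String run(List<Rec> records) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try {
            initialize(os);
            Predicate<Rec> filter = getFilter();
            for (Rec r : records) {
                if (filter.test(r)) {
                    processRecord(r, os);
                }
            }
            finish(os);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(os.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Only records that pass the filter reach `processRecord`, so request and metadata records contribute nothing to the output stream in this model.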
Copyright © 2005–2015 The Royal Danish Library, the Danish State and University Library, the National Library of France and the Austrian National Library. All rights reserved.