public class ExtractCDXJob extends ARCBatchJob
A CDX file contains sorted lines of metadata from the ARC files. Each line is followed by the file and offset at which the record was found and, optionally, a checksum. The timeout of this job is 7 days. See http://www.archive.org/web/researcher/cdx_file_format.php
FileBatchJob.ExceptionOccurrence
noOfRecordsProcessed
batchJobTimeout, exceptions, filesFailed, noOfFilesProcessed
Constructor | Description |
---|---|
ExtractCDXJob() | Equivalent to ExtractCDXJob(true). |
ExtractCDXJob(boolean includeChecksum) | Constructs a new job for extracting CDX indexes. |
Modifier and Type | Method | Description |
---|---|---|
void | finish(OutputStream os) | End of the batch job. |
ARCBatchFilter | getFilter() | Filter out the filedesc: headers. |
void | initialize(OutputStream os) | Initialize any data needed (none). |
void | processRecord(org.archive.io.arc.ARCRecord sar, OutputStream os) | Process this entry, reading metadata into the output stream. |
String | toString() | |
getExceptionArray, handleException, noOfRecordsProcessed, processFile
addException, addFinishException, addInitializeException, getBatchJobTimeout, getExceptions, getFilenamePattern, getFilesFailed, getNoOfFilesProcessed, maxExceptionsReached, postProcess, processOnlyFileNamed, processOnlyFilesMatching, processOnlyFilesMatching, processOnlyFilesNamed, setBatchJobTimeout
public ExtractCDXJob(boolean includeChecksum)
includeChecksum - If true, an MD5 checksum is also written for each record. If false, it is not.
public ExtractCDXJob()
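The effect of the includeChecksum flag can be illustrated with a minimal, self-contained sketch. It does not use the NetarchiveSuite classes: ChecksumFlagSketch, md5Hex and withOptionalChecksum are hypothetical names, and the space separator and lowercase hex encoding are assumptions rather than the format ExtractCDXJob actually writes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; not part of the NetarchiveSuite API.
class ChecksumFlagSketch {

    /** Returns the MD5 digest of the given bytes as lowercase hex. */
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /**
     * Appends a checksum of the payload to the line only when
     * includeChecksum is true (separator is an assumption).
     */
    static String withOptionalChecksum(String line, byte[] payload,
                                       boolean includeChecksum) {
        return includeChecksum ? line + " " + md5Hex(payload) : line;
    }

    public static void main(String[] args) {
        byte[] payload = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(withOptionalChecksum("example-line", payload, true));
        System.out.println(withOptionalChecksum("example-line", payload, false));
    }
}
```

With the flag set to false, the line is passed through unchanged, which mirrors the "If false, it is not" wording above.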
public ARCBatchFilter getFilter()
getFilter in class ARCBatchJob
ARCBatchJob.getFilter()
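As a rough picture of what filtering out the filedesc: headers means, the sketch below drops every record whose URL starts with filedesc:. It is an assumption that the real filter decides by URL prefix; FiledescFilterSketch, EXCLUDE_FILEDESC and accepted are hypothetical names and do not reproduce the ARCBatchFilter API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only; not the ARCBatchFilter implementation.
class FiledescFilterSketch {

    /** Accepts every record URL except those of filedesc: header records. */
    static final Predicate<String> EXCLUDE_FILEDESC =
            url -> !url.startsWith("filedesc:");

    /** Returns the URLs that pass the filter, in their original order. */
    static List<String> accepted(List<String> urls) {
        List<String> out = new ArrayList<>();
        for (String url : urls) {
            if (EXCLUDE_FILEDESC.test(url)) {
                out.add(url);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "filedesc://example.arc",   // ARC file header record
                "http://example.org/");     // ordinary harvested record
        System.out.println(accepted(urls));
    }
}
```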
public void initialize(OutputStream os)
initialize in class ARCBatchJob
os - The OutputStream to which output data is written
ARCBatchJob.initialize(OutputStream)
public void processRecord(org.archive.io.arc.ARCRecord sar, OutputStream os)
processRecord in class ARCBatchJob
sar - The object to be processed.
os - The OutputStream to which output data is written
IOFailure - on trouble reading ARC record data
ARCBatchJob.processRecord(ARCRecord, OutputStream)
public void finish(OutputStream os)
finish in class ARCBatchJob
os - The OutputStream to which output data is written
ARCBatchJob.finish(OutputStream)
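The three callbacks documented above (initialize, processRecord, finish) share one OutputStream. The sketch below shows one plausible way a driver could call them, assuming initialize runs once before the records and finish once after them. RecordJob, LineJob and run are hypothetical stand-ins written for this example, with plain strings in place of ARCRecord objects; they are not the ARCBatchJob classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the ARCBatchJob implementation.
class BatchLifecycleSketch {

    /** Stand-in for the three callbacks, using String records. */
    interface RecordJob {
        void initialize(OutputStream os) throws IOException;
        void processRecord(String record, OutputStream os) throws IOException;
        void finish(OutputStream os) throws IOException;
    }

    /** Writes one output line per record; nothing to set up or tear down. */
    static class LineJob implements RecordJob {
        int processed = 0;

        @Override
        public void initialize(OutputStream os) {
            // No data needs initializing in this sketch.
        }

        @Override
        public void processRecord(String record, OutputStream os) throws IOException {
            os.write((record + "\n").getBytes(StandardCharsets.UTF_8));
            processed++;
        }

        @Override
        public void finish(OutputStream os) {
            // Nothing to flush or summarize in this sketch.
        }
    }

    /** Assumed call order: initialize, then each record, then finish. */
    static String run(RecordJob job, List<String> records) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try {
            job.initialize(os);
            for (String record : records) {
                job.processRecord(record, os);
            }
            job.finish(os);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(os.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        LineJob job = new LineJob();
        System.out.print(run(job, Arrays.asList("record-1", "record-2")));
        System.out.println("processed: " + job.processed);
    }
}
```

Keeping all output on the single stream passed to each callback means the driver, not the job, decides where the results end up.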
Copyright © 2005–2016 The Royal Danish Library, the Danish State and University Library, the National Library of France and the Austrian National Library. All rights reserved.