java.lang.Object
  dk.netarkivet.common.utils.batch.FileBatchJob
    dk.netarkivet.common.utils.arc.ARCBatchJob
      dk.netarkivet.common.utils.cdx.ExtractCDXJob
public class ExtractCDXJob extends ARCBatchJob
Batch job that extracts information to create a CDX file. A CDX file contains sorted lines of metadata from the ARC files; each line is followed by the file and offset at which the record was found, and optionally by a checksum. See http://www.archive.org/web/researcher/cdx_file_format.php
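As a rough illustration of the line layout described above, the sketch below joins a few metadata fields, appends the ARC file name, the record offset, and an optional checksum, and then sorts the resulting lines. The class `CdxLineSketch`, its method names, the space separator, and the sample field values are assumptions made for illustration; they are not taken from ExtractCDXJob.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of NetarchiveSuite.
class CdxLineSketch {
    /**
     * Joins the metadata fields, then appends the file name, the offset,
     * and (only if non-null) a checksum. The separator is an assumption.
     */
    static String formatLine(List<String> metadata, String arcFile,
                             long offset, String checksum) {
        List<String> fields = new ArrayList<>(metadata);
        fields.add(arcFile);
        fields.add(Long.toString(offset));
        if (checksum != null) {
            fields.add(checksum);
        }
        return String.join(" ", fields);
    }

    /** Returns a sorted copy of the lines, leaving the input untouched. */
    static List<String> sorted(List<String> lines) {
        List<String> copy = new ArrayList<>(lines);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            formatLine(Arrays.asList("http://example.org/b", "text/html"),
                       "example.arc", 5678L, null),
            formatLine(Arrays.asList("http://example.org/a", "text/html"),
                       "example.arc", 1234L, null));
        for (String line : sorted(lines)) {
            System.out.println(line);
        }
    }
}
```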
Nested Class Summary |
---|
Nested classes/interfaces inherited from class dk.netarkivet.common.utils.batch.FileBatchJob |
---|
FileBatchJob.ExceptionOccurrence |
Field Summary |
---|
Fields inherited from class dk.netarkivet.common.utils.arc.ARCBatchJob |
---|
noOfRecordsProcessed |
Fields inherited from class dk.netarkivet.common.utils.batch.FileBatchJob |
---|
batchJobTimeout, exceptions, filesFailed, noOfFilesProcessed |
Constructor Summary |
---|
ExtractCDXJob(): Equivalent to ExtractCDXJob(true). |
ExtractCDXJob(boolean includeChecksum): Constructs a new job for extracting CDX indexes. |
Method Summary | |
---|---|
void | finish(java.io.OutputStream os): End of the batch job. |
ARCBatchFilter | getFilter(): Filter out the filedesc: headers. |
void | initialize(java.io.OutputStream os): Initialize any data needed (none). |
void | processRecord(org.archive.io.arc.ARCRecord sar, java.io.OutputStream os): Process this entry, reading metadata into the output stream. |
java.lang.String | toString(): Human-readable description of this instance. |
Methods inherited from class dk.netarkivet.common.utils.arc.ARCBatchJob |
---|
getExceptionArray, handleException, noOfRecordsProcessed, processFile |
Methods inherited from class dk.netarkivet.common.utils.batch.FileBatchJob |
---|
addException, addFinishException, addInitializeException, getBatchJobTimeout, getExceptions, getFilenamePattern, getFilesFailed, getNoOfFilesProcessed, maxExceptionsReached, postProcess, processOnlyFileNamed, processOnlyFilesMatching, processOnlyFilesMatching, processOnlyFilesNamed, setBatchJobTimeout |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public ExtractCDXJob(boolean includeChecksum)
Parameters:
includeChecksum - If true, an MD5 checksum is also written for each record. If false, it is not.

public ExtractCDXJob()
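For readers unfamiliar with the checksum that includeChecksum controls, here is a minimal sketch of computing an MD5 digest as a hex string with the standard `java.security.MessageDigest` class. The class `Md5Sketch` and the method `md5Hex` are hypothetical names; how ExtractCDXJob actually obtains and formats its checksum is not stated on this page, so treat this only as an illustration of the concept.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of NetarchiveSuite.
class Md5Sketch {
    /** Returns the MD5 digest of the given bytes as lowercase hex. */
    static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to treat the byte as unsigned before formatting.
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] record = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(md5Hex(record));
    }
}
```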
Method Detail |
---|
public ARCBatchFilter getFilter()
getFilter in class ARCBatchJob
See Also:
ARCBatchJob.getFilter()
public void initialize(java.io.OutputStream os)
initialize in class ARCBatchJob
Parameters:
os - The OutputStream to which output data is written
See Also:
ARCBatchJob.initialize(OutputStream)
public void processRecord(org.archive.io.arc.ARCRecord sar, java.io.OutputStream os)
processRecord in class ARCBatchJob
Parameters:
sar - The object to be processed.
os - The OutputStream to which output data is written
Throws:
IOFailure - on trouble reading arc record data
See Also:
ARCBatchJob.processRecord(ARCRecord, OutputStream)
public void finish(java.io.OutputStream os)
finish in class ARCBatchJob
Parameters:
os - The OutputStream to which output data is written
See Also:
ARCBatchJob.finish(OutputStream)
public java.lang.String toString()
Overrides:
toString in class java.lang.Object
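The methods above suggest a lifecycle of initialize, then processRecord for each record that passes the filter, then finish. The sketch below imitates that flow with plain Java stand-ins so it can run without NetarchiveSuite. Everything in it is hypothetical: `LifecycleSketch`, `FakeRecord`, the test for a `filedesc:` URL prefix, the output line format, and the flush in `finish` are assumptions, and the call order shown is inferred from the method descriptions rather than confirmed by this page.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins; the real job receives org.archive.io.arc.ARCRecord objects.
class LifecycleSketch {
    /** Minimal fake record carrying only a URL and an offset. */
    static final class FakeRecord {
        final String url;
        final long offset;

        FakeRecord(String url, long offset) {
            this.url = url;
            this.offset = offset;
        }
    }

    /** Mirrors the idea of getFilter(): skip filedesc: header records (assumed prefix test). */
    static boolean accept(FakeRecord record) {
        return !record.url.startsWith("filedesc:");
    }

    /** Nothing to set up, matching "Initialize any data needed (none)". */
    static void initialize(OutputStream os) {
    }

    /** Writes one line per record; the line format here is an assumption. */
    static void processRecord(FakeRecord record, String fileName, OutputStream os)
            throws IOException {
        String line = record.url + " " + fileName + " " + record.offset + "\n";
        os.write(line.getBytes(StandardCharsets.UTF_8));
    }

    /** Assumed behaviour: flush whatever has been written. */
    static void finish(OutputStream os) throws IOException {
        os.flush();
    }

    /** Runs the assumed lifecycle over the records and returns the collected output. */
    static String run(String fileName, List<FakeRecord> records) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try {
            initialize(os);
            for (FakeRecord record : records) {
                if (accept(record)) {
                    processRecord(record, fileName, os);
                }
            }
            finish(os);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(os.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(run("example.arc", Arrays.asList(
            new FakeRecord("filedesc://example.arc", 0L),
            new FakeRecord("http://example.org/", 1234L))));
    }
}
```

In this sketch the header record is dropped by the filter step, so only the second record produces an output line.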