public class WaybackCDXExtractionWARCBatchJob extends WARCBatchJob
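Nested classes inherited from class FileBatchJob: FileBatchJob.ExceptionOccurrence
Fields inherited from class WARCBatchJob: noOfRecordsProcessed
Fields inherited from class FileBatchJob: batchJobTimeout, exceptions, filesFailed, noOfFilesProcessed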
Constructor and Description |
---|
WaybackCDXExtractionWARCBatchJob() Constructor that sets the timeout to one day. |
WaybackCDXExtractionWARCBatchJob(long timeout) Alternate constructor that takes an explicit timeout. |
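The two constructors differ only in where the timeout value comes from. A minimal sketch of that pattern follows, assuming the timeout is held as a millisecond count (this page does not state the unit). `TimeoutJobSketch` and `ONE_DAY_MILLIS` are hypothetical names used for illustration and are not part of the library.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a batch job with a configurable timeout. */
class TimeoutJobSketch {
    /** Assumption: the timeout is expressed in milliseconds. */
    static final long ONE_DAY_MILLIS = TimeUnit.DAYS.toMillis(1);

    private final long timeout;

    /** Default constructor: falls back to a one-day timeout. */
    TimeoutJobSketch() {
        this(ONE_DAY_MILLIS);
    }

    /** Alternate constructor: the caller supplies the timeout. */
    TimeoutJobSketch(long timeout) {
        this.timeout = timeout;
    }

    long getTimeout() {
        return timeout;
    }

    public static void main(String[] args) {
        System.out.println(new TimeoutJobSketch().getTimeout());
        System.out.println(new TimeoutJobSketch(5000L).getTimeout());
    }
}
```

Delegating from the no-argument constructor to the one-argument constructor keeps the default in a single place.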
Modifier and Type | Method and Description |
---|---|
void | finish(OutputStream os) Does nothing except log the end of the job. |
WARCBatchFilter | getFilter() Returns the filter, so that only response records are processed. |
void | initialize(OutputStream os) Initializes the private fields of this class. |
void | processRecord(org.archive.io.warc.WARCRecord record, OutputStream os) For each response WARCRecord, writes one CDX line (including the newline) to the output. |
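The filter-then-write behaviour in the method summary above can be sketched without the library: a filter accepts only response records, and each accepted record yields exactly one newline-terminated line on the output stream. `SketchRecord`, `ResponseLineWriter`, and the two-field line layout are hypothetical. The real CDX line format and the WARCRecord API are not described on this page, so this is an illustration of the control flow only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified record: only a type and a target URL. */
final class SketchRecord {
    final String type;
    final String url;

    SketchRecord(String type, String url) {
        this.type = type;
        this.url = url;
    }
}

/** Hypothetical job that writes one line per response record. */
final class ResponseLineWriter {
    private int recordsWritten;

    /** Mirrors the filter idea: accept response records only. */
    boolean accept(SketchRecord record) {
        return "response".equals(record.type);
    }

    /** Writes one newline-terminated line for an accepted record. */
    void processRecord(SketchRecord record, OutputStream os) throws IOException {
        if (!accept(record)) {
            return; // non-response records produce no output
        }
        // Placeholder layout, NOT the real CDX format.
        String line = record.url + " " + record.type + "\n";
        os.write(line.getBytes(StandardCharsets.UTF_8));
        recordsWritten++;
    }

    int getRecordsWritten() {
        return recordsWritten;
    }

    /** Runs the sketch over a fixed list and returns the produced text. */
    static String demo() {
        List<SketchRecord> records = Arrays.asList(
                new SketchRecord("request", "http://example.org/"),
                new SketchRecord("response", "http://example.org/"),
                new SketchRecord("metadata", "http://example.org/"),
                new SketchRecord("response", "http://example.org/a"));
        ResponseLineWriter job = new ResponseLineWriter();
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try {
            for (SketchRecord record : records) {
                job.processRecord(record, os);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(os.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

Of the four sample records, only the two response records produce output, one line each.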
Methods inherited from class WARCBatchJob: getExceptionArray, handleException, noOfRecordsProcessed, processFile
Methods inherited from class FileBatchJob: addException, addFinishException, addInitializeException, getBatchJobTimeout, getExceptions, getFilenamePattern, getFilesFailed, getNoOfFilesProcessed, maxExceptionsReached, postProcess, processOnlyFileNamed, processOnlyFilesMatching, processOnlyFilesMatching, processOnlyFilesNamed, setBatchJobTimeout
public WaybackCDXExtractionWARCBatchJob()
public WaybackCDXExtractionWARCBatchJob(long timeout)
timeout - specific timeout period.
public WARCBatchFilter getFilter()
getFilter
in class WARCBatchJob
public void initialize(OutputStream os)
initialize
in class WARCBatchJob
os - unused argument.
public void finish(OutputStream os)
finish
in class WARCBatchJob
os - unused argument.
public void processRecord(org.archive.io.warc.WARCRecord record, OutputStream os)
processRecord
in class WARCBatchJob
record - the WARCRecord to be indexed.
os - the OutputStream to which output is written.
Copyright © 2005–2016 The Royal Danish Library, the Danish State and University Library, the National Library of France and the Austrian National Library. All rights reserved.