dk.netarkivet.wayback.batch
Class ExtractWaybackCDXBatchJob
java.lang.Object
dk.netarkivet.common.utils.batch.FileBatchJob
dk.netarkivet.common.utils.arc.ARCBatchJob
dk.netarkivet.wayback.batch.ExtractWaybackCDXBatchJob
- All Implemented Interfaces:
- java.io.Serializable
public class ExtractWaybackCDXBatchJob
- extends ARCBatchJob
Returns a CDX file in the format expected by Wayback, including
canonicalisation of URLs. The returned files are unsorted.
- See Also:
- Serialized Form
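Because the returned files are unsorted, a consumer that expects sorted input has to sort the lines itself. The following is a minimal sketch of that step, not part of this class. It assumes plain `String` ordering is acceptable for the consumer; the class name `CdxSortSketch` and the sample lines are hypothetical placeholders, not real CDX output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of NetarchiveSuite: sorts lines that a
// batch job has already produced.
final class CdxSortSketch {

    /** Returns a new list with the non-empty lines in ascending String order. */
    static List<String> sortLines(List<String> lines) {
        List<String> sorted = new ArrayList<>();
        for (String line : lines) {
            if (!line.isEmpty()) {
                sorted.add(line); // skip blank lines left by a trailing newline
            }
        }
        // Assumption: natural String order is sufficient for the consumer.
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        // Placeholder lines; real CDX lines carry more fields.
        List<String> unsorted = Arrays.asList(
                "org,example)/b 20100101000000",
                "com,example)/a 20090101000000");
        for (String line : sortLines(unsorted)) {
            System.out.println(line);
        }
    }
}
```

For large outputs an external sort would be more appropriate than loading every line into memory.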
Method Summary
void finish(java.io.OutputStream os)
- Does nothing except log the end of the job.
void initialize(java.io.OutputStream os)
- Initializes the private fields of this class.
void processRecord(org.archive.io.arc.ARCRecord record, java.io.OutputStream os)
- For each ARCRecord, writes one CDX line (including newline) to the output.
Methods inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
- addException, addFinishException, addInitializeException, getBatchJobTimeout, getExceptions, getFilenamePattern, getFilesFailed, getNoOfFilesProcessed, maxExceptionsReached, postProcess, processOnlyFileNamed, processOnlyFilesMatching, processOnlyFilesMatching, processOnlyFilesNamed, setBatchJobTimeout
Methods inherited from class java.lang.Object
- clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ExtractWaybackCDXBatchJob
public ExtractWaybackCDXBatchJob()
- Constructor which sets the timeout to one day.
ExtractWaybackCDXBatchJob
public ExtractWaybackCDXBatchJob(long timeout)
- Constructor which sets the timeout to a caller-supplied value.
- Parameters:
timeout
- specific timeout period
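This page does not state the unit of `timeout`. The sketch below assumes milliseconds, which is an assumption rather than something documented here, and shows how a one-day value could be computed without a hand-written constant. `TimeoutSketch` is a hypothetical name.

```java
import java.time.Duration;

// Hypothetical helper for computing a timeout value to pass to the
// ExtractWaybackCDXBatchJob(long timeout) constructor.
final class TimeoutSketch {

    /** One day expressed in milliseconds (assumed unit, not confirmed by this page). */
    static long oneDayMillis() {
        return Duration.ofDays(1).toMillis();
    }

    public static void main(String[] args) {
        long timeout = oneDayMillis();
        System.out.println(timeout);
        // With NetarchiveSuite on the classpath, the value could then be used as:
        // new ExtractWaybackCDXBatchJob(timeout);
    }
}
```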
initialize
public void initialize(java.io.OutputStream os)
- Initializes the private fields of this class. Some of these are
relatively heavy objects, so it is important that they are only
initialised once.
- Specified by:
initialize
in class ARCBatchJob
- Parameters:
os
- unused argument
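The point about initialising heavy objects only once can be illustrated with a stand-alone sketch. It assumes the batch framework calls `initialize` once before any record is processed and `finish` once afterwards, which the method names suggest but this page does not spell out. `LifecycleSketch`, `HeavyHelper`, and the use of `String` in place of `ARCRecord` are hypothetical stand-ins.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Stand-alone illustration; it does not extend the real ARCBatchJob.
final class LifecycleSketch {

    /** Stand-in for an object that is expensive to construct. */
    static final class HeavyHelper {
        static int instancesCreated = 0;
        HeavyHelper() { instancesCreated++; }
        String describe(String record) { return "seen " + record; }
    }

    private HeavyHelper helper;

    /** Builds the helper once; the stream is unused, as in the documented method. */
    void initialize(OutputStream os) {
        helper = new HeavyHelper();
    }

    /** Reuses the helper for every record instead of rebuilding it. */
    void processRecord(String record, OutputStream os) throws IOException {
        os.write((helper.describe(record) + "\n").getBytes(StandardCharsets.UTF_8));
    }

    void finish(OutputStream os) {
        // Nothing to clean up in this sketch.
    }

    /** Drives the assumed call order: initialize, one call per record, finish. */
    static String run(List<String> records) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        LifecycleSketch job = new LifecycleSketch();
        job.initialize(os);
        try {
            for (String record : records) {
                job.processRecord(record, os);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        job.finish(os);
        return new String(os.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(run(Arrays.asList("a", "b", "c")));
        System.out.println("helpers created: " + HeavyHelper.instancesCreated);
    }
}
```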
processRecord
public void processRecord(org.archive.io.arc.ARCRecord record,
java.io.OutputStream os)
- For each ARCRecord, writes one CDX line (including newline) to the output.
If an ARCRecord cannot be converted to a CDX record for any reason, the
resulting exception is caught and logged.
- Specified by:
processRecord
in class ARCBatchJob
- Parameters:
record
- the ARCRecord to be indexed.
os
- the OutputStream to which output is written.
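The catch-and-log behaviour described for `processRecord` can be sketched as follows. This is an illustration of the pattern only, not the class's actual code: `RecordErrorHandlingSketch` and `toLine` are hypothetical, a `String` stands in for the `ARCRecord`, and the method returns a `boolean` purely so the outcome is easy to check, whereas the documented method returns `void`.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.Logger;

// Stand-alone illustration of per-record error handling.
final class RecordErrorHandlingSketch {

    private static final Logger LOG =
            Logger.getLogger(RecordErrorHandlingSketch.class.getName());

    /** Hypothetical conversion step: rejects blank input. */
    static String toLine(String record) {
        if (record == null || record.trim().isEmpty()) {
            throw new IllegalArgumentException("empty record");
        }
        return record.trim();
    }

    /**
     * Writes one newline-terminated line for the record. Any failure is
     * logged and swallowed so that one bad record does not stop the job.
     */
    static boolean processRecord(String record, OutputStream os) {
        try {
            os.write((toLine(record) + "\n").getBytes(StandardCharsets.UTF_8));
            return true;
        } catch (Exception e) {
            LOG.log(Level.WARNING, "Could not convert record; skipping it", e);
            return false;
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        for (String record : new String[] {"first", "", "second"}) {
            processRecord(record, os);
        }
        System.out.print(new String(os.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Handling the exception per record means the output simply omits the records that failed, so a missing line is not by itself a sign that the job aborted.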
finish
public void finish(java.io.OutputStream os)
- Does nothing except log the end of the job.
- Specified by:
finish
in class ARCBatchJob
- Parameters:
os
- unused argument.