Package dk.netarkivet.wayback.batch
Class WaybackCDXExtractionARCBatchJob
java.lang.Object
  dk.netarkivet.common.utils.batch.FileBatchJob
    dk.netarkivet.common.utils.arc.ARCBatchJob
      dk.netarkivet.wayback.batch.WaybackCDXExtractionARCBatchJob
All Implemented Interfaces:
java.io.Serializable
public class WaybackCDXExtractionARCBatchJob extends ARCBatchJob
Returns a CDX file in the format expected by Wayback, including canonicalisation of URLs. The returned files are unsorted.
See Also:
- Serialized Form
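Because the returned files are unsorted, a consumer that expects ordered input would have to sort the lines first. The following is a minimal sketch of such a step using only the Java standard library. The class name `CdxSortSketch` and its methods are hypothetical and not part of NetarchiveSuite, and the sketch assumes the whole file fits in memory. Java's natural `String` ordering compares UTF-16 code units, which agrees with byte order for pure ASCII lines.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of NetarchiveSuite.
final class CdxSortSketch {

    /** Returns a new list holding the given lines in ascending order. */
    static List<String> sortLines(List<String> unsorted) {
        List<String> sorted = new ArrayList<>(unsorted);
        // Natural String order: code-unit comparison, byte order for ASCII.
        Collections.sort(sorted);
        return sorted;
    }

    /** Reads all lines from one file and writes them, sorted, to another. */
    static void sortFile(Path in, Path out) throws IOException {
        List<String> lines = Files.readAllLines(in, StandardCharsets.UTF_8);
        Files.write(out, sortLines(lines), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java CdxSortSketch <unsorted file> <sorted file>
        if (args.length == 2) {
            sortFile(Paths.get(args[0]), Paths.get(args[1]));
        }
    }
}
```

For output too large to hold in memory, an external merge sort would be the more appropriate choice.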
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
FileBatchJob.ExceptionOccurrence
-
-
Field Summary
-
Fields inherited from class dk.netarkivet.common.utils.arc.ARCBatchJob
noOfRecordsProcessed
-
Fields inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
batchJobTimeout, exceptions, filesFailed, noOfFilesProcessed
-
-
Constructor Summary
Constructors
Constructor and Description
WaybackCDXExtractionARCBatchJob()
Constructor which sets the timeout to one day.
WaybackCDXExtractionARCBatchJob(long timeout)
Constructor with a specific timeout period.
-
Method Summary
Modifier and Type, Method, and Description
void finish(java.io.OutputStream os)
Does nothing except log the end of the job.
ARCBatchFilter getFilter()
Returns a BatchFilter object which restricts the set of ARC records in the archive on which this batch job is performed.
void initialize(java.io.OutputStream os)
Initializes the private fields of this class.
void processRecord(org.archive.io.arc.ARCRecord record, java.io.OutputStream os)
For each ARCRecord, writes one CDX line (including newline) to the output.
-
Methods inherited from class dk.netarkivet.common.utils.arc.ARCBatchJob
getExceptionArray, handleException, noOfRecordsProcessed, processFile
-
Methods inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
addException, addFinishException, addInitializeException, getBatchJobTimeout, getExceptions, getFilenamePattern, getFilesFailed, getNoOfFilesProcessed, maxExceptionsReached, postProcess, processOnlyFileNamed, processOnlyFilesMatching, processOnlyFilesMatching, processOnlyFilesNamed, setBatchJobTimeout
-
-
-
-
Constructor Detail
-
WaybackCDXExtractionARCBatchJob
public WaybackCDXExtractionARCBatchJob()
Constructor which sets the timeout to one day.
-
WaybackCDXExtractionARCBatchJob
public WaybackCDXExtractionARCBatchJob(long timeout)
Constructor.
Parameters:
timeout - specific timeout period
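This page does not state the unit of the timeout parameter. The sketch below assumes milliseconds, which is only an assumption and should be checked against the FileBatchJob documentation before use. The class name `TimeoutSketch` and the method `hoursToMillis` are hypothetical; only `java.util.concurrent.TimeUnit` is a real API here, and the constructor call is shown as a comment because it needs NetarchiveSuite on the classpath.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of NetarchiveSuite.
final class TimeoutSketch {

    /** Converts a number of hours to milliseconds (assumed unit of the timeout). */
    static long hoursToMillis(long hours) {
        return TimeUnit.HOURS.toMillis(hours);
    }

    public static void main(String[] args) {
        long sixHours = hoursToMillis(6);
        System.out.println(sixHours); // prints 21600000

        // With NetarchiveSuite available, and if the unit is indeed milliseconds,
        // the value would be passed to the constructor like this:
        // WaybackCDXExtractionARCBatchJob job =
        //         new WaybackCDXExtractionARCBatchJob(sixHours);
    }
}
```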
-
-
Method Detail
-
initialize
public void initialize(java.io.OutputStream os)
Initializes the private fields of this class. Some of these are relatively heavy objects, so it is important that they are only initialised once.
Specified by:
initialize in class ARCBatchJob
Parameters:
os - unused argument
-
finish
public void finish(java.io.OutputStream os)
Does nothing except log the end of the job.
Specified by:
finish in class ARCBatchJob
Parameters:
os - unused argument.
-
getFilter
public ARCBatchFilter getFilter()
Description copied from class: ARCBatchJob
Returns a BatchFilter object which restricts the set of ARC records in the archive on which this batch job is performed. The default value is a neutral filter which allows all records.
Overrides:
getFilter in class ARCBatchJob
Returns:
A filter telling which records should be given to processRecord().
-
processRecord
public void processRecord(org.archive.io.arc.ARCRecord record, java.io.OutputStream os)
For each ARCRecord, writes one CDX line (including newline) to the output. If an ARC record cannot be converted to a CDX record for any reason, any resulting exception is caught and logged.
Specified by:
processRecord in class ARCBatchJob
Parameters:
record - the ARCRecord to be indexed.
os - the OutputStream to which output is written.
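The behaviour described for processRecord (one line per record, with conversion failures caught and logged so that the remaining records are still processed) can be illustrated with a self-contained sketch. It is not the actual implementation: the class `PerRecordSketch`, the use of plain strings as stand-ins for ARC records, and the `toLine` converter are all hypothetical, and `java.util.logging` is used only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration of the catch-and-log pattern; strings stand in for records.
final class PerRecordSketch {
    private static final Logger LOG = Logger.getLogger(PerRecordSketch.class.getName());

    /**
     * Writes one newline-terminated line per convertible record.
     * Returns the number of lines written.
     */
    static int writeLines(List<String> records,
                          Function<String, String> toLine,
                          OutputStream os) {
        int written = 0;
        for (String record : records) {
            try {
                String line = toLine.apply(record);
                os.write((line + "\n").getBytes(StandardCharsets.UTF_8));
                written++;
            } catch (Exception e) {
                // A failing record is logged and skipped; the loop continues.
                LOG.log(Level.WARNING, "Could not convert record: " + record, e);
            }
        }
        return written;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // The converter rejects empty records to simulate a conversion failure.
        int n = writeLines(Arrays.asList("first", "", "second"), s -> {
            if (s.isEmpty()) {
                throw new IllegalArgumentException("empty record");
            }
            return "line for " + s;
        }, out);
        System.out.println(n + " lines written");
    }
}
```

Catching per record rather than around the whole loop is what keeps a single bad record from aborting the rest of the file.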
-
-