Package dk.netarkivet.wayback.batch
Class DeduplicationCDXExtractionBatchJob
- java.lang.Object
  - dk.netarkivet.common.utils.batch.FileBatchJob
    - dk.netarkivet.common.utils.archive.ArchiveBatchJobBase
      - dk.netarkivet.common.utils.archive.ArchiveBatchJob
        - dk.netarkivet.wayback.batch.DeduplicationCDXExtractionBatchJob
- All Implemented Interfaces:
Serializable
public class DeduplicationCDXExtractionBatchJob extends ArchiveBatchJob
This batch job takes deduplication records from a crawl log in a metadata arcfile and converts them to CDX records for use in wayback.
- See Also:
- Serialized Form
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
FileBatchJob.ExceptionOccurrence
-
-
Field Summary
-
Fields inherited from class dk.netarkivet.common.utils.archive.ArchiveBatchJobBase
noOfRecordsProcessed
-
Fields inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
batchJobTimeout, exceptions, filesFailed, noOfFilesProcessed
-
-
Constructor Summary
Constructor
DeduplicationCDXExtractionBatchJob()
-
Method Summary
Modifier and Type, Method, Description
void
finish(OutputStream os)
Does nothing.
void
initialize(OutputStream os)
Initializes various fields of this class.
void
processRecord(ArchiveRecordBase record, OutputStream os)
If the ArchiveRecord is a crawl-log entry then any duplicate entries in the crawl log are converted to CDX entries and written to the output.
-
Methods inherited from class dk.netarkivet.common.utils.archive.ArchiveBatchJob
getFilter, processFile
-
Methods inherited from class dk.netarkivet.common.utils.archive.ArchiveBatchJobBase
getExceptionArray, handleException, handleOurException, noOfRecordsProcessed
-
Methods inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
addException, addFinishException, addInitializeException, getBatchJobTimeout, getExceptions, getFilenamePattern, getFilesFailed, getNoOfFilesProcessed, maxExceptionsReached, postProcess, processOnlyFileNamed, processOnlyFilesMatching, processOnlyFilesMatching, processOnlyFilesNamed, setBatchJobTimeout
-
Method Detail
-
initialize
public void initialize(OutputStream os)
Initializes various fields of this class.
- Specified by:
initialize in class ArchiveBatchJobBase
- Parameters:
os - unused parameter
-
processRecord
public void processRecord(ArchiveRecordBase record, OutputStream os)
If the ArchiveRecord is a crawl-log entry then any duplicate entries in the crawl log are converted to CDX entries and written to the output. Otherwise this method returns without doing anything. If the ArchiveRecord is a WarcRecord and the record is the warcinfo record, it is skipped.
- Specified by:
processRecord in class ArchiveBatchJob
- Parameters:
record - the ArchiveRecord to be processed
os - the stream to which output is written
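The filtering that processRecord describes can be illustrated with a minimal, self-contained sketch. It does not use or reproduce the NetarchiveSuite implementation. The class name DedupLineSketch, the method toSimplifiedEntry, the assumed annotation shape duplicate:"<file>,<offset>", the assumption that the URL is the fourth whitespace-separated field, and the simplified "url file offset" output are all hypothetical, and the output is not a real CDX line.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of dk.netarkivet.wayback.batch.
class DedupLineSketch {
    // Assumption: a deduplicated crawl-log line carries an annotation of the
    // form duplicate:"<archive file>,<offset>".
    private static final Pattern DUPLICATE =
            Pattern.compile("duplicate:\"([^\",]+),(\\d+)\"");

    /**
     * Returns "url file offset" for a line with a duplicate annotation,
     * or an empty Optional for any other line.
     */
    static Optional<String> toSimplifiedEntry(String crawlLogLine) {
        Matcher m = DUPLICATE.matcher(crawlLogLine);
        if (!m.find()) {
            return Optional.empty(); // not a duplicate entry: nothing to emit
        }
        // Assumption: the URL is the fourth whitespace-separated field.
        String[] fields = crawlLogLine.trim().split("\\s+");
        if (fields.length < 4) {
            return Optional.empty();
        }
        return Optional.of(fields[3] + " " + m.group(1) + " " + m.group(2));
    }

    public static void main(String[] args) {
        // Made-up example line; every value here is illustrative.
        String line = "2009-05-25T13:00:00.992Z 200 9717 "
                + "http://example.org/logo.png LLE http://example.org/ image/png "
                + "#016 20090525130000915+74 sha1:EXAMPLEDIGEST - "
                + "duplicate:\"example-00008.arc,22962264\",content-size:9969";
        toSimplifiedEntry(line).ifPresent(System.out::println);
    }
}
```

Lines without the annotation produce no output, which mirrors the documented behaviour of returning without doing anything for non-duplicate input.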
-
finish
public void finish(OutputStream os)
Does nothing.
- Specified by:
finish in class ArchiveBatchJobBase
- Parameters:
os - an OutputStream
-
-