dk.netarkivet.wayback.batch
Class ExtractDeduplicateCDXBatchJob

java.lang.Object
  extended by dk.netarkivet.common.utils.batch.FileBatchJob
      extended by dk.netarkivet.common.utils.arc.ARCBatchJob
          extended by dk.netarkivet.wayback.batch.ExtractDeduplicateCDXBatchJob
All Implemented Interfaces:
java.io.Serializable

public class ExtractDeduplicateCDXBatchJob
extends ARCBatchJob

This batch job takes deduplication records from a crawl log in a metadata ARC file and converts them to CDX records for use in Wayback.

See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
FileBatchJob.ExceptionOccurrence
 
Field Summary
 
Fields inherited from class dk.netarkivet.common.utils.arc.ARCBatchJob
noOfRecordsProcessed
 
Fields inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
batchJobTimeout, exceptions, filesFailed, noOfFilesProcessed
 
Constructor Summary
ExtractDeduplicateCDXBatchJob()
           
 
Method Summary
 void finish(java.io.OutputStream os)
          Does nothing
 void initialize(java.io.OutputStream os)
          Initializes various fields of this class
 void processRecord(org.archive.io.arc.ARCRecord record, java.io.OutputStream os)
          If the ARCRecord is a crawl-log entry, any deduplication records in the crawl log are converted to CDX entries and written to the output.
 
Methods inherited from class dk.netarkivet.common.utils.arc.ARCBatchJob
getExceptionArray, getFilter, handleException, noOfRecordsProcessed, processFile
 
Methods inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
addException, addFinishException, addInitializeException, getBatchJobTimeout, getExceptions, getFilenamePattern, getFilesFailed, getNoOfFilesProcessed, maxExceptionsReached, postProcess, processOnlyFileNamed, processOnlyFilesMatching, processOnlyFilesMatching, processOnlyFilesNamed, setBatchJobTimeout
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ExtractDeduplicateCDXBatchJob

public ExtractDeduplicateCDXBatchJob()
Method Detail

initialize

public void initialize(java.io.OutputStream os)
Initializes various fields of this class

Specified by:
initialize in class ARCBatchJob
Parameters:
os - unused parameter

processRecord

public void processRecord(org.archive.io.arc.ARCRecord record,
                          java.io.OutputStream os)
If the ARCRecord is a crawl-log entry, any deduplication records in the crawl log are converted to CDX entries and written to the output. Otherwise, this method returns without doing anything.

Specified by:
processRecord in class ARCBatchJob
Parameters:
record - The ARCRecord to be processed
os - the stream to which output is written
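
The conversion described for processRecord can be illustrated with a small standalone sketch. It does not use the NetarchiveSuite classes and is not this class's actual implementation. It assumes that a crawl-log line starts with the whitespace-separated fields timestamp, status code, size and URL, and that a deduplicated fetch carries an annotation of the form duplicate:"filename,offset". The class name DuplicateLineSketch, its methods, and the simplified output column order (URL, timestamp, status, offset, filename) are hypothetical and chosen only for illustration; real CDX lines contain more fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; not part of NetarchiveSuite.
final class DuplicateLineSketch {
    // Assumed annotation prefix marking a deduplicated fetch.
    private static final String MARKER = "duplicate:\"";

    // Returns a simplified CDX-like line for a duplicate entry,
    // or an empty Optional if the line is not a usable duplicate entry.
    public static Optional<String> toCdxLikeLine(String crawlLogLine) {
        int markerAt = crawlLogLine.indexOf(MARKER);
        if (markerAt < 0) {
            return Optional.empty(); // not marked as a duplicate
        }
        int refStart = markerAt + MARKER.length();
        int refEnd = crawlLogLine.indexOf('"', refStart);
        if (refEnd < 0) {
            return Optional.empty(); // annotation has no closing quote
        }
        // Assumed reference layout: "<filename>,<offset>"
        String[] ref = crawlLogLine.substring(refStart, refEnd).split(",");
        if (ref.length != 2) {
            return Optional.empty();
        }
        // Assumed leading fields: timestamp, status, size, URL
        String[] fields = crawlLogLine.trim().split("\\s+");
        if (fields.length < 4) {
            return Optional.empty();
        }
        return Optional.of(
                String.join(" ", fields[3], fields[0], fields[1], ref[1], ref[0]));
    }

    // Converts every duplicate entry in a list of crawl-log lines,
    // silently skipping all other lines.
    public static List<String> convertAll(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            toCdxLikeLine(line).ifPresent(out::add);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "2011-01-01T00:00:00.000Z 200 1234 http://example.org/ L - text/html "
                        + "#001 - sha1:ABC - duplicate:\"1-1-20100101.arc,5678\"",
                "2011-01-01T00:00:01.000Z 200 10 http://example.org/a L - text/html "
                        + "#001 - sha1:DEF - -");
        for (String cdx : convertAll(log)) {
            System.out.println(cdx);
        }
    }
}
```

In this sketch, lines without the assumed annotation are skipped, which mirrors the documented behaviour of returning without doing anything for records that do not apply.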

finish

public void finish(java.io.OutputStream os)
Does nothing

Specified by:
finish in class ARCBatchJob
Parameters:
os - unused parameter