dk.netarkivet.wayback
Class ExtractWaybackCDXBatchJob

java.lang.Object
  extended by dk.netarkivet.common.utils.batch.FileBatchJob
      extended by dk.netarkivet.common.utils.arc.ARCBatchJob
          extended by dk.netarkivet.wayback.ExtractWaybackCDXBatchJob
All Implemented Interfaces:
java.io.Serializable

public class ExtractWaybackCDXBatchJob
extends ARCBatchJob

Produces a CDX file in the format expected by wayback, including canonicalisation of URLs. The returned files are unsorted.

Since:
Jul 1, 2009
See Also:
Serialized Form
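
Because the returned files are unsorted, a consumer that needs ordered index lines has to sort them in a separate step. The following is a minimal sketch of such a step, not part of this class: CdxSortSketch and sortCdxLines are hypothetical names, the sample lines are made up, and it assumes that plain lexicographic ordering of whole lines is what the consumer expects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of NetarchiveSuite or wayback.
class CdxSortSketch {

    /**
     * Returns a sorted copy of the given lines, leaving the input untouched.
     * Natural String order compares UTF-16 code units, which for pure ASCII
     * lines is the same as byte order.
     */
    static List<String> sortCdxLines(List<String> unsorted) {
        List<String> sorted = new ArrayList<>(unsorted);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up placeholder lines: "<canonicalised url> <timestamp>".
        List<String> lines = List.of(
                "b.example/ 20090701000000",
                "a.example/ 20090702000000",
                "a.example/ 20090701000000");
        for (String line : sortCdxLines(lines)) {
            System.out.println(line);
        }
    }
}
```

Sorting a copy keeps the batch output itself unchanged, so the same unsorted result can still be merged with output from other files before a final sort.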

Nested Class Summary
static class ExtractWaybackCDXBatchJob.MyAggressiveUrlCanonicalizer
          This class overrides the standard wayback canonicalizer in order to use our version of UURIFactory (see Bug 1719).
 class ExtractWaybackCDXBatchJob.MyARCRecordToSearchResultAdapter
           
 
Nested classes/interfaces inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
FileBatchJob.ExceptionOccurrence
 
Field Summary
 
Fields inherited from class dk.netarkivet.common.utils.arc.ARCBatchJob
noOfRecordsProcessed
 
Fields inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
exceptions, filesFailed, noOfFilesProcessed
 
Constructor Summary
ExtractWaybackCDXBatchJob()
           
 
Method Summary
 void finish(java.io.OutputStream os)
          Finish up the job.
 void initialize(java.io.OutputStream os)
          Initialize the job before running.
 void processRecord(org.archive.io.arc.ARCRecord record, java.io.OutputStream os)
          Exceptions should be handled with the handleException() method.
 
Methods inherited from class dk.netarkivet.common.utils.arc.ARCBatchJob
getExceptionArray, getFilter, handleException, noOfRecordsProcessed, processFile
 
Methods inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
addException, addFinishException, addInitializeException, getExceptions, getFilenamePattern, getFilesFailed, getNoOfFilesProcessed, maxExceptionsReached, processOnlyFileNamed, processOnlyFilesMatching, processOnlyFilesMatching, processOnlyFilesNamed
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ExtractWaybackCDXBatchJob

public ExtractWaybackCDXBatchJob()
Method Detail

initialize

public void initialize(java.io.OutputStream os)
Description copied from class: ARCBatchJob
Initialize the job before running. This is called before the processRecord() calls start coming.

Specified by:
initialize in class ARCBatchJob
Parameters:
os - The OutputStream to which output data is written

processRecord

public void processRecord(org.archive.io.arc.ARCRecord record,
                          java.io.OutputStream os)
Description copied from class: ARCBatchJob
Exceptions should be handled with the handleException() method.

Specified by:
processRecord in class ARCBatchJob
Parameters:
record - the object to be processed.
os - The OutputStream to which output data is written

finish

public void finish(java.io.OutputStream os)
Description copied from class: ARCBatchJob
Finish up the job. This is called after the last processRecord() call.

Specified by:
finish in class ARCBatchJob
Parameters:
os - The OutputStream to which output data is written
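
The call order described above (initialize, then one processRecord per record, then finish) can be illustrated with a self-contained sketch. It does not use the NetarchiveSuite classes: RecordJob, LineWritingJob and run are hypothetical stand-ins, a String replaces ARCRecord, and collecting failures in a list is only an assumed analogue of routing them through handleException().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins that mimic the three callbacks documented above.
class LifecycleSketch {

    interface RecordJob {
        void initialize(OutputStream os);
        void processRecord(String record, OutputStream os);
        void finish(OutputStream os);
    }

    /** Writes one output line per record and records failures instead of throwing. */
    static class LineWritingJob implements RecordJob {
        final List<Exception> failures = new ArrayList<>();
        int recordsProcessed = 0;

        public void initialize(OutputStream os) {
            // Reset state before the first processRecord() call.
            recordsProcessed = 0;
            failures.clear();
        }

        public void processRecord(String record, OutputStream os) {
            recordsProcessed++;
            try {
                if (record.isEmpty()) {
                    throw new IllegalArgumentException("empty record");
                }
                os.write((record + "\n").getBytes(StandardCharsets.UTF_8));
            } catch (IOException | RuntimeException e) {
                // Assumed analogue of handleException(): note the failure, keep going.
                failures.add(e);
            }
        }

        public void finish(OutputStream os) {
            // Nothing is buffered in this sketch, so there is nothing to flush.
        }
    }

    /** Drives a job through the lifecycle and returns everything it wrote. */
    static String run(RecordJob job, List<String> records) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        job.initialize(os);
        for (String record : records) {
            job.processRecord(record, os);
        }
        job.finish(os);
        return new String(os.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

In this sketch a bad record is counted and noted but does not stop the run, which is the behaviour the "exceptions should be handled" wording suggests; how the real job reports such failures is not shown here.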