Package dk.netarkivet.common.utils.arc
Class ARCBatchJob
- java.lang.Object
  - dk.netarkivet.common.utils.batch.FileBatchJob
    - dk.netarkivet.common.utils.arc.ARCBatchJob
- All Implemented Interfaces:
Serializable
- Direct Known Subclasses:
ExtractCDXJob, GetCDXRecordsBatchJob, WaybackCDXExtractionARCBatchJob
public abstract class ARCBatchJob extends FileBatchJob
Abstract class defining a batch job to run on a set of ARC files. Each implementation is required to define the initialize(), processRecord() and finish() methods. The bitarchive application then ensures that the batch job runs initialize(), runs processRecord() on each record in each file in the archive, and then runs finish().
- See Also:
- Serialized Form
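The call order described above can be sketched without the NetarchiveSuite classes. In this minimal, self-contained illustration, SketchJob, CountingJob and run() are hypothetical stand-ins (not part of this API): a record is a plain String rather than an ARCRecord, the stand-in methods may throw IOException for brevity, and run() plays the role of the bitarchive application.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Sketch of the initialize / processRecord / finish call order. */
class LifecycleSketch {

    /** Hypothetical stand-in for ARCBatchJob; a "record" is a plain String here. */
    abstract static class SketchJob {
        abstract void initialize(OutputStream os) throws IOException;
        abstract void processRecord(String record, OutputStream os) throws IOException;
        abstract void finish(OutputStream os) throws IOException;
    }

    /** Example job that counts records and reports the total in finish(). */
    static class CountingJob extends SketchJob {
        private int count = 0;

        @Override
        void initialize(OutputStream os) throws IOException {
            count = 0;
            os.write("begin\n".getBytes(StandardCharsets.UTF_8));
        }

        @Override
        void processRecord(String record, OutputStream os) {
            count++;
        }

        @Override
        void finish(OutputStream os) throws IOException {
            os.write(("records=" + count + "\n").getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Hypothetical driver playing the role of the bitarchive application. */
    static String run(SketchJob job, List<List<String>> files) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try {
            job.initialize(os);                    // once, before any record
            for (List<String> file : files) {      // each file in the archive
                for (String record : file) {       // each record in the file
                    job.processRecord(record, os);
                }
            }
            job.finish(os);                        // once, after the last record
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(os.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
                Arrays.asList("record-1", "record-2"),
                Arrays.asList("record-3"));
        System.out.print(run(new CountingJob(), files));
    }
}
```

A real subclass would extend ARCBatchJob and receive org.archive.io.arc.ARCRecord objects instead; only the ordering of the three calls is meant to carry over.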
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
FileBatchJob.ExceptionOccurrence
-
-
Field Summary
Fields
protected int noOfRecordsProcessed
The total number of records processed.
Fields inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
batchJobTimeout, exceptions, filesFailed, noOfFilesProcessed
-
-
Constructor Summary
Constructors
ARCBatchJob()
-
Method Summary
abstract void finish(OutputStream os)
Finish up the job.
Exception[] getExceptionArray()
Returns a representation of the list of Exceptions recorded for this ARC batch job.
ARCBatchFilter getFilter()
Returns a BatchFilter object which restricts the set of ARC records in the archive on which this batch job is performed.
void handleException(Exception e, File arcfile, long index)
When the org.archive.io.arc classes throw IOExceptions while reading, this is where they go.
abstract void initialize(OutputStream os)
Initialize the job before running.
int noOfRecordsProcessed()
boolean processFile(File arcFile, OutputStream os)
Accepts only ARC and ARCGZ files.
abstract void processRecord(org.archive.io.arc.ARCRecord record, OutputStream os)
Exceptions should be handled with the handleException() method.
Methods inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
addException, addFinishException, addInitializeException, getBatchJobTimeout, getExceptions, getFilenamePattern, getFilesFailed, getNoOfFilesProcessed, maxExceptionsReached, postProcess, processOnlyFileNamed, processOnlyFilesMatching, processOnlyFilesMatching, processOnlyFilesNamed, setBatchJobTimeout
Method Detail
-
initialize
public abstract void initialize(OutputStream os)
Initialize the job before running. This is called before the processRecord() calls start coming.
- Specified by:
initialize in class FileBatchJob
- Parameters:
os - The OutputStream to which output data is written
-
processRecord
public abstract void processRecord(org.archive.io.arc.ARCRecord record, OutputStream os)
Exceptions should be handled with the handleException() method.
- Parameters:
os - The OutputStream to which output data is written
record - the object to be processed.
-
finish
public abstract void finish(OutputStream os)
Finish up the job. This is called after the last processRecord() call.
- Specified by:
finish in class FileBatchJob
- Parameters:
os - The OutputStream to which output data is written
-
getFilter
public ARCBatchFilter getFilter()
Returns a BatchFilter object which restricts the set of ARC records in the archive on which this batch job is performed. The default value is a neutral filter which allows all records.
- Returns:
- A filter telling which records should be given to processRecord().
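The effect of a filter on which records reach processRecord() can be sketched as follows. This is an illustration under stated assumptions, not the ARCBatchFilter API: records are represented by their URL strings, java.util.function.Predicate stands in for the filter type, and ALLOW_ALL, ONLY_HTTP and recordsToProcess() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Sketch of how a record filter narrows what reaches processRecord(). */
class FilterSketch {

    /** Neutral filter: accepts every record, like the documented default. */
    static final Predicate<String> ALLOW_ALL = url -> true;

    /** Hypothetical restrictive filter: only records whose URL starts with "http". */
    static final Predicate<String> ONLY_HTTP = url -> url.startsWith("http");

    /** Returns the records that would be handed to processRecord(). */
    static List<String> recordsToProcess(List<String> recordUrls, Predicate<String> filter) {
        List<String> accepted = new ArrayList<>();
        for (String url : recordUrls) {
            if (filter.test(url)) {
                accepted.add(url);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "filedesc://example.arc", "http://example.org/", "dns:example.org");
        System.out.println(recordsToProcess(urls, ALLOW_ALL).size());
        System.out.println(recordsToProcess(urls, ONLY_HTTP));
    }
}
```

In a real job, a subclass would override getFilter() to return an ARCBatchFilter instead of relying on the neutral default.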
-
processFile
public final boolean processFile(File arcFile, OutputStream os) throws ArgumentNotValid
Accepts only ARC and ARCGZ files. Runs through all records and calls processRecord() on every record that is allowed by getFilter(). Does nothing on a non-ARC file.
- Specified by:
processFile in class FileBatchJob
- Parameters:
arcFile - The ARC or ARCGZ file to be processed.
os - the OutputStream to which output is to be written
- Returns:
true if the file was processed successfully, otherwise false
- Throws:
ArgumentNotValid - if either argument is null
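The documentation does not say how a file is recognised as ARC or ARCGZ. One plausible rule, shown here purely as an assumption, is a check on the file name extension; looksLikeArcFile() is a hypothetical helper and not part of this class.

```java
import java.util.Locale;

/** Sketch of an extension-based test for ARC and ARCGZ file names. */
class ArcFileNameSketch {

    /** Assumed rule: the name ends in ".arc" or ".arc.gz", ignoring case. */
    static boolean looksLikeArcFile(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".arc") || lower.endsWith(".arc.gz");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeArcFile("crawl-0001.arc"));
        System.out.println(looksLikeArcFile("crawl-0001.arc.gz"));
        System.out.println(looksLikeArcFile("crawl-0001.warc"));
    }
}
```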
-
handleException
public void handleException(Exception e, File arcfile, long index) throws ArgumentNotValid
When the org.archive.io.arc classes throw IOExceptions while reading, this is where they go. Subclasses are welcome to override the default functionality, which simply logs and records them in a list. TODO: Actually use the arcfile/index entries in the exception list.
- Parameters:
e - An Exception thrown by the org.archive.io.arc classes.
arcfile - The ARC file that was being processed when the Exception was thrown
index - The index (in the ARC file) at which the Exception was thrown
- Throws:
ArgumentNotValid - if e is null
-
getExceptionArray
public Exception[] getExceptionArray()
Returns a representation of the list of Exceptions recorded for this ARC batch job. For this list to stay complete in a subclass, any method overriding handleException() should always call super.handleException().
- Returns:
- All Exceptions passed to handleException so far.
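Why an override should delegate to super.handleException() can be seen in a small stand-in. BaseJob and LoudJob below are hypothetical classes that only mimic the documented behaviour (recording exceptions in a list); the real method takes a File and throws ArgumentNotValid, which is simplified here to a String and IllegalArgumentException.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Sketch of overriding handleException() while keeping the recorded list intact. */
class ExceptionHandlingSketch {

    /** Hypothetical stand-in for the base class: records every exception it is given. */
    static class BaseJob {
        private final List<Exception> recorded = new ArrayList<>();

        void handleException(Exception e, String arcFileName, long index) {
            if (e == null) {
                throw new IllegalArgumentException("e must not be null");
            }
            recorded.add(e);
        }

        Exception[] getExceptionArray() {
            return recorded.toArray(new Exception[0]);
        }
    }

    /** Subclass adding its own bookkeeping, then delegating to the base class. */
    static class LoudJob extends BaseJob {
        int warnings = 0;

        @Override
        void handleException(Exception e, String arcFileName, long index) {
            warnings++;
            // Without this call, getExceptionArray() would miss the exception.
            super.handleException(e, arcFileName, index);
        }
    }

    public static void main(String[] args) {
        LoudJob job = new LoudJob();
        job.handleException(new IOException("truncated record"), "crawl-0001.arc", 42L);
        System.out.println(job.getExceptionArray().length + " recorded, "
                + job.warnings + " warnings");
    }
}
```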
-
noOfRecordsProcessed
public int noOfRecordsProcessed()
- Returns:
- the number of records processed.
-
-