java.lang.Object
  dk.netarkivet.common.utils.batch.FileBatchJob
    dk.netarkivet.common.utils.warc.WARCBatchJob
public abstract class WARCBatchJob
Abstract class defining a batch job to run on a set of WARC files. Each implementation is required to define initialize(), processRecord() and finish() methods. The bitarchive application then ensures that the batch job runs initialize(), runs processRecord() on each record in each file in the archive, and then runs finish().
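The call order described above can be illustrated with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in: SketchRecordJob, CountingJob and LifecycleSketch are not NetarchiveSuite classes, records are plain strings rather than org.archive.io.warc.WARCRecord objects, and the driver only imitates the role the text assigns to the bitarchive application.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical stand-in for a record-level batch job; NOT the real WARCBatchJob.
abstract class SketchRecordJob {
    abstract void initialize(OutputStream os);
    abstract void processRecord(String record, OutputStream os);
    abstract void finish(OutputStream os);
}

// Example job: counts the records it sees and writes the total when finished.
class CountingJob extends SketchRecordJob {
    private int count;

    @Override
    void initialize(OutputStream os) {
        count = 0; // reset state before any record is seen
    }

    @Override
    void processRecord(String record, OutputStream os) {
        count++; // called once per record
    }

    @Override
    void finish(OutputStream os) {
        PrintStream ps = new PrintStream(os);
        ps.print("records=" + count);
        ps.flush();
    }
}

// Hypothetical driver: initialize once, processRecord for each record in each file, finish once.
class LifecycleSketch {
    static String run(SketchRecordJob job, List<List<String>> files) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        job.initialize(os);
        for (List<String> file : files) {
            for (String record : file) {
                job.processRecord(record, os);
            }
        }
        job.finish(os);
        return new String(os.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Here each inner list plays the role of one file, so a job sees every record of every file between a single initialize() call and a single finish() call.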
Nested Class Summary |
---|
Nested classes/interfaces inherited from class dk.netarkivet.common.utils.batch.FileBatchJob |
---|
FileBatchJob.ExceptionOccurrence |
Field Summary | |
---|---|
protected int | noOfRecordsProcessed - The total number of records processed. |
Fields inherited from class dk.netarkivet.common.utils.batch.FileBatchJob |
---|
batchJobTimeout, exceptions, filesFailed, noOfFilesProcessed |
Constructor Summary | |
---|---|
WARCBatchJob() | |
Method Summary | |
---|---|
abstract void | finish(java.io.OutputStream os) - Finish up the job. |
java.lang.Exception[] | getExceptionArray() - Returns a representation of the list of Exceptions recorded for this WARC batch job. |
WARCBatchFilter | getFilter() - Returns a BatchFilter object which restricts the set of WARC records in the archive on which this batch job is performed. |
void | handleException(java.lang.Exception e, java.io.File warcfile, long index) - When the org.archive.io.arc classes throw IOExceptions while reading, this is where they go. |
abstract void | initialize(java.io.OutputStream os) - Initialize the job before running. |
int | noOfRecordsProcessed() |
boolean | processFile(java.io.File warcFile, java.io.OutputStream os) - Accepts only WARC and WARCGZ files. |
abstract void | processRecord(org.archive.io.warc.WARCRecord record, java.io.OutputStream os) - Exceptions should be handled with the handleException() method. |
Methods inherited from class dk.netarkivet.common.utils.batch.FileBatchJob |
---|
addException, addFinishException, addInitializeException, getBatchJobTimeout, getExceptions, getFilenamePattern, getFilesFailed, getNoOfFilesProcessed, maxExceptionsReached, postProcess, processOnlyFileNamed, processOnlyFilesMatching, processOnlyFilesMatching, processOnlyFilesNamed, setBatchJobTimeout |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected int noOfRecordsProcessed
Constructor Detail |
---|
public WARCBatchJob()
Method Detail |
---|
public abstract void initialize(java.io.OutputStream os)
initialize in class FileBatchJob
Parameters:
os - The OutputStream to which output data is written

public abstract void processRecord(org.archive.io.warc.WARCRecord record, java.io.OutputStream os)
Parameters:
os - The OutputStream to which output data is written
record - The object to be processed.

public abstract void finish(java.io.OutputStream os)
finish in class FileBatchJob
Parameters:
os - The OutputStream to which output data is written

public WARCBatchFilter getFilter()

public final boolean processFile(java.io.File warcFile, java.io.OutputStream os) throws ArgumentNotValid
processFile in class FileBatchJob
Parameters:
warcFile - The WARC or WARCGZ file to be processed.
os - The OutputStream to which output is to be written
Throws:
ArgumentNotValid - if either argument is null

public void handleException(java.lang.Exception e, java.io.File warcfile, long index) throws ArgumentNotValid
Parameters:
e - An Exception thrown by the org.archive.io.arc classes.
warcfile - The WARC file that was being processed when the Exception was thrown
index - The index (in the WARC file) at which the Exception was thrown
Throws:
ArgumentNotValid - if e is null

public java.lang.Exception[] getExceptionArray()

public int noOfRecordsProcessed()
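
The way handleException(), getExceptionArray() and a record counter can fit together is sketched below under stated assumptions. ExceptionRecordingSketch and its Occurrence class are hypothetical stand-ins, not the real WARCBatchJob or FileBatchJob.ExceptionOccurrence; IllegalArgumentException stands in for ArgumentNotValid; records are plain strings; and the index is simply the record's position in the list, which may differ from what the real class reports.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in; NOT the real WARCBatchJob or FileBatchJob.
class ExceptionRecordingSketch {
    // One recorded failure: which file, at which index, and what was thrown.
    static final class Occurrence {
        final File file;
        final long index;
        final Exception exception;

        Occurrence(File file, long index, Exception exception) {
            this.file = file;
            this.index = index;
            this.exception = exception;
        }
    }

    private final List<Occurrence> occurrences = new ArrayList<>();
    private int noOfRecordsProcessed;

    // Records the failure instead of aborting the job.
    void handleException(Exception e, File warcfile, long index) {
        if (e == null) {
            // IllegalArgumentException stands in for ArgumentNotValid here.
            throw new IllegalArgumentException("e must not be null");
        }
        occurrences.add(new Occurrence(warcfile, index, e));
    }

    // Returns only the exceptions, in the order they were recorded.
    Exception[] getExceptionArray() {
        Exception[] result = new Exception[occurrences.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = occurrences.get(i).exception;
        }
        return result;
    }

    int noOfRecordsProcessed() {
        return noOfRecordsProcessed;
    }

    // Walks the records of one file; a failing record is reported and the loop moves on.
    void processAll(File warcfile, List<String> records) {
        long index = 0;
        for (String record : records) {
            try {
                if (record.isEmpty()) {
                    // Assumption for the sketch: an empty string models an unreadable record.
                    throw new IOException("unreadable record");
                }
                noOfRecordsProcessed++;
            } catch (Exception e) {
                handleException(e, warcfile, index);
            }
            index++;
        }
    }
}
```

In this sketch a failing record does not stop the run: it is recorded with its file and index, and the remaining records are still processed.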