Class ArchiveBatchJob
- java.lang.Object
- dk.netarkivet.common.utils.batch.FileBatchJob
- dk.netarkivet.common.utils.archive.ArchiveBatchJobBase
- dk.netarkivet.common.utils.archive.ArchiveBatchJob
- All Implemented Interfaces:
Serializable
- Direct Known Subclasses:
ArchiveExtractCDXJob, CrawlLogLinesMatchingRegexp, DeduplicationCDXExtractionBatchJob, GetMetadataArchiveBatchJob, HarvestedUrlsForDomainBatchJob
public abstract class ArchiveBatchJob extends ArchiveBatchJobBase
Abstract class defining a batch job to run on a set of ARC/WARC files. Each implementation is required to define the initialize(), processRecord() and finish() methods. The bitarchive application then ensures that the batch job runs initialize() first, then runs processRecord() on each record in each file in the archive, and finally runs finish().
- See Also:
- Serialized Form
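
The call order described above (initialize(), then processRecord() for each record in each file, then finish()) can be illustrated with a small self-contained sketch. It does not use the real NetarchiveSuite classes: SketchRecord, SketchBatchJob, UrlListingJob and LifecycleSketch are hypothetical stand-ins, and the initialize(OutputStream) and finish(OutputStream) signatures are assumptions made for illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for an archive record; not the real ArchiveRecordBase. */
final class SketchRecord {
    final String url;
    SketchRecord(String url) { this.url = url; }
}

/** Hypothetical stand-in exposing the three lifecycle hooks named in the class description. */
abstract class SketchBatchJob {
    abstract void initialize(OutputStream os) throws IOException;
    abstract void processRecord(SketchRecord record, OutputStream os) throws IOException;
    abstract void finish(OutputStream os) throws IOException;
}

/** Example job: writes one line per record, then a total in finish(). */
final class UrlListingJob extends SketchBatchJob {
    private int count;

    @Override
    void initialize(OutputStream os) {
        count = 0; // reset state before any record is seen
    }

    @Override
    void processRecord(SketchRecord record, OutputStream os) throws IOException {
        count++;
        os.write((record.url + "\n").getBytes(StandardCharsets.UTF_8));
    }

    @Override
    void finish(OutputStream os) throws IOException {
        os.write(("records: " + count + "\n").getBytes(StandardCharsets.UTF_8));
    }
}

final class LifecycleSketch {
    /** Plays the role of the driving application: one inner list per archive file. */
    static String run(SketchBatchJob job, List<List<SketchRecord>> files) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try {
            job.initialize(os);
            for (List<SketchRecord> file : files) {
                for (SketchRecord record : file) {
                    job.processRecord(record, os);
                }
            }
            job.finish(os);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(os.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<List<SketchRecord>> files = Arrays.asList(
                Arrays.asList(new SketchRecord("http://a/"), new SketchRecord("http://b/")),
                Arrays.asList(new SketchRecord("http://c/")));
        System.out.print(run(new UrlListingJob(), files));
    }
}
```

Keeping per-job state (here, the counter) in fields that initialize() resets and finish() reports mirrors the division of work between the three methods; a real subclass would extend ArchiveBatchJob and receive ArchiveRecordBase objects instead.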
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
FileBatchJob.ExceptionOccurrence
-
-
Field Summary
-
Fields inherited from class dk.netarkivet.common.utils.archive.ArchiveBatchJobBase
noOfRecordsProcessed
-
Fields inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
batchJobTimeout, exceptions, filesFailed, noOfFilesProcessed
-
-
Constructor Summary
Constructors
ArchiveBatchJob()
-
Method Summary
Modifier and Type, Method, Description
ArchiveBatchFilter getFilter()
Returns an ArchiveBatchFilter object which restricts the set of records in the archive on which this batch job is performed.
boolean processFile(File archiveFile, OutputStream os)
Accepts only arc(.gz) and warc(.gz) files.
abstract void processRecord(ArchiveRecordBase record, OutputStream os)
Exceptions should be handled with the handleException() method.
-
Methods inherited from class dk.netarkivet.common.utils.archive.ArchiveBatchJobBase
finish, getExceptionArray, handleException, handleOurException, initialize, noOfRecordsProcessed
-
Methods inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
addException, addFinishException, addInitializeException, getBatchJobTimeout, getExceptions, getFilenamePattern, getFilesFailed, getNoOfFilesProcessed, maxExceptionsReached, postProcess, processOnlyFileNamed, processOnlyFilesMatching, processOnlyFilesMatching, processOnlyFilesNamed, setBatchJobTimeout
-
-
-
-
Method Detail
-
processRecord
public abstract void processRecord(ArchiveRecordBase record, OutputStream os)
Exceptions should be handled with the handleException() method.
- Parameters:
os - The OutputStream to which output data is written.
record - The record to be processed.
-
getFilter
public ArchiveBatchFilter getFilter()
Returns an ArchiveBatchFilter object which restricts the set of records in the archive on which this batch job is performed. The default value is a neutral filter which allows all records.
- Returns:
- A filter telling which records should be given to processRecord().
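
The difference between the neutral default and a restrictive filter can be sketched as follows. This is an illustration only: it uses java.util.function.Predicate over URL strings as a stand-in for ArchiveBatchFilter, and the names FilterSketch, ACCEPT_ALL, HTTP_ONLY and selected are hypothetical, not part of the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class FilterSketch {
    /** Neutral filter: lets every record through, like the default described above. */
    static final Predicate<String> ACCEPT_ALL = url -> true;

    /** Hypothetical restrictive filter: only records whose URL uses http or https. */
    static final Predicate<String> HTTP_ONLY =
            url -> url.startsWith("http://") || url.startsWith("https://");

    /** Returns the records that would be handed on for processing under the given filter. */
    static List<String> selected(List<String> recordUrls, Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        for (String url : recordUrls) {
            if (filter.test(url)) {
                out.add(url);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "filedesc://x.arc", "http://example.org/", "dns:example.org");
        System.out.println(selected(urls, ACCEPT_ALL)); // all three records
        System.out.println(selected(urls, HTTP_ONLY));  // only the http record
    }
}
```

A subclass that wants to skip some records would override getFilter() to return a more restrictive filter rather than testing each record inside processRecord().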
-
processFile
public final boolean processFile(File archiveFile, OutputStream os) throws ArgumentNotValid
Accepts only arc(.gz) and warc(.gz) files. Runs through all records and calls processRecord() on every record that is allowed by getFilter(). Does nothing on a non-(w)arc file.
- Specified by:
processFile in class FileBatchJob
- Parameters:
archiveFile - The arc(.gz) or warc(.gz) file to be processed.
os - The OutputStream to which output is to be written.
- Returns:
- true if the file was processed successfully, otherwise false
- Throws:
ArgumentNotValid
- if either argument is null
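
As a rough illustration of the "arc(.gz) and warc(.gz)" restriction, the following sketch tests a file name against those four extensions. It is not the library's actual check: the class ArchiveNameSketch, the method looksLikeArchive and the regular expression are hypothetical, and matching case-insensitively is an assumption.

```java
import java.io.File;
import java.util.regex.Pattern;

final class ArchiveNameSketch {
    // Assumption: decide by extension only (.arc, .arc.gz, .warc, .warc.gz), ignoring case.
    private static final Pattern ARCHIVE_NAME =
            Pattern.compile(".*\\.w?arc(\\.gz)?", Pattern.CASE_INSENSITIVE);

    /** Returns true if the file name ends in one of the four accepted extensions. */
    static boolean looksLikeArchive(File file) {
        return ARCHIVE_NAME.matcher(file.getName()).matches();
    }

    public static void main(String[] args) {
        String[] names = {"1-1-20240101.arc", "job.warc.gz", "crawl.log", "old.arc.gz.bak"};
        for (String name : names) {
            System.out.println(name + " -> " + looksLikeArchive(new File(name)));
        }
    }
}
```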
-
-