Package dk.netarkivet.common.utils.batch
Class FileBatchJob
- java.lang.Object
-
- dk.netarkivet.common.utils.batch.FileBatchJob
-
- All Implemented Interfaces:
Serializable
- Direct Known Subclasses:
ARCBatchJob, ArchiveBatchJobBase, ChecksumJob, FileListJob, FileRemover, LoadableFileBatchJob, LoadableJarBatchJob, WARCBatchJob
public abstract class FileBatchJob extends Object implements Serializable
Abstract class defining a batch job to run on a set of files. The job is initialized by calling initialize(), run on each file by calling processFile(), and any cleanup is handled by finish().
- See Also:
- Serialized Form
-
-
Nested Class Summary
static class FileBatchJob.ExceptionOccurrence
This class holds the information about exceptions that occurred in a batchjob.
-
Field Summary
protected long batchJobTimeout
If positive, this is the timeout of this specific batch job in milliseconds.
protected List<FileBatchJob.ExceptionOccurrence> exceptions
A list with information about the exceptions thrown during the execution of the batchjob.
protected Set<File> filesFailed
A Set of files which generated errors.
protected int noOfFilesProcessed
The total number of files processed (including any that generated errors).
-
Constructor Summary
FileBatchJob()
-
Method Summary
protected void addException(File currentFile, long currentOffset, long outputOffset, Exception e)
Record an exception that occurred during the processFile of this job and that should be returned with the result.
protected void addFinishException(long outputOffset, Exception e)
Record an exception that occurred during the finish() method of this job.
protected void addInitializeException(long outputOffset, Exception e)
Record an exception that occurred during the initialize() method of this job.
abstract void finish(OutputStream os)
Finish up the job.
long getBatchJobTimeout()
Getter for batchJobTimeout.
List<FileBatchJob.ExceptionOccurrence> getExceptions()
Get the list of exceptions that have occurred during processing.
Pattern getFilenamePattern()
Get the pattern for files that should be processed.
Collection<File> getFilesFailed()
Return the list of names of files where processing failed.
int getNoOfFilesProcessed()
Return the number of files processed in this job.
abstract void initialize(OutputStream os)
Initialize the job before running.
protected boolean maxExceptionsReached()
Returns true if we have already recorded the maximum number of exceptions.
boolean postProcess(InputStream input, OutputStream output)
Processes the concatenated result files.
abstract boolean processFile(File file, OutputStream os)
Process one file stored in the bit archive.
void processOnlyFileNamed(String specifiedFilename)
Helper method for only processing one file.
void processOnlyFilesMatching(String specifiedPattern)
Set this job to match only a certain pattern.
void processOnlyFilesMatching(List<String> specifiedPatterns)
Set this job to match only a certain set of patterns.
void processOnlyFilesNamed(List<String> specifiedFilenames)
Mark the job to process only the specified files.
void setBatchJobTimeout(long batchJobTimeout)
Override predefined timeout period for batchjob.
-
-
-
Field Detail
-
noOfFilesProcessed
protected int noOfFilesProcessed
The total number of files processed (including any that generated errors).
-
batchJobTimeout
protected long batchJobTimeout
If positive, this is the timeout of this specific batch job in milliseconds. If negative, the standard timeout from settings is used.
-
exceptions
protected List<FileBatchJob.ExceptionOccurrence> exceptions
A list with information about the exceptions thrown during the execution of the batchjob.
-
-
Method Detail
-
initialize
public abstract void initialize(OutputStream os)
Initialize the job before running. This is called before the processFile() calls. If this throws an exception, processFile() will not be called, but finish() will.
- Parameters:
os
- the OutputStream to which output should be written
-
processFile
public abstract boolean processFile(File file, OutputStream os)
Process one file stored in the bit archive.
- Parameters:
file
- the file to be processed.
os
- the OutputStream to which output should be written
- Returns:
- true if the file was successfully processed, false otherwise
-
finish
public abstract void finish(OutputStream os)
Finish up the job. This is called after the last processFile() call. If the initialize() call throws an exception, this will still be called so that any resources allocated can be cleaned up. Implementations should make sure that this method can handle a partial initialization.
- Parameters:
os
- the OutputStream to which output should be written
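The call order described for initialize(), processFile() and finish() can be sketched as follows. This is a minimal, self-contained illustration and not the NetarchiveSuite implementation: SketchBatchJob, NameListingJob and SketchBatchRunner are hypothetical stand-ins so that the example compiles without the library, and the driver that actually runs batch jobs is not described on this page.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the three abstract methods of FileBatchJob. */
abstract class SketchBatchJob {
    abstract void initialize(OutputStream os);
    abstract boolean processFile(File file, OutputStream os);
    abstract void finish(OutputStream os);
}

/** Example job: writes each file name on its own line, then a total. */
class NameListingJob extends SketchBatchJob {
    private Integer count; // stays null if initialize() never completed

    @Override
    void initialize(OutputStream os) {
        count = 0;
    }

    @Override
    boolean processFile(File file, OutputStream os) {
        try {
            os.write((file.getName() + "\n").getBytes(StandardCharsets.UTF_8));
            count++;
            return true;
        } catch (IOException e) {
            return false; // reported to the caller as a failed file
        }
    }

    @Override
    void finish(OutputStream os) {
        if (count == null) {
            return; // partial initialization: nothing to summarize
        }
        try {
            os.write(("files: " + count + "\n").getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            // Sketch only: a real job would record this, e.g. via addFinishException().
        }
    }
}

/** Hypothetical driver showing the documented call order. */
class SketchBatchRunner {
    /** Runs the job and returns the files for which processFile() returned false. */
    static Set<File> run(SketchBatchJob job, List<File> files, OutputStream os) {
        Set<File> failed = new LinkedHashSet<>();
        boolean initialized = false;
        try {
            job.initialize(os);
            initialized = true;
        } catch (RuntimeException e) {
            // initialize() failed: skip processFile(), but still call finish() below.
        }
        try {
            if (initialized) {
                for (File file : files) {
                    if (!job.processFile(file, os)) {
                        failed.add(file);
                    }
                }
            }
        } finally {
            job.finish(os);
        }
        return failed;
    }
}
```

The nullable count field is one way to let finish() cope with a partial initialization; a job holding real resources would check each of them the same way before releasing it.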
-
processOnlyFilesNamed
public void processOnlyFilesNamed(List<String> specifiedFilenames)
Mark the job to process only the specified files. This will override any previous setting of which files to process.
- Parameters:
specifiedFilenames
- A list of filenames to process (without paths). If null, all files will be processed.
-
processOnlyFileNamed
public void processOnlyFileNamed(String specifiedFilename)
Helper method for only processing one file. This will override any previous setting of which files to process.
- Parameters:
specifiedFilename
- The name of the single file that should be processed. Should not include any path information.
-
processOnlyFilesMatching
public void processOnlyFilesMatching(List<String> specifiedPatterns)
Set this job to match only a certain set of patterns. This will override any previous setting of which files to process.
- Parameters:
specifiedPatterns
- The patterns of file names that this job will operate on. These should not include any path information, but should match the entire filename (e.g. .*foo.* for any file with foo in the name).
-
processOnlyFilesMatching
public void processOnlyFilesMatching(String specifiedPattern)
Set this job to match only a certain pattern. This will override any previous setting of which files to process.
- Parameters:
specifiedPattern
- Regular expression of file names that this job will operate on. This should not include any path information, but should match the entire filename (e.g. .*foo.* for any file with foo in the name).
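The requirement that a pattern match the entire filename can be illustrated with plain java.util.regex. The sketch below is an assumption about how such selection could work, not the library's code: FilenamePatternSketch and its methods are hypothetical, and combining several patterns by alternation and quoting exact names with Pattern.quote are illustrative choices.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper illustrating whole-name matching of file names. */
class FilenamePatternSketch {

    /** Assumed combination rule: a name is selected if any one pattern matches it entirely. */
    static Pattern anyOf(List<String> patterns) {
        String joined = patterns.stream()
                .map(p -> "(?:" + p + ")")
                .collect(Collectors.joining("|"));
        return Pattern.compile(joined);
    }

    /** Exact file names are quoted so that '.' and other metacharacters are literal. */
    static Pattern exactNames(List<String> names) {
        return anyOf(names.stream().map(Pattern::quote).collect(Collectors.toList()));
    }

    /** Uses matches(), which requires the whole name to match, rather than find(). */
    static boolean selects(Pattern pattern, String filename) {
        return pattern.matcher(filename).matches();
    }
}
```

With whole-name matching, the pattern foo does not select xfoo.arc, while .*foo.* does, which is why the parameter description asks for the surrounding .* explicitly.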
-
getFilenamePattern
public Pattern getFilenamePattern()
Get the pattern for files that should be processed.
- Returns:
- A pattern for files to process.
-
getNoOfFilesProcessed
public int getNoOfFilesProcessed()
Return the number of files processed in this job.
- Returns:
- the number of files processed in this job
-
getFilesFailed
public Collection<File> getFilesFailed()
Return the list of names of files where processing failed. An empty list is returned if none failed.
- Returns:
- the possibly empty list of names of files where processing failed
-
getExceptions
public List<FileBatchJob.ExceptionOccurrence> getExceptions()
Get the list of exceptions that have occurred during processing.
- Returns:
- List of exceptions together with information on where they happened.
-
postProcess
public boolean postProcess(InputStream input, OutputStream output)
Processes the concatenated result files. This is intended to be overridden by batchjobs that want post-processing other than plain concatenation.
- Parameters:
input
- The inputstream to the file containing the concatenated results.
output
- The outputstream where the resulting data should be written.
- Returns:
- Whether it actually does any post processing. If false is returned then the default concatenated result file is returned.
- Throws:
ArgumentNotValid
- If the concatenated file is null.
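As an illustration of what a post-processing override might do, the sketch below sorts the lines of the concatenated result. It is a standalone static method rather than an actual FileBatchJob subclass; SortedResultSketch is a hypothetical name, IllegalArgumentException stands in for ArgumentNotValid, and returning false on an I/O error is an assumption about how a job could fall back to the plain concatenated result.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical post-processing step: sort the lines of the concatenated result. */
class SortedResultSketch {

    static boolean postProcess(InputStream input, OutputStream output) {
        if (input == null || output == null) {
            // Stand-in for the ArgumentNotValid exception named in the documentation.
            throw new IllegalArgumentException("input and output must not be null");
        }
        try {
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(input, StandardCharsets.UTF_8));
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
            Collections.sort(lines);

            // Build the full result first so nothing is written if reading fails.
            StringBuilder sorted = new StringBuilder();
            for (String l : lines) {
                sorted.append(l).append('\n');
            }
            output.write(sorted.toString().getBytes(StandardCharsets.UTF_8));
            output.flush();
            return true; // post-processing was performed
        } catch (IOException e) {
            return false; // assumption: signal that the concatenated result should be used
        }
    }
}
```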
-
addException
protected void addException(File currentFile, long currentOffset, long outputOffset, Exception e)
Record an exception that occurred during the processFile of this job and that should be returned with the result. If maxExceptionsReached() returns true, this method silently does nothing.
- Parameters:
currentFile
- The file that is currently being processed.
currentOffset
- The relevant offset into the file when the exception happened (e.g. the start of an ARC record).
outputOffset
- The offset we were at in the outputstream when the exception happened. If UNKNOWN_OFFSET, the offset could not be found.
e
- The exception thrown. This exception must be serializable.
-
addInitializeException
protected void addInitializeException(long outputOffset, Exception e)
Record an exception that occurred during the initialize() method of this job.
- Parameters:
outputOffset
- The offset we were at in the outputstream when the exception happened. If UNKNOWN_OFFSET, the offset could not be found.
e
- The exception thrown. This exception must be serializable.
-
addFinishException
protected void addFinishException(long outputOffset, Exception e)
Record an exception that occurred during the finish() method of this job.
- Parameters:
outputOffset
- The offset we were at in the outputstream when the exception happened. If UNKNOWN_OFFSET, the offset could not be found.
e
- The exception thrown. This exception must be serializable.
-
getBatchJobTimeout
public long getBatchJobTimeout()
Getter for batchJobTimeout. If the batchjob has not defined a maximum time (i.e. the value is set to -1), the default value from settings is used.
- Returns:
- timeout in milliseconds.
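The fallback rule for the timeout can be written as a one-line decision. This is only a sketch: TimeoutSketch and effectiveTimeout are hypothetical names, the default is passed in as a parameter because the settings lookup is not shown on this page, and treating zero like a negative value is an assumption, since the documentation only covers the positive and negative cases.

```java
/** Hypothetical helper showing the documented timeout fallback. */
class TimeoutSketch {

    /**
     * Returns the job-specific timeout if it is positive, otherwise the default.
     * Both values are in milliseconds.
     */
    static long effectiveTimeout(long batchJobTimeout, long defaultFromSettings) {
        // Assumption: zero is treated like "not set", just as -1 is.
        return batchJobTimeout > 0 ? batchJobTimeout : defaultFromSettings;
    }
}
```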
-
maxExceptionsReached
protected boolean maxExceptionsReached()
Returns true if we have already recorded the maximum number of exceptions. At this point, no more exceptions will be recorded, and processing should be aborted.
- Returns:
- True if the maximum number of exceptions (MAX_EXCEPTIONS) has been recorded already.
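The interplay between addException() and maxExceptionsReached() amounts to a capped list. The sketch below is a standalone illustration under stated assumptions: BoundedExceptionLog and Occurrence are hypothetical stand-ins for the job's internal bookkeeping and FileBatchJob.ExceptionOccurrence, the cap is a constructor argument because the value of MAX_EXCEPTIONS is not given on this page, and the value chosen for UNKNOWN_OFFSET is likewise a placeholder.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical capped exception log, mirroring the documented behaviour. */
class BoundedExceptionLog implements Serializable {
    /** Placeholder value; the real constant's value is not stated in this page. */
    static final long UNKNOWN_OFFSET = -1L;

    /** Hypothetical counterpart of FileBatchJob.ExceptionOccurrence. */
    static final class Occurrence implements Serializable {
        final String fileName;
        final long fileOffset;
        final long outputOffset;
        final Exception exception; // must itself be serializable to travel with the result

        Occurrence(String fileName, long fileOffset, long outputOffset, Exception exception) {
            this.fileName = fileName;
            this.fileOffset = fileOffset;
            this.outputOffset = outputOffset;
            this.exception = exception;
        }
    }

    private final int maxExceptions;
    private final List<Occurrence> exceptions = new ArrayList<>();

    BoundedExceptionLog(int maxExceptions) {
        this.maxExceptions = maxExceptions;
    }

    boolean maxExceptionsReached() {
        return exceptions.size() >= maxExceptions;
    }

    /** Records the exception, or silently does nothing once the cap is reached. */
    void addException(String fileName, long fileOffset, long outputOffset, Exception e) {
        if (maxExceptionsReached()) {
            return;
        }
        exceptions.add(new Occurrence(fileName, fileOffset, outputOffset, e));
    }

    List<Occurrence> getExceptions() {
        return Collections.unmodifiableList(exceptions);
    }
}
```

A processFile() implementation that checks maxExceptionsReached() after each recorded error can stop early instead of continuing to work on input whose errors will no longer be reported.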
-
setBatchJobTimeout
public void setBatchJobTimeout(long batchJobTimeout)
Override predefined timeout period for batchjob.
- Parameters:
batchJobTimeout
- timeout period
-
-