dk.netarkivet.common.utils.batch
Class FileBatchJob

java.lang.Object
  extended by dk.netarkivet.common.utils.batch.FileBatchJob
All Implemented Interfaces:
java.io.Serializable
Direct Known Subclasses:
ARCBatchJob, ChecksumJob, FileListJob, FileRemover, LoadableFileBatchJob, LoadableJarBatchJob

public abstract class FileBatchJob
extends java.lang.Object
implements java.io.Serializable

Abstract base class defining a batch job to run on a set of files. The job is initialized by calling initialize(), run on each file by calling processFile(), and cleaned up by calling finish().
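The lifecycle described above can be sketched roughly as follows. This is a minimal illustration only: SketchBatchJob is a hypothetical stand-in that mirrors just the three abstract method signatures documented on this page, FileNameListingJob and run() are invented for the example, and the real class carries additional state (filters, exception lists, timeout) that is left out here.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in mirroring only the three abstract methods documented
// here. It is NOT the real dk.netarkivet.common.utils.batch.FileBatchJob.
abstract class SketchBatchJob {
    abstract void initialize(OutputStream os);
    abstract boolean processFile(File file, OutputStream os);
    abstract void finish(OutputStream os);
}

// Example job (invented for this sketch): writes one line per file name,
// followed by a total count written in finish().
class FileNameListingJob extends SketchBatchJob {
    private int seen;

    @Override
    void initialize(OutputStream os) {
        seen = 0;
    }

    @Override
    boolean processFile(File file, OutputStream os) {
        try {
            os.write((file.getName() + "\n").getBytes(StandardCharsets.UTF_8));
            seen++;
            return true;
        } catch (IOException e) {
            return false; // signal that this file was not processed successfully
        }
    }

    @Override
    void finish(OutputStream os) {
        try {
            os.write(("total=" + seen + "\n").getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            // Nothing more can be done with a broken stream in this sketch.
        }
    }

    // Assumed driver: calls the methods in the documented order and
    // returns everything the job wrote.
    static String run(SketchBatchJob job, File... files) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        job.initialize(out);
        for (File f : files) {
            job.processFile(f, out);
        }
        job.finish(out);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

In this sketch the files do not need to exist, because only their names are read.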

See Also:
Serialized Form

Nested Class Summary
static class FileBatchJob.ExceptionOccurrence
          This class holds the information about exceptions that occurred in a batchjob.
 
Field Summary
protected  long batchJobTimeout
          If positive, the timeout for this specific batch job in milliseconds.
protected  java.util.List<FileBatchJob.ExceptionOccurrence> exceptions
          A list with information about the exceptions thrown during the execution of the batchjob.
protected  java.util.Set<java.io.File> filesFailed
          A Set of files which generated errors.
protected  int noOfFilesProcessed
          The total number of files processed (including any that generated errors).
 
Constructor Summary
FileBatchJob()
           
 
Method Summary
protected  void addException(java.io.File currentFile, long currentOffset, long outputOffset, java.lang.Exception e)
          Record an exception that occurred during the processFile of this job and that should be returned with the result.
protected  void addFinishException(long outputOffset, java.lang.Exception e)
          Record an exception that occurred during the finish() method of this job.
protected  void addInitializeException(long outputOffset, java.lang.Exception e)
          Record an exception that occurred during the initialize() method of this job.
abstract  void finish(java.io.OutputStream os)
          Finish up the job.
 long getBatchJobTimeout()
          Getter for batchJobTimeout.
 java.util.List<FileBatchJob.ExceptionOccurrence> getExceptions()
          Get the list of exceptions that have occurred during processing.
 java.util.regex.Pattern getFilenamePattern()
          Get the pattern for files that should be processed.
 java.util.Collection<java.io.File> getFilesFailed()
          Return the list of names of files where processing failed.
 int getNoOfFilesProcessed()
          Return the number of files processed in this job.
abstract  void initialize(java.io.OutputStream os)
          Initialize the job before running.
protected  boolean maxExceptionsReached()
          Returns true if we have already recorded the maximum number of exceptions.
 boolean postProcess(java.io.InputStream input, java.io.OutputStream output)
          Processes the concatenated result files.
abstract  boolean processFile(java.io.File file, java.io.OutputStream os)
          Process one file stored in the bit archive.
 void processOnlyFileNamed(java.lang.String specifiedFilename)
          Helper method for only processing one file.
 void processOnlyFilesMatching(java.util.List<java.lang.String> specifiedPatterns)
          Set this job to match only a certain set of patterns.
 void processOnlyFilesMatching(java.lang.String specifiedPattern)
          Set this job to match only a certain pattern.
 void processOnlyFilesNamed(java.util.List<java.lang.String> specifiedFilenames)
          Mark the job to process only the specified files.
 void setBatchJobTimeout(long batchJobTimeout)
          Override the predefined timeout period for the batch job.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

noOfFilesProcessed

protected int noOfFilesProcessed
The total number of files processed (including any that generated errors).


batchJobTimeout

protected long batchJobTimeout
If positive, the timeout for this specific batch job in milliseconds. If negative, the standard timeout from settings is used.


filesFailed

protected java.util.Set<java.io.File> filesFailed
A Set of files which generated errors.


exceptions

protected java.util.List<FileBatchJob.ExceptionOccurrence> exceptions
A list with information about the exceptions thrown during the execution of the batchjob.

Constructor Detail

FileBatchJob

public FileBatchJob()
Method Detail

initialize

public abstract void initialize(java.io.OutputStream os)
Initialize the job before running. This is called before the processFile() calls. If this throws an exception, processFile() will not be called, but finish() will.

Parameters:
os - the OutputStream to which output should be written

processFile

public abstract boolean processFile(java.io.File file,
                                    java.io.OutputStream os)
Process one file stored in the bit archive.

Parameters:
file - the file to be processed.
os - the OutputStream to which output should be written
Returns:
true if the file was successfully processed, false otherwise

finish

public abstract void finish(java.io.OutputStream os)
Finish up the job. This is called after the last processFile() call. If the initialize() call throws an exception, this will still be called so that any allocated resources can be cleaned up. Implementations should make sure that this method can handle a partial initialization.

Parameters:
os - the OutputStream to which output should be written
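One way to make finish() tolerate a partial initialization is to guard every cleanup step with a null check. The sketch below illustrates that idea under stated assumptions: PartialInitJob, its constructor flag, and runLifecycle() are hypothetical names made up for the example, and the driver only imitates the documented contract (finish() runs even when initialize() throws). It is not the library's actual execution code.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical job used only to illustrate the contract described above.
class PartialInitJob {
    private final boolean failInInitialize;
    private Closeable resource;   // stays null when initialize() fails early
    boolean finishCalled;

    PartialInitJob(boolean failInInitialize) {
        this.failInInitialize = failInInitialize;
    }

    void initialize(OutputStream os) {
        if (failInInitialize) {
            throw new IllegalStateException("simulated initialize() failure");
        }
        resource = () -> { };     // placeholder for a real resource
    }

    void finish(OutputStream os) {
        finishCalled = true;
        if (resource != null) {   // guard: the resource may never have been created
            try {
                resource.close();
            } catch (IOException e) {
                // A real job might record this, e.g. via addFinishException.
            }
            resource = null;
        }
    }

    // Assumed driver: finish() is reached whether or not initialize() succeeded.
    // Returns true if initialization completed.
    static boolean runLifecycle(PartialInitJob job, OutputStream os) {
        boolean initialized = false;
        try {
            job.initialize(os);
            initialized = true;
            // processFile() calls would happen here, only after a successful initialize().
        } catch (RuntimeException e) {
            // A real job might record this, e.g. via addInitializeException.
        } finally {
            job.finish(os);
        }
        return initialized;
    }
}
```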

processOnlyFilesNamed

public void processOnlyFilesNamed(java.util.List<java.lang.String> specifiedFilenames)
Mark the job to process only the specified files. This will override any previous setting of which files to process.

Parameters:
specifiedFilenames - A list of filenames to process (without paths). If null, all files will be processed.

processOnlyFileNamed

public void processOnlyFileNamed(java.lang.String specifiedFilename)
Helper method for only processing one file. This will override any previous setting of which files to process.

Parameters:
specifiedFilename - The name of the single file that should be processed. Should not include any path information.

processOnlyFilesMatching

public void processOnlyFilesMatching(java.util.List<java.lang.String> specifiedPatterns)
Set this job to match only a certain set of patterns. This will override any previous setting of which files to process.

Parameters:
specifiedPatterns - The patterns of file names that this job will operate on. These should not include any path information, but should match the entire filename (e.g. .*foo.* for any file with foo in the name).

processOnlyFilesMatching

public void processOnlyFilesMatching(java.lang.String specifiedPattern)
Set this job to match only a certain pattern. This will override any previous setting of which files to process.

Parameters:
specifiedPattern - Regular expression of file names that this job will operate on. This should not include any path information, but should match the entire filename (e.g. .*foo.* for any file with foo in the name).
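The requirement that a pattern match the entire filename corresponds to the behavior of java.util.regex.Matcher.matches(). The sketch below shows the difference between a bare substring and a whole-name pattern. FilenamePatternDemo and its helpers are hypothetical, and literal() and anyOf() only show one plausible way that exact names or several patterns could be folded into a single regular expression; how this class does it internally is not stated here.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
class FilenamePatternDemo {
    // True only when the regex matches the whole filename.
    static boolean selects(String regex, String filename) {
        return Pattern.compile(regex).matcher(filename).matches();
    }

    // Assumption: an exact filename can be expressed as a quoted literal,
    // so that characters such as '.' are not treated as regex operators.
    static String literal(String filename) {
        return Pattern.quote(filename);
    }

    // Assumption: several patterns can be combined as an alternation,
    // each wrapped in a non-capturing group.
    static String anyOf(List<String> patterns) {
        return patterns.stream()
                .map(p -> "(?:" + p + ")")
                .collect(Collectors.joining("|"));
    }
}
```

With these helpers, selects(".*foo.*", "1-foo-2.arc") is true, while selects("foo", "1-foo-2.arc") is false because "foo" does not cover the whole name.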

getFilenamePattern

public java.util.regex.Pattern getFilenamePattern()
Get the pattern for files that should be processed.

Returns:
A pattern for files to process.

getNoOfFilesProcessed

public int getNoOfFilesProcessed()
Return the number of files processed in this job.

Returns:
the number of files processed in this job

getFilesFailed

public java.util.Collection<java.io.File> getFilesFailed()
Return the list of names of files where processing failed. An empty list is returned if none failed.

Returns:
the possibly empty list of names of files where processing failed

getExceptions

public java.util.List<FileBatchJob.ExceptionOccurrence> getExceptions()
Get the list of exceptions that have occurred during processing.

Returns:
List of exceptions together with information on where they happened.

postProcess

public boolean postProcess(java.io.InputStream input,
                           java.io.OutputStream output)
Processes the concatenated result files. This is intended to be overridden by batch jobs that want post-processing other than plain concatenation.

Parameters:
input - The inputstream to the file containing the concatenated results.
output - The outputstream where the resulting data should be written.
Returns:
Whether it actually does any post processing. If false is returned then the default concatenated result file is returned.
Throws:
ArgumentNotValid - If the concatenated file is null.
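As an illustration of what an overriding implementation might do, the sketch below sorts and de-duplicates the lines of the concatenated results. It is a standalone class with the same parameter and return shape as postProcess, not a real subclass. SortingPostProcessSketch is a hypothetical name, IllegalArgumentException stands in for ArgumentNotValid, and the UTF-8 line-based format is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.TreeSet;

// Hypothetical sketch; not a subclass of the real FileBatchJob.
class SortingPostProcessSketch {
    // Returns true when post-processing was performed, mirroring the
    // documented meaning of the return value.
    boolean postProcess(InputStream input, OutputStream output) {
        if (input == null || output == null) {
            // Stand-in for ArgumentNotValid in this sketch.
            throw new IllegalArgumentException("input and output must not be null");
        }
        TreeSet<String> lines = new TreeSet<>(); // sorted, without duplicates
        try {
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(input, StandardCharsets.UTF_8));
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
            // Write only after the whole input has been read successfully.
            for (String sorted : lines) {
                output.write((sorted + "\n").getBytes(StandardCharsets.UTF_8));
            }
            output.flush();
            return true;
        } catch (IOException e) {
            // Assumption: on failure, fall back to the default concatenated result.
            return false;
        }
    }
}
```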

addException

protected void addException(java.io.File currentFile,
                            long currentOffset,
                            long outputOffset,
                            java.lang.Exception e)
Record an exception that occurred during the processFile of this job and that should be returned with the result. If maxExceptionsReached() returns true, this method silently does nothing.

Parameters:
currentFile - The file that is currently being processed.
currentOffset - The relevant offset into the file when the exception happened (e.g. the start of an ARC record).
outputOffset - The offset we were at in the outputstream when the exception happened. If UNKNOWN_OFFSET, the offset could not be found.
e - The exception thrown. This exception must be serializable.

addInitializeException

protected void addInitializeException(long outputOffset,
                                      java.lang.Exception e)
Record an exception that occurred during the initialize() method of this job.

Parameters:
outputOffset - The offset we were at in the outputstream when the exception happened. If UNKNOWN_OFFSET, the offset could not be found.
e - The exception thrown. This exception must be serializable.

addFinishException

protected void addFinishException(long outputOffset,
                                  java.lang.Exception e)
Record an exception that occurred during the finish() method of this job.

Parameters:
outputOffset - The offset we were at in the outputstream when the exception happened. If UNKNOWN_OFFSET, the offset could not be found.
e - The exception thrown. This exception must be serializable.

getBatchJobTimeout

public long getBatchJobTimeout()
Getter for batchJobTimeout. If the batch job has not defined a maximum time (that is, the value is set to -1), the default value from settings is used.

Returns:
timeout in milliseconds.
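The fallback rule can be expressed as a small function. This is a sketch under assumptions: TimeoutSketch and effectiveTimeout() are hypothetical names, the default is passed in as a parameter rather than read from settings, and treating zero like a negative value is a choice made for this example, since this page only describes the positive and negative cases.

```java
// Hypothetical helper illustrating the documented fallback rule.
class TimeoutSketch {
    // Value described above as meaning "no job-specific timeout defined".
    static final long USE_DEFAULT = -1L;

    // Returns the job's own timeout when it is positive, otherwise the default.
    // Assumption: zero is treated the same as a negative value.
    static long effectiveTimeout(long jobTimeoutMillis, long defaultTimeoutMillis) {
        return jobTimeoutMillis > 0 ? jobTimeoutMillis : defaultTimeoutMillis;
    }
}
```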

maxExceptionsReached

protected boolean maxExceptionsReached()
Returns true if we have already recorded the maximum number of exceptions. At this point, no more exceptions will be recorded, and processing should be aborted.

Returns:
True if the maximum number of exceptions (MAX_EXCEPTIONS) has been recorded already.
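The interplay between maxExceptionsReached() and the add...Exception methods can be sketched as a bounded list. BoundedExceptionLog is a hypothetical class, it stores plain exceptions rather than ExceptionOccurrence objects, and the limit of 3 is an arbitrary value chosen for the example. The actual value of MAX_EXCEPTIONS is not given on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of capped exception recording.
class BoundedExceptionLog {
    // Arbitrary limit for illustration; not the library's real MAX_EXCEPTIONS.
    static final int MAX_EXCEPTIONS = 3;

    private final List<Exception> exceptions = new ArrayList<>();

    boolean maxExceptionsReached() {
        return exceptions.size() >= MAX_EXCEPTIONS;
    }

    // Silently does nothing once the cap is reached, as documented for addException.
    void addException(Exception e) {
        if (maxExceptionsReached()) {
            return;
        }
        exceptions.add(e);
    }

    int size() {
        return exceptions.size();
    }
}
```

A processFile() implementation could check maxExceptionsReached() after recording an error and stop early, in line with the note that processing should be aborted at that point.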

setBatchJobTimeout

public void setBatchJobTimeout(long batchJobTimeout)
Override the predefined timeout period for the batch job.

Parameters:
batchJobTimeout - the timeout period in milliseconds