dk.netarkivet.common.utils.arc
Class FileBatchJob

java.lang.Object
  extended by dk.netarkivet.common.utils.arc.FileBatchJob
All Implemented Interfaces:
java.io.Serializable
Direct Known Subclasses:
ARCBatchJob, ChecksumJob, FileListJob, LoadableFileBatchJob, LoadableJarBatchJob

public abstract class FileBatchJob
extends java.lang.Object
implements java.io.Serializable

Abstract base class defining a batch job to run on a set of files. The job is initialized by calling initialize(), run on each file by calling processFile(), and cleaned up by calling finish().
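
The call order described above can be sketched as follows. This is a minimal illustration, not the library's own code: BatchJobStandIn is a hypothetical stand-in that copies only the three abstract method signatures listed on this page (so the example compiles without the library on the classpath), and NameListingJob and run() are invented for the example. How the real bit archive drives a job is not shown on this page.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class LifecycleSketch {

    /** Hypothetical stand-in with the same three abstract signatures as FileBatchJob. */
    abstract static class BatchJobStandIn {
        abstract void initialize(OutputStream os);
        abstract boolean processFile(File file, OutputStream os);
        abstract void finish(OutputStream os);
    }

    /** Hypothetical job: writes one line per file name, then a summary line. */
    static class NameListingJob extends BatchJobStandIn {
        private int seen;

        @Override
        void initialize(OutputStream os) {
            seen = 0; // reset state before the first processFile() call
        }

        @Override
        boolean processFile(File file, OutputStream os) {
            try {
                os.write((file.getName() + "\n").getBytes(StandardCharsets.UTF_8));
                seen++;
                return true;  // file handled successfully
            } catch (IOException e) {
                return false; // report failure for this file
            }
        }

        @Override
        void finish(OutputStream os) {
            try {
                os.write(("files: " + seen + "\n").getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Hypothetical driver: initialize once, process each file, finish once. */
    static String run(BatchJobStandIn job, List<File> files) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        job.initialize(os);
        for (File f : files) {
            job.processFile(f, os);
        }
        job.finish(os);
        return new String(os.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<File> files = Arrays.asList(new File("a.arc"), new File("b.arc"));
        System.out.print(run(new NameListingJob(), files));
    }
}
```

A real job would extend FileBatchJob itself and declare the three methods public, as in the signatures under Method Detail.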

See Also:
Serialized Form

Field Summary
protected  java.util.Set<java.io.File> filesFailed
          A Set of files which generated errors.
protected  int noOfFilesProcessed
          The total number of files processed (including any that generated errors).
 
Constructor Summary
FileBatchJob()
           
 
Method Summary
abstract  void finish(java.io.OutputStream os)
          Finish up the job.
 java.util.regex.Pattern getFilenamePattern()
          Get the pattern for files that should be processed.
 java.util.Collection<java.io.File> getFilesFailed()
          Return the collection of ARC files where processing (of one or more ARC records) failed, or an empty collection if none failed.
 int getNoOfFilesProcessed()
          Return the number of ARC-files processed in this job (at this bit archive application).
abstract  void initialize(java.io.OutputStream os)
          Initialize the job before running.
abstract  boolean processFile(java.io.File file, java.io.OutputStream os)
          Process one file stored in the bit archive.
 void processOnlyFileNamed(java.lang.String specifiedFilename)
          Helper method for only processing one file.
 void processOnlyFilesMatching(java.util.List<java.lang.String> specifiedPatterns)
          Set this job to match only a certain set of patterns.
 void processOnlyFilesMatching(java.lang.String specifiedPattern)
          Set this job to match only a certain pattern.
 void processOnlyFilesNamed(java.util.List<java.lang.String> specifiedFilenames)
          Mark the job to process only the specified files.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

noOfFilesProcessed

protected int noOfFilesProcessed
The total number of files processed (including any that generated errors).


filesFailed

protected java.util.Set<java.io.File> filesFailed
A Set of files which generated errors.

Constructor Detail

FileBatchJob

public FileBatchJob()
Method Detail

initialize

public abstract void initialize(java.io.OutputStream os)
Initialize the job before running. This is called before the first processFile() call.

Parameters:
os - the OutputStream to which output should be written

processFile

public abstract boolean processFile(java.io.File file,
                                    java.io.OutputStream os)
Process one file stored in the bit archive.

Parameters:
file - the file to be processed.
os - the OutputStream to which output should be written
Returns:
true if the file was successfully processed, false otherwise

finish

public abstract void finish(java.io.OutputStream os)
Finish up the job. This is called after the last processFile() call.

Parameters:
os - the OutputStream to which output should be written

processOnlyFilesNamed

public void processOnlyFilesNamed(java.util.List<java.lang.String> specifiedFilenames)
Mark the job to process only the specified files. This will override any previous setting of which files to process.

Parameters:
specifiedFilenames - A list of filenames to process (without paths). If null, all files will be processed.
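
Since getFilenamePattern() exposes the file selection as a single java.util.regex.Pattern, a list of literal names has to be expressible as one pattern. The sketch below shows one way that could be done; it is an assumption for illustration, not the library's actual implementation, and patternForNames is a hypothetical helper. It uses Pattern.quote so that characters such as "." in a filename are matched literally.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class NamesToPatternSketch {

    /**
     * Hypothetical helper: builds one pattern that matches exactly the given
     * names. A null list is treated as "match every file", mirroring the
     * documented behaviour of the null argument.
     */
    static Pattern patternForNames(List<String> names) {
        if (names == null) {
            return Pattern.compile(".*");
        }
        String alternation = names.stream()
                .map(Pattern::quote)              // treat each name literally
                .collect(Collectors.joining("|")); // any one of the names
        return Pattern.compile(alternation);
    }

    public static void main(String[] args) {
        Pattern p = patternForNames(Arrays.asList("a.arc", "b.arc"));
        System.out.println(p.matcher("a.arc").matches());
        System.out.println(p.matcher("aXarc").matches());
    }
}
```

Without the quoting step, the "." in "a.arc" would act as a regex wildcard and the name "aXarc" would also be selected.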

processOnlyFileNamed

public void processOnlyFileNamed(java.lang.String specifiedFilename)
Helper method for only processing one file. This will override any previous setting of which files to process.

Parameters:
specifiedFilename - The name of the single file that should be processed. Should not include any path information.

processOnlyFilesMatching

public void processOnlyFilesMatching(java.util.List<java.lang.String> specifiedPatterns)
Set this job to match only a certain set of patterns. This will override any previous setting of which files to process.

Parameters:
specifiedPatterns - The patterns of file names that this job will operate on. These should not include any path information, but should match the entire filename (e.g. .*foo.* for any file with foo in the name).

processOnlyFilesMatching

public void processOnlyFilesMatching(java.lang.String specifiedPattern)
Set this job to match only a certain pattern. This will override any previous setting of which files to process.

Parameters:
specifiedPattern - Regular expression of file names that this job will operate on. This should not include any path information, but should match the entire filename (e.g. .*foo.* for any file with foo in the name).
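
The requirement that a pattern match the entire filename corresponds to the difference between Matcher.matches() and Matcher.find() in java.util.regex. The sketch below uses only the standard library; matchesWholeName is a hypothetical helper, and the filename is made up for the example.

```java
import java.util.regex.Pattern;

public class FullMatchSketch {

    /** Hypothetical helper: true only if the regex matches the whole filename. */
    static boolean matchesWholeName(String regex, String filename) {
        return Pattern.compile(regex).matcher(filename).matches();
    }

    public static void main(String[] args) {
        String name = "a-foo-1.arc";
        // The whole name is covered by .*foo.*
        System.out.println(matchesWholeName(".*foo.*", name));
        // "foo" alone does not cover the whole name
        System.out.println(matchesWholeName("foo", name));
        // find() would accept a partial match, which is not what is asked for here
        System.out.println(Pattern.compile("foo").matcher(name).find());
    }
}
```

A pattern written as a bare substring such as foo would therefore select no files under whole-name matching; wrap it as .*foo.* instead.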

getFilenamePattern

public java.util.regex.Pattern getFilenamePattern()
Get the pattern for files that should be processed.

Returns:
A pattern for files to process.

getNoOfFilesProcessed

public int getNoOfFilesProcessed()
Return the number of ARC-files processed in this job (at this bit archive application).

Returns:
the number of ARC-files processed in this job

getFilesFailed

public java.util.Collection<java.io.File> getFilesFailed()
Return the collection of ARC files where processing (of one or more ARC records) failed, or an empty collection if none failed.

Returns:
the possibly empty collection of ARC files where processing (of one or more ARC records) failed