dk.netarkivet.harvester.harvesting
Class IngestableFiles

java.lang.Object
  extended by dk.netarkivet.harvester.harvesting.IngestableFiles

public class IngestableFiles
extends java.lang.Object

Encapsulation of files to be ingested into the archive. These files are currently placed in subdirectories under the crawl directory.


Constructor Summary
IngestableFiles(java.io.File crawlDir, long jobID)
          Constructor for this class.
 
Method Summary
 void closeOpenFiles(int waitSeconds)
          Close any ".open" files left by a crashed Heritrix.
 java.util.List<java.io.File> getArcFiles()
          Get a list of all ARC files that should be ingested.
 java.util.List<java.io.File> getMetadataArcFiles()
          Gets the files containing the metadata.
 org.archive.io.arc.ARCWriter getMetadataArcWriter()
          Get an ARCWriter for the temporary metadata ARC file.
 boolean isMetadataFailed()
          Return true if the metadata generation process is known to have failed.
 boolean isMetadataReady()
          Check whether the metadata file already exists.
 void setMetadataGenerationSucceeded(boolean success)
          Marks generated metadata as final.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

IngestableFiles

public IngestableFiles(java.io.File crawlDir,
                       long jobID)
Constructor for this class.

Parameters:
crawlDir - directory where all files for the harvest job (including the metadata file) are located
jobID - ID of the given harvest job
Throws:
ArgumentNotValid - if null arguments are given, if jobID < 1, or if crawlDir does not exist
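The documented argument checks can be sketched as follows. This is a minimal, hypothetical stand-in: the class `IngestableFilesArgs` is not part of NetarchiveSuite, and it throws the standard `IllegalArgumentException` in place of `ArgumentNotValid` so that it compiles on its own.

```java
import java.io.File;

// Hypothetical stand-in mirroring only the documented constructor checks.
// IllegalArgumentException replaces NetarchiveSuite's ArgumentNotValid.
class IngestableFilesArgs {
    final File crawlDir;
    final long jobID;

    IngestableFilesArgs(File crawlDir, long jobID) {
        // Null arguments are rejected (jobID is a primitive and cannot be null).
        if (crawlDir == null) {
            throw new IllegalArgumentException("crawlDir must not be null");
        }
        // Job IDs below 1 are rejected.
        if (jobID < 1) {
            throw new IllegalArgumentException("jobID must be >= 1, was " + jobID);
        }
        // The crawl directory must already exist.
        if (!crawlDir.exists()) {
            throw new IllegalArgumentException("crawlDir does not exist: " + crawlDir);
        }
        this.crawlDir = crawlDir;
        this.jobID = jobID;
    }
}
```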
Method Detail

isMetadataReady

public boolean isMetadataReady()
Check whether the metadata file already exists. If it does, metadata has been successfully generated. If not, either metadata generation has not yet finished, or there was an error generating the metadata.

Returns:
true if it exists; false otherwise.

isMetadataFailed

public boolean isMetadataFailed()
Return true if the metadata generation process is known to have failed.

Returns:
True if metadata generation has finished without success; false if generation is still ongoing or has completed successfully.

setMetadataGenerationSucceeded

public void setMetadataGenerationSucceeded(boolean success)
Marks generated metadata as final. Closes the ARCWriter and, if generation was successful, moves the temporary metadata file to its final location.

Parameters:
success - True if metadata was successfully generated, false otherwise.
Throws:
PermissionDenied - If the metadata has already been marked as ready, or if no metadata file exists upon success.
IOFailure - if there is an error marking the metadata as ready.
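Together, isMetadataReady, isMetadataFailed and setMetadataGenerationSucceeded behave like a small one-shot state machine. The sketch below models that contract under stated assumptions: `MetadataLifecycle` is a hypothetical class, the caller supplies the temporary and final paths, `IllegalStateException` stands in for `PermissionDenied`, `UncheckedIOException` stands in for `IOFailure`, and closing the ARCWriter is omitted.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical model of the documented metadata lifecycle, not NetarchiveSuite's code.
class MetadataLifecycle {
    private final Path tmpFile;   // written to while metadata is being generated
    private final Path finalFile; // its existence means "metadata ready"
    private boolean finished = false;
    private boolean failed = false;

    MetadataLifecycle(Path tmpFile, Path finalFile) {
        this.tmpFile = tmpFile;
        this.finalFile = finalFile;
    }

    boolean isMetadataReady() {
        // Success is signalled solely by the final file existing.
        return Files.exists(finalFile);
    }

    boolean isMetadataFailed() {
        return failed;
    }

    void setMetadataGenerationSucceeded(boolean success) {
        // Assumption: the outcome may be set only once; any second call is rejected
        // (IllegalStateException stands in for PermissionDenied).
        if (finished) {
            throw new IllegalStateException("metadata generation already marked as finished");
        }
        if (!success) {
            finished = true;
            failed = true;
            return;
        }
        if (!Files.exists(tmpFile)) {
            throw new IllegalStateException("no metadata file to finalize: " + tmpFile);
        }
        try {
            // Move the temporary file into place (UncheckedIOException stands in for IOFailure).
            Files.move(tmpFile, finalFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        finished = true;
    }
}
```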

getMetadataArcWriter

public org.archive.io.arc.ARCWriter getMetadataArcWriter()
Get an ARCWriter for the temporary metadata ARC file. Successive calls to this method on the same object return the same writer. Once the metadata has been finalized, calling this method will fail.

Returns:
an ARCWriter for the temporary metadata ARC file.
Throws:
PermissionDenied - if metadata generation is already finished.
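The "same writer on every call, failure after finalization" contract is a lazily initialized field guarded by a state flag. In this hypothetical sketch a `java.io.BufferedWriter` stands in for Heritrix's `org.archive.io.arc.ARCWriter`, `finish()` is an invented hook representing finalization, and `IllegalStateException` stands in for `PermissionDenied`.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: BufferedWriter stands in for ARCWriter; finish() is invented.
class LazyMetadataWriter {
    private final Path tmpFile;
    private BufferedWriter writer;   // created on first request, then reused
    private boolean finished = false;

    LazyMetadataWriter(Path tmpFile) {
        this.tmpFile = tmpFile;
    }

    synchronized BufferedWriter getMetadataWriter() {
        if (finished) {
            // Stands in for PermissionDenied.
            throw new IllegalStateException("metadata generation already finished");
        }
        if (writer == null) {
            try {
                writer = Files.newBufferedWriter(tmpFile);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return writer;
    }

    // Marks metadata as final and closes the writer, if one was ever created.
    synchronized void finish() {
        finished = true;
        if (writer != null) {
            try {
                writer.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

The methods are `synchronized` so two threads cannot race to create two writers; whether the real class needs this is not stated here.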

getMetadataArcFiles

public java.util.List<java.io.File> getMetadataArcFiles()
Gets the files containing the metadata.

Returns:
the files in the metadata directory
Throws:
PermissionDenied - if the metadata file is not ready, either because generation is still going on or there was an error generating the metadata.

getArcFiles

public java.util.List<java.io.File> getArcFiles()
Get a list of all ARC files that should be ingested. Any open files should first be closed with closeOpenFiles.

Returns:
The ARC files that are ready to be ingested.
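Selecting the ingestable ARC files is essentially a suffix filter that skips files still marked ".open". In the sketch below, the `arcs` subdirectory name, the accepted suffixes (`.arc`, `.arc.gz`) and the class `ArcFileLister` are all assumptions for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: the "arcs" subdirectory and the accepted suffixes are assumptions.
class ArcFileLister {
    static List<File> getArcFiles(File crawlDir) {
        File arcsDir = new File(crawlDir, "arcs");
        // "x.arc.gz.open" does not end with ".arc" or ".arc.gz", so open files are skipped.
        File[] candidates = arcsDir.listFiles(
                (dir, name) -> name.endsWith(".arc") || name.endsWith(".arc.gz"));
        List<File> result = new ArrayList<>();
        if (candidates == null) {
            return result; // directory missing or unreadable
        }
        // Sort for a deterministic upload order.
        Arrays.sort(candidates);
        result.addAll(Arrays.asList(candidates));
        return result;
    }
}
```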

closeOpenFiles

public void closeOpenFiles(int waitSeconds)
Close any ".open" files left by a crashed Heritrix. An ARC file ending in .open indicates that Heritrix is still writing to it. If Heritrix has died, we can simply rename such files (dropping the .open suffix) before uploading them. This must not be done while harvesting is still in progress.

Parameters:
waitSeconds - How many seconds to wait before closing files. This may be done in order to allow Heritrix to finish writing before we close the files.
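Closing leftover files amounts to an optional wait followed by stripping the ".open" suffix from each affected file. This is a hypothetical sketch: the `arcs` subdirectory and the `OpenFileCloser` class are assumptions, and returning a count of renamed files is a choice of the sketch, not something the documented `void` method does.

```java
import java.io.File;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; the "arcs" subdirectory name is an assumption.
class OpenFileCloser {
    private static final String OPEN_SUFFIX = ".open";

    /** Waits, then renames every *.open file in crawlDir/arcs; returns how many were renamed. */
    static int closeOpenFiles(File crawlDir, int waitSeconds) {
        if (waitSeconds > 0) {
            try {
                // Give a possibly still-running Heritrix time to finish writing.
                TimeUnit.SECONDS.sleep(waitSeconds);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt; leave files untouched
                return 0;
            }
        }
        File[] openFiles = new File(crawlDir, "arcs").listFiles(
                (dir, name) -> name.endsWith(OPEN_SUFFIX));
        if (openFiles == null) {
            return 0; // no arcs directory
        }
        int renamed = 0;
        for (File open : openFiles) {
            String name = open.getName();
            File closed = new File(open.getParentFile(),
                    name.substring(0, name.length() - OPEN_SUFFIX.length()));
            // Never overwrite an existing file; renameTo reports failure via its return value.
            if (!closed.exists() && open.renameTo(closed)) {
                renamed++;
            }
        }
        return renamed;
    }
}
```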