public class IngestableFiles extends Object
Modifier and Type | Field and Description |
---|---|
protected static String | METADATA_SUB_DIR: Subdirectory that holds the final metadata file. |
Constructor and Description |
---|
IngestableFiles(Heritrix3Files files): Creates an IngestableFiles instance from the given Heritrix3Files. |
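The constructor is documented to reject null arguments, a job ID below 1, and a crawl directory that does not exist. The checks could look roughly like the sketch below, which assumes the job ID and crawl directory are values IngestableFiles reads from its Heritrix3Files argument. JobFilesCheck and requireValidJob are hypothetical names, and IllegalArgumentException stands in for the documented ArgumentNotValid exception.

```java
import java.io.File;

// Hypothetical sketch of the preconditions documented for
// IngestableFiles(Heritrix3Files). IllegalArgumentException stands in for
// ArgumentNotValid; the real class takes these values from Heritrix3Files.
final class JobFilesCheck {
    private JobFilesCheck() {}

    /** Throws IllegalArgumentException if any documented precondition fails. */
    static void requireValidJob(Long jobId, File crawlDir) {
        if (jobId == null || crawlDir == null) {
            throw new IllegalArgumentException("null arguments are not allowed");
        }
        if (jobId < 1) {
            throw new IllegalArgumentException("jobID must be >= 1, was " + jobId);
        }
        if (!crawlDir.exists()) {
            throw new IllegalArgumentException("crawlDir does not exist: " + crawlDir);
        }
    }

    public static void main(String[] args) {
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        requireValidJob(42L, tmp); // accepted: positive ID, existing directory
        try {
            requireValidJob(0L, tmp);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast in the constructor means every later getter can assume a valid job ID and an existing crawl directory.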
Modifier and Type | Method and Description |
---|---|
void | cleanup(): Removes any temporary files. |
void | closeOpenFiles(int waitSeconds): Closes any ".open" files left by a crashed Heritrix. |
protected void | closeOpenFiles(String archiveDirName, FilenameFilter filter): Given an archive subdirectory name and a filename filter, tries to rename the matched files. |
List<File> | getArcFiles(): Gets a list of all ARC files that should be ingested. |
File | getArcsDir() |
File | getCrawlDir() |
long | getHarvestID() |
String | getHarvestnamePrefix() |
File | getHeritrix3JobDir() |
long | getJobId() |
List<File> | getMetadataArcFiles(): Gets the files containing the metadata. |
protected File | getMetadataFile(): Constructs the single metadata ARC file from the crawlDir and the jobID. |
MetadataFileWriter | getMetadataWriter(): Gets a MetadataFileWriter for the temporary metadata file. |
File | getReportsDir() |
File | getTmpMetadataDir(): Constructs the temporary metadata subdirectory from the crawlDir. |
List<File> | getWarcFiles(): Gets a list of all WARC files that should be ingested. |
File | getWarcsDir() |
boolean | isMetadataFailed(): Returns true if the metadata generation process is known to have failed. |
boolean | isMetadataReady(): Checks whether the metadata file already exists. |
void | setMetadataGenerationSucceeded(boolean success): Marks generated metadata as final, closes the writer, and, if successful, moves the temporary metadata file to its final position. |
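The two closeOpenFiles methods above are described only as closing ".open" files left by a crashed Heritrix, optionally after a wait, by renaming the files a filter matches. One plausible reading, sketched below, strips the ".open" suffix from each match. The OpenFileCloser class, the single-directory signature that folds the wait and the rename together, and the rule of never overwriting an existing file are all assumptions for illustration, not the actual implementation.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: "closing" a file is modelled as renaming
// name.open -> name. Directory layout and naming are assumptions.
final class OpenFileCloser {
    static final String OPEN_SUFFIX = ".open";

    /** Selects files whose names end in ".open". */
    static final FilenameFilter OPEN_FILTER = (dir, name) -> name.endsWith(OPEN_SUFFIX);

    private OpenFileCloser() {}

    /** Waits, then renames each matching ".open" file; returns the renamed files. */
    static List<File> closeOpenFiles(File archiveDir, FilenameFilter filter, int waitSeconds) {
        try {
            // Give the writer a chance to finish before touching its files.
            TimeUnit.SECONDS.sleep(waitSeconds);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag, close anyway
        }
        List<File> closed = new ArrayList<>();
        File[] matches = archiveDir.listFiles(filter);
        if (matches == null) {
            return closed; // not a directory, or it could not be read
        }
        for (File open : matches) {
            String name = open.getName();
            if (!name.endsWith(OPEN_SUFFIX)) {
                continue; // the filter let through a name without the suffix
            }
            File target = new File(archiveDir,
                    name.substring(0, name.length() - OPEN_SUFFIX.length()));
            // Never overwrite an existing, already-closed file.
            if (!target.exists() && open.renameTo(target)) {
                closed.add(target);
            }
        }
        return closed;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("warcs").toFile();
        new File(dir, "job-1.warc.gz.open").createNewFile();
        System.out.println(closeOpenFiles(dir, OPEN_FILTER, 0));
    }
}
```

A wait of 0 seconds skips the pause entirely, which suits a job that is known to have stopped.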
protected static final String METADATA_SUB_DIR
public IngestableFiles(Heritrix3Files files)
Parameters:
files - An instance of Heritrix3Files.
Throws:
ArgumentNotValid - If null arguments are given, if jobID < 1, or if crawlDir does not exist.
public boolean isMetadataReady()
public boolean isMetadataFailed()
public void setMetadataGenerationSucceeded(boolean success)
Parameters:
success - True if metadata was successfully generated, false otherwise.
Throws:
PermissionDenied - If the metadata has already been marked as ready, or if no metadata file exists upon success.
IOFailure - If there is an error marking the metadata as ready.
public MetadataFileWriter getMetadataWriter()
Throws:
PermissionDenied - If metadata generation is already finished.
public List<File> getMetadataArcFiles()
Throws:
IllegalState - If the metadata file is not ready, either because generation is still going on or because there was an error generating the metadata.
protected File getMetadataFile()
public File getTmpMetadataDir()
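Taken together, the metadata methods describe a small life cycle: metadata is written to a temporary file, setMetadataGenerationSucceeded(true) moves it to its final position, isMetadataReady() then reports true, and getMetadataArcFiles() throws IllegalState until that point. The sketch below models that cycle with plain java.nio.file calls under stated assumptions: it tracks a path instead of a MetadataFileWriter, it treats a failed run as finished for writing purposes, the file and directory names are made up, and standard exceptions stand in for PermissionDenied, IllegalState and IOFailure. MetadataLifecycle and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch of the documented metadata life cycle. It tracks a
// temporary path instead of a MetadataFileWriter; IllegalStateException and
// UncheckedIOException stand in for PermissionDenied, IllegalState, IOFailure.
final class MetadataLifecycle {
    private final Path tmpFile;   // written while generation is in progress
    private final Path finalFile; // exists only after a successful finish
    private boolean failed;

    MetadataLifecycle(Path tmpFile, Path finalFile) {
        this.tmpFile = tmpFile;
        this.finalFile = finalFile;
    }

    /** Ready means the final metadata file exists. */
    boolean isMetadataReady() {
        return Files.exists(finalFile);
    }

    boolean isMetadataFailed() {
        return failed;
    }

    /** Where to write; refused once generation finished (either outcome, by assumption). */
    Path getTmpFileForWriting() {
        if (isMetadataReady() || failed) {
            throw new IllegalStateException("metadata generation already finished");
        }
        return tmpFile;
    }

    void setMetadataGenerationSucceeded(boolean success) {
        if (isMetadataReady()) {
            throw new IllegalStateException("metadata already marked as ready");
        }
        if (!success) {
            failed = true;
            return;
        }
        if (!Files.exists(tmpFile)) {
            throw new IllegalStateException("no metadata file to finalize: " + tmpFile);
        }
        try {
            Path parent = finalFile.getParent();
            if (parent != null) {
                Files.createDirectories(parent);
            }
            Files.move(tmpFile, finalFile); // temporary file -> final position
        } catch (IOException e) {
            throw new UncheckedIOException("could not mark metadata as ready", e);
        }
    }

    /** The final metadata files; only available once generation succeeded. */
    List<Path> getMetadataFiles() {
        if (!isMetadataReady()) {
            throw new IllegalStateException("metadata is not ready");
        }
        return List.of(finalFile);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("crawl");
        // "metadata" and the file names are hypothetical, not METADATA_SUB_DIR's value.
        MetadataLifecycle lc = new MetadataLifecycle(
                root.resolve("tmp-meta.warc"), root.resolve("metadata").resolve("meta.warc"));
        Files.write(lc.getTmpFileForWriting(), new byte[] {1});
        lc.setMetadataGenerationSucceeded(true);
        System.out.println(lc.isMetadataReady() + " " + lc.getMetadataFiles());
    }
}
```

Deriving readiness from the final file's existence, as isMetadataReady() is described, means a restarted process can see a finished job without any extra saved state.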
public List<File> getArcFiles()
public File getArcsDir()
public File getWarcsDir()
public File getReportsDir()
public List<File> getWarcFiles()
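getArcFiles() and getWarcFiles() are described as returning the ARC and WARC files that should be ingested, and the class exposes getArcsDir() and getWarcsDir() alongside them. A minimal sketch of such a listing follows, assuming the files sit directly in those directories and are recognized by file-name suffix; the suffix lists and the ArchiveLister name are hypothetical.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: which suffixes count as ingestable is an assumption.
// ".open" files match neither list, so unfinished files are skipped.
final class ArchiveLister {
    static final List<String> ARC_SUFFIXES = List.of(".arc", ".arc.gz");
    static final List<String> WARC_SUFFIXES = List.of(".warc", ".warc.gz");

    private ArchiveLister() {}

    /** Regular files in dir whose names end with one of the suffixes, sorted by name. */
    static List<File> listArchives(File dir, List<String> suffixes) {
        FileFilter accept = f -> f.isFile()
                && suffixes.stream().anyMatch(s -> f.getName().endsWith(s));
        File[] files = dir.listFiles(accept);
        if (files == null) {
            return new ArrayList<>(); // missing or unreadable directory
        }
        Arrays.sort(files, Comparator.comparing(File::getName));
        return new ArrayList<>(Arrays.asList(files));
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println("ARC:  " + listArchives(dir, ARC_SUFFIXES));
        System.out.println("WARC: " + listArchives(dir, WARC_SUFFIXES));
    }
}
```

Sorting by name gives a stable ingest order; note that ".arc" does not match "x.warc", so the two lists stay disjoint.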
public File getHeritrix3JobDir()
public void closeOpenFiles(int waitSeconds)
Parameters:
waitSeconds - How many seconds to wait before closing files. This may be done to allow Heritrix to finish writing before the files are closed.
protected void closeOpenFiles(String archiveDirName, FilenameFilter filter)
Parameters:
archiveDirName - Archive directory name, currently "arc" or "warc".
filter - Filename filter used to select ".open" files to rename.
public void cleanup()
public long getJobId()
public long getHarvestID()
public String getHarvestnamePrefix()
public File getCrawlDir()
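cleanup() is documented only as removing any temporary files. Assuming those files live under one directory, such as the one getTmpMetadataDir() returns, a recursive delete like the following would do. TempCleanup and deleteTree are hypothetical names, and the real method may remove a different set of files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: deletes a temporary directory tree, children first.
final class TempCleanup {
    private TempCleanup() {}

    static void deleteTree(Path root) {
        if (!Files.exists(root)) {
            return; // already clean
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // A parent path is a prefix of its children, so reverse order
            // lists every file before the directory that contains it.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (Path p : paths) {
            try {
                Files.deleteIfExists(p);
            } catch (IOException e) {
                throw new UncheckedIOException("could not delete " + p, e);
            }
        }
    }

    public static void main(String[] args) {
        if (args.length == 1) {
            deleteTree(Paths.get(args[0]));
        }
    }
}
```

Because the method returns early when the directory is gone, calling it twice is harmless.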
Copyright © 2005–2015 The Royal Danish Library, the Danish State and University Library, the National Library of France and the Austrian National Library. All rights reserved.