Class Heritrix3Files
- java.lang.Object
-
- dk.netarkivet.harvester.heritrix3.Heritrix3Files
-
public class Heritrix3Files extends java.lang.Object
This class encapsulates the information generated by Heritrix3 or delivered to Heritrix3 before a crawl.
-
Method Summary
Modifier and Type Method Description
void
cleanUpAfterHarvest(java.io.File oldJobsDir)
void
deleteFinalLogs()
java.lang.String
getArchiveFilePrefix()
java.io.File
getCertificateFile()
java.io.File
getCrawlDir()
java.io.File
getCrawlLog()
java.io.File[]
getDisposableFiles()
The following are considered disposable: crawlDir/checkpoints, h3JobDir/state, h3JobDir/scratch, h3BaseDir/bin, h3BaseDir/extras, h3BaseDir/lib
static Heritrix3Files
getH3HeritrixFiles(java.io.File crawldir, Job job)
static Heritrix3Files
getH3HeritrixFiles(java.io.File crawldir, PersistentJobData harvestInfo)
java.lang.Long
getHarvestID()
java.io.File
getHeritrixBaseDir()
java.io.File
getHeritrixJobDir()
java.io.File
getHeritrixOutput()
java.io.File
getHeritrixStderrLog()
java.io.File
getHeritrixStdoutLog()
java.io.File
getHeritrixZip()
java.io.File
getIndexDir()
java.lang.Long
getJobID()
java.lang.String
getJobname()
java.io.File
getOrderFile()
java.io.File
getOrderXmlFile()
java.io.File
getProgressStatisticsLog()
java.io.File
getSeedsFile()
java.io.File
getSeedsTxtFile()
void
setIndexDir(java.io.File indexDir)
void
writeOrderXml(HeritrixTemplate orderXMLdoc)
void
writeSeedsTxt(java.lang.String seedListAsString)
-
Method Detail
-
getH3HeritrixFiles
public static Heritrix3Files getH3HeritrixFiles(java.io.File crawldir, PersistentJobData harvestInfo)
-
getH3HeritrixFiles
public static Heritrix3Files getH3HeritrixFiles(java.io.File crawldir, Job job)
-
getCrawlDir
public java.io.File getCrawlDir()
-
writeSeedsTxt
public void writeSeedsTxt(java.lang.String seedListAsString)
-
getSeedsFile
public java.io.File getSeedsFile()
-
getOrderFile
public java.io.File getOrderFile()
-
setIndexDir
public void setIndexDir(java.io.File indexDir)
-
writeOrderXml
public void writeOrderXml(HeritrixTemplate orderXMLdoc)
-
getProgressStatisticsLog
public java.io.File getProgressStatisticsLog()
-
getJobID
public java.lang.Long getJobID()
-
getOrderXmlFile
public java.io.File getOrderXmlFile()
-
getSeedsTxtFile
public java.io.File getSeedsTxtFile()
-
getHarvestID
public java.lang.Long getHarvestID()
-
getArchiveFilePrefix
public java.lang.String getArchiveFilePrefix()
-
getIndexDir
public java.io.File getIndexDir()
-
getCrawlLog
public java.io.File getCrawlLog()
-
getHeritrixZip
public java.io.File getHeritrixZip()
-
getCertificateFile
public java.io.File getCertificateFile()
-
getHeritrixOutput
public java.io.File getHeritrixOutput()
-
getHeritrixStderrLog
public java.io.File getHeritrixStderrLog()
-
getHeritrixStdoutLog
public java.io.File getHeritrixStdoutLog()
-
getHeritrixJobDir
public java.io.File getHeritrixJobDir()
-
getHeritrixBaseDir
public java.io.File getHeritrixBaseDir()
-
getJobname
public java.lang.String getJobname()
-
deleteFinalLogs
public void deleteFinalLogs()
-
cleanUpAfterHarvest
public void cleanUpAfterHarvest(java.io.File oldJobsDir)
-
getDisposableFiles
public java.io.File[] getDisposableFiles()
The following are considered disposable: crawlDir/checkpoints, h3JobDir/state, h3JobDir/scratch, h3BaseDir/bin, h3BaseDir/extras, h3BaseDir/lib
Returns:
- list of disposable files and directories
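To make the list above concrete, here is a minimal sketch that builds the six disposable paths from three base directories. It is an illustration only, not the Heritrix3Files implementation: the class name DisposableFilesSketch, the method disposableFiles, and its three parameters are hypothetical, and the subdirectory names are taken directly from the description of getDisposableFiles().

```java
import java.io.File;

// Hypothetical helper, not part of NetarchiveSuite: it only mirrors the
// documented list of disposable locations.
class DisposableFilesSketch {

    // Assumption: the three directories correspond to what getCrawlDir(),
    // getHeritrixJobDir() and getHeritrixBaseDir() would return.
    static File[] disposableFiles(File crawlDir, File h3JobDir, File h3BaseDir) {
        return new File[] {
            new File(crawlDir, "checkpoints"),
            new File(h3JobDir, "state"),
            new File(h3JobDir, "scratch"),
            new File(h3BaseDir, "bin"),
            new File(h3BaseDir, "extras"),
            new File(h3BaseDir, "lib")
        };
    }

    public static void main(String[] args) {
        // Example directory names are made up for the demonstration.
        File[] files = disposableFiles(
                new File("crawldir"), new File("h3job"), new File("heritrix3"));
        for (File f : files) {
            // Only prints the paths; nothing is created or deleted.
            System.out.println(f.getPath());
        }
    }
}
```

With a real Heritrix3Files instance, a caller would presumably iterate over the array returned by getDisposableFiles() instead of assembling the paths by hand.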