Class Heritrix3Files


  • public class Heritrix3Files
    extends Object
    This class encapsulates the information generated by Heritrix3 or delivered to Heritrix3 before a crawl.
    • Method Detail

      • getCrawlDir

        public File getCrawlDir()
      • writeSeedsTxt

        public void writeSeedsTxt(String seedListAsString)
      • getSeedsFile

        public File getSeedsFile()
      • getOrderFile

        public File getOrderFile()
      • setIndexDir

        public void setIndexDir(File indexDir)
      • getProgressStatisticsLog

        public File getProgressStatisticsLog()
      • getJobID

        public Long getJobID()
      • getOrderXmlFile

        public File getOrderXmlFile()
      • getSeedsTxtFile

        public File getSeedsTxtFile()
      • getHarvestID

        public Long getHarvestID()
      • getArchiveFilePrefix

        public String getArchiveFilePrefix()
      • getIndexDir

        public File getIndexDir()
      • getCrawlLog

        public File getCrawlLog()
      • getHeritrixZip

        public File getHeritrixZip()
      • getCertificateFile

        public File getCertificateFile()
      • getHeritrixOutput

        public File getHeritrixOutput()
      • getHeritrixStderrLog

        public File getHeritrixStderrLog()
      • getHeritrixStdoutLog

        public File getHeritrixStdoutLog()
      • getHeritrixJobDir

        public File getHeritrixJobDir()
      • getHeritrixBaseDir

        public File getHeritrixBaseDir()
      • getJobname

        public String getJobname()
      • deleteFinalLogs

        public void deleteFinalLogs()
      • cleanUpAfterHarvest

        public void cleanUpAfterHarvest(File oldJobsDir)
      • getDisposableFiles

        public File[] getDisposableFiles()
        The following files and directories are considered disposable:
          crawlDir/checkpoints
          h3JobDir/state
          h3JobDir/scratch
          h3BaseDir/bin
          h3BaseDir/extras
          h3BaseDir/lib
        Returns:
        array of disposable files and directories
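
The six disposable paths listed for getDisposableFiles can be pictured with a minimal sketch. It is not the Heritrix3Files implementation: the class name DisposableFilesSketch, the method disposableFiles, and its three directory parameters are hypothetical, and the sketch assumes only that crawlDir, h3JobDir and h3BaseDir correspond to the crawl, job and base directories this class exposes.

```java
import java.io.File;

/** Illustrative sketch only; NOT the actual Heritrix3Files code. */
final class DisposableFilesSketch {

    /**
     * Builds the six paths named in the getDisposableFiles description
     * from three caller-supplied directories (hypothetical parameters).
     */
    static File[] disposableFiles(File crawlDir, File h3JobDir, File h3BaseDir) {
        return new File[] {
            new File(crawlDir, "checkpoints"),
            new File(h3JobDir, "state"),
            new File(h3JobDir, "scratch"),
            new File(h3BaseDir, "bin"),
            new File(h3BaseDir, "extras"),
            new File(h3BaseDir, "lib")
        };
    }

    public static void main(String[] args) {
        // Example directory names are placeholders, not real job paths.
        File[] files = disposableFiles(
                new File("crawl"), new File("h3job"), new File("h3base"));
        for (File f : files) {
            System.out.println(f.getPath());
        }
    }
}
```

The sketch only constructs File objects; it does not check that the paths exist and does not delete anything. A caller would presumably decide what to do with each entry, for example during cleanup after a harvest.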