Class HarvestJob


  • public class HarvestJob
    extends Object
    • Constructor Detail

      • HarvestJob

        public HarvestJob(HarvestControllerServer hcs)
        Constructs a HarvestJob for the given HarvestControllerServer.
        Parameters:
        hcs - a HarvestControllerServer instance
    • Method Detail

      • init

        public void init(Job job,
                         HarvestDefinitionInfo origHarvestInfo,
                         List<MetadataEntry> metadataEntries)
        Initializes the harvest job.
        Parameters:
        job - a job from the jobs table in the harvest database
        origHarvestInfo - metadata about the harvest
        metadataEntries - entries for the metadata file for the harvest
      • getHeritrix3Files

        public Heritrix3Files getHeritrix3Files()
        Returns:
        The Heritrix3Files object set up by the init() method.
      • runHarvest

        public void runHarvest()
                        throws ArgumentNotValid
        Creates the actual HeritrixLauncher instance and runs it, after the various setup files have been written.
        Throws:
        ArgumentNotValid - if an argument isn't valid.
      • createCrawlDir

        public File createCrawlDir()
        Create the crawl dir, but make sure a message is sent if there is a problem.
        Returns:
        The directory that the crawl will take place in.
        Throws:
        PermissionDenied - if the directory cannot be created.
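        The contract described above (create the directory, report any problem, then throw) could be sketched roughly as follows. This is only an illustration, not the NetarchiveSuite implementation: the documented method takes no arguments, whereas the `crawlDir` and `notifier` parameters, the `CrawlDirSketch` class, and the local `PermissionDenied` stand-in are all hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.function.Consumer;

final class CrawlDirSketch {
    // Hypothetical stand-in for NetarchiveSuite's PermissionDenied exception.
    static class PermissionDenied extends RuntimeException {
        PermissionDenied(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Creates crawlDir, including any missing parent directories.
     * On failure, the problem is reported through notifier before
     * PermissionDenied is thrown (both parameters are assumptions).
     */
    static File createCrawlDir(File crawlDir, Consumer<String> notifier) {
        try {
            Files.createDirectories(crawlDir.toPath());
            return crawlDir;
        } catch (IOException e) {
            String message = "Unable to create crawl directory '" + crawlDir + "'";
            notifier.accept(message);
            throw new PermissionDenied(message, e);
        }
    }

    public static void main(String[] args) throws IOException {
        File base = Files.createTempDirectory("crawl-sketch").toFile();
        File crawlDir = createCrawlDir(new File(base, "job-1"), System.err::println);
        System.out.println("Crawl directory: " + crawlDir);
    }
}
```

        Sending the message before throwing means the problem is still reported even if the caller does not handle the exception.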
      • writeHarvestFiles

        public Heritrix3Files writeHarvestFiles(File crawldir,
                                                Job job,
                                                HarvestDefinitionInfo hdi,
                                                List<MetadataEntry> metadataEntries)
        Writes the files needed to start a harvest.
        Parameters:
        crawldir - The directory that the crawl should take place in.
        job - The Job object containing various harvest setup data.
        hdi - The object encapsulating documentary information about the harvest.
        metadataEntries - Any metadata entries sent along with the job that should be stored for later use.
        Returns:
        An object encapsulating where these files have been written.
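
A caller would presumably construct the job, call init(), and then call runHarvest(). The sketch below illustrates that order using empty, hypothetical stand-ins for the NetarchiveSuite types, so it runs without the library. Two things in it are assumptions rather than documented behaviour: that init() calls createCrawlDir() and writeHarvestFiles() internally, and that runHarvest() rejects a job that has not been initialized.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class HarvestJobLifecycleSketch {
    // Hypothetical, empty stand-ins for the NetarchiveSuite types named above.
    static class HarvestControllerServer {}
    static class Job {}
    static class HarvestDefinitionInfo {}
    static class MetadataEntry {}
    static class Heritrix3Files {
        final File crawlDir;
        Heritrix3Files(File crawlDir) { this.crawlDir = crawlDir; }
    }

    // Simplified stand-in that mirrors only the documented public signatures.
    static class HarvestJob {
        private final HarvestControllerServer hcs;
        private Heritrix3Files files;
        final List<String> steps = new ArrayList<>(); // records the call order

        HarvestJob(HarvestControllerServer hcs) { this.hcs = hcs; }

        void init(Job job, HarvestDefinitionInfo info, List<MetadataEntry> entries) {
            // Assumption: init() prepares the crawl directory and setup files.
            File crawlDir = createCrawlDir();
            files = writeHarvestFiles(crawlDir, job, info, entries);
            steps.add("init");
        }

        Heritrix3Files getHeritrix3Files() { return files; }

        void runHarvest() {
            // Assumption: running before init() is treated as a caller error.
            if (files == null) {
                throw new IllegalStateException("init() must be called first");
            }
            steps.add("runHarvest");
        }

        File createCrawlDir() {
            steps.add("createCrawlDir");
            return new File("crawl-sketch"); // path object only; nothing is created
        }

        Heritrix3Files writeHarvestFiles(File crawlDir, Job job,
                HarvestDefinitionInfo hdi, List<MetadataEntry> entries) {
            steps.add("writeHarvestFiles");
            return new Heritrix3Files(crawlDir);
        }
    }

    public static void main(String[] args) {
        HarvestJob harvestJob = new HarvestJob(new HarvestControllerServer());
        harvestJob.init(new Job(), new HarvestDefinitionInfo(), new ArrayList<>());
        harvestJob.runHarvest();
        System.out.println(harvestJob.steps);
    }
}
```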