public class HarvestController extends Object
Modifier and Type | Method and Description |
---|---|
void | cleanup(): Clean up this singleton, releasing the ArcRepositoryClient and removing the instance. |
static HarvestController | getInstance(): Get the instance of the singleton HarvestController. |
static List<CDXRecord> | getMetadataCDXRecordsForJob(long jobid): Submit a batch job to generate CDX records for all metadata files of a job, and return the result as a list. |
void | runHarvest(HeritrixFiles files): Creates the actual HeritrixLauncher instance and runs it, after the various setup files have been written. |
HarvestReport | storeFiles(HeritrixFiles files, StringBuilder errorMessage, List<File> failedFiles): Controls storing all files involved in a job. |
HeritrixFiles | writeHarvestFiles(File crawldir, Job job, HarvestDefinitionInfo hdi, List<MetadataEntry> metadataEntries): Writes the files involved in a harvest. |
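
The summary suggests a natural call order: write the setup files, run the crawl, store the results, then clean up. The sketch below illustrates that order under stated assumptions. FilesStub, RecordingController, and HarvestFlowSketch are hypothetical stand-ins that only mirror the method names above; they are not NetarchiveSuite classes, their signatures are simplified, and the simulated upload failure is invented for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Drives the hypothetical controller in the assumed order. */
final class HarvestFlowSketch {
    /** Assumed call order: write setup files, crawl, store results, clean up. */
    static String runJob(RecordingController controller, File crawlDir,
                         StringBuilder errors, List<File> failed) {
        FilesStub files = controller.writeHarvestFiles(crawlDir);
        controller.runHarvest(files);
        String report = controller.storeFiles(files, errors, failed);
        controller.cleanup();
        return report;
    }

    public static void main(String[] args) {
        RecordingController controller = new RecordingController();
        StringBuilder errors = new StringBuilder();
        List<File> failed = new ArrayList<>();
        String report = runJob(controller, new File("crawl-42"), errors, failed);
        System.out.println(controller.calls);
        System.out.println(report + ", failed uploads: " + failed.size());
    }
}

/** Hypothetical stand-in for HeritrixFiles: it only remembers the crawl directory. */
final class FilesStub {
    final File crawlDir;

    FilesStub(File crawlDir) {
        this.crawlDir = crawlDir;
    }
}

/** Hypothetical controller that mirrors the method names above and records each call. */
final class RecordingController {
    final List<String> calls = new ArrayList<>();

    FilesStub writeHarvestFiles(File crawlDir) {
        calls.add("writeHarvestFiles");
        return new FilesStub(crawlDir);
    }

    void runHarvest(FilesStub files) {
        calls.add("runHarvest");
    }

    /** Simulates one failed upload so both accumulator arguments get used. */
    String storeFiles(FilesStub files, StringBuilder errorMessage, List<File> failedFiles) {
        calls.add("storeFiles");
        File failed = new File(files.crawlDir, "example.arc");
        failedFiles.add(failed);
        errorMessage.append("Could not upload ").append(failed.getName()).append('\n');
        return "report for " + files.crawlDir.getName();
    }

    void cleanup() {
        calls.add("cleanup");
    }
}
```

Note how storeFiles reports problems through its arguments rather than its return value: the caller supplies an empty StringBuilder and an empty list, and inspects both after the call.
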
public static HarvestController getInstance()
public void cleanup()
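
The pairing of getInstance() with cleanup() follows a familiar pattern: a lazily created singleton whose cleanup releases a held resource and drops the stored instance, so the next getInstance() call builds a fresh one. The following is a minimal sketch of that pattern only, not the NetarchiveSuite implementation; SingletonSketch and FakeClient are hypothetical names, with FakeClient standing in for a resource such as the ArcRepositoryClient.

```java
/** Illustrative singleton with an explicit cleanup step (hypothetical class). */
final class SingletonSketch {
    /** Hypothetical stand-in for a held resource such as the ArcRepositoryClient. */
    static final class FakeClient {
        private boolean closed;

        void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    private static SingletonSketch instance;
    private final FakeClient client = new FakeClient();

    private SingletonSketch() {
    }

    /** Returns the shared instance, creating it on first use. */
    static synchronized SingletonSketch getInstance() {
        if (instance == null) {
            instance = new SingletonSketch();
        }
        return instance;
    }

    /** Releases the held client and forgets the stored instance. */
    void cleanup() {
        client.close();
        resetInstance();
    }

    private static synchronized void resetInstance() {
        instance = null;
    }

    FakeClient client() {
        return client;
    }

    public static void main(String[] args) {
        SingletonSketch first = getInstance();
        first.cleanup();
        SingletonSketch second = getInstance();
        System.out.println("new instance after cleanup: " + (first != second));
    }
}
```

A caller that keeps a reference across cleanup() ends up holding an object whose client is already closed, so in this sketch it is safer to call getInstance() again after cleanup.
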

public HeritrixFiles writeHarvestFiles(File crawldir, Job job, HarvestDefinitionInfo hdi, List<MetadataEntry> metadataEntries)
Parameters:
crawldir
- The directory that the crawl should take place in.
job
- The Job object containing various harvest setup data.
hdi
- The object encapsulating documentary information about the harvest.
metadataEntries
- Any metadata entries sent along with the job that should be stored for later use.

public void runHarvest(HeritrixFiles files) throws ArgumentNotValid
Parameters:
files
- Description of files involved in running Heritrix. Not Null.
Throws:
ArgumentNotValid
- if an argument isn't valid.

public HarvestReport storeFiles(HeritrixFiles files, StringBuilder errorMessage, List<File> failedFiles) throws ArgumentNotValid
Controls storing all files involved in a job. Additionally, any leftover open ARC files are closed and harvest documentation is extracted before upload starts.
Parameters:
files
- The HeritrixFiles object for this crawl. Not Null.
errorMessage
- A place where error messages accumulate. Not Null.
failedFiles
- List of files that failed to upload. Not Null.
Throws:
ArgumentNotValid
- if an argument isn't valid.

public static List<CDXRecord> getMetadataCDXRecordsForJob(long jobid)
Parameters:
jobid
- The job to get CDX records for.
Throws:
ArgumentNotValid
- If jobid is 0 or negative.
IOFailure
- On trouble generating the CDX records.

Copyright © 2005–2016 The Royal Danish Library, the Danish State and University Library, the National Library of France and the Austrian National Library. All rights reserved.