public class HarvestDocumentation extends Object
Constructor and Description |
---|
HarvestDocumentation() |
Modifier and Type | Method and Description |
---|---|
static void | documentHarvest(IngestableFiles ingestables) Documents the harvest under the given dir in a packaged metadata arc file in a directory 'metadata' under the current dir. |
public HarvestDocumentation()
public static void documentHarvest(IngestableFiles ingestables) throws IOFailure
In the current implementation, the documentation consists of CDX indices over all ARC files (with one CDX record per harvested ARC file), plus packaging of log files.
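As a rough illustration of the shape of that documentation, the hypothetical sketch below plans one CDX record per harvested ARC file plus one packaged record per log file. It is not the NetarchiveSuite implementation: the class `MetadataPlanSketch`, the `PlannedRecord` type and the file-name conventions (`.arc`/`.arc.gz` for ARC files, `.log` for log files) are assumptions made for illustration, and no actual CDX indexing or ARC writing happens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not NetarchiveSuite code: all names here are illustrative.
public class MetadataPlanSketch {

    /** One planned metadata record: its kind ("cdx" or "log") and the file it covers. */
    public static final class PlannedRecord {
        public final String kind;
        public final String sourceFile;

        PlannedRecord(String kind, String sourceFile) {
            this.kind = kind;
            this.sourceFile = sourceFile;
        }
    }

    /** Assumed naming convention: ARC files end in ".arc" or ".arc.gz". */
    static boolean isArcFile(String name) {
        return name.endsWith(".arc") || name.endsWith(".arc.gz");
    }

    /** Assumed naming convention: log files end in ".log". */
    static boolean isLogFile(String name) {
        return name.endsWith(".log");
    }

    /** Plans one CDX record per ARC file, then one packaged record per log file; other files are skipped. */
    static List<PlannedRecord> plan(List<String> fileNames) {
        List<PlannedRecord> cdxRecords = new ArrayList<>();
        List<PlannedRecord> logRecords = new ArrayList<>();
        for (String name : fileNames) {
            if (isArcFile(name)) {
                cdxRecords.add(new PlannedRecord("cdx", name));
            } else if (isLogFile(name)) {
                logRecords.add(new PlannedRecord("log", name));
            }
        }
        List<PlannedRecord> all = new ArrayList<>(cdxRecords);
        all.addAll(logRecords);
        return all;
    }

    public static void main(String[] args) {
        List<String> files = List.of("1-1-00001.arc.gz", "crawl.log", "1-1-00002.arc", "order.xml");
        for (PlannedRecord r : plan(files)) {
            System.out.println(r.kind + " <- " + r.sourceFile);
        }
    }
}
```

With the sample input, the two ARC files each get a CDX entry, `crawl.log` is packaged, and `order.xml` is ignored. Keeping CDX records ahead of log records is an arbitrary ordering choice in this sketch.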
If this method finishes without an exception, it is guaranteed that metadata is ready for upload.
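The "no exception means ready for upload" guarantee implies a fail-fast contract: validate the inputs, write everything, and only then return. The hypothetical sketch below illustrates that contract under stated assumptions. `IllegalArgumentException` stands in for NetarchiveSuite's `ArgumentNotValid` (mirroring the conditions documented for it on this page), `UncheckedIOException` stands in for `IOFailure`, and the method `documentHarvestSketch`, the `metadata` subdirectory placement and the marker file name are all illustrative, not the real API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the fail-fast contract; not the NetarchiveSuite implementation.
public class DocumentHarvestContractSketch {

    /** Mirrors the documented ArgumentNotValid conditions (IllegalArgumentException as stand-in). */
    static void validate(Path crawlDir, long jobId, long harvestId) {
        if (crawlDir == null || !Files.exists(crawlDir)) {
            throw new IllegalArgumentException("crawlDir is null or does not exist: " + crawlDir);
        }
        if (jobId < 0 || harvestId < 0) {
            throw new IllegalArgumentException("jobId and harvestId must be non-negative");
        }
    }

    /** Validates, then writes a placeholder metadata file; returns only if every write succeeded. */
    static Path documentHarvestSketch(Path crawlDir, long jobId, long harvestId) {
        validate(crawlDir, jobId, harvestId);
        try {
            Path metadataDir = Files.createDirectories(crawlDir.resolve("metadata"));
            Path target = metadataDir.resolve(jobId + "-metadata-sketch.txt");
            return Files.writeString(target, "harvest " + harvestId + "\n");
        } catch (IOException e) {
            // Stand-in for IOFailure: any write failure aborts before the caller can upload.
            throw new UncheckedIOException("writing metadata failed", e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path crawlDir = Files.createTempDirectory("crawl");
        Path written = documentHarvestSketch(crawlDir, 1, 2);
        // Reaching this line means every write succeeded, so an upload step may proceed.
        System.out.println("ready for upload: " + written.getFileName());
    }
}
```

A caller in this style needs no separate "is it complete?" check: if the call returns, the metadata is complete; if it throws, nothing should be uploaded. This sketch needs Java 11 or later for `Files.writeString`.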
TODO: Place preharvestmetadata in the IngestableFiles-defined area.
TODO: This method may be a good place to copy deduplication information from the crawl log to the CDX file.
Parameters:
ingestables - Information about the finished crawl (crawlDir, jobId, harvestID).
Throws:
ArgumentNotValid - if crawlDir is null or does not exist, or if jobID or harvestID is negative.
IOFailure - if reading ARC files or temporary files fails, or if writing a file to arcFilesDir fails.

Copyright © 2005–2015 The Royal Danish Library, the Danish State and University Library, the National Library of France and the Austrian National Library. All rights reserved.