java.lang.Object
  dk.netarkivet.harvester.harvesting.HarvestDocumentation
public class HarvestDocumentation
This class contains code for documenting a harvest. Metadata is read from the directories associated with a given harvest-job-attempt (i.e. one DoCrawlMessage sent to a harvest server). The collected metadata is written to a new metadata file that is managed by IngestableFiles. Temporary metadata files are deleted after this metadata file has been written.
Constructor Summary |
---|
HarvestDocumentation() |
Method Summary | |
---|---|
static void | documentHarvest(IngestableFiles ingestables) Documents the harvest under the given dir in a packaged metadata arc file in a directory 'metadata' under the current dir. |
static java.net.URI | getAlternateCDXURI(long jobID, java.lang.String filename) Generates a URI identifying CDX info for one harvested ARC file. |
static java.net.URI | getCDXURI(java.lang.String harvestID, java.lang.String jobID, java.lang.String filename) Generates a URI identifying CDX info for one harvested (W)ARC file. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public HarvestDocumentation()
Method Detail |
---|
public static void documentHarvest(IngestableFiles ingestables) throws IOFailure
Parameters:
ingestables - Information about the finished crawl (crawldir, jobId, harvestID).
Throws:
ArgumentNotValid - if crawlDir is null or does not exist, or if jobID or harvestID is negative.
IOFailure - if
- reading ARC files or temporary files fails
- writing a file to arcFilesDir fails

public static java.net.URI getCDXURI(java.lang.String harvestID, java.lang.String jobID, java.lang.String filename) throws ArgumentNotValid, UnknownID
Parameters:
harvestID - The number of the harvest that generated the (W)ARC file.
jobID - The number of the job that generated the (W)ARC file.
filename - The name of the ARC or WARC file behind the cdx-data.
Throws:
ArgumentNotValid - if any parameter is null.
UnknownID - if something goes terribly wrong in our URI construction.

public static java.net.URI getAlternateCDXURI(long jobID, java.lang.String filename) throws ArgumentNotValid, UnknownID
Parameters:
jobID - The number of the job that generated the ARC file.
filename - the filename.
Throws:
ArgumentNotValid - if any parameter is null.
UnknownID - if something goes terribly wrong in our URI construction.
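Example (illustrative only): this page does not state what the generated CDX URIs look like, so the following is a minimal, self-contained sketch of how URI builders with the same parameter lists as getCDXURI and getAlternateCDXURI could be written with java.net.URI. The class name CdxUriSketch, the scheme, authority, path and query keys are all assumptions made for illustration and are not taken from HarvestDocumentation. The standard exceptions IllegalArgumentException and IllegalStateException stand in for ArgumentNotValid and UnknownID so that the sketch compiles without the harvester classes.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical sketch; not the HarvestDocumentation implementation. */
final class CdxUriSketch {
    // Assumed values: the real scheme, authority and path are not given on this page.
    static final String SCHEME = "metadata";
    static final String AUTHORITY = "example.org";
    static final String PATH = "/crawl/index/cdx";

    private CdxUriSketch() {
    }

    /** Mirrors the parameter list of getCDXURI(harvestID, jobID, filename). */
    static URI cdxUri(String harvestID, String jobID, String filename) {
        // Stand-in for ArgumentNotValid: reject null parameters up front.
        if (harvestID == null || jobID == null || filename == null) {
            throw new IllegalArgumentException("No parameter may be null");
        }
        String query = "harvestid=" + harvestID
                + "&jobid=" + jobID
                + "&filename=" + filename;
        return build(query);
    }

    /** Mirrors the parameter list of getAlternateCDXURI(jobID, filename). */
    static URI alternateCdxUri(long jobID, String filename) {
        // jobID is a primitive long, so only filename can be null here.
        if (filename == null) {
            throw new IllegalArgumentException("filename must not be null");
        }
        return build("jobid=" + jobID + "&filename=" + filename);
    }

    private static URI build(String query) {
        try {
            // The multi-argument URI constructor percent-encodes characters
            // that are illegal in a component, such as spaces.
            return new URI(SCHEME, AUTHORITY, PATH, query, null);
        } catch (URISyntaxException e) {
            // Stand-in for UnknownID: the URI could not be constructed.
            throw new IllegalStateException("Could not build CDX URI", e);
        }
    }
}
```

Under these assumptions, a call such as `CdxUriSketch.cdxUri("7", "42", "42-7-sample.arc")` returns a URI carrying the harvest ID, job ID and file name as query parameters. Passing null for any String argument fails immediately, matching the documented contract that null parameters are rejected.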