public class LegacyHarvestReport extends AbstractHarvestReport
Constructor and Description |
---|
LegacyHarvestReport(): Default constructor. |
LegacyHarvestReport(DomainStatsReport dsr): Gets the data from a crawl.log file and parses the file. |
Modifier and Type | Method and Description |
---|---|
void | postProcess(Job job): Post-processing happens on the scheduler side when ARC files have been uploaded. |
Methods inherited from class AbstractHarvestReport: getByteCount, getDefaultStopReason, getDomainNames, getObjectCount, getOrCreateDomainStats, getStopReason
public LegacyHarvestReport(DomainStatsReport dsr)
Each URL listed in the file is assigned to a domain, and the total object count and byte count per domain are calculated. Finally, a StopReason is found for each domain: when the response is CrawlURI.S_BLOCKED_BY_QUOTA (currently -5003), the StopReason is set to StopReason.SIZE_LIMIT if the annotation equals "Q:group-max-all-kb", or to StopReason.OBJECT_LIMIT if the annotation equals "Q:group-max-fetch-successes".
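The stop-reason rule described above can be sketched as follows. This is only an illustration, not the class's actual implementation: `SketchStopReason`, `StopReasonSketch`, `stopReasonFor` and the `DEFAULT` value are hypothetical names introduced here, and the sketch assumes that the annotations field of a crawl.log line is a comma-separated list.

```java
/** Hypothetical stand-in for StopReason; only SIZE_LIMIT and OBJECT_LIMIT are named in the documentation. */
enum SketchStopReason { SIZE_LIMIT, OBJECT_LIMIT, DEFAULT }

final class StopReasonSketch {
    /** Value the documentation quotes for CrawlURI.S_BLOCKED_BY_QUOTA. */
    static final int S_BLOCKED_BY_QUOTA = -5003;

    /**
     * Maps one crawl.log entry to a stop reason.
     *
     * @param fetchStatus   the response code of the entry
     * @param annotations   the annotations field (assumed comma-separated), may be null
     * @param defaultReason the reason to return when no quota rule applies
     */
    static SketchStopReason stopReasonFor(int fetchStatus, String annotations,
                                          SketchStopReason defaultReason) {
        // Only entries blocked by a quota can yield a size or object limit.
        if (fetchStatus != S_BLOCKED_BY_QUOTA || annotations == null) {
            return defaultReason;
        }
        for (String annotation : annotations.split(",")) {
            switch (annotation.trim()) {
                case "Q:group-max-all-kb":
                    return SketchStopReason.SIZE_LIMIT;
                case "Q:group-max-fetch-successes":
                    return SketchStopReason.OBJECT_LIMIT;
                default:
                    break; // not a quota annotation, keep looking
            }
        }
        return defaultReason;
    }
}
```

Passing the default reason as a parameter keeps the sketch independent of how the real class chooses it (the inherited getDefaultStopReason method suggests such a default exists).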
hFiles
- the Heritrix reports and logs.
public LegacyHarvestReport()
public void postProcess(Job job)
job
- the actual job.
Copyright © 2005–2015 The Royal Danish Library, the Danish State and University Library, the National Library of France and the Austrian National Library. All rights reserved.