java.lang.Object
  dk.netarkivet.harvester.harvesting.report.AbstractHarvestReport
public abstract class AbstractHarvestReport
Base implementation for a harvest report.
Nested Class Summary | |
---|---|
static class |
AbstractHarvestReport.ProgressStatisticsConstants
Strings found in the progress-statistics.log, used to devise the default stop reason for domains. |
Constructor Summary | |
---|---|
AbstractHarvestReport()
Default constructor that does nothing. |
|
AbstractHarvestReport(HeritrixFiles files)
Constructor from Heritrix report files. |
Method Summary | |
---|---|
static StopReason |
findDefaultStopReason(java.io.File logFile)
Determine from the progress-statistics log whether the crawl stopped normally. |
java.lang.Long |
getByteCount(java.lang.String domainName)
Get the number of bytes downloaded for the given domain. |
StopReason |
getDefaultStopReason()
Returns the default stop reason initially assigned to every domain. |
java.util.Set<java.lang.String> |
getDomainNames()
Returns the set of domain names that are contained in hosts-report.txt. |
protected HeritrixFiles |
getHeritrixFiles()
|
java.lang.Long |
getObjectCount(java.lang.String domainName)
Get the number of objects found for the given domain. |
protected DomainStats |
getOrCreateDomainStats(java.lang.String domainName)
Attempts to get an already existing DomainStats object for that
domain, and if not found creates one with zero values. |
StopReason |
getStopReason(java.lang.String domainName)
Get the StopReason for the given domain. |
abstract void |
postProcess(Job job)
Post-processing happens on the scheduler side when ARC files have been uploaded. |
void |
preProcess(HeritrixFiles files)
Pre-processing happens when the report is built just at the end of the crawl, before the ARC files upload. |
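
The per-domain accessors summarized above can be combined by a caller to tabulate results for a whole harvest. The following is only a minimal sketch under stated assumptions: `ReportView`, `InMemoryReport` and `ReportTotals` are hypothetical stand-ins written for this illustration, not NetarchiveSuite classes; `IllegalArgumentException` stands in for `ArgumentNotValid`; and returning `null` for an unknown domain is an assumption that this page does not confirm.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in mirroring three of the accessors listed above.
interface ReportView {
    Set<String> getDomainNames();
    Long getObjectCount(String domainName);
    Long getByteCount(String domainName);
}

// In-memory implementation used only for this illustration.
class InMemoryReport implements ReportView {
    // Per domain: index 0 holds the object count, index 1 the byte count.
    private final Map<String, long[]> stats = new LinkedHashMap<>();

    void record(String domainName, long objects, long bytes) {
        long[] s = stats.computeIfAbsent(domainName, k -> new long[2]);
        s[0] += objects;
        s[1] += bytes;
    }

    // Stand-in for the ArgumentNotValid check on null or empty names.
    private static void check(String domainName) {
        if (domainName == null || domainName.isEmpty()) {
            throw new IllegalArgumentException("domainName must not be null or empty");
        }
    }

    public Set<String> getDomainNames() {
        return new TreeSet<>(stats.keySet());
    }

    public Long getObjectCount(String domainName) {
        check(domainName);
        long[] s = stats.get(domainName);
        return s == null ? null : s[0]; // assumption: null for unknown domains
    }

    public Long getByteCount(String domainName) {
        check(domainName);
        long[] s = stats.get(domainName);
        return s == null ? null : s[1]; // assumption: null for unknown domains
    }
}

class ReportTotals {
    // Sum the byte counts of every domain named by the report.
    static long totalBytes(ReportView report) {
        long total = 0;
        for (String domain : report.getDomainNames()) {
            total += report.getByteCount(domain);
        }
        return total;
    }
}
```

Iterating over `getDomainNames()` rather than over arbitrary strings keeps the caller within the names that the accessors are documented to accept.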
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public AbstractHarvestReport()
public AbstractHarvestReport(HeritrixFiles files)
Parameters: files - the set of Heritrix reports.
Method Detail |
---|
public void preProcess(HeritrixFiles files)
Specified by: preProcess in interface HarvestReport
public abstract void postProcess(Job job)
Specified by: postProcess in interface HarvestReport
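public StopReason getDefaultStopReason()
Description copied from interface: HarvestReport
Returns the default stop reason initially assigned to every domain.
Specified by: getDefaultStopReason in interface HarvestReport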
public final java.util.Set<java.lang.String> getDomainNames()
Specified by: getDomainNames in interface HarvestReport
public final java.lang.Long getObjectCount(java.lang.String domainName)
Specified by: getObjectCount in interface HarvestReport
Parameters: domainName - A domain name (as given by getDomainNames())
Throws: ArgumentNotValid - if domainName is null or empty
public final java.lang.Long getByteCount(java.lang.String domainName)
Specified by: getByteCount in interface HarvestReport
Parameters: domainName - A domain name (as given by getDomainNames())
Throws: ArgumentNotValid - if domainName is null or empty
public final StopReason getStopReason(java.lang.String domainName)
Specified by: getStopReason in interface HarvestReport
Parameters: domainName - A domain name (as given by getDomainNames())
Throws: ArgumentNotValid - if domainName is null or empty
protected HeritrixFiles getHeritrixFiles()
protected DomainStats getOrCreateDomainStats(java.lang.String domainName)
Attempts to get an already existing DomainStats object for that domain, and if not found creates one with zero values.
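
The get-or-create behaviour described for `getOrCreateDomainStats` can be sketched with a map keyed by domain name. This is an illustration only: `DomainStatsSketch` and `DomainStatsRegistry` are hypothetical names, the fields are a guess at what "zero values" covers, and the real class may store its entries differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for DomainStats; only two counters are modelled here.
class DomainStatsSketch {
    long objectCount; // starts at zero
    long byteCount;   // starts at zero
}

class DomainStatsRegistry {
    private final Map<String, DomainStatsSketch> statsByDomain = new HashMap<>();

    // Return the entry already registered for the domain, or register
    // and return a new zero-valued entry if none exists yet.
    DomainStatsSketch getOrCreate(String domainName) {
        return statsByDomain.computeIfAbsent(domainName, k -> new DomainStatsSketch());
    }

    int size() {
        return statsByDomain.size();
    }
}
```

Because repeated calls for the same domain return the same object, a caller can accumulate counts across several report lines without checking for existence first.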
public static StopReason findDefaultStopReason(java.io.File logFile) throws ArgumentNotValid
Parameters: logFile - A progress-statistics.log file.
Throws: ArgumentNotValid - on null argument.
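
One way a default stop reason could be derived from a progress-statistics.log is to look for a marker string in the last non-blank line. The sketch below rests on several assumptions that this page does not confirm: the marker text `CRAWL ENDED`, the choice to inspect only the last non-blank line, the two-valued `Outcome` enum (a stand-in for `StopReason`), and the treatment of a missing file as unfinished. The real marker strings are defined in `AbstractHarvestReport.ProgressStatisticsConstants`.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;

class StopReasonSketch {
    // Hypothetical stand-in for StopReason.
    enum Outcome { COMPLETE, UNFINISHED }

    // Assumed marker text; not taken from this page.
    static final String ORDERLY_FINISH_MARKER = "CRAWL ENDED";

    // Decide the outcome from the last non-blank line of the log.
    static Outcome fromLines(List<String> lines) {
        String lastLine = null;
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                lastLine = line;
            }
        }
        boolean orderly = lastLine != null && lastLine.contains(ORDERLY_FINISH_MARKER);
        return orderly ? Outcome.COMPLETE : Outcome.UNFINISHED;
    }

    // File-based wrapper; IllegalArgumentException stands in for ArgumentNotValid.
    static Outcome fromFile(File logFile) {
        if (logFile == null) {
            throw new IllegalArgumentException("logFile must not be null");
        }
        if (!logFile.exists()) {
            return Outcome.UNFINISHED; // assumption: no log means no orderly finish
        }
        try {
            return fromLines(Files.readAllLines(logFile.toPath(), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Separating the line-based decision from the file handling keeps the decision logic easy to exercise without touching the file system.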