java.lang.Object
  dk.netarkivet.harvester.harvesting.report.AbstractHarvestReport
public abstract class AbstractHarvestReport
Base implementation for a harvest report.
Nested Class Summary | |
---|---|
static class |
AbstractHarvestReport.ProgressStatisticsConstants
Strings found in progress-statistics.log, used to determine the default stop reason for domains. |
Constructor Summary | |
---|---|
AbstractHarvestReport()
Default constructor that does nothing. |
|
AbstractHarvestReport(HeritrixFiles files)
Constructor from Heritrix report files. |
Method Summary | |
---|---|
static StopReason |
findDefaultStopReason(java.io.File logFile)
Determines from the progress-statistics log whether the crawl stopped normally. |
java.lang.Long |
getByteCount(java.lang.String domainName)
Get the number of bytes downloaded for the given domain. |
StopReason |
getDefaultStopReason()
Returns the default stop reason initially assigned to every domain. |
java.util.Set<java.lang.String> |
getDomainNames()
Returns the set of domain names that are contained in hosts-report.txt (i.e. |
protected HeritrixFiles |
getHeritrixFiles()
|
java.lang.Long |
getObjectCount(java.lang.String domainName)
Get the number of objects found for the given domain. |
protected DomainStats |
getOrCreateDomainStats(java.lang.String domainName)
Attempts to get an already existing DomainStats object for the given domain and, if none is found, creates one with zero values. |
StopReason |
getStopReason(java.lang.String domainName)
Get the StopReason for the given domain. |
abstract void |
postProcess(Job job)
Post-processing happens on the scheduler side when ARC files have been uploaded. |
void |
preProcess(HeritrixFiles files)
Pre-processing happens when the report is built at the end of the crawl, before the ARC files are uploaded. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
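The query methods in the Method Summary (getDomainNames, getObjectCount, getByteCount) can be combined to compute totals over a whole harvest. The following is a minimal sketch only: DomainReportSketch, MapBackedReportSketch and ReportTotalsSketch are hypothetical stand-ins written for this example and are not part of the documented API. Only the three method signatures are taken from this page. The null checks are a defensive assumption, since this page does not say whether the real methods can return null.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical stand-in mirroring three method signatures listed on this page.
interface DomainReportSketch {
    Set<String> getDomainNames();
    Long getObjectCount(String domainName);
    Long getByteCount(String domainName);
}

// Hypothetical in-memory report, used only to make the sketch runnable.
final class MapBackedReportSketch implements DomainReportSketch {
    // Each value holds {objectCount, byteCount} for one domain.
    private final Map<String, long[]> counts = new TreeMap<>();

    void put(String domainName, long objects, long bytes) {
        counts.put(domainName, new long[] {objects, bytes});
    }

    @Override
    public Set<String> getDomainNames() {
        return counts.keySet();
    }

    @Override
    public Long getObjectCount(String domainName) {
        long[] entry = counts.get(domainName);
        if (entry == null) {
            return null;
        }
        return entry[0];
    }

    @Override
    public Long getByteCount(String domainName) {
        long[] entry = counts.get(domainName);
        if (entry == null) {
            return null;
        }
        return entry[1];
    }
}

final class ReportTotalsSketch {
    // Sums the object counts of every domain named by the report.
    static long totalObjects(DomainReportSketch report) {
        long total = 0;
        for (String domain : report.getDomainNames()) {
            Long objects = report.getObjectCount(domain);
            if (objects != null) { // defensive assumption, see lead-in
                total += objects;
            }
        }
        return total;
    }

    // Sums the byte counts of every domain named by the report.
    static long totalBytes(DomainReportSketch report) {
        long total = 0;
        for (String domain : report.getDomainNames()) {
            Long bytes = report.getByteCount(domain);
            if (bytes != null) { // defensive assumption, see lead-in
                total += bytes;
            }
        }
        return total;
    }
}
```

Iterating over getDomainNames() and passing each name back into the count methods follows the parameter descriptions on this page, which ask for a domain name "as given by getDomainNames()".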
Constructor Detail |
---|
public AbstractHarvestReport()
public AbstractHarvestReport(HeritrixFiles files)
Parameters:
files - the set of Heritrix reports.
Method Detail |
---|
public void preProcess(HeritrixFiles files)
Specified by:
preProcess in interface HarvestReport
public abstract void postProcess(Job job)
Specified by:
postProcess in interface HarvestReport
public StopReason getDefaultStopReason()
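Description copied from interface: HarvestReport
Returns the default stop reason initially assigned to every domain.
Specified by:
getDefaultStopReason in interface HarvestReport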
public final java.util.Set<java.lang.String> getDomainNames()
Specified by:
getDomainNames in interface HarvestReport
public final java.lang.Long getObjectCount(java.lang.String domainName)
Specified by:
getObjectCount in interface HarvestReport
Parameters:
domainName - A domain name (as given by getDomainNames())
Throws:
ArgumentNotValid - if domainName is null or empty
public final java.lang.Long getByteCount(java.lang.String domainName)
Specified by:
getByteCount in interface HarvestReport
Parameters:
domainName - A domain name (as given by getDomainNames())
Throws:
ArgumentNotValid - if domainName is null or empty
public final StopReason getStopReason(java.lang.String domainName)
Specified by:
getStopReason in interface HarvestReport
Parameters:
domainName - A domain name (as given by getDomainNames())
Throws:
ArgumentNotValid - if domainName is null or empty
protected HeritrixFiles getHeritrixFiles()
protected DomainStats getOrCreateDomainStats(java.lang.String domainName)
Attempts to get an already existing DomainStats object for the given domain and, if none is found, creates one with zero values.
Parameters:
domainName - the name of the domain to get DomainStats for.
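The get-or-create behaviour described for getOrCreateDomainStats can be illustrated with a map keyed by domain name. This is a sketch under stated assumptions, not the class's actual implementation: DomainStatsSketch and DomainStatsRegistrySketch are hypothetical names, the two counter fields are assumed from the object and byte counts this page mentions, and IllegalArgumentException stands in for ArgumentNotValid.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for DomainStats; the real class's fields are not shown on this page.
final class DomainStatsSketch {
    long objectCount; // starts at zero
    long byteCount;   // starts at zero
}

// Hypothetical holder showing one way to implement get-or-create.
final class DomainStatsRegistrySketch {
    private final Map<String, DomainStatsSketch> statsByDomain = new HashMap<>();

    // Returns the existing entry for the domain, or registers a new zero-valued one.
    DomainStatsSketch getOrCreate(String domainName) {
        if (domainName == null || domainName.isEmpty()) {
            // Stand-in for the ArgumentNotValid check used elsewhere in this class.
            throw new IllegalArgumentException("domainName must not be null or empty");
        }
        return statsByDomain.computeIfAbsent(domainName, name -> new DomainStatsSketch());
    }

    int size() {
        return statsByDomain.size();
    }
}
```

A second call with the same domain name returns the same object, so counters updated through one reference are visible through the other.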
public static StopReason findDefaultStopReason(java.io.File logFile) throws ArgumentNotValid
Parameters:
logFile - A progress-statistics.log file.
Throws:
ArgumentNotValid - on null argument.
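One plausible way to decide whether a crawl stopped normally is to inspect the last non-blank line of the log for a marker string. The sketch below illustrates that idea only and is not the documented implementation. The marker text, the two Reason values and the fallback for unreadable files are all assumptions; the real strings live in AbstractHarvestReport.ProgressStatisticsConstants, whose values are not listed on this page. IllegalArgumentException stands in for ArgumentNotValid.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;

final class StopReasonSketch {
    // Hypothetical stand-in for StopReason; the real enum's values are not listed on this page.
    enum Reason { DOWNLOAD_COMPLETE, DOWNLOAD_UNFINISHED }

    // Assumed marker for an orderly finish; the real constant may differ.
    static final String ORDERLY_FINISH_MARKER = "CRAWL ENDED - Finished";

    // Classifies a log by its last non-blank line.
    static Reason classify(List<String> lines) {
        for (int i = lines.size() - 1; i >= 0; i--) {
            String line = lines.get(i);
            if (!line.trim().isEmpty()) {
                return line.contains(ORDERLY_FINISH_MARKER)
                        ? Reason.DOWNLOAD_COMPLETE
                        : Reason.DOWNLOAD_UNFINISHED;
            }
        }
        // An empty log gives no evidence of an orderly finish.
        return Reason.DOWNLOAD_UNFINISHED;
    }

    // Reads the log file and delegates to classify.
    static Reason findDefaultStopReason(File logFile) {
        if (logFile == null) {
            // Stand-in for ArgumentNotValid.
            throw new IllegalArgumentException("logFile must not be null");
        }
        if (!logFile.canRead()) {
            // Assumption: an unreadable log is treated as an unfinished crawl.
            return Reason.DOWNLOAD_UNFINISHED;
        }
        try {
            return classify(Files.readAllLines(logFile.toPath(), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Separating classify from the file handling keeps the decision logic testable without touching the file system.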