dk.netarkivet.harvester.harvesting.report
Interface HarvestReport

All Superinterfaces:
java.io.Serializable
All Known Implementing Classes:
AbstractHarvestReport, BnfHarvestReport, LegacyHarvestReport

public interface HarvestReport
extends java.io.Serializable

Base interface for a post-crawl harvest report.


Method Summary
 java.lang.Long getByteCount(java.lang.String domainName)
          Get the number of bytes downloaded for the given domain.
 StopReason getDefaultStopReason()
          Returns the default stop reason initially assigned to every domain.
 java.util.Set<java.lang.String> getDomainNames()
          Returns the set of domain names that are contained in hosts-report.txt (i.e. host names mapped to domains).
 java.lang.Long getObjectCount(java.lang.String domainName)
          Get the number of objects found for the given domain.
 StopReason getStopReason(java.lang.String domainName)
          Get the StopReason for the given domain.
 void postProcess(Job job)
          Post-processing happens on the scheduler side when ARC files have been uploaded.
 void preProcess(HeritrixFiles files)
          Pre-processing happens when the report is built at the end of the crawl, before the ARC files are uploaded.
 

Method Detail

getDefaultStopReason

StopReason getDefaultStopReason()
Returns the default stop reason initially assigned to every domain.


getDomainNames

java.util.Set<java.lang.String> getDomainNames()
Returns the set of domain names that are contained in hosts-report.txt (i.e. host names mapped to domains).

Returns:
the set of domain names found in the report

getObjectCount

java.lang.Long getObjectCount(java.lang.String domainName)
                              throws ArgumentNotValid
Get the number of objects found for the given domain.

Parameters:
domainName - A domain name (as given by getDomainNames())
Returns:
How many objects were collected for that domain
Throws:
ArgumentNotValid - if domainName is null or empty
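
The null-or-empty contract above could be enforced by an implementation roughly as follows. This is a minimal sketch, not the project's actual code: the class DomainArgumentCheck, the method requireDomainName and the exception InvalidArgument are hypothetical stand-ins so the example compiles without NetarchiveSuite on the classpath, and the exception is modeled as unchecked purely for brevity.

```java
class DomainArgumentCheck {

    /** Hypothetical stand-in for ArgumentNotValid (an assumption, not the real class). */
    static class InvalidArgument extends RuntimeException {
        InvalidArgument(String message) {
            super(message);
        }
    }

    /**
     * Rejects null or empty domain names, mirroring the documented contract.
     * Returns the argument unchanged so the call can be used inline.
     */
    static String requireDomainName(String domainName) {
        if (domainName == null || domainName.isEmpty()) {
            throw new InvalidArgument("domainName must be a non-empty string");
        }
        return domainName;
    }

    public static void main(String[] args) {
        System.out.println(requireDomainName("example.org"));
        try {
            requireDomainName("");
        } catch (InvalidArgument e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```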

getByteCount

java.lang.Long getByteCount(java.lang.String domainName)
                            throws ArgumentNotValid
Get the number of bytes downloaded for the given domain.

Parameters:
domainName - A domain name (as given by getDomainNames())
Returns:
How many bytes were collected for that domain
Throws:
ArgumentNotValid - if domainName is null or empty
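
A caller would typically combine getDomainNames() with the per-domain counters to obtain totals for a whole harvest. The sketch below illustrates that pattern under stated assumptions: ReportView is a hypothetical stand-in that mirrors only three of the methods documented here, the sample data is invented for the demonstration, and treating a null count as zero is a defensive choice, since this page does not say whether null can be returned.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class HarvestReportTotals {

    /** Hypothetical stand-in mirroring three methods of HarvestReport. */
    interface ReportView {
        Set<String> getDomainNames();
        Long getObjectCount(String domainName);
        Long getByteCount(String domainName);
    }

    /** Treats a null count as zero (an assumption, see the lead-in). */
    static long orZero(Long value) {
        return value == null ? 0L : value;
    }

    /** Sums the object counts of every domain in the report. */
    static long totalObjects(ReportView report) {
        long total = 0;
        for (String domain : report.getDomainNames()) {
            total += orZero(report.getObjectCount(domain));
        }
        return total;
    }

    /** Sums the byte counts of every domain in the report. */
    static long totalBytes(ReportView report) {
        long total = 0;
        for (String domain : report.getDomainNames()) {
            total += orZero(report.getByteCount(domain));
        }
        return total;
    }

    /** In-memory report with invented figures: {objects, bytes} per domain. */
    static ReportView sample() {
        final Map<String, long[]> data = new LinkedHashMap<>();
        data.put("example.org", new long[] {12, 34567});
        data.put("example.net", new long[] {3, 890});
        return new ReportView() {
            public Set<String> getDomainNames() { return data.keySet(); }
            public Long getObjectCount(String d) { return data.get(d)[0]; }
            public Long getByteCount(String d) { return data.get(d)[1]; }
        };
    }

    public static void main(String[] args) {
        ReportView report = sample();
        System.out.println("objects=" + totalObjects(report)
                + " bytes=" + totalBytes(report));
    }
}
```

Only names returned by getDomainNames() are passed back to the counters, which matches the parameter description above.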

getStopReason

StopReason getStopReason(java.lang.String domainName)
                         throws ArgumentNotValid
Get the StopReason for the given domain.

Parameters:
domainName - A domain name (as given by getDomainNames())
Returns:
the StopReason for the given domain.
Throws:
ArgumentNotValid - if domainName is null or empty
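
Because every domain starts out with the default stop reason, comparing getStopReason(domainName) against getDefaultStopReason() is one way to find the domains whose harvest ended for some other reason. The following is a sketch under assumptions: Reason and ReportView are hypothetical stand-ins for StopReason and HarvestReport, and the enum constants are placeholders that may not match the real StopReason values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class StopReasonSketch {

    /** Hypothetical stand-in for StopReason; the constants are placeholders. */
    enum Reason { DOWNLOAD_COMPLETE, OBJECT_LIMIT, SIZE_LIMIT }

    /** Hypothetical stand-in mirroring three methods of HarvestReport. */
    interface ReportView {
        Reason getDefaultStopReason();
        Set<String> getDomainNames();
        Reason getStopReason(String domainName);
    }

    /** Lists the domains whose stop reason differs from the report's default. */
    static List<String> domainsWithNonDefaultReason(ReportView report) {
        Reason defaultReason = report.getDefaultStopReason();
        List<String> result = new ArrayList<>();
        for (String domain : report.getDomainNames()) {
            if (report.getStopReason(domain) != defaultReason) {
                result.add(domain);
            }
        }
        return result;
    }

    /** In-memory report with invented data for the demonstration. */
    static ReportView sample() {
        final Map<String, Reason> reasons = new LinkedHashMap<>();
        reasons.put("example.org", Reason.DOWNLOAD_COMPLETE);
        reasons.put("example.net", Reason.OBJECT_LIMIT);
        return new ReportView() {
            public Reason getDefaultStopReason() { return Reason.DOWNLOAD_COMPLETE; }
            public Set<String> getDomainNames() { return reasons.keySet(); }
            public Reason getStopReason(String d) { return reasons.get(d); }
        };
    }

    public static void main(String[] args) {
        System.out.println(domainsWithNonDefaultReason(sample()));
    }
}
```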

preProcess

void preProcess(HeritrixFiles files)
Pre-processing happens when the report is built at the end of the crawl, before the ARC files are uploaded.


postProcess

void postProcess(Job job)
Post-processing happens on the scheduler side when ARC files have been uploaded.