public interface HarvestReport
Base interface for a post-crawl harvest report.
Method Summary | |
---|---|
java.lang.Long | getByteCount(java.lang.String domainName): Get the number of bytes downloaded for the given domain. |
StopReason | getDefaultStopReason(): Returns the default stop reason initially assigned to every domain. |
java.util.Set<java.lang.String> | getDomainNames(): Returns the set of domain names that are contained in hosts-report.txt (i.e. |
java.lang.Long | getObjectCount(java.lang.String domainName): Get the number of objects found for the given domain. |
StopReason | getStopReason(java.lang.String domainName): Get the StopReason for the given domain. |
void | postProcess(Job job): Post-processing happens on the scheduler side when ARC files have been uploaded. |
void | preProcess(HeritrixFiles files): Pre-processing happens when the report is built just at the end of the crawl, before the ARC files upload. |
Method Detail |
---|
StopReason getDefaultStopReason()

java.util.Set<java.lang.String> getDomainNames()

java.lang.Long getObjectCount(java.lang.String domainName) throws ArgumentNotValid
Parameters:
domainName - A domain name (as given by getDomainNames())
Throws:
ArgumentNotValid - if null or empty domainName

java.lang.Long getByteCount(java.lang.String domainName) throws ArgumentNotValid
Parameters:
domainName - A domain name (as given by getDomainNames())
Throws:
ArgumentNotValid - if null or empty domainName

StopReason getStopReason(java.lang.String domainName) throws ArgumentNotValid
Parameters:
domainName - A domain name (as given by getDomainNames())
Throws:
ArgumentNotValid - if null or empty domainName

void preProcess(HeritrixFiles files)

void postProcess(Job job)
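
The per-domain accessors above might be used as in the following minimal sketch. Every type in it is a simplified, hypothetical stand-in declared locally so the example compiles on its own: the StopReason constants, the unchecked ArgumentNotValid, the InMemoryReport class, and its record and summarize helpers are not part of the documented API. Returning null for an unknown domain is likewise an assumption, since this page does not specify that case. preProcess and postProcess are omitted because HeritrixFiles and Job are not described here.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: all nested types are hypothetical stand-ins,
// not the real classes referenced by this page.
class HarvestReportSketch {

    /** Hypothetical constants; the real StopReason values are not listed here. */
    enum StopReason { COMPLETED, LIMIT_REACHED }

    /** Stand-in for ArgumentNotValid, assumed here to be unchecked. */
    static class ArgumentNotValid extends RuntimeException {
        ArgumentNotValid(String message) { super(message); }
    }

    /** Trimmed copy of the interface (preProcess and postProcess omitted). */
    interface HarvestReport {
        StopReason getDefaultStopReason();
        Set<String> getDomainNames();
        Long getObjectCount(String domainName) throws ArgumentNotValid;
        Long getByteCount(String domainName) throws ArgumentNotValid;
        StopReason getStopReason(String domainName) throws ArgumentNotValid;
    }

    /** Hypothetical in-memory implementation backed by insertion-ordered maps. */
    static class InMemoryReport implements HarvestReport {
        private final Map<String, long[]> counts = new LinkedHashMap<>();
        private final Map<String, StopReason> reasons = new LinkedHashMap<>();

        /** Hypothetical helper: stores counts and assigns the default stop reason. */
        void record(String domainName, long objects, long bytes) {
            check(domainName);
            counts.put(domainName, new long[] {objects, bytes});
            reasons.put(domainName, getDefaultStopReason());
        }

        public StopReason getDefaultStopReason() { return StopReason.COMPLETED; }

        public Set<String> getDomainNames() {
            return Collections.unmodifiableSet(counts.keySet());
        }

        public Long getObjectCount(String domainName) {
            check(domainName);
            long[] c = counts.get(domainName);
            return c == null ? null : c[0]; // null for unknown domains (assumption)
        }

        public Long getByteCount(String domainName) {
            check(domainName);
            long[] c = counts.get(domainName);
            return c == null ? null : c[1]; // null for unknown domains (assumption)
        }

        public StopReason getStopReason(String domainName) {
            check(domainName);
            return reasons.get(domainName);
        }

        /** Mirrors the documented contract: null or empty names are rejected. */
        private static void check(String domainName) {
            if (domainName == null || domainName.isEmpty()) {
                throw new ArgumentNotValid("null or empty domainName");
            }
        }
    }

    /** Builds one line per domain from the report's accessors. */
    static String summarize(HarvestReport report) {
        StringBuilder sb = new StringBuilder();
        for (String domain : report.getDomainNames()) {
            sb.append(domain).append(": ")
              .append(report.getObjectCount(domain)).append(" objects, ")
              .append(report.getByteCount(domain)).append(" bytes, ")
              .append(report.getStopReason(domain)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        InMemoryReport report = new InMemoryReport();
        report.record("example.org", 10L, 2048L);
        System.out.print(summarize(report));
    }
}
```

Iterating over getDomainNames() and passing each name back into the accessors keeps the caller within the documented parameter contract, so ArgumentNotValid should only arise from a genuinely null or empty name.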