Interface HarvestReport
- All Superinterfaces:
Serializable
- All Known Implementing Classes:
AbstractHarvestReport, BnfHarvestReport, LegacyHarvestReport
public interface HarvestReport extends Serializable
Base interface for a post-crawl harvest report.
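Because HarvestReport extends Serializable, every field an implementation keeps must itself be serializable. The sketch below illustrates this with a hypothetical, stripped-down ByteCountReport class (not one of the implementing classes listed above) that stores byte counts in a HashMap and survives a round trip through Java object serialization. The class and method names are illustrative assumptions, not part of this API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

// Hypothetical stand-in for a report; only byte counts per domain are modelled.
class ByteCountReport implements Serializable {
    private static final long serialVersionUID = 1L;
    // HashMap, String and Long are all Serializable, so the whole object can be written out.
    private final HashMap<String, Long> bytesByDomain = new HashMap<>();

    void put(String domain, long bytes) {
        bytesByDomain.put(domain, bytes);
    }

    Long getByteCount(String domain) {
        return bytesByDomain.get(domain);
    }
}

class SerializationDemo {
    // Writes the report to a byte array and reads it back as a new object.
    static ByteCountReport roundTrip(ByteCountReport report) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(report);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (ByteCountReport) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Serialization round trip failed", e);
        }
    }
}
```

A field of a non-serializable type would make writeObject fail with a NotSerializableException, which is why implementations of a Serializable interface need to choose their fields with care.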
Method Summary
All methods are abstract instance methods.
- Long getByteCount(String domainName): Get the number of bytes downloaded for the given domain.
- StopReason getDefaultStopReason(): Returns the default stop reason initially assigned to every domain.
- Set<String> getDomainNames(): Returns the set of domain names contained in hosts-report.txt (i.e. host names mapped to domains).
- Long getObjectCount(String domainName): Get the number of objects found for the given domain.
- StopReason getStopReason(String domainName): Get the StopReason for the given domain.
- void postProcess(Job job): Post-processing happens on the scheduler side when ARC files have been uploaded.
Method Detail
getDefaultStopReason
StopReason getDefaultStopReason()
Returns the default stop reason initially assigned to every domain.
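The following is a minimal sketch, under stated assumptions, of how a default stop reason could be assigned to every domain up front and later replaced for individual domains. StopReason here is a hypothetical stand-in enum whose values are illustrative only, and StopReasons is an invented helper class; neither reflects the contents of the real types.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the real StopReason type; values are illustrative only.
enum StopReason { DOWNLOAD_COMPLETE, OBJECT_LIMIT, SIZE_LIMIT }

class StopReasons {
    private final StopReason defaultReason;
    private final Map<String, StopReason> byDomain = new HashMap<>();

    // Every known domain starts out with the default stop reason.
    StopReasons(Set<String> domains, StopReason defaultReason) {
        this.defaultReason = defaultReason;
        for (String domain : domains) {
            byDomain.put(domain, defaultReason);
        }
    }

    StopReason getDefaultStopReason() {
        return defaultReason;
    }

    // Replaces the stop reason recorded for a single domain.
    void override(String domain, StopReason reason) {
        byDomain.put(domain, reason);
    }

    StopReason getStopReason(String domain) {
        return byDomain.get(domain);
    }
}
```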
-
getDomainNames
Set<String> getDomainNames()
Returns the set of domain names contained in hosts-report.txt (i.e. host names mapped to domains).
- Returns:
a Set of domain names as Strings
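This page does not specify how host names are mapped to domains. As a rough illustration only, the sketch below assumes the host names have already been read from hosts-report.txt and collapses them with a naive "last two labels" rule. DomainNames is a hypothetical helper, and a real mapping would need proper handling of multi-label suffixes such as co.uk.

```java
import java.util.Collection;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

class DomainNames {
    // Naive rule for illustration: keep the last two dot-separated labels,
    // so "www.kb.dk" becomes "kb.dk". Suffixes such as "co.uk" are NOT handled.
    static String domainOf(String host) {
        String lower = host.toLowerCase(Locale.ROOT);
        String[] labels = lower.split("\\.");
        if (labels.length <= 2) {
            return lower;
        }
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }

    // Collapses host names into the sorted set of distinct domain names.
    static Set<String> fromHosts(Collection<String> hosts) {
        Set<String> domains = new TreeSet<>();
        for (String host : hosts) {
            domains.add(domainOf(host));
        }
        return domains;
    }
}
```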
-
getObjectCount
Long getObjectCount(String domainName) throws ArgumentNotValid
Get the number of objects found for the given domain.
- Parameters:
domainName - a domain name, as returned by getDomainNames()
- Returns:
the number of objects collected for that domain
- Throws:
ArgumentNotValid - if domainName is null or empty
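A sketch of the documented argument check, assuming a simple map-backed store. ArgumentNotValid is redeclared here as a hypothetical unchecked stand-in so the example compiles on its own, ObjectCounts is an invented class, and returning null for an unknown domain is an assumption, since this page does not say what happens in that case.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the ArgumentNotValid exception named in the Javadoc.
class ArgumentNotValid extends RuntimeException {
    ArgumentNotValid(String message) {
        super(message);
    }
}

class ObjectCounts {
    private final Map<String, Long> objectsByDomain = new HashMap<>();

    void put(String domain, long objects) {
        objectsByDomain.put(domain, objects);
    }

    // Mirrors the documented contract: reject a null or empty domain name
    // before looking anything up. Unknown domains yield null (assumption).
    Long getObjectCount(String domainName) {
        if (domainName == null || domainName.isEmpty()) {
            throw new ArgumentNotValid("domainName must not be null or empty");
        }
        return objectsByDomain.get(domainName);
    }
}
```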
-
getByteCount
Long getByteCount(String domainName) throws ArgumentNotValid
Get the number of bytes downloaded for the given domain.
- Parameters:
domainName - a domain name, as returned by getDomainNames()
- Returns:
the number of bytes collected for that domain
- Throws:
ArgumentNotValid - if domainName is null or empty
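As a usage illustration, byte counts can be totalled over every domain returned by getDomainNames(). ByteCountView is a hypothetical two-method subset of this interface, MapByteCountView is an invented map-backed implementation for testing, and treating a null count as zero is an assumption rather than documented behaviour.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical minimal view of the two HarvestReport methods this example needs.
interface ByteCountView {
    Set<String> getDomainNames();
    Long getByteCount(String domainName);
}

// Invented map-backed implementation, used only to exercise the example.
class MapByteCountView implements ByteCountView {
    private final Map<String, Long> bytes;

    MapByteCountView(Map<String, Long> bytes) {
        this.bytes = bytes;
    }

    public Set<String> getDomainNames() {
        return bytes.keySet();
    }

    public Long getByteCount(String domainName) {
        return bytes.get(domainName);
    }
}

class ByteTotals {
    // Sums getByteCount over every domain; a null count is treated as zero (assumption).
    static long totalBytes(ByteCountView report) {
        long total = 0L;
        for (String domain : report.getDomainNames()) {
            Long bytes = report.getByteCount(domain);
            if (bytes != null) {
                total += bytes;
            }
        }
        return total;
    }
}
```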
-
getStopReason
StopReason getStopReason(String domainName) throws ArgumentNotValid
Get the StopReason for the given domain.
- Parameters:
domainName - a domain name, as returned by getDomainNames()
- Returns:
the StopReason for the given domain
- Throws:
ArgumentNotValid - if domainName is null or empty
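Per-domain stop reasons can be summarised by grouping the domains that stopped for the same reason. This sketch uses a hypothetical StopReasonView subset of the interface, an invented map-backed implementation, and a stand-in StopReason enum with illustrative values. It assumes getStopReason never returns null, because EnumMap rejects null keys.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the real StopReason type; values are illustrative only.
enum StopReason { DOWNLOAD_COMPLETE, OBJECT_LIMIT, SIZE_LIMIT }

// Hypothetical minimal view of the two HarvestReport methods this example needs.
interface StopReasonView {
    Set<String> getDomainNames();
    StopReason getStopReason(String domainName);
}

// Invented map-backed implementation, used only to exercise the example.
class MapStopReasonView implements StopReasonView {
    private final Map<String, StopReason> reasons;

    MapStopReasonView(Map<String, StopReason> reasons) {
        this.reasons = reasons;
    }

    public Set<String> getDomainNames() {
        return reasons.keySet();
    }

    public StopReason getStopReason(String domainName) {
        return reasons.get(domainName);
    }
}

class StopReasonSummary {
    // Groups domain names by their stop reason; EnumMap iterates in enum declaration order.
    static Map<StopReason, Set<String>> byReason(StopReasonView report) {
        Map<StopReason, Set<String>> groups = new EnumMap<>(StopReason.class);
        for (String domain : report.getDomainNames()) {
            groups.computeIfAbsent(report.getStopReason(domain), r -> new TreeSet<>()).add(domain);
        }
        return groups;
    }
}
```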
-
postProcess
void postProcess(Job job)
Post-processing happens on the scheduler side when ARC files have been uploaded.