Class HarvestReportGenerator
java.lang.Object
    dk.netarkivet.harvester.heritrix3.report.HarvestReportGenerator
public class HarvestReportGenerator extends Object
Base implementation for a harvest report.
-
Nested Class Summary
Nested Classes
static class HarvestReportGenerator.ProgressStatisticsConstants
Strings found in progress-statistics.log, used to derive the default stop reason for domains.
-
Constructor Summary
Constructors
HarvestReportGenerator()
Default constructor that does nothing.
HarvestReportGenerator(Heritrix3Files files)
Constructor from Heritrix report files.
-
Method Summary
Methods
static StopReason findDefaultStopReason(File logFile)
Determines from the progress statistics log whether the crawl stopped normally.
StopReason getDefaultStopReason()
Map<String,DomainStats> getDomainStatsMap()
static DomainStatsReport getDomainStatsReport(Heritrix3Files files)
protected DomainStats getOrCreateDomainStats(String domainName)
Attempts to get an already existing DomainStats object for that domain; if none is found, creates one with zero values.
void preProcess(Heritrix3Files files)
Pre-processing happens when the report is built, just at the end of the crawl and before the ARC files are uploaded.
-
Constructor Detail
-
HarvestReportGenerator
public HarvestReportGenerator()
Default constructor that does nothing. The real construction is supposed to be done in the subclasses by filling out the domainStats map with crawl results.
-
HarvestReportGenerator
public HarvestReportGenerator(Heritrix3Files files)
Constructor from Heritrix report files. Subclasses might use a different set of Heritrix reports.
Parameters:
files - the set of Heritrix reports.
-
-
Method Detail
-
preProcess
public void preProcess(Heritrix3Files files)
Pre-processing happens when the report is built, just at the end of the crawl and before the ARC files are uploaded.
-
getOrCreateDomainStats
protected DomainStats getOrCreateDomainStats(String domainName)
Attempts to get an already existing DomainStats object for that domain; if none is found, creates one with zero values.
Parameters:
domainName - the name of the domain to get DomainStats for.
Returns:
a DomainStats object for the given domain name.
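The get-or-create behaviour described for getOrCreateDomainStats can be illustrated with a minimal sketch. It is not the NetarchiveSuite source: the Stats class is a hypothetical stand-in for DomainStats (whose fields and constructor are not shown on this page), and the objectCount and byteCount fields, the DomainStatsSketch class and its method names are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; not the NetarchiveSuite implementation. */
final class DomainStatsSketch {

    /** Hypothetical stand-in for DomainStats; the real fields are not documented here. */
    static final class Stats {
        long objectCount = 0;
        long byteCount = 0;
    }

    /** Per-domain statistics, keyed by domain name. */
    private final Map<String, Stats> domainStats = new HashMap<>();

    /** Returns the existing entry for the domain, or registers a zero-valued one. */
    Stats getOrCreate(String domainName) {
        return domainStats.computeIfAbsent(domainName, name -> new Stats());
    }

    /** Number of domains seen so far. */
    int size() {
        return domainStats.size();
    }
}
```

With this shape, repeated calls for the same domain name return the same object, so counters can be updated in place while a log is being parsed.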
-
findDefaultStopReason
public static StopReason findDefaultStopReason(File logFile)
Determines from the progress statistics log whether the crawl stopped normally.
Parameters:
logFile - a progress-statistics.log file.
Returns:
StopReason.DOWNLOAD_COMPLETE if the progress statistics end with CRAWL ENDED; StopReason.DOWNLOAD_UNFINISHED otherwise, or if the file does not exist.
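The documented contract of findDefaultStopReason can be approximated by the sketch below. It is an illustration under assumptions, not the actual implementation: the Reason enum is a hypothetical stand-in holding only the two StopReason values named above, "ending with CRAWL ENDED" is interpreted as "the last non-blank line contains CRAWL ENDED", and an unreadable file is treated like a missing one.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Illustrative sketch only; not the NetarchiveSuite implementation. */
final class StopReasonSketch {

    /** Hypothetical stand-in with only the two StopReason values named above. */
    enum Reason { DOWNLOAD_COMPLETE, DOWNLOAD_UNFINISHED }

    /** Marker text taken from the method description. */
    static final String CRAWL_ENDED = "CRAWL ENDED";

    static Reason findDefault(Path logFile) {
        // A missing log means the crawl cannot be confirmed as finished.
        if (logFile == null || !Files.isRegularFile(logFile)) {
            return Reason.DOWNLOAD_UNFINISHED;
        }
        List<String> lines;
        try {
            lines = Files.readAllLines(logFile, StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Assumption: an unreadable log is treated like a missing one.
            return Reason.DOWNLOAD_UNFINISHED;
        }
        // Walk backwards to the last non-blank line and test it for the marker.
        for (int i = lines.size() - 1; i >= 0; i--) {
            String line = lines.get(i).trim();
            if (!line.isEmpty()) {
                return line.contains(CRAWL_ENDED)
                        ? Reason.DOWNLOAD_COMPLETE
                        : Reason.DOWNLOAD_UNFINISHED;
            }
        }
        // An empty log carries no evidence of a normal stop.
        return Reason.DOWNLOAD_UNFINISHED;
    }
}
```

Defaulting to DOWNLOAD_UNFINISHED on every uncertain path mirrors the documented behaviour for a missing file: completion is only reported when the log positively confirms it.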
-
getDefaultStopReason
public StopReason getDefaultStopReason()
Returns:
the default stop reason.
-
getDomainStatsMap
public Map<String,DomainStats> getDomainStatsMap()
Returns:
the domainStatsMap generated from parsing the crawl-log.
-
getDomainStatsReport
public static DomainStatsReport getDomainStatsReport(Heritrix3Files files)
Parameters:
files - a set of Heritrix3 files used to produce a HarvestReport.
Returns:
a DomainStatsReport for a specific H3 crawl.