Class AbstractHarvestReport

    • Method Summary

      Modifier and Type Method Description
      java.lang.Long getByteCount(java.lang.String domainName)
      Get the number of bytes downloaded for the given domain.
      StopReason getDefaultStopReason()
      Returns the default stop reason initially assigned to every domain.
      java.util.Set<java.lang.String> getDomainNames()
      Returns the set of domain names that are contained in hosts-report.txt (i.e. host names mapped to domains).
      java.lang.Long getObjectCount(java.lang.String domainName)
      Get the number of objects found for the given domain.
      protected DomainStats getOrCreateDomainStats(java.lang.String domainName)
      Attempts to get an already existing DomainStats object for that domain, and if not found creates one with zero values.
      StopReason getStopReason(java.lang.String domainName)
      Get the StopReason for the given domain.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • AbstractHarvestReport

        public AbstractHarvestReport()
        Default constructor that does nothing. Subclasses are expected to do the actual construction by filling the domainStats map with crawl results.
      • AbstractHarvestReport

        public AbstractHarvestReport(DomainStatsReport dsr)
        Constructs a harvest report from a DomainStatsReport.
        Parameters:
        dsr - the result of parsing the crawl.log for domain statistics
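        The note on the default constructor says that subclasses fill the domainStats map with crawl results. The following is a minimal sketch of that pattern only. SketchBaseReport, SketchCrawlReport, the long[] value type and the "domain objects bytes" input format are all hypothetical stand-ins for illustration, not the actual NetarchiveSuite classes or the crawl.log format.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the abstract base class.
abstract class SketchBaseReport {
    // Assumed shape of the domainStats map: domain name -> {objects, bytes}.
    protected final Map<String, long[]> domainStats = new HashMap<>();

    public final Set<String> getDomainNames() {
        return domainStats.keySet();
    }

    // Returns null when the domain is unknown, mirroring the documented contract.
    public final Long getObjectCount(String domainName) {
        long[] stats = domainStats.get(domainName);
        return stats == null ? null : Long.valueOf(stats[0]);
    }
}

// Hypothetical subclass that does the "real construction".
class SketchCrawlReport extends SketchBaseReport {
    // Assumed input: one "domain objects bytes" record per line.
    SketchCrawlReport(String[] lines) {
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            long[] stats = domainStats.computeIfAbsent(parts[0], d -> new long[2]);
            stats[0] += Long.parseLong(parts[1]); // accumulate object count
            stats[1] += Long.parseLong(parts[2]); // accumulate byte count
        }
    }
}
```

        The base class keeps its accessors final and only exposes the map to subclasses, so all population logic lives in the subclass constructor.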
    • Method Detail

      • getDomainNames

        public final java.util.Set<java.lang.String> getDomainNames()
        Returns the set of domain names that are contained in hosts-report.txt (i.e. host names mapped to domains)
        Specified by:
        getDomainNames in interface HarvestReport
        Returns:
        a Set of Strings
      • getObjectCount

        public final java.lang.Long getObjectCount(java.lang.String domainName)
        Get the number of objects found for the given domain.
        Specified by:
        getObjectCount in interface HarvestReport
        Parameters:
        domainName - A domain name (as given by getDomainNames())
        Returns:
        How many objects were collected for that domain, or null if none were found
      • getByteCount

        public final java.lang.Long getByteCount(java.lang.String domainName)
        Get the number of bytes downloaded for the given domain.
        Specified by:
        getByteCount in interface HarvestReport
        Parameters:
        domainName - A domain name (as given by getDomainNames())
        Returns:
        How many bytes were collected for that domain, or null if no information is available for this domain
      • getStopReason

        public final StopReason getStopReason(java.lang.String domainName)
        Get the StopReason for the given domain.
        Specified by:
        getStopReason in interface HarvestReport
        Parameters:
        domainName - A domain name (as given by getDomainNames())
        Returns:
        the StopReason for the given domain, or null if no StopReason was found for this domain
        Throws:
        ArgumentNotValid - if domainName is null or empty
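        The count accessors above return a boxed Long that may be null for a domain with no data, so callers should check before unboxing. The following is a minimal sketch of null-safe use. SketchReport is a hypothetical stand-in that exposes only getDomainNames() and getByteCount(String), and NullSafeTotals is an illustrative helper, not part of the documented API.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in exposing two of the accessors described above.
interface SketchReport {
    Set<String> getDomainNames();
    Long getByteCount(String domainName);
}

class NullSafeTotals {
    // Sums byte counts over all domains, skipping domains with no data.
    static long totalBytes(SketchReport report) {
        long total = 0L;
        for (String domain : report.getDomainNames()) {
            Long bytes = report.getByteCount(domain);
            if (bytes != null) { // unboxing a null Long would throw NullPointerException
                total += bytes;
            }
        }
        return total;
    }

    // Treats "no information" as zero bytes for a single domain.
    static long bytesOrZero(SketchReport report, String domain) {
        Long bytes = report.getByteCount(domain);
        return bytes == null ? 0L : bytes;
    }

    // Builds a stand-in report backed by a plain map, for demonstration only.
    static SketchReport fromMap(final Map<String, Long> counts) {
        return new SketchReport() {
            public Set<String> getDomainNames() { return counts.keySet(); }
            public Long getByteCount(String domainName) { return counts.get(domainName); }
        };
    }
}
```

        Whether a missing count should be treated as zero or reported as an error depends on the caller; the sketch picks zero only to keep the example short.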
      • getOrCreateDomainStats

        protected DomainStats getOrCreateDomainStats(java.lang.String domainName)
        Attempts to get an already existing DomainStats object for that domain, and if not found creates one with zero values.
        Parameters:
        domainName - the name of the domain to get DomainStats for.
        Returns:
        a DomainStats object for the given domain-name.
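        The get-or-create behaviour described for getOrCreateDomainStats can be sketched as follows. This illustrates the pattern only and is not the actual implementation: SketchDomainStats and GetOrCreateSketch are hypothetical stand-ins, and the HashMap backing store is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for DomainStats, starting with zero values.
class SketchDomainStats {
    long objectCount = 0L;
    long byteCount = 0L;
}

class GetOrCreateSketch {
    // Assumed backing map, keyed by domain name.
    private final Map<String, SketchDomainStats> domainStats = new HashMap<>();

    // Returns the existing entry, or creates and stores a zeroed one.
    SketchDomainStats getOrCreateDomainStats(String domainName) {
        return domainStats.computeIfAbsent(domainName, name -> new SketchDomainStats());
    }

    int size() {
        return domainStats.size();
    }

    public static void main(String[] args) {
        GetOrCreateSketch report = new GetOrCreateSketch();
        SketchDomainStats first = report.getOrCreateDomainStats("example.org");
        first.byteCount += 1024;
        // The second lookup returns the same object, so the update is visible.
        SketchDomainStats second = report.getOrCreateDomainStats("example.org");
        System.out.println(first == second);
        System.out.println(second.byteCount);
    }
}
```

        Because the created object is stored before it is returned, repeated calls for the same domain accumulate into one entry instead of creating duplicates.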