Class PartialHarvest

  • All Implemented Interfaces:
    Named

    public class PartialHarvest
    extends HarvestDefinition
    This class contains the specific properties and operations of harvest definitions that are not snapshot harvest definitions. That is, it models definitions of event and selective harvests.
    • Constructor Detail

      • PartialHarvest

        public PartialHarvest(List<DomainConfiguration> domainConfigurations,
                              Schedule schedule,
                              String harvestDefName,
                              String comments,
                              String audience)
        Create a new instance of a PartialHarvest configured with the supplied domain configurations, schedule, name, comments, and audience.
        Parameters:
        domainConfigurations - a list of domain configurations
        schedule - the harvest definition schedule
        harvestDefName - the name of the harvest definition
        comments - comments
        audience - The intended audience for this harvest (may be null)
    • Method Detail

      • getSchedule

        public Schedule getSchedule()
        Returns the schedule defined for this harvest definition.
        Returns:
        schedule
      • setSchedule

        public void setSchedule(Schedule schedule)
        Set the schedule to be used for this harvest definition.
        Parameters:
        schedule - A schedule for when to try harvesting.
      • getNextDate

        public Date getNextDate()
        Get the next date this harvest definition should be run.
        Returns:
        The next date the harvest definition should be run or null, if the harvest definition should never run again.
      • setNextDate

        public void setNextDate(Date nextDate)
        Set the next date this harvest definition should be run.
        Parameters:
        nextDate - The next date the harvest definition should be run. May be null, meaning never again.
      • removeDomainConfiguration

        public void removeDomainConfiguration(SparseDomainConfiguration dcKey)
        Remove a domain configuration from this PartialHarvest.
        Parameters:
        dcKey - the key identifying the domain configuration to remove
      • addDomainConfiguration

        public void addDomainConfiguration(DomainConfiguration newConfiguration)
        Add a new domain configuration to this PartialHarvest.
        Parameters:
        newConfiguration - A new DomainConfiguration
      • getDomainConfigurationsAsList

        public Collection<DomainConfiguration> getDomainConfigurationsAsList()
        Returns:
        the domain configurations as a list
      • setDomainConfigurations

        public void setDomainConfigurations(List<DomainConfiguration> configs)
        Set the list of configurations that this PartialHarvest uses.
        Parameters:
        configs - The list of configurations that this harvest definition will use.
      • reset

        public void reset()
        Reset the harvest definition so that no harvests have been run and the next date is the first one possible for the schedule.
      • runNow

        public boolean runNow(Date now)
        Check if this harvest definition should be run, given the time now.
        Specified by:
        runNow in class HarvestDefinition
        Parameters:
        now - The current time
        Returns:
        true if harvest definition should be run
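        The relationship between getNextDate, setNextDate, and runNow can be illustrated with a minimal sketch. The class NextDateSketch below is hypothetical and is not the NetarchiveSuite implementation. It assumes that a harvest is due when a next date is set and that date is not after the supplied time; the real method may apply further checks that are not described on this page.

```java
import java.util.Date;

// Hypothetical stand-in used only for illustration; not part of NetarchiveSuite.
final class NextDateSketch {
    // null means the harvest should never run again, as documented for getNextDate.
    private Date nextDate;

    void setNextDate(Date nextDate) {
        this.nextDate = nextDate;
    }

    Date getNextDate() {
        return nextDate;
    }

    // Assumption: the harvest is due once the next date has been reached.
    boolean runNow(Date now) {
        if (now == null) {
            throw new IllegalArgumentException("now must not be null");
        }
        return nextDate != null && !now.before(nextDate);
    }

    public static void main(String[] args) {
        NextDateSketch harvest = new NextDateSketch();
        Date now = new Date();
        System.out.println(harvest.runNow(now)); // false: no next date is set
        harvest.setNextDate(new Date(now.getTime() - 1000L));
        System.out.println(harvest.runNow(now)); // true: the next date has passed
    }
}
```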
      • isSnapShot

        public boolean isSnapShot()
        Returns whether this HarvestDefinition represents a snapshot harvest.
        Specified by:
        isSnapShot in class HarvestDefinition
        Returns:
        false (always)
      • getMaxBytes

        public long getMaxBytes()
        Always returns no limit.
        Specified by:
        getMaxBytes in class HarvestDefinition
        Returns:
        -1, meaning no limit.
      • addSeeds

        public Set<String> addSeeds(Set<String> seeds,
                                    String templateName,
                                    long maxBytes,
                                    int maxObjects,
                                    Map<String,String> attributeValues)
        Takes a set of seeds and creates any domains, configurations, and seed lists needed to harvest them with the given template and other parameters. JIRA issue NAS-1317 addresses this. The seed lists and domain configurations are currently named using one of the following patterns:
        harvestdefinitionname + "_" + templateName + "_" + "UnlimitedBytes" (if maxBytes is negative)
        harvestdefinitionname + "_" + templateName + "_" + maxBytes + "Bytes" (if maxBytes is zero or positive).
        Parameters:
        seeds - a set of the seeds to be added
        templateName - the name of the template to be used
        maxBytes - Maximum number of bytes to harvest per domain
        maxObjects - Maximum number of objects to harvest per domain
        attributeValues - Attributes read from webpage
        Returns:
        the set of invalid seeds found during this process.
        See Also:
        for details
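        The naming rule described for addSeeds can be sketched as a small helper. The class SeedListNaming and the method nameFor are hypothetical names introduced here for illustration. They only mirror the two documented patterns and are not taken from the NetarchiveSuite source.

```java
// Hypothetical helper used only for illustration; not part of NetarchiveSuite.
final class SeedListNaming {
    private SeedListNaming() {
    }

    // Builds a name following the two patterns documented for addSeeds.
    static String nameFor(String harvestDefName, String templateName, long maxBytes) {
        String prefix = harvestDefName + "_" + templateName + "_";
        // A negative byte limit is documented as meaning "no limit".
        if (maxBytes < 0) {
            return prefix + "UnlimitedBytes";
        }
        return prefix + maxBytes + "Bytes";
    }

    public static void main(String[] args) {
        // prints "news_default_UnlimitedBytes"
        System.out.println(nameFor("news", "default", -1L));
        // prints "news_default_500000Bytes"
        System.out.println(nameFor("news", "default", 500000L));
    }
}
```

        Because the name depends only on the harvest definition name, the template name, and the byte limit, two calls that share those three values would produce the same name under this rule.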
      • addSeedsFromFile

        public Set<String> addSeedsFromFile(File seedsFile,
                                            String templateName,
                                            long maxBytes,
                                            int maxObjects,
                                            Map<String,String> attributeValues)
        This method duplicates the addSeeds method, except that the seeds are supplied through the seedsFile parameter.
        Parameters:
        seedsFile - a newline-separated File containing the seeds to be added
        templateName - the name of the template to be used
        maxBytes - Maximum number of bytes to harvest per domain
        maxObjects - Maximum number of objects to harvest per domain