Class PartialHarvest
- java.lang.Object
-
- dk.netarkivet.harvester.datamodel.extendedfield.ExtendableEntity
-
- dk.netarkivet.harvester.datamodel.HarvestDefinition
-
- dk.netarkivet.harvester.datamodel.PartialHarvest
-
- All Implemented Interfaces:
Named
public class PartialHarvest extends HarvestDefinition
This class contains the specific properties and operations of harvest definitions that are not snapshot harvest definitions. That is, this class models the definitions of event harvests and selective harvests.
-
-
Field Summary
-
Fields inherited from class dk.netarkivet.harvester.datamodel.HarvestDefinition
audience, channelId, comments, edition, harvestDefName, isActive, numEvents, oid, submissionDate
-
Fields inherited from class dk.netarkivet.harvester.datamodel.extendedfield.ExtendableEntity
extendedFieldValues
-
-
Constructor Summary
Constructor Description
PartialHarvest(List<DomainConfiguration> domainConfigurations, Schedule schedule, String harvestDefName, String comments, String audience)
Create a new instance of a PartialHarvest configured according to the properties of the supplied domain configurations.
-
Method Summary
Modifier and Type Method Description
void addDomainConfiguration(DomainConfiguration newConfiguration)
Add a new domain configuration to this PartialHarvest.
Set<String> addSeeds(Set<String> seeds, String templateName, long maxBytes, int maxObjects, Map<String,String> attributeValues)
Takes a seed list and creates any necessary domains, configurations, and seed lists to enable them to be harvested with the given template and other parameters.
Set<String> addSeedsFromFile(File seedsFile, String templateName, long maxBytes, int maxObjects, Map<String,String> attributeValues)
This method duplicates addSeeds, except that the seeds are read from the seedsFile parameter.
Iterator<DomainConfiguration> getDomainConfigurations()
Returns an iterator over the domain configurations for this harvest definition.
Collection<DomainConfiguration> getDomainConfigurationsAsList()
long getMaxBytes()
Always returns no limit.
long getMaxCountObjects()
Always returns no limit.
Date getNextDate()
Get the next date this harvest definition should be run.
Schedule getSchedule()
Returns the schedule defined for this harvest definition.
boolean isSnapShot()
Returns whether this HarvestDefinition represents a snapshot harvest.
void removeDomainConfiguration(SparseDomainConfiguration dcKey)
Remove a domain configuration from this PartialHarvest.
void reset()
Reset the harvest definition to no harvests and next date being the first possible for the schedule.
boolean runNow(Date now)
Check if this harvest definition should be run, given the time now.
void setDomainConfigurations(List<DomainConfiguration> configs)
Set the list of configurations that this PartialHarvest uses.
void setNextDate(Date nextDate)
Set the next date this harvest definition should be run.
void setSchedule(Schedule schedule)
Set the schedule to be used for this harvest definition.
-
Methods inherited from class dk.netarkivet.harvester.datamodel.HarvestDefinition
createFullHarvest, createPartialHarvest, equals, getActive, getAudience, getChannelId, getComments, getEdition, getExtendedFieldType, getName, getNumEvents, getOid, getSubmissionDate, hashCode, setActive, setAudience, setChannelId, setComments, setEdition, setName, setNumEvents, setOid, setSubmissionDate, toString
-
Methods inherited from class dk.netarkivet.harvester.datamodel.extendedfield.ExtendableEntity
addExtendedFieldValue, addExtendedFieldValues, getExtendedFieldValue, getExtendedFieldValues, setExtendedFieldValues, updateExtendedFieldValue
-
-
-
-
Constructor Detail
-
PartialHarvest
public PartialHarvest(List<DomainConfiguration> domainConfigurations, Schedule schedule, String harvestDefName, String comments, String audience)
Create a new instance of a PartialHarvest configured according to the properties of the supplied domain configurations.
- Parameters:
domainConfigurations - a list of domain configurations
schedule - the harvest definition schedule
harvestDefName - the name of the harvest definition
comments - comments
audience - the intended audience for this harvest (may be null)
-
-
Method Detail
-
getSchedule
public Schedule getSchedule()
Returns the schedule defined for this harvest definition.
- Returns:
- the schedule
-
setSchedule
public void setSchedule(Schedule schedule)
Set the schedule to be used for this harvest definition.
- Parameters:
schedule - A schedule for when to try harvesting.
-
getNextDate
public Date getNextDate()
Get the next date this harvest definition should be run.
- Returns:
- The next date the harvest definition should be run, or null if the harvest definition should never run again.
-
setNextDate
public void setNextDate(Date nextDate)
Set the next date this harvest definition should be run.
- Parameters:
nextDate - The next date the harvest definition should be run. May be null, meaning never again.
-
removeDomainConfiguration
public void removeDomainConfiguration(SparseDomainConfiguration dcKey)
Remove a domain configuration from this PartialHarvest.
- Parameters:
dcKey - the domain configuration key
-
addDomainConfiguration
public void addDomainConfiguration(DomainConfiguration newConfiguration)
Add a new domain configuration to this PartialHarvest.
- Parameters:
newConfiguration - A new DomainConfiguration
-
getDomainConfigurations
public Iterator<DomainConfiguration> getDomainConfigurations()
Returns an iterator over the domain configurations for this harvest definition.
- Specified by:
getDomainConfigurations in class HarvestDefinition
- Returns:
- an iterator over the domain configurations
-
getDomainConfigurationsAsList
public Collection<DomainConfiguration> getDomainConfigurationsAsList()
- Returns:
- the domain configurations as a list
-
setDomainConfigurations
public void setDomainConfigurations(List<DomainConfiguration> configs)
Set the list of configurations that this PartialHarvest uses.
- Parameters:
configs - the list of configurations that this harvest definition will use.
-
reset
public void reset()
Reset the harvest definition to no harvests and next date being the first possible for the schedule.
-
runNow
public boolean runNow(Date now)
Check if this harvest definition should be run, given the time now.
- Specified by:
runNow in class HarvestDefinition
- Parameters:
now - The current time
- Returns:
- true if the harvest definition should be run
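The relationship between getNextDate and runNow can be illustrated with a minimal sketch. It assumes that a harvest is due once the current time has reached the next date, and that a null next date means "never again", as described for getNextDate. The class RunNowSketch and the method isDue are hypothetical names used only for illustration; the actual runNow implementation may apply further conditions (for example whether the definition is active) that this page does not describe.

```java
import java.util.Date;

public class RunNowSketch {

    /**
     * Hypothetical helper, not part of PartialHarvest.
     * Assumption: the harvest is due when a next date exists
     * and the current time is not before it.
     */
    static boolean isDue(Date nextDate, Date now) {
        // A null next date means the definition should never run again.
        return nextDate != null && !now.before(nextDate);
    }

    public static void main(String[] args) {
        Date now = new Date();
        Date oneHourAgo = new Date(now.getTime() - 3_600_000L);
        Date inOneHour = new Date(now.getTime() + 3_600_000L);

        System.out.println(isDue(oneHourAgo, now)); // next date has passed
        System.out.println(isDue(inOneHour, now));  // next date is in the future
        System.out.println(isDue(null, now));       // never again
    }
}
```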
-
isSnapShot
public boolean isSnapShot()
Returns whether this HarvestDefinition represents a snapshot harvest.
- Specified by:
isSnapShot in class HarvestDefinition
- Returns:
- false (always)
-
getMaxCountObjects
public long getMaxCountObjects()
Always returns no limit.
- Specified by:
getMaxCountObjects in class HarvestDefinition
- Returns:
- 0, meaning no limit.
-
getMaxBytes
public long getMaxBytes()
Always returns no limit.
- Specified by:
getMaxBytes in class HarvestDefinition
- Returns:
- -1, meaning no limit.
-
addSeeds
public Set<String> addSeeds(Set<String> seeds, String templateName, long maxBytes, int maxObjects, Map<String,String> attributeValues)
Takes a seed list and creates any necessary domains, configurations, and seed lists to enable them to be harvested with the given template and other parameters. JIRA issue NAS-1317 addresses this issue. The seed lists and domain configurations are currently named using one of:
harvestdefinitionname + "_" + templateName + "_" + "UnlimitedBytes" (if maxBytes is negative)
harvestdefinitionname + "_" + templateName + "_" + maxBytes + "Bytes" (if maxBytes is zero or positive).
- Parameters:
seeds - the set of seeds to be added
templateName - the name of the template to be used
maxBytes - Maximum number of bytes to harvest per domain
maxObjects - Maximum number of objects to harvest per domain
attributeValues - Attributes read from webpage
- Returns:
- the set of invalid seeds found during this process.
- See Also:
for details
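The naming rule described for addSeeds can be sketched as a small standalone helper. The class SeedListNaming and the method configName are hypothetical names introduced for illustration; they are not part of PartialHarvest, and the sample harvest and template names are made up.

```java
public class SeedListNaming {

    /**
     * Hypothetical helper that builds a name following the pattern
     * documented for addSeeds:
     *   harvestDefName_templateName_UnlimitedBytes  (maxBytes negative)
     *   harvestDefName_templateName_<maxBytes>Bytes (maxBytes zero or positive)
     */
    static String configName(String harvestDefName, String templateName, long maxBytes) {
        String limit = (maxBytes < 0) ? "UnlimitedBytes" : maxBytes + "Bytes";
        return harvestDefName + "_" + templateName + "_" + limit;
    }

    public static void main(String[] args) {
        // prints "news-event_default_orderxml_UnlimitedBytes"
        System.out.println(configName("news-event", "default_orderxml", -1L));
        // prints "news-event_default_orderxml_500000Bytes"
        System.out.println(configName("news-event", "default_orderxml", 500000L));
    }
}
```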
-
addSeedsFromFile
public Set<String> addSeedsFromFile(File seedsFile, String templateName, long maxBytes, int maxObjects, Map<String,String> attributeValues)
This method duplicates the addSeeds method, except that the seeds are read from the seedsFile parameter.
- Parameters:
seedsFile - a newline-separated File containing the seeds to be added
templateName - the name of the template to be used
maxBytes - Maximum number of bytes to harvest per domain
maxObjects - Maximum number of objects to harvest per domain
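As an illustration of the expected input, the following minimal sketch reads a newline-separated seeds file into a Set<String>, which is the form that addSeeds accepts. The class SeedFileSketch and the method readSeeds are hypothetical; skipping blank lines and trimming whitespace are assumptions, and the way addSeedsFromFile actually parses the file is not described on this page.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class SeedFileSketch {

    /**
     * Hypothetical helper: reads one seed per line.
     * Assumption: surrounding whitespace is trimmed and blank lines are ignored.
     */
    static Set<String> readSeeds(Path seedsFile) throws IOException {
        Set<String> seeds = new LinkedHashSet<>(); // keeps file order, drops duplicates
        for (String line : Files.readAllLines(seedsFile)) {
            String seed = line.trim();
            if (!seed.isEmpty()) {
                seeds.add(seed);
            }
        }
        return seeds;
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample file so the sketch runs on its own.
        Path file = Files.createTempFile("seeds", ".txt");
        Files.write(file, Arrays.asList(
                "http://www.example.com/", "", "http://www.example.org/news"));
        System.out.println(readSeeds(file));
        Files.delete(file);
    }
}
```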
-
-