Class FullHarvest
java.lang.Object
  dk.netarkivet.harvester.datamodel.extendedfield.ExtendableEntity
    dk.netarkivet.harvester.datamodel.HarvestDefinition
      dk.netarkivet.harvester.datamodel.FullHarvest
All Implemented Interfaces:
Named
public class FullHarvest extends HarvestDefinition
This class contains the specific properties and operations of snapshot harvest definitions.
-
-
Field Summary
-
Fields inherited from class dk.netarkivet.harvester.datamodel.HarvestDefinition
audience, channelId, comments, edition, harvestDefName, isActive, numEvents, oid, submissionDate
-
Fields inherited from class dk.netarkivet.harvester.datamodel.extendedfield.ExtendableEntity
extendedFieldValues
-
-
Constructor Summary
FullHarvest(String harvestDefName, String comments, Long previousHarvestDefinitionOid, long maxCountObjects, long maxBytes, long maxJobRunningTime, boolean isIndexReady, javax.inject.Provider<HarvestDefinitionDAO> hdDaoProvider, javax.inject.Provider<JobDAO> jobDaoProvider, javax.inject.Provider<ExtendedFieldDAO> extendedFieldDAOProvide, javax.inject.Provider<DomainDAO> domainDAOProvider)
Create a new instance of FullHarvest configured with the supplied properties.
-
Method Summary
Iterator<DomainConfiguration> getDomainConfigurations()
Returns an iterator of domain configurations for this harvest definition.
Iterator<DomainConfiguration> getDomainConfigurationsForIterativeHarvest()
boolean getIndexReady()
Returns whether the index is ready.
long getMaxBytes()
Get the maximum number of bytes that this fullharvest will harvest per domain, 0 for no limit.
long getMaxCountObjects()
Returns how many objects to harvest per domain, or 0 for no limit.
long getMaxJobRunningTime()
HarvestDefinition getPreviousHarvestDefinition()
Get the previous HarvestDefinition on which this one is based.
boolean isSnapShot()
Returns whether this HarvestDefinition represents a snapshot harvest.
boolean runNow(Date now)
Check if this harvest definition should be run, given the time now.
void setIndexReady(boolean isIndexReady)
Set the indexReady field.
void setMaxBytes(long maxBytes)
Set the limit for how many bytes this fullharvest will harvest per domain, or -1 for no limit.
void setMaxCountObjects(long maxCountObjects)
void setMaxJobRunningTime(long maxJobRunningtime)
Set the limit for how many seconds each crawljob in this fullharvest will run, or 0 for no limit.
void setPreviousHarvestDefinition(Long prev)
Set the previous HarvestDefinition on which this one is based.
-
Methods inherited from class dk.netarkivet.harvester.datamodel.HarvestDefinition
createFullHarvest, createPartialHarvest, equals, getActive, getAudience, getChannelId, getComments, getEdition, getExtendedFieldType, getName, getNumEvents, getOid, getSubmissionDate, hashCode, setActive, setAudience, setChannelId, setComments, setEdition, setName, setNumEvents, setOid, setSubmissionDate, toString
-
Methods inherited from class dk.netarkivet.harvester.datamodel.extendedfield.ExtendableEntity
addExtendedFieldValue, addExtendedFieldValues, getExtendedFieldValue, getExtendedFieldValues, setExtendedFieldValues, updateExtendedFieldValue
-
-
-
-
Constructor Detail
-
FullHarvest
public FullHarvest(String harvestDefName, String comments, Long previousHarvestDefinitionOid, long maxCountObjects, long maxBytes, long maxJobRunningTime, boolean isIndexReady, javax.inject.Provider<HarvestDefinitionDAO> hdDaoProvider, javax.inject.Provider<JobDAO> jobDaoProvider, javax.inject.Provider<ExtendedFieldDAO> extendedFieldDAOProvide, javax.inject.Provider<DomainDAO> domainDAOProvider)
Create a new instance of FullHarvest configured with the supplied properties. Should only be used by the HarvestFactory class.
- Parameters:
harvestDefName
- the name of the harvest definition
comments
- comments
previousHarvestDefinitionOid
- the OID of the harvest definition on which this FullHarvest definition is based
maxCountObjects
- Limit for how many objects can be fetched per domain
maxBytes
- Limit for how many bytes can be fetched per domain
maxJobRunningTime
- Limit on how much time can be spent on each job. 0 means no limit
isIndexReady
- Is the deduplication index ready for this harvest.
-
-
Method Detail
-
getPreviousHarvestDefinition
public HarvestDefinition getPreviousHarvestDefinition()
Get the previous HarvestDefinition on which this one is based.
- Returns:
- The previous HarvestDefinition
-
setPreviousHarvestDefinition
public void setPreviousHarvestDefinition(Long prev)
Set the previous HarvestDefinition on which this one is based.
- Parameters:
prev
- The id of a HarvestDefinition
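The setter takes an id while the getter returns a HarvestDefinition, so some lookup has to happen in between. This page does not say how that lookup is done. The sketch below shows one possible arrangement under stated assumptions: the id is stored and resolved on demand through a provider-style supplier. `DefinitionLookup`, `PreviousDefinitionSketch`, and the use of a `String` in place of a HarvestDefinition are hypothetical stand-ins, not NetarchiveSuite types.

```java
import java.util.function.Supplier;

/** Hypothetical lookup service standing in for a DAO; not a NetarchiveSuite type. */
interface DefinitionLookup {
    /** Returns a stand-in for the definition with the given id, or null if unknown. */
    String read(Long oid);
}

/** Minimal sketch: keep only the id and resolve it each time the getter is called. */
final class PreviousDefinitionSketch {
    private final Supplier<DefinitionLookup> lookupProvider;
    private Long previousOid;

    PreviousDefinitionSketch(Supplier<DefinitionLookup> lookupProvider) {
        this.lookupProvider = lookupProvider;
    }

    /** Stores the id only; nothing is looked up here. */
    void setPrevious(Long oid) {
        this.previousOid = oid;
    }

    /** Assumption of this sketch: returns null when no previous id has been set. */
    String getPrevious() {
        if (previousOid == null) {
            return null;
        }
        return lookupProvider.get().read(previousOid);
    }
}
```

Deferring the lookup keeps the object cheap to construct and lets a test substitute an in-memory lookup for the real storage layer.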
-
getMaxCountObjects
public long getMaxCountObjects()
Description copied from class: HarvestDefinition
Returns how many objects to harvest per domain, or 0 for no limit.
- Specified by:
getMaxCountObjects in class HarvestDefinition
- Returns:
- The maxCountObjects.
-
setMaxCountObjects
public void setMaxCountObjects(long maxCountObjects)
- Parameters:
maxCountObjects
- The maxCountObjects to set.
-
getMaxBytes
public long getMaxBytes()
Get the maximum number of bytes that this fullharvest will harvest per domain, 0 for no limit.
- Specified by:
getMaxBytes in class HarvestDefinition
- Returns:
- Total download limit in bytes per domain.
-
setMaxBytes
public void setMaxBytes(long maxBytes)
Set the limit for how many bytes this fullharvest will harvest per domain, or -1 for no limit.
- Parameters:
maxBytes
- Number of bytes to stop harvesting at.
-
getDomainConfigurations
public Iterator<DomainConfiguration> getDomainConfigurations()
Returns an iterator of domain configurations for this harvest definition. Domains are filtered out if, on the previous harvest, they:
1) were completed
2) reached their maxBytes limit (and the maxBytes limit has not changed since time of harvest)
3) reached their maxObjects limit (and the maxObjects limit has not changed since time of harvest)
4) died uncleanly (e.g. due to a manual shutdown of Heritrix) on their last harvest.
Domains are also excluded if they are aliases of another domain.
- Specified by:
getDomainConfigurations in class HarvestDefinition
- Returns:
- Iterator containing information about the domain configurations
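The filtering rules above can be sketched as a predicate over the outcome of the previous harvest. This is an illustration only, not the NetarchiveSuite implementation: `StopReason`, `PreviousResult`, and `SnapshotFilterSketch` are hypothetical names, and the sketch assumes a single current byte limit and object limit are compared against the limits recorded at harvest time.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reasons a domain's previous crawl stopped; not a NetarchiveSuite type. */
enum StopReason { COMPLETED, BYTE_LIMIT, OBJECT_LIMIT, NOT_FINISHED, DIED_UNCLEANLY }

/** Hypothetical record of how one domain fared in the previous harvest. */
final class PreviousResult {
    final String domainName;
    final StopReason stopReason;
    final long maxBytesAtHarvest;
    final long maxObjectsAtHarvest;
    final boolean alias;

    PreviousResult(String domainName, StopReason stopReason,
                   long maxBytesAtHarvest, long maxObjectsAtHarvest, boolean alias) {
        this.domainName = domainName;
        this.stopReason = stopReason;
        this.maxBytesAtHarvest = maxBytesAtHarvest;
        this.maxObjectsAtHarvest = maxObjectsAtHarvest;
        this.alias = alias;
    }
}

final class SnapshotFilterSketch {
    /** Applies the four documented rules plus the alias exclusion. */
    static boolean isFilteredOut(PreviousResult r, long currentMaxBytes, long currentMaxObjects) {
        if (r.alias) {
            return true; // aliases of another domain are always excluded
        }
        switch (r.stopReason) {
            case COMPLETED:      // rule 1
            case DIED_UNCLEANLY: // rule 4
                return true;
            case BYTE_LIMIT:     // rule 2: only while the limit is unchanged
                return r.maxBytesAtHarvest == currentMaxBytes;
            case OBJECT_LIMIT:   // rule 3: only while the limit is unchanged
                return r.maxObjectsAtHarvest == currentMaxObjects;
            default:
                return false;
        }
    }

    /** Returns the names of the domains that survive the filter, in input order. */
    static List<String> domainsToHarvest(List<PreviousResult> previous,
                                         long currentMaxBytes, long currentMaxObjects) {
        List<String> kept = new ArrayList<>();
        for (PreviousResult r : previous) {
            if (!isFilteredOut(r, currentMaxBytes, currentMaxObjects)) {
                kept.add(r.domainName);
            }
        }
        return kept;
    }
}
```

In this sketch, a domain that stopped at a byte limit becomes eligible again once the limit changes, which matches the "has not changed since time of harvest" condition in rules 2 and 3.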
-
getDomainConfigurationsForIterativeHarvest
public Iterator<DomainConfiguration> getDomainConfigurationsForIterativeHarvest()
- Returns:
- an iterator of DomainConfigurations not finished in the previous snapshot harvest
-
runNow
public boolean runNow(Date now)
Check if this harvest definition should be run, given the time now.
- Specified by:
runNow in class HarvestDefinition
- Parameters:
now
- The current time
- Returns:
- true if harvest definition should be run
-
isSnapShot
public boolean isSnapShot()
Returns whether this HarvestDefinition represents a snapshot harvest.
- Specified by:
isSnapShot in class HarvestDefinition
- Returns:
- true
-
getMaxJobRunningTime
public long getMaxJobRunningTime()
- Returns:
- Returns the max job running time
-
setMaxJobRunningTime
public void setMaxJobRunningTime(long maxJobRunningtime)
Set the limit for how many seconds each crawljob in this fullharvest will run, or 0 for no limit.
- Parameters:
maxJobRunningtime
- max number of seconds
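Because 0 means "no limit" rather than "zero seconds", any code that compares elapsed time against this value needs to special-case it. The following is a minimal sketch of such a check. `RunningTimeLimitSketch` and `hasExceededLimit` are hypothetical names, and treating a job as over the limit only when elapsed time is strictly greater than the limit is an assumption of the sketch.

```java
import java.time.Duration;

final class RunningTimeLimitSketch {
    /** The documented sentinel: 0 seconds means the job may run indefinitely. */
    static final long NO_LIMIT = 0L;

    /**
     * Returns true when a limit is set and the elapsed time is strictly
     * greater than it (the strict comparison is this sketch's choice).
     */
    static boolean hasExceededLimit(long maxJobRunningTimeSeconds, Duration elapsed) {
        if (maxJobRunningTimeSeconds == NO_LIMIT) {
            return false;
        }
        return elapsed.compareTo(Duration.ofSeconds(maxJobRunningTimeSeconds)) > 0;
    }
}
```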
-
getIndexReady
public boolean getIndexReady()
Returns whether the index is ready. Used to check whether a FullHarvest is ready for scheduling. Scheduling requires that the deduplication index used by the jobs in the FullHarvest has already been prepared by the IndexServer.
- Returns:
- true if the deduplication index is ready, otherwise false.
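A scheduler can use this flag as a gate, skipping any harvest whose deduplication index has not yet been prepared. The sketch below illustrates the idea against a hypothetical `IndexAwareHarvest` interface that exposes only `getName()` and `getIndexReady()`. It is a stand-in for illustration and does not reproduce the actual NetarchiveSuite scheduler.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal view of a harvest; mirrors only the two getters used here. */
interface IndexAwareHarvest {
    String getName();
    boolean getIndexReady();
}

final class SchedulingGateSketch {
    /** Returns the names of harvests whose deduplication index is ready, in input order. */
    static List<String> namesReadyForScheduling(List<? extends IndexAwareHarvest> harvests) {
        List<String> ready = new ArrayList<>();
        for (IndexAwareHarvest h : harvests) {
            if (h.getIndexReady()) {
                ready.add(h.getName());
            }
        }
        return ready;
    }
}
```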
-
setIndexReady
public void setIndexReady(boolean isIndexReady)
Set the indexReady field.
- Parameters:
isIndexReady
- The new value of the indexReady field.
-
-