dk.netarkivet.harvester.datamodel
Class FullHarvest

java.lang.Object
  extended by dk.netarkivet.harvester.datamodel.HarvestDefinition
      extended by dk.netarkivet.harvester.datamodel.FullHarvest
All Implemented Interfaces:
Named

public class FullHarvest
extends HarvestDefinition

This class contains the specific properties and operations of snapshot harvest definitions.


Field Summary
 
Fields inherited from class dk.netarkivet.harvester.datamodel.HarvestDefinition
comments, edition, harvestDefName, isActive, numEvents, oid, submissionDate
 
Constructor Summary
FullHarvest(java.lang.String harvestDefName, java.lang.String comments, java.lang.Long previousHarvestDefinitionOid, long maxCountObjects, long maxBytes, long maxJobRunningTime, boolean isIndexReady)
          Create a new instance of FullHarvest configured according to the supplied parameters.
 
Method Summary
 java.util.Iterator<DomainConfiguration> getDomainConfigurations()
          Returns an iterator of domain configurations for this harvest definition.
 boolean getIndexReady()
          Returns whether the deduplication index for this harvest is ready.
 long getMaxBytes()
          Get the maximum number of bytes that this full harvest will harvest per domain, or 0 for no limit.
 long getMaxCountObjects()
          Returns how many objects to harvest per domain, or 0 for no limit.
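 long getMaxJobRunningTime()
          Get the limit for how many seconds each crawl job in this full harvest will run, or 0 for no limit.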
protected  Job getNewJob(DomainConfiguration cfg)
          Get a new Job suited for this type of HarvestDefinition.
 HarvestDefinition getPreviousHarvestDefinition()
          Get the previous HarvestDefinition on which this one is based.
 boolean isSnapShot()
          Returns whether this HarvestDefinition represents a snapshot harvest.
 boolean runNow(java.util.Date now)
          Check if this harvest definition should be run, given the time now.
 void setIndexReady(boolean isIndexReady)
          Set the indexReady field.
 void setMaxBytes(long maxBytes)
          Set the limit for how many bytes this full harvest will harvest per domain, or -1 for no limit.
 void setMaxCountObjects(long maxCountObjects)
          Set the limit for how many objects this full harvest will harvest per domain.
 void setMaxJobRunningTime(long maxJobRunningtime)
          Set the limit for how many seconds each crawl job in this full harvest will run, or 0 for no limit.
 void setPreviousHarvestDefinition(java.lang.Long prev)
          Set the previous HarvestDefinition on which this one is based.
 
Methods inherited from class dk.netarkivet.harvester.datamodel.HarvestDefinition
createFullHarvest, createJobs, createPartialHarvest, equals, getActive, getComments, getEdition, getName, getNumEvents, getOid, getSubmissionDate, hashCode, hasID, makeJobs, setActive, setComments, setEdition, setNumEvents, setOid, setSubmissionDate, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

FullHarvest

public FullHarvest(java.lang.String harvestDefName,
                   java.lang.String comments,
                   java.lang.Long previousHarvestDefinitionOid,
                   long maxCountObjects,
                   long maxBytes,
                   long maxJobRunningTime,
                   boolean isIndexReady)
Create a new instance of FullHarvest configured according to the supplied parameters.

Parameters:
harvestDefName - the name of the harvest definition
comments - free-text comments for the harvest definition
previousHarvestDefinitionOid - The OID of the previous harvest definition on which this FullHarvest definition is based
maxCountObjects - Limit for how many objects can be fetched per domain
maxBytes - Limit for how many bytes can be fetched per domain
maxJobRunningTime - Limit on how much time can be spent on each job. 0 means no limit
isIndexReady - Whether the deduplication index is ready for this harvest
Method Detail

getNewJob

protected Job getNewJob(DomainConfiguration cfg)
Get a new Job suited for this type of HarvestDefinition.

Specified by:
getNewJob in class HarvestDefinition
Parameters:
cfg - The configuration to use when creating the job
Returns:
a new job

getPreviousHarvestDefinition

public HarvestDefinition getPreviousHarvestDefinition()
Get the previous HarvestDefinition on which this one is based.

Returns:
The previous HarvestDefinition

setPreviousHarvestDefinition

public void setPreviousHarvestDefinition(java.lang.Long prev)
Set the previous HarvestDefinition on which this one is based.

Parameters:
prev - The id of a HarvestDefinition

getMaxCountObjects

public long getMaxCountObjects()
Description copied from class: HarvestDefinition
Returns how many objects to harvest per domain, or 0 for no limit.

Specified by:
getMaxCountObjects in class HarvestDefinition
Returns:
The maximum number of objects to harvest per domain, or 0 for no limit.

setMaxCountObjects

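public void setMaxCountObjects(long maxCountObjects)
Set the limit for how many objects this full harvest will harvest per domain.

Parameters:
maxCountObjects - The maximum number of objects to harvest per domain.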

getMaxBytes

public long getMaxBytes()
Get the maximum number of bytes that this full harvest will harvest per domain, or 0 for no limit.

Specified by:
getMaxBytes in class HarvestDefinition
Returns:
Total download limit in bytes per domain.

setMaxBytes

public void setMaxBytes(long maxBytes)
Set the limit for how many bytes this full harvest will harvest per domain, or -1 for no limit.

Parameters:
maxBytes - Number of bytes to stop harvesting at.

getDomainConfigurations

public java.util.Iterator<DomainConfiguration> getDomainConfigurations()
Returns an iterator of domain configurations for this harvest definition. Domains are filtered out if, on the previous harvest, they:
1) were completed,
2) reached their maxBytes limit (and the maxBytes limit has not changed since the time of the harvest),
3) reached their maxObjects limit (and the maxObjects limit has not changed since the time of the harvest), or
4) died uncleanly (e.g. due to a manual shutdown of Heritrix).
Domains are also excluded if they are aliases of another domain.

Specified by:
getDomainConfigurations in class HarvestDefinition
Returns:
Iterator containing information about the domain configurations
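
The filtering rules described for getDomainConfigurations can be sketched roughly as follows. This is only an illustration, not the class's actual implementation: DomainFilterSketch, StopReason, and DomainInfo are hypothetical stand-ins that do not exist in the documented API, and the single limitChangedSinceHarvest flag is an assumed simplification of "the limit has not changed since the time of the harvest".

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins; none of these types are part of the documented API.
class DomainFilterSketch {

    /** How a domain's crawl ended in the previous harvest (illustrative only). */
    enum StopReason { COMPLETED, BYTE_LIMIT, OBJECT_LIMIT, DIED_UNCLEANLY, OTHER }

    /** Minimal description of one domain as seen by the filter. */
    static final class DomainInfo {
        final String name;
        final boolean isAlias;                  // alias of another domain?
        final StopReason previousStopReason;    // outcome in the previous harvest
        final boolean limitChangedSinceHarvest; // assumed flag for rules 2 and 3

        DomainInfo(String name, boolean isAlias,
                   StopReason previousStopReason, boolean limitChangedSinceHarvest) {
            this.name = name;
            this.isAlias = isAlias;
            this.previousStopReason = previousStopReason;
            this.limitChangedSinceHarvest = limitChangedSinceHarvest;
        }
    }

    /** Returns true if the domain should be left out of the new harvest. */
    static boolean isFilteredOut(DomainInfo d) {
        if (d.isAlias) {
            return true; // aliases are always excluded
        }
        switch (d.previousStopReason) {
            case COMPLETED:      // rule 1
            case DIED_UNCLEANLY: // rule 4
                return true;
            case BYTE_LIMIT:     // rule 2
            case OBJECT_LIMIT:   // rule 3
                // Only skip the domain if the limit it hit is still the same.
                return !d.limitChangedSinceHarvest;
            default:
                return false;
        }
    }

    /** Returns an iterator over the domains that survive the filter. */
    static Iterator<DomainInfo> domainsToHarvest(List<DomainInfo> all) {
        List<DomainInfo> kept = new ArrayList<>();
        for (DomainInfo d : all) {
            if (!isFilteredOut(d)) {
                kept.add(d);
            }
        }
        return kept.iterator();
    }
}
```

In this sketch a domain that hit its byte or object limit is harvested again only if the corresponding limit has changed since then, which mirrors rules 2 and 3.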

runNow

public boolean runNow(java.util.Date now)
Check if this harvest definition should be run, given the time now.

Specified by:
runNow in class HarvestDefinition
Parameters:
now - The current time
Returns:
true if harvest definition should be run

isSnapShot

public boolean isSnapShot()
Returns whether this HarvestDefinition represents a snapshot harvest.

Specified by:
isSnapShot in class HarvestDefinition
Returns:
Always true, since a FullHarvest is a snapshot harvest.

getMaxJobRunningTime

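public long getMaxJobRunningTime()
Get the limit for how many seconds each crawl job in this full harvest will run, or 0 for no limit.

Returns:
The maximum job running time in seconds, or 0 for no limit.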

setMaxJobRunningTime

public void setMaxJobRunningTime(long maxJobRunningtime)
Set the limit for how many seconds each crawl job in this full harvest will run, or 0 for no limit.

Parameters:
maxJobRunningtime - The maximum number of seconds each job may run, or 0 for no limit.
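
Because 0 is documented as "no limit" for the job running time, code that compares elapsed time against this value has to treat 0 as a special case. A minimal sketch of such a check follows. RunningTimeLimitSketch and hasExceeded are hypothetical helpers, not part of the documented API, and treating a job as over its limit once the elapsed time equals the limit is an assumption.

```java
// Hypothetical helper; not part of the documented API.
class RunningTimeLimitSketch {

    /** Value documented as meaning "no limit" for maxJobRunningTime. */
    static final long NO_LIMIT = 0L;

    /**
     * Returns true if a job that has run for elapsedSeconds has used up
     * its allowance of maxJobRunningTime seconds.
     */
    static boolean hasExceeded(long elapsedSeconds, long maxJobRunningTime) {
        if (maxJobRunningTime == NO_LIMIT) {
            return false; // 0 disables the limit entirely
        }
        // Assumption: reaching the limit exactly counts as exceeding it.
        return elapsedSeconds >= maxJobRunningTime;
    }
}
```

A caller would pass the value returned by getMaxJobRunningTime() as the second argument.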

getIndexReady

public boolean getIndexReady()
Returns whether the deduplication index is ready. Used to check whether a FullHarvest is ready for scheduling. Scheduling requires that the deduplication index used by the jobs in the FullHarvest has already been prepared by the IndexServer.

Returns:
true, if the deduplication index is ready. Otherwise false.
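
As an illustration of how this flag might gate scheduling, the sketch below selects only those harvests whose index is ready. IndexReadySketch and its Schedulable interface are hypothetical; Schedulable simply mirrors the documented getName() and getIndexReady() accessors so that the example is self-contained. The real scheduler may apply further checks, such as runNow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the real scheduler is not shown in this documentation.
class IndexReadySketch {

    /** Hypothetical view exposing only the two accessors used below. */
    interface Schedulable {
        String getName();
        boolean getIndexReady();
    }

    /** Returns the names of the harvests whose deduplication index is ready. */
    static List<String> readyForScheduling(List<? extends Schedulable> harvests) {
        List<String> ready = new ArrayList<>();
        for (Schedulable h : harvests) {
            if (h.getIndexReady()) {
                ready.add(h.getName());
            }
        }
        return ready;
    }

    /** Builds a simple stand-in harvest for demonstration purposes. */
    static Schedulable harvest(final String name, final boolean indexReady) {
        return new Schedulable() {
            @Override
            public String getName() {
                return name;
            }

            @Override
            public boolean getIndexReady() {
                return indexReady;
            }
        };
    }
}
```

Harvests whose index is not yet ready are simply skipped until a later scheduling pass, after setIndexReady(true) has been called.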

setIndexReady

public void setIndexReady(boolean isIndexReady)
Set the indexReady field.

Parameters:
isIndexReady - The new value of the indexReady field.