dk.netarkivet.harvester.datamodel
Class PartialHarvest

java.lang.Object
  extended by dk.netarkivet.harvester.datamodel.HarvestDefinition
      extended by dk.netarkivet.harvester.datamodel.PartialHarvest
All Implemented Interfaces:
Named

public class PartialHarvest
extends HarvestDefinition

This class contains the specific properties and operations of harvest definitions that are not snapshot harvest definitions, i.e. definitions of event and selective harvests.


Field Summary
 
Fields inherited from class dk.netarkivet.harvester.datamodel.HarvestDefinition
comments, edition, harvestDefName, isActive, numEvents, oid, submissionDate
 
Constructor Summary
PartialHarvest(java.util.List<DomainConfiguration> domainConfigurations, Schedule schedule, java.lang.String harvestDefName, java.lang.String comments)
          Create a new instance of a PartialHarvest configured according to the properties of the supplied DomainConfigurations.
 
Method Summary
 void addSeeds(java.lang.String seeds, java.lang.String templateName, long maxBytes, int maxObjects)
          Takes a seed list and creates any necessary domains, configurations, and seedlists to enable them to be harvested with the given template and other parameters.
 int createJobs()
          Generates jobs in files from this harvest definition, and updates the schedule for when the harvest definition should happen next time.
 java.util.Iterator<DomainConfiguration> getDomainConfigurations()
          Returns an Iterator over the domain configurations for this harvest definition.
protected  long getMaxBytes()
          Always returns no limit.
protected  long getMaxCountObjects()
          Always returns no limit.
protected  Job getNewJob(DomainConfiguration cfg)
          Get a new Job suited for this type of HarvestDefinition.
 java.util.Date getNextDate()
          Get the next date this harvest definition should be run.
 Schedule getSchedule()
          Returns the schedule defined for this harvest definition.
 boolean isSnapShot()
          Returns whether this HarvestDefinition represents a snapshot harvest.
 void reset()
          Reset the harvest definition so that it has no recorded harvests and its next date is the first possible date for the schedule.
 boolean runNow(java.util.Date now)
          Check if this harvest definition should be run, given the time now.
 void setDomainConfigurations(java.util.List<DomainConfiguration> configs)
          Set the list of configurations that this harvest definition uses.
 void setNextDate(java.util.Date nextDate)
          Set the next date this harvest definition should be run.
 void setSchedule(Schedule schedule)
          Set the schedule to be used for this harvest definition.
 
Methods inherited from class dk.netarkivet.harvester.datamodel.HarvestDefinition
createFullHarvest, createPartialHarvest, equals, getActive, getComments, getEdition, getName, getNumEvents, getOid, getSubmissionDate, hashCode, hasID, makeJobs, setActive, setComments, setEdition, setNumEvents, setOid, setSubmissionDate, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

PartialHarvest

public PartialHarvest(java.util.List<DomainConfiguration> domainConfigurations,
                      Schedule schedule,
                      java.lang.String harvestDefName,
                      java.lang.String comments)
Create a new instance of a PartialHarvest configured according to the properties of the supplied DomainConfigurations.

Parameters:
domainConfigurations - a list of domain configurations
schedule - the harvest definition schedule
harvestDefName - the name of the harvest definition
comments - comments
Method Detail

createJobs

public int createJobs()
Generates jobs in files from this harvest definition, and updates the schedule for when the harvest definition should happen next time. Jobs are created from the domain configurations in this harvest definition and the current values of the limits in Settings. Multiple jobs are generated if different order.xml templates are used, or if the size of a single job would be inappropriate. The following settings are used:

HarvesterSettings.JOBS_MAX_RELATIVE_SIZE_DIFFERENCE: The maximum relative difference between the smallest and largest number of objects expected in a job.

HarvesterSettings.JOBS_MIN_ABSOLUTE_SIZE_DIFFERENCE: Size differences below this threshold are ignored even if the relative difference exceeds HarvesterSettings.JOBS_MAX_RELATIVE_SIZE_DIFFERENCE.

HarvesterSettings.JOBS_MAX_TOTAL_JOBSIZE: The upper limit on the total number of objects that a job may retrieve.

When scheduling the next event using the defined schedule, events are skipped if the next event would already be in the past.

Overrides:
createJobs in class HarvestDefinition
Returns:
Number of jobs created
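To illustrate how the three size settings above could interact, here is a minimal greedy sketch. It is not the NetarchiveSuite implementation: the class JobSplitSketch, the record Config (a stand-in for a DomainConfiguration reduced to a name and an expected object count) and the parameters maxRelativeSizeDifference, minAbsoluteSizeDifference and maxTotalJobSize are all hypothetical, chosen to mirror JOBS_MAX_RELATIVE_SIZE_DIFFERENCE, JOBS_MIN_ABSOLUTE_SIZE_DIFFERENCE and JOBS_MAX_TOTAL_JOBSIZE. Splitting by order.xml template is left out; configurations are assumed to share one template.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: packs configurations (sorted by expected size) into jobs.
final class JobSplitSketch {

    // Stand-in for DomainConfiguration: only what the sketch needs (hypothetical).
    record Config(String name, long expectedObjects) {}

    static List<List<Config>> split(List<Config> configs,
                                     double maxRelativeSizeDifference,
                                     long minAbsoluteSizeDifference,
                                     long maxTotalJobSize) {
        List<Config> sorted = new ArrayList<>(configs);
        sorted.sort(Comparator.comparingLong(Config::expectedObjects));

        List<List<Config>> jobs = new ArrayList<>();
        List<Config> current = new ArrayList<>();
        long total = 0;
        for (Config cfg : sorted) {
            if (!current.isEmpty()) {
                // Ascending order: first element is smallest, cfg is the new largest.
                long smallest = current.get(0).expectedObjects();
                long largest = cfg.expectedObjects();
                boolean tooBig = total + largest > maxTotalJobSize;
                // Relative difference only counts if the absolute difference is not below the threshold.
                boolean tooUneven = largest - smallest >= minAbsoluteSizeDifference
                        && largest > maxRelativeSizeDifference * Math.max(1, smallest);
                if (tooBig || tooUneven) {
                    jobs.add(current);          // close the current job
                    current = new ArrayList<>();
                    total = 0;
                }
            }
            current.add(cfg);
            total += cfg.expectedObjects();
        }
        if (!current.isEmpty()) {
            jobs.add(current);
        }
        return jobs;
    }

    public static void main(String[] args) {
        List<Config> configs = List.of(
                new Config("a", 10), new Config("b", 20),
                new Config("c", 500), new Config("d", 600));
        // Ratio limit 10x, ignore differences under 50 objects, at most 1000 objects per job.
        System.out.println(split(configs, 10.0, 50, 1000).size()); // 3
    }
}
```

With these values, "a" and "b" share a job because their difference (10 objects) is below the absolute threshold, "c" starts a new job because it is more than ten times larger than "a", and "d" starts another because adding it to "c" would exceed the total limit. A greedy pass over sorted sizes is only one plausible design; the actual job generator may group configurations differently.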

getNewJob

protected Job getNewJob(DomainConfiguration cfg)
Get a new Job suited for this type of HarvestDefinition.

Specified by:
getNewJob in class HarvestDefinition
Parameters:
cfg - The configuration to use when creating the job
Returns:
a new job

getSchedule

public Schedule getSchedule()
Returns the schedule defined for this harvest definition.

Returns:
schedule

setSchedule

public void setSchedule(Schedule schedule)
Set the schedule to be used for this harvest definition.

Parameters:
schedule - A schedule for when to try harvesting.

getNextDate

public java.util.Date getNextDate()
Get the next date this harvest definition should be run.

Returns:
The next date the harvest definition should be run or null, if the harvest definition should never run again.

setNextDate

public void setNextDate(java.util.Date nextDate)
Set the next date this harvest definition should be run.

Parameters:
nextDate - The next date the harvest definition should be run. May be null, meaning never again.
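The null convention for the next date, together with the rule in createJobs that past events are skipped, can be sketched as follows. This assumes a hypothetical fixed-period schedule with an optional end date; the class NextDateSketch and its advance method are illustrative only and are not the Schedule API. It uses java.time.Instant for clarity; a java.util.Date can be converted with Date.toInstant() and Date.from(Instant).

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch: advances a next date past "now", or returns null when the schedule ends.
final class NextDateSketch {

    static Instant advance(Instant nextDate, Duration period, Instant end, Instant now) {
        if (period.isZero() || period.isNegative()) {
            throw new IllegalArgumentException("period must be positive");
        }
        if (nextDate == null) {
            return null; // null means "never run again" and stays that way
        }
        Instant candidate = nextDate.plus(period);
        // Skip events whose date would already be in the past.
        while (!candidate.isAfter(now)) {
            candidate = candidate.plus(period);
        }
        // Past the (optional) end of the schedule: no further runs.
        return (end != null && candidate.isAfter(end)) ? null : candidate;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2024-01-01T00:00:00Z");
        Instant now = Instant.parse("2024-01-03T12:00:00Z");
        Duration day = Duration.ofDays(1);
        System.out.println(advance(start, day, null, now)); // 2024-01-04T00:00:00Z
        System.out.println(advance(start, day, Instant.parse("2024-01-02T00:00:00Z"), now)); // null
    }
}
```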

getDomainConfigurations

public java.util.Iterator<DomainConfiguration> getDomainConfigurations()
Returns an Iterator over the domain configurations for this harvest definition.

Specified by:
getDomainConfigurations in class HarvestDefinition
Returns:
An Iterator over the domain configurations

setDomainConfigurations

public void setDomainConfigurations(java.util.List<DomainConfiguration> configs)
Set the list of configurations that this harvest definition uses.

Parameters:
configs - The list of configurations that this harvest definition will use.

reset

public void reset()
Reset the harvest definition so that it has no recorded harvests and its next date is the first possible date for the schedule.


runNow

public boolean runNow(java.util.Date now)
Check if this harvest definition should be run, given the time now.

Specified by:
runNow in class HarvestDefinition
Parameters:
now - The current time
Returns:
true if harvest definition should be run
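A minimal sketch of the kind of check runNow performs, assuming (this is not confirmed above) that a definition is due when it is active, has a non-null next date, and that date is not after the given time. The class RunNowSketch and its parameters are hypothetical.

```java
import java.util.Date;

// Hypothetical sketch of a "should this harvest definition run now?" check.
final class RunNowSketch {

    static boolean runNow(boolean isActive, Date nextDate, Date now) {
        if (!isActive || nextDate == null) {
            return false; // inactive, or never to be run again
        }
        return !nextDate.after(now); // due when the next date is at or before now
    }

    public static void main(String[] args) {
        Date now = new Date(1_000_000L);
        System.out.println(runNow(true, new Date(999_000L), now));   // true
        System.out.println(runNow(true, new Date(2_000_000L), now)); // false
        System.out.println(runNow(true, null, now));                 // false
    }
}
```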

isSnapShot

public boolean isSnapShot()
Returns whether this HarvestDefinition represents a snapshot harvest.

Specified by:
isSnapShot in class HarvestDefinition
Returns:
false (always)

getMaxCountObjects

protected long getMaxCountObjects()
Always returns no limit.

Specified by:
getMaxCountObjects in class HarvestDefinition
Returns:
0, meaning no limit.

getMaxBytes

protected long getMaxBytes()
Always returns no limit.

Specified by:
getMaxBytes in class HarvestDefinition
Returns:
-1, meaning no limit.

addSeeds

public void addSeeds(java.lang.String seeds,
                     java.lang.String templateName,
                     long maxBytes,
                     int maxObjects)
Takes a seed list and creates any necessary domains, configurations, and seedlists to enable them to be harvested with the given template and other parameters. Bug 717 addresses this issue. The seedlists and domain configurations are currently named using one of:
harvestdefinitionname + "_" + templateName + "_" + "UnlimitedBytes" (if maxBytes is negative)
harvestdefinitionname + "_" + templateName + "_" + maxBytes + "Bytes" (if maxBytes is zero or positive).

Parameters:
seeds - a newline-separated list of the seeds to be added
templateName - the name of the template to be used
maxBytes - Maximum number of bytes to harvest per domain
maxObjects - Maximum number of objects to harvest per domain
See Also:
for details
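The naming scheme and the newline-separated seed format described for addSeeds can be sketched as below. The class SeedNamingSketch and its methods are hypothetical helpers, not part of PartialHarvest; dropping blank lines and trimming whitespace are assumptions about how seeds are cleaned, and the example template name is made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helpers mirroring the documented naming scheme for seedlists/configurations.
final class SeedNamingSketch {

    static String configName(String harvestDefName, String templateName, long maxBytes) {
        // Negative maxBytes means "no byte limit".
        String bytesPart = maxBytes < 0 ? "UnlimitedBytes" : maxBytes + "Bytes";
        return harvestDefName + "_" + templateName + "_" + bytesPart;
    }

    // Splits on any line break; trimming and skipping blank lines are assumptions.
    static List<String> parseSeeds(String seeds) {
        return Arrays.stream(seeds.split("\\R"))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(configName("MyHarvest", "default_orderxml", -1));
        // MyHarvest_default_orderxml_UnlimitedBytes
        System.out.println(configName("MyHarvest", "default_orderxml", 1000000));
        // MyHarvest_default_orderxml_1000000Bytes
        System.out.println(parseSeeds("www.example.com\n\n  www.example.org \r\n"));
        // [www.example.com, www.example.org]
    }
}
```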