Class Job
- java.lang.Object
-
- dk.netarkivet.harvester.datamodel.Job
-
- All Implemented Interfaces:
JobInfo, Serializable
public class Job extends Object implements Serializable, JobInfo
This class represents one job to be run by Heritrix. It is based on a number of configurations, all based on the same order.xml, with at most one configuration for each domain. Each job consists of configurations of approximately the same size; that is, the expected sizes of the smallest and the largest configuration differ by at most a factor of limMaxRelSize (although differences smaller than limMinAbsSize are ignored). There is a limit, limMaxTotalSize, on the total size of the job in objects. A job may also be limited on bytes or objects, defined either by the configurations in the job or by the harvest definition that generated the job.
The job contains the order file, the seedlist and the current status of the job, as well as the ID of the harvest definition that defined it and names of all the configurations it is based on.
- See Also:
- Serialized Form
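The size rule in the class description can be sketched as follows. This is a minimal illustration and not the NetarchiveSuite implementation: the class `JobSizeSketch`, the method `fits`, and the choice of `long` for all three limits are assumptions made for the example; only the names limMaxRelSize, limMinAbsSize and limMaxTotalSize come from the description above.

```java
// Hypothetical helper, not part of dk.netarkivet.harvester.datamodel.
final class JobSizeSketch {
    /**
     * Decides whether a configuration expecting newCount objects could join a
     * job whose configurations currently span [minCount, maxCount] expected
     * objects and total totalCount objects.
     */
    static boolean fits(long minCount, long maxCount, long totalCount, long newCount,
                        long limMaxRelSize, long limMinAbsSize, long limMaxTotalSize) {
        // The total size of the job, in objects, must stay within the limit.
        if (totalCount + newCount > limMaxTotalSize) {
            return false;
        }
        long newMin = Math.min(minCount, newCount);
        long newMax = Math.max(maxCount, newCount);
        // Differences smaller than limMinAbsSize are ignored.
        if (newMax - newMin < limMinAbsSize) {
            return true;
        }
        // Otherwise the largest may be at most limMaxRelSize times the smallest.
        return newMax <= newMin * limMaxRelSize;
    }

    public static void main(String[] args) {
        System.out.println(fits(100, 200, 300, 150, 10, 50, 1000));   // prints true
        System.out.println(fits(100, 200, 300, 5000, 10, 50, 10000)); // prints false
    }
}
```

In this sketch the absolute-difference test runs before the relative test, so very small configurations are never rejected merely because their ratio is large.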
-
-
Field Summary
Fields (Modifier and Type, Field - Description)
protected Long origHarvestDefinitionID - The ID of the harvest definition that generated this job.
protected JobStatus status - The status of the job.
-
Constructor Summary
Constructors (Modifier, Constructor - Description)
protected Job()
public Job(Long harvestID, DomainConfiguration cfg, HeritrixTemplate orderXMLdoc, HarvestChannel channel, long forceMaxObjectsPerDomain, long forceMaxBytesPerDomain, long forceMaxJobRunningTime, int harvestNum) - Constructor for common initialisation.
-
Method Summary
All Methods (Modifier and Type, Method - Description)
void addConfiguration(DomainConfiguration cfg) - Adds a configuration to this Job.
void appendHarvestErrorDetails(String harvestErrorDetails) - Append to the list of harvest error details for this job.
void appendHarvestErrors(String harvestErrors) - Append to the list of harvest errors for this job.
void appendUploadErrorDetails(String uploadErrorDetails) - Append to the list of upload error details.
void appendUploadErrors(String uploadErrors) - Append to the list of upload errors.
Date getActualStart() - Get the actual time when this job was started.
Date getActualStop() - Get the actual time when this job was stopped/completed.
String getChannel()
Long getContinuationOf()
int getCountDomains() - Gets the total number of different domains harvested by this job.
Date getCreationDate() - Get the time when this job was created.
Map<String,String> getDomainConfigurationMap() - Returns a map of domain names and name of their corresponding configuration.
long getForceMaxBytesPerDomain()
long getForceMaxObjectsPerDomain()
String getHarvestAudience()
String getHarvestErrorDetails() - Get the list of harvest error details for this job.
String getHarvestErrors() - Get the list of harvest errors for this job.
String getHarvestFilenamePrefix() - Get the harvestFilename prefix.
int getHarvestNum() - Get the harvestNum for this job.
Long getJobID() - Get the id of this Job.
long getMaxBytesPerDomain() - Gets the maximum number of bytes harvested per domain.
long getMaxCountObjects()
long getMaxJobRunningTime()
long getMaxObjectsPerDomain() - Gets the maximum number of objects harvested per domain.
long getMinCountObjects()
HeritrixTemplate getOrderXMLdoc() - Gets a document representation of the order.xml associated with this Job.
String getOrderXMLName() - Get the name of the order XML file used by this Job.
Long getOrigHarvestDefinitionID() - Get the id of the HarvestDefinition from which this job originates.
Long getResubmittedAsJob() - Get the ID for the job which this job was resubmitted as.
String getSeedListAsString() - Get the seedlist as a String.
File[] getSettingsXMLfiles() - Get a list of Heritrix settings.xml files.
List<String> getSortedSeedList() - Returns a list of sorted seeds for this job.
JobStatus getStatus() - Get the current status of this Job.
Date getSubmittedDate() - Get the time when this job was submitted.
long getTotalCountObjects()
String getUploadErrorDetails() - Get the list of upload error details.
String getUploadErrors() - Get the list of upload errors.
boolean isConfigurationSetsByteLimit()
boolean isConfigurationSetsObjectLimit()
boolean isSnapshot()
void setActualStart(Date actualStart) - Set the actual time when this job was started.
void setActualStop(Date actualStop) - Set the actual time when this job was stopped/completed.
void setAttributes(List<EAV.AttributeAndType> attributesAndTypes)
void setChannel(String channel) - Sets the associated HarvestChannel name.
void setCreationDate(Date creationDate) - Set the Date for when this job was created.
void setHarvestAudience(String theAudience) - Set the harvest audience for this job.
void setHarvestChannel(HarvestChannel harvestChannel)
void setHarvestFilenamePrefix(String prefix)
void setHarvestNum(int harvestNum) - Set the harvestNum for this job.
void setJobID(Long id) - Set the id of this Job.
protected void setMaxBytesPerDomain(long maxBytesPerDomain) - Set the maxbytes per domain value.
protected void setMaxJobRunningTime(long maxJobRunningTime) - Set the maxJobRunningTime value.
protected void setMaxObjectsPerDomain(long maxObjectsPerDomain) - Sets the maxObjectsPerDomain value.
void setOrderXMLDoc(HeritrixTemplate doc) - Set the orderxml for this job.
void setResubmittedAsJob(Long resubmittedAsJob) - Set the ID for the job which this job was resubmitted as.
void setSeedList(String seedList) - Set the seedlist of the job from the seedList argument.
void setSnapshot(boolean isSnapshot) - Sets whether job belongs to a snapshot or focused harvest.
void setStatus(JobStatus newStatus) - Sets status of this job.
void setSubmittedDate(Date submittedDate) - Set the Date for when this job was submitted.
String toString()
-
-
-
Constructor Detail
-
Job
protected Job()
-
Job
public Job(Long harvestID, DomainConfiguration cfg, HeritrixTemplate orderXMLdoc, HarvestChannel channel, long forceMaxObjectsPerDomain, long forceMaxBytesPerDomain, long forceMaxJobRunningTime, int harvestNum) throws ArgumentNotValid
Constructor for common initialisation.
- Parameters:
harvestID
- the id of the harvest definition
cfg
- the configuration to base the Job on
orderXMLdoc
-
channel
- the channel on which the job will be submitted.
forceMaxObjectsPerDomain
- the maximum number of objects harvested from a domain, overrides individual configuration settings. -1 means no limit
forceMaxBytesPerDomain
- the maximum number of bytes harvested from a domain, or -1 for no limit.
forceMaxJobRunningTime
- the maximum time in seconds given to the harvester for this job
harvestNum
- the run number of the harvest definition
- Throws:
ArgumentNotValid
- if cfg or priority is null or harvestID is invalid, or if any limit < -1
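The limit parameters of the constructor use -1 as a "no limit" sentinel and reject anything below it. A minimal sketch of that convention follows; `LimitSketch`, `checkLimit` and `isUnlimited` are hypothetical names, and `IllegalArgumentException` stands in for NetarchiveSuite's `ArgumentNotValid` so the example compiles without the library.

```java
// Hypothetical helper illustrating the "-1 means no limit" convention.
final class LimitSketch {
    static final long NO_LIMIT = -1L;

    /** Returns the limit unchanged, or throws if it is below the sentinel. */
    static long checkLimit(long limit, String name) {
        if (limit < NO_LIMIT) {
            // The real constructor is documented to throw ArgumentNotValid here.
            throw new IllegalArgumentException(name + " must be >= -1, but was " + limit);
        }
        return limit;
    }

    static boolean isUnlimited(long limit) {
        return limit == NO_LIMIT;
    }

    public static void main(String[] args) {
        long maxBytes = checkLimit(-1L, "forceMaxBytesPerDomain");
        System.out.println(isUnlimited(maxBytes)); // prints true
    }
}
```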
-
-
Method Detail
-
setAttributes
public void setAttributes(List<EAV.AttributeAndType> attributesAndTypes)
-
addConfiguration
public void addConfiguration(DomainConfiguration cfg)
Adds a configuration to this Job. Seedlists and settings are updated accordingly.
- Parameters:
cfg
- the configuration to add
- Throws:
ArgumentNotValid
- if cfg is null, if cfg uses a different orderxml than this job, or if this job already contains a configuration associated with the domain of cfg.
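The three rejection conditions listed for addConfiguration can be sketched with plain strings in place of `DomainConfiguration`. Everything here is illustrative: `AddConfigurationSketch`, its fields and its `add` signature are assumptions, and `IllegalArgumentException` stands in for `ArgumentNotValid`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for Job; a configuration is reduced to three strings.
final class AddConfigurationSketch {
    private final String orderXmlName;
    private final Map<String, String> domainToConfig = new LinkedHashMap<>();

    AddConfigurationSketch(String orderXmlName) {
        this.orderXmlName = orderXmlName;
    }

    void add(String domain, String configName, String configOrderXmlName) {
        if (domain == null || configName == null) {
            throw new IllegalArgumentException("configuration must not be null");
        }
        if (!orderXmlName.equals(configOrderXmlName)) {
            throw new IllegalArgumentException("configuration uses a different orderxml");
        }
        if (domainToConfig.containsKey(domain)) {
            throw new IllegalArgumentException("job already has a configuration for " + domain);
        }
        domainToConfig.put(domain, configName);
    }

    /** One configuration per domain, so this is also the number of domains. */
    int countDomains() {
        return domainToConfig.size();
    }

    public static void main(String[] args) {
        AddConfigurationSketch job = new AddConfigurationSketch("default_orderxml");
        job.add("example.org", "defaultconfig", "default_orderxml");
        System.out.println(job.countDomains()); // prints 1
    }
}
```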
-
getOrderXMLName
public String getOrderXMLName()
Get the name of the order XML file used by this Job.
- Returns:
- the name of the orderXML file
-
getActualStop
public Date getActualStop()
Get the actual time when this job was stopped/completed.
- Returns:
- the time as Date
-
getActualStart
public Date getActualStart()
Get the actual time when this job was started.
- Returns:
- the time as Date
-
getSubmittedDate
public Date getSubmittedDate()
Get the time when this job was submitted.
- Returns:
- the time as Date
-
getCreationDate
public Date getCreationDate()
Get the time when this job was created.
- Returns:
- the creation time as a Date
-
getSettingsXMLfiles
public File[] getSettingsXMLfiles()
Get a list of Heritrix settings.xml files. Note that these files have nothing to do with NetarchiveSuite settings files. They are files that supplement the Heritrix order.xml files, and contain overrides for specific domains.
- Returns:
- the list of Files as an array
-
getOrigHarvestDefinitionID
public Long getOrigHarvestDefinitionID()
Get the id of the HarvestDefinition from which this job originates.
- Specified by:
getOrigHarvestDefinitionID in interface JobInfo
- Returns:
- the id as a Long
-
getJobID
public Long getJobID()
Get the id of this Job.
-
setJobID
public void setJobID(Long id)
Set the id of this Job.
- Parameters:
id
- The Id for this job.
-
getCountDomains
public int getCountDomains()
Gets the total number of different domains harvested by this job.
- Returns:
- the number of configurations added to this job
-
setActualStart
public void setActualStart(Date actualStart)
Set the actual time when this job was started. Sends a notification if actualStart is set to a time after actualStop.
- Parameters:
actualStart
- A Date object representing the time when this job was started.
-
setActualStop
public void setActualStop(Date actualStop) throws ArgumentNotValid
Set the actual time when this job was stopped/completed. Sends a notification if actualStop is set to a time before actualStart.
- Parameters:
actualStop
- A Date object representing the time when this job was stopped.
- Throws:
ArgumentNotValid
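The ordering check described for setActualStart and setActualStop can be sketched as below. This is an assumption-laden illustration: `StartStopSketch` is a hypothetical class, the `notifications` list stands in for whatever notification mechanism the real class uses, and the conditions under which the real setActualStop throws ArgumentNotValid are not modelled because the documentation does not state them.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Hypothetical stand-in for Job; only the start/stop bookkeeping is modelled.
final class StartStopSketch {
    private Date actualStart;
    private Date actualStop;
    // Collected messages replace the real notification channel in this sketch.
    final List<String> notifications = new ArrayList<>();

    void setActualStart(Date start) {
        if (start != null && actualStop != null && start.after(actualStop)) {
            notifications.add("actualStart " + start + " is after actualStop " + actualStop);
        }
        this.actualStart = start;
    }

    void setActualStop(Date stop) {
        if (stop != null && actualStart != null && stop.before(actualStart)) {
            notifications.add("actualStop " + stop + " is before actualStart " + actualStart);
        }
        this.actualStop = stop;
    }

    public static void main(String[] args) {
        StartStopSketch job = new StartStopSketch();
        job.setActualStart(new Date(2_000L));
        job.setActualStop(new Date(1_000L)); // stop precedes start
        System.out.println(job.notifications.size()); // prints 1
    }
}
```

Note that in this sketch the value is stored even when the ordering is inconsistent; the check only reports the problem.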
-
setOrderXMLDoc
public void setOrderXMLDoc(HeritrixTemplate doc)
Set the orderxml for this job.
- Parameters:
doc
- An orderxml to be used by this job
-
getOrderXMLdoc
public HeritrixTemplate getOrderXMLdoc()
Gets a document representation of the order.xml associated with this Job.
- Returns:
- the order.xml as a HeritrixTemplate
-
setSeedList
public void setSeedList(String seedList)
Set the seedlist of the job from the seedList argument. Individual seeds are separated by a '\n' character. Duplicate seeds are removed.
- Parameters:
seedList
- List of seeds as one String
-
getSeedListAsString
public String getSeedListAsString()
Get the seedlist as a String. The individual seeds are separated by the character '\n'. The order of the seeds is unspecified.
- Returns:
- the seedlist as a String
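The seed list handling described for setSeedList and getSeedListAsString (newline-separated seeds, duplicates removed) can be sketched with standard collections. `SeedListSketch` and its methods are hypothetical; trimming whitespace and skipping blank lines are extra assumptions of this example, not documented behaviour.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not the NetarchiveSuite implementation.
final class SeedListSketch {
    /** Splits on '\n' and drops duplicates (blank lines are skipped by assumption). */
    static Set<String> parse(String seedList) {
        Set<String> seeds = new LinkedHashSet<>();
        for (String line : seedList.split("\n")) {
            String seed = line.trim();
            if (!seed.isEmpty()) {
                seeds.add(seed);
            }
        }
        return seeds;
    }

    /** Joins the seeds with '\n', mirroring getSeedListAsString. */
    static String asString(Set<String> seeds) {
        return String.join("\n", seeds);
    }

    /** Returns the seeds in natural String order, mirroring getSortedSeedList. */
    static List<String> sorted(Set<String> seeds) {
        List<String> result = new ArrayList<>(seeds);
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        Set<String> seeds = parse("http://b.example/\nhttp://a.example/\nhttp://b.example/");
        System.out.println(seeds.size());  // prints 2
        System.out.println(sorted(seeds)); // prints [http://a.example/, http://b.example/]
    }
}
```

Callers that need a stable order should rely on the sorted form, since the string form makes no ordering promise.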
-
getStatus
public JobStatus getStatus()
Get the current status of this Job.
- Returns:
- the status as a JobStatus
-
setStatus
public void setStatus(JobStatus newStatus)
Sets status of this job.
- Parameters:
newStatus
- Must be one of the values STATUS_NEW, ..., STATUS_FAILED
- Throws:
ArgumentNotValid
- in case of invalid status argument or invalid status change
-
getDomainConfigurationMap
public Map<String,String> getDomainConfigurationMap()
Returns a map of domain names and name of their corresponding configuration. The returned Map cannot be changed.
- Returns:
- a read-only Map<String,String> from domain name to configuration name
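A read-only view like the one getDomainConfigurationMap is documented to return can be produced with `Collections.unmodifiableMap`. Whether the real class uses this exact call is an assumption; `ReadOnlyMapSketch` and its `put` method are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Job; only the map accessor is modelled.
final class ReadOnlyMapSketch {
    private final Map<String, String> domainConfigurationMap = new HashMap<>();

    void put(String domainName, String configurationName) {
        domainConfigurationMap.put(domainName, configurationName);
    }

    /** Returns a view that reflects the internal map but rejects modification. */
    Map<String, String> getDomainConfigurationMap() {
        return Collections.unmodifiableMap(domainConfigurationMap);
    }

    public static void main(String[] args) {
        ReadOnlyMapSketch job = new ReadOnlyMapSketch();
        job.put("example.org", "defaultconfig");
        Map<String, String> view = job.getDomainConfigurationMap();
        System.out.println(view.get("example.org")); // prints defaultconfig
        try {
            view.put("example.net", "other");
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only"); // prints read-only
        }
    }
}
```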
-
getMaxObjectsPerDomain
public long getMaxObjectsPerDomain()
Gets the maximum number of objects harvested per domain.
- Returns:
- The maximum number of objects harvested per domain. 0 means no limit.
-
getMaxBytesPerDomain
public long getMaxBytesPerDomain()
Gets the maximum number of bytes harvested per domain.
- Returns:
- The maximum number of bytes harvested per domain. -1 means no limit.
-
setHarvestChannel
public void setHarvestChannel(HarvestChannel harvestChannel)
-
getChannel
public String getChannel()
- Returns:
- the associated HarvestChannel name.
-
setChannel
public void setChannel(String channel)
Sets the associated HarvestChannel name.
- Parameters:
channel
- the channel name
-
isSnapshot
public boolean isSnapshot()
- Returns:
- true if the job belongs to a snapshot harvest, false if it belongs to a focused harvest.
-
setSnapshot
public void setSnapshot(boolean isSnapshot)
Sets whether the job belongs to a snapshot or focused harvest.
- Parameters:
isSnapshot
- true if the job belongs to a snapshot harvest, false if it belongs to a focused harvest.
-
getForceMaxObjectsPerDomain
public long getForceMaxObjectsPerDomain()
- Returns:
- Returns the forceMaxObjectsPerDomain. 0 means no limit.
-
setMaxObjectsPerDomain
protected void setMaxObjectsPerDomain(long maxObjectsPerDomain)
Sets the maxObjectsPerDomain value.
- Parameters:
maxObjectsPerDomain
- The forceMaxObjectsPerDomain to set. 0 means no limit.
- Throws:
IOFailure
- Thrown from auxiliary method editOrderXML_maxObjectsPerDomain.
-
setMaxBytesPerDomain
protected void setMaxBytesPerDomain(long maxBytesPerDomain)
Set the maxbytes per domain value.
- Parameters:
maxBytesPerDomain
- The maxBytesPerDomain to set, or -1 for no limit.
-
setMaxJobRunningTime
protected void setMaxJobRunningTime(long maxJobRunningTime)
Set the maxJobRunningTime value.
- Parameters:
maxJobRunningTime
- The maxJobRunningTime in seconds to set, or 0 for no limit.
-
getMaxJobRunningTime
public long getMaxJobRunningTime()
- Returns:
- Returns the MaxJobRunningTime. 0 means no limit.
-
getHarvestNum
public int getHarvestNum()
Get the harvestNum for this job. The number reflects which run of the harvest definition this is.
- Returns:
- the harvestNum for this job.
-
setHarvestNum
public void setHarvestNum(int harvestNum)
Set the harvestNum for this job. The number reflects which run of the harvest definition this is. ONLY TO BE USED IN THE CONSTRUCTION PHASE.
- Parameters:
harvestNum
- a given harvestNum
-
getHarvestErrors
public String getHarvestErrors()
Get the list of harvest errors for this job. If there are no harvest errors, null is returned. This value is not meaningful until the job is finished (FAILED, DONE, RESUBMITTED).
- Returns:
- the harvest errors for this job or null if no harvest errors.
-
appendHarvestErrors
public void appendHarvestErrors(String harvestErrors)
Append to the list of harvest errors for this job. Nothing happens if the argument harvestErrors is null.
- Parameters:
harvestErrors
- a string containing harvest errors (may be null)
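The append/get contract shared by the four error accessors (a null argument is ignored, and the getter returns null until something has been appended) can be sketched as follows. `ErrorLogSketch` is hypothetical, and joining entries with '\n' is an assumption of this example; the documentation does not say how appended entries are separated.

```java
// Hypothetical stand-in for the harvest-error bookkeeping in Job.
final class ErrorLogSketch {
    // Stays null until the first non-null append, matching the documented getter.
    private String harvestErrors;

    void appendHarvestErrors(String errors) {
        if (errors == null) {
            return; // documented: nothing happens for a null argument
        }
        // The '\n' separator is an assumption made for this sketch.
        harvestErrors = (harvestErrors == null) ? errors : harvestErrors + "\n" + errors;
    }

    String getHarvestErrors() {
        return harvestErrors;
    }

    public static void main(String[] args) {
        ErrorLogSketch log = new ErrorLogSketch();
        log.appendHarvestErrors(null);
        System.out.println(log.getHarvestErrors()); // prints null
        log.appendHarvestErrors("crawl aborted");
        System.out.println(log.getHarvestErrors()); // prints crawl aborted
    }
}
```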
-
getHarvestErrorDetails
public String getHarvestErrorDetails()
Get the list of harvest error details for this job. If there are no harvest error details, null is returned. This value is not meaningful until the job is finished (FAILED, DONE, RESUBMITTED).
- Returns:
- the list of harvest error details for this job or null if no harvest error details.
-
appendHarvestErrorDetails
public void appendHarvestErrorDetails(String harvestErrorDetails)
Append to the list of harvest error details for this job. Nothing happens if the argument harvestErrorDetails is null.
- Parameters:
harvestErrorDetails
- a string containing harvest error details.
-
getUploadErrors
public String getUploadErrors()
Get the list of upload errors. If there are no upload errors, null is returned. This value is not meaningful until the job is finished (FAILED, DONE, RESUBMITTED).
- Returns:
- the list of upload errors as String, or null if no upload errors.
-
appendUploadErrors
public void appendUploadErrors(String uploadErrors)
Append to the list of upload errors. Nothing happens if the argument uploadErrors is null.
- Parameters:
uploadErrors
- a string containing upload errors.
-
getUploadErrorDetails
public String getUploadErrorDetails()
Get the list of upload error details. If there are no upload error details, null is returned. This value is not meaningful until the job is finished (FAILED, DONE, RESUBMITTED).
- Returns:
- the list of upload error details as String, or null if no upload error details
-
appendUploadErrorDetails
public void appendUploadErrorDetails(String uploadErrorDetails)
Append to the list of upload error details. Nothing happens if the argument uploadErrorDetails is null.
- Parameters:
uploadErrorDetails
- a string containing upload error details.
-
getResubmittedAsJob
public Long getResubmittedAsJob()
Get the ID for the job which this job was resubmitted as. If null, this job has not been resubmitted.
- Returns:
- this ID.
-
setSubmittedDate
public void setSubmittedDate(Date submittedDate)
Set the Date for when this job was submitted. If null, this job has not been submitted.
- Parameters:
submittedDate
- The date when this was submitted
-
setCreationDate
public void setCreationDate(Date creationDate)
Set the Date for when this job was created. If null, this job has not been created.
- Parameters:
creationDate
- The date when this was created
-
setResubmittedAsJob
public void setResubmittedAsJob(Long resubmittedAsJob)
Set the ID for the job which this job was resubmitted as.
- Parameters:
resubmittedAsJob
- An Id for a new job.
-
getContinuationOf
public Long getContinuationOf()
- Returns:
- id of the job that this job is supposed to continue using Heritrix recover-log or null if it starts from scratch.
-
getHarvestFilenamePrefix
public String getHarvestFilenamePrefix()
Description copied from interface: JobInfo
Get the harvestFilename prefix.
- Specified by:
getHarvestFilenamePrefix in interface JobInfo
- Returns:
- the harvestFilename prefix.
-
setHarvestFilenamePrefix
public void setHarvestFilenamePrefix(String prefix)
- Parameters:
prefix
-
-
getForceMaxBytesPerDomain
public long getForceMaxBytesPerDomain()
- Returns:
- the forceMaxBytesPerDomain
-
isConfigurationSetsObjectLimit
public boolean isConfigurationSetsObjectLimit()
- Returns:
- the configurationSetsObjectLimit
-
isConfigurationSetsByteLimit
public boolean isConfigurationSetsByteLimit()
- Returns:
- the configurationSetsByteLimit
-
getMinCountObjects
public long getMinCountObjects()
- Returns:
- the minCountObjects
-
getMaxCountObjects
public long getMaxCountObjects()
- Returns:
- the maxCountObjects
-
getTotalCountObjects
public long getTotalCountObjects()
- Returns:
- the totalCountObjects
-
getHarvestAudience
public String getHarvestAudience()
- Returns:
- the harvest-audience.
-
setHarvestAudience
public void setHarvestAudience(String theAudience)
Set the harvest audience for this job. Taken from the harvest definition that generated this job.
- Parameters:
theAudience
- the harvest-audience.
-
-