Class DefaultJobGenerator
- java.lang.Object
-
- dk.netarkivet.harvester.scheduler.jobgen.DefaultJobGenerator
-
- All Implemented Interfaces:
JobGenerator
public class DefaultJobGenerator extends Object
The legacy job generator implementation. It aims to generate jobs that execute in a predictable time by using statistics from previous crawls.
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description)
- static class DefaultJobGenerator.CompareConfigsDesc: Compares two configurations using the following order: 1) harvest template, 2) byte limit, 3) expected number of objects that a harvest of the configuration will produce.
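The three-level ordering described for CompareConfigsDesc can be illustrated with a chained comparator. This is only a sketch: the `Cfg` class and its fields are hypothetical stand-ins rather than the real DomainConfiguration API, and sorting the two numeric keys largest-first is an assumption suggested by the "Desc" suffix, not something this page states.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ConfigOrderingSketch {
    /** Hypothetical stand-in for a domain configuration; not the real DomainConfiguration. */
    static final class Cfg {
        final String template;
        final long byteLimit;
        final long expectedObjects;

        Cfg(String template, long byteLimit, long expectedObjects) {
            this.template = template;
            this.byteLimit = byteLimit;
            this.expectedObjects = expectedObjects;
        }
    }

    /**
     * Orders by template name, then byte limit, then expected object count.
     * Assumption: both numeric keys sort largest-first.
     */
    static final Comparator<Cfg> BY_TEMPLATE_LIMIT_SIZE =
            Comparator.comparing((Cfg c) -> c.template)
                    .thenComparing(Comparator.comparingLong((Cfg c) -> c.byteLimit).reversed())
                    .thenComparing(Comparator.comparingLong((Cfg c) -> c.expectedObjects).reversed());

    /** Returns a sorted copy and leaves the input list untouched. */
    static List<Cfg> sorted(List<Cfg> input) {
        List<Cfg> copy = new ArrayList<>(input);
        copy.sort(BY_TEMPLATE_LIMIT_SIZE);
        return copy;
    }
}
```

Grouping by template first keeps configurations that could share a job next to each other in the sorted list.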
-
Constructor Summary
Constructors (Constructor, Description)
- DefaultJobGenerator()
-
Method Summary
Methods (Modifier and Type, Method, Description)
- boolean canAccept(Job job, DomainConfiguration cfg, DomainConfiguration previousCfg): Tests if a configuration fits into this Job.
- protected boolean checkSpecificAcceptConditions(Job job, DomainConfiguration cfg)
- protected void editJobOrderXml(Job job): Once the job has been filled with DomainConfigurations, performs the following operations: Edit the harvest template to add/remove deduplicator configuration.
- int generateJobs(HarvestDefinition harvest): Generates a series of jobs for the given harvest definition.
- protected Comparator<DomainConfiguration> getDomainConfigurationSubsetComparator(HarvestDefinition harvest): Returns a comparator used to sort the subset of DOMAIN_CONFIG_SUBSET_SIZE configurations that are scanned at each iteration.
- static DefaultJobGenerator getInstance()
- Job getNewJob(HarvestDefinition harvest, DomainConfiguration cfg): Instantiates a new job.
- boolean ignoreConfiguration(DomainConfiguration cfg): Tests if this configuration should be ignored.
- protected int processDomainConfigurationSubset(HarvestDefinition harvest, Iterator<DomainConfiguration> domainConfSubset): Creates new jobs from a collection of configurations.
- static void reset(): Only to be used by unit tests.
-
-
-
Method Detail
-
getInstance
public static DefaultJobGenerator getInstance()
- Returns:
- the singleton instance, builds it if necessary.
-
getDomainConfigurationSubsetComparator
protected Comparator<DomainConfiguration> getDomainConfigurationSubsetComparator(HarvestDefinition harvest)
Returns a comparator used to sort the subset of DOMAIN_CONFIG_SUBSET_SIZE configurations that are scanned at each iteration.
- Parameters:
harvest
- the HarvestDefinition being processed.
- Returns:
- a comparator
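The "scan a fixed-size subset, then sort it" step can be sketched generically. The class and method names below are hypothetical and not part of NetarchiveSuite. The subset size is a parameter here because this page does not give the value of DOMAIN_CONFIG_SUBSET_SIZE.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

final class SubsetScanSketch {
    /**
     * Pulls at most subsetSize items from the iterator and returns them
     * sorted with the given comparator. Returns an empty list once the
     * iterator is exhausted.
     */
    static <T> List<T> nextSortedSubset(Iterator<T> source, int subsetSize, Comparator<T> order) {
        List<T> subset = new ArrayList<>();
        while (subset.size() < subsetSize && source.hasNext()) {
            subset.add(source.next());
        }
        subset.sort(order);
        return subset;
    }
}
```

Calling `nextSortedSubset` repeatedly on the same iterator walks the whole input one subset at a time, so only one subset needs to be held and sorted in memory at once.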
-
processDomainConfigurationSubset
protected int processDomainConfigurationSubset(HarvestDefinition harvest, Iterator<DomainConfiguration> domainConfSubset)
Creates new jobs from a collection of configurations. All configurations must use the same order.xml file.
- Parameters:
harvest
- the HarvestDefinition being processed.
domainConfSubset
- the configurations to use to create the jobs.
- Returns:
- the number of jobs created.
- Throws:
ArgumentNotValid
- if any of the parameters is null or if domainConfSubset does not contain any configurations.
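One plausible way to turn a sequence of configurations into a job count is greedy packing: keep adding to the current job until the next configuration would exceed a limit, then open a new job. The sketch below assumes that strategy; it is not taken from the DefaultJobGenerator source. All names are hypothetical, each configuration is reduced to one expected-object estimate, and IllegalArgumentException stands in for ArgumentNotValid.

```java
import java.util.Iterator;

final class JobPackingSketch {
    /**
     * Counts the jobs needed when each job may hold at most maxObjectsPerJob
     * expected objects. A configuration that does not fit into the current
     * job opens a new one; an oversized configuration gets a job of its own.
     */
    static int countJobs(Iterator<Long> expectedObjects, long maxObjectsPerJob) {
        if (expectedObjects == null || !expectedObjects.hasNext()) {
            // Stand-in for the documented ArgumentNotValid condition.
            throw new IllegalArgumentException("no configurations given");
        }
        int jobs = 0;
        long used = 0;
        while (expectedObjects.hasNext()) {
            long size = expectedObjects.next();
            if (jobs == 0 || used + size > maxObjectsPerJob) {
                jobs++;   // open a new job
                used = 0;
            }
            used += size;
        }
        return jobs;
    }
}
```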
-
checkSpecificAcceptConditions
protected boolean checkSpecificAcceptConditions(Job job, DomainConfiguration cfg)
Called by JobGenerator.canAccept(Job, DomainConfiguration, DomainConfiguration). Tests the implementation-specific conditions to accept the given DomainConfiguration in the given Job. It is assumed that checkAddDomainConfInvariant(Job, DomainConfiguration, DomainConfiguration) has already passed.
- Parameters:
job
- the Job being built
cfg
- the DomainConfiguration to test
- Returns:
- true if the configuration passes the conditions.
-
reset
public static void reset()
Only to be used by unit tests.
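The getInstance()/reset() pair follows a common lazy-singleton shape, sketched below with a hypothetical class. Two details are assumptions: the `synchronized` modifiers, and the private constructor. The real class lists a public constructor in its Constructor Summary.

```java
final class SingletonGeneratorSketch {
    // Shared instance, built lazily on first request.
    private static SingletonGeneratorSketch instance;

    private SingletonGeneratorSketch() {
    }

    /** Returns the shared instance, building it if necessary. */
    static synchronized SingletonGeneratorSketch getInstance() {
        if (instance == null) {
            instance = new SingletonGeneratorSketch();
        }
        return instance;
    }

    /** Drops the shared instance so the next getInstance() call builds a fresh one. */
    static synchronized void reset() {
        instance = null;
    }
}
```

A test would typically call `reset()` in its setup or teardown so that state from one test case does not leak into the next.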
-
generateJobs
public int generateJobs(HarvestDefinition harvest)
Description copied from interface: JobGenerator
Generates a series of jobs for the given harvest definition. Note that a job generator is expected to follow the singleton pattern, so implementations of this method should be thread-safe.
- Specified by:
generateJobs in interface JobGenerator
- Parameters:
harvest
- the harvest definition to process.
- Returns:
- the number of jobs that were generated.
-
getNewJob
public Job getNewJob(HarvestDefinition harvest, DomainConfiguration cfg)
Instantiates a new job.
- Parameters:
cfg
- the DomainConfiguration being processed
harvest
- the HarvestDefinition being processed
- Returns:
- an instance of Job
-
canAccept
public boolean canAccept(Job job, DomainConfiguration cfg, DomainConfiguration previousCfg)
Description copied from interface: JobGenerator
Tests if a configuration fits into this Job. First tests whether the configuration has the right type of order template and byte limit, and whether the byte limit is right for the job. The Job limits are then compared against the configuration estimates; returns true if no limits are exceeded, false otherwise.
- Specified by:
canAccept in interface JobGenerator
- Parameters:
job
- the job being built.
cfg
- the configuration to check
previousCfg
- if not null, the configuration added to this job immediately prior
- Returns:
- true if adding the configuration to this Job does not exceed any of the Job limits.
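The two-stage check described above (compatibility first, then limits against estimates) might look roughly like the following. This is a simplified sketch with hypothetical types: it models a single object-count limit, treats "right byte limit" as strict equality, and leaves out the previousCfg parameter. None of these choices is confirmed by this page.

```java
final class AcceptCheckSketch {
    /** Hypothetical configuration estimate; not the real DomainConfiguration. */
    static final class Cfg {
        final String template;
        final long byteLimit;
        final long expectedObjects;

        Cfg(String template, long byteLimit, long expectedObjects) {
            this.template = template;
            this.byteLimit = byteLimit;
            this.expectedObjects = expectedObjects;
        }
    }

    /** Hypothetical job under construction with one object-count limit. */
    static final class JobSketch {
        final String template;
        final long byteLimit;
        final long maxObjects;
        long plannedObjects;

        JobSketch(String template, long byteLimit, long maxObjects) {
            this.template = template;
            this.byteLimit = byteLimit;
            this.maxObjects = maxObjects;
        }

        void add(Cfg cfg) {
            plannedObjects += cfg.expectedObjects;
        }
    }

    /** Returns true if cfg is compatible with the job and does not push it past its limit. */
    static boolean canAccept(JobSketch job, Cfg cfg) {
        if (!job.template.equals(cfg.template)) {
            return false; // different order template
        }
        if (job.byteLimit != cfg.byteLimit) {
            return false; // assumption: byte limits must match exactly
        }
        return job.plannedObjects + cfg.expectedObjects <= job.maxObjects;
    }
}
```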
-
editJobOrderXml
protected void editJobOrderXml(Job job)
Once the job has been filled with DomainConfigurations, performs the following operations:
- Edit the harvest template to add/remove deduplicator configuration.
- Parameters:
job
- the job
-
ignoreConfiguration
public boolean ignoreConfiguration(DomainConfiguration cfg)
Description copied from interface: JobGenerator
Tests if this configuration should be ignored.
- Specified by:
ignoreConfiguration in interface JobGenerator
- Parameters:
cfg
- a domain configuration
- Returns:
- true if we should ignore this configuration (it could be that it is disabled in some way, or that all seeds are prefixed with a '#' and so there are no active seeds).
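The "all seeds are prefixed with a '#'" condition can be sketched as a simple scan over seed lines. The helper below is hypothetical and works on plain strings rather than the real DomainConfiguration. Trimming whitespace and treating blank lines as inactive are assumptions.

```java
import java.util.List;

final class SeedFilterSketch {
    /**
     * Returns true when no seed is active, i.e. every line is blank or
     * starts with '#'. An empty seed list also counts as having no active seeds.
     */
    static boolean hasNoActiveSeeds(List<String> seeds) {
        for (String seed : seeds) {
            String trimmed = seed.trim();
            if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                return false; // found at least one active seed
            }
        }
        return true;
    }
}
```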
-
-