java.lang.Object dk.netarkivet.harvester.HarvesterSettings
public class HarvesterSettings
Settings specific to the harvester module of NetarchiveSuite.
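Each field listed below is a string constant holding the dotted name of a setting, not the setting's value. As a rough illustration of how such keys might be used for typed lookups, the following sketch uses a plain in-memory map. `SettingsSketch`, its methods, and the example values are hypothetical stand-ins and not part of NetarchiveSuite; only the two key strings are taken from the field summary.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a settings store; not the NetarchiveSuite API. */
class SettingsSketch {
    // Setting keys copied from the HarvesterSettings field summary.
    static final String HERITRIX_GUI_PORT =
            "settings.harvester.harvesting.heritrix.guiPort";
    static final String DEDUPLICATION_ENABLED =
            "settings.harvester.harvesting.deduplication.enabled";

    private final Map<String, String> values = new HashMap<>();

    /** Stores a raw string value under the given key. */
    void set(String key, String value) {
        values.put(key, value);
    }

    /** Returns the raw string value, failing loudly on unknown keys. */
    String get(String key) {
        String value = values.get(key);
        if (value == null) {
            throw new IllegalArgumentException("Unknown setting: " + key);
        }
        return value;
    }

    /** Parses the value as an int; throws NumberFormatException if malformed. */
    int getInt(String key) {
        return Integer.parseInt(get(key).trim());
    }

    /** Parses the value as a boolean ("true", case-insensitive, is true). */
    boolean getBoolean(String key) {
        return Boolean.parseBoolean(get(key).trim());
    }

    public static void main(String[] args) {
        SettingsSketch settings = new SettingsSketch();
        settings.set(HERITRIX_GUI_PORT, "8090");      // example value only
        settings.set(DEDUPLICATION_ENABLED, "true");  // example value only
        System.out.println(settings.getInt(HERITRIX_GUI_PORT));
        System.out.println(settings.getBoolean(DEDUPLICATION_ENABLED));
    }
}
```

Keeping the key in a named constant means a misspelled setting name fails at compile time rather than at lookup time.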
Field Summary | |
---|---|
static java.lang.String |
ABORT_IF_CONNECTION_LOST
settings.harvester.harvesting.heritrix.abortIfConnectionLost: Boolean flag. |
static java.lang.String |
ALIAS_TIMEOUT
settings.harvester.aliases.timeout: The amount of time in seconds before an alias times out and needs to be re-evaluated. |
static java.lang.String |
CRAWL_LOOP_WAIT_TIME
settings.harvester.harvesting.heritrix.crawlLoopWaitTime: Time interval in seconds to wait during a crawl loop in the harvest controller. |
static java.lang.String |
CRAWLER_TIMEOUT_NON_RESPONDING
settings.harvester.harvesting.heritrix.noresponseTimeout: The timeout value (in seconds) used in HeritrixLauncher for aborting crawl when no bytes are being received from web servers. |
static java.lang.String |
DEDUPLICATION_ENABLED
settings.harvester.harvesting.deduplication.enabled: This setting tells the system whether or not to use deduplication. |
static java.lang.String |
DEFAULT_SEEDLIST
settings.harvester.datamodel.domain.defaultSeedlist: Default name of the seedlist to use when new domains are created. |
static java.lang.String |
DISPATCH_JOBS_PERIOD
The period between checking if new jobs should be dispatched to the harvest servers. |
static java.lang.String |
DOMAIN_CONFIG_MAXBYTES
settings.harvester.datamodel.domain.defaultMaxbytes: Default byte limit for domain configuration. |
static java.lang.String |
DOMAIN_CONFIG_MAXOBJECTS
settings.harvester.datamodel.domain.defaultMaxobjects: Default object limit for domain configuration. |
static java.lang.String |
DOMAIN_CONFIG_MAXRATE
settings.harvester.datamodel.domain.defaultMaxrate: Default download rate for domain configuration. |
static java.lang.String |
DOMAIN_DEFAULT_CONFIG
settings.harvester.datamodel.domain.defaultConfig: The name of a configuration that is created by default and which is initially used for snapshot harvests. |
static java.lang.String |
DOMAIN_DEFAULT_ORDERXML
settings.harvester.datamodel.domain.defaultOrderxml: Name of order xml template used for domains if nothing else is specified. |
static java.lang.String |
ERRORFACTOR_PERMITTED_BESTGUESS
settings.harvester.scheduler.errorFactorBestGuess: Used when calculating expected size of a harvest of some configuration during job-creation process. |
static java.lang.String |
ERRORFACTOR_PERMITTED_PREVRESULT
settings.harvester.scheduler.errorFactorPrevResult: Used when calculating expected size of a harvest of some configuration during job-creation process. |
static java.lang.String |
EXPECTED_AVERAGE_BYTES_PER_OBJECT
settings.harvester.scheduler.expectedAverageBytesPerObject: How many bytes the average object is expected to be on domains where we don't know any better. |
static java.lang.String |
FRONTIER_REPORT_FILTER_ARGS
settings.harvester.harvesting.frontier.filter.args: Defines a frontier report filter's arguments. |
static java.lang.String |
FRONTIER_REPORT_FILTER_CLASS
settings.harvester.harvesting.frontier.filter.class: Defines a filter to apply to the full frontier report. |
static java.lang.String |
FRONTIER_REPORT_WAIT_TIME
settings.harvester.harvesting.frontier.frontierReportWaitTime: Time interval in seconds to wait between two requests to generate a full frontier report. |
static java.lang.String |
GENERATE_JOBS_PERIOD
settings.harvester.scheduler.jobgenerationperiode: The period between checking if new jobs should be generated, in seconds. |
static java.lang.String |
HARVEST_CONTROLLER_OLDJOBSDIR
settings.harvester.harvesting.oldjobsDir: The directory in which data from old jobs is kept after uploading. |
static java.lang.String |
HARVEST_CONTROLLER_PRIORITY
settings.harvester.harvesting.queuePriority: Pool to take jobs from. |
static java.lang.String |
HARVEST_CONTROLLER_SERVERDIR
settings.harvester.harvesting.serverDir: Each job gets a subdir of this dir. |
static java.lang.String |
HARVEST_MONITOR_DISPLAYED_HISTORY_SIZE
settings.harvester.monitor.displayedHistorySize: Maximum number of most recent history records displayed on the running job details page. |
static java.lang.String |
HARVEST_MONITOR_HISTORY_CHART_GEN_INTERVAL
settings.harvester.monitor.historyChartGenIntervall: Time interval in seconds between regenerating the chart of historical data for a running job. |
static java.lang.String |
HARVEST_MONITOR_HISTORY_SAMPLE_RATE
settings.harvester.monitor.historySampleRate: Time interval in seconds between historical records stored in the DB. |
static java.lang.String |
HARVEST_MONITOR_REFRESH_INTERVAL
settings.harvester.monitor.refreshInterval: Time interval in seconds after which the harvest monitor pages will be automatically refreshed. |
static java.lang.String |
HARVEST_REPORT_CLASS
settings.harvester.harvesting.harvestReport: The implementation of the HarvestReport interface to be used. |
static java.lang.String |
HARVEST_SERVERDIR_MINSPACE
settings.harvester.harvesting.minSpaceLeft: The minimum amount of free bytes in the serverDir required before accepting any harvest-jobs. |
static java.lang.String |
HERITRIX_ADMIN_NAME
settings.harvester.harvesting.heritrix.adminName: The name used to access the Heritrix GUI. |
static java.lang.String |
HERITRIX_ADMIN_PASSWORD
settings.harvester.harvesting.heritrix.adminPassword: The password used to access the Heritrix GUI. |
static java.lang.String |
HERITRIX_CONTROLLER_CLASS
settings.harvester.harvesting.heritrixControllerClass: The implementation of the HeritrixController interface to be used. |
static java.lang.String |
HERITRIX_GUI_PORT
settings.harvester.harvesting.heritrix.guiPort: Port used to access the Heritrix web user interface. |
static java.lang.String |
HERITRIX_HEAP_SIZE
settings.harvester.harvesting.heritrix.heapSize: The heap size to use for the Heritrix sub-process. |
static java.lang.String |
HERITRIX_JMX_PASSWORD
settings.harvester.harvesting.heritrix.jmxPassword: The password used to connect to the Heritrix JMX interface. The password must correspond to the value stored in the jmxremote.password file (name defined in setting settings.common.jmx.passwordFile). |
static java.lang.String |
HERITRIX_JMX_PORT
settings.harvester.harvesting.heritrix.jmxPort: The port that Heritrix uses to expose its JMX interface. |
static java.lang.String |
HERITRIX_JMX_USERNAME
settings.harvester.harvesting.heritrix.jmxUsername: The username used to connect to the Heritrix JMX interface. The username must correspond to the value stored in the jmxremote.password file (name defined in setting settings.common.jmx.passwordFile). |
static java.lang.String |
HERITRIX_JVM_OPTS
settings.harvester.harvesting.heritrix.javaOpts: Additional JVM options for the Heritrix sub-process. |
static java.lang.String |
HERITRIX_LAUNCHER_CLASS
settings.harvester.harvesting.heritrixLauncherClass: The implementation of the HeritrixLauncher abstract class to be used. |
static java.lang.String |
INACTIVITY_TIMEOUT_IN_SECS
settings.harvester.harvesting.heritrix.inactivityTimeout: The timeout setting for aborting a crawl based on crawler-inactivity. |
static java.lang.String |
JOB_TIMEOUT_TIME
settings.harvester.scheduler.jobtimeouttime: Time before a STARTED job times out and changes status to FAILED. |
static java.lang.String |
JOBS_MAX_RELATIVE_SIZE_DIFFERENCE
settings.harvester.scheduler.jobs.maxRelativeSizeDifference: The maximum allowed relative difference in expected number of objects retrieved in a single job definition. |
static java.lang.String |
JOBS_MAX_TIME_TO_COMPLETE
settings.harvester.scheduler.jobs.maxTimeToCompleteJob: The limit on how many seconds Heritrix should continue on each job. |
static java.lang.String |
JOBS_MAX_TOTAL_JOBSIZE
settings.harvester.scheduler.jobs.maxTotalSize: When this limit is exceeded no more configurations may be added to a job. |
static java.lang.String |
JOBS_MIN_ABSOLUTE_SIZE_DIFFERENCE
settings.harvester.scheduler.jobs.minAbsoluteSizeDifference: Size differences for jobs below this threshold are ignored, regardless of the limits for the relative size difference. |
static java.lang.String |
MAX_CONFIGS_PER_JOB_CREATION
settings.harvester.scheduler.configChunkSize: How many domain configurations we will process in one go before making jobs out of them. |
static java.lang.String |
MAX_DOMAIN_SIZE
settings.harvester.scheduler.maxDomainSize: The initial guess of the domain size (number of objects) of an unknown domain. |
static java.lang.String |
METADATA_GENERATE_ARCFILES_REPORT
settings.harvester.harvesting.metadata.generateArcFilesReport: This setting is a boolean flag that enables/disables the generation of an ARC files report. |
static java.lang.String |
METADATA_HERITRIX_FILE_PATTERN
settings.harvester.harvesting.metadata.heritrixFilePattern: This setting selects which Heritrix files should be stored in the metadata ARC. |
static java.lang.String |
METADATA_LOG_FILE_PATTERN
settings.harvester.harvesting.metadata.logFilePattern: This setting selects which Heritrix log files should be stored in the metadata ARC. |
static java.lang.String |
METADATA_REPORT_FILE_PATTERN
settings.harvester.harvesting.metadata.reportFilePattern: This setting selects which of the Heritrix files stored in the metadata ARC are to be classified as reports. |
static java.lang.String |
SEND_STATUS_DELAY
settings.harvester.harvesting.sendStatusDelay: Time interval in seconds to wait before transmitting a HarvesterStatusMessage to the HarvestDispatcher. |
static java.lang.String |
SPLIT_BY_OBJECTLIMIT
settings.harvester.scheduler.splitByObjectLimit: By default the byte limit is used as the base criterion for how many domain configurations are put into one harvest job. |
static java.lang.String |
USE_QUOTA_ENFORCER
settings.harvester.scheduler.useQuotaEnforcer: Controls whether the domain configuration object limit should be set in Heritrix's crawl order through the QuotaEnforcer configuration (parameter set to true) or through the frontier parameter 'queue-total-budget' (parameter set to false). |
static java.lang.String |
VALID_SEED_REGEX
settings.harvester.datamodel.domain.validSeedRegex: Regular expression used to validate a seed within a seedlist. |
static java.lang.String |
WAIT_FOR_REPORT_GENERATION_TIMEOUT
settings.harvester.harvesting.heritrix.waitForReportGenerationTimeout: Maximum time in seconds to wait for Heritrix to generate report files once crawling is over. |
Constructor Summary | |
---|---|
HarvesterSettings()
|
Method Summary |
---|
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static java.lang.String DEFAULT_SEEDLIST
public static java.lang.String VALID_SEED_REGEX
public static java.lang.String DOMAIN_DEFAULT_CONFIG
public static java.lang.String DOMAIN_DEFAULT_ORDERXML
public static java.lang.String DOMAIN_CONFIG_MAXRATE
public static java.lang.String DOMAIN_CONFIG_MAXBYTES
public static java.lang.String DOMAIN_CONFIG_MAXOBJECTS
public static java.lang.String ERRORFACTOR_PERMITTED_PREVRESULT
public static java.lang.String ERRORFACTOR_PERMITTED_BESTGUESS
public static java.lang.String EXPECTED_AVERAGE_BYTES_PER_OBJECT
public static java.lang.String MAX_DOMAIN_SIZE
public static java.lang.String JOBS_MAX_RELATIVE_SIZE_DIFFERENCE
public static java.lang.String JOBS_MIN_ABSOLUTE_SIZE_DIFFERENCE
public static java.lang.String JOBS_MAX_TOTAL_JOBSIZE
public static java.lang.String JOBS_MAX_TIME_TO_COMPLETE
public static java.lang.String MAX_CONFIGS_PER_JOB_CREATION
public static java.lang.String SPLIT_BY_OBJECTLIMIT
public static java.lang.String USE_QUOTA_ENFORCER
public static java.lang.String JOB_TIMEOUT_TIME
public static java.lang.String DISPATCH_JOBS_PERIOD
Note that this should be adjusted with regard to SEND_STATUS_DELAY, and be significantly higher.
This is set by default to 30 seconds (an estimate of the harvest servers'
ability to consume messages being 5 seconds).
public static java.lang.String GENERATE_JOBS_PERIOD
public static java.lang.String HARVEST_CONTROLLER_SERVERDIR
public static java.lang.String HARVEST_SERVERDIR_MINSPACE
public static java.lang.String HARVEST_CONTROLLER_OLDJOBSDIR
public static java.lang.String HARVEST_CONTROLLER_PRIORITY
public static java.lang.String INACTIVITY_TIMEOUT_IN_SECS
public static java.lang.String CRAWLER_TIMEOUT_NON_RESPONDING
public static java.lang.String HARVEST_MONITOR_REFRESH_INTERVAL
public static java.lang.String HARVEST_MONITOR_HISTORY_SAMPLE_RATE
public static java.lang.String HARVEST_MONITOR_HISTORY_CHART_GEN_INTERVAL
public static java.lang.String HARVEST_MONITOR_DISPLAYED_HISTORY_SIZE
public static java.lang.String CRAWL_LOOP_WAIT_TIME
public static java.lang.String SEND_STATUS_DELAY
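settings.harvester.harvesting.sendStatusDelay: Time interval in seconds to wait before transmitting a HarvesterStatusMessage to the HarvestDispatcher.
Note that this should be adjusted with regard to DISPATCH_JOBS_PERIOD, and be significantly smaller.
Default value is 1 second.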
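The relationship between DISPATCH_JOBS_PERIOD and SEND_STATUS_DELAY described above can be expressed as a simple sanity check. The sketch below is an illustration only: the class name, method, and the `minFactor` parameter are hypothetical, since the documentation says "significantly" without giving a ratio. The two default values (30 seconds and 1 second) are the ones quoted in the field descriptions.

```java
/** Sketch of a sanity check for the two timing settings; all names are hypothetical. */
class DispatchTimingCheck {
    // Defaults quoted in the field descriptions, in seconds.
    static final long DEFAULT_DISPATCH_JOBS_PERIOD_SECS = 30;
    static final long DEFAULT_SEND_STATUS_DELAY_SECS = 1;

    /**
     * Returns true when the dispatch period is at least minFactor times the
     * status delay. minFactor is an assumption chosen by the caller; the
     * documentation does not define what "significantly" means.
     */
    static boolean isConsistent(long dispatchPeriodSecs,
                                long sendStatusDelaySecs,
                                long minFactor) {
        if (dispatchPeriodSecs <= 0 || sendStatusDelaySecs <= 0 || minFactor <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        return dispatchPeriodSecs >= sendStatusDelaySecs * minFactor;
    }

    public static void main(String[] args) {
        // The documented defaults pass a 10x check.
        System.out.println(isConsistent(
                DEFAULT_DISPATCH_JOBS_PERIOD_SECS, DEFAULT_SEND_STATUS_DELAY_SECS, 10));
        // A 5-second dispatch period with a 1-second delay does not.
        System.out.println(isConsistent(5, 1, 10));
    }
}
```

A check like this could run at startup so that a misconfigured pair of values is reported before any jobs are dispatched.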
public static java.lang.String FRONTIER_REPORT_WAIT_TIME
public static java.lang.String FRONTIER_REPORT_FILTER_CLASS
TopTotalEnqueuesFilter
public static java.lang.String FRONTIER_REPORT_FILTER_ARGS
public static java.lang.String ABORT_IF_CONNECTION_LOST
BnfHeritrixController
public static java.lang.String WAIT_FOR_REPORT_GENERATION_TIMEOUT
public static java.lang.String HERITRIX_ADMIN_NAME
public static java.lang.String HERITRIX_ADMIN_PASSWORD
public static java.lang.String HERITRIX_GUI_PORT
public static java.lang.String HERITRIX_JMX_PORT
public static java.lang.String HERITRIX_JMX_USERNAME
public static java.lang.String HERITRIX_JMX_PASSWORD
public static java.lang.String HERITRIX_HEAP_SIZE
public static java.lang.String HERITRIX_JVM_OPTS
public static java.lang.String HERITRIX_CONTROLLER_CLASS
public static java.lang.String HERITRIX_LAUNCHER_CLASS
public static java.lang.String HARVEST_REPORT_CLASS
settings.harvester.harvesting.harvestReport: The implementation of the HarvestReport interface to be used.
public static java.lang.String DEDUPLICATION_ENABLED
public static java.lang.String METADATA_HERITRIX_FILE_PATTERN
Pattern
public static java.lang.String METADATA_REPORT_FILE_PATTERN
Pattern
public static java.lang.String METADATA_LOG_FILE_PATTERN
Pattern
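The three METADATA_*_FILE_PATTERN settings each select a subset of Heritrix files by pattern. Assuming the values are regular expressions in `java.util.regex.Pattern` syntax and that a file name must match in full (both are assumptions, as are the class name, file names, and example patterns below), the selection could be sketched like this:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch of pattern-based file selection; not the NetarchiveSuite implementation. */
class MetadataFileFilterSketch {
    /** Returns the file names that match the regex in full, in input order. */
    static List<String> select(String regex, List<String> fileNames) {
        Pattern pattern = Pattern.compile(regex);
        List<String> kept = new ArrayList<>();
        for (String name : fileNames) {
            // matches() requires the whole name to match, unlike find().
            if (pattern.matcher(name).matches()) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical file names and patterns, for illustration only.
        List<String> files = Arrays.asList(
                "order.xml", "crawl.log", "seeds.txt", "crawl-report.txt");
        System.out.println(select(".*\\.log", files));
        System.out.println(select(".*-report\\.txt", files));
    }
}
```

Whether the real code uses full-match or substring semantics matters when writing patterns: under full-match semantics a pattern such as `report` alone would select nothing.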
public static java.lang.String METADATA_GENERATE_ARCFILES_REPORT
HarvestDocumentation.documentHarvest(java.io.File, long, long)
public static java.lang.String ALIAS_TIMEOUT
Constructor Detail |
---|
public HarvesterSettings()