JobGenerator
implementations.ComponentLifeCycle
.
AggregationWorker
singleton contains the schedule and file
bookkeeping functionality needed in the aggregation of indexes.
AggregationWorker
inside
Jetty.Constants.ARCDIRECTORY_NAME
.
DateFormat
object.
DateFormat
objects.
ArchiveFileNaming.
ArcRepositoryClient.store(File).
LegacyHarvestReport
, but is intended to be used with a crawl order
that sets budget using "queue-total-budget" instead of the QuotaEnforcer
(see HarvesterSettings.OBJECT_LIMIT_SET_BY_QUOTA_ENFORCER).
BnfHeritrixController.
CrawlProgressMessage
instance.
AbstractJobGenerator.canAccept(Job, DomainConfiguration)
.
HarvesterSettings.JOB_TIMEOUT_TIME
setting.
ComponentLifeCycle
better control over the component startup
and shutdown phases.
FileBatchJob.processOnlyFilesMatching(String)
construct.
ShowUnusedConfigurations
boolean, which is flipped.
ShowUnusedSeedLists
boolean, which is flipped.
MetadataFileWriter
for ARC output.
MetadataFileWriter
for WARC output.
Settings.SETTINGS_FILE_PROPERTY
is not set.
Document
instance.
DomainConfigurations, performs the
following operations:
Edit the harvest template to add/remove deduplicator configuration.
HarvestMonitor.
FrontierReportLine.
HarvestMonitor.
HarvestReport
implementation defined by the setting
HarvesterSettings.HARVEST_REPORT_CLASS
.
DateFormat
object for ARC date conversion.
File
getArchiveFile() -
Method in class dk.netarkivet.common.utils.archive.HeritrixArchiveHeaderWrapper
Date
object.
AbstractJobGenerator.DOMAIN_CONFIG_SUBSET_SIZE
configurations that are scanned at each iteration.
Set
of header keys.
Map
of all header key/value pairs.
ArchiveFileNaming
implementation defined by the setting
settings.harvester.harvesting.heritrix.archiveNaming.class .
HeritrixLauncher
implementation defined by the setting
dk.netarkivet.harvester.harvesting.heritrixLauncher.class .
JobGenerator
implementation defined by the setting HarvesterSettings.JOBGEN_CLASS
.
DomainStats
object for that
domain, and if not found creates one with zero values.
DateFormat
object for WARC date conversion.
CrawlProgressMessages.
HarvestReport
interface to be used.
HarvestControllerServer
periodically sends
HarvesterReadyMessages to the JobDispatcher
to notify it whether it is available for processing a job or is already processing one.
HarvestJobManager's lifecycle.
HarvestJobManager
application.
CrawlProgressMessages on the proper JMS channel, and
stores information to be presented in the monitoring console.
HarvestReport.
HeritrixLauncher.
BnfHeritrixController
instead
HarvestSchedulerMonitorServer
to the
HarvestMonitor
to notify it that a job ended and should not be
monitored anymore, and that any resource used to monitor this job
should be freed.
DefaultJobGenerator
or FixedDomainConfigurationCountJobGenerator.
FixedDomainConfigurationCountJobGenerator
,
then this parameter toggles whether domain configurations with a budget of zero
(bytes or objects) should be excluded from jobs.
FixedDomainConfigurationCountJobGenerator
,
then this parameter represents the maximum number of domain configurations
in a partial harvest job.
FixedDomainConfigurationCountJobGenerator
,
then this parameter represents the maximum number of domain configurations
in a full harvest job.
JobGenerator.
JobSupervisor.start()
for details.
LinkedHashMap.
HarvestJobManager.
IndexAggregator
.
ARCRecordToSearchResultAdapter
ArcFilesReportGenerator.ArcFileStatus
instance.
ScheduledThreadPoolExecutor, which allows
periodically running one or several Runnable
tasks
(fixed rate execution).
ReplicaCacheDatabase.
DomainnameQueueAssignmentPolicy
where the domain name returned is that of the candidateURI,
except where the domain name of the SeedURI is a different one.
HarvesterReadyMessage
to the JobDispatcher.
start()
method.
BitSet.size()
this does not return the
size in bytes used to represent this set.
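The contrast with BitSet.size() drawn in the entry above can be illustrated with the standard java.util.BitSet alone. The following is a minimal sketch, not code from this API: the class name BitSetSizeDemo is hypothetical, and it only shows that size() reports the storage allocated for the set, while length() and cardinality() describe its logical contents.

```java
import java.util.BitSet;

// Hypothetical demo class (not part of the documented API).
public class BitSetSizeDemo {
    public static void main(String[] args) {
        // Ask for room for at least 10 bits; the implementation may allocate more.
        BitSet bits = new BitSet(10);
        bits.set(3);
        bits.set(7);

        // Number of bits of space currently allocated (implementation dependent).
        System.out.println("size()        = " + bits.size());
        // Index of the highest set bit plus one.
        System.out.println("length()      = " + bits.length());
        // Number of bits that are actually set to true.
        System.out.println("cardinality() = " + bits.cardinality());
    }
}
```

Code that needs the logical number of elements should therefore rely on length() or cardinality() rather than on size().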
ComponentLifeCycle
object.
StartedJobInfo
record to the persistent storage.
StartedJobInfo
record to the persistent storage.
HarvesterDatabaseTables
to the required
version.
CommonSettings.REMOTE_FILE_CLASS
.
Constants.WARCDIRECTORY_NAME
.
DateFormat
object.