public class HarvestControllerServer extends HarvesterMessageHandler implements CleanupIF
During its operation, CrawlStatus messages are sent to the HarvestSchedulerMonitorServer. When the actual harvesting starts, a message with status 'STARTED' is sent. When the harvesting has finished, a message is sent with either status 'DONE' or 'FAILED'. A 'DONE' or 'FAILED' message with the result should ALWAYS be sent if at all possible, but never more than one such message per job.
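The rule of "STARTED first, then exactly one final message" can be enforced with a small guard object. The following is a minimal sketch, not the NetarchiveSuite implementation: `JobStatusReporter` and its methods are hypothetical, one instance is assumed per job, and messages are recorded in a list instead of being sent over JMS to the HarvestSchedulerMonitorServer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical guard for one job: sends STARTED, then at most one DONE or FAILED.
// Messages are recorded in a list here instead of being sent over JMS.
class JobStatusReporter {
    enum Status { STARTED, DONE, FAILED }

    private final List<Status> sent = new ArrayList<>();
    private boolean finalSent = false;

    synchronized void started() {
        sent.add(Status.STARTED);
    }

    // Sends DONE or FAILED; returns false and sends nothing if a final
    // message was already sent for this job.
    synchronized boolean finish(boolean success) {
        if (finalSent) {
            return false;
        }
        finalSent = true;
        sent.add(success ? Status.DONE : Status.FAILED);
        return true;
    }

    // Runs the crawl; the finally block guarantees a final message is sent
    // even if the crawl throws (the exception still propagates to the caller).
    void run(Runnable crawl) {
        started();
        boolean ok = false;
        try {
            crawl.run();
            ok = true;
        } finally {
            finish(ok);
        }
    }

    synchronized List<Status> sentMessages() {
        return new ArrayList<>(sent);
    }
}
```

Putting the final message in a `finally` block is what makes "ALWAYS, if at all possible" hold for exceptions thrown by the crawl, while the `finalSent` flag keeps a second DONE/FAILED from ever going out.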
It must be possible to run the Heritrix harvester on several machines, with several processes on each machine. Each instance of Heritrix is started and monitored by a HarvestControllerServer.
Initially, all directories under serverdir are scanned for harvestInfo files. If any are found, they are parsed for information, and an attempt is made to upload all remaining files to the bitarchive. The server then sends back a CrawlStatusMessage with status 'FAILED'.
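The startup recovery step could look roughly like the sketch below. It is illustrative only: the file name `harvestInfo.xml`, the `Uploader` and `FailureReporter` interfaces, and the assumption that only the direct subdirectories of serverdir are scanned are all hypothetical, and parsing of the harvestInfo file is omitted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LeftoverJobScanner {
    // Hypothetical name of the per-job metadata file.
    static final String HARVEST_INFO_NAME = "harvestInfo.xml";

    // Hypothetical hooks: upload one file to the bitarchive, report a job as FAILED.
    interface Uploader { void upload(Path file) throws IOException; }
    interface FailureReporter { void reportFailed(Path crawlDir); }

    // Lists the direct subdirectories of serverDir that still contain a harvestInfo file.
    static List<Path> findLeftoverCrawlDirs(Path serverDir) throws IOException {
        try (Stream<Path> children = Files.list(serverDir)) {
            return children
                    .filter(Files::isDirectory)
                    .filter(dir -> Files.isRegularFile(dir.resolve(HARVEST_INFO_NAME)))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Attempts to upload every remaining file in each leftover crawl directory,
    // then reports that job as FAILED. Returns the number of files uploaded.
    static int recover(Path serverDir, Uploader uploader, FailureReporter reporter) throws IOException {
        int uploaded = 0;
        for (Path dir : findLeftoverCrawlDirs(serverDir)) {
            List<Path> files;
            try (Stream<Path> entries = Files.list(dir)) {
                files = entries
                        .filter(Files::isRegularFile)
                        .filter(p -> !p.getFileName().toString().equals(HARVEST_INFO_NAME))
                        .sorted()
                        .collect(Collectors.toList());
            }
            for (Path file : files) {
                try {
                    uploader.upload(file);
                    uploaded++;
                } catch (IOException e) {
                    // A failed upload must not prevent the FAILED report for the job.
                    System.err.println("Upload failed for " + file + ": " + e.getMessage());
                }
            }
            reporter.reportFailed(dir);
        }
        return uploaded;
    }
}
```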
A new thread is started for each actual crawl, and the JMS listener is removed from within that thread. Threading is required because JMS does not allow the thread that is executing a listener's callback to remove that same listener.
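The need for a separate thread can be illustrated with a small simulation. This is a hypothetical sketch, not JMS code: `JobQueue` is an in-memory stand-in that mimics the JMS restriction by refusing listener removal from the thread currently delivering a message, and `CrawlStarter` hands each job to a new thread that removes the listener and re-adds it once the crawl is over.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for a JMS queue.
class JobQueue {
    private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();
    private final ThreadLocal<Boolean> delivering = ThreadLocal.withInitial(() -> false);

    void addListener(Consumer<String> l) { listeners.add(l); }

    // Mimics the JMS rule: a listener cannot be removed by the thread delivering to it.
    void removeListener(Consumer<String> l) {
        if (delivering.get()) {
            throw new IllegalStateException("cannot remove a listener from its own callback");
        }
        listeners.remove(l);
    }

    int listenerCount() { return listeners.size(); }

    void deliver(String message) {
        delivering.set(true);
        try {
            for (Consumer<String> l : listeners) {
                l.accept(message);
            }
        } finally {
            delivering.set(false);
        }
    }
}

// Listener that starts one thread per crawl; that thread manages the listener.
class CrawlStarter implements Consumer<String> {
    private final JobQueue queue;
    volatile Thread lastCrawlThread;
    volatile int listenersDuringCrawl = -1;

    CrawlStarter(JobQueue queue) { this.queue = queue; }

    @Override
    public void accept(String job) {
        Thread crawl = new Thread(() -> {
            queue.removeListener(this);                   // allowed: not the delivering thread
            listenersDuringCrawl = queue.listenerCount(); // no new jobs accepted while crawling
            // ... run the crawl for `job` here ...
            queue.addListener(this);                      // listen for new jobs again
        }, "crawl-" + job);
        lastCrawlThread = crawl;
        crawl.start();
    }
}
```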
After a harvest job has terminated, either successfully or unsuccessfully, the serverdir is scanned again for harvestInfo files so that any files not yet uploaded can be uploaded. The server then resumes listening for new jobs, provided there is enough free disk space on the machine. If not, it logs a warning, which is also sent as a notification.
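The space check before resuming could be sketched as follows. `minFreeBytes` is a hypothetical threshold (the actual limit is not described here), and the `resume` and `warn` callbacks stand in for re-adding the JMS listener and for logging plus sending a notification.

```java
import java.io.File;
import java.util.function.Consumer;

class ResumeGate {
    // True if the partition holding serverDir has at least minFreeBytes usable.
    // Note: File.getUsableSpace() returns 0 if the path does not exist.
    static boolean enoughSpace(File serverDir, long minFreeBytes) {
        return serverDir.getUsableSpace() >= minFreeBytes;
    }

    // Resumes listening only if there is enough space; otherwise warns and stays idle.
    static boolean maybeResume(File serverDir, long minFreeBytes,
                               Runnable resume, Consumer<String> warn) {
        if (enoughSpace(serverDir, minFreeBytes)) {
            resume.run();
            return true;
        }
        warn.accept("Not enough free space in " + serverDir
                + " (need " + minFreeBytes + " bytes); not listening for new jobs");
        return false;
    }
}
```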
Modifier and Type | Field and Description |
---|---|
static ChannelID | HARVEST_CHAN_VALID_RESP_ID: The JMS channel on which to listen for HarvesterRegistrationResponses. |
Modifier and Type | Method and Description |
---|---|
void | cleanup(): Will be called on shutdown. |
void | close(): Release all JMS connections. |
static HarvestControllerServer | getInstance(): Returns or creates the unique instance of this singleton. The server creates an instance of the HarvestController, uploads arc-files from unfinished harvests, and starts to listen to JMS messages on the incoming JMS queues. |
void | visit(DoOneCrawlMessage msg): Receives a DoOneCrawlMessage and calls onDoOneCrawl. |
void | visit(HarvesterRegistrationResponse msg): This method should be overridden and implemented by a subclass if message handling is wanted. |
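The two visit overloads follow the visitor pattern: each message calls back the overload matching its own type, and a handler base class supplies defaults that subclasses override. The sketch below uses simplified, hypothetical types (`CrawlRequest`, `RegistrationReply`, `Visitor`, `DefaultHandler`) rather than the real NetarchiveSuite classes, and the assumption that the default implementation rejects the message with an exception is only an illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor over two message types.
interface Visitor {
    void visit(CrawlRequest msg);
    void visit(RegistrationReply msg);
}

abstract class Msg {
    // Double dispatch: each subclass calls the overload for its own type.
    abstract void accept(Visitor v);
}

class CrawlRequest extends Msg {
    final long jobId;
    CrawlRequest(long jobId) { this.jobId = jobId; }
    @Override void accept(Visitor v) { v.visit(this); }
}

class RegistrationReply extends Msg {
    @Override void accept(Visitor v) { v.visit(this); }
}

// Base handler: rejects every message unless a subclass overrides the overload.
abstract class DefaultHandler implements Visitor {
    @Override public void visit(CrawlRequest msg) {
        throw new UnsupportedOperationException("CrawlRequest not handled");
    }
    @Override public void visit(RegistrationReply msg) {
        throw new UnsupportedOperationException("RegistrationReply not handled");
    }
}

// A server-like handler that overrides both overloads.
class ControllerHandler extends DefaultHandler {
    final List<String> log = new ArrayList<>();
    @Override public void visit(CrawlRequest msg) { log.add("crawl " + msg.jobId); }
    @Override public void visit(RegistrationReply msg) { log.add("registration reply"); }
}
```

The benefit of this design is that the receiving code never needs `instanceof` checks: the JMS layer only calls `accept`, and the compiler picks the right overload inside each message class.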
public static final ChannelID HARVEST_CHAN_VALID_RESP_ID
The JMS channel on which to listen for HarvesterRegistrationResponses.

public static HarvestControllerServer getInstance() throws IOFailure
Throws:
PermissionDenied - if the serverdir or oldjobsdir can't be created
IOFailure - if data from old harvests exist, but contain illegal data

public void close()
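getInstance() returns the existing instance or creates it on first use. A minimal sketch of that lazy, thread-safe singleton idiom follows; `ServerSingleton` and its construction counter are hypothetical, and the startup work listed in the method summary is only indicated by a comment.

```java
// Hypothetical lazily created singleton, mirroring the getInstance() contract.
final class ServerSingleton {
    private static ServerSingleton instance;
    private static int constructions = 0; // for illustration only

    private ServerSingleton() {
        constructions++;
        // A real server would create its directories, upload leftover files
        // and start listening on its JMS queues here.
    }

    // synchronized so that concurrent callers cannot create two instances.
    static synchronized ServerSingleton getInstance() {
        if (instance == null) {
            instance = new ServerSingleton();
        }
        return instance;
    }

    static synchronized int constructions() {
        return constructions;
    }
}
```

An alternative is the initialization-on-demand holder idiom, which avoids taking a lock on every call.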
public void cleanup()
Specified by:
cleanup in interface CleanupIF
See Also:
CleanupIF.cleanup()
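cleanup() is documented as being called on shutdown. One common way to arrange that in Java is a JVM shutdown hook; the sketch below assumes that mechanism and does not describe how NetarchiveSuite actually does it. `Cleanable` stands in for CleanupIF, and `CleanupRegistry` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for CleanupIF.
interface Cleanable {
    void cleanup();
}

// Hypothetical registry that cleans up components in reverse registration order.
class CleanupRegistry {
    private final List<Cleanable> items = new ArrayList<>();

    synchronized void register(Cleanable c) {
        items.add(c);
    }

    // Calls cleanup() on every component, newest first; one failure does not stop the rest.
    synchronized void cleanupAll() {
        for (int i = items.size() - 1; i >= 0; i--) {
            try {
                items.get(i).cleanup();
            } catch (RuntimeException e) {
                System.err.println("cleanup failed: " + e);
            }
        }
        items.clear();
    }

    // Runs cleanupAll() when the JVM shuts down (e.g. System.exit or SIGTERM, not SIGKILL).
    void installShutdownHook() {
        Runtime.getRuntime().addShutdownHook(new Thread(this::cleanupAll, "cleanup-hook"));
    }
}
```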
public void visit(DoOneCrawlMessage msg) throws IOFailure, UnknownID, ArgumentNotValid, PermissionDenied
Specified by:
visit in interface HarvesterMessageVisitor
Overrides:
visit in class HarvesterMessageHandler
Parameters:
msg - the message received
Throws:
IOFailure - if the crawl fails, or if unable to write to harvestInfoFile
UnknownID - if jobID is null in the message
ArgumentNotValid - if the status of the job is not valid (it must be SUBMITTED)
PermissionDenied - if the crawldir can't be created

public void visit(HarvesterRegistrationResponse msg)
Description copied from class: HarvesterMessageHandler
This method should be overridden and implemented by a subclass if message handling is wanted.
Specified by:
visit in interface HarvesterMessageVisitor
Overrides:
visit in class HarvesterMessageHandler
Parameters:
msg - a HarvesterRegistrationResponse
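The checks implied by the Throws clause of visit(DoOneCrawlMessage) can be sketched as a validation step; in this sketch it is assumed to run before any crawl directory is created. The exception classes and the reduced `JobStatus` enum are hypothetical stand-ins for UnknownID, ArgumentNotValid and the real job-status type.

```java
// Hypothetical pre-crawl validation mirroring the documented Throws clauses.
class CrawlRequestValidator {
    // Reduced, hypothetical set of job states.
    enum JobStatus { SUBMITTED, STARTED, DONE, FAILED }

    // Stand-in for UnknownID.
    static final class UnknownIdException extends RuntimeException {
        UnknownIdException(String message) { super(message); }
    }

    // Stand-in for ArgumentNotValid.
    static final class InvalidArgumentException extends RuntimeException {
        InvalidArgumentException(String message) { super(message); }
    }

    // Rejects a request whose job ID is missing or whose job is not SUBMITTED.
    static void validate(Long jobId, JobStatus status) {
        if (jobId == null) {
            throw new UnknownIdException("jobID is null in the message");
        }
        if (status != JobStatus.SUBMITTED) {
            throw new InvalidArgumentException(
                    "job " + jobId + " has status " + status + ", expected SUBMITTED");
        }
    }
}
```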
Copyright © 2005–2016 The Royal Danish Library, the Danish State and University Library, the National Library of France and the Austrian National Library. All rights reserved.