dk.netarkivet.harvester.harvesting.distribute
Class HarvestControllerServer

java.lang.Object
  extended by dk.netarkivet.harvester.distribute.HarvesterMessageHandler
      extended by dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer
All Implemented Interfaces:
CleanupIF, HarvesterMessageVisitor, javax.jms.MessageListener

public class HarvestControllerServer
extends HarvesterMessageHandler
implements CleanupIF

This class responds to JMS doOneCrawl messages from the HarvestScheduler and launches a Heritrix crawl with the received job description. The generated ARC files are uploaded to the bitarchives once a harvest job has been completed.

During its operation, CrawlStatus messages are sent to the HarvestSchedulerMonitorServer. When the actual harvesting starts, a message is sent with status 'STARTED'. When the harvesting has finished, a message is sent with status either 'DONE' or 'FAILED'. A 'DONE' or 'FAILED' message with the result should ALWAYS be sent if at all possible, but only ever one such message per job.

It is necessary to be able to run the Heritrix harvester on several machines, with several processes on each machine. Each instance of Heritrix is started and monitored by a HarvestControllerServer.

Initially, all directories under serverdir are scanned for harvestinfo files. If any are found, they are parsed for information, and an attempt is made to upload all remaining files to the bitarchive. A CrawlStatus message with status 'FAILED' is then sent back.

A new thread is started for each actual crawl, in which the JMS listener is removed. Threading is required since JMS will not let the called thread remove the listener that is being handled.

After a harvest job has terminated, whether successfully or unsuccessfully, the serverdir is scanned again for harvestInfo files in order to attempt upload of files not yet uploaded. The server then resumes listening for new jobs if there is enough room available on the machine. If there is not, it logs a warning, which is also sent as a notification.
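
The rule that a job reports 'STARTED' once and then exactly one of 'DONE' or 'FAILED' can be illustrated with a small guard around the final status. The following is a minimal sketch only, not the actual implementation: StatusReporter and CrawlRunner are hypothetical names, and a plain in-memory list stands in for the JMS channel to the HarvestSchedulerMonitorServer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the status channel; not the NetarchiveSuite API. */
class StatusReporter {
    private final List<String> sent = new ArrayList<>();
    private final AtomicBoolean finalSent = new AtomicBoolean(false);

    synchronized void sendStarted() {
        sent.add("STARTED");
    }

    /** Sends DONE or FAILED, but only the first time it is called. */
    boolean sendFinal(boolean success) {
        if (!finalSent.compareAndSet(false, true)) {
            return false; // a final status was already reported for this job
        }
        synchronized (this) {
            sent.add(success ? "DONE" : "FAILED");
        }
        return true;
    }

    /** Returns a copy of everything sent so far, in order. */
    synchronized List<String> sent() {
        return new ArrayList<>(sent);
    }
}

/** Hypothetical driver showing where the final status is reported. */
class CrawlRunner {
    /** Runs the crawl body and reports exactly one final status. */
    static List<String> runJob(Runnable crawl) {
        StatusReporter reporter = new StatusReporter();
        reporter.sendStarted();
        try {
            crawl.run();
            reporter.sendFinal(true);
        } catch (RuntimeException e) {
            reporter.sendFinal(false);
        } finally {
            // Safety net: a no-op if a final status has already gone out.
            reporter.sendFinal(false);
        }
        return reporter.sent();
    }
}
```

The compare-and-set guard is what makes the extra call in the finally block harmless: whichever path reports first wins, and later calls are ignored.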


Field Summary
(package private) static int WAIT_FOR_HOSTS_REPORT_TIMEOUT_SECS
          The max time to wait for the hosts-report.txt to be available (in secs).
 
Method Summary
 void cleanup()
          Will be called on shutdown.
 void close()
          Releases all JMS connections.
static HarvestControllerServer getInstance()
          Returns or creates the unique instance of this singleton. The server creates an instance of the HarvestController, uploads ARC files from unfinished harvests, and starts to listen to JMS messages on the incoming JMS queues.
 void visit(DoOneCrawlMessage msg)
          Receives a DoOneCrawlMessage and calls onDoOneCrawl.
 void visit(IndexReadyMessage msg)
          This method should be overridden and implemented by a sub class if message handling is wanted.
 
Methods inherited from class dk.netarkivet.harvester.distribute.HarvesterMessageHandler
onMessage, visit, visit, visit, visit, visit
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

WAIT_FOR_HOSTS_REPORT_TIMEOUT_SECS

static final int WAIT_FOR_HOSTS_REPORT_TIMEOUT_SECS
The max time to wait for the hosts-report.txt to be available (in secs).

See Also:
Constant Field Values
Method Detail

getInstance

public static HarvestControllerServer getInstance()
                                           throws IOFailure
Returns or creates the unique instance of this singleton. The server creates an instance of the HarvestController, uploads ARC files from unfinished harvests, and starts to listen to JMS messages on the incoming JMS queues.

Returns:
The instance
Throws:
PermissionDenied - If the serverdir or oldjobsdir can't be created
IOFailure - if data from old harvests exists but contains illegal data
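
As an illustration of the "returns or creates" contract, the following is a minimal sketch of a lazily created singleton. ControllerServerSketch and its startups counter are hypothetical and exist only for this example; the real class performs its directory scan, upload, and JMS listener setup where the sketch merely counts.

```java
/** Hypothetical sketch of a lazily created singleton; not the real class. */
class ControllerServerSketch {
    private static ControllerServerSketch instance;

    /** Counts how many times the one-off startup work has run. */
    static int startups = 0;

    private ControllerServerSketch() {
        // Stand-in for the startup work described above
        // (creating directories, uploading old files, starting to listen).
        startups++;
    }

    /** Returns the existing instance, creating it on first use. */
    static synchronized ControllerServerSketch getInstance() {
        if (instance == null) {
            instance = new ControllerServerSketch();
        }
        return instance;
    }
}
```

Because the factory method is synchronized, two threads calling it at the same time still observe a single instance and a single run of the startup work.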

close

public void close()
Releases all JMS connections. Closes the Controller.


cleanup

public void cleanup()
Will be called on shutdown.

Specified by:
cleanup in interface CleanupIF
See Also:
CleanupIF.cleanup()

visit

public void visit(DoOneCrawlMessage msg)
           throws IOFailure,
                  UnknownID,
                  ArgumentNotValid,
                  PermissionDenied
Receives a DoOneCrawlMessage and calls onDoOneCrawl.

Specified by:
visit in interface HarvesterMessageVisitor
Overrides:
visit in class HarvesterMessageHandler
Parameters:
msg - the message received
Throws:
IOFailure - if the crawl fails, or if unable to write to harvestInfoFile
UnknownID - if jobID is null in the message
ArgumentNotValid - if the status of the job is not valid - must be SUBMITTED
PermissionDenied - if the crawldir can't be created
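
The two precondition failures listed above (a missing job ID and a status other than SUBMITTED) could be checked along the following lines. This is a sketch under stated assumptions: JobSketch and CrawlPreconditions are hypothetical, and standard JDK exceptions stand in for UnknownID and ArgumentNotValid.

```java
/** Hypothetical job description; field names are illustrative only. */
class JobSketch {
    final Long jobId;
    final String status;

    JobSketch(Long jobId, String status) {
        this.jobId = jobId;
        this.status = status;
    }
}

/** Hypothetical precondition checks run before a crawl is launched. */
class CrawlPreconditions {
    static void check(JobSketch job) {
        if (job.jobId == null) {
            // Stands in for UnknownID in this sketch.
            throw new IllegalStateException("job has no ID");
        }
        if (!"SUBMITTED".equals(job.status)) {
            // Stands in for ArgumentNotValid in this sketch.
            throw new IllegalArgumentException(
                    "job status must be SUBMITTED but was " + job.status);
        }
    }
}
```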

visit

public void visit(IndexReadyMessage msg)
Description copied from class: HarvesterMessageHandler
This method should be overridden and implemented by a sub class if message handling is wanted.

Specified by:
visit in interface HarvesterMessageVisitor
Overrides:
visit in class HarvesterMessageHandler
Parameters:
msg - an IndexReadyMessage
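
The visit overloads follow the visitor pattern: a base handler supplies a default for every message type, and a subclass overrides only the types it wants to handle. The sketch below shows that dispatch in isolation. All type names are hypothetical, and the default behaviour of throwing UnsupportedOperationException is an assumption made for this example, not a statement about what HarvesterMessageHandler does.

```java
/** Hypothetical message and visitor types, for illustration only. */
interface MessageSketch {
    void accept(VisitorSketch visitor);
}

interface VisitorSketch {
    void visit(CrawlRequestSketch msg);
    void visit(IndexReadySketch msg);
}

class CrawlRequestSketch implements MessageSketch {
    public void accept(VisitorSketch visitor) {
        visitor.visit(this); // picks the overload for this concrete type
    }
}

class IndexReadySketch implements MessageSketch {
    public void accept(VisitorSketch visitor) {
        visitor.visit(this);
    }
}

/** Base handler: every message type is rejected unless overridden. */
class BaseHandlerSketch implements VisitorSketch {
    /** Entry point, analogous to a listener callback. */
    void onMessage(MessageSketch msg) {
        msg.accept(this);
    }

    public void visit(CrawlRequestSketch msg) {
        deny(msg);
    }

    public void visit(IndexReadySketch msg) {
        deny(msg);
    }

    // Assumed default for this sketch: refuse messages nobody handles.
    private void deny(Object msg) {
        throw new UnsupportedOperationException(
                "no handler for " + msg.getClass().getSimpleName());
    }
}

/** Subclass that opts in to crawl requests only. */
class CrawlHandlerSketch extends BaseHandlerSketch {
    int crawlsStarted = 0;

    @Override
    public void visit(CrawlRequestSketch msg) {
        crawlsStarted++; // stand-in for launching a crawl
    }
}
```

Passing a CrawlRequestSketch to onMessage reaches the overriding method, while an IndexReadySketch falls through to the base default.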