dk.netarkivet.harvester.harvesting.distribute
Class HarvestControllerServer

java.lang.Object
  extended by dk.netarkivet.harvester.distribute.HarvesterMessageHandler
      extended by dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer
All Implemented Interfaces:
CleanupIF, HarvesterMessageVisitor, javax.jms.MessageListener

public class HarvestControllerServer
extends HarvesterMessageHandler
implements CleanupIF

This class responds to JMS doOneCrawl messages from the HarvestScheduler and launches a Heritrix crawl with the received job description. The generated ARC files are uploaded to the bitarchives once a harvest job has been completed.

During operation, CrawlStatus messages are sent to the HarvestSchedulerMonitorServer: a message with status 'STARTED' when harvesting begins, and a message with status 'DONE' or 'FAILED' when it finishes. A 'DONE' or 'FAILED' message carrying the result should ALWAYS be sent if at all possible, but never more than one such message per job.

It must be possible to run the Heritrix harvester on several machines, and in several processes on each machine. Each instance of Heritrix is started and monitored by a HarvestControllerServer.

If the VM is stopped during a harvest, then on restart it reads all directories under serverdir looking for harvestinfo files. Any that are found are parsed for information, an attempt is made to upload all remaining files to the bitarchive, and a CrawlStatusMessage with status 'FAILED' is then sent.

A new thread is started for each actual crawl, in which the JMS listener is removed. Threading is required because JMS will not let the thread handling a message remove the listener that is being handled. If the crawl fails to start, the listener is re-added; otherwise the VM is shut down and restarted by the SideKickApplication.
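The sketch below illustrates this lifecycle from a caller's point of view, using only the methods documented on this page: getInstance() starts the singleton (which uploads leftover ARC files and begins listening for doOneCrawl messages), and cleanup() releases its resources on shutdown. The wrapper class and the shutdown-hook wiring are hypothetical; in NetarchiveSuite the server is normally started by the surrounding application framework.

import dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer;

/** Hypothetical wrapper showing how the documented lifecycle fits together. */
public class HarvestControllerLifecycleSketch {
    public static void main(String[] args) {
        // Creates the HarvestController, uploads ARC files from unfinished
        // harvests, and starts listening on the incoming JMS queues.
        final HarvestControllerServer server = HarvestControllerServer.getInstance();

        // Release JMS connections and other resources when the VM shuts down.
        Runtime.getRuntime().addShutdownHook(new Thread() {
            public void run() {
                server.cleanup();
            }
        });
    }
}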


Field Summary
(package private) static final int WAIT_FOR_HOSTS_REPORT_TIMEOUT_SECS
          The maximum time to wait for the hosts-report.txt to become available (in seconds).
 
Method Summary
 void cleanup()
          Will be called on shutdown.
 void close()
          Releases all JMS connections.
static HarvestControllerServer getInstance()
          Returns or creates the unique instance of this singleton. The server creates an instance of the HarvestController, uploads ARC files from unfinished harvests, and starts listening for JMS messages on the incoming JMS queues.
 void visit(DoOneCrawlMessage msg)
          Receives a DoOneCrawlMessage and calls onDoOneCrawl.
 
Methods inherited from class dk.netarkivet.harvester.distribute.HarvesterMessageHandler
onMessage, visit
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

WAIT_FOR_HOSTS_REPORT_TIMEOUT_SECS

static final int WAIT_FOR_HOSTS_REPORT_TIMEOUT_SECS
The maximum time to wait for the hosts-report.txt to become available (in seconds).

See Also:
Constant Field Values
Method Detail

getInstance

public static HarvestControllerServer getInstance()
                                           throws IOFailure
Returns or creates the unique instance of this singleton. The server creates an instance of the HarvestController, uploads ARC files from unfinished harvests, and starts listening for JMS messages on the incoming JMS queues.

Returns:
The instance
Throws:
PermissionDenied - If the serverdir or oldjobsdir can't be created
IOFailure - if data from old harvests exist, but contain illegal data
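A minimal sketch of guarding this call against the documented exceptions follows; it assumes, as elsewhere in NetarchiveSuite, that PermissionDenied and IOFailure are unchecked exceptions from the dk.netarkivet.common.exceptions package.

import dk.netarkivet.common.exceptions.IOFailure;
import dk.netarkivet.common.exceptions.PermissionDenied;
import dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer;

public class GetInstanceSketch {
    public static void main(String[] args) {
        try {
            // Starts the server and begins listening for doOneCrawl messages.
            HarvestControllerServer.getInstance();
        } catch (PermissionDenied e) {
            // The serverdir or oldjobsdir could not be created.
            System.err.println("Cannot create harvest directories: " + e);
        } catch (IOFailure e) {
            // Data from old harvests exist, but contain illegal data.
            System.err.println("Illegal data left over from an old harvest: " + e);
        }
    }
}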

close

public void close()
Releases all JMS connections and closes the Controller.


cleanup

public void cleanup()
Will be called on shutdown.

Specified by:
cleanup in interface CleanupIF
See Also:
CleanupIF.cleanup()

visit

public void visit(DoOneCrawlMessage msg)
           throws IOFailure,
                  UnknownID,
                  ArgumentNotValid,
                  PermissionDenied
Receives a DoOneCrawlMessage and calls onDoOneCrawl.

Specified by:
visit in interface HarvesterMessageVisitor
Overrides:
visit in class HarvesterMessageHandler
Parameters:
msg - the message received
Throws:
IOFailure - if the crawl fails, or if it is not possible to write to the harvestInfoFile
UnknownID - if jobID is null in the message
ArgumentNotValid - if the status of the job is not valid (it must be SUBMITTED)
PermissionDenied - if the crawldir can't be created
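This method is reached via the visitor pattern implied by the HarvesterMessageVisitor interface: onMessage (inherited from HarvesterMessageHandler) unwraps the incoming JMS message and double-dispatches to the matching visit overload. The following is a simplified, hypothetical illustration of that dispatch, not the actual NetarchiveSuite code.

// Simplified, hypothetical illustration of visitor-based message dispatch.
interface HarvesterMessageVisitor {
    void visit(DoOneCrawlMessage msg);
}

abstract class HarvesterMessage {
    /** Each concrete message calls back the matching visit(...) overload. */
    abstract void accept(HarvesterMessageVisitor visitor);
}

class DoOneCrawlMessage extends HarvesterMessage {
    void accept(HarvesterMessageVisitor visitor) {
        visitor.visit(this); // resolves to visit(DoOneCrawlMessage)
    }
}

abstract class HarvesterMessageHandler implements HarvesterMessageVisitor {
    /** Called when a JMS message arrives; double-dispatches via accept(). */
    void onMessage(HarvesterMessage msg) {
        msg.accept(this);
    }

    /** Default behaviour; HarvestControllerServer overrides this to start a crawl. */
    public void visit(DoOneCrawlMessage msg) {
        throw new UnsupportedOperationException("Not handled by this application");
    }
}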