Class HarvestControllerServer

  • All Implemented Interfaces:
    CleanupIF, HarvesterMessageVisitor, javax.jms.MessageListener

    public class HarvestControllerServer
    extends HarvesterMessageHandler
    implements CleanupIF
    This class responds to JMS doOneCrawl messages from the HarvestScheduler and launches a Heritrix crawl with the received job description. Once a harvest job has completed, the generated ARC files are uploaded to the bitarchives. Initially, the HarvestControllerServer registers its channel with the Scheduler by sending a HarvesterRegistrationRequest and waits for a positive HarvesterRegistrationResponse confirming that the channel is recognized. If the Scheduler does not recognize the channel, the HarvestControllerServer sends a notification to that effect and then shuts down the application.
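The registration handshake described above can be sketched as follows. This is a simplified illustration, not the real NetarchiveSuite API: RegistrationResponse and the scheduler callback are hypothetical stand-ins for HarvesterRegistrationRequest/HarvesterRegistrationResponse and the JMS round trip.

```java
import java.util.function.Function;

// Hypothetical stand-in for a positive/negative HarvesterRegistrationResponse.
class RegistrationResponse {
    final boolean channelRecognized;

    RegistrationResponse(boolean channelRecognized) {
        this.channelRecognized = channelRecognized;
    }
}

class RegistrationHandshake {
    /**
     * Registers the given channel with the scheduler and reports whether it
     * was recognized. On false, the caller is expected to send a
     * notification and then shut the application down.
     */
    static boolean register(String channelName,
                            Function<String, RegistrationResponse> scheduler) {
        return scheduler.apply(channelName).channelRecognized;
    }
}
```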

    During its operation, CrawlStatus messages are sent to the HarvestSchedulerMonitorServer. When the actual harvesting starts, a message with status 'STARTED' is sent. When the harvesting has finished, a message with either status 'DONE' or 'FAILED' is sent. A 'DONE' or 'FAILED' message with the result should ALWAYS be sent if at all possible, but only ever one such message per job. While the HarvestControllerServer is waiting for the harvesting to finish, it sends HarvesterReadyMessages to the scheduler. The interval between consecutive HarvesterReadyMessages is defined by the setting 'settings.harvester.harvesting.sendReadyDelay'.
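The periodic ready signal can be sketched with a ScheduledExecutorService. The ReadySignaller name and the Runnable message transport are illustrative assumptions; only the fixed-interval behaviour driven by the sendReadyDelay setting mirrors the text.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class ReadySignaller {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> task;

    /** Start sending the ready signal every sendReadyDelay milliseconds. */
    void start(Runnable sendReadyMessage, long sendReadyDelayMillis) {
        task = timer.scheduleAtFixedRate(sendReadyMessage, 0,
                sendReadyDelayMillis, TimeUnit.MILLISECONDS);
    }

    /** Stop signalling, e.g. once the harvesting has finished. */
    void stop() {
        if (task != null) task.cancel(false);
        timer.shutdown();
    }
}
```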

    It is necessary to be able to run the Heritrix harvester on several machines and several processes on each machine. Each instance of Heritrix is started and monitored by a HarvestControllerServer.

    Initially, all directories under serverdir are scanned for harvestinfo files. If any are found, they are parsed for information, and an upload of all remaining files to the bitarchive is attempted. A CrawlStatusMessage with status FAILED is then sent back.
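The startup scan can be sketched as below, assuming a plain directory walk; the marker file name "harvestInfo.xml" is an illustrative assumption, and parsing, uploading and status reporting are left as a comment.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class OldJobsScanner {
    /**
     * Scans every subdirectory of serverDir and collects those that contain
     * a harvestinfo marker, i.e. leftovers from unfinished harvests.
     */
    static List<File> findUnfinishedCrawlDirs(File serverDir) {
        List<File> result = new ArrayList<>();
        File[] subDirs = serverDir.listFiles(File::isDirectory);
        if (subDirs == null) return result;
        for (File dir : subDirs) {
            // "harvestInfo.xml" is an assumed file name for this sketch.
            if (new File(dir, "harvestInfo.xml").exists()) {
                // Parse the info, attempt upload of remaining files,
                // then send a FAILED CrawlStatusMessage.
                result.add(dir);
            }
        }
        return result;
    }
}
```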

    A new thread is started for each actual crawl, and the JMS listener is removed within that thread. Threading is required because JMS does not allow the thread currently handling a message to remove its own listener.
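The pattern can be sketched as follows, with the listener plumbing and the crawl itself reduced to Runnables (CrawlStarter is an illustrative name, not the real class):

```java
class CrawlStarter {
    /**
     * Starts the crawl in its own thread. The listener is detached inside
     * the new thread, which is safe because that thread is not the one
     * currently executing the JMS onMessage callback.
     */
    static Thread startCrawl(Runnable removeListener, Runnable doCrawl) {
        Thread t = new Thread(() -> {
            removeListener.run();
            doCrawl.run();
        }, "CrawlThread");
        t.start();
        return t;
    }
}
```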

    After a harvest job has terminated, whether successfully or not, the serverdir is scanned again for harvestInfo files so that any files not yet uploaded can be uploaded. The server then resumes listening for new jobs, provided there is enough room available on the machine. If there is not, it logs a warning, which is also sent as a notification.

    • Method Detail

      • getInstance

        public static HarvestControllerServer getInstance()
                                                   throws IOFailure
        Returns or creates the unique instance of this singleton. The server creates an instance of the HarvestController, uploads ARC files from unfinished harvests, and starts listening for JMS messages on the incoming JMS queues.
        Returns:
        The instance
        Throws:
        PermissionDenied - If the serverdir or oldjobsdir can't be created
        IOFailure - if data from old harvests exist, but contain illegal data
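Stripped of the upload work and JMS wiring, the lazily created singleton can be sketched as below; the Server name and empty constructor body are simplifications of what getInstance actually does.

```java
class Server {
    private static Server instance;

    private Server() {
        // Create serverdir/oldjobsdir, scan for unfinished harvests,
        // upload leftover ARC files, attach JMS listeners...
    }

    /** Returns the unique instance, creating it on first call. */
    static synchronized Server getInstance() {
        if (instance == null) instance = new Server();
        return instance;
    }
}
```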
      • close

        public void close()
        Releases all JMS connections and closes the controller.
      • visit

        public void visit​(DoOneCrawlMessage msg)
                   throws IOFailure,
                          UnknownID,
                          ArgumentNotValid,
                          PermissionDenied
        Checks that we're available to do a crawl, and if so, marks us as unavailable, checks that the job message is well-formed, and starts the thread that the crawl happens in. If an error occurs starting the crawl, we will start listening for messages again.

        The sequence of actions involved in a crawl are:
        1. If we are already running, resend the job to the queue and return
        2. Check the job for validity
        3. Send a CrawlStatus message that crawl has STARTED
        In a separate thread:
        4. Unregister this HACO as listener
        5. Create a new crawldir (based on the JobID and a timestamp)
        6. Write a harvestInfoFile (using JobID and crawldir) and metadata
        7. Instantiate a new HeritrixLauncher
        8. Start a crawl
        9. Store the generated arc-files and metadata in the known bit-archives
        10. _Always_ send CrawlStatus DONE or FAILED
        11. Move crawldir into oldJobs dir
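The in-thread part of the sequence (steps 4-11) can be sketched as a run() method whose finally block guarantees that exactly one DONE or FAILED status is sent, mirroring step 10. All collaborators are reduced to Runnables and a Consumer; the names are illustrative, not the real NetarchiveSuite API.

```java
import java.util.function.Consumer;

class CrawlTask implements Runnable {
    final Runnable unregisterListener, crawl, storeResults, moveToOldJobs;
    final Consumer<String> sendStatus;

    CrawlTask(Runnable unregisterListener, Runnable crawl, Runnable storeResults,
              Runnable moveToOldJobs, Consumer<String> sendStatus) {
        this.unregisterListener = unregisterListener;
        this.crawl = crawl;
        this.storeResults = storeResults;
        this.moveToOldJobs = moveToOldJobs;
        this.sendStatus = sendStatus;
    }

    public void run() {
        String status = "FAILED";
        unregisterListener.run();       // step 4
        try {
            crawl.run();                // steps 5-8, condensed into one call
            storeResults.run();         // step 9
            status = "DONE";
        } finally {
            sendStatus.accept(status);  // step 10: always sent, exactly once
            moveToOldJobs.run();        // step 11
        }
    }
}
```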

        Specified by:
        visit in interface HarvesterMessageVisitor
        Overrides:
        visit in class HarvesterMessageHandler
        Parameters:
        msg - The crawl job
        Throws:
        IOFailure - On trouble harvesting, uploading or processing harvestInfo
        UnknownID - if jobID is null in the message
        ArgumentNotValid - if the status of the job is not valid - must be SUBMITTED
        PermissionDenied - if the crawldir can't be created
      • sendErrorMessage

        public void sendErrorMessage​(long jobID,
                                     String message,
                                     String detailedMessage)
        Sends a CrawlStatusMessage for a failed job with the given short message and detailed message.
        Parameters:
        jobID - ID of the job that failed
        message - A short message indicating what went wrong
        detailedMessage - A longer message explaining in detail why it went wrong.
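How such a failure report might be assembled can be sketched as below; FailedStatus is a hypothetical stand-in for a CrawlStatusMessage carrying status FAILED, not the real message class.

```java
// Hypothetical stand-in for a FAILED CrawlStatusMessage.
class FailedStatus {
    final long jobID;
    final String message;
    final String detailedMessage;

    FailedStatus(long jobID, String message, String detailedMessage) {
        this.jobID = jobID;
        this.message = message;
        this.detailedMessage = detailedMessage;
    }

    /** One-line summary suitable for a log or notification. */
    String summary() {
        return "Job " + jobID + " FAILED: " + message;
    }
}
```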