dk.netarkivet.harvester.harvesting.controller
Class BnfHeritrixLauncher

java.lang.Object
  extended by dk.netarkivet.harvester.harvesting.HeritrixLauncher
      extended by dk.netarkivet.harvester.harvesting.controller.BnfHeritrixLauncher

public class BnfHeritrixLauncher
extends HeritrixLauncher

BnF-specific Heritrix launcher that forces the use of BnfHeritrixController. On every turn of the crawl control loop, it asks the Heritrix controller to generate a progress report as a CrawlProgressMessage, then sends this message on the JMS bus to be consumed by the HarvestMonitor instance.


Field Summary
(package private) static long FRONTIER_REPORT_GEN_FREQUENCY
          Frequency in seconds for generating the full harvest report.
(package private) static org.apache.commons.logging.Log log
          The class logger.
 
Fields inherited from class dk.netarkivet.harvester.harvesting.HeritrixLauncher
CRAWL_CONTROL_WAIT_PERIOD
 
Method Summary
 void doCrawl()
          Initializes a Heritrix controller, then launches the Heritrix instance.
static BnfHeritrixLauncher getInstance(HeritrixFiles files)
          Get instance of this class.
 
Methods inherited from class dk.netarkivet.harvester.harvesting.HeritrixLauncher
getControllerArguments, getHeritrixFiles, getJMSConnection, isDeduplicationEnabledInTemplate, setupOrderfile
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

log

static final org.apache.commons.logging.Log log
The class logger.


FRONTIER_REPORT_GEN_FREQUENCY

static final long FRONTIER_REPORT_GEN_FREQUENCY
Frequency in seconds for generating the full harvest report. Also serves as the delay before the first report is generated.
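
A value that acts as both the initial delay and the repeat period can be illustrated with the standard java.util.concurrent scheduler. This is only a minimal sketch of that scheduling pattern, not the mechanism this class actually uses: the class name FrontierReportScheduleSketch and the method reportTimes are hypothetical, and the period is given in milliseconds so the example runs quickly, whereas the field above is expressed in seconds.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class; not part of NetarchiveSuite.
class FrontierReportScheduleSketch {

    /**
     * Schedules a stand-in "report generation" task whose initial delay
     * equals its period, waits for the requested number of runs, and
     * returns the elapsed time (in ms since scheduling) of each run.
     */
    static List<Long> reportTimes(long periodMillis, int reports) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        List<Long> elapsed = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(reports);
        long start = System.nanoTime();

        // The same value is passed as initial delay and as period, so the
        // first report is only produced after one full period has passed.
        scheduler.scheduleAtFixedRate(() -> {
            if (done.getCount() > 0) {
                elapsed.add(TimeUnit.NANOSECONDS.toMillis(
                        System.nanoTime() - start));
                done.countDown();
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);

        try {
            done.await();
        } catch (InterruptedException e) {
            // Restore the interrupt flag and return what was collected.
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Two stand-in reports with a 50 ms period.
        System.out.println(reportTimes(50L, 2));
    }
}
```

Because the initial delay equals the period, no report is available immediately after the crawl starts; the first one appears one full period later.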

Method Detail

getInstance

public static BnfHeritrixLauncher getInstance(HeritrixFiles files)
                                       throws ArgumentNotValid
Get instance of this class.

Parameters:
files - Object encapsulating location of Heritrix crawldir and configuration files
Returns:
BnfHeritrixLauncher object
Throws:
ArgumentNotValid - If either order.xml or seeds.txt does not exist, or if the files argument is null.

doCrawl

public void doCrawl()
             throws IOFailure
Initializes a Heritrix controller, then launches the Heritrix instance. Then starts the crawl control loop:
  1. Waits for the amount of time configured in HarvesterSettings.CRAWL_LOOP_WAIT_TIME.
  2. Obtains crawl progress information as a CrawlProgressMessage from the Heritrix controller.
  3. Sends the progress message via JMS.
  4. If the crawl is reported as finished, ends the loop.
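
The four steps of the loop can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the actual implementation: ProgressMessage, Controller, MessageSender and runLoop are hypothetical stand-ins for CrawlProgressMessage, the Heritrix controller, and the JMS connection, and the wait period is passed in directly instead of being read from HarvesterSettings.CRAWL_LOOP_WAIT_TIME.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of NetarchiveSuite.
class CrawlLoopSketch {

    /** Hypothetical stand-in for CrawlProgressMessage. */
    static final class ProgressMessage {
        final long queuedUris;
        final boolean finished;

        ProgressMessage(long queuedUris, boolean finished) {
            this.queuedUris = queuedUris;
            this.finished = finished;
        }
    }

    /** Hypothetical stand-in for the Heritrix controller. */
    interface Controller {
        ProgressMessage getCrawlProgress();
    }

    /** Hypothetical stand-in for sending a message on the JMS bus. */
    interface MessageSender {
        void send(ProgressMessage message);
    }

    /** Runs the control loop and returns the number of completed turns. */
    static int runLoop(Controller controller, MessageSender sender,
                       long waitMillis) {
        int turns = 0;
        while (true) {
            try {
                Thread.sleep(waitMillis);                 // step 1: wait
            } catch (InterruptedException e) {
                // Assumption: an interrupt simply stops the loop.
                Thread.currentThread().interrupt();
                return turns;
            }
            ProgressMessage message =
                    controller.getCrawlProgress();        // step 2: poll
            sender.send(message);                         // step 3: send
            turns++;
            if (message.finished) {                       // step 4: stop
                return turns;
            }
        }
    }

    public static void main(String[] args) {
        // Fake controller that reports "finished" on its third poll.
        final long[] remaining = {3};
        Controller fake = () -> {
            remaining[0]--;
            return new ProgressMessage(remaining[0], remaining[0] == 0);
        };
        List<ProgressMessage> sent = new ArrayList<>();
        int turns = runLoop(fake, sent::add, 10L);
        System.out.println("turns=" + turns + ", messages sent=" + sent.size());
    }
}
```

Note that in this sketch the final progress message is sent before the loop exits, which matches the order of steps 3 and 4 above.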

Specified by:
doCrawl in class HeritrixLauncher
Throws:
IOFailure