dk.netarkivet.harvester.harvesting.controller
Class DefaultHeritrixLauncher

java.lang.Object
  extended by dk.netarkivet.harvester.harvesting.HeritrixLauncher
      extended by dk.netarkivet.harvester.harvesting.controller.DefaultHeritrixLauncher

public class DefaultHeritrixLauncher
extends HeritrixLauncher

Default implementation of the crawl control.


Field Summary
(package private)  org.apache.commons.logging.Log log
          The class logger.
 
Fields inherited from class dk.netarkivet.harvester.harvesting.HeritrixLauncher
CRAWL_CONTROL_WAIT_PERIOD
 
Method Summary
 void doCrawl()
          Launches Heritrix: copies the order file and seeds file, sets up the copied order file, starts the crawler, and stops it when the crawl ends.
static DefaultHeritrixLauncher getInstance(HeritrixFiles files)
          Get instance of this class.
 
Methods inherited from class dk.netarkivet.harvester.harvesting.HeritrixLauncher
getControllerArguments, getHeritrixFiles, getJMSConnection, isDeduplicationEnabledInTemplate, setupOrderfile
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

log

final org.apache.commons.logging.Log log
The class logger.

Method Detail

getInstance

public static DefaultHeritrixLauncher getInstance(HeritrixFiles files)
                                           throws ArgumentNotValid
Get instance of this class.

Parameters:
files - Object encapsulating location of Heritrix crawldir and configuration files
Returns:
DefaultHeritrixLauncher object
Throws:
ArgumentNotValid - If either order.xml or seeds.txt does not exist, or if the files argument is null.
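
The argument checks described for getInstance can be illustrated with a minimal, self-contained sketch. It does not use the NetarchiveSuite classes: SketchFiles and SketchLauncher are hypothetical stand-ins for HeritrixFiles and DefaultHeritrixLauncher, IllegalArgumentException stands in for ArgumentNotValid, and the assumption that both files sit directly in the crawl directory is made only for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for HeritrixFiles: a crawl directory holding the two files. */
final class SketchFiles {
    final File orderXml;
    final File seedsTxt;

    SketchFiles(File crawlDir) {
        // Assumption: both files live directly in the crawl directory.
        this.orderXml = new File(crawlDir, "order.xml");
        this.seedsTxt = new File(crawlDir, "seeds.txt");
    }
}

/** Hypothetical launcher that mirrors only the documented argument checks. */
final class SketchLauncher {
    private final SketchFiles files;

    private SketchLauncher(SketchFiles files) {
        this.files = files;
    }

    /** Static factory; IllegalArgumentException stands in for ArgumentNotValid. */
    static SketchLauncher getInstance(SketchFiles files) {
        if (files == null) {
            throw new IllegalArgumentException("files must not be null");
        }
        if (!files.orderXml.isFile()) {
            throw new IllegalArgumentException("missing " + files.orderXml);
        }
        if (!files.seedsTxt.isFile()) {
            throw new IllegalArgumentException("missing " + files.seedsTxt);
        }
        return new SketchLauncher(files);
    }

    /** Returns true if getInstance accepts the given crawl directory. */
    static boolean accepts(File crawlDir) {
        try {
            getInstance(new SketchFiles(crawlDir));
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    /** Creates a temporary crawl directory, optionally populated with both files. */
    static File tempCrawlDir(boolean populate) {
        try {
            Path dir = Files.createTempDirectory("crawl-sketch");
            if (populate) {
                Files.createFile(dir.resolve("order.xml"));
                Files.createFile(dir.resolve("seeds.txt"));
            }
            return dir.toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch, an empty crawl directory or a null argument is rejected before any launcher object exists, so a caller never holds a launcher that points at missing configuration.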

doCrawl

public void doCrawl()
             throws IOFailure
This method launches Heritrix in the following way:
1. Copies the order file and the seeds file to the current working directory.
2. Sets up the newly created copy of the order file.
3. Starts the crawler.
4. Stops the crawler, either when Heritrix has finished crawling or when Heritrix is forcibly stopped due to inactivity.

The while-loop in this method exits only after Heritrix calls the crawlEnded() method, which it does when crawling is finished. doCrawl() itself is called from the HarvestControllerServer.onDoOneCrawl() method.
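
The start, wait, and stop pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: CrawlLoopSketch is a hypothetical class, the crawler is modelled as a plain Runnable, the inactivity check is simplified to a maximum number of polls, and IllegalStateException stands in for IOFailure. The waitMillis parameter plays a role similar to the inherited CRAWL_CONTROL_WAIT_PERIOD, whose value is not given here.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of a launcher that waits for a crawlEnded() callback. */
final class CrawlLoopSketch {
    private final AtomicBoolean ended = new AtomicBoolean(false);
    private volatile boolean stopped = false;

    /** Callback the crawler is assumed to invoke when crawling is finished. */
    void crawlEnded() {
        ended.set(true);
    }

    /** Reports whether the stop step has run. */
    boolean isStopped() {
        return stopped;
    }

    /**
     * Starts the crawler on its own thread, then polls until crawlEnded()
     * has been called or maxPolls polls have elapsed (a simplified stand-in
     * for the inactivity check).
     *
     * @return true if the crawl ended on its own, false if it was cut off
     */
    boolean doCrawl(Runnable crawler, long waitMillis, int maxPolls) {
        Thread worker = new Thread(crawler, "sketch-crawler");
        worker.setDaemon(true);
        worker.start(); // step 3: start the crawler

        int polls = 0;
        try {
            while (!ended.get()) {
                if (polls >= maxPolls) {
                    return false; // forced stop: the callback never arrived
                }
                polls++;
                Thread.sleep(waitMillis);
            }
            return true;
        } catch (InterruptedException e) {
            // Restore the interrupt flag and report the failure unchecked.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("crawl interrupted", e);
        } finally {
            stopped = true; // step 4: the stop step runs on every exit path
        }
    }
}
```

A caller could pass a method reference such as launcher::crawlEnded as the crawler to simulate a crawl that finishes immediately. Placing the stop step in a finally block keeps it on the normal, forced, and interrupted exit paths alike.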

Specified by:
doCrawl in class HeritrixLauncher
Throws:
IOFailure - if the order.xml is invalid, if the Heritrix CrawlController cannot be initialized, or if the Heritrix process is interrupted.