dk.netarkivet.harvester.harvesting
Class HeritrixLauncher

java.lang.Object
  extended by dk.netarkivet.harvester.harvesting.HeritrixLauncher
Direct Known Subclasses:
BnfHeritrixLauncher, DefaultHeritrixLauncher

public abstract class HeritrixLauncher
extends java.lang.Object

A HeritrixLauncher object wraps an instance of the web crawler Heritrix. The object is constructed with the information needed to perform a crawl. The crawl is performed when doCrawl() is called. doCrawl() monitors progress and returns when the crawl has finished or must be stopped because it has stalled.


Field Summary
protected static int CRAWL_CONTROL_WAIT_PERIOD
          The period to wait in seconds before checking if Heritrix has done anything.
(package private)  org.apache.commons.logging.Log log
          The class logger.
 
Constructor Summary
protected HeritrixLauncher(HeritrixFiles files)
          Protected HeritrixLauncher constructor.
  HeritrixLauncher(java.lang.Object... args)
          Generic constructor to allow HeritrixLauncher to use any implementation of HeritrixController.
 
Method Summary
abstract  void doCrawl()
          Launches the crawl and monitors its progress.
protected  java.lang.Object[] getControllerArguments()
          Returns the optional arguments used to initialize the chosen Heritrix controller implementation.
protected  HeritrixFiles getHeritrixFiles()
          Returns an instance of the wrapper class for Heritrix files.
 void setupOrderfile(HeritrixFiles files)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CRAWL_CONTROL_WAIT_PERIOD

protected static final int CRAWL_CONTROL_WAIT_PERIOD
The period to wait in seconds before checking if Heritrix has done anything.


log

final org.apache.commons.logging.Log log
The class logger.

Constructor Detail

HeritrixLauncher

protected HeritrixLauncher(HeritrixFiles files)
                    throws ArgumentNotValid
Protected HeritrixLauncher constructor. Sets up the HeritrixLauncher from the given order file and seeds file.

Parameters:
files - Object encapsulating location of Heritrix crawldir and configuration files.
Throws:
ArgumentNotValid - If either the seeds file or the order file does not exist.

HeritrixLauncher

public HeritrixLauncher(java.lang.Object... args)
Generic constructor to allow HeritrixLauncher to use any implementation of HeritrixController.

Parameters:
args - the arguments to be passed to the constructor or non-static factory method of the HeritrixController class specified in settings
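
The sketch below illustrates the general technique of forwarding an Object... argument list to a constructor of a class chosen by name at run time. It is not the NetarchiveSuite implementation: ReflectiveFactory and its create method are hypothetical names, the settings lookup is replaced by a plain class-name string, and factory methods are not covered.

```java
import java.lang.reflect.Constructor;

// Hypothetical helper, not part of NetarchiveSuite.
final class ReflectiveFactory {

    private ReflectiveFactory() {
    }

    // Instantiates className using the first public constructor whose
    // parameter types accept the given arguments.
    static Object create(String className, Object... args) {
        try {
            Class<?> cls = Class.forName(className);
            for (Constructor<?> ctor : cls.getConstructors()) {
                if (accepts(ctor.getParameterTypes(), args)) {
                    return ctor.newInstance(args);
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
        throw new IllegalArgumentException("No matching constructor in " + className);
    }

    // Simplification: primitive parameter types never match, because
    // Class.isInstance always returns false for them.
    private static boolean accepts(Class<?>[] types, Object[] args) {
        if (types.length != args.length) {
            return false;
        }
        for (int i = 0; i < types.length; i++) {
            if (args[i] != null && !types[i].isInstance(args[i])) {
                return false;
            }
        }
        return true;
    }
}
```

For example, ReflectiveFactory.create("java.lang.StringBuilder", "abc") builds a StringBuilder holding "abc". A class name read from configuration could be passed the same way.
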
Method Detail

doCrawl

public abstract void doCrawl()
                      throws IOFailure
Launches the crawl and monitors its progress.

Throws:
IOFailure
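
A minimal sketch of how a subclass might implement the launch-and-monitor contract described above, polling at a fixed interval and giving up when the crawl stops making progress. Everything here is an assumption for illustration: CrawlProbe, LauncherSketch, PollingLauncher and CountingProbe are hypothetical stand-ins, the value of CRAWL_CONTROL_WAIT_PERIOD is invented, and the sketch records a stall in a flag where the real method may throw IOFailure.

```java
// Hypothetical view of a running crawl; NOT the real HeritrixController API.
interface CrawlProbe {
    boolean isFinished();
    long processedCount();
}

// Simplified stand-in mirroring only members documented on this page.
abstract class LauncherSketch {
    // Seconds between progress checks; the value 1 is an assumption.
    protected static final int CRAWL_CONTROL_WAIT_PERIOD = 1;

    public abstract void doCrawl();
}

class PollingLauncher extends LauncherSketch {
    private final CrawlProbe probe;
    private final int maxIdleChecks;
    private final long waitMillis;
    private boolean stalled;

    PollingLauncher(CrawlProbe probe, int maxIdleChecks, long waitMillis) {
        this.probe = probe;
        this.maxIdleChecks = maxIdleChecks;
        this.waitMillis = waitMillis;
    }

    boolean isStalled() {
        return stalled;
    }

    @Override
    public void doCrawl() {
        long last = probe.processedCount();
        int idleChecks = 0;
        while (!probe.isFinished()) {
            try {
                Thread.sleep(waitMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            long now = probe.processedCount();
            if (now > last) {
                // Progress was made, so reset the idle counter.
                last = now;
                idleChecks = 0;
            } else if (++idleChecks >= maxIdleChecks) {
                // No progress for too many consecutive checks: treat as stalled.
                stalled = true;
                return;
            }
        }
    }
}

// Fake crawl for demonstration: unless frozen, each progress check advances it by one.
class CountingProbe implements CrawlProbe {
    private final long target;
    private final boolean frozen;
    private long count;

    CountingProbe(long target, boolean frozen) {
        this.target = target;
        this.frozen = frozen;
    }

    @Override
    public boolean isFinished() {
        return count >= target;
    }

    @Override
    public long processedCount() {
        if (!frozen) {
            count++;
        }
        return count;
    }
}
```

A caller could construct new PollingLauncher(probe, 3, CRAWL_CONTROL_WAIT_PERIOD * 1000L) from within a subclass, call doCrawl(), and then inspect isStalled(). The idle-check limit of 3 is arbitrary.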

getHeritrixFiles

protected HeritrixFiles getHeritrixFiles()
Returns:
an instance of the wrapper class for Heritrix files.

getControllerArguments

protected java.lang.Object[] getControllerArguments()
Returns:
the optional arguments used to initialize the chosen Heritrix controller implementation.

setupOrderfile

public void setupOrderfile(HeritrixFiles files)