java.lang.Object
- dk.netarkivet.harvester.heritrix3.controller.AbstractRestHeritrixController
- - dk.netarkivet.harvester.heritrix3.controller.HeritrixController

All Implemented Interfaces:

IHeritrixController
```
public class HeritrixController
extends AbstractRestHeritrixController
```
This implementation of the IHeritrixController interface starts Heritrix3 as a separate process and uses JMX to communicate with it. Each instance executes exactly one process that runs exactly one crawl job.
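
The one-instance, one-job lifecycle can be sketched as below. This is an illustration under stated assumptions, not the NetarchiveSuite implementation: `CrawlControl` is a hypothetical stand-in declaring a few of the method names documented on this page, `RecordingControl` is a fake that only records calls, and the call order shown is an assumed typical sequence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a few methods documented on this page (not the real interface).
interface CrawlControl {
    void initialize();
    void requestCrawlStart();
    boolean crawlIsEnded();
    void cleanup();
}

// Fake controller that records calls and reports the crawl as ended on the third poll.
class RecordingControl implements CrawlControl {
    final List<String> calls = new ArrayList<>();
    private int pollsLeft = 3;

    public void initialize() { calls.add("initialize"); }
    public void requestCrawlStart() { calls.add("requestCrawlStart"); }
    public boolean crawlIsEnded() { return --pollsLeft <= 0; }
    public void cleanup() { calls.add("cleanup"); }
}

class LifecycleSketch {
    // One controller instance drives exactly one crawl job; cleanup always runs.
    static void runOneJob(CrawlControl controller) {
        controller.initialize();
        try {
            controller.requestCrawlStart();
            while (!controller.crawlIsEnded()) {
                // A real caller would sleep here and report crawl progress.
            }
        } finally {
            controller.cleanup();
        }
    }

    public static void main(String[] args) {
        RecordingControl fake = new RecordingControl();
        runOneJob(fake);
        System.out.println(fake.calls);
    }
}
```

Putting `cleanup()` in a `finally` block is a design choice of this sketch: it ensures resources are released even if starting the crawl fails.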

Nested Class Summary
- Nested classes/interfaces inherited from class dk.netarkivet.harvester.heritrix3.controller.AbstractRestHeritrixController
  AbstractRestHeritrixController.LaunchResultHandler

Field Summary
- Fields inherited from class dk.netarkivet.harvester.heritrix3.controller.AbstractRestHeritrixController
  errorPrinter, files, h3handler, h3launcher, h3wrapper, heritrixBaseDir, outputPrinter

Constructor Summary

Constructors
Constructor Description

HeritrixController(Heritrix3Files files, java.lang.String jobName)
Create a HeritrixController object.

Method Summary

Modifier and Type	Method	Description
`boolean`	`atFinish()`	Query whether Heritrix is in a state where it can finish crawling.
`void`	`beginCrawlStop()`	Tell Heritrix to stop crawling.
`void`	`cleanup()`	Release any resources kept by the class.
`void`	`cleanup(java.io.File crawlDir)`	Clean up after a Heritrix3 process.
`boolean`	`crawlIsEnded()`	Returns true if the crawl has ended, either because Heritrix finished or because we terminated it.
`int`	`getActiveToeCount()`	Get the number of currently active ToeThreads (crawler threads).
`java.lang.String`	`getAdminInterfaceUrl()`	Return the URL for monitoring this instance.
`CrawlProgressMessage`	`getCrawlProgress()`	Gets a message that stores the information summarizing the crawl progress.
`int`	`getCurrentProcessedKBPerSec()`	Get an estimate of the rate, in KB per second, at which documents are currently being processed by the crawler.
`FullFrontierReport`	`getFullFrontierReport()`	Generates a full frontier report from H3 using a REST call (Groovy script).
`java.lang.String`	`getHarvestInformation()`	Get harvest information.
`java.lang.String`	`getHeritrixConsoleURL()`	Return the URL for monitoring this instance.
`java.lang.String`	`getHeritrixJobConsoleURL()`	Return the URL for monitoring the job of this instance.
`java.lang.String`	`getProgressStats()`	Get a human-readable set of statistics on the progress of the crawl.
`long`	`getQueuedUriCount()`	Get the number of URIs currently on the queue to be processed.
`void`	`initialize()`	Initialize the JMX connection to Heritrix3.
`boolean`	`isPaused()`	Returns true if the crawler has been paused and is thus not supposed to fetch anything.
`void`	`requestCrawlStart()`	Request that Heritrix start crawling.
`void`	`requestCrawlStop(java.lang.String reason)`	Request that the crawler stops.
`void`	`stopHeritrix()`	Stop the Heritrix process.

Methods inherited from class dk.netarkivet.harvester.heritrix3.controller.AbstractRestHeritrixController
getFiles, getGuiPort, getHeritrixAdminName, getHeritrixAdminPassword, getHeritrixFiles, getHostName, getJobDescription, toString

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

- Constructor Detail
  - HeritrixController
```
public HeritrixController(Heritrix3Files files,
                          java.lang.String jobName)
```
    Create a HeritrixController object.
    
    Parameters:
    
    files - Files that are used to set up Heritrix3.
    
    jobName - The name of the crawl job.
- Method Detail
  - initialize
```
public void initialize()
```
    Initialize the JMX connection to Heritrix3.
    
    Throws:
    
    IOFailure - If Heritrix3 dies before initialisation, or we encounter any problems during the initialisation.
    
    See Also:
    
    IHeritrixController.initialize()
  - requestCrawlStart
```
public void requestCrawlStart()
```
    Description copied from interface: IHeritrixController
    
    Request that Heritrix start crawling. When this method returns, either Heritrix has failed in the early stages, or the crawl job has been successfully created. Actual crawling will commence at some later point.
  - requestCrawlStop
```
public void requestCrawlStop(java.lang.String reason)
```
    Description copied from interface: IHeritrixController
    
    Request that the crawler stops. This makes a call to beginCrawlStop(), unless the crawler is already shutting down. In that case it does nothing.
    
    Parameters:
    
    reason - A human-readable reason the crawl is being stopped.
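    
    The "do nothing if already shutting down" behaviour can be sketched as follows. This is a minimal illustration, assuming a simple flag is enough to track the shutdown state; `StopGuardSketch` and its `stopCallsForwarded` counter are hypothetical, and the real class may track this differently.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration; not the NetarchiveSuite implementation.
class StopGuardSketch {
    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);
    int stopCallsForwarded = 0; // counts how often the stop was actually issued

    void requestCrawlStop(String reason) {
        // compareAndSet succeeds only for the first caller; later calls are no-ops.
        if (shuttingDown.compareAndSet(false, true)) {
            System.out.println("Stopping crawl: " + reason);
            beginCrawlStop();
        }
    }

    void beginCrawlStop() {
        stopCallsForwarded++;
    }

    public static void main(String[] args) {
        StopGuardSketch guard = new StopGuardSketch();
        guard.requestCrawlStop("time limit reached");
        guard.requestCrawlStop("operator request"); // ignored: already shutting down
        System.out.println(guard.stopCallsForwarded);
    }
}
```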
  - stopHeritrix
```
public void stopHeritrix()
```
    Description copied from interface: IHeritrixController
    
    Stop the Heritrix process.
  - getHeritrixConsoleURL
```
public java.lang.String getHeritrixConsoleURL()
```
    Return the URL for monitoring this instance.
    
    Returns:
    
    the URL for monitoring this instance.
  - getHeritrixJobConsoleURL
```
public java.lang.String getHeritrixJobConsoleURL()
```
    Return the URL for monitoring the job of this instance.
    
    Returns:
    
    the URL for monitoring the job of this instance.
  - cleanup
```
public void cleanup(java.io.File crawlDir)
```
    Clean up after a Heritrix3 process. This entails sending the shutdown command to the Heritrix3 process and killing it forcibly if it is still alive after the period of time specified by the CommonSettings.PROCESS_TIMEOUT setting.
    
    Parameters:
    
    crawlDir - the crawl directory to clean up (argument is currently not used)
    
    See Also:
    
    IHeritrixController.cleanup()
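    
    The shutdown-then-kill sequence described above can be sketched as follows. `ManagedProcess` is a hypothetical abstraction introduced only for this example. With a plain `java.lang.Process`, the corresponding calls would be `waitFor(long, TimeUnit)` and `destroyForcibly()`. The timeout is passed in as a parameter rather than read from CommonSettings.PROCESS_TIMEOUT.

```java
// Hypothetical abstraction over the external Heritrix3 process; not a NetarchiveSuite type.
interface ManagedProcess {
    void sendShutdownCommand();
    boolean waitForExit(long timeoutMillis); // true if the process exited in time
    void killForcibly();
}

class CleanupSketch {
    // Returns true if the process had to be killed forcibly.
    static boolean shutDown(ManagedProcess process, long timeoutMillis) {
        process.sendShutdownCommand();
        if (process.waitForExit(timeoutMillis)) {
            return false; // graceful exit within the timeout
        }
        process.killForcibly(); // still alive after the timeout
        return true;
    }

    public static void main(String[] args) {
        // A fake process that ignores the shutdown command and never exits on its own.
        ManagedProcess stubborn = new ManagedProcess() {
            public void sendShutdownCommand() { }
            public boolean waitForExit(long timeoutMillis) { return false; }
            public void killForcibly() { System.out.println("killed"); }
        };
        System.out.println(shutDown(stubborn, 1000L));
    }
}
```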
  - getAdminInterfaceUrl
```
public java.lang.String getAdminInterfaceUrl()
```
    Return the URL for monitoring this instance.
    
    Returns:
    
    the URL for monitoring this instance.
  - getCrawlProgress
```
public CrawlProgressMessage getCrawlProgress()
```
    Gets a message that stores the information summarizing the crawl progress.
    
    Returns:
    
    a message that stores the information summarizing the crawl progress.
  - getFullFrontierReport
```
public FullFrontierReport getFullFrontierReport()
```
    Generates a full frontier report from H3 using a REST call (Groovy script).
    
    Returns:
    
    a Full frontier report.
  - atFinish
```
public boolean atFinish()
```
    Description copied from interface: IHeritrixController
    
    Query whether Heritrix is in a state where it can finish crawling. Returns true if no URIs remain to be harvested, or if the crawl has reached the max-bytes limit, the document limit, or the time limit for the current harvest.
    
    Returns:
    
    True if Heritrix thinks it is time to stop crawling.
  - beginCrawlStop
```
public void beginCrawlStop()
```
    Description copied from interface: IHeritrixController
    
    Tell Heritrix to stop crawling. Heritrix may take a while to actually stop, so you cannot assume that crawling is stopped when this method returns.
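    
    Because the stop is asynchronous, a caller typically has to poll for completion. The sketch below shows one way to do that, assuming the caller passes in a check such as `controller::crawlIsEnded`; `StopWaitSketch`, `waitForCrawlEnd` and the timeout and poll-interval parameters are hypothetical names chosen for this example.

```java
import java.util.function.BooleanSupplier;

class StopWaitSketch {
    // Polls until crawlEnded reports true or the timeout elapses; returns whether the crawl ended.
    static boolean waitForCrawlEnd(BooleanSupplier crawlEnded, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!crawlEnded.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out while the crawl was still running
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return crawlEnded.getAsBoolean();
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] polls = {0};
        // Stand-in check that reports "ended" on the third poll.
        boolean ended = waitForCrawlEnd(() -> ++polls[0] >= 3, 5_000L, 10L);
        System.out.println(ended + " after " + polls[0] + " polls");
    }
}
```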
  - cleanup
```
public void cleanup()
```
    Description copied from interface: IHeritrixController
    
    Release any resources kept by the class.
  - crawlIsEnded
```
public boolean crawlIsEnded()
```
    Description copied from interface: IHeritrixController
    
    Returns true if the crawl has ended, either because Heritrix finished or because we terminated it.
    
    Returns:
    
    True if the CrawlEnded event has happened in Heritrix, indicating that all crawls have stopped.
  - getActiveToeCount
```
public int getActiveToeCount()
```
    Description copied from interface: IHeritrixController
    
    Get the number of currently active ToeThreads (crawler threads).
    
    Returns:
    
    Number of ToeThreads currently active within Heritrix.
  - getCurrentProcessedKBPerSec
```
public int getCurrentProcessedKBPerSec()
```
    Description copied from interface: IHeritrixController
    
    Get an estimate of the rate, in KB per second, at which documents are currently being processed by the crawler.
    
    Returns:
    
    Number of KB of data downloaded by Heritrix over an undefined interval up to now.
    
    See Also:
    
    org.archive.crawler.framework.StatisticsTracking#currentProcessedKBPerSec()
  - getHarvestInformation
```
public java.lang.String getHarvestInformation()
```
    Description copied from interface: IHeritrixController
    
    Get harvest information. An example of this is a URL pointing to the GUI of a running Heritrix process.
    
    Returns:
    
    information about the harvest process.
  - getProgressStats
```
public java.lang.String getProgressStats()
```
    Description copied from interface: IHeritrixController
    
    Get a human-readable set of statistics on the progress of the crawl. The statistics are: discovered uris, queued uris, downloaded uris, doc/s(avg), KB/s(avg), dl-failures, busy-thread, mem-use-KB, heap-size-KB, congestion, max-depth, avg-depth. If no statistics are available, the string "No statistics available" is returned. Note: this method may disappear in the future.
    
    Returns:
    
    Some ASCII-formatted statistics on the progress of the crawl.
  - getQueuedUriCount
```
public long getQueuedUriCount()
```
    Description copied from interface: IHeritrixController
    
    Get the number of URIs currently on the queue to be processed. This number may not be exact and should only be used for informational purposes.
    
    Returns:
    
    How many URIs Heritrix has lined up for processing.
  - isPaused
```
public boolean isPaused()
```
    Description copied from interface: IHeritrixController
    
    Returns true if the crawler has been paused and is thus not supposed to fetch anything. Heritrix may still be fetching, as it takes some time to enter full pause mode. This method can be used as an indicator that we should not be worried if Heritrix appears to be idle.
    
    Returns:
    
    True if the crawler has been paused, e.g. by using the Heritrix GUI.
