java.lang.Object
  dk.netarkivet.harvester.harvesting.controller.AbstractJMXHeritrixController
    dk.netarkivet.harvester.harvesting.controller.BnfHeritrixController
public class BnfHeritrixController
extends AbstractJMXHeritrixController
This implementation of the HeritrixController interface starts Heritrix as a separate process and uses JMX to communicate with it. Each instance executes exactly one process that runs exactly one crawl job.
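As an illustration of the JMX side of this description, here is a minimal sketch of reading a numeric attribute from an MBean. It is not the controller's actual code: the MBean interface, the object name `example.sketch:type=CrawlStats`, and the `QueuedUriCount` attribute are hypothetical, and the sketch uses the in-process platform MBean server in place of a connection to a separate Heritrix process.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class JmxAttributeSketch {

    /** Hypothetical management interface; standard MBeans need the "MBean" suffix. */
    public interface CrawlStatsMBean {
        long getQueuedUriCount();
    }

    /** Hypothetical stand-in for the crawler side, exposing one read-only attribute. */
    public static final class CrawlStats implements CrawlStatsMBean {
        private final long queued;

        public CrawlStats(long queued) {
            this.queued = queued;
        }

        @Override
        public long getQueuedUriCount() {
            return queued;
        }
    }

    /** Registers the stand-in MBean, reads its attribute through JMX, then unregisters it. */
    static long readQueuedUriCount(long queued) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            ObjectName name = new ObjectName("example.sketch:type=CrawlStats");
            server.registerMBean(new CrawlStats(queued), name);
            try {
                // The attribute name is the getter name without its "get" prefix.
                return (Long) server.getAttribute(name, "QueuedUriCount");
            } finally {
                server.unregisterMBean(name);
            }
        } catch (JMException e) {
            throw new IllegalStateException("JMX call failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readQueuedUriCount(42L));
    }
}
```

The read goes through `MBeanServer.getAttribute` rather than a direct method call, which is the same indirection a controller relies on when the crawler lives in another process.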
Constructor Summary |
---|
BnfHeritrixController(HeritrixFiles files): Create a BnfHeritrixController object. |
Method Summary | |
---|---|
boolean | atFinish(): Query whether Heritrix is in a state where it can finish crawling. |
void | beginCrawlStop(): Tell Heritrix to stop crawling. |
void | cleanup(): Release any resources kept by the class. |
void | cleanup(java.io.File crawlDir): Clean up after a Heritrix process. |
boolean | crawlIsEnded(): Returns true if the crawl has ended, either because Heritrix finished or because we terminated it. |
int | getActiveToeCount(): Get the number of currently active ToeThreads (crawler threads). |
java.lang.String | getAdminInterfaceUrl(): Return the URL for monitoring this instance. |
CrawlProgressMessage | getCrawlProgress(): Gets a message summarizing the crawl progress. |
int | getCurrentProcessedKBPerSec(): Get an estimate of the rate, in KB per second, at which documents are currently being processed by the crawler. |
FullFrontierReport | getFullFrontierReport(): Generates a full frontier report. |
java.lang.String | getHarvestInformation(): Get harvest information. |
java.lang.String | getHeritrixConsoleURL(): Return the URL for monitoring this instance. |
java.lang.String | getProgressStats(): Get a human-readable set of statistics on the progress of the crawl. |
long | getQueuedUriCount(): Get the number of URIs currently on the queue to be processed. |
void | initialize(): Initialize a new CrawlController for executing a Heritrix crawl. |
boolean | isPaused(): Returns true if the crawler has been paused and is thus not supposed to fetch anything. |
void | requestCrawlStart(): Request that Heritrix start crawling. |
void | requestCrawlStop(java.lang.String reason): Request that crawling stops. |
Methods inherited from class dk.netarkivet.harvester.harvesting.controller.AbstractJMXHeritrixController |
---|
getGuiPort, getHeritrixFiles, getHostName, getJmxPort, getJobDescription, processHasExited, toString, waitForHeritrixProcessExit |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public BnfHeritrixController(HeritrixFiles files)
Parameters: files - Files that are used to set up Heritrix.
Method Detail |
---|
public void initialize()
Specified by: initialize in interface HeritrixController
Throws: IOFailure - If Heritrix dies before initialization, or we encounter any problems during the initialization.
See Also: HeritrixController.initialize()
public void requestCrawlStart()
Specified by: requestCrawlStart in interface HeritrixController
Throws: IOFailure - If unable to communicate with Heritrix.
See Also: HeritrixController.requestCrawlStart()
public void requestCrawlStop(java.lang.String reason)
Specified by: requestCrawlStop in interface HeritrixController
Parameters: reason - A human-readable reason the crawl is being stopped.
See Also: HeritrixController.requestCrawlStop(String)
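The three methods above suggest a start, poll, stop sequence. The following is a minimal sketch of that sequence under stated assumptions: `Controller` is a hypothetical cut-down interface that only mirrors the documented method names, `FakeController` is an in-memory stand-in, and the poll limit is an illustrative policy, not something this class is known to implement.

```java
import java.util.ArrayList;
import java.util.List;

final class CrawlLifecycleSketch {

    /** Hypothetical subset of the controller methods documented on this page. */
    interface Controller {
        void initialize();
        void requestCrawlStart();
        boolean crawlIsEnded();
        void requestCrawlStop(String reason);
        void cleanup();
    }

    /** In-memory stand-in that records calls and reports "ended" after a fixed number of polls. */
    static final class FakeController implements Controller {
        final List<String> calls = new ArrayList<>();
        private int pollsLeft;

        FakeController(int pollsUntilEnd) {
            this.pollsLeft = pollsUntilEnd;
        }

        public void initialize() { calls.add("initialize"); }
        public void requestCrawlStart() { calls.add("requestCrawlStart"); }
        public boolean crawlIsEnded() { return --pollsLeft <= 0; }
        public void requestCrawlStop(String reason) { calls.add("requestCrawlStop:" + reason); }
        public void cleanup() { calls.add("cleanup"); }
    }

    /**
     * Drives one crawl: initialize, start, poll up to maxPolls times,
     * request a stop if the crawl has not ended by then, and always clean up.
     * A real caller would wait between polls; the sketch omits the delay.
     */
    static void runCrawl(Controller c, int maxPolls) {
        try {
            c.initialize();
            c.requestCrawlStart();
            for (int i = 0; i < maxPolls; i++) {
                if (c.crawlIsEnded()) {
                    return;
                }
            }
            c.requestCrawlStop("poll limit reached");
        } finally {
            c.cleanup();
        }
    }
}
```

Putting `cleanup()` in a `finally` block is a design choice of the sketch: it keeps resource release on the same path whether the crawl ends normally, is stopped, or an initialization failure such as the documented IOFailure propagates.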
public java.lang.String getHeritrixConsoleURL()
public void cleanup(java.io.File crawlDir)
See Also: HeritrixController.cleanup()
public java.lang.String getAdminInterfaceUrl()
public CrawlProgressMessage getCrawlProgress()
public FullFrontierReport getFullFrontierReport()
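public boolean atFinish()
Specified by: atFinish in interface HeritrixController
public void beginCrawlStop()
Specified by: beginCrawlStop in interface HeritrixController
public void cleanup()
Specified by: cleanup in interface HeritrixController
public boolean crawlIsEnded()
Specified by: crawlIsEnded in interface HeritrixController
public int getActiveToeCount()
Specified by: getActiveToeCount in interface HeritrixController
public int getCurrentProcessedKBPerSec()
Specified by: getCurrentProcessedKBPerSec in interface HeritrixController
See Also: StatisticsTracking.currentProcessedKBPerSec()
public java.lang.String getHarvestInformation()
Specified by: getHarvestInformation in interface HeritrixController
public java.lang.String getProgressStats()
Specified by: getProgressStats in interface HeritrixController
public long getQueuedUriCount()
Specified by: getQueuedUriCount in interface HeritrixController
public boolean isPaused()
Specified by: isPaused in interface HeritrixController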
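The read-only accessors listed above can be combined into a single status line for monitoring. This is a minimal sketch, assuming a hypothetical `CrawlMetrics` interface that mirrors four of the documented getters; the line format and the fixed-value stand-in are illustrative only.

```java
final class CrawlStatusSketch {

    /** Hypothetical subset of the read-only controller methods listed on this page. */
    interface CrawlMetrics {
        int getActiveToeCount();
        long getQueuedUriCount();
        int getCurrentProcessedKBPerSec();
        boolean isPaused();
    }

    /** Builds one status line from a single round of metric reads. */
    static String statusLine(CrawlMetrics m) {
        String state = m.isPaused() ? "PAUSED" : "RUNNING";
        return state
                + " threads=" + m.getActiveToeCount()
                + " queued=" + m.getQueuedUriCount()
                + " rate=" + m.getCurrentProcessedKBPerSec() + "KB/s";
    }

    /** Fixed-value stand-in so the sketch runs without a Heritrix process. */
    static CrawlMetrics fixed(int toes, long queued, int kbPerSec, boolean paused) {
        return new CrawlMetrics() {
            public int getActiveToeCount() { return toes; }
            public long getQueuedUriCount() { return queued; }
            public int getCurrentProcessedKBPerSec() { return kbPerSec; }
            public boolean isPaused() { return paused; }
        };
    }

    public static void main(String[] args) {
        System.out.println(statusLine(fixed(25, 1000L, 300, false)));
    }
}
```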