public class BnfHeritrixController extends AbstractJMXHeritrixController
| Constructor | Description |
|---|---|
| BnfHeritrixController(HeritrixFiles files) | Create a BnfHeritrixController object. |
| Modifier and Type | Method | Description |
|---|---|---|
| boolean | atFinish() | Query whether Heritrix is in a state where it can finish crawling. |
| void | beginCrawlStop() | Tell Heritrix to stop crawling. |
| void | cleanup() | Release any resources kept by the class. |
| void | cleanup(File crawlDir) | Clean up after a Heritrix process. |
| boolean | crawlIsEnded() | Return true if the crawl has ended, either because Heritrix finished or because we terminated it. |
| int | getActiveToeCount() | Get the number of currently active ToeThreads (crawler threads). |
| String | getAdminInterfaceUrl() | Return the URL for monitoring this instance. |
| CrawlProgressMessage | getCrawlProgress() | Get a message summarizing the crawl progress. |
| int | getCurrentProcessedKBPerSec() | Get an estimate of the rate, in KB per second, at which the crawler is currently processing documents. |
| FullFrontierReport | getFullFrontierReport() | Generate a full frontier report. |
| String | getHarvestInformation() | Get harvest information. |
| String | getHeritrixConsoleURL() | Return the URL for monitoring this instance. |
| String | getProgressStats() | Get a human-readable set of statistics on the progress of the crawl. |
| long | getQueuedUriCount() | Get the number of URIs currently queued for processing. |
| void | initialize() | Initialize the JMX connection to Heritrix. |
| boolean | isPaused() | Return true if the crawler has been paused and is therefore not supposed to fetch anything. |
| void | requestCrawlStart() | Request that Heritrix start crawling. |
| void | requestCrawlStop(String reason) | Request that crawling stop. |
Methods inherited from class AbstractJMXHeritrixController: getFiles, getGuiPort, getHeritrixFiles, getHostName, getJmxPort, getJobDescription, processHasExited, toString, waitForHeritrixProcessExit
public BnfHeritrixController(HeritrixFiles files)
Parameters: files - Files that are used to set up Heritrix.
public void initialize()
Throws: IOFailure - If Heritrix dies before initialization, or if any problem is encountered during initialization.
See Also: HeritrixController.initialize()
public void requestCrawlStart()
Specified by: requestCrawlStart in interface HeritrixController
public void requestCrawlStop(String reason)
Specified by: requestCrawlStop in interface HeritrixController
Parameters: reason - A human-readable reason the crawl is being stopped.
public String getHeritrixConsoleURL()
public void cleanup(File crawlDir)
Parameters: crawlDir - The crawl directory to clean up.
See Also: HeritrixController.cleanup()
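The lifecycle suggested by the methods above (initialize, start, poll until the crawl has ended, request a stop, clean up) can be sketched in plain Java. This is a hedged illustration, not NetarchiveSuite code: `ControllerLifecycle` is a hypothetical interface that copies only signatures documented on this page, `FakeController` is a hypothetical stand-in that needs no Heritrix process, and the poll limit is an assumed stop policy.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class CrawlLifecycleSketch {

    // Hypothetical interface: copies a subset of the signatures documented on this page.
    interface ControllerLifecycle {
        void initialize();
        void requestCrawlStart();
        boolean crawlIsEnded();
        void requestCrawlStop(String reason);
        void cleanup(File crawlDir);
    }

    // Hypothetical fake: records calls; the crawl "ends" after a fixed number of polls
    // or as soon as a stop has been requested.
    static class FakeController implements ControllerLifecycle {
        final List<String> calls = new ArrayList<>();
        private final int pollsUntilDone;
        private int polls = 0;
        private boolean stopRequested = false;

        FakeController(int pollsUntilDone) {
            this.pollsUntilDone = pollsUntilDone;
        }

        public void initialize() { calls.add("initialize"); }
        public void requestCrawlStart() { calls.add("requestCrawlStart"); }
        public boolean crawlIsEnded() {
            polls++;
            return stopRequested || polls >= pollsUntilDone;
        }
        public void requestCrawlStop(String reason) {
            calls.add("requestCrawlStop: " + reason);
            stopRequested = true;
        }
        public void cleanup(File crawlDir) { calls.add("cleanup " + crawlDir.getName()); }
    }

    // Runs one crawl. Requests a stop after maxPolls unfinished polls (an assumed policy)
    // and always calls cleanup(File), even if initialization or polling throws.
    static void runCrawl(ControllerLifecycle c, File crawlDir, int maxPolls) {
        try {
            c.initialize();
            c.requestCrawlStart();
            int unfinishedPolls = 0;
            while (!c.crawlIsEnded()) {
                unfinishedPolls++;
                if (unfinishedPolls >= maxPolls) {
                    c.requestCrawlStop("poll limit reached");
                    break;
                }
                // A real caller would wait here, e.g. Thread.sleep(...), before polling again.
            }
        } finally {
            c.cleanup(crawlDir);
        }
    }

    public static void main(String[] args) {
        FakeController quick = new FakeController(3);
        runCrawl(quick, new File("crawl-1"), 10);
        System.out.println(quick.calls); // prints [initialize, requestCrawlStart, cleanup crawl-1]
    }
}
```

Placing `cleanup(File)` in a `finally` block is a choice made for this sketch, so that resources are released even when `initialize()` fails with an `IOFailure`.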
public String getAdminInterfaceUrl()
public CrawlProgressMessage getCrawlProgress()
public FullFrontierReport getFullFrontierReport()
public boolean atFinish()
Specified by: atFinish in interface HeritrixController
public void beginCrawlStop()
Specified by: beginCrawlStop in interface HeritrixController
public void cleanup()
Specified by: cleanup in interface HeritrixController
public boolean crawlIsEnded()
Specified by: crawlIsEnded in interface HeritrixController
public int getActiveToeCount()
Specified by: getActiveToeCount in interface HeritrixController
public int getCurrentProcessedKBPerSec()
Specified by: getCurrentProcessedKBPerSec in interface HeritrixController
See Also: StatisticsTracking.currentProcessedKBPerSec()
public String getHarvestInformation()
Specified by: getHarvestInformation in interface HeritrixController
public String getProgressStats()
Specified by: getProgressStats in interface HeritrixController
public long getQueuedUriCount()
Specified by: getQueuedUriCount in interface HeritrixController
public boolean isPaused()
Specified by: isPaused in interface HeritrixController
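A monitoring step built from the status methods documented above might look like the following. Again this is a hypothetical sketch: `ControllerStatus` copies only the documented signatures, `FixedStatus` is a made-up implementation returning constant values, and both the status-line format and the rule "call `beginCrawlStop()` once `atFinish()` returns true" are assumptions, not behavior confirmed by this page.

```java
import java.util.Locale;

public class CrawlMonitorSketch {

    // Hypothetical interface: copies the monitoring-related signatures documented on this page.
    interface ControllerStatus {
        int getActiveToeCount();
        long getQueuedUriCount();
        int getCurrentProcessedKBPerSec();
        boolean isPaused();
        boolean atFinish();
        void beginCrawlStop();
    }

    // Builds a one-line summary from the controller's counters.
    static String statusLine(ControllerStatus s) {
        String state = s.isPaused() ? "paused" : "running";
        return String.format(Locale.ROOT, "%s: %d active threads, %d queued URIs, %d KB/s",
                state, s.getActiveToeCount(), s.getQueuedUriCount(), s.getCurrentProcessedKBPerSec());
    }

    // One monitoring step (assumed policy): if Heritrix reports it can finish, ask it to stop.
    // Returns true if a stop was initiated.
    static boolean checkAndStop(ControllerStatus s) {
        if (s.atFinish()) {
            s.beginCrawlStop();
            return true;
        }
        return false;
    }

    // Hypothetical fixed-value implementation so the sketch runs without Heritrix.
    static class FixedStatus implements ControllerStatus {
        final int threads;
        final long queued;
        final int kbPerSec;
        final boolean paused;
        final boolean finished;
        boolean stopBegun = false;

        FixedStatus(int threads, long queued, int kbPerSec, boolean paused, boolean finished) {
            this.threads = threads;
            this.queued = queued;
            this.kbPerSec = kbPerSec;
            this.paused = paused;
            this.finished = finished;
        }

        public int getActiveToeCount() { return threads; }
        public long getQueuedUriCount() { return queued; }
        public int getCurrentProcessedKBPerSec() { return kbPerSec; }
        public boolean isPaused() { return paused; }
        public boolean atFinish() { return finished; }
        public void beginCrawlStop() { stopBegun = true; }
    }

    public static void main(String[] args) {
        FixedStatus s = new FixedStatus(25, 1200L, 340, false, false);
        // prints "running: 25 active threads, 1200 queued URIs, 340 KB/s"
        System.out.println(statusLine(s));
    }
}
```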
Copyright © 2005–2016 The Royal Danish Library, the Danish State and University Library, the National Library of France and the Austrian National Library. All rights reserved.