public class HeritrixController extends AbstractRestHeritrixController
AbstractRestHeritrixController.LaunchResultHandler
errorPrinter, files, h3handler, h3launcher, h3wrapper, heritrixBaseDir, outputPrinter
Constructor and Description |
---|
HeritrixController(Heritrix3Files files,
String jobName)
Create a HeritrixController object.
|
Modifier and Type | Method and Description |
---|---|
boolean |
atFinish()
Query whether Heritrix is in a state where it can finish crawling.
|
void |
beginCrawlStop()
Tell Heritrix to stop crawling.
|
void |
cleanup()
Release any resources kept by the class.
|
void |
cleanup(File crawlDir)
Clean up after a Heritrix3 process.
|
boolean |
crawlIsEnded()
Returns true if the crawl has ended, either because Heritrix finished or because we terminated it.
|
int |
getActiveToeCount()
Get the number of currently active ToeThreads (crawler threads).
|
String |
getAdminInterfaceUrl()
Return the URL for monitoring this instance.
|
CrawlProgressMessage |
getCrawlProgress()
Gets a message that stores the information summarizing the crawl progress.
|
int |
getCurrentProcessedKBPerSec()
Get an estimate of the rate, in KB per second, at which documents are currently being processed by the crawler.
|
FullFrontierReport |
getFullFrontierReport()
Generates a full frontier report from H3 using a REST call (Groovy script).
|
String |
getHarvestInformation()
Get harvest information.
|
String |
getHeritrixConsoleURL()
Return the URL for monitoring this instance.
|
String |
getProgressStats()
Get a human-readable set of statistics on the progress of the crawl.
|
long |
getQueuedUriCount()
Get the number of URIs currently on the queue to be processed.
|
void |
initialize()
Initialize the JMX connection to Heritrix3.
|
boolean |
isPaused()
Returns true if the crawler has been paused and is thus not supposed to fetch anything.
|
void |
requestCrawlStart()
Request that Heritrix start crawling.
|
void |
requestCrawlStop(String reason)
Request that the crawler stops.
|
void |
stopHeritrix()
Stop the Heritrix process.
|
getFiles, getGuiPort, getHeritrixAdminName, getHeritrixAdminPassword, getHeritrixFiles, getHostName, getJobDescription, toString
public HeritrixController(Heritrix3Files files, String jobName)
files
- Files that are used to set up Heritrix3.
public void initialize()
IOFailure
- If Heritrix3 dies before initialisation, or we encounter any problems during the initialisation.
IHeritrixController.initialize()
public void requestCrawlStart()
IHeritrixController
public void requestCrawlStop(String reason)
IHeritrixController
reason
- A human-readable reason the crawl is being stopped.
public void stopHeritrix()
IHeritrixController
public String getHeritrixConsoleURL()
public void cleanup(File crawlDir)
crawlDir
- The crawl directory to clean up (argument is currently not used).
IHeritrixController.cleanup()
public String getAdminInterfaceUrl()
public CrawlProgressMessage getCrawlProgress()
public FullFrontierReport getFullFrontierReport()
public boolean atFinish()
IHeritrixController
public void beginCrawlStop()
IHeritrixController
public void cleanup()
IHeritrixController
public boolean crawlIsEnded()
IHeritrixController
public int getActiveToeCount()
IHeritrixController
public int getCurrentProcessedKBPerSec()
IHeritrixController
StatisticsTracking.currentProcessedKBPerSec()
public String getHarvestInformation()
IHeritrixController
public String getProgressStats()
IHeritrixController
public long getQueuedUriCount()
IHeritrixController
public boolean isPaused()
IHeritrixController
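The methods above suggest a lifecycle of initialize, start, poll, stop, and clean up. The following is a minimal sketch of that call order, not the project's own harvesting code. So that it can run without NetarchiveSuite on the classpath, it uses a hypothetical `CrawlController` interface that mirrors only a few of the method names listed above, plus a hypothetical in-memory `FakeController`. The names `CrawlLifecycleSketch`, `runCrawl`, and `maxPolls` are illustrative assumptions; the real `HeritrixController` talks to a running Heritrix3 instance and may throw `IOFailure`.

```java
import java.util.ArrayList;
import java.util.List;

public class CrawlLifecycleSketch {

    /** Hypothetical subset of the controller methods documented above. */
    interface CrawlController {
        void initialize();
        void requestCrawlStart();
        boolean crawlIsEnded();
        void requestCrawlStop(String reason);
        void cleanup();
    }

    /** Hypothetical stand-in: each poll "processes" one queued URI. */
    static class FakeController implements CrawlController {
        private long queued;
        private boolean stopped;
        final List<String> calls = new ArrayList<>();

        FakeController(long queued) {
            this.queued = queued;
        }

        @Override public void initialize() { calls.add("initialize"); }

        @Override public void requestCrawlStart() { calls.add("requestCrawlStart"); }

        @Override public boolean crawlIsEnded() {
            if (stopped) {
                return true;
            }
            if (queued > 0) {
                queued--; // simulate progress between polls
            }
            return queued == 0;
        }

        @Override public void requestCrawlStop(String reason) {
            stopped = true;
            calls.add("requestCrawlStop:" + reason);
        }

        @Override public void cleanup() { calls.add("cleanup"); }
    }

    /**
     * Drives one crawl: start it, poll until it ends or the poll budget
     * runs out, and always release resources afterwards.
     *
     * @return true if the crawl ended on its own, false if it was stopped
     */
    static boolean runCrawl(CrawlController controller, int maxPolls) {
        controller.initialize();
        boolean endedOnItsOwn = false;
        try {
            controller.requestCrawlStart();
            for (int i = 0; i < maxPolls; i++) {
                if (controller.crawlIsEnded()) {
                    endedOnItsOwn = true;
                    break;
                }
                // A real caller would sleep here and report progress.
            }
            if (!endedOnItsOwn) {
                controller.requestCrawlStop("poll budget exhausted");
            }
        } finally {
            controller.cleanup(); // runs even if starting or polling fails
        }
        return endedOnItsOwn;
    }

    public static void main(String[] args) {
        FakeController shortCrawl = new FakeController(3);
        System.out.println(runCrawl(shortCrawl, 10) + " " + shortCrawl.calls);

        FakeController longCrawl = new FakeController(100);
        System.out.println(runCrawl(longCrawl, 5) + " " + longCrawl.calls);
    }
}
```

Placing `cleanup()` in a `finally` block is a design choice of this sketch: it keeps resource release independent of how the crawl ended. Whether the real controller expects `stopHeritrix()` to be called before `cleanup()` is not stated above, so that step is left out.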
Copyright © 2005–2016 The Royal Danish Library, the Danish State and University Library, the National Library of France and the Austrian National Library. All rights reserved.