java.lang.Object
  dk.netarkivet.harvester.harvesting.DirectHeritrixController
public class DirectHeritrixController
This class encapsulates one full run of Heritrix by holding a reference to a Heritrix CrawlController object. It implements the HeritrixController interface.
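The call order suggested by the method names on this page (initialize, start, poll, stop, clean up) can be sketched as follows. This is only an illustration under stated assumptions: the `Controller` interface and `FakeController` class are hypothetical stand-ins that mirror a few method names from this page so the example runs without NetarchiveSuite or Heritrix on the classpath. The polling loop and the `pollMillis` parameter are likewise assumptions, not the harvester's actual driver code.

```java
import java.util.ArrayList;
import java.util.List;

public class LifecycleSketch {

    /** Hypothetical stand-in mirroring a subset of the methods listed on this page. */
    interface Controller {
        void initialize();
        void requestCrawlStart();
        boolean atFinish();
        void beginCrawlStop();
        boolean crawlIsEnded();
        void cleanup();
    }

    /** Fake controller: atFinish() becomes true after a fixed number of polls. */
    static class FakeController implements Controller {
        final List<String> calls = new ArrayList<>();
        private int pollsLeft;
        private boolean ended;

        FakeController(int pollsBeforeFinish) {
            this.pollsLeft = pollsBeforeFinish;
        }

        public void initialize() { calls.add("initialize"); }
        public void requestCrawlStart() { calls.add("requestCrawlStart"); }
        public boolean atFinish() { return --pollsLeft <= 0; }
        public void beginCrawlStop() { calls.add("beginCrawlStop"); ended = true; }
        public boolean crawlIsEnded() { return ended; }
        public void cleanup() { calls.add("cleanup"); }
    }

    /** Drives one full run; cleanup() is called even if an earlier step throws. */
    static void runCrawl(Controller controller, long pollMillis) {
        controller.initialize();
        try {
            controller.requestCrawlStart();
            // Poll until the crawl has ended or is in a state where it can finish.
            while (!controller.crawlIsEnded() && !controller.atFinish()) {
                try {
                    Thread.sleep(pollMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                    break;
                }
            }
            if (!controller.crawlIsEnded()) {
                controller.beginCrawlStop();
            }
        } finally {
            controller.cleanup();
        }
    }

    public static void main(String[] args) {
        FakeController controller = new FakeController(3);
        runCrawl(controller, 10);
        System.out.println(controller.calls);
        // prints "[initialize, requestCrawlStart, beginCrawlStop, cleanup]"
    }
}
```

Placing `cleanup()` in a `finally` block is a design choice of this sketch: it ensures resources are released even when starting or stopping the crawl fails.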
Nested Class Summary | | |
---|---|---|
(package private) class | DirectHeritrixController.SimpleCrawlStatusListener | Deprecated. Class for handling callbacks from Heritrix. |
Field Summary | | |
---|---|---|
(package private) org.archive.crawler.framework.CrawlController | myController | Deprecated. The controller object, which initializes, starts, and stops a Heritrix crawl job. |
Constructor Summary | | |
---|---|---|
protected | DirectHeritrixController(HeritrixFiles files) | Deprecated. Create a new DirectHeritrixController object with a given set of files. |
Method Summary | | |
---|---|---|
void | addCrawlStatusListener(org.archive.crawler.event.CrawlStatusListener listener) | Deprecated. Add a listener to this crawlController. |
boolean | atFinish() | Deprecated. Query whether Heritrix is in a state where it can finish crawling. |
void | beginCrawlStop() | Deprecated. Tell Heritrix to stop crawling. |
void | cleanup() | Deprecated. Release any resources kept by the class. |
boolean | crawlIsEnded() | Deprecated. Returns true if the crawl has ended, either because Heritrix finished or because we terminated it. |
int | getActiveToeCount() | Deprecated. Get the number of currently active ToeThreads (crawler threads). |
int | getCurrentProcessedKBPerSec() | Deprecated. Get an estimate of the rate, in KB per second, at which documents are currently being processed by the crawler. |
java.lang.String | getProgressStats() | Deprecated. Get a human-readable set of statistics on the progress of the crawl. |
long | getQueuedUriCount() | Deprecated. Get the number of URIs currently on the queue to be processed. |
void | initialize() | Deprecated. Initialize a new CrawlController for executing a Heritrix crawl. |
boolean | isPaused() | Deprecated. Returns true if the crawler has been paused, and thus is not supposed to fetch anything. |
void | requestCrawlStart() | Deprecated. Request that Heritrix start crawling. |
void | requestCrawlStop(java.lang.String reason) | Deprecated. Request that crawling stops. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
org.archive.crawler.framework.CrawlController myController
Constructor Detail |
---|
protected DirectHeritrixController(HeritrixFiles files)
Parameters: files - Files for Heritrix to use.
Method Detail |
---|
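public void initialize()
Description copied from interface: HeritrixController
Initialize a new CrawlController for executing a Heritrix crawl.
Specified by: initialize in interface HeritrixController
See Also: HeritrixController.initialize()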
public void requestCrawlStart()
Description copied from interface: HeritrixController
Request that Heritrix start crawling.
Specified by: requestCrawlStart in interface HeritrixController
See Also: HeritrixController.requestCrawlStart()
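public boolean atFinish()
Description copied from interface: HeritrixController
Query whether Heritrix is in a state where it can finish crawling.
Specified by: atFinish in interface HeritrixController
See Also: HeritrixController.atFinish()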
public void beginCrawlStop()
Description copied from interface: HeritrixController
Tell Heritrix to stop crawling.
Specified by: beginCrawlStop in interface HeritrixController
See Also: HeritrixController.beginCrawlStop()
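public int getActiveToeCount()
Description copied from interface: HeritrixController
Get the number of currently active ToeThreads (crawler threads).
Specified by: getActiveToeCount in interface HeritrixController
See Also: HeritrixController.getActiveToeCount()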
public void requestCrawlStop(java.lang.String reason)
Description copied from interface: HeritrixController
Request that crawling stops.
Specified by: requestCrawlStop in interface HeritrixController
Parameters: reason - A human-readable reason the crawl is being stopped.
See Also: HeritrixController.requestCrawlStop(String)
public void addCrawlStatusListener(org.archive.crawler.event.CrawlStatusListener listener)
Add a listener to this crawlController.
Parameters: listener - The listener for crawl status messages.
See Also: HeritrixController.crawlIsEnded()
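As an illustration of how a status listener can let calling code wait for the end of a crawl, here is a minimal sketch. The `StatusListener` interface below is a hypothetical, reduced callback interface defined only for this example; it is not Heritrix's `CrawlStatusListener`, which has its own set of callbacks. The `EndLatchListener` name and the latch-based waiting are also assumptions of this sketch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ListenerSketch {

    /** Hypothetical, reduced callback interface; not the real CrawlStatusListener. */
    interface StatusListener {
        void crawlStarted(String message);
        void crawlEnded(String exitMessage);
    }

    /** Records the exit message and releases anyone waiting for the crawl to end. */
    static class EndLatchListener implements StatusListener {
        private final CountDownLatch ended = new CountDownLatch(1);
        private volatile String exitMessage;

        public void crawlStarted(String message) {
            // Nothing to record in this sketch.
        }

        public void crawlEnded(String exitMessage) {
            this.exitMessage = exitMessage;
            ended.countDown();
        }

        /** Returns true if the crawl ended within the timeout, false otherwise. */
        boolean awaitEnd(long timeout, TimeUnit unit) {
            try {
                return ended.await(timeout, unit);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }

        String getExitMessage() {
            return exitMessage;
        }
    }

    public static void main(String[] args) {
        EndLatchListener listener = new EndLatchListener();
        // Stand-in for the crawler thread invoking the callback.
        new Thread(() -> listener.crawlEnded("Finished")).start();
        boolean ended = listener.awaitEnd(1, TimeUnit.SECONDS);
        System.out.println(ended + " " + listener.getExitMessage());
    }
}
```

A latch avoids busy-waiting on a flag: the waiting thread blocks until the callback fires or the timeout expires.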
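public long getQueuedUriCount()
Description copied from interface: HeritrixController
Get the number of URIs currently on the queue to be processed.
Specified by: getQueuedUriCount in interface HeritrixController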
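public int getCurrentProcessedKBPerSec()
Description copied from interface: HeritrixController
Get an estimate of the rate, in KB per second, at which documents are currently being processed by the crawler.
Specified by: getCurrentProcessedKBPerSec in interface HeritrixController
See Also: HeritrixController.getCurrentProcessedKBPerSec()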
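public java.lang.String getProgressStats()
Description copied from interface: HeritrixController
Get a human-readable set of statistics on the progress of the crawl.
Specified by: getProgressStats in interface HeritrixController
See Also: HeritrixController.getProgressStats()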
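The numeric getters documented above can be combined into a single status line for logging. The following is a minimal sketch under stated assumptions: `CrawlStats` is a hypothetical stand-in interface that mirrors the getter names on this page, `fixed(...)` supplies canned values so the example runs on its own, and the output format is invented for illustration. It is not the format returned by `getProgressStats()`.

```java
import java.util.Locale;

public class ProgressSketch {

    /** Hypothetical stand-in mirroring the statistics getters listed on this page. */
    interface CrawlStats {
        int getActiveToeCount();
        long getQueuedUriCount();
        int getCurrentProcessedKBPerSec();
        boolean isPaused();
    }

    /** Formats one status line from a snapshot of the counters. */
    static String statusLine(CrawlStats stats) {
        return String.format(Locale.ROOT,
                "threads=%d queued=%d rate=%d KB/s paused=%b",
                stats.getActiveToeCount(),
                stats.getQueuedUriCount(),
                stats.getCurrentProcessedKBPerSec(),
                stats.isPaused());
    }

    /** Fixed-value implementation used only to make the example runnable. */
    static CrawlStats fixed(int toes, long queued, int kbPerSec, boolean paused) {
        return new CrawlStats() {
            public int getActiveToeCount() { return toes; }
            public long getQueuedUriCount() { return queued; }
            public int getCurrentProcessedKBPerSec() { return kbPerSec; }
            public boolean isPaused() { return paused; }
        };
    }

    public static void main(String[] args) {
        System.out.println(statusLine(fixed(25, 1200L, 340, false)));
        // prints "threads=25 queued=1200 rate=340 KB/s paused=false"
    }
}
```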
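public boolean isPaused()
Description copied from interface: HeritrixController
Returns true if the crawler has been paused, and thus is not supposed to fetch anything.
Specified by: isPaused in interface HeritrixController
See Also: HeritrixController.isPaused()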
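public boolean crawlIsEnded()
Returns true if the crawl has ended, either because Heritrix finished or because we terminated it.
Specified by: crawlIsEnded in interface HeritrixController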
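public void cleanup()
Description copied from interface: HeritrixController
Release any resources kept by the class.
Specified by: cleanup in interface HeritrixController
See Also: HeritrixController.cleanup()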