Class Summary |
ContentSizeAnnotationPostProcessor |
A post processor that adds an annotation content-size: for each successfully
harvested URI. |
DirectHeritrixController |
Deprecated. The JMXHeritrixController offers an implementation that's
better on almost all counts. |
DomainnameQueueAssignmentPolicy |
Uses the domain name as the queue name. |
FixedUURI |
Fixed UURI which extends UURI to fix an NPE bug in getReferencedHost. |
HarvestController |
This class handles all the things in a single harvest that are not directly
related either to launching Heritrix or to handling JMS messages. |
HarvestControllerApplication |
This application controls the Heritrix harvester which does the actual
harvesting, and is also responsible for uploading the harvested data to the
ArcRepository. |
HarvestDocumentation |
This class contains code for documenting a harvest. |
HeritrixDomainHarvestReport |
Class responsible for generating a domain harvest report from crawl logs
created by Heritrix and presenting the relevant information to clients. |
HeritrixFiles |
This class encapsulates all the files that Heritrix gets from our system,
and all files we read from Heritrix. |
HeritrixLauncher |
A HeritrixLauncher object wraps around an instance of the web crawler
Heritrix. |
IngestableFiles |
Encapsulation of files to be ingested into the archive. |
JMXHeritrixController |
This implementation of the HeritrixController interface starts Heritrix
as a separate process and uses JMX to communicate with it. |
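ContentSizeAnnotationPostProcessor is described as adding a content-size: annotation for each successfully harvested URI. The Java sketch below illustrates the idea without depending on Heritrix. FetchedUri is a hypothetical stand-in for Heritrix's crawl URI type. Treating a positive fetch status as "successful" and appending the byte count after the content-size: prefix are both assumptions, not a description of the real class.

```java
import java.util.ArrayList;
import java.util.List;

class ContentSizeAnnotationSketch {
    static final String PREFIX = "content-size:";

    // Hypothetical stand-in for Heritrix's crawl URI; only the fields this sketch needs.
    static final class FetchedUri {
        final String uri;
        final int fetchStatus;   // assumption: a positive status means the fetch succeeded
        final long contentSize;  // number of bytes received
        final List<String> annotations = new ArrayList<>();

        FetchedUri(String uri, int fetchStatus, long contentSize) {
            this.uri = uri;
            this.fetchStatus = fetchStatus;
            this.contentSize = contentSize;
        }
    }

    // Appends "content-size:" followed by the byte count, but only for successful fetches.
    static void process(FetchedUri curi) {
        if (curi.fetchStatus > 0) {
            curi.annotations.add(PREFIX + curi.contentSize);
        }
    }

    public static void main(String[] args) {
        FetchedUri ok = new FetchedUri("http://example.org/", 200, 5120);
        FetchedUri failed = new FetchedUri("http://example.org/down", -2, 0);
        process(ok);
        process(failed);
        System.out.println(ok.annotations);     // [content-size:5120]
        System.out.println(failed.annotations); // []
    }
}
```

Failed fetches are left unannotated, so a downstream report can distinguish "no content" from "zero bytes".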
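DomainnameQueueAssignmentPolicy uses the domain name as the queue name, so every host within one domain shares a single crawl queue. The sketch below shows one way to derive such a name from a URI. It swaps in a deliberately naive rule (keep the last two host labels), which mishandles multi-label suffixes such as co.uk. The fallback queue name "no-host" is hypothetical.

```java
import java.net.URI;
import java.util.Locale;

class DomainQueueNameSketch {
    // Hypothetical queue name for URIs that carry no host (e.g. opaque "dns:" URIs).
    static final String NO_HOST_QUEUE = "no-host";

    // Maps a URI to a queue name derived from its domain, so that all hosts in one
    // domain (www.kb.dk, netarkivet.kb.dk, ...) end up in the same queue.
    static String queueName(String uri) {
        String host = URI.create(uri).getHost();
        if (host == null) {
            return NO_HOST_QUEUE;
        }
        host = host.toLowerCase(Locale.ROOT);
        if (host.matches("[0-9.]+")) {
            return host; // IPv4 literal: keep the whole address
        }
        // Simplification: keep the last two labels; ignores suffixes such as "co.uk".
        String[] labels = host.split("\\.");
        if (labels.length <= 2) {
            return host;
        }
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }

    public static void main(String[] args) {
        System.out.println(queueName("http://www.kb.dk/index.html")); // kb.dk
        System.out.println(queueName("https://netarkivet.dk/"));     // netarkivet.dk
        System.out.println(queueName("http://192.168.1.10/status")); // 192.168.1.10
    }
}
```

A production policy would look the suffix up in a list of top-level or public-suffix domains instead of counting labels.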
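HeritrixDomainHarvestReport builds a per-domain report from Heritrix crawl logs. As a rough illustration of that aggregation, the sketch below totals objects and bytes per domain. It assumes a whitespace-separated crawl.log layout in which field 1 is the fetch status, field 2 the size in bytes ("-" when unknown) and field 3 the URI; it counts only positive statuses and reuses the naive last-two-labels domain rule. The log lines in main are made-up samples.

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class DomainReportSketch {
    // Per-domain totals accumulated from the crawl log.
    static final class Totals {
        long objects;
        long bytes;
    }

    // Assumed layout: field 1 = fetch status, field 2 = size ("-" if unknown), field 3 = URI.
    static Map<String, Totals> summarize(List<String> crawlLogLines) {
        Map<String, Totals> report = new TreeMap<>();
        for (String line : crawlLogLines) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 4) {
                continue; // malformed or truncated line
            }
            int status;
            long size;
            String host;
            try {
                status = Integer.parseInt(f[1]);
                size = f[2].equals("-") ? 0 : Long.parseLong(f[2]);
                host = URI.create(f[3]).getHost();
            } catch (IllegalArgumentException e) { // also covers NumberFormatException
                continue;
            }
            if (status <= 0 || host == null) {
                continue; // count only successful fetches of URIs that have a host
            }
            Totals t = report.computeIfAbsent(domainOf(host), d -> new Totals());
            t.objects++;
            t.bytes += size;
        }
        return report;
    }

    // Naive domain rule: the last two labels of the host name.
    static String domainOf(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        String[] labels = h.split("\\.");
        return labels.length <= 2 ? h : labels[labels.length - 2] + "." + labels[labels.length - 1];
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "2009-01-01T10:00:00.000Z 200 1000 http://www.kb.dk/ - - text/html",
            "2009-01-01T10:00:01.000Z 200 500 http://kb.dk/a.html L http://www.kb.dk/ text/html",
            "2009-01-01T10:00:02.000Z -6 - http://down.example.org/ - - no-type");
        for (Map.Entry<String, Totals> e : summarize(log).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().objects + " objects, "
                    + e.getValue().bytes + " bytes"); // kb.dk: 2 objects, 1500 bytes
        }
    }
}
```

A TreeMap keeps the domains sorted, which makes the resulting report stable and easy to compare between harvests.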
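JMXHeritrixController starts Heritrix as a separate process and talks to it over JMX. The sketch below shows the general shape using only JDK APIs: a command line that enables the JDK's built-in JMX agent (the com.sun.management.jmxremote system properties), the standard RMI service URL for that agent, and a password-authenticated connection via JMXConnectorFactory. The classpath, main class name and password file path are hypothetical placeholders, and the real controller's options may differ.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class JmxLaunchSketch {
    // Builds the command line for a separate JVM with the JDK's JMX agent enabled.
    // classpath, mainClass and passwordFile are caller-supplied (hypothetical values).
    static List<String> buildCommand(String javaExe, String classpath, String mainClass,
                                     int jmxPort, String passwordFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaExe);
        cmd.add("-Dcom.sun.management.jmxremote.port=" + jmxPort);
        cmd.add("-Dcom.sun.management.jmxremote.ssl=false");
        cmd.add("-Dcom.sun.management.jmxremote.password.file=" + passwordFile);
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(mainClass);
        return cmd;
    }

    // Standard RMI service URL for a JMX agent listening on host:port.
    static JMXServiceURL serviceUrl(String host, int port) {
        try {
            return new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad JMX address " + host + ":" + port, e);
        }
    }

    // Opens a password-authenticated connection; the caller must close() the connector.
    static JMXConnector connect(JMXServiceURL url, String user, String password) throws IOException {
        Map<String, Object> env = new HashMap<>();
        env.put(JMXConnector.CREDENTIALS, new String[] {user, password});
        return JMXConnectorFactory.connect(url, env);
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand("java", "heritrix.jar", "org.example.HeritrixMain",
                8849, "/path/to/jmxremote.password");
        System.out.println(String.join(" ", cmd));
        System.out.println(serviceUrl("localhost", 8849).getURLPath()); // /jndi/rmi://localhost:8849/jmxrmi
        // A real controller would run new ProcessBuilder(cmd).start(), wait until the agent
        // is listening, then call connect(...) and invoke the crawler's MBean operations.
    }
}
```

Running the crawler in its own JVM isolates its memory use and crashes from the controlling application, which is one plausible reason a JMX-based controller is preferred over running Heritrix in-process.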