Package dk.netarkivet.harvester.harvesting

Interface Summary
HeritrixController This interface encapsulates direct access to Heritrix, allowing it to be accessed in various ways (direct class access or JMX).
 

Class Summary
ContentSizeAnnotationPostProcessor A post processor that adds a content-size: annotation for each successfully harvested URI.
DirectHeritrixController Deprecated. The JMXHeritrixController offers an implementation that's better on almost all counts.
DomainnameQueueAssignmentPolicy A queue assignment policy that uses the domain name as the queue name.
FixedUURI A fixed UURI that extends UURI to work around a NullPointerException (NPE) bug in getReferencedHost.
HarvestController This class handles all the things in a single harvest that are not directly related either to launching Heritrix or to handling JMS messages.
HarvestControllerApplication This application controls the Heritrix harvester which does the actual harvesting, and is also responsible for uploading the harvested data to the ArcRepository.
HarvestDocumentation This class contains code for documenting a harvest.
HeritrixDomainHarvestReport Class responsible for generating a domain harvest report from crawl logs created by Heritrix and presenting the relevant information to clients.
HeritrixFiles This class encapsulates all the files that Heritrix gets from our system, and all files we read from Heritrix.
HeritrixLauncher A HeritrixLauncher object wraps around an instance of the web crawler Heritrix.
IngestableFiles Encapsulation of files to be ingested into the archive.
JMXHeritrixController This implementation of the HeritrixController interface starts Heritrix as a separate process and uses JMX to communicate with it.
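
As an illustration of the idea behind DomainnameQueueAssignmentPolicy (using the domain as the queue name), the following is a minimal sketch and not the package's actual implementation. The class name DomainQueueNameSketch and the method queueNameFor are hypothetical, and the sketch assumes a domain is simply the last two dot-separated labels of a host name; it does not handle multi-label public suffixes such as "co.uk" or IP addresses.

```java
import java.util.Locale;

/**
 * Hypothetical sketch: derive a queue name from a host name by using its
 * domain. This is NOT the DomainnameQueueAssignmentPolicy code.
 */
public final class DomainQueueNameSketch {

    private DomainQueueNameSketch() {
        // Utility class, not meant to be instantiated.
    }

    /**
     * Returns the last two dot-separated labels of the host, lower-cased.
     * Assumption: "domain" means exactly the last two labels; real domain
     * detection would need a list of public suffixes.
     */
    public static String queueNameFor(String host) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("host must be non-empty");
        }
        String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
        if (labels.length <= 2) {
            // Already a bare domain (or a single label such as "localhost").
            return String.join(".", labels);
        }
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }
}
```

With this sketch, queueNameFor("www.example.dk") and queueNameFor("news.example.dk") both yield "example.dk", so URIs from either host would be placed in the same queue.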