Class Summary |
ArcFilesReportGenerator |
This class generates a report that lists ARC files
along with the opening date, closing date (if file was properly closed),
and size in bytes. |
ContentSizeAnnotationPostProcessor |
A post processor that adds an annotation
content-size:
for each successfully harvested URI. |
DomainnameQueueAssignmentPolicy |
Uses the domain name as the queue name. |
HarvestController |
This class handles all the things in a single harvest that are not
directly related either to launching Heritrix or to handling JMS messages. |
HarvestControllerApplication |
This application controls the Heritrix harvester which does the actual
harvesting, and is also responsible for uploading the harvested data to the
ArcRepository. |
HarvestDocumentation |
This class contains code for documenting a harvest. |
HeritrixDomainHarvestReport |
Class responsible for generating a domain harvest report from
crawl logs created by
Heritrix and presenting the relevant information to clients. |
HeritrixFiles |
This class encapsulates all the files that Heritrix gets from our system,
and all files we read from Heritrix. |
HeritrixLauncher |
A HeritrixLauncher object wraps around an instance of the web crawler
Heritrix. |
HeritrixLauncherFactory |
Factory class for instantiating a specific implementation
of HeritrixLauncher. |
IngestableFiles |
Encapsulation of files to be ingested into the archive. |
MetadataFile |
Wraps information for a Heritrix file that should be stored in the metadata
ARC. |
OnNSDomainsDecideRule |
Class that re-creates the SurtPrefixSet to include only domain names
according to the domain definition of NetarchiveSuite. |
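
To illustrate the idea behind DomainnameQueueAssignmentPolicy (using the domain as the queue name), here is a minimal sketch. It is not the NetarchiveSuite implementation: the class `DomainQueueSketch` and the method `queueNameFor` are hypothetical, and the sketch assumes that a domain is simply the last two labels of the host name. The real domain definition of NetarchiveSuite is configurable and may differ; multi-label suffixes and IP addresses are not handled here.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical illustration only; not part of NetarchiveSuite or Heritrix.
class DomainQueueSketch {

    /**
     * Derives a queue name from a URI by taking the last two labels of the
     * host name (assumption: this approximates "the domain").
     * Returns "default" when the URI has no host component.
     */
    static String queueNameFor(String uri) {
        String host = URI.create(uri).getHost();
        if (host == null) {
            return "default";
        }
        String lower = host.toLowerCase(Locale.ROOT);
        String[] labels = lower.split("\\.");
        if (labels.length <= 2) {
            // Already a bare domain (or a single-label host).
            return lower;
        }
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }

    public static void main(String[] args) {
        // Both URIs belong to the same domain, so they share one queue.
        System.out.println(queueNameFor("http://www.example.org/index.html"));
        System.out.println(queueNameFor("http://news.example.org/a/b"));
    }
}
```

Grouping URIs by domain rather than by full host name means that all hosts under one domain end up in the same queue, which is the effect the class description above suggests.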