Package dk.netarkivet.harvester.harvesting

This module handles the definition, scheduling, and execution of harvests.


Interface Summary
ArchiveFileNaming Interface for a class that implements archive file naming.
JobInfo Interface for selecting partial job information necessary for constructing HeritrixFiles.
 

Class Summary
ArchiveFilenameParser  
ArchiveFileNamingFactory Factory class for instantiating a specific implementation of ArchiveFileNaming.
ArchiveFilesReportGenerator This class generates a report that lists ARC/WARC files (depending on the configured archive format) along with the opening date, closing date (if the file was properly closed), and size in bytes.
ArchiveFilesReportGenerator.ArchiveFileStatus Stores the opening date, closing date and size of an ARC file.
CollectionPrefixNamingConvention Implements another way of prefixing archive files in NetarchiveSuite.
ContentSizeAnnotationPostProcessor A post-processor that adds a content-size: annotation for each successfully harvested URI.
DomainnameQueueAssignmentPolicy Queue assignment policy that uses the domain as the queue name.
HarvestController This class handles all the things in a single harvest that are not directly related either to launching Heritrix or to handling JMS messages.
HarvestControllerApplication This application controls the Heritrix harvester which does the actual harvesting, and is also responsible for uploading the harvested data to the ArcRepository.
HarvestDocumentation This class contains code for documenting a harvest.
HeritrixFiles This class encapsulates all the files that Heritrix gets from our system, and all files we read from Heritrix.
HeritrixLauncher A HeritrixLauncher object wraps around an instance of the web crawler Heritrix.
HeritrixLauncherFactory Factory class for instantiating a specific implementation of HeritrixLauncher.
IngestableFiles Encapsulation of files to be ingested into the archive.
LegacyNamingConvention Implements the standard way of prefixing archive files in NetarchiveSuite.
OnNSDomainsDecideRule Class that re-creates the SurtPrefixSet to include only domain names according to the domain definition of NetarchiveSuite.
SeedUriDomainnameQueueAssignmentPolicy A modified version of DomainnameQueueAssignmentPolicy in which the domain name returned is that of the candidate URI, except where the domain name of the seed URI is a different one.
WARCWriterProcessor WARCWriterProcessor.
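
The queue-naming idea behind DomainnameQueueAssignmentPolicy and SeedUriDomainnameQueueAssignmentPolicy can be illustrated with a minimal sketch. It does not use the Heritrix or NetarchiveSuite APIs: the class DomainQueueKeySketch and its methods are hypothetical, the domain is approximated as the last two host labels rather than NetarchiveSuite's own domain definition, and the sketch assumes that the seed's domain is the one used when it differs from the candidate's.

```java
import java.net.URI;

// Hypothetical illustration only; not the classes in this package.
final class DomainQueueKeySketch {

    // Assumption: the domain is the last two labels of the host name.
    // Multi-part suffixes (e.g. co.uk) and IP addresses are not handled.
    static String domainOf(String uri) {
        String host = URI.create(uri).getHost();
        if (host == null) {
            return "default"; // assumed fallback key, not taken from the package
        }
        String[] labels = host.split("\\.");
        int n = labels.length;
        return n <= 2 ? host : labels[n - 2] + "." + labels[n - 1];
    }

    // Plain variant: the queue name is the domain of the candidate URI.
    static String queueKey(String candidateUri) {
        return domainOf(candidateUri);
    }

    // Seed-aware variant: assumed to prefer the seed's domain when it differs.
    static String queueKey(String candidateUri, String seedUri) {
        String candidateDomain = domainOf(candidateUri);
        if (seedUri == null) {
            return candidateDomain;
        }
        String seedDomain = domainOf(seedUri);
        return seedDomain.equals(candidateDomain) ? candidateDomain : seedDomain;
    }

    public static void main(String[] args) {
        System.out.println(queueKey("http://www.example.dk/page"));
        System.out.println(queueKey("http://cdn.example.com/img.png",
                                    "http://www.example.dk/"));
    }
}
```

Under these assumptions, a resource fetched from a different host on behalf of a seed ends up in the seed's queue, so all URIs discovered from one seed are throttled together.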
 

Package dk.netarkivet.harvester.harvesting Description

This module handles the definition, scheduling, and execution of harvests.
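
As a rough illustration of the kind of report described for ArchiveFilesReportGenerator (one entry per archive file with its opening date, closing date if the file was properly closed, and size in bytes), the following sketch may help. The names ArchiveFileReportSketch, FileStatus, and render, the line layout, and the "-" placeholder for an unclosed file are all assumptions made for this example and are not taken from the package.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the classes in this package.
final class ArchiveFileReportSketch {

    // Holds the three values the report lists for each archive file.
    static final class FileStatus {
        final Instant opened;
        final Instant closed; // null if the file was never properly closed
        final long sizeInBytes;

        FileStatus(Instant opened, Instant closed, long sizeInBytes) {
            this.opened = opened;
            this.closed = closed;
            this.sizeInBytes = sizeInBytes;
        }
    }

    // Renders one line per file: name, opened, closed (or "-"), size.
    static String render(Map<String, FileStatus> statuses) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, FileStatus> e : statuses.entrySet()) {
            FileStatus s = e.getValue();
            sb.append(e.getKey()).append(' ')
              .append(s.opened).append(' ')
              .append(s.closed == null ? "-" : s.closed.toString()).append(' ')
              .append(s.sizeInBytes).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, FileStatus> statuses = new LinkedHashMap<>();
        statuses.put("job1-a.warc", new FileStatus(
                Instant.parse("2012-01-01T10:00:00Z"),
                Instant.parse("2012-01-01T11:00:00Z"), 2048L));
        statuses.put("job1-b.warc", new FileStatus(
                Instant.parse("2012-01-01T11:00:00Z"), null, 1024L));
        System.out.print(render(statuses));
    }
}
```

A LinkedHashMap is used so that files appear in insertion order; whether the real report orders its entries this way is not stated in the summary above.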