ArchiveFilenameParser |
|
ArchiveFileNamingFactory |
|
CollectionPrefixNamingConvention |
Implements an alternative way of prefixing archive files in NetarchiveSuite.
|
ContentSizeAnnotationPostProcessor |
A post processor that adds a content-size: annotation for each successfully harvested URI.
|
DomainnameQueueAssignmentPolicy |
Uses the domain as the queue name.
|
ExtendedDNSFetcher |
Processor to resolve 'dns:' URIs.
|
HeritrixFiles |
This class encapsulates all the files that Heritrix gets from our system, and all files we read from Heritrix.
|
LegacyNamingConvention |
Implements the standard way of prefixing archive files in NetarchiveSuite.
|
NasCrawlMetadata |
NetarchiveSuite extension of the org.archive.modules.CrawlMetadata class.
|
NASFetchDNS |
Extended FetchDNS processor that allows host overrides to be consulted
before the hosts are queried through a DNS server.
|
NASSurtPrefixedDecideRule |
Extended SurtPrefixedDecideRule class.
|
NasWARCProcessor |
Custom NAS WARCWriterProcessor that adds NetarchiveSuite metadata to the WARCInfo records written
by Heritrix, simply by extending org.archive.modules.writer.WARCWriterProcessor.
This was not possible in H1.
|
OnNSDomainsDecideRule |
Class that re-creates the SurtPrefixSet to include only domain names
according to the domain definition of NetarchiveSuite.
|
PersistentJobData |
Class PersistentJobData holds information about an ongoing harvest.
|
PersistentJobData.XmlState |
Helper class for returning the OK-state back to the caller.
|
PrerequisiteIgnoringQuotaEnforcer |
A Heritrix QuotaEnforcer that never enforces quotas on prerequisite URIs: dns, robots.txt, and credentials.
|
SeedUriDomainnameQueueAssignmentPolicy |
A modified version of the DomainnameQueueAssignmentPolicy
in which the domain name returned is that of the candidateURI,
except where the SeedURI belongs to a different domain.
|
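The queue-naming rule described for SeedUriDomainnameQueueAssignmentPolicy can be illustrated with a minimal, standalone sketch. It does not use the Heritrix or NetarchiveSuite APIs: the class SeedQueueKeySketch and its methods domainOf and queueKey are hypothetical names, and the domain is approximated as the last two host labels, which is an assumption rather than NetarchiveSuite's actual domain definition. The sketch also assumes that, when the seed belongs to a different domain, the seed's domain is the one used as the queue name.

```java
import java.net.URI;

/** Illustrative sketch only; not the NetarchiveSuite implementation. */
public class SeedQueueKeySketch {

    /**
     * Approximates a domain name as the last two labels of the host.
     * Assumption: the real domain definition is configurable and may
     * treat multi-level suffixes differently.
     */
    static String domainOf(String uri) {
        String host = URI.create(uri).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URI has no host: " + uri);
        }
        String[] labels = host.toLowerCase().split("\\.");
        int n = labels.length;
        if (n <= 2) {
            return host.toLowerCase();
        }
        return labels[n - 2] + "." + labels[n - 1];
    }

    /**
     * Returns the queue name for a candidate URI: its own domain, unless
     * a seed URI is known and belongs to a different domain, in which
     * case the seed's domain is used (assumed behaviour).
     */
    static String queueKey(String candidateUri, String seedUri) {
        String candidateDomain = domainOf(candidateUri);
        if (seedUri == null) {
            return candidateDomain;
        }
        String seedDomain = domainOf(seedUri);
        return seedDomain.equals(candidateDomain) ? candidateDomain : seedDomain;
    }
}
```

Under these assumptions, a resource fetched from a different host while crawling a seed is queued together with that seed's domain, so everything discovered from one seed stays in one queue.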