Package dk.netarkivet.wayback.batch

Interface Summary
DeduplicateToCDXAdapterInterface Interface describing a class that can convert duplicate records in a crawl log to wayback-compatible CDX records.
 

Class Summary
DeduplicateToCDXAdapter Class containing methods for turning duplicate entries in a crawl log into lines in a CDX index file.
ExtractDeduplicateCDXBatchJob This batch job takes deduplication records from a crawl log in a metadata arcfile and converts them to CDX records for use in wayback.
ExtractWaybackCDXBatchJob Batch job that returns a CDX file in the format wayback expects, including canonicalisation of URLs.
UrlCanonicalizerFactory A factory for returning a UrlCanonicalizer.
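
The conversion the adapter classes describe, from a duplicate entry in a crawl log to a line in a CDX index, can be sketched roughly as below. This is a minimal illustration and not the package's actual implementation: the class name DedupToCdxSketch, the method toCdxLine, the assumed field positions in the crawl log line, the assumed annotation form duplicate:"<arcfile>,<offset>", and the output column order are all assumptions made for the example. Lower-casing the URL is only a stand-in for real canonicalisation, which in this package would come from a UrlCanonicalizer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; not part of dk.netarkivet.wayback.batch. */
public class DedupToCdxSketch {

    // Assumed annotation form: duplicate:"<arcfile>,<offset>"
    private static final Pattern DUPLICATE =
            Pattern.compile("duplicate:\"([^,\"]+),(\\d+)\"");

    /**
     * Converts one crawl log line to a CDX-like line.
     * Returns null if the line has no duplicate annotation or too few fields.
     */
    public static String toCdxLine(String crawlLogLine) {
        Matcher m = DUPLICATE.matcher(crawlLogLine);
        if (!m.find()) {
            return null; // not a deduplicated record
        }
        String[] f = crawlLogLine.trim().split("\\s+");
        if (f.length < 10) {
            return null; // fewer fields than this sketch assumes
        }
        // Assumed positions: 0 = timestamp, 1 = status, 3 = URL,
        // 6 = MIME type, 9 = digest.
        String digits = f[0].replaceAll("[^0-9]", "");
        if (digits.length() < 14) {
            return null; // cannot build a 14-digit yyyyMMddHHmmss date
        }
        String date = digits.substring(0, 14);
        String url = f[3];
        String canonical = url.toLowerCase(); // stand-in for canonicalisation
        // Assumed column order: canonical URL, date, original URL, MIME type,
        // status, digest, redirect ("-"), offset, file name.
        return String.join(" ", canonical, date, url, f[6], f[1], f[9],
                "-", m.group(2), m.group(1));
    }

    public static void main(String[] args) {
        String line = "2009-05-25T13:00:36.500Z 200 1234 "
                + "http://Example.org/a.html L http://example.org/ text/html "
                + "#001 20090525130036100+200 sha1:ABCDEF - "
                + "duplicate:\"example-1.arc,4567\"";
        System.out.println(toCdxLine(line));
    }
}
```

Lines without a duplicate annotation yield null in this sketch, so a caller can skip them when building an index from a whole crawl log.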