Package dk.netarkivet.wayback.batch
-
Interface Summary
Interface    Description
DeduplicateToCDXAdapterInterface    Interface describing a class which can be used to convert duplicate records in a crawl log to wayback-compatible cdx records.
-
Class Summary
Class    Description
DeduplicateToCDXAdapter    Class containing methods for turning duplicate entries in a crawl log into lines in a CDX index file.
DeduplicationCDXExtractionBatchJob    This batch job takes deduplication records from a crawl log in a metadata arcfile and converts them to cdx records for use in wayback.
UrlCanonicalizerFactory    A factory for returning a UrlCanonicalizer.
WaybackCDXExtractionARCBatchJob    Returns a cdx file using the appropriate format for wayback, including canonicalisation of urls.
WaybackCDXExtractionWARCBatchJob    Returns a cdx file using the appropriate format for wayback, including canonicalisation of urls.
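
The conversion described for DeduplicateToCDXAdapter can be illustrated with a minimal, self-contained sketch. It is not the package's implementation. The class DuplicateToCdxSketch and its method toCdxLine are hypothetical names. The crawl-log field positions, the duplicate:"<file>,<offset>" annotation shape, and the order of the output fields are all assumptions made for illustration. URL canonicalisation is left out, so the URL is simply repeated where a canonicalised key would normally appear.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DuplicateToCdxSketch {

    // Assumed annotation shape: duplicate:"<archive file>,<offset>"
    private static final Pattern DUPLICATE =
            Pattern.compile("duplicate:\"([^,\"]+),(\\d+)\"");

    /**
     * Hypothetical helper: turns one crawl-log line into a CDX-like line.
     * Returns null if the line carries no duplicate annotation or is too short.
     */
    public static String toCdxLine(String crawlLogLine) {
        Matcher m = DUPLICATE.matcher(crawlLogLine);
        if (!m.find()) {
            return null;
        }
        String[] f = crawlLogLine.trim().split("\\s+");
        if (f.length < 10) {
            return null;
        }
        // Assumed positions: 0 timestamp, 1 status, 3 URL, 6 MIME type, 9 digest.
        String digits = f[0].replaceAll("[^0-9]", "");
        if (digits.length() < 14) {
            return null;
        }
        String timestamp = digits.substring(0, 14); // yyyyMMddHHmmss
        String digest = f[9].replaceFirst("^sha1:", "");
        String url = f[3]; // no canonicalisation in this sketch
        return String.join(" ",
                url, timestamp, url, f[6], f[1], digest, "-",
                m.group(2),   // offset of the original record
                m.group(1));  // archive file holding the original record
    }

    public static void main(String[] args) {
        String line = "2011-05-10T10:07:02.594Z 200 1234 http://example.org/a.png E "
                + "http://example.org/ image/png #001 20110510100702594+12 "
                + "sha1:ABCDEF - duplicate:\"example-1.arc,4567\",content-size:1500";
        System.out.println(toCdxLine(line));
    }
}
```

Returning null for lines without a duplicate annotation lets a caller skip ordinary crawl-log entries, which is one plausible way a batch job over a whole crawl log could use such a method.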