Interface Summary | |
---|---|
DeduplicateToCDXAdapterInterface | Interface describing a class which can be used to convert duplicate records in a crawl log to wayback-compatible CDX records. |
Class Summary | |
---|---|
DeduplicateToCDXAdapter | Class containing methods for turning duplicate entries in a crawl log into lines in a CDX index file. |
DeduplicationCDXExtractionBatchJob | This batch job takes deduplication records from a crawl log in a metadata ARC file and converts them to CDX records for use in wayback. |
UrlCanonicalizerFactory | A factory for returning a UrlCanonicalizer. |
WaybackCDXExtractionARCBatchJob | Returns a CDX file using the appropriate format for wayback, including canonicalisation of URLs. |
WaybackCDXExtractionWARCBatchJob | Returns a CDX file using the appropriate format for wayback, including canonicalisation of URLs. |