Interface Summary | |
---|---|
DeduplicateToCDXAdapterInterface | Interface describing a class that can be used to convert duplicate records in a crawl log to Wayback-compatible CDX records. |
Class Summary | |
---|---|
DeduplicateToCDXAdapter | Class containing methods for turning duplicate entries in a crawl log into lines in a CDX index file. |
ExtractDeduplicateCDXBatchJob | This batch job takes deduplication records from a crawl log in a metadata ARC file and converts them to CDX records for use in Wayback. |
ExtractWaybackCDXBatchJob | Returns a CDX file in the format expected by Wayback, including canonicalisation of URLs. |
UrlCanonicalizerFactory | A factory for returning a UrlCanonicalizer. |