Details
Bug
Resolution: Unresolved
Minor
None
5.4
None
Description
The waybackindexer uses the dk.netarkivet.wayback.indexer.ArchiveFile.index method to index ARC and WARC files.
For metadata files, it currently uses the DeduplicationCDXExtractionBatchJob batch job to generate deduplication CDX entries from the duplicate entries in the crawl log.
This does not work for metadata files that contain a deduplicationmigration record.
Instead, we should fetch the deduplicationmigration record and the crawl log from the metadata file,
and then do the replacement, as is done in the RawMetadataCache.migrateDuplicates method.
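A minimal Java sketch of the proposed replacement step follows. It is not taken from RawMetadataCache.migrateDuplicates; the class and method names are hypothetical. It also assumes two formats that are not confirmed here: each line of the deduplicationmigration record reads `oldFile oldOffset newFile newOffset`, and each duplicate crawl-log entry carries an annotation of the form `duplicate:"<file>,<offset>,..."`. The sketch builds a lookup from the migration lines and rewrites only the file name and offset of each matching annotation. Anything after the offset is left intact.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: rewrite duplicate annotations in crawl-log lines
 * using the contents of a deduplicationmigration record.
 * Both line formats used here are ASSUMPTIONS, not confirmed formats.
 */
public class MigrateDuplicatesSketch {

    // Assumed annotation format: duplicate:"<filename>,<offset>[,...]"
    private static final Pattern DUPLICATE =
            Pattern.compile("duplicate:\"([^,\"]+),(\\d+)");

    /** Maps "oldFile oldOffset" to {newFile, newOffset}; assumed line format "oldFile oldOffset newFile newOffset". */
    static Map<String, String[]> parseMigration(List<String> migrationLines) {
        Map<String, String[]> lookup = new HashMap<>();
        for (String line : migrationLines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 4) {
                continue; // skip blank or malformed lines
            }
            lookup.put(parts[0] + " " + parts[1], new String[] {parts[2], parts[3]});
        }
        return lookup;
    }

    /** Returns the crawl-log lines with every known old (file, offset) pair replaced by its migrated location. */
    static List<String> migrate(List<String> migrationLines, List<String> crawlLogLines) {
        Map<String, String[]> lookup = parseMigration(migrationLines);
        List<String> result = new ArrayList<>(crawlLogLines.size());
        for (String line : crawlLogLines) {
            Matcher m = DUPLICATE.matcher(line);
            StringBuffer sb = new StringBuffer();
            while (m.find()) {
                String[] target = lookup.get(m.group(1) + " " + m.group(2));
                // Annotations without a migration entry are kept unchanged.
                String replacement = (target == null)
                        ? m.group()
                        : "duplicate:\"" + target[0] + "," + target[1];
                m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
            }
            m.appendTail(sb);
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> migration = Arrays.asList("old.arc 100 new.warc 2500");
        List<String> crawlLog = Arrays.asList(
                "... duplicate:\"old.arc,100,20120101\"",
                "... duplicate:\"other.arc,5\"");
        for (String line : migrate(migration, crawlLog)) {
            System.out.println(line);
        }
    }
}
```

The first sample line is rewritten to point at `new.warc,2500`. The second has no migration entry and passes through unchanged, so the waybackindexer would still emit a CDX entry for it from its original location.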