[NAS-2687] Incomplete lines in the duplicationMigration are not caught in RawMetadataCache.migrateDuplicates() Created: 18/Dec/17  Updated: 24/Apr/18  Resolved: 20/Feb/18

Status: Closed
Project: NetarchiveSuite
Component/s: IndexServer
Affects Version/s: 5.2.2, 5.3.1
Fix Version/s: 5.2.3, 5.4

Type: Bug Priority: Minor
Reporter: Søren Vejrup Carlsen (Inactive) Assignee: Søren Vejrup Carlsen (Inactive)
Resolution: Fixed  
Labels: None
Remaining Estimate: Not Specified
Time Spent: 1m
Original Estimate: Not Specified

External reference:

https://sbprojects.statsbiblioteket.dk/jira/browse/NARK-1446

Sprint: NAS 5.4
Verification:

The test, presumably, is to go into the archive, deliberately corrupt a crawl log line, generate an index, and check the log. I think this is a lot of work for such a localised change, and I'm not sure it is worth it on a cost/benefit/risk basis.


 Description   

In the compression project, we have managed to write incomplete lines to the duplicationMigration record in the metadata file.

This causes an ArrayIndexOutOfBoundsException when the index is generated:

15:46:53.179 INFO  d.n.h.indexserver.RawMetadataCache - 214466 migration records found for job 250990
15:46:53.626 WARN  d.n.h.i.d.IndexRequestServer - Unable to generate index for jobs [250990]
java.lang.ArrayIndexOutOfBoundsException: 2
        at dk.netarkivet.harvester.indexserver.RawMetadataCache.migrateDuplicates(RawMetadataCache.java:208) ~[harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
        at dk.netarkivet.harvester.indexserver.RawMetadataCache.cacheData(RawMetadataCache.java:142) ~[harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
        at dk.netarkivet.harvester.indexserver.RawMetadataCache.cacheData(RawMetadataCache.java:57) ~[harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
        at dk.netarkivet.harvester.indexserver.FileBasedCache.cache(FileBasedCache.java:146) ~[harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
        at dk.netarkivet.harvester.indexserver.FileBasedCache.get(FileBasedCache.java:174) ~[harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
        at dk.netarkivet.harvester.indexserver.CombiningMultiFileBasedCache.prepareCombine(CombiningMultiFileBasedCache.java:88) ~[harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
        at dk.netarkivet.harvester.indexserver.CrawlLogIndexCache.prepareCombine(CrawlLogIndexCache.java:106) ~[harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
        at dk.netarkivet.harvester.indexserver.CombiningMultiFileBasedCache.cacheData(CombiningMultiFileBasedCache.java:69) ~[harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
        at dk.netarkivet.harvester.indexserver.CombiningMultiFileBasedCache.cacheData(CombiningMultiFileBasedCache.java:43) ~[harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
        at dk.netarkivet.harvester.indexserver.FileBasedCache.cache(FileBasedCache.java:146) ~[harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
        at dk.netarkivet.harvester.indexserver.distribute.IndexRequestServer.doProcessIndexRequestMessage(IndexRequestServer.java:336) [harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
        at dk.netarkivet.harvester.indexserver.distribute.IndexRequestServer.access$000(IndexRequestServer.java:76) [harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
        at dk.netarkivet.harvester.indexserver.distribute.IndexRequestServer$2.run(IndexRequestServer.java:238) [harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
15:46:53.627 INFO  d.n.h.i.d.IndexRequestServer - Sending failed reply for IndexRequestMessage back to sender '[Queue 'PROD_COMMON_THIS_INDEX_CLIENT_130_226_228_70_VP_8094']'.
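
The "ArrayIndexOutOfBoundsException: 2" in the trace above suggests that a migration line was split into fewer than three fields and the third field was then read without a length check. The following is a minimal sketch of that kind of guard, not the actual patch: the class MigrationLineFilter, its methods, the whitespace delimiter, and the minimum of three fields are all assumptions made for illustration, since the real record format is not shown in this issue.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of NetarchiveSuite. */
final class MigrationLineFilter {

    /** Assumed minimum field count, inferred only from the failing index 2. */
    static final int EXPECTED_FIELDS = 3;

    private MigrationLineFilter() {
    }

    /**
     * Splits a line on whitespace (an assumed delimiter).
     * Returns the fields, or null if the line is incomplete.
     */
    static String[] parseLine(String line) {
        if (line == null) {
            return null;
        }
        String[] parts = line.trim().split("\\s+");
        // Check the length before any caller touches parts[2].
        return parts.length >= EXPECTED_FIELDS ? parts : null;
    }

    /**
     * Parses all lines, collecting incomplete ones in 'rejected' so the
     * caller can log them instead of failing the whole index request.
     */
    static List<String[]> parseAll(List<String> lines, List<String> rejected) {
        List<String[]> accepted = new ArrayList<>();
        for (String line : lines) {
            String[] parts = parseLine(line);
            if (parts == null) {
                rejected.add(line);
            } else {
                accepted.add(parts);
            }
        }
        return accepted;
    }
}
```

With a guard like this, a caller could log each rejected line as a warning and continue with the remaining records, so that one truncated line does not abort index generation for the whole job.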


 Comments   
Comment by Søren Vejrup Carlsen (Inactive) [ 09/Jan/18 ]

Patch is found here: https://sbforge.org/fisheye/changelog/NetarchiveSuite-Github?cs=7893086204b19d58cf2eaf66fb83f1be116f42a0

Generated at Sat Apr 20 04:30:05 CEST 2024 using Jira 9.4.15#940015-sha1:bdaa9cbecfb6791ea579749728cab771f0dfe90b.