  1. NetarchiveSuite
  2. NAS-2687

Incomplete lines in the duplicationMigration are not caught in RawMetadataCache.migrateDuplicates()

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 5.2.2, 5.3.1
    • Fix Version/s: 5.4, 5.2.3
    • Component/s: IndexServer
    • Labels:
      None
    • Verification:
      The test, presumably, is to go into the archive, deliberately mess up a crawl log line, generate an index, and check the log. I think this is a lot of work for such a very localised change, and I'm not sure it is worth it on a cost/benefit/risk basis.

      Description

      In the compression project, we have managed to write incomplete lines to the duplicationMigration record in the metadata file.

      This causes an ArrayIndexOutOfBoundsException when the index server generates an index for the affected job:

      15:46:53.179 INFO  d.n.h.indexserver.RawMetadataCache - 214466 migration records found for job 250990
      15:46:53.626 WARN  d.n.h.i.d.IndexRequestServer - Unable to generate index for jobs [250990]
      java.lang.ArrayIndexOutOfBoundsException: 2
              at dk.netarkivet.harvester.indexserver.RawMetadataCache.migrateDuplicates(RawMetadataCache.java:208) ~[harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
              at dk.netarkivet.harvester.indexserver.RawMetadataCache.cacheData(RawMetadataCache.java:142) ~[harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
              at dk.netarkivet.harvester.indexserver.RawMetadataCache.cacheData(RawMetadataCache.java:57) ~[harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
              at dk.netarkivet.harvester.indexserver.FileBasedCache.cache(FileBasedCache.java:146) ~[harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
              at dk.netarkivet.harvester.indexserver.FileBasedCache.get(FileBasedCache.java:174) ~[harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
              at dk.netarkivet.harvester.indexserver.CombiningMultiFileBasedCache.prepareCombine(CombiningMultiFileBasedCache.java:88) ~[harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
              at dk.netarkivet.harvester.indexserver.CrawlLogIndexCache.prepareCombine(CrawlLogIndexCache.java:106) ~[harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
              at dk.netarkivet.harvester.indexserver.CombiningMultiFileBasedCache.cacheData(CombiningMultiFileBasedCache.java:69) ~[harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
              at dk.netarkivet.harvester.indexserver.CombiningMultiFileBasedCache.cacheData(CombiningMultiFileBasedCache.java:43) ~[harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
              at dk.netarkivet.harvester.indexserver.FileBasedCache.cache(FileBasedCache.java:146) ~[harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
              at dk.netarkivet.harvester.indexserver.distribute.IndexRequestServer.doProcessIndexRequestMessage(IndexRequestServer.java:336) [harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
              at dk.netarkivet.harvester.indexserver.distribute.IndexRequestServer.access$000(IndexRequestServer.java:76) [harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
              at dk.netarkivet.harvester.indexserver.distribute.IndexRequestServer$2.run(IndexRequestServer.java:238) [harvester-core-5.3.2-RC1.jar:20fc2e1cb5158341eee04029fe1920934ed38048]
      15:46:53.627 INFO  d.n.h.i.d.IndexRequestServer - Sending failed reply for IndexRequestMessage back to sender '[Queue 'PROD_COMMON_THIS_INDEX_CLIENT_130_226_228_70_VP_8094']'.
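
      A minimal sketch of the kind of guard that would catch incomplete lines before they are indexed. This is not the actual fix from the NetarchiveSuite code base. The class name `MigrationLineParser`, the whitespace-separated three-field layout (file name, old offset, new offset), and the lookup key format are all assumptions made for illustration; the only thing taken from the trace above is that element 2 of a split line is accessed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of NetarchiveSuite.
class MigrationLineParser {
    // Assumption: a complete migration line has three whitespace-separated fields.
    static final int EXPECTED_FIELDS = 3;

    /**
     * Parses migration lines into a lookup map and collects malformed
     * lines in 'rejected' instead of throwing.
     */
    static Map<String, Long> parse(List<String> lines, List<String> rejected) {
        Map<String, Long> lookup = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length < EXPECTED_FIELDS) {
                // Incomplete line: accessing parts[2] here would throw
                // ArrayIndexOutOfBoundsException, so skip it instead.
                rejected.add(line);
                continue;
            }
            try {
                long oldOffset = Long.parseLong(parts[1]);
                long newOffset = Long.parseLong(parts[2]);
                // Assumed key format: "<file name> <old offset>".
                lookup.put(parts[0] + " " + oldOffset, newOffset);
            } catch (NumberFormatException e) {
                // A truncated or garbled offset is also treated as malformed.
                rejected.add(line);
            }
        }
        return lookup;
    }
}
```

      A caller could log each rejected line at WARN level and continue, so that one damaged record does not make index generation fail for the whole job.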
      

            People

            • Assignee:
              svc Søren Vejrup Carlsen
            • Reporter:
              svc Søren Vejrup Carlsen

                Time Tracking

                Estimated: Not Specified
                Remaining: Not Specified
                Logged: 1m