NetarchiveSuite / NAS-1891

Optimize how the deduplication indexes are generated


Details

    • New Feature
    • Resolution: Fixed
    • Major
    • 3.18.0, I49
    • 3.14.0, 3.16.1
    • None
    • None
    • SB/KB
    • Confident

      Install NetarchiveSuite on a single machine.
      Get 100-200 metadata-arc files from your production environment.
      Store them on the machine where the IndexserverApplication is installed.
      Change the settings of the IndexserverApplication to use the LocalArcrepositoryClient.
      Start the IndexserverApplication. Ignore the other applications.
      Send a message to the Indexserver requesting a deduplication index for the 100-200 jobs.
      This shouldn't be sent back to the sender.

    Description

      Generating the deduplication indexes is very time-consuming and inefficient.
      One option may be to parallelize some of the processing done in dk.netarkivet.archive.indexserver.CrawlLogIndexCache.
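      The issue does not say how the processing should be parallelized. The sketch below shows one possible approach. It assumes that the per-job work in CrawlLogIndexCache (for example, reading one job's crawl log) can run independently for each job. The jobs are spread over a fixed-size thread pool, and the results are collected back on the calling thread, so any merge into a single index stays single-threaded. The class ParallelJobProcessing, the method processJobs and the per-job function are hypothetical illustrations, not part of NetarchiveSuite.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/**
 * Hypothetical sketch (not NetarchiveSuite code): run independent per-job
 * work concurrently, then gather the results on the calling thread.
 */
public class ParallelJobProcessing {

    /**
     * Applies perJob to every job ID using a fixed-size thread pool and
     * returns the results keyed by job ID, in the order the IDs were given.
     */
    public static <R> Map<Long, R> processJobs(List<Long> jobIds,
                                               Function<Long, R> perJob,
                                               int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per job; the tasks do not depend on each other.
            List<Future<R>> futures = new ArrayList<>();
            for (Long jobId : jobIds) {
                futures.add(pool.submit(() -> perJob.apply(jobId)));
            }
            // Wait for each result in submission order, so the merge step
            // (here just filling a map) runs on a single thread.
            Map<Long, R> results = new LinkedHashMap<>();
            for (int i = 0; i < jobIds.size(); i++) {
                results.put(jobIds.get(i), futures.get(i).get());
            }
            return results;
        } finally {
            // Let already-submitted tasks finish, then release the threads.
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the real per-job work, e.g. processing one job's crawl log.
        List<Long> jobs = List.of(1L, 2L, 3L, 4L);
        Map<Long, Integer> out = processJobs(jobs, id -> (int) (id * 10), 2);
        System.out.println(out);
    }
}
```

      A fixed pool bounds the number of crawl logs processed at once. This may matter when each job's data is large. The pool size would have to be tuned against the memory and I/O available to the IndexserverApplication.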

          People

            Søren Vejrup Carlsen (Inactive)
            Søren Vejrup Carlsen (Inactive)
            Colin Rosenthal
            Watchers: 0

              Time Tracking

                Original Estimate: 35h
                Remaining Estimate: 0h
                Time Spent: 13h