  1. NetarchiveSuite
  2. NAS-1930

It is difficult to see which data the indexserver cannot retrieve when generating the deduplication index


Details

    • Bug
    • Resolution: Fixed
    • Major
    • 3.18.0, I49
    • 3.16.1
    • IndexServer
    • None
    • Timeboxed

      Install NetarchiveSuite.
      Go to the machine (Machine X) where the IndexServerApplication is deployed.
      Configure the IndexServerApplication to use a LocalArcRepositoryClient.
      Copy a few (10) metadata ARC files to the storage used by the LocalArcRepositoryClient.
      Send a message to the IndexServerApplication asking for a dedup index for 14 jobs, of which only 10 are available in storage. (A Java class that does this is available in the tests part of the netarchivesuite trunk: r2177, trunk/tests/SendDedupIndexRequestToIndexserver.java.)

      Copy SendDedupIndexRequestToIndexserver.java to machine X.
      
      echo on machine X 
      export INSTALLDIR=/home/test/????
      

      In the file $INSTALLDIR/conf/settings_IndexServerApplication.xml,
      insert the following:

      <settings><common>
      ...
      <arcrepositoryClient>
                  <class>dk.netarkivet.common.distribute.arcrepository.LocalArcRepositoryClient</class>
                  <fileDir>/home/test/metadataArkiv/</fileDir> 
      </arcrepositoryClient>
      
      ...
      </common>
      </settings>
      
      
      javac -cp $INSTALLDIR/lib/dk.netarkivet.archive.jar SendDedupIndexRequestToIndexserver.java
      

      Run the program with the ViewerProxy settings, so that the correct JMS broker is used:

      export CLASSPATH=$INSTALLDIR/lib/dk.netarkivet.archive.jar:/home/test/
      java -Ddk.netarkivet.settings.file=$INSTALLDIR/conf/settings_ViewerProxyApplication.xml \
          SendDedupIndexRequestToIndexserver <file-with-jobids>
      

    Description

      The log only states which data the indexserver can see, not which data it cannot see:

      Aug 16, 2011 11:46:47 AM dk.netarkivet.archive.indexserver.distribute.IndexRequestServer doGenerateIndex
      WARNING: Failed generating index of type 'DEDUP_CRAWL_LOG' for the jobs [127191,127192], only the jobs [127191] are available.
      

      For very big deduplication indices based on thousands of jobs, this makes it difficult
      to see which data it cannot retrieve from the archive.
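
      Until the fix is deployed, the missing jobs can be derived from the existing warning by subtracting the available job list from the requested one. The following is a minimal sketch of that set difference, not the change made in NetarchiveSuite. The class MissingJobsReport and its methods are hypothetical names, and the regular expression assumes the warning has exactly the format quoted above.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of NetarchiveSuite. */
public class MissingJobsReport {

    // Assumes the warning format quoted in this issue:
    // "... for the jobs [a,b,...], only the jobs [a,...] are available."
    private static final Pattern JOB_LISTS = Pattern.compile(
            "for the jobs \\[([^\\]]*)\\], only the jobs \\[([^\\]]*)\\] are available");

    /** Returns the requested job IDs that are not among the available ones. */
    static Set<Long> missingJobs(Set<Long> requested, Set<Long> available) {
        Set<Long> missing = new TreeSet<>(requested);
        missing.removeAll(available);
        return missing;
    }

    /** Parses a comma-separated list such as "127191,127192" into a sorted set. */
    static Set<Long> parseIds(String list) {
        Set<Long> ids = new TreeSet<>();
        for (String token : list.split(",")) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                ids.add(Long.parseLong(trimmed));
            }
        }
        return ids;
    }

    /** Extracts the missing job IDs from one warning line in the assumed format. */
    static Set<Long> missingFromLogLine(String line) {
        Matcher m = JOB_LISTS.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognized log line: " + line);
        }
        return missingJobs(parseIds(m.group(1)), parseIds(m.group(2)));
    }

    public static void main(String[] args) {
        String line = "WARNING: Failed generating index of type 'DEDUP_CRAWL_LOG' "
                + "for the jobs [127191,127192], only the jobs [127191] are available.";
        System.out.println("Missing jobs: " + missingFromLogLine(line));
    }
}
```

      For the warning quoted above, this prints "Missing jobs: [127192]". A sorted set is used so that a long list of missing jobs is easy to scan.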

      This is fixed in 3.17.0, but it seems important enough to me to be included in another patch release of the 3.16 branch.

      Attachments

        Activity

          People

            svc Søren Vejrup Carlsen (Inactive)
            svc Søren Vejrup Carlsen (Inactive)
            Jonas Lindberg Frellesen Jonas Lindberg Frellesen (Inactive)
            Watchers:
            0

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: 0.5h
                Remaining: 0.5h
                Logged: Not Specified