NetarchiveSuite / NAS-2122

Utility to Retry Failed Wayback Indexing

Details

    • New Feature
    • Resolution: Fixed
    • Major
    • I53, 4.0
    • 3.21
    • Wayback
    • None
    • SB/KB
      1. Start an instance of TEST12
      2. (Log into a bitarchive machine, e.g. netarkiv@sb-test-bar-001, and find an arcfile. Edit the file in vi, adding a blank line at the start so that indexing will fail, and copy it into /netarkiv/0002/TEST12/filedir.
      3. Use the bitpreservation GUI to copy it over to KB.)
      4. Instead of steps 2 and 3, use the procedure from "TEST3A: Upload a Small Bitarchive" to upload a bad arcfile.
      5. Now restart the WaybackIndexer application (this is quicker than waiting for its timer thread to come around). You should see "Error indexing file '1-1-20090220082156-00000-kb-test-har-002.kb.dk.arc'"
      6. Restart the app several times, then run "grep unindexed log/WaybackIndexerApplication0.log.0". You should see something like:

        INFO: Will now add '1' unindexed files to queue (if they are not already queued).
        INFO: Will now add '1' unindexed files to queue (if they are not already queued).
        INFO: Will now add '1' unindexed files to queue (if they are not already queued).
        INFO: Will now add '1' unindexed files to queue (if they are not already queued).
        INFO: Will now add '0' unindexed files to queue (if they are not already queued).
        INFO: Will now add '0' unindexed files to queue (if they are not already queued).

        showing that the indexer has given up on that file.

      7. Now reset the file:

        [test@kb-test-way-001 TEST12]$ java -cp lib/dk.netarkivet.wayback.jar -Ddk.netarkivet.settings.file=conf/settings_WaybackIndexerApplication.xml -Dsettings.common.applicationInstanceId=CSR_TEST dk.netarkivet.wayback.indexer.ResetFailedFiles 1-1-20090220082156-00000-kb-test-har-002.kb.dk.arc

        (Substitute the actual name of your file.)

      8. Restart the indexer. You should see it begin to try again:

        INFO: Will now add '1' unindexed files to queue (if they are not already queued).
        INFO: Will now add '1' unindexed files to queue (if they are not already queued).
        INFO: Will now add '1' unindexed files to queue (if they are not already queued).
        INFO: Will now add '1' unindexed files to queue (if they are not already queued).
        INFO: Will now add '0' unindexed files to queue (if they are not already queued).
        INFO: Will now add '0' unindexed files to queue (if they are not already queued).
        INFO: Will now add '1' unindexed files to queue (if they are not already queued).
        INFO: Will now add '1' unindexed files to queue (if they are not already queued).
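
      The pattern to look for in steps 6 and 8 is the sequence of counts rather than the full log lines. A minimal sketch of a helper that extracts just those counts is given here; the function name queued_counts is hypothetical and not part of NetarchiveSuite, and it assumes the log lines have exactly the wording shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of NetarchiveSuite).
# Prints the number from each "Will now add 'N' unindexed files to queue"
# line of the log file given as $1, one number per line, in log order.
# Lines that do not match this wording are ignored.
queued_counts() {
    sed -n "s/.*Will now add '\([0-9][0-9]*\)' unindexed files to queue.*/\1/p" "$1"
}
```

      It could be called as: queued_counts log/WaybackIndexerApplication0.log.0 | paste -sd' ' -
      A run of 1s followed by 0s indicates that the indexer has given up on the file; a 1 reappearing after the 0s indicates that the reset took effect.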


    Description

      In the wayback indexer, once a file has failed indexing maxFailedAttempts times, it is never attempted again and so is never indexed. The task is to create a utility which can reset the failure counter on selected files so that indexing will be reattempted for them.
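
      When several files need resetting, the command from the test steps can be wrapped in a loop. The following is a sketch only: the function names and the DRY_RUN switch are hypothetical, the list file is assumed to hold one arcfile name per line, and the jar path, settings file and applicationInstanceId are copied from the test environment above and will differ elsewhere. It runs the utility once per file, since this issue does not say whether ResetFailedFiles accepts more than one name per invocation.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of NetarchiveSuite).

# Run a command, or only print it when DRY_RUN=1.
# stdin is redirected so the command cannot consume the list being read.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@" </dev/null
    fi
}

# Reset the failure counter for each arcfile name listed in the file $1.
# Must be started from the installation directory (TEST12 in the steps above),
# because the jar and settings paths are relative.
reset_failed_files() {
    list="$1"
    while IFS= read -r arcfile; do
        # Skip blank lines in the list.
        [ -n "$arcfile" ] || continue
        run java -cp lib/dk.netarkivet.wayback.jar \
            -Ddk.netarkivet.settings.file=conf/settings_WaybackIndexerApplication.xml \
            -Dsettings.common.applicationInstanceId=CSR_TEST \
            dk.netarkivet.wayback.indexer.ResetFailedFiles "$arcfile"
    done < "$list"
}
```

      Running it first with DRY_RUN=1 prints the commands without executing them, which makes it easy to check the file names before any counters are touched. The indexer still has to be restarted, or its timer thread allowed to come around, before the files are reattempted.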

          People

            csr Colin Rosenthal
            csr Colin Rosenthal

              Time Tracking

                Original Estimate: 7h
                Remaining Estimate: 7h
                Time Spent: Not Specified