  1. NetarchiveSuite
  2. NAS-1935

Improve indexing statistics and logging


Details

    • New Feature
    • Resolution: Fixed
    • Minor
    • 3.18.0
    • 3.16.1
    • None
    • None
    • SB/KB
    • Confident

      Install NetarchiveSuite.
      Create a standard selective harvest of the netarkivet.dk domain with the schedule one-week.
      Activate the harvest.
      When the first harvest job has finished, update the nextdate so that the next harvest job begins at once.
      When the last job has finished, go to the indexserver and check the logs: they now show how many entries the index contains and how big the index is, in bytes, Kbytes, Mbytes, or Gbytes depending on the size of the index.
      Look for the log lines starting with:

      Completed combining a dataset with
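
      The unit selection described in the steps above could look roughly like the following Java sketch. It is an illustration only, not the code used in NetarchiveSuite: the class name IndexSizeFormatter, the 1024-based unit steps, and the truncation to whole units are all assumptions.

```java
/**
 * Hypothetical helper (not part of NetarchiveSuite) that renders an index
 * size in bytes, Kbytes, Mbytes, or Gbytes depending on its magnitude.
 */
public final class IndexSizeFormatter {
    // Unit names as they appear in the test instructions.
    private static final String[] UNITS = {"bytes", "Kbytes", "Mbytes", "Gbytes"};

    private IndexSizeFormatter() {
    }

    /**
     * Picks the largest unit in which the size is at least 1.
     * Assumes 1024-based steps; fractions are truncated by integer division,
     * and sizes beyond the Gbytes range are still reported in Gbytes.
     */
    public static String format(long sizeInBytes) {
        if (sizeInBytes < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        long value = sizeInBytes;
        int unit = 0;
        while (value >= 1024 && unit < UNITS.length - 1) {
            value /= 1024;
            unit++;
        }
        return value + " " + UNITS[unit];
    }
}
```

      With this sketch, an index of 5 * 1024 * 1024 bytes would be reported as "5 Mbytes", while an index of 1023 bytes would stay in bytes.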
      

    Description

      The logging around indexing should at least include the size of each generated index (number of entries) and the number of jobs that index is based on.

      The size of the crawl logs need not be logged separately, as it is indirectly given by the number of entries in the indices.
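
      As a rough illustration of the two statistics requested here, the Java sketch below builds a single log message from the set of job IDs an index is based on and its entry count. The class name IndexLogMessage, the method describe, and the message wording are hypothetical; the actual NetarchiveSuite log line may be phrased differently.

```java
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical helper (not part of NetarchiveSuite) that summarises an index
 * by the number of jobs it is based on and the number of entries it holds.
 */
public final class IndexLogMessage {

    private IndexLogMessage() {
    }

    /**
     * Builds the message text. The job IDs are sorted so that the same set
     * of jobs always produces the same message.
     */
    public static String describe(Set<Long> jobIds, long entryCount) {
        return "Index based on " + jobIds.size() + " job(s) "
                + new TreeSet<>(jobIds)
                + " contains " + entryCount + " entries";
    }
}
```

      The resulting string can be passed to whichever logging framework the indexserver uses, so the statistics appear once per generated index.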

      Attachments

        Activity

          People

            svc Søren Vejrup Carlsen (Inactive)
            svc Søren Vejrup Carlsen (Inactive)
            Colin Rosenthal

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: 8h
                Remaining: 2h
                Logged: 1h