  1. NetarchiveSuite
  2. NAS-2857

Heritrix and NAS version statements in WARC metadata and data files are inconsistent

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • 5.5
    • WARC
    • None
    • BNF

    Description

      In 5.5, there are some inconsistencies in NAS and Heritrix version statements used in WARC metadata and data files:

      For metadata files:
      in the warcinfo record, we have:
      software: NetarchiveSuite/Version: 5.6-SNAPSHOT (https://github.com/netarchivesuite/netarchivesuite/commit/5d5a7efda045707d65ac77c28c2a59fc771bd5b3)/https://sbforge.org/display/NAS

      in warc records:
      WARC-Target-URI: metadata://netarchivesuite.bnf.fr/crawl/setup/crawler-beans.cxml?heritrixVersion=3.3.0-BDB-5.0.x-NAS-1.0-SNAPSHOT&harvestid=72&jobid=26324

      For data files:
      in warcinfo, we have:
      software: Heritrix/3.3.0-BDB-5.0.x-NAS-5.5 http://crawler.archive.org
      #added by NetarchiveSuite Version: 5.6-SNAPSHOT (https://github.com/netarchivesuite/netarchivesuite/commit/5d5a7efda045707d65ac77c28c2a59fc771bd5b3)
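
      The mismatch between the two files can be checked mechanically. The following is only a minimal sketch, assuming the header values are available as plain strings; the function names are hypothetical and are not part of NetarchiveSuite or Heritrix.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def heritrix_version_from_target_uri(uri: str) -> Optional[str]:
    """Return the heritrixVersion query parameter of a metadata:// URI."""
    values = parse_qs(urlsplit(uri).query).get("heritrixVersion")
    return values[0] if values else None


def heritrix_version_from_software(software: str) -> Optional[str]:
    """Return the version token that follows 'Heritrix/' in a software field."""
    match = re.match(r"Heritrix/(\S+)", software)
    return match.group(1) if match else None


def nas_suffix(heritrix_version: str) -> Optional[str]:
    """Return the part after '-NAS-', or None if there is no NAS suffix."""
    _, sep, suffix = heritrix_version.partition("-NAS-")
    return suffix if sep else None


# Values copied from the 5.5 metadata and data files quoted above.
uri = (
    "metadata://netarchivesuite.bnf.fr/crawl/setup/crawler-beans.cxml"
    "?heritrixVersion=3.3.0-BDB-5.0.x-NAS-1.0-SNAPSHOT&harvestid=72&jobid=26324"
)
software = "Heritrix/3.3.0-BDB-5.0.x-NAS-5.5 http://crawler.archive.org"

metadata_version = heritrix_version_from_target_uri(uri)
data_version = heritrix_version_from_software(software)

# The two statements describe the same crawler but carry different NAS suffixes.
print(nas_suffix(metadata_version), nas_suffix(data_version))
print("consistent:", metadata_version == data_version)
```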
       

      In the current version (5.4.2):

      For metadata:
      in warcinfo, we have:
      software: NetarchiveSuite/Version: 5.4.2 (https://github.com/netarchivesuite/netarchivesuite/commit/4ab2e28e0788d0c21e69ab2373312a9bcabdb372)/https://sbforge.org/display/NAS

      in warc records:
      WARC-Target-URI: metadata://netarchivesuite.bnf.fr/crawl/setup/crawler-beans.cxml?heritrixVersion=3.3.0-BDB-5.0.x&harvestid=55&jobid=30321

      For data:
      in warcinfo:
      software: Heritrix/3.3.0-BDB-5.0.x http://crawler.archive.org
      #added by NetarchiveSuite Version: 5.4.2 (https://github.com/netarchivesuite/netarchivesuite/commit/4ab2e28e0788d0c21e69ab2373312a9bcabdb372)

       

      Proposition 1: remove NAS versions from the Heritrix version strings
      heritrixVersion=3.3.0-BDB-5.0.x-NAS-1.0-SNAPSHOT => heritrixVersion=3.3.0-BDB-5.0.x-NAS-5.6-SNAPSHOT
      Heritrix/3.3.0-BDB-5.0.x-NAS-5.5 => Heritrix/3.3.0-BDB-5.0.x-NAS-5.6

       

      Proposition 2: make NAS versions consistent

      heritrixVersion=3.3.0-BDB-5.0.x-NAS-1.0-SNAPSHOT => heritrixVersion=3.3.0-BDB-5.0.x-NAS-5.6-SNAPSHOT
      Heritrix/3.3.0-BDB-5.0.x-NAS-5.5 => Heritrix/3.3.0-BDB-5.0.x-NAS-5.6-SNAPSHOT
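
      As an illustration of Proposition 2, both statements could be derived from one base Heritrix version and one NAS version. This is a sketch under that assumption only; with_nas_version is a hypothetical helper, not an existing NetarchiveSuite function.

```python
def with_nas_version(heritrix_version: str, nas_version: str) -> str:
    """Replace (or add) the '-NAS-' suffix so it carries the given NAS version."""
    base, _, _ = heritrix_version.partition("-NAS-")
    return f"{base}-NAS-{nas_version}"


# Both inconsistent 5.5 values map to the same string.
from_metadata = with_nas_version("3.3.0-BDB-5.0.x-NAS-1.0-SNAPSHOT", "5.6-SNAPSHOT")
from_data = with_nas_version("3.3.0-BDB-5.0.x-NAS-5.5", "5.6-SNAPSHOT")
print(from_metadata)
print(from_data)
```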

          People

            Unassigned Unassigned
            sara Sara Aubry
            Wiatrowski Wiatrowski (Inactive)