Details
Bug
Resolution: Unresolved
Minor
None
5.5
None
BNF
Description
In 5.5, there are some inconsistencies in NAS and Heritrix version statements used in WARC metadata and data files:
For metadata files:
in the warcinfo record, we have:
software: NetarchiveSuite/Version: 5.6-SNAPSHOT (https://github.com/netarchivesuite/netarchivesuite/commit/5d5a7efda045707d65ac77c28c2a59fc771bd5b3)/https://sbforge.org/display/NAS
in warc records:
WARC-Target-URI: metadata://netarchivesuite.bnf.fr/crawl/setup/crawler-beans.cxml?heritrixVersion=3.3.0-BDB-5.0.x-NAS-1.0-SNAPSHOT&harvestid=72&jobid=26324
For data files:
in warcinfo, we have:
software: Heritrix/3.3.0-BDB-5.0.x-NAS-5.5 http://crawler.archive.org
#added by NetarchiveSuite Version: 5.6-SNAPSHOT (https://github.com/netarchivesuite/netarchivesuite/commit/5d5a7efda045707d65ac77c28c2a59fc771bd5b3)
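The mismatch in the 5.5 records can be shown with a small script. This is only a sketch: the strings are shortened copies of the statements quoted above, while the labels, the `nas_versions` helper and the regular expression are hypothetical names and assumptions made for illustration, not part of NetarchiveSuite.

```python
import re

# Version statements copied (shortened) from the 5.5 records quoted above.
STATEMENTS = {
    "metadata warcinfo": "NetarchiveSuite/Version: 5.6-SNAPSHOT",
    "metadata WARC-Target-URI": "heritrixVersion=3.3.0-BDB-5.0.x-NAS-1.0-SNAPSHOT",
    "data warcinfo": "Heritrix/3.3.0-BDB-5.0.x-NAS-5.5",
}

# Assumption: a NAS version follows either "Version: " or a "-NAS-" suffix.
NAS_VERSION = re.compile(r"(?:Version: |-NAS-)(\d+(?:\.\d+)*(?:-SNAPSHOT)?)")


def nas_versions(statements):
    """Return the NAS version found in each statement (None if absent)."""
    found = {}
    for label, text in statements.items():
        match = NAS_VERSION.search(text)
        found[label] = match.group(1) if match else None
    return found


versions = nas_versions(STATEMENTS)
for label, version in versions.items():
    print(f"{label}: {version}")

# The statements are consistent only if they all carry the same NAS version.
print("consistent:", len(set(versions.values())) == 1)
```

With the three statements above, the script reports three different NAS versions (5.6-SNAPSHOT, 1.0-SNAPSHOT and 5.5), so the consistency check fails.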
In the current version (5.4.2):
For metadata:
in warcinfo, we have:
software: NetarchiveSuite/Version: 5.4.2 (https://github.com/netarchivesuite/netarchivesuite/commit/4ab2e28e0788d0c21e69ab2373312a9bcabdb372)/https://sbforge.org/display/NAS
in warc records:
WARC-Target-URI: metadata://netarchivesuite.bnf.fr/crawl/setup/crawler-beans.cxml?heritrixVersion=3.3.0-BDB-5.0.x&harvestid=55&jobid=30321
For data:
in warcinfo:
software: Heritrix/3.3.0-BDB-5.0.x http://crawler.archive.org
#added by NetarchiveSuite Version: 5.4.2 (https://github.com/netarchivesuite/netarchivesuite/commit/4ab2e28e0788d0c21e69ab2373312a9bcabdb372)
Proposition 1: remove NAS versions on Heritrix
heritrixVersion=3.3.0-BDB-5.0.x-NAS-1.0-SNAPSHOT => heritrixVersion=3.3.0-BDB-5.0.x
Heritrix/3.3.0-BDB-5.0.x-NAS-5.5 => Heritrix/3.3.0-BDB-5.0.x
Proposition 2: make NAS versions consistent
heritrixVersion=3.3.0-BDB-5.0.x-NAS-1.0-SNAPSHOT => heritrixVersion=3.3.0-BDB-5.0.x-NAS-5.6-SNAPSHOT
Heritrix/3.3.0-BDB-5.0.x-NAS-5.5 => Heritrix/3.3.0-BDB-5.0.x-NAS-5.6-SNAPSHOT
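The two propositions can be sketched as string rewrites on the Heritrix version. The helper names `strip_nas_suffix` and `align_nas_suffix` and the suffix pattern are hypothetical; they assume the NAS part always appears as a trailing `-NAS-<version>` suffix, as in the strings above, and say nothing about where the version is actually set in the build.

```python
import re

# Assumption: the NAS part is always a trailing "-NAS-<version>" suffix.
NAS_SUFFIX = re.compile(r"-NAS-\S+$")


def strip_nas_suffix(heritrix_version):
    """Proposition 1: drop the NAS suffix from the Heritrix version."""
    return NAS_SUFFIX.sub("", heritrix_version)


def align_nas_suffix(heritrix_version, nas_version):
    """Proposition 2: make the suffix carry the running NAS version."""
    return f"{strip_nas_suffix(heritrix_version)}-NAS-{nas_version}"


in_uri = "3.3.0-BDB-5.0.x-NAS-1.0-SNAPSHOT"
in_warcinfo = "3.3.0-BDB-5.0.x-NAS-5.5"

print(strip_nas_suffix(in_uri))
print(align_nas_suffix(in_uri, "5.6-SNAPSHOT"))
print(align_nas_suffix(in_warcinfo, "5.6-SNAPSHOT"))
```

Under either rewrite, the value in the WARC-Target-URI and the value in the data warcinfo record end up identical, which is the point of both propositions.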