Details
-
New Feature
-
Resolution: Fixed
-
Minor
-
5.1
-
None
-
None
-
NAS 5.3
Description
In 4.X versions, NAS compiled archivefiles-report.txt in two steps:
1) extracting the opening and closing GMT dates from heritrix.out
2) listing all W/ARC files in the job directory and adding any files missing from the report, along with the file sizes
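The second step above can be sketched roughly as follows. This is only an illustration in Python, not the NAS implementation: the function name `complete_report`, the suffix list, and the `-` placeholder used for files with no dates found in heritrix.out are all assumptions made for the example.

```python
import os

# Assumed set of archive file suffixes; adjust to the files actually produced.
ARCHIVE_SUFFIXES = (".arc", ".arc.gz", ".warc", ".warc.gz")


def complete_report(dated, job_dir):
    """Build report rows for every W/ARC file found in job_dir.

    dated maps a file name to its (opened, closed) GMT date strings,
    as they would have been extracted from heritrix.out.
    Files absent from dated are still reported, with "-" placeholders
    (hypothetical convention) instead of dates.
    """
    rows = []
    for name in sorted(os.listdir(job_dir)):
        if not name.endswith(ARCHIVE_SUFFIXES):
            continue  # skip logs and other non-archive files
        opened, closed = dated.get(name, ("-", "-"))
        size = os.path.getsize(os.path.join(job_dir, name))
        rows.append((name, opened, closed, size))
    return rows
```

Each returned tuple has exactly four elements, one per column of the report header, so a file with unknown dates still yields a complete row.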
In 5.1, archivefiles-report.txt has an opening date expressed in local time (not GMT) and no closing date, which makes it inconsistent with the header structure:
[ARCHIVEFILE] [Opened] [Closed] [Size]
WARC/1.0
WARC-Type: resource
WARC-Record-ID: <urn:uuid:278c51d9-c8ef-4d77-bbe2-ef094318201d>
WARC-Date: 2016-07-25T11:11:01Z
Content-Length: 127
Content-Type: text/plain
WARC-Block-Digest: sha1:2KUD7DDELLB7MC2GAWPIYQZ5VOR2SNNB
WARC-IP-Address: 172.20.20.41
WARC-Target-URI: metadata://netarchivesuite.bnf.fr/crawl/reports/archivefiles-report.txt?heritrixVersion=3.3.0-LBS-2014-03&harvestid=2&jobid=20
WARC-Warcinfo-ID: <urn:uuid:0af59ceb-7321-4240-b1b8-b1ee869a60f8>
[ARCHIVEFILE] [Opened] [Closed] [Size]
BnF-20-2-20160725110956304-00000-menelas.bnf.fr.warc.gz 2016-07-25T13:10:34.000Z 239068
The closing date is missing, the content is shifted relative to the column headers, and the date is in local time rather than GMT.
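For illustration, a row consistent with the header could be produced along these lines. This is a hedged Python sketch, not the fix applied in NAS: `HOST_OFFSET`, `to_gmt` and `report_row` are hypothetical names, the UTC+2 offset is an assumption (it matches the 11:09:56 timestamp embedded in the file name), and the closing time used in the usage note is invented because the report does not contain one.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the crawl host's local time was UTC+2 in July 2016.
HOST_OFFSET = timezone(timedelta(hours=2))
HEADER = "[ARCHIVEFILE] [Opened] [Closed] [Size]"


def to_gmt(local_naive):
    """Interpret a naive datetime as host-local time and format it in GMT."""
    utc = local_naive.replace(tzinfo=HOST_OFFSET).astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z"


def report_row(name, opened_local, closed_local, size):
    """Return one report line with as many fields as the header has columns."""
    row = f"{name} {to_gmt(opened_local)} {to_gmt(closed_local)} {size}"
    if len(row.split()) != len(HEADER.split()):
        raise ValueError("row does not match header: " + row)
    return row
```

With the local opening time from the record above, `to_gmt(datetime(2016, 7, 25, 13, 10, 34))` gives `2016-07-25T11:10:34.000Z`. Passing a hypothetical closing time as the third argument of `report_row` then yields a four-field line aligned with the header.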