NetarchiveSuite / NAS-2546

archivefiles-report.txt missing GMT dates and closing date

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 5.1
    • Fix Version/s: 5.3
    • Component/s: None
    • Labels:
      None
    • Sprint:
      NAS 5.3

      Description

      In 4.X versions, NAS compiled archivefiles-report.txt in two steps:
      1) extract the opening and closing GMT dates from heritrix.out
      2) list all W/ARC files in the job directory, then add any files missing from the report, along with the file sizes
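
      The second step can be sketched as follows. This is only an illustration, not the NAS implementation: the function name build_report, the dates mapping (standing in for whatever step 1 extracts from heritrix.out), the suffix list, and the "-" placeholder for unknown dates are all assumptions.

```python
from pathlib import Path

# Header line as it appears in archivefiles-report.txt.
HEADER = "[ARCHIVEFILE] [Opened] [Closed] [Size]"
# Assumed set of archive file suffixes; adjust to the files the job actually writes.
ARCHIVE_SUFFIXES = (".arc", ".arc.gz", ".warc", ".warc.gz")


def build_report(job_dir, dates):
    """Build the report text for every W/ARC file found in job_dir.

    dates is a hypothetical mapping of file name -> (opened, closed) GMT
    strings, standing in for the values extracted from heritrix.out.
    Files with no known dates still get a line, with "-" placeholders
    (an assumption), so the four columns stay aligned with the header.
    """
    lines = [HEADER]
    for path in sorted(Path(job_dir).iterdir()):
        if not path.is_file() or not path.name.endswith(ARCHIVE_SUFFIXES):
            continue
        opened, closed = dates.get(path.name, ("-", "-"))
        lines.append(f"{path.name} {opened} {closed} {path.stat().st_size}")
    return "\n".join(lines) + "\n"
```

      Writing a placeholder instead of dropping the column keeps each line at four fields, so a consumer splitting on whitespace does not mistake the size for a date.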

      In 5.1, archivefiles-report.txt has an opening date in local time (not GMT) and no closing date, which makes it inconsistent with the header structure: [ARCHIVEFILE] [Opened] [Closed] [Size]

      WARC/1.0
      WARC-Type: resource
      WARC-Record-ID: <urn:uuid:278c51d9-c8ef-4d77-bbe2-ef094318201d>
      WARC-Date: 2016-07-25T11:11:01Z
      Content-Length: 127
      Content-Type: text/plain
      WARC-Block-Digest: sha1:2KUD7DDELLB7MC2GAWPIYQZ5VOR2SNNB
      WARC-IP-Address: 172.20.20.41
      WARC-Target-URI: metadata://netarchivesuite.bnf.fr/crawl/reports/archivefiles-report.txt?heritrixVersion=3.3.0-LBS-2014-03&harvestid=2&jobid=20
      WARC-Warcinfo-ID: <urn:uuid:0af59ceb-7321-4240-b1b8-b1ee869a60f8>

      [ARCHIVEFILE] [Opened] [Closed] [Size]
      BnF-20-2-20160725110956304-00000-menelas.bnf.fr.warc.gz 2016-07-25T13:10:34.000Z 239068
      Closing date is missing, column contents are shifted relative to the headers, and the date is local time rather than GMT
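
      To illustrate the date problem: the opening stamp in the report line above carries a "Z" suffix but holds local time. The sketch below converts such a stamp to real GMT. It assumes the crawler host was two hours ahead of GMT, which the 110956 time in the file name and the 11:11:01Z WARC-Date suggest but the ticket does not state; local_to_gmt is a hypothetical helper, not a NAS function.

```python
from datetime import datetime, timedelta, timezone


def local_to_gmt(stamp, utc_offset_hours):
    """Convert a 'YYYY-MM-DDTHH:MM:SS.mmmZ' stamp that really holds local time.

    utc_offset_hours is the assumed offset of the host's local time from GMT.
    """
    # Parse the wall-clock fields; the trailing "Z" is treated as a literal.
    naive = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    # Attach the real local offset, then shift to UTC.
    local = naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    gmt = local.astimezone(timezone.utc)
    # Re-emit with millisecond precision, matching the report's format.
    return gmt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{gmt.microsecond // 1000:03d}Z"


# Assumed offset of +2 hours for the sample line in this ticket.
corrected = local_to_gmt("2016-07-25T13:10:34.000Z", 2)
```

      Under that assumption the corrected opening date is 2016-07-25T11:10:34.000Z, which falls between the file-name timestamp and the WARC-Date of the record.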

              People

              • Assignee:
                csr Colin Rosenthal
                Reporter:
                sara Sara Aubry
                Inspector:
                .
              • Watchers:
                2
