NetarchiveSuite / NAS-2146

The index-generating batch job in RawMetadataCache fails now that a warc-info record has been added to the metadata file

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • 4.0
    • None
    • IndexServer
    • None
      Verified in revision 2593

    Description

      07-01-2013 17:40:45 dk.netarkivet.archive.bitarchive.Bitarchive batch
      INFO: Finished batch job dk.netarkivet.harvester.indexserver.RawMetadataCache$GetMetadataArchiveBatchJob with result: 1 failures in processing 1 files at 172.17.0.53_BitApp_2

      KB-TEST-BAR-014 BitarchiveServer BitApp_2 KBN 1
      07-01-2013 17:40:45 dk.netarkivet.harvester.indexserver.RawMetadataCache$GetMetadataArchiveBatchJob processRecord
      INFO: null - application/warc-fields

      We know that this worked in the 3.21 release. From the log excerpt above, it is evident that the batch job now tries to process the warc-info record (which was not present in 3.21) while extracting the CDX and crawl logs from the metadata WARC file.
      We must tell the batch job to ignore warc-info records, as we have already done in other batch jobs used inside NetarchiveSuite.
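
      A minimal sketch of the proposed filtering follows. It is not the actual RawMetadataCache code: the Record class, the shouldProcess and selectUrls helpers, and the sample URLs are hypothetical stand-ins for illustration. The only values taken from the log above are the null URL and the application/warc-fields MIME type of the warc-info record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WarcInfoSkipSketch {

    /** Hypothetical stand-in for an archive record; not a NetarchiveSuite class. */
    static final class Record {
        final String url;       // null for the warc-info record, as seen in the log
        final String mimeType;

        Record(String url, String mimeType) {
            this.url = url;
            this.mimeType = mimeType;
        }
    }

    /** MIME type reported for the warc-info record in the log excerpt. */
    static final String WARCINFO_MIMETYPE = "application/warc-fields";

    /** Returns false for records the batch job should skip. */
    static boolean shouldProcess(Record record) {
        if (record.url == null) {
            // Assumption: a record without a URL cannot be matched against
            // the CDX or crawl-log URL patterns, so it is skipped.
            return false;
        }
        return !WARCINFO_MIMETYPE.equalsIgnoreCase(record.mimeType);
    }

    /** Collects the URLs of all records that survive the filter. */
    static List<String> selectUrls(List<Record> records) {
        List<String> urls = new ArrayList<>();
        for (Record record : records) {
            if (shouldProcess(record)) {
                urls.add(record.url);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Illustrative sample data only; the URLs and MIME types of the
        // non-warc-info records are assumptions.
        List<Record> records = Arrays.asList(
                new Record(null, "application/warc-fields"),
                new Record("metadata://example/cdx", "application/x-cdx"),
                new Record("metadata://example/crawl.log", "text/plain"));
        for (String url : selectUrls(records)) {
            System.out.println(url);
        }
    }
}
```

      With a check of this kind placed at the start of record processing, the warc-info record would be skipped instead of causing the job to report a failure for the whole file.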

          People

            svc Søren Vejrup Carlsen (Inactive)