  1. NetarchiveSuite
  2. NAS-2686

Always create a metadata-warcfile even if Heritrix3 doesn't create any (w)arc files


Details

    • NAS 5.4

      To test:

      1. Upload a bad template (e.g. invalid XML)
      2. Create a harvest using the template
      3. Wait for the job to fail
      4. Check that a metadata file is created and uploaded. Browse the file and look for, e.g., Heritrix output complaining about the bad XML syntax.
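
For step 1, it can help to confirm up front that the template really is not well-formed, so that the job fails for the intended reason. The following is a minimal sketch using only the JDK's standard XML parser; the class name `TemplateCheckSketch` and the sample strings are made up for illustration and are not part of NetarchiveSuite.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper for testers; not part of NetarchiveSuite. */
class TemplateCheckSketch {

    /** Returns true if the given text parses as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(
                    xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // Syntax errors such as an unclosed tag end up here.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("XML parser unavailable", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample: the <bean> element is never closed.
        String badTemplate = "<beans><bean id=\"x\"></beans>";
        System.out.println("well-formed: " + isWellFormed(badTemplate));
    }
}
```

This only checks syntax. A template can be well-formed XML and still be rejected by Heritrix for other reasons.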

    Description

      You typically receive two mail notifications whenever a harvest fails:
      1)

      Host: narcana-webdanica01.statsbiblioteket.dk
      Date: Thu Nov 23 06:51:04 CET 2017
      dk.netarkivet.harvester.heritrix3.PostProcessing.storeFiles(PostProcessing.java:269)
      Probable error in Heritrix job setup. No arcfiles or warcfiles generated by Heritrix for job 1204
      

      2)

      Host: narcana-webdanica01.statsbiblioteket.dk
      Date: Thu Nov 23 06:51:04 CET 2017
      dk.netarkivet.harvester.heritrix3.PostProcessing.doPostProcessing(PostProcessing.java:165)
      Trouble during postprocessing of files in '/opt/webdanica/WEBDANICA/harvester_focused/1204_1511416193560'. Errors accumulated during the postprocessing: Metadata file /opt/webdanica/WEBDANICA/harvester_focused/1204_1511416193560/metadata/1204-metadata-1.warc does not exist
      
      dk.netarkivet.common.exceptions.IllegalState: Metadata file /opt/webdanica/WEBDANICA/harvester_focused/1204_1511416193560/metadata/1204-metadata-1.warc does not exist
              at dk.netarkivet.harvester.heritrix3.IngestableFiles.getMetadataArcFiles(IngestableFiles.java:183)
              at dk.netarkivet.harvester.heritrix3.PostProcessing.storeFiles(PostProcessing.java:281)
              at dk.netarkivet.harvester.heritrix3.PostProcessing.doPostProcessing(PostProcessing.java:159)
              at dk.netarkivet.harvester.heritrix3.HarvestControllerServer$HarvesterThread.run(HarvestControllerServer.java:457)
      

      The problem is that if Heritrix3 doesn't create any (w)arc files, no metadata-warc is created.
      One really should be created, as the reports are still being written by Heritrix, and they contain valuable information.
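
The requested behaviour can be sketched as follows: write the metadata file unconditionally first, and only afterwards check whether any (w)arc files exist, treating their absence as a warning rather than a reason to skip the metadata. This is an illustrative sketch under stated assumptions, not the actual NetarchiveSuite code. The class `PostProcessingSketch`, its methods, and the `Result` type are hypothetical; harvest files are assumed to sit directly in the crawl directory; and the "metadata file" here is a plain concatenation of the report files, not a real WARC writer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch; this is not the NetarchiveSuite PostProcessing class. */
class PostProcessingSketch {

    /** Outcome of one postprocessing run. */
    static final class Result {
        final Path metadataFile;
        final List<Path> harvestFiles;
        final List<String> warnings;

        Result(Path metadataFile, List<Path> harvestFiles, List<String> warnings) {
            this.metadataFile = metadataFile;
            this.harvestFiles = harvestFiles;
            this.warnings = warnings;
        }
    }

    static Result postProcess(Path crawlDir, long jobId) throws IOException {
        // Step 1: write the metadata file unconditionally, so the reports
        // survive even when the crawl produced no (w)arc files.
        Path metadataDir = Files.createDirectories(crawlDir.resolve("metadata"));
        Path metadataFile = metadataDir.resolve(jobId + "-metadata-1.warc");
        writeMetadata(crawlDir, metadataFile);

        // Step 2: missing harvest files are recorded as a warning only.
        List<Path> harvestFiles = listFiles(crawlDir, true);
        List<String> warnings = new ArrayList<>();
        if (harvestFiles.isEmpty()) {
            warnings.add("No arcfiles or warcfiles generated by Heritrix for job " + jobId);
        }
        return new Result(metadataFile, harvestFiles, warnings);
    }

    static boolean isHarvestFile(Path p) {
        String name = p.getFileName().toString();
        return name.endsWith(".arc") || name.endsWith(".warc")
                || name.endsWith(".arc.gz") || name.endsWith(".warc.gz");
    }

    /** Lists regular files directly in dir: harvest files, or everything else. */
    private static List<Path> listFiles(Path dir, boolean harvestFiles) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile)
                    .filter(p -> isHarvestFile(p) == harvestFiles)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Placeholder: concatenates report files as text; a real writer would emit WARC records. */
    private static void writeMetadata(Path crawlDir, Path metadataFile) throws IOException {
        StringBuilder content = new StringBuilder();
        for (Path report : listFiles(crawlDir, false)) {
            content.append("== ").append(report.getFileName()).append(" ==\n");
            content.append(new String(Files.readAllBytes(report), StandardCharsets.UTF_8));
            content.append('\n');
        }
        Files.write(metadataFile, content.toString().getBytes(StandardCharsets.UTF_8));
    }
}
```

With this ordering, a job that fails because of a bad template still yields a metadata file containing the Heritrix reports, and the later existence check on the metadata file no longer fails.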

            People

              svc Søren Vejrup Carlsen (Inactive)

                Time Tracking

                  Estimated: Not Specified
                  Remaining: Not Specified
                  Logged: 17m