  1. NetarchiveSuite
  2. NAS-2686

Always create a metadata WARC file even if Heritrix3 doesn't create any (w)arc files

    Details

    • Type: New Feature
    • Status: Ready for release test
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 5.3, 5.2.2, 5.3.1
    • Fix Version/s: 5.4, 5.2.3
    • Labels:
      None
    • Sprint:
      NAS 5.4

      Description

      You typically receive two mail notifications whenever a harvest fails:
      1)

      Host: narcana-webdanica01.statsbiblioteket.dk
      Date: Thu Nov 23 06:51:04 CET 2017
      dk.netarkivet.harvester.heritrix3.PostProcessing.storeFiles(PostProcessing.java:269)
      Probable error in Heritrix job setup. No arcfiles or warcfiles generated by Heritrix for job 1204
      

      2)

      Host: narcana-webdanica01.statsbiblioteket.dk
      Date: Thu Nov 23 06:51:04 CET 2017
      dk.netarkivet.harvester.heritrix3.PostProcessing.doPostProcessing(PostProcessing.java:165)
      Trouble during postprocessing of files in '/opt/webdanica/WEBDANICA/harvester_focused/1204_1511416193560'. Errors accumulated during the postprocessing: Metadata file /opt/webdanica/WEBDANICA/harvester_focused/1204_1511416193560/metadata/1204-metadata-1.warc does not exist
      
      dk.netarkivet.common.exceptions.IllegalState: Metadata file /opt/webdanica/WEBDANICA/harvester_focused/1204_1511416193560/metadata/1204-metadata-1.warc does not exist
              at dk.netarkivet.harvester.heritrix3.IngestableFiles.getMetadataArcFiles(IngestableFiles.java:183)
              at dk.netarkivet.harvester.heritrix3.PostProcessing.storeFiles(PostProcessing.java:281)
              at dk.netarkivet.harvester.heritrix3.PostProcessing.doPostProcessing(PostProcessing.java:159)
              at dk.netarkivet.harvester.heritrix3.HarvestControllerServer$HarvesterThread.run(HarvestControllerServer.java:457)
      

      The problem is that if Heritrix3 doesn't create any (w)arc files, no metadata WARC file is created.
      One should be created anyway, because Heritrix still writes its reports, and they contain valuable information.
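
The requested behaviour could be sketched roughly as follows. This is only an illustration under stated assumptions, not the NetarchiveSuite implementation: the class `MetadataFileSketch` and its methods `findArchiveFiles` and `ensureMetadataFile` are hypothetical names, and the empty file merely stands in for a real metadata WARC that would contain the Heritrix reports. The file name pattern `<jobId>-metadata-1.warc` under `metadata/` is taken from the error messages above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; names and structure are assumptions, not NetarchiveSuite code.
class MetadataFileSketch {

    /** Lists the .arc/.warc files (optionally gzipped) directly under crawlDir. */
    static List<Path> findArchiveFiles(Path crawlDir) throws IOException {
        try (Stream<Path> entries = Files.list(crawlDir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().matches(".*\\.w?arc(\\.gz)?"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Creates metadata/<jobId>-metadata-1.warc if it is missing, whatever the harvest produced. */
    static Path ensureMetadataFile(Path crawlDir, long jobId) throws IOException {
        Path metadataDir = Files.createDirectories(crawlDir.resolve("metadata"));
        Path metadataFile = metadataDir.resolve(jobId + "-metadata-1.warc");
        if (Files.notExists(metadataFile)) {
            // Placeholder only: a real job would write the Heritrix reports as WARC records here.
            Files.createFile(metadataFile);
        }
        return metadataFile;
    }

    public static void main(String[] args) throws IOException {
        Path crawlDir = Files.createTempDirectory("1204_sketch");
        List<Path> archives = findArchiveFiles(crawlDir);
        // The metadata file is created even though the archive list may be empty.
        Path metadata = ensureMetadataFile(crawlDir, 1204L);
        System.out.println("archive files: " + archives.size());
        System.out.println("metadata file exists: " + Files.exists(metadata));
    }
}
```

The point of the sketch is that metadata creation does not depend on the archive list: an empty harvest can still be reported as a probable job setup error, while the later lookup of the metadata file no longer fails with a second error.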

              People

              • Assignee:
                svc Søren Vejrup Carlsen
                Reporter:
                svc Søren Vejrup Carlsen
              • Watchers:
                1

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  Not Specified
                  Logged:
                  17m