NetarchiveSuite / NAS-2494

Sanity check needed for harvest names

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 5.1
    • Fix Version/s: None
    • Component/s: GUI
    • Labels: None

      Description

      Currently there is no sanity checking of harvest names,
      so when I mistakenly entered the name

      <SNH
      

      I got errors like the following from the harvest controller:

      Host: kb-test-har-003.kb.dk
      Date: Wed Feb 03 13:20:25 CET 2016
      dk.netarkivet.harvester.heritrix3.HarvestControllerServer$HarvesterThread.run(HarvestControllerServer.java:462)
      Fatal error while operating job 'Job 2 (state = SUBMITTED, HD = 2, channel = LOWPRIORITY, snapshot = true, forcemaxcount = 2000, forcemaxbytes = 1000000, forcemaxrunningtime = 0, orderxml = default_orderxml, numconfigs = 13, created = Wed Feb 03 13:20:07 CET 2016, submitted = Wed Feb 03 13:20:13 CET 2016)'
      dk.netarkivet.common.exceptions.IOFailure: Error during crawling. The crawl may have been only partially completed.
              at dk.netarkivet.harvester.heritrix3.HarvestControllerServer$HarvesterThread.run(HarvestControllerServer.java:455)
      Caused by: java.lang.RuntimeException: Exception during crawl
              at dk.netarkivet.harvester.heritrix3.controller.HeritrixLauncher.doCrawl(HeritrixLauncher.java:138)
              at dk.netarkivet.harvester.heritrix3.HarvestJob.runHarvest(HarvestJob.java:102)
              at dk.netarkivet.harvester.heritrix3.HarvestControllerServer$HarvesterThread.run(HarvestControllerServer.java:450)
      Caused by: dk.netarkivet.common.exceptions.HeritrixLaunchException: The job '2_1454502014157' could not be built. Last loglines are 2016-02-03T13:20:19.295+01:00 SEVERE Line 512 in XML document from URL [file:/home/devel/TEST6/harvester_low/2_1454502014157/heritrix3/./jobs/2_1454502014157/crawler-beans.cxml] is invalid; nested exception is org.xml.sax.SAXParseException; lineNumber: 512; columnNumber: 59; The value of attribute "value" associated with an element type "null" must not contain the '<' character.
      2016-02-03T13:20:19.295+01:00 SEVERE Line 512 in XML document from URL [file:/home/devel/TEST6/harvester_low/2_1454502014157/heritrix3/./jobs/2_1454502014157/crawler-beans.cxml] is invalid; nested exception is org.xml.sax.SAXParseException; lineNumber: 512; columnNumber: 59; The value of attribute "value" associated with an element type "null" must not contain the '<' character.
              at dk.netarkivet.harvester.heritrix3.controller.HeritrixController.requestCrawlStart(HeritrixController.java:159)
              at dk.netarkivet.harvester.heritrix3.controller.HeritrixLauncher.doCrawl(HeritrixLauncher.java:108)
              ... 2 more
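A minimal sketch of the kind of check this issue asks for, written in Java to match the codebase. The log suggests the harvest name ended up unescaped inside an attribute value in crawler-beans.cxml, so two complementary guards are shown: rejecting suspicious names when they are entered, and escaping XML special characters when values are written into the file. The class HarvestNameValidator, its methods, and the allowed character set (letters, numeric characters, space, underscore, hyphen, dot; at most 255 characters) are assumptions for illustration, not existing NetarchiveSuite code or rules.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of NetarchiveSuite) sketching a sanity
 * check for harvest names before they are stored and later written into
 * Heritrix 3's crawler-beans.cxml.
 */
public final class HarvestNameValidator {

    // Assumed whitelist: Unicode letters (so e.g. Danish æøå pass), numeric
    // characters, space, underscore, dot and hyphen; 1 to 255 characters.
    private static final Pattern ALLOWED =
            Pattern.compile("[\\p{L}\\p{N} _.\\-]{1,255}");

    private HarvestNameValidator() {
    }

    /** Returns true if the name is non-null, not blank, and uses only allowed characters. */
    public static boolean isValid(String name) {
        return name != null
                && !name.trim().isEmpty()
                && ALLOWED.matcher(name).matches();
    }

    /**
     * Escapes the five XML special characters so that a value can be
     * placed safely inside a quoted XML attribute.
     */
    public static String escapeXmlAttribute(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "<SNH" is the name from this issue; it is rejected by isValid
        // and becomes "&lt;SNH" when escaped.
        for (String name : new String[] {"SNH", "<SNH", "Snapshot 2016-02"}) {
            System.out.println(name + " -> valid=" + isValid(name)
                    + ", escaped=" + escapeXmlAttribute(name));
        }
    }
}
```

Rejecting at input gives the user immediate feedback in the GUI, while escaping at output also covers names that are already stored; either guard alone would have kept '<SNH' from producing an unparseable crawler-beans.cxml.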
      

              People

              • Assignee: Unassigned
              • Reporter: Søren Vejrup Carlsen (svc)
              • Watchers: 1
