NetarchiveSuite / NAS-2734

SeedUriDomainnameQueueAssignmentPolicy.getKeyFromSeed() throws org.apache.commons.httpclient.URIException during a standard test harvest of netarkivet.dk


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Affects Version/s: 5.4
    • Fix Version/s: 5.4
    • Component/s: H3-extensions
    • Labels: None

    Description

      During a standard harvest of the domain netarkivet.dk, the following stack trace is written to heritrix3_err.log thousands of times (41866 occurrences at 85% job completion):

      org.apache.commons.httpclient.URIException: Relative URI but no base: www.netarkivet.dk
              at org.archive.url.UsableURIFactory.fixup(UsableURIFactory.java:419)
              at org.archive.url.UsableURIFactory.create(UsableURIFactory.java:275)
              at org.archive.url.UsableURIFactory.create(UsableURIFactory.java:265)
              at org.archive.net.UURIFactory.getInstance(UURIFactory.java:44)
              at dk.netarkivet.harvester.harvesting.SeedUriDomainnameQueueAssignmentPolicy.getKeyFromSeed(SeedUriDomainnameQueueAssignmentPolicy.java:126)
              at dk.netarkivet.harvester.harvesting.SeedUriDomainnameQueueAssignmentPolicy.getClassKey(SeedUriDomainnameQueueAssignmentPolicy.java:77)
              at org.archive.crawler.prefetch.FrontierPreparer.getClassKey(FrontierPreparer.java:266)
              at org.archive.crawler.frontier.AbstractFrontier.getClassKey(AbstractFrontier.java:252)
              at org.archive.crawler.frontier.WorkQueueFrontier.findEligibleURI(WorkQueueFrontier.java:681)
              at org.archive.crawler.frontier.AbstractFrontier.next(AbstractFrontier.java:455)
              at org.archive.crawler.framework.ToeThread.run(ToeThread.java:134)
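      The exception message suggests that the string passed to UURIFactory.getInstance() in getKeyFromSeed() was the bare hostname www.netarkivet.dk with no scheme, so it is parsed as a relative URI and there is no base URI to resolve it against. The sketch below illustrates this using the JDK's java.net.URI as a stand-in for Heritrix's UsableURIFactory (an assumption; the two parsers differ in detail), and shows one possible defensive normalization: prepending http:// to a seed that lacks a scheme before extracting its host. The class SeedKeySketch and its methods are hypothetical and are not part of NetarchiveSuite or Heritrix; this is not the project's actual fix (the issue was closed as Cannot Reproduce).

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/** Hypothetical illustration only; not NetarchiveSuite or Heritrix code. */
public final class SeedKeySketch {

    /** Returns true if the string parses as a relative URI (no scheme). */
    static boolean isRelative(String uriString) {
        try {
            return !new URI(uriString).isAbsolute();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Unparseable URI: " + uriString, e);
        }
    }

    /**
     * Hypothetical helper: prepends "http://" when the seed has no "://"
     * separator, then returns the lower-cased host name of the result.
     */
    static String hostOf(String seed) {
        String candidate = seed.trim();
        if (!candidate.contains("://")) {
            candidate = "http://" + candidate; // assumed default scheme
        }
        try {
            String host = new URI(candidate).getHost();
            if (host == null) {
                throw new IllegalArgumentException("Seed has no host: " + seed);
            }
            return host.toLowerCase(Locale.ROOT);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Unparseable seed: " + seed, e);
        }
    }

    public static void main(String[] args) {
        // A bare hostname has no scheme, so java.net.URI treats it as relative.
        System.out.println(isRelative("www.netarkivet.dk"));         // true
        System.out.println(isRelative("http://www.netarkivet.dk/")); // false
        // After prepending a default scheme, the host can be extracted.
        System.out.println(hostOf("www.netarkivet.dk"));             // www.netarkivet.dk
    }
}
```

      Checking for "://" rather than relying on URI.getScheme() is deliberate: java.net.URI reads a seed such as www.netarkivet.dk:8080 as an opaque URI whose scheme is "www.netarkivet.dk", so a scheme check alone would not catch it.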
      


            People

              Assignee: Søren Vejrup Carlsen (Inactive)
              Reporter: Søren Vejrup Carlsen (Inactive)
              Watchers: 1
