  1. NetarchiveSuite
  2. NAS-1612

Maximum time for a crawljob wanted for broad crawls

Details

    Description

      The Netarkivet.dk staff responsible for broad crawls want to be able to set a maximum time for each broad crawl. In their experience, a crawl job does nothing useful after about 24 hours.
      Heritrix has a "max-time-sec" attribute which can be used to enforce such a maximum time.
      A partial work-around therefore already exists:
      Edit the "max-time-sec" attribute in the Heritrix order.xml (located just below the "max-document-download" attribute) and replace the "0" (meaning no limit) with "86400" (meaning a limit of 24 hours) or whatever value is required.
      A complete solution requires that a "max-time-sec" attribute be added to various tables (e.g. jobs).
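
      The work-around could be scripted roughly as follows. This is only a sketch in Python: it assumes the setting appears in order.xml as an element carrying name="max-time-sec" with the value as its text, and the sample XML and the helper set_max_time_sec are hypothetical, made up for illustration rather than taken from NetarchiveSuite.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for a Heritrix order.xml.
# Assumption: the limit is stored as the text of an element with
# name="max-time-sec", placed just below "max-document-download".
SAMPLE_ORDER_XML = """<crawl-order>
  <controller>
    <long name="max-document-download">0</long>
    <long name="max-time-sec">0</long>
  </controller>
</crawl-order>"""


def set_max_time_sec(order_xml: str, seconds: int) -> str:
    """Return a copy of order_xml with "max-time-sec" set to `seconds`.

    0 means no limit; 86400 means a limit of 24 hours.
    """
    if seconds < 0:
        raise ValueError("seconds must be 0 (no limit) or positive")
    root = ET.fromstring(order_xml)
    # Look the element up by its name attribute, whatever its tag is.
    matches = [el for el in root.iter() if el.get("name") == "max-time-sec"]
    if len(matches) != 1:
        raise ValueError(f"expected one max-time-sec element, found {len(matches)}")
    matches[0].text = str(seconds)
    # Note: ElementTree drops the XML declaration and any comments.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Limit the crawl job to 24 hours.
    print(set_max_time_sec(SAMPLE_ORDER_XML, 24 * 60 * 60))
```

      The helper refuses to guess when the element is missing or duplicated, so a template with an unexpected layout fails loudly instead of being silently left without a limit.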

            People

              svc Søren Vejrup Carlsen (Inactive)

                Time Tracking

                  Estimated: Not Specified
                  Remaining: Not Specified
                  Logged: 0.15h