Details
- New Feature
- Resolution: Fixed
- Critical
- 3.14.0
- None
Description
The staff responsible for broad crawls at Netarkivet.dk want to be able to set a maximum running time for each broad crawl, because in their experience a crawl job does nothing useful after about 24 hours.
Heritrix has a "max-time-sec" attribute that can be used to enforce such a maximum time.
So a workaround of sorts already exists:
Edit the "max-time-sec" attribute in the Heritrix order.xml (placed just below the "max-document-download" attribute) and replace the "0" (meaning no limit) with "86400" (meaning a limit of 24 hours), or whatever value is required.
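The manual edit described above could also be scripted. The following is only a minimal sketch, not part of NetarchiveSuite or Heritrix: the function name `set_max_time_sec` and the sample XML fragment are hypothetical, and it assumes the setting appears in order.xml as a single element carrying `name="max-time-sec"` with the value as its text. Check this against a real template before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for a Heritrix order.xml.
# A real file contains many more settings; only the shape matters here.
SAMPLE_ORDER_XML = """<crawl-order>
  <controller>
    <long name="max-document-download">0</long>
    <long name="max-time-sec">0</long>
  </controller>
</crawl-order>"""


def set_max_time_sec(order_xml: str, seconds: int) -> str:
    """Return order_xml with the max-time-sec value replaced by `seconds`.

    A value of 0 means "no limit", as described in the issue text.
    """
    if seconds < 0:
        raise ValueError("seconds must be 0 (no limit) or a positive number")

    root = ET.fromstring(order_xml)

    # Match on the name attribute only, so the sketch does not depend on
    # the element's tag name or nesting depth (both are assumptions here).
    matches = [el for el in root.iter() if el.get("name") == "max-time-sec"]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one max-time-sec element, found {len(matches)}"
        )

    matches[0].text = str(seconds)
    return ET.tostring(root, encoding="unicode")


# 24 hours expressed in seconds: 24 * 60 * 60 = 86400
updated = set_max_time_sec(SAMPLE_ORDER_XML, 24 * 60 * 60)
print(updated)
```

Raising an error when the element is missing or duplicated is a deliberate choice in this sketch: silently writing a template without the limit would defeat the purpose of the workaround.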
A complete solution requires that a "max-time-sec" attribute be added to various tables (e.g. jobs).
Issue Links
- Trackbacks
- 2011-08-09 Netarkiv meeting. DK meeting. Time: 9 Aug, 11:00 to 12:00. Brief information (Mikis): Workshop in December at BnF https://sbforge.org/display/NAS/2011DecemberworkshopatBnF. Application round for a full-time WARC developer is under way....