Details
Type: Task
Resolution: Fixed
Priority: Major
Description
The goal is to find which jobs are harvesting a given domain, even if the domain is not in the seed list. This boils down to running a crawl-job-regexp-search across all Running Jobs instances. It should be possible to do this from within NAS: we already have links to all the relevant Heritrix instances, so we just need to make a series of REST calls to each of them using the "Execute Shell Script in Job" method (https://webarchive.jira.com/wiki/display/Heritrix/Heritrix+3.x+API+Guide#Heritrix3.xAPIGuide-ExecuteShellScriptinJob). (I think we first need to scan the job directory of each Heritrix instance to identify the job ID.)
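A minimal sketch of the REST sequence described above, assuming the endpoints from the Heritrix 3 API guide: `POST <base>/engine` with `action=rescan` to rescan the job directory, `GET <base>/engine` (with `Accept: application/xml`) to list jobs, and `POST <base>/engine/job/<job>/script` with `engine` and `script` form fields to run a script in a job. It also assumes that the engine XML lists each job under `jobs/value/shortName`, that the instances use digest authentication and self-signed certificates, and that the caller supplies the actual regexp-search script. The function names (`parse_job_names`, `build_script_request`, `run_script_on_all_jobs`) are hypothetical, and this is not NAS code.

```python
from __future__ import annotations

import ssl
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def parse_job_names(engine_xml: str) -> list[str]:
    """Extract job short names from GET <base>/engine (assumed jobs/value/shortName layout)."""
    root = ET.fromstring(engine_xml)
    return [el.text for el in root.findall("./jobs/value/shortName") if el.text]


def build_script_request(base_url: str, job: str, script: str,
                         engine: str = "groovy") -> urllib.request.Request:
    """Build POST <base>/engine/job/<job>/script with the script as a form field."""
    url = f"{base_url.rstrip('/')}/engine/job/{urllib.parse.quote(job)}/script"
    body = urllib.parse.urlencode({"engine": engine, "script": script}).encode()
    return urllib.request.Request(url, data=body,
                                  headers={"Accept": "application/xml"},
                                  method="POST")


def make_opener(base_url: str, user: str, password: str) -> urllib.request.OpenerDirector:
    # Assumption: Heritrix runs with a self-signed certificate, so verification is disabled.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    # Assumption: the web UI/REST API is protected with HTTP digest auth.
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, base_url, user, password)
    return urllib.request.build_opener(
        urllib.request.HTTPSHandler(context=ctx),
        urllib.request.HTTPDigestAuthHandler(passwords))


def run_script_on_all_jobs(base_urls: list[str], user: str, password: str,
                           script: str) -> dict[tuple[str, str], str]:
    """Rescan, list jobs, then run `script` in every job of every instance."""
    results: dict[tuple[str, str], str] = {}
    for base in base_urls:
        opener = make_opener(base, user, password)
        engine_url = f"{base.rstrip('/')}/engine"
        # Rescan the job directory so newly created jobs are listed.
        opener.open(urllib.request.Request(engine_url, data=b"action=rescan",
                                           method="POST")).close()
        listing = urllib.request.Request(engine_url, headers={"Accept": "application/xml"})
        with opener.open(listing) as resp:
            jobs = parse_job_names(resp.read().decode("utf-8"))
        for job in jobs:
            with opener.open(build_script_request(base, job, script)) as resp:
                results[(base, job)] = resp.read().decode("utf-8")
    return results
```

Keeping the XML parsing and request construction in small pure functions means they can be checked without a live Heritrix instance. Only `run_script_on_all_jobs` performs network calls, and the caller would then look for the domain in each job's script output.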