[NAS-2801] Job Queues should be configurable at HarvestConfig level Created: 24/Sep/18  Updated: 24/Sep/18

Status: Triage
Project: NetarchiveSuite
Component/s: GUI, Harvest Definition, HarvestJobManager
Affects Version/s: 5.4.2
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Colin Rosenthal Assignee: Unassigned
Resolution: Unresolved  
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

The Issue: Suppose we have a HarvestDefinition (HD) based on some logical curatorial category (e.g. newspaper front pages). The curators may wish to send only some of the HarvestConfigurations (HC) in this definition to a particular JobQueue (JQ); for example, some especially difficult domains should go to a browser-based harvester. Currently this isn't possible because the only mapping available is HD <-> JQ. [The workaround is to split the HD into two HDs.]

The Solution: Add a new set of mappings HD <-> HC <-> JQ, stored in a new three-column database table. When a HarvestJobManager schedules an HD, it groups the HCs according to their JQ mappings and, if necessary, creates multiple jobs to send to different JQs. For each HC, the triple mapping (HD, HC, JQ) takes precedence over the double mapping (HD, JQ); if neither mapping exists, the default JQ is used.
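
The precedence rule could be sketched roughly as follows. This is only an illustration: the class JobQueueResolver, its methods, the use of numeric ids for HDs and HCs, and the representation of a JQ as a plain String are all assumptions made for the example, not existing NetarchiveSuite code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical resolver illustrating the proposed lookup order. */
public class JobQueueResolver {
    // (HD id -> (HC id -> JQ name)): the proposed triple mapping.
    private final Map<Long, Map<Long, String>> tripleMappings = new HashMap<>();
    // (HD id -> JQ name): the existing double mapping.
    private final Map<Long, String> definitionMappings = new HashMap<>();
    // Queue used when neither mapping is set.
    private final String defaultQueue;

    public JobQueueResolver(String defaultQueue) {
        this.defaultQueue = Objects.requireNonNull(defaultQueue);
    }

    /** Registers a (HD, HC, JQ) triple. */
    public void mapConfiguration(long hdId, long hcId, String queue) {
        tripleMappings.computeIfAbsent(hdId, k -> new HashMap<>()).put(hcId, queue);
    }

    /** Registers a (HD, JQ) pair. */
    public void mapDefinition(long hdId, String queue) {
        definitionMappings.put(hdId, queue);
    }

    /** Triple mapping first, then double mapping, then the default queue. */
    public String resolve(long hdId, long hcId) {
        Map<Long, String> perConfig = tripleMappings.get(hdId);
        if (perConfig != null && perConfig.get(hcId) != null) {
            return perConfig.get(hcId);
        }
        String perDefinition = definitionMappings.get(hdId);
        return perDefinition != null ? perDefinition : defaultQueue;
    }
}
```

In a real implementation the two maps would presumably be backed by the DAO classes listed below rather than held in memory.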

The solution will require

  1. a database schema modification
  2. new Entity/DAO classes to implement CRUD functionality for the triples
  3. new GUI elements on the Selective Harvest Definition page to enable setting/unsetting the triples (and also the HD <-> JQ mapping, which is currently set on a separate page)
  4. altered logic in HarvestJobManager to split HDs into multiple jobs (perhaps using the current mechanism which splits on various other fields)
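
As a rough sketch of item 4, the HCs of one HD could be grouped by their resolved JQ before jobs are created. The class QueueGrouping, the method groupByQueue and the queueFor function are hypothetical names used only for this example; the existing job-splitting mechanism in HarvestJobManager is not shown and may well be the better place for this logic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: partitions the HCs of one HD by target queue. */
public class QueueGrouping {
    /**
     * @param hcIds    ids of the HCs belonging to the HD being scheduled
     * @param queueFor assumed lookup returning the JQ name for an HC id
     * @return JQ name -> HC ids, in first-seen order; one entry per queue,
     *         so at least one job would be created per entry
     */
    public static Map<String, List<Long>> groupByQueue(
            List<Long> hcIds, Function<Long, String> queueFor) {
        Map<String, List<Long>> groups = new LinkedHashMap<>();
        for (Long hcId : hcIds) {
            groups.computeIfAbsent(queueFor.apply(hcId), q -> new ArrayList<>())
                  .add(hcId);
        }
        return groups;
    }
}
```

Each resulting group would then be handed to the normal job-generation step with its queue attached, so an HD with no triple mappings still yields a single group and behaves as it does today.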

Estimate: 1mm

 


Generated at Fri Apr 26 11:51:48 CEST 2024 using Jira 9.4.15#940015-sha1:bdaa9cbecfb6791ea579749728cab771f0dfe90b.