Details
New Feature
Status: Resolved
Major
Resolution: Fixed
4.1
None
BNF
Confident
Description
Currently, crawlers can accept either focused or snapshot jobs. The implementation refers to these groups as priorities, although they are really "channels" that accept certain types of jobs.
BnF's production team would like to be able to define several "crawler pools" and to dispatch crawls to a specific pool, for example:
- a general snapshot harvest pool
- a general purpose focused harvest pool
- a focused harvest pool dedicated to digital press
- a pool materialized specifically for an event crawl
Since the current job priorities are actually materialized by JMS channels, I propose using the term "harvest channel" instead of "crawler pool".
A harvest channel is uniquely named and can process either focused jobs only or snapshot jobs only. There is a default channel for each type of harvest (snapshot and focused). Harvest channels are data model objects and are stored in the harvest database.
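As a rough illustration of the constraints described above, a harvest channel and its uniqueness rules could be sketched as follows. The class names `HarvestChannel` and `HarvestChannelRegistry`, their methods, and the in-memory map standing in for the harvest database are all assumptions made for this sketch, not the actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical harvest channel: uniquely named, bound to one harvest type. */
final class HarvestChannel {
    private final String name;
    private final boolean snapshot;   // true: snapshot jobs only; false: focused jobs only
    private final boolean isDefault;  // true if this is the default channel for its type

    HarvestChannel(String name, boolean snapshot, boolean isDefault) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("A harvest channel needs a name");
        }
        this.name = name;
        this.snapshot = snapshot;
        this.isDefault = isDefault;
    }

    String getName() { return name; }
    boolean isSnapshot() { return snapshot; }
    boolean isDefault() { return isDefault; }

    /** A channel accepts only jobs of its own type. */
    boolean accepts(boolean snapshotJob) { return snapshot == snapshotJob; }
}

/** Hypothetical in-memory stand-in for channel storage in the harvest database. */
final class HarvestChannelRegistry {
    private final Map<String, HarvestChannel> byName = new LinkedHashMap<>();

    /** Rejects duplicate names and a second default channel for the same type. */
    void create(HarvestChannel channel) {
        if (byName.containsKey(channel.getName())) {
            throw new IllegalArgumentException("Channel name already used: " + channel.getName());
        }
        if (channel.isDefault() && findDefault(channel.isSnapshot()) != null) {
            throw new IllegalStateException("A default channel already exists for this type");
        }
        byName.put(channel.getName(), channel);
    }

    HarvestChannel getByName(String name) { return byName.get(name); }

    /** Returns the default channel for the given type, or null if none is defined. */
    HarvestChannel findDefault(boolean snapshot) {
        for (HarvestChannel c : byName.values()) {
            if (c.isDefault() && c.isSnapshot() == snapshot) {
                return c;
            }
        }
        return null;
    }
}
```

In this sketch the type is a single boolean because the description only distinguishes snapshot and focused harvests; an enum would be the natural choice if more types were ever introduced.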
Every HarvestController declares which harvest channel it will listen on, through configuration.
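A minimal sketch of how a controller might read its channel from configuration is shown below. The settings key `harvestController.channel`, the class `HarvestControllerConfig`, and the choice to fail when no channel is configured are assumptions for illustration; the issue does not specify the actual setting name or the behaviour when the setting is missing.

```java
import java.util.Properties;

/** Hypothetical helper: resolves the channel a HarvestController listens on. */
final class HarvestControllerConfig {
    /** Assumed settings key, not taken from the real configuration schema. */
    static final String CHANNEL_KEY = "harvestController.channel";

    private HarvestControllerConfig() { }

    /** Reads the configured channel name; this sketch treats a missing value as an error. */
    static String channelFor(Properties settings) {
        String channel = settings.getProperty(CHANNEL_KEY);
        if (channel == null || channel.trim().isEmpty()) {
            throw new IllegalStateException("No harvest channel configured under " + CHANNEL_KEY);
        }
        return channel.trim();
    }

    /** A controller processes a job only if the job was sent to its own channel. */
    static boolean shouldProcess(String listeningChannel, String jobChannel) {
        return listeningChannel.equals(jobChannel);
    }
}
```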
Harvests will have a new field declaring the channel to which their jobs should be sent. If no channel is explicitly declared, the default one is used.
The UI will offer a new page listing existing harvest channels and allowing new channels to be created (deleting and editing channels will not be addressed in this issue).
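The fallback rule can be sketched as below, assuming the new field is a nullable channel name on the harvest. The class `HarvestDefinitionSketch` and the method `effectiveChannel` are hypothetical names; the default channel names are passed in as parameters because the issue does not say what they are called.

```java
/** Hypothetical harvest with an optional, explicitly declared channel. */
final class HarvestDefinitionSketch {
    private final String name;
    private final boolean snapshot;    // true for a snapshot harvest, false for a focused one
    private final String channelName;  // null when no channel was explicitly declared

    HarvestDefinitionSketch(String name, boolean snapshot, String channelName) {
        this.name = name;
        this.snapshot = snapshot;
        this.channelName = channelName;
    }

    String getName() { return name; }

    /**
     * Returns the explicitly declared channel if there is one,
     * otherwise the default channel matching the harvest type.
     */
    String effectiveChannel(String defaultSnapshotChannel, String defaultFocusedChannel) {
        if (channelName != null) {
            return channelName;
        }
        return snapshot ? defaultSnapshotChannel : defaultFocusedChannel;
    }
}
```

For example, a focused harvest created with a `null` channel would resolve to whatever name is supplied as `defaultFocusedChannel`, while one created with an explicit channel name keeps that name regardless of the defaults.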
Additionally, this new UI page will allow the user to map harvests to a harvest channel. These mappings are also stored in the harvest database.