Details
-
New Feature
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
3.19.0, 3.20.0
-
None
-
BNF
-
Confident
Description
In an earlier version, BnF introduced the possibility of taking object count into account, instead of object size, when splitting domain configurations into jobs. However, this algorithm leads to unsatisfactory results.
Rather than having jobs generated to target a foreseeable crawl time, BnF curators wish to split jobs by a fixed number of domain configurations, so that they can plan crawler allocation in advance and ensure proper execution of daily crawls, urgent crawls, etc.
This implies:
- refactoring the current job generation code and extracting it from the model objects (which is better design-wise anyway)
- implementing the job generation algorithm desired by BnF
- providing a factory that chooses between job generation models based on configuration properties
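
The three points above can be illustrated with a minimal sketch. It is not the project's actual code: the names `DomainConfig`, `fixed_count_generator`, `size_budget_generator`, `make_job_generator` and the property keys `jobgen.strategy`, `jobgen.configs_per_job`, `jobgen.max_bytes_per_job` are all hypothetical, and the size-based strategy is only a simplified stand-in for the existing algorithm. The sketch shows job generation living outside the model objects, a fixed-count split, and a factory that selects a strategy from configuration properties.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class DomainConfig:
    """Hypothetical stand-in for a domain configuration model object."""
    name: str
    expected_bytes: int


Job = List[DomainConfig]
# A job generator is any function that splits configurations into jobs.
JobGenerator = Callable[[Sequence[DomainConfig]], List[Job]]


def fixed_count_generator(configs_per_job: int) -> JobGenerator:
    """Split into jobs holding a fixed number of configurations each."""
    if configs_per_job < 1:
        raise ValueError("configs_per_job must be at least 1")

    def generate(configs: Sequence[DomainConfig]) -> List[Job]:
        # Consecutive slices; only the last job may be smaller.
        return [list(configs[i:i + configs_per_job])
                for i in range(0, len(configs), configs_per_job)]

    return generate


def size_budget_generator(max_bytes_per_job: int) -> JobGenerator:
    """Simplified stand-in for a size-based split (assumption, not the real one)."""

    def generate(configs: Sequence[DomainConfig]) -> List[Job]:
        jobs: List[Job] = []
        current: Job = []
        used = 0
        for cfg in configs:
            # Start a new job when adding this config would exceed the budget.
            if current and used + cfg.expected_bytes > max_bytes_per_job:
                jobs.append(current)
                current, used = [], 0
            current.append(cfg)
            used += cfg.expected_bytes
        if current:
            jobs.append(current)
        return jobs

    return generate


def make_job_generator(properties: Dict[str, str]) -> JobGenerator:
    """Factory: pick a job generation model from configuration properties."""
    strategy = properties.get("jobgen.strategy", "size_budget")
    if strategy == "fixed_count":
        return fixed_count_generator(int(properties["jobgen.configs_per_job"]))
    if strategy == "size_budget":
        return size_budget_generator(int(properties["jobgen.max_bytes_per_job"]))
    raise ValueError(f"unknown job generation strategy: {strategy!r}")


if __name__ == "__main__":
    configs = [DomainConfig(f"d{i}.example", 100 * (i + 1)) for i in range(7)]
    generator = make_job_generator(
        {"jobgen.strategy": "fixed_count", "jobgen.configs_per_job": "3"}
    )
    for number, job in enumerate(generator(configs), start=1):
        print(number, [cfg.name for cfg in job])
```

With a fixed-count strategy, the number of jobs depends only on the number of configurations and the configured count, which is what lets curators plan crawler allocation ahead of time.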