Details
-
Improvement
-
Resolution: Fixed
-
Minor
-
5.5
-
None
-
BNF
Description
The list of filters in the job configuration file (crawler-beans.cxml) is very long and carries no provenance information. It is impossible to tell whether a given filter is global or domain-specific and, if it is domain-specific, which domain it comes from.
The proposal is to add a comment giving the provenance of each group of filters:
<!-- crawlertraps from RejetGeneral --> for global crawler traps, naming the individual global group (here RejetGeneral);
<!-- crawlertraps from bnf.fr --> for domain-specific crawler traps, naming the domain (here bnf.fr).
A consequence of this grouping is that global crawler traps can no longer be deduplicated.
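The following is a minimal Python sketch of how a generator might emit the proposed provenance comments. The function name `render_trap_groups`, the `(provenance, regexes)` input shape and the `<value>` element layout are hypothetical and are not taken from the actual implementation. The sketch also reflects one reading of the deduplication consequence: duplicates are removed only within a group, so a regex that appears in both the global group and a domain group is emitted twice.

```python
from xml.sax.saxutils import escape


def render_trap_groups(groups):
    """Render crawler-trap regexes as <value> lines, one commented block per provenance.

    groups: list of (provenance, regexes) pairs (hypothetical input shape), e.g.
    ("RejetGeneral", [...]) for a global group or ("bnf.fr", [...]) for a domain.
    """
    lines = []
    for provenance, regexes in groups:
        # "--" is not allowed inside an XML comment, so reject such names.
        if "--" in provenance:
            raise ValueError(f"invalid provenance for an XML comment: {provenance!r}")
        lines.append(f"<!-- crawlertraps from {provenance} -->")
        seen = set()
        for regex in regexes:
            # Deduplicate only inside the current group; the same regex in
            # another group is kept so its provenance stays visible.
            if regex in seen:
                continue
            seen.add(regex)
            # Escape &, < and > so the regex is valid XML text.
            lines.append(f"<value>{escape(regex)}</value>")
    return "\n".join(lines)


if __name__ == "__main__":
    groups = [
        ("RejetGeneral", [r".*/calendar/.*", r".*/calendar/.*"]),
        ("bnf.fr", [r".*/calendar/.*"]),
    ]
    print(render_trap_groups(groups))
```

In this example the duplicated regex is collapsed inside RejetGeneral, but it still appears again under bnf.fr, because each group keeps its own comment.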