NetarchiveSuite / NAS-2858

Group and give provenance information on crawler traps in crawler-beans.cxml


Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • 5.6
    • 5.5
    • HarvestJobManager
    • None
    • BNF

    Description

      The list of filters in the job configuration file (crawler-beans.cxml) is very long and carries no provenance information. It does not say whether each crawler trap is global or domain-specific, nor, for a domain-specific trap, which domain it comes from.

      The proposal is to add a comment with provenance information:

      <!-- crawlertraps from RejetGeneral --> for global crawler traps, naming the individual global crawler trap list they belong to.

      <!-- crawlertraps from bnf.fr --> for domain-specific crawler traps, naming the domain they come from.

      One consequence of this grouping is that global crawler traps cannot be deduplicated.
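      The proposed grouping can be sketched in Python as follows. This is a minimal illustration, not NetarchiveSuite's actual template code: it assumes the traps are available as a mapping from a provenance label (a global list name such as RejetGeneral, or a domain such as bnf.fr) to a list of regular expressions, and that each trap is written as a Spring `<value>` element. The function name `render_crawler_traps`, the list name OtherGlobalList and the sample regexes are hypothetical.

```python
from xml.sax.saxutils import escape


def render_crawler_traps(groups):
    """Render crawler-trap regexes grouped under provenance comments.

    Hypothetical helper: `groups` maps a provenance label (a global list
    name or a domain) to its list of regexes. Dict insertion order is kept.
    Each trap is emitted as a <value> element, which is an assumption about
    how the traps appear in crawler-beans.cxml.
    """
    lines = []
    for source, regexes in groups.items():
        # One provenance comment per group, as proposed in this issue.
        lines.append(f"<!-- crawlertraps from {source} -->")
        for regex in regexes:
            # Escape &, < and > so the regex is valid XML text.
            lines.append(f"<value>{escape(regex)}</value>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample data: the same regex in two global lists.
    groups = {
        "RejetGeneral": [r".*/calendar/.*"],
        "OtherGlobalList": [r".*/calendar/.*"],
        "bnf.fr": [r".*bnf\.fr/.*\?sort=.*"],
    }
    print(render_crawler_traps(groups))
```

      Because every trap is written under its own provenance comment, a regex shared by two global lists is emitted twice, once per group. This is the deduplication limitation noted in the description.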


          People

            Unassigned
            Sara Aubry
            Wiatrowski (Inactive)
            Watchers: 1

            Dates

              Created:
              Updated:
              Resolved: