  1. NetarchiveSuite
  2. NAS-2212

Replace the 2 job priorities with a configurable harvest channel system

Details

    • BNF
    • Confident

    Description

      Currently, crawlers can accept either focused or snapshot jobs. The implementation refers to these two groups as priorities, although they behave more like "channels" that each accept a certain type of job.

      BnF's production team would like to be able to define several "crawler pools" and to dispatch crawls to a specific pool, for example:

      • a general snapshot harvest pool
      • a general purpose focused harvest pool
      • a focused harvest pool dedicated to digital press
      • a pool created specifically for an event crawl

      Since the current job priorities are actually implemented as JMS channels, I propose using the term "harvest channel" instead of "crawler pool".

      A harvest channel is uniquely named and can process either only focused jobs or only snapshot jobs. There is a default channel for each of the two harvest types (snapshot and focused). Harvest channels are data model objects and are stored in the harvest database.
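
      As an illustration only, the constraints above (unique name, a single harvest type per channel, one default per type) could be modelled roughly as follows. All class, enum, and method names here are hypothetical and are not taken from the NetarchiveSuite code base; an in-memory map stands in for the harvest database.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from NetarchiveSuite.
enum HarvestType { FOCUSED, SNAPSHOT }

final class HarvestChannel {
    final String name;        // unique channel name
    final HarvestType type;   // a channel handles exactly one harvest type
    final boolean isDefault;  // true for the default channel of its type

    HarvestChannel(String name, HarvestType type, boolean isDefault) {
        if (name == null || name.trim().isEmpty()) {
            throw new IllegalArgumentException("channel name must not be empty");
        }
        if (type == null) {
            throw new IllegalArgumentException("harvest type must not be null");
        }
        this.name = name;
        this.type = type;
        this.isDefault = isDefault;
    }
}

// In-memory stand-in for the harvest database table holding channels.
final class HarvestChannelRegistry {
    private final Map<String, HarvestChannel> byName = new LinkedHashMap<>();

    void add(HarvestChannel channel) {
        // Enforce unique channel names.
        if (byName.containsKey(channel.name)) {
            throw new IllegalArgumentException("duplicate channel: " + channel.name);
        }
        // Enforce at most one default channel per harvest type.
        if (channel.isDefault && findDefault(channel.type) != null) {
            throw new IllegalStateException("default already defined for " + channel.type);
        }
        byName.put(channel.name, channel);
    }

    // Returns the channel with the given name, or null if none exists.
    HarvestChannel findByName(String name) {
        return byName.get(name);
    }

    // Returns the default channel for the given type, or null if none exists.
    HarvestChannel findDefault(HarvestType type) {
        for (HarvestChannel c : byName.values()) {
            if (c.isDefault && c.type == type) {
                return c;
            }
        }
        return null;
    }
}
```

      Rejecting duplicates at insertion time is one possible design; a database-backed version would more likely rely on a unique constraint on the name column.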

      Every HarvestController declares which harvest channel it will listen on, through configuration.

      Harvests will have a new field declaring the channel to which their jobs should be sent. If no channel is explicitly declared, the default one is used.

      The UI will offer a new page on which existing harvest channels are listed and new channels can be created (deleting and editing channels are out of scope for this issue).

      Additionally, this new UI page will allow the user to map harvests to a harvest channel. These mappings are also stored in the harvest database.
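
      The harvest-to-channel mapping and the fallback to a default channel could be sketched as below. This is a minimal sketch under stated assumptions: `ChannelMappings`, its methods, and the channel names used in the usage example are hypothetical, and a `HashMap` stands in for the mapping stored in the harvest database.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names do not come from the NetarchiveSuite code base.
final class ChannelMappings {
    // harvest name -> explicitly mapped channel name
    private final Map<String, String> channelByHarvest = new HashMap<>();
    private final String defaultFocusedChannel;
    private final String defaultSnapshotChannel;

    ChannelMappings(String defaultFocusedChannel, String defaultSnapshotChannel) {
        this.defaultFocusedChannel = defaultFocusedChannel;
        this.defaultSnapshotChannel = defaultSnapshotChannel;
    }

    // Records an explicit mapping, as the new UI page would.
    void map(String harvestName, String channelName) {
        channelByHarvest.put(harvestName, channelName);
    }

    // Returns the explicitly mapped channel, or the default for the harvest type.
    String channelFor(String harvestName, boolean snapshot) {
        String explicit = channelByHarvest.get(harvestName);
        if (explicit != null) {
            return explicit;
        }
        return snapshot ? defaultSnapshotChannel : defaultFocusedChannel;
    }
}
```

      For example, after `map("press-daily", "DIGITAL_PRESS")`, a call to `channelFor("press-daily", false)` returns the mapped channel, while an unmapped focused harvest falls back to the focused default passed to the constructor.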

            People

              nicl@kb.dk Nicholas Clarke (Inactive)
              ngiraud Nicolas Giraud (Inactive)
              Mikis Seth Sørensen Mikis Seth Sørensen (Inactive)
              Watchers:
              2

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: 35h
                  Remaining: 35h
                  Logged: Not Specified