  1. NetarchiveSuite
  2. NAS-2069

Allow an alternative job generation algorithm

Details

    • New Feature
    • Resolution: Fixed
    • Critical
    • I53, 4.0
    • 3.19.0, 3.20.0
    • Harvest Definition
    • None
    • BNF
    • Confident

    Description

      In an earlier version, BnF introduced the option of taking object count into account, instead of object size, when splitting domain configurations into jobs. However, this algorithm leads to unsatisfactory results.

      Rather than having jobs generated with the goal of a foreseeable crawl time, BnF curators wish to split jobs by a fixed number of domain configurations. This lets them plan crawler allocation in advance and ensure proper execution of daily crawls, urgent crawls, etc.
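
      As an illustration only, splitting by a fixed number of domain configurations could look like the sketch below. The class name FixedCountSplitter and the parameter configsPerJob are hypothetical and are not taken from the NetarchiveSuite code base; domain configurations are represented by a generic type for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not NetarchiveSuite API): groups items into jobs of a fixed size. */
final class FixedCountSplitter {
    private FixedCountSplitter() {}

    /**
     * Splits the given domain configurations into consecutive groups of at most
     * configsPerJob elements. The last group may be smaller than the others.
     */
    static <T> List<List<T>> split(List<T> configs, int configsPerJob) {
        if (configsPerJob <= 0) {
            throw new IllegalArgumentException("configsPerJob must be positive");
        }
        List<List<T>> jobs = new ArrayList<>();
        for (int start = 0; start < configs.size(); start += configsPerJob) {
            int end = Math.min(start + configsPerJob, configs.size());
            // Copy the sublist so each job owns its own list.
            jobs.add(new ArrayList<>(configs.subList(start, end)));
        }
        return jobs;
    }
}
```

      With such a scheme the number of jobs depends only on the number of configurations and the chosen group size, which is what makes crawler allocation predictable.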

      This implies:

      • to refactor the current job generation code and extract it from model objects (which is better design-wise anyway)
      • to implement the BnF desired job generation algorithm
      • to provide a factory that selects the job generation model based on configuration properties
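
      The three points above can be sketched roughly as follows, assuming a strategy interface kept outside the model objects and a factory driven by java.util.Properties. All names here (JobGenerator, FixedCountJobGenerator, SingleJobGenerator, JobGeneratorFactory, and the property keys jobgen.strategy and jobgen.fixedCount.configsPerJob) are hypothetical and do not reflect the actual NetarchiveSuite settings or classes. SingleJobGenerator is only a placeholder for the existing algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical strategy interface: turns domain configurations into jobs. */
interface JobGenerator {
    List<List<String>> generate(List<String> domainConfigs);
}

/** Hypothetical strategy: a fixed number of domain configurations per job. */
final class FixedCountJobGenerator implements JobGenerator {
    private final int configsPerJob;

    FixedCountJobGenerator(int configsPerJob) {
        if (configsPerJob <= 0) {
            throw new IllegalArgumentException("configsPerJob must be positive");
        }
        this.configsPerJob = configsPerJob;
    }

    @Override
    public List<List<String>> generate(List<String> domainConfigs) {
        List<List<String>> jobs = new ArrayList<>();
        for (int i = 0; i < domainConfigs.size(); i += configsPerJob) {
            int end = Math.min(i + configsPerJob, domainConfigs.size());
            jobs.add(new ArrayList<>(domainConfigs.subList(i, end)));
        }
        return jobs;
    }
}

/** Placeholder for the pre-existing algorithm: puts everything in one job. */
final class SingleJobGenerator implements JobGenerator {
    @Override
    public List<List<String>> generate(List<String> domainConfigs) {
        List<List<String>> jobs = new ArrayList<>();
        if (!domainConfigs.isEmpty()) {
            jobs.add(new ArrayList<>(domainConfigs));
        }
        return jobs;
    }
}

/** Hypothetical factory: picks a strategy from configuration properties. */
final class JobGeneratorFactory {
    static final String STRATEGY_KEY = "jobgen.strategy";              // assumed key
    static final String COUNT_KEY = "jobgen.fixedCount.configsPerJob"; // assumed key

    private JobGeneratorFactory() {}

    static JobGenerator fromProperties(Properties props) {
        String strategy = props.getProperty(STRATEGY_KEY, "default");
        switch (strategy) {
            case "fixedCount":
                int count = Integer.parseInt(props.getProperty(COUNT_KEY, "100"));
                return new FixedCountJobGenerator(count);
            case "default":
                return new SingleJobGenerator();
            default:
                throw new IllegalArgumentException(
                        "Unknown job generation strategy: " + strategy);
        }
    }
}
```

      Keeping the choice behind a factory means callers depend only on the interface, so a further algorithm could be added without touching the model objects.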

          People

            ngiraud Nicolas Giraud (Inactive)
            ngiraud Nicolas Giraud (Inactive)
            Søren Vejrup Carlsen (Inactive)

              Time Tracking

                Estimated: 70h
                Remaining: 70h
                Logged: Not Specified