NetarchiveSuite / NAS-1987

Change HarvestJobManager to submit jobs based on Harvester ready events


Details

    • SB/KB
    • Rough
      1. Configure 5 domains with different harvest templates. Set the max object limit to 100.
      2. Create a selective harvest with the 5 domains.
      3. Activate the harvest. 5 jobs will be created within a minute.
      4. All jobs should now run through the normal New, Submitted, Started, Done phases. Only 2 jobs should be in the Submitted + Started states at a time. The first 2 jobs should start within at most 30 seconds. It should take less than 30 seconds from when a job is set to Done until the next job is started.
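The pass criteria in the last step can be checked mechanically from a log of job state changes. This is a minimal sketch, assuming a hypothetical log of (seconds since the harvest was activated, job id, new state) tuples; the state names, the 2-job limit and the 30-second bounds come from the step above, and "the next job is started" is read as the next transition to Started.

```python
# Sketch: check a job-state log against the NAS-1987 test criteria.
# Assumed (hypothetical) log format: (seconds since activation, job id, new state).

MAX_ACTIVE = 2            # at most 2 jobs in Submitted + Started at a time
MAX_DELAY_SECONDS = 30.0  # bound used by both timing criteria
ACTIVE_STATES = {"Submitted", "Started"}


def check_run(events, job_count):
    """Return a list of human-readable violations; an empty list means pass."""
    current = {}  # job id -> latest state
    starts, dones = [], []
    violations = []
    for t, job, new_state in sorted(events, key=lambda e: e[0]):
        current[job] = new_state
        active = sum(1 for s in current.values() if s in ACTIVE_STATES)
        if active > MAX_ACTIVE:
            violations.append(f"t={t}: {active} jobs Submitted/Started")
        if new_state == "Started":
            starts.append(t)  # appended in time order
        elif new_state == "Done":
            dones.append(t)

    # The first 2 jobs (or all jobs, if fewer) must start within 30 s of activation.
    needed = min(MAX_ACTIVE, job_count)
    if len(starts) < needed or any(t > MAX_DELAY_SECONDS for t in starts[:needed]):
        violations.append("first jobs did not start within 30 s")

    # After each Done, the next job (if any remain) must start in under 30 s.
    for d in dones:
        if sum(1 for s in starts if s <= d) >= job_count:
            continue  # every job has already been started
        later = [s for s in starts if s > d]
        if not later or later[0] - d >= MAX_DELAY_SECONDS:
            violations.append(f"t={d}: no job started within 30 s of Done")
    return violations
```

Reading "next job is started" as the next Started transition after any Done is a simplification: with 2 concurrent slots, the start that follows a Done is not necessarily the job that took over that slot.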

    Description

      New strategy

      We have decided to abandon the current attempt to maintain a Harvester state in the HarvestJobManager, which is very difficult to achieve in an asynchronous system. Instead, we are going to try a more event-based approach, which should be simpler and more robust. Jobs will be dispatched as follows:

      1. Every Harvester that is ready to process a job sends a 'Ready for Job' message every 'JOB_STATUS_MESSAGE_SEND_INTERVAL'. A Harvester does not send any messages while it is running a harvest.
      2. Each time the HarvestJobManager receives a 'Ready for Job' message, it checks the job database for jobs ready to be run. If a ready job is found, it is dispatched.
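The two steps can be sketched as a small in-process simulation. This is an illustration under assumptions, not NetarchiveSuite code: the class names mirror the issue, but the methods, the in-memory deques standing in for the job database and the job queue, the fixed harvest duration, and the one-loop-iteration-per-'JOB_STATUS_MESSAGE_SEND_INTERVAL' timing are all hypothetical.

```python
from collections import deque


class HarvestJobManager:
    """Hypothetical stand-in: dispatches jobs only in response to 'Ready for Job'."""

    def __init__(self, job_ids):
        self.job_db = deque(job_ids)  # stands in for the jobs ready to be run in the database
        self.job_queue = deque()      # stands in for the job queue the Harvesters read from

    def on_ready_for_job(self):
        # Step 2: look for a job ready to be run and dispatch it if one is found.
        if self.job_db:
            self.job_queue.append(self.job_db.popleft())


class Harvester:
    """Hypothetical stand-in for a Harvester; harvests take a fixed number of intervals."""

    def __init__(self, name, job_duration):
        self.name = name
        self.job_duration = job_duration
        self.job = None  # None while the Harvester is idle
        self.busy_until = 0
        self.finished = []

    def finish_if_done(self, now):
        if self.job is not None and now >= self.busy_until:
            self.finished.append(self.job)
            self.job = None

    def take_job_if_idle(self, now, job_queue):
        if self.job is None and job_queue:
            self.job = job_queue.popleft()
            self.busy_until = now + self.job_duration


def simulate(job_ids, harvester_names, job_duration, intervals):
    """One loop iteration corresponds to one JOB_STATUS_MESSAGE_SEND_INTERVAL."""
    manager = HarvestJobManager(job_ids)
    harvesters = [Harvester(name, job_duration) for name in harvester_names]
    for now in range(intervals):
        for h in harvesters:
            h.finish_if_done(now)
        for h in harvesters:
            h.take_job_if_idle(now, manager.job_queue)
        for h in harvesters:
            if h.job is None:
                # Step 1: only an idle Harvester sends 'Ready for Job'.
                manager.on_ready_for_job()
    return manager, {h.name: h.finished for h in harvesters}
```

With two Harvesters, five jobs and harvests lasting three intervals, `simulate([1, 2, 3, 4, 5], ["H1", "H2"], 3, 13)` finishes jobs 1, 3 and 5 on H1 and jobs 2 and 4 on H2, and leaves the job queue empty. No Harvester state is kept in the manager: a job is only dispatched when some Harvester has just reported that it is idle.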

      This means that:

      • We don't need a JobMessage Scheduler running in the HarvestDispatcher.
      • We don't have to maintain a Harvester state in the HarvestDispatcher.
      • Harvesters don't need to send 'Not ready' messages.
      • Harvester shutdowns don't need to be handled.

      Special cases are:

      • Shutdown of a Harvester: If a Harvester stops, it stops sending 'Ready for Job' messages and therefore does not cause new jobs to be dispatched.
      • The HarvestJobManager is restarted: Within one 'JOB_STATUS_MESSAGE_SEND_INTERVAL', the HarvestJobManager will receive 'Ready for Job' messages from all ready Harvesters, and will therefore quickly send jobs to the idle Harvesters.
      • The HarvestDispatcher is slow to send a new job, so a Harvester sends 2 'Ready for Job' messages before it receives a job. This might cause jobs to slowly accumulate on the job queue, so we need a mechanism for Harvesters to check the job queue for outstanding jobs before sending another 'Ready for Job' message.
        We should therefore introduce an extra feature:

      3. When a Harvester becomes ready, it should wait 'TIME_TO_LISTEN_FOR_NEW_JOB_BEFORE_SENDING_READY_MESSAGE' before .....
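The slow-dispatch special case can be illustrated with a single-Harvester simulation. This is a sketch under assumed conditions: time advances in whole 'JOB_STATUS_MESSAGE_SEND_INTERVAL's, an idle Harvester takes a queued job before it would send another 'Ready for Job', and a dispatched job reaches the queue `dispatch_delay` intervals after the Ready message that triggered it. The function and its parameters are hypothetical, and the 'TIME_TO_LISTEN_FOR_NEW_JOB_BEFORE_SENDING_READY_MESSAGE' feature is not modelled, since its description is truncated.

```python
from collections import deque


def max_waiting_jobs(dispatch_delay, job_duration, job_count, intervals):
    """Simulate one Harvester and return the largest number of jobs left
    waiting on the job queue at the end of any interval.

    Time is counted in JOB_STATUS_MESSAGE_SEND_INTERVALs (hypothetical model).
    """
    job_db = deque(range(1, job_count + 1))  # jobs ready to be run
    in_flight = deque()                      # (arrival interval, job), in dispatch order
    job_queue = deque()
    busy_until = None                        # None means the Harvester is idle
    worst = 0
    for now in range(intervals):
        # Dispatched jobs whose delay has elapsed land on the queue.
        while in_flight and in_flight[0][0] <= now:
            job_queue.append(in_flight.popleft()[1])
        # A finished harvest makes the Harvester idle again.
        if busy_until is not None and now >= busy_until:
            busy_until = None
        if busy_until is None:
            if job_queue:
                job_queue.popleft()
                busy_until = now + job_duration
            elif job_db:
                # Idle with nothing queued: send 'Ready for Job'. The
                # HarvestJobManager dispatches a job, which arrives later.
                in_flight.append((now + dispatch_delay, job_db.popleft()))
        worst = max(worst, len(job_queue))
    return worst
```

With a job duration of 3 intervals and 6 jobs over 30 intervals, a delay of 1 interval leaves no job waiting, while delays of 2 and 3 intervals leave up to 1 and 2 jobs waiting on the queue: the idle Harvester keeps sending one Ready per interval until the first dispatched job arrives, and each extra Ready produces a job that no idle Harvester asked for.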


            People

              Mikis Seth Sørensen (Inactive)
              Mikis Seth Sørensen (Inactive)
              Søren Vejrup Carlsen (Inactive)
              Watchers: 0


                Time Tracking

                  Estimated: 6h
                  Remaining: 0h
                  Logged: 6h