


  • The code is still a mess even though it has been converted from an Ant project to a Maven project.

  • The Harvester-core module is just a trashcan location for anything remotely harvester related.
  • Unit tests are separated into their own modules, which makes measuring code coverage close to impossible.

  • It is impractical to spend 1-2 weeks running test cases every time anything major is released. These cases should be covered much better by unit testing.

  • Registering/deregistering JMS listeners just seems wrong, and it is surprising that it has worked so far. Nothing in the JMS specification suggests that this is even a good idea.

Overall solutions

  • Isolate storage implementation completely. It is going to happen sooner or later anyway.
  • Remove JMS as much as possible. (Except for legacy code like the bitarchive implementation)
  • Split the different parts of NAS into separate modules with little or no dependencies.
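
As an illustration of what isolating the storage implementation could look like, the sketch below puts storage behind a small interface with an in-memory stand-in for unit tests. All names here (`ArchiveStorage`, `InMemoryArchiveStorage`, `store`, `retrieve`) are hypothetical and not taken from the NAS code base; a real boundary would likely need more operations.

```java
import java.io.IOException;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical storage boundary: callers depend only on this interface. */
interface ArchiveStorage {
    void store(String fileName, byte[] content) throws IOException;

    Optional<byte[]> retrieve(String fileName) throws IOException;
}

/** In-memory stand-in, so unit tests do not need a running bitarchive. */
final class InMemoryArchiveStorage implements ArchiveStorage {
    private final Map<String, byte[]> files = new ConcurrentHashMap<>();

    @Override
    public void store(String fileName, byte[] content) {
        // Copy defensively so later changes to the caller's array are not visible here.
        files.put(fileName, content.clone());
    }

    @Override
    public Optional<byte[]> retrieve(String fileName) {
        byte[] content = files.get(fileName);
        return content == null ? Optional.empty() : Optional.of(content.clone());
    }
}
```

A JMS-backed bitarchive client could then be one more implementation of the same interface, which would keep the legacy code out of the modules that only need to store and fetch files.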


  • It is easier to build/test/deploy. With improved and localized unit testing you could most likely...
    • release much quicker without having to spend 1-2 weeks on testing.
    • release only the module(s) you changed.
  • People who do not want to use the whole of NAS could instead use just the parts that fit their needs.
    • For example people could use the harvest controllers and nothing else.





Harvest Control Managers



- Remove the JMS dependency from the controller.

  - Instead use a REST interface or some other means of exposing an API.

- Remove the notion of channels from the controller.

  - The management of organizing controllers into groups is left to the user of these APIs.

- Make the code independent of the rest of NAS so it can be used not only by NAS.

  - Controllers can be deployed independently of the rest of NAS.

- Use plugins for core functionality. (Use classloaders)

  - build progress reports

  - build metadata files when the job is complete

  - upload data

- A controller is built for a specific harvester: H1, H3, API

- The API should include all required functions to control the harvest manager.

  - Extendable using custom commands that the plugins add to the controller. (Thinking beyond H3...)

  - Offer a base client implementation. (Used by a job manager/monitor)

  - Submit job.

    - Upload configuration files.

    - Upload additional files; indexes etc.

  - Start job.

  - Get progress/report.

  - Stop job.

  - Initiate metadata generation.

  - Initiate upload.

    - Try to avoid harvests being ... manual bla.
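
The operations listed above could be captured in a transport-neutral interface, which a REST layer (or any other means of exposing the API) would then wrap. The sketch below is only one possible shape. The type names, method signatures, job states, and the order of state transitions are all assumptions made for illustration; the list does not define a job lifecycle.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical job states; the lifecycle is an assumption, not part of the design notes. */
enum JobState { SUBMITTED, RUNNING, STOPPED, METADATA_GENERATED, UPLOADED }

/** Hypothetical controller API mirroring the operations in the list. */
interface HarvestController {
    long submitJob(Map<String, byte[]> configurationFiles, Map<String, byte[]> additionalFiles);
    void startJob(long jobId);
    JobState getProgress(long jobId);
    void stopJob(long jobId);
    void generateMetadata(long jobId);
    void upload(long jobId);
}

/** In-memory stand-in that only tracks state transitions; it does not harvest anything. */
final class InMemoryHarvestController implements HarvestController {
    private final Map<Long, JobState> jobs = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    @Override
    public long submitJob(Map<String, byte[]> configurationFiles, Map<String, byte[]> additionalFiles) {
        // The uploaded files are ignored here; a real controller would hand them to the harvester.
        long id = nextId.getAndIncrement();
        jobs.put(id, JobState.SUBMITTED);
        return id;
    }

    @Override public void startJob(long jobId) { transition(jobId, JobState.SUBMITTED, JobState.RUNNING); }
    @Override public void stopJob(long jobId) { transition(jobId, JobState.RUNNING, JobState.STOPPED); }
    @Override public void generateMetadata(long jobId) { transition(jobId, JobState.STOPPED, JobState.METADATA_GENERATED); }
    @Override public void upload(long jobId) { transition(jobId, JobState.METADATA_GENERATED, JobState.UPLOADED); }

    @Override
    public JobState getProgress(long jobId) {
        JobState state = jobs.get(jobId);
        if (state == null) {
            throw new IllegalArgumentException("unknown job " + jobId);
        }
        return state;
    }

    private void transition(long jobId, JobState expected, JobState next) {
        // replace(key, old, new) is atomic and fails if the job is absent or in another state.
        if (!jobs.replace(jobId, expected, next)) {
            throw new IllegalStateException("job " + jobId + " is not in state " + expected);
        }
    }
}
```

Keeping the interface free of JMS and HTTP types means the same contract can back both the base client implementation and the controller's tests.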


Netarkivet would then migrate existing code into plugins, and other users could use these as a reference to adapt them to their own infrastructure.
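
A minimal sketch of how such plugins might be declared and discovered is shown below. It assumes discovery through the JDK's `java.util.ServiceLoader`, which is only one of several ways to load classes from a separate class loader. The names `ControllerPlugin`, `PluginRegistry`, `onJobComplete`, and `jobCompleted` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical plugin contract for one step performed by a controller. */
interface ControllerPlugin {
    String name();

    void onJobComplete(long jobId);
}

final class PluginRegistry {
    private final List<ControllerPlugin> plugins = new ArrayList<>();

    /** Discovers plugins declared under META-INF/services and visible to the given class loader. */
    static PluginRegistry discover(ClassLoader loader) {
        PluginRegistry registry = new PluginRegistry();
        for (ControllerPlugin plugin : ServiceLoader.load(ControllerPlugin.class, loader)) {
            registry.register(plugin);
        }
        return registry;
    }

    void register(ControllerPlugin plugin) {
        plugins.add(plugin);
    }

    /** Runs every plugin in registration order and returns the names of those that ran. */
    List<String> jobCompleted(long jobId) {
        List<String> ran = new ArrayList<>();
        for (ControllerPlugin plugin : plugins) {
            plugin.onJobComplete(jobId);
            ran.add(plugin.name());
        }
        return ran;
    }
}
```

In this arrangement, metadata generation and upload would each be one `ControllerPlugin`, and another institution could replace only the upload plugin by placing its own JAR on the class loader passed to `discover`.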


