
  • No more JMS!
  • Harvesters can be added dynamically either in the GUI or by periodic scans.
  • Provide a simple service that talks to all harvest control managers and tracks their state:
    • Idle.
    • Harvest workflow in progress.
    • Software update.
  • Provide GUI configuration for managing "channels" without reconfiguring/redeploying.
  • Expose an API for the most common operations.
    • Components should use the API instead of constantly polling the database, which is bad practice.
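A minimal sketch of such a service, assuming managers push their state changes to it and other components subscribe to those changes instead of polling a database. HarvestControlService, ManagerState and all method names are hypothetical; a real service would receive the reports over whatever API/REST/NIO transport is chosen.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;
import java.util.stream.Collectors;

/** Hypothetical states a harvest control manager can be in, taken from the planning list. */
enum ManagerState { IDLE, HARVEST_IN_PROGRESS, SOFTWARE_UPDATE }

/**
 * Hypothetical central service tracking all harvest control managers.
 * Managers push state changes; subscribers are notified instead of polling a database.
 */
final class HarvestControlService {
    private final Map<String, ManagerState> states = new ConcurrentHashMap<>();
    private final List<BiConsumer<String, ManagerState>> listeners = new CopyOnWriteArrayList<>();

    /** Adds a manager at runtime (e.g. from the GUI or a periodic scan); returns false if already known. */
    boolean register(String managerId) {
        return states.putIfAbsent(managerId, ManagerState.IDLE) == null;
    }

    void unregister(String managerId) {
        states.remove(managerId);
    }

    /** Records a state change reported by a manager and notifies every subscriber. */
    void reportState(String managerId, ManagerState newState) {
        // replace() returns null when the key is absent (the map holds no null values).
        if (states.replace(managerId, newState) == null) {
            throw new IllegalArgumentException("Unknown harvest control manager: " + managerId);
        }
        for (BiConsumer<String, ManagerState> listener : listeners) {
            listener.accept(managerId, newState);
        }
    }

    void subscribe(BiConsumer<String, ManagerState> listener) {
        listeners.add(listener);
    }

    ManagerState stateOf(String managerId) {
        return states.get(managerId);
    }

    /** Sorted snapshot of the managers currently in the given state. */
    List<String> managersIn(ManagerState state) {
        return states.entrySet().stream()
                .filter(e -> e.getValue() == state)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

The GUI, the Job Scheduler or a monitoring page would call subscribe() once and react to events, so no component needs to query the state on a timer.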


  • Move from JSP to template-driven servlets.
    • Makes it possible to unit test almost everything instead of relying on time-consuming manual release tests.
    • Easier for institutions to customize or create their own interface/layout.
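The page does not name a template engine, so this sketch swaps in hand-rolled `${key}` substitution purely to illustrate the testability argument: rendering becomes a plain function of a template string and a model map, which can be unit tested without a servlet container, and an institution could supply its own template. TemplateRenderer and the `${key}` syntax are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical minimal template renderer: replaces ${key} placeholders with HTML-escaped values.
 * A servlet would only build the model, call render() and write the result to the response.
 */
final class TemplateRenderer {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z0-9_.]+)\\}");

    private TemplateRenderer() {}

    static String render(String template, Map<String, ?> model) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Object value = model.get(m.group(1));
            // Unknown keys render as empty strings; values are escaped to prevent HTML injection.
            String replacement = value == null ? "" : escapeHtml(value.toString());
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

A production setup would more likely use an established template engine; the point is that the page layout lives in a replaceable template while the logic stays in ordinary, testable Java.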

Indexing Server

  • Is extremely slow at indexing.
  • CDX generation should be done on the harvest controller after metadata generation.
  • Should instead use an improved, multithreaded batch system.
  • Remove JMS.
  • Add API.
  • Maybe add a simple GUI showing the status of indexing and other minor tasks.
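A minimal sketch of a multithreaded batch system, assuming the work can be split per file: indexFile is a hypothetical stand-in for whatever turns one file into index lines (for example CDX lines), files are grouped into fixed-size batches that run on a thread pool, and the merged lines are sorted. BatchIndexer, threads and batchSize are illustrative names, not existing classes in the code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical batch indexer: indexes files in fixed-size batches on a thread pool and merges the results. */
final class BatchIndexer {
    private final int threads;
    private final int batchSize;

    BatchIndexer(int threads, int batchSize) {
        if (threads < 1 || batchSize < 1) {
            throw new IllegalArgumentException("threads and batchSize must be >= 1");
        }
        this.threads = threads;
        this.batchSize = batchSize;
    }

    /** Applies indexFile to every file, batchSize files per task, and returns all index lines sorted. */
    List<String> indexAll(List<String> files, Function<String, List<String>> indexFile)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int start = 0; start < files.size(); start += batchSize) {
                // Copy the slice so the task does not depend on the caller's list afterwards.
                List<String> batch = new ArrayList<>(files.subList(start, Math.min(start + batchSize, files.size())));
                futures.add(pool.submit(() -> {
                    List<String> lines = new ArrayList<>();
                    for (String file : batch) {
                        lines.addAll(indexFile.apply(file));
                    }
                    return lines;
                }));
            }
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                merged.addAll(f.get()); // rethrows any failure from a batch as ExecutionException
            }
            merged.sort(null); // natural (lexicographic) order
            return merged;
        } finally {
            pool.shutdown();
        }
    }
}
```

Batching keeps the number of tasks small for large harvests, while the pool size caps how many files are read concurrently.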


  • Clean up the H3 harvest control manager.
    • Remove JMS.
    • Split the H3 management code and the Netarkiv code apart.
  • Provide an abstraction for managing harvest control managers through an API/REST/NIO.
  • Rewrite Job Scheduler to use the new control interface.
  • Clean up the harvester-core package by splitting it into more appropriate modules.
  • Clean up the indexing server.
  • Clean up the deploy package.
  • Isolate storage completely, exposing only simple interfaces.
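One way a rewritten Job Scheduler could use the new control interface, sketched under assumptions: HarvestControl is a hypothetical interface that a REST or NIO client would implement, and the scheduler only hands pending jobs to idle managers through it. LocalHarvestControl is an in-memory stand-in, which is what makes the scheduler unit testable without running H3.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical abstraction over one harvest control manager, whether local or remote. */
interface HarvestControl {
    String id();
    boolean isIdle();
    /** Hands a job to the manager; returns false if it was not accepted. */
    boolean startJob(String jobId);
}

/** Hypothetical job scheduler that talks to managers only through HarvestControl. */
final class JobScheduler {
    private final Queue<String> pending = new ArrayDeque<>();
    private final List<HarvestControl> managers;

    JobScheduler(List<HarvestControl> managers) {
        this.managers = new ArrayList<>(managers);
    }

    void submit(String jobId) {
        pending.add(jobId);
    }

    /** Assigns pending jobs, in submission order, to idle managers; returns how many were started. */
    int dispatch() {
        int started = 0;
        for (HarvestControl manager : managers) {
            String job = pending.peek();
            if (job == null) {
                break; // nothing left to schedule
            }
            if (manager.isIdle() && manager.startJob(job)) {
                pending.poll();
                started++;
            }
        }
        return started;
    }

    int pendingCount() {
        return pending.size();
    }
}

/** In-memory stand-in for tests; a real implementation would call the manager's remote API. */
final class LocalHarvestControl implements HarvestControl {
    private final String id;
    private String currentJob;

    LocalHarvestControl(String id) { this.id = id; }

    public String id() { return id; }

    public boolean isIdle() { return currentJob == null; }

    public boolean startJob(String jobId) {
        if (currentJob != null) return false;
        currentJob = jobId;
        return true;
    }

    void finishJob() { currentJob = null; }

    String currentJob() { return currentJob; }
}
```

Calling dispatch() when a manager reports that it has become idle, rather than on a fixed timer, fits the goal of not polling.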

Component platform

  • Applications should run on the same basic "framework".
  • Interact through an API/REST/NIO so that each application is as independent as possible.
  • Encapsulate each application in a classloader context that the application can update/restart through the API.
  • Find or construct an appropriate "framework" using as few, and as small, 3rd-party components as possible.
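A minimal sketch of running a component inside its own class loader, using the standard java.net.URLClassLoader. Component, ComponentContainer and CountingComponent are hypothetical names. It assumes the Component interface lives on the shared parent classpath while each component's own classes are only on the URLs passed to the container; only in that setup does restart() load fresh class definitions. With an empty URL array, as in the trivial CountingComponent below, the class is simply delegated to the parent loader, so only the lifecycle is exercised.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical lifecycle contract for components; must be on the shared parent classpath. */
interface Component {
    void start();
    void stop();
}

/** Hypothetical container running one component in its own URLClassLoader. */
final class ComponentContainer implements AutoCloseable {
    private final URL[] classpath;
    private final String entryClass;
    private URLClassLoader loader;
    private Component component;

    ComponentContainer(URL[] classpath, String entryClass) {
        this.classpath = classpath.clone();
        this.entryClass = entryClass;
    }

    synchronized void start() throws ReflectiveOperationException {
        if (component != null) throw new IllegalStateException("Component already running");
        // The parent loader defines Component, so the cast to Component works across loaders.
        URLClassLoader newLoader = new URLClassLoader(classpath, Component.class.getClassLoader());
        try {
            Class<? extends Component> type =
                    Class.forName(entryClass, true, newLoader).asSubclass(Component.class);
            Component instance = type.getDeclaredConstructor().newInstance();
            instance.start();
            loader = newLoader;
            component = instance;
        } catch (ReflectiveOperationException | RuntimeException e) {
            try {
                newLoader.close(); // do not leak the loader if startup fails
            } catch (IOException closeFailure) {
                e.addSuppressed(closeFailure);
            }
            throw e;
        }
    }

    synchronized void stop() throws IOException {
        if (component != null) {
            component.stop();
            component = null;
        }
        if (loader != null) {
            loader.close(); // releases open jar files held by the loader
            loader = null;
        }
    }

    /** Stops the component, drops its class loader and loads it again, e.g. after a software update. */
    synchronized void restart() throws IOException, ReflectiveOperationException {
        stop();
        start();
    }

    synchronized boolean isRunning() {
        return component != null;
    }

    @Override
    public synchronized void close() throws IOException {
        stop();
    }
}

/** Trivial demonstration component that counts how often it has been started. */
final class CountingComponent implements Component {
    static int starts = 0;

    public CountingComponent() {}

    public void start() { starts++; }

    public void stop() {}
}
```

An API call such as "update" could then replace the component's jar on disk and call restart() without touching the rest of the JVM.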


Since refactoring everything at once is a daunting project, it would be best to start with the harvest control manager and see how easily it can be turned into an independent component.

Depending on how this works out, more of the restructuring tasks can be done in an order that makes sense. However, reworking the Job Scheduler to work with the new control manager component is probably the most logical next step.

The order should also favor refactoring parts of the code that would eliminate long-known bugs, or areas that have received little attention so far, such as manual workflows.