Concept | Challenges | Opportunities
Dedicated harvesters with Umbra support. A specific harvesting channel for Umbra: a new class of harvests parallel to snapshot/selective.

Code-heavy (both back-end and front-end).

Doesn't scale automatically: the number of Umbra-enabled harvesters must be defined in advance.

Good monitoring, because the different types of harvest are easily separable in the GUI.

Can initially be done on a small number (one?) of specially configured harvesters.

Fully containerised (Umbra + broker). Spin up as needed if the crawler beans include Umbra extensions.

Requires Docker and docker-compose on every harvest machine.

Requires Docker skills from developers.

The status of Docker/docker-compose integration with Java is not fully known.

Flexible and scalable.

Development only in NetarchiveSuite backend.

Containerised solution useful for testing in a consistent portable environment.
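
A minimal sketch of the "spin up as needed" check, assuming Umbra usage can be recognised from bean class names in the crawler beans. The two class names are an assumption (the AMQP beans from Heritrix contrib that are commonly used together with Umbra), and `needs_umbra` is a hypothetical helper, not part of NetarchiveSuite.

```python
import xml.etree.ElementTree as ET

# Assumed marker classes; adjust to whatever the harvest templates really use.
UMBRA_BEAN_CLASSES = {
    "org.archive.modules.AMQPPublishProcessor",
    "org.archive.crawler.frontier.AMQPUrlReceiver",
}


def needs_umbra(crawler_beans_xml):
    """Return True if any <bean> in the XML uses one of the marker classes."""
    root = ET.fromstring(crawler_beans_xml)
    for element in root.iter():
        # Drop a possible "{namespace}" prefix so the check works with or
        # without the Spring beans namespace.
        tag = element.tag.rsplit("}", 1)[-1]
        if tag == "bean" and element.get("class") in UMBRA_BEAN_CLASSES:
            return True
    return False
```

A harvester could call this on the job's crawler beans before starting Heritrix and only bring up the containers when it returns True.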

Fully containerised (Umbra + broker) but persistent: reuse the same Umbra installation for any given harvester.

Requires Docker and docker-compose on every harvest machine.

Requires some Docker skills from developers.

Must make sure that each Umbra is stateless between jobs.

Containerised solution useful for testing in a consistent portable environment.

Doesn't really require Java-Docker integration. (Umbra can be launched from the NetarchiveSuite start scripts.)

Development only in NetarchiveSuite backend.
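
A sketch of launching the persistent Umbra + broker pair from a start script rather than from Java. It assumes a docker-compose file is already present on the harvest machine and that one compose project per harvester is wanted. The `umbra-` prefix, the default file name and both helper functions are hypothetical.

```python
import re
import subprocess


def compose_up_command(harvester_id, compose_file="docker-compose.yml"):
    """Build the docker-compose call for one harvester's Umbra + broker."""
    # Reduce the harvester id to lowercase letters, digits and "-" so it is
    # usable as a compose project name.
    slug = re.sub(r"[^a-z0-9]+", "-", harvester_id.lower()).strip("-")
    # "up -d" starts the services in the background and leaves containers
    # that are already running and unchanged alone, which gives the reuse.
    return ["docker-compose", "-f", compose_file, "-p", "umbra-" + slug, "up", "-d"]


def start_umbra(harvester_id, compose_file="docker-compose.yml"):
    """Run the command; raises CalledProcessError if docker-compose fails."""
    subprocess.run(compose_up_command(harvester_id, compose_file), check=True)
```

Because each harvester gets its own project name, several Umbra installations can coexist on one machine without their containers colliding.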


Native (non-container) Umbra instances, one per harvester. Single broker.

Must make sure that each Umbra is stateless between jobs (empty queue).

Need to find out how to run multiple Umbra instances per machine (one per HarvestController).

Leverage the broker's "default exchange" to enable automatic routing.

Heavy on harvester configuration. Requires a native Umbra per harvester running everywhere (so at least Python, headless Chrome and a dummy X display on every harvest machine).

No need to learn Docker.

Development only in NetarchiveSuite backend.
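
The routing idea behind the single-broker variant can be illustrated with a toy in-memory model. In AMQP 0-9-1 every queue is implicitly bound to the default (nameless) exchange with a routing key equal to its own name, so giving each HarvestController its own queue is enough to route messages to the right Umbra. The model below is not a broker client; the `umbra.<id>` naming scheme and all names in the code are hypothetical.

```python
from collections import deque


def queue_name_for(harvester_id):
    # Hypothetical naming scheme: one queue per HarvestController.
    return "umbra." + harvester_id


class DefaultExchangeModel:
    """Toy in-memory model of default-exchange routing (not a real client)."""

    def __init__(self):
        self.queues = {}

    def declare_queue(self, name):
        # Declaring the queue is all the binding that is needed.
        self.queues.setdefault(name, deque())

    def publish(self, routing_key, message):
        # A message reaches exactly the queue named by the routing key.
        queue = self.queues.get(routing_key)
        if queue is None:
            return False  # unroutable: no queue with that name
        queue.append(message)
        return True

    def purge(self, name):
        # Empty the queue between jobs; returns how many messages were dropped.
        dropped = len(self.queues[name])
        self.queues[name].clear()
        return dropped


broker = DefaultExchangeModel()
for harvester in ("hc1", "hc2"):
    broker.declare_queue(queue_name_for(harvester))
broker.publish(queue_name_for("hc1"), "http://example.org/")
```

With a real broker the same effect would come from publishing to the empty exchange name with the queue name as routing key, and purging the harvester's queue before each job is one way to keep its Umbra stateless.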