The Bitrepository platform is used for long-term preservation of the newspaper data.

The bitrepository for newspapers will consist of:

  • A nearline pillar, managed by JHLJ. The pillar's cache will act as the processing area for the jpeg2000 data files.
  • An offline pillar, managed by LFM.
  • A checksum pillar, managed by JHLJ.

In-house wiki description of the Bitrepository: https://sbprojects.statsbiblioteket.dk/display/DIGSAM/4.5+Bitbevaring+Avis

Ingester

The bitrepository ingester takes care of archiving the jp2 files into the bitrepository archive. This is done by traversing the batch structure and, for each jp2 file, performing the following steps:

  1. Ensure that files are kept online for processing during the ingest process.
  2. Generate a unique FileID identifying the file in the repository.
  3. Ingest the file into the bitrepository, verifying the ingest against the file's checksum.
  4. Register the archived file in the DOMS system.
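
The four steps above can be sketched roughly as follows. This is an illustrative outline only, not the actual ingester: the callables `force_online`, `put_file` and `register_in_doms` are hypothetical stand-ins for the tape backend, the bitrepository client and DOMS, and both the FileID scheme and the use of MD5 are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Checksum a file in chunks. MD5 is an assumption, not a documented choice."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_file_id(batch_dir, jp2_path):
    """Hypothetical FileID scheme: batch name plus the flattened relative path."""
    relative = Path(jp2_path).relative_to(batch_dir)
    return Path(batch_dir).name + "_" + "_".join(relative.parts)


def ingest_batch(batch_dir, force_online, put_file, register_in_doms):
    """Walk a batch directory and run the four ingest steps for each jp2 file."""
    batch_dir = Path(batch_dir)
    # Step 1: ask for the batch's files to be kept online (failure handling omitted).
    force_online(batch_dir.name)
    ingested = []
    for jp2_path in sorted(batch_dir.rglob("*.jp2")):
        # Step 2: derive a unique FileID for the file.
        file_id = make_file_id(batch_dir, jp2_path)
        # Step 3: ingest the file, passing its checksum so the ingest can be verified.
        checksum = file_checksum(jp2_path)
        put_file(file_id, jp2_path, checksum)
        # Step 4: register the archived file in DOMS.
        register_in_doms(file_id, checksum)
        ingested.append(file_id)
    return ingested
```

Passing the three collaborators in as callables keeps the sketch independent of any particular client library, so the traversal logic can be exercised without a running repository.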

 

Releaser

The releaser handles the task of releasing files from their forced online state.

This should happen when a batch has been approved and all currently needed processing of the files has been completed.

Data processing

As part of the general architecture, files are processed after they have been ingested into the repositories, i.e. the Bitrepository for data files (jpeg2000) and DOMS for metadata.

A tape-based pillar normally gives no guarantee of when files are online, which poses a problem when using it for processing. To solve this problem, the tape backend API (layer 2, Python) is being extended with additional methods so that:

  • We can control which files are kept in cache while processing.
  • We can minimize the number of rejected files that end up on tape storage.

The API methods for the tape backend are:

  • force-online <prefix>
    • Method to keep files with <prefix> online. The method should be called before ingesting any files with the prefix, to prevent them from being rolled out to tape.
    • A call to the method may fail if the quota for online files has been exceeded.
    • In the future, a call to this method with a prefix for which files are already on tape may bring the files online.
  • release-online <prefix>
    • Method to release files with <prefix> so that they can be rolled out to tape.
  • status-online
    • Method to list which prefixes are kept online.
    • May in the future provide more information about the online/offline state.

Additionally, the contract is that if a force-online call fails (i.e. the online files quota is exceeded), it will still be possible to ingest files into the bitrepository. They will, however, be rolled out to tape according to the usual policy.
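
To make the contract concrete, the toy in-memory model below mimics the three methods and the fallback behaviour. It is a sketch under stated assumptions, not the real tape backend: the class and method names are invented for illustration, and the quota is counted in prefixes purely for simplicity (the real quota may well be measured differently).

```python
class QuotaExceeded(Exception):
    """Raised when force-online would exceed the (assumed) online quota."""


class OnlineCacheModel:
    """Toy model of the force-online / release-online / status-online contract."""

    def __init__(self, quota):
        self.quota = quota    # assumption: maximum number of prefixes kept online
        self.forced = set()   # prefixes currently forced online
        self.stored = {}      # file id -> "online" or "tape-eligible"

    def force_online(self, prefix):
        """Keep files with this prefix online; fails when the quota is used up."""
        if prefix in self.forced:
            return
        if len(self.forced) >= self.quota:
            raise QuotaExceeded(prefix)
        self.forced.add(prefix)

    def release_online(self, prefix):
        """Allow files with this prefix to be rolled out to tape."""
        self.forced.discard(prefix)
        for file_id in self.stored:
            if file_id.startswith(prefix):
                self.stored[file_id] = "tape-eligible"

    def status_online(self):
        """List the prefixes currently kept online."""
        return sorted(self.forced)

    def ingest(self, file_id):
        """Ingest always succeeds; only the resulting online state differs."""
        pinned = any(file_id.startswith(prefix) for prefix in self.forced)
        self.stored[file_id] = "online" if pinned else "tape-eligible"
        return self.stored[file_id]


backend = OnlineCacheModel(quota=1)
backend.force_online("batch-1/")
print(backend.ingest("batch-1/page-0001.jp2"))
try:
    backend.force_online("batch-2/")
except QuotaExceeded:
    # The ingest below still works; the file just follows the usual tape policy.
    print(backend.ingest("batch-2/page-0001.jp2"))
```

In this model a failed force-online call never blocks an ingest. It only decides whether the ingested file stays in cache or becomes eligible for roll-out to tape.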