Appendix 2F – nomenclature of files and file structure

 
Revised version for second bid.

Definitions

[avisID] is a unique ID which the State and University Library provides for each title incorporated in the project, e.g. "aarhusstiftstidende".
[batchID] is the unique barcode that the State and University Library puts on its boxes containing microfilm, e.g. "1234567890".
[Roundtrip] is a delivery number which starts at 1 for the first delivery of a batch. For each replacement supplied, the value should be incremented by 1.
[filmID] is [batchID] followed by a suffix that the Supplier chooses for each film in a box, e.g. "1234567890-15".
[dato] is specified in ISO 8601: YYYY-MM-DD. Corresponds to "Issue Date" in MODS.
[udgaveLbNummer] is a sequence number within a given date and corresponds to "Edition Order" in the edition metadata XML. [udgaveLbNummer] has 2 digits (zero-padded).
[billedID] is the image number on the film concerned and has 4 digits (zero-padded). For images that are split (images of a double-page spread), the Supplier shall add A/B after the number, e.g. 0010 for a full image, or 0010A / 0010B for the two page images that both originate from physical image number 10 on the film.
[targetSerialisedNumber] contains 6 digits (zero-padded). An ISO target should be scanned at every work shift. Each time a work shift target is scanned, [targetSerialisedNumber] should be incremented by 1.
The file structure and filenames are case-sensitive: upper-case and lower-case letters should be treated as distinct.
The time stamps on the files should reflect the actual, correct time and should be in UTC.
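
The definitions above can be checked mechanically. The following is a minimal sketch, not part of the requirements: the function names are illustrative, and the hyphen between [batchID] and the Supplier's suffix is assumed from the example "1234567890-15".

```python
import re
from datetime import date


def is_valid_dato(value: str) -> bool:
    """True if value is a real calendar date written as YYYY-MM-DD."""
    # Check the shape first, so that other ISO 8601 spellings are rejected.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def is_valid_film_id(film_id: str, batch_id: str) -> bool:
    """True if film_id is batch_id followed by '-' and a non-empty suffix.

    The hyphen separator is an assumption taken from the example.
    """
    prefix = batch_id + "-"
    return film_id.startswith(prefix) and len(film_id) > len(prefix)


def split_billed_id(billed_id: str) -> tuple[int, str]:
    """Split a billedID into (image number, split marker).

    The marker is "A" or "B" for a split image and "" for a full image.
    """
    match = re.fullmatch(r"(\d{4})([AB]?)", billed_id)
    if match is None:
        raise ValueError(f"not a valid billedID: {billed_id!r}")
    return int(match.group(1)), match.group(2)


def is_valid_target_number(value: str) -> bool:
    """True if value is a 6-digit, zero-padded targetSerialisedNumber."""
    return re.fullmatch(r"\d{6}", value) is not None
```

For instance, split_billed_id("0010A") yields image number 10 with marker "A", while "0010" yields image number 10 with an empty marker.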

The Supplier should name JP2 files according to the following syntax

The Supplier should name JP2 files of symbols according to the following syntax

Note that the symbol file is always a full image, i.e. for a double-page spread the symbol file contains symbols for both pages.
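
Because the symbol file always covers the full image, both halves of a split image share one symbol file. The sketch below illustrates that relationship. It assumes the naming seen in the example of file structure in this appendix, where pages 0003A and 0003B are accompanied by a symbol file ending in "-0003-brik.jp2"; the "-brik" suffix is taken from that example only, and the function name is hypothetical.

```python
import re


def symbol_file_for(page_jp2: str) -> str:
    """Return the symbol file name that belongs to a page image file name.

    Assumes page files end in "-NNNN.jp2", "-NNNNA.jp2" or "-NNNNB.jp2"
    and symbol files end in "-NNNN-brik.jp2" (pattern from the example).
    """
    match = re.fullmatch(r"(.+-\d{4})[AB]?\.jp2", page_jp2)
    if match is None:
        raise ValueError(f"not a page image file name: {page_jp2!r}")
    # Drop the A/B marker: the symbol file is named after the full image.
    return f"{match.group(1)}-brik.jp2"
```

Under these assumptions, the left and right pages of a spread map to the same symbol file name, and an unsplit page maps to a symbol file with the same image number.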

The Supplier should name JP2 files of work shift ISO targets according to the following syntax

The Supplier should name page metadata files in MODS format according to the following syntax

The Supplier should name page metadata files in MIX format according to the following syntax

The Supplier should name work shift ISO target metadata files in MIX format according to the following syntax

The Supplier should name page metadata files in ALTO format according to the following syntax

The Supplier should name edition metadata files in accordance with the following syntax

The Supplier should name film metadata files in accordance with the following syntax

The Supplier should deliver all files in accordance with the following file structure

B[batchID]-RT[Roundtrip] (contains files from the same batch)
├───WORKSHIFT-ISO-TARGET (contains files from the work shift target film)
└───[filmID] (contains files from the same film)
    ├───FILM-ISOTEST-[SUFFIX]-target (ISO targets on the film; files from the ISO test film)
    ├───UNMATCHED (files that cannot be matched to a particular edition, e.g. blanks, test images, duplicate images)
    └───[dato]-[udgaveLbNummer] (files from a single edition)
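
The directory names in this structure can be assembled from the identifiers defined earlier. The following is a minimal sketch under the assumption that [Roundtrip] is written without zero padding and that [udgaveLbNummer] is padded to 2 digits; the function names are illustrative and not part of the requirements.

```python
from pathlib import PurePosixPath


def batch_dir(batch_id: str, roundtrip: int) -> str:
    """Name of the top-level folder: B[batchID]-RT[Roundtrip]."""
    if roundtrip < 1:
        raise ValueError("Roundtrip starts at 1 for the first delivery")
    return f"B{batch_id}-RT{roundtrip}"


def edition_dir(batch_id: str, roundtrip: int, film_id: str,
                dato: str, edition_no: int) -> PurePosixPath:
    """Relative path of an edition folder: batch / film / [dato]-[udgaveLbNummer]."""
    # udgaveLbNummer has 2 digits with prefixed zeros.
    edition = f"{dato}-{edition_no:02d}"
    return PurePosixPath(batch_dir(batch_id, roundtrip)) / film_id / edition


print(edition_dir("400022028241", 1, "400022028241-14", "1860-10-18", 1))
```

A replacement delivery of the same batch would use the same call with roundtrip incremented by 1, which changes only the top-level folder name.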

Example of file structure

B400022028241-RT1 (batchID with Roundtrip)
├───WORKSHIFT-ISO-TARGET (files from the work shift target film)
│       Target-000387-0001.jp2
│       Target-000387-0001.jp2.md5
│       Target-000387-0001.mix.xml
│       Target-000387-0001.mix.xml.md5
│       Target-000387-0002.jp2
│       Target-000387-0002.jp2.md5
│       Target-000387-0002.mix.xml
│       Target-000387-0002.mix.xml.md5
│       Target-000388-0001.jp2
│       Target-000388-0001.jp2.md5
│       Target-000388-0001.mix.xml
│       Target-000388-0001.mix.xml.md5
│       Target-000388-0002.jp2
│       Target-000388-0002.jp2.md5
│       Target-000388-0002.mix.xml
│       Target-000388-0002.mix.xml.md5
│
└───400022028241-14 (filmID)
    │   Berlingske-400022028241-14.film.xml (film metadata)
    │   Berlingske-400022028241-14.film.xml.md5 (checksum)
    │
    ├───FILM-ISO-target (ISO targets on the film)
    │       Berlingske-400022028241-14-ISO-1.jp2 (image 1 from film)
    │       Berlingske-400022028241-14-ISO-1.jp2.md5 (checksum)
    │       Berlingske-400022028241-14-ISO-2.jp2 (image 2 from film)
    │       Berlingske-400022028241-14-ISO-2.jp2.md5 (checksum)
    │
    ├───UNMATCHED (non-attributable pages)
    │       Berlingske-400022028241-14-0001.jp2 (test capture from the film, not split)
    │       Berlingske-400022028241-14-0001.jp2.md5 (checksum)
    │       Berlingske-400022028241-14-0002A.jp2 (test capture from the film, split)
    │       Berlingske-400022028241-14-0002A.jp2.md5 (checksum)
    │       Berlingske-400022028241-14-0002B.jp2 (test capture from the film, split)
    │       Berlingske-400022028241-14-0002B.jp2.md5 (checksum)
    │       Berlingske-400022028241-14-0132.jp2 (test capture from the film, not split)
    │       Berlingske-400022028241-14-0132.jp2.md5 (checksum)
    │
    └───1860-10-18-01 (date and udgaveLbNummer)
            Berlingske-1860-10-18-01.edition.xml (edition 01 metadata)
            Berlingske-1860-10-18-01.edition.xml.md5 (checksum)
            Berlingske-1860-10-18-01-0003-brik.jp2 (symbol file for spread on image 3)
            Berlingske-1860-10-18-01-0003-brik.jp2.md5 (checksum)
            Berlingske-1860-10-18-01-0003A.jp2 (image of left page on image 3, split)
            Berlingske-1860-10-18-01-0003A.jp2.md5 (checksum)
            Berlingske-1860-10-18-01-0003A.mods.xml (mods metadata for left page on image 3)
            Berlingske-1860-10-18-01-0003A.mods.xml.md5 (checksum)
            Berlingske-1860-10-18-01-0003A.mix.xml (mix metadata for left page on image 3)
            Berlingske-1860-10-18-01-0003A.mix.xml.md5 (checksum)
            Berlingske-1860-10-18-01-0003A.alto.xml (alto metadata for left page on image 3)
            Berlingske-1860-10-18-01-0003A.alto.xml.md5 (checksum)
            Berlingske-1860-10-18-01-0003B.jp2 (image of right page on image 3, split)
            Berlingske-1860-10-18-01-0003B.jp2.md5 (checksum)
            Berlingske-1860-10-18-01-0003B.mods.xml (mods metadata for right page on image 3)
            Berlingske-1860-10-18-01-0003B.mods.xml.md5 (checksum)
            Berlingske-1860-10-18-01-0003B.mix.xml (mix metadata for right page on image 3)
            Berlingske-1860-10-18-01-0003B.mix.xml.md5 (checksum)
            Berlingske-1860-10-18-01-0003B.alto.xml (alto metadata for right page on image 3)
            Berlingske-1860-10-18-01-0003B.alto.xml.md5 (checksum)
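
In the example, every delivered file is accompanied by a checksum file whose name is the data file's name with ".md5" appended. The following sketch checks that pairing for the file names of one folder. It works on names only; the content of the checksum files is not specified in this section and is not examined. The function name is illustrative.

```python
def unpaired_files(names: list[str]) -> tuple[list[str], list[str]]:
    """Return (data files without a checksum file, checksum files without a data file).

    A checksum file is expected to be named "<data file name>.md5".
    """
    all_names = set(names)
    data_files = {n for n in all_names if not n.endswith(".md5")}
    checksum_files = {n for n in all_names if n.endswith(".md5")}

    missing = sorted(n for n in data_files if n + ".md5" not in checksum_files)
    orphans = sorted(n for n in checksum_files
                     if n[: -len(".md5")] not in data_files)
    return missing, orphans
```

A folder in which both returned lists are empty has a checksum file for every data file and no checksum file whose data file is absent or differently named.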