The goal of this document is to describe how data files and their accompanying metadata files, stored in a tree structure on disk, are transformed into a Content-Model-less object tree in DOMS.
For this to be generic, some assumptions need to be made:
- Data files and metadata files that belong together have the same prefix.
- Data files can be recognized by their file suffix.
- Checksum files are not represented directly, but rather as a property of the data they are associated with. They are therefore skipped as objects in the tree, but used when ingesting the data they belong to.
- Fedora objects correspond to file system directories.
- Sub-directories are represented by a "hasPart" relation from the parent directory object to the sub-directory object.
- Each object has the file system path of its directory as its identifier.
- A data file is a file containing data. The actual data is stored outside DOMS; the data file is represented as a file object in DOMS.
- A metadata file is a file containing metadata. The file is stored inside DOMS.
- A grouping of files (files having a common prefix) is represented as an object with:
  - a datastream for each metadata file
  - "hasFile" relations to the data files (hasFile is a specialization of hasPart)
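As an illustration of the rules above, here is a minimal in-memory sketch. It does not talk to Fedora or DOMS; `DomsObject`, `directory_object` and `group_object` are hypothetical names. It assumes that data files end in `.jp2` or `.pdf`, that the checksum for `x.jp2` lives in `x.jp2.md5`, and that a group object is identified by its directory path plus the prefix. None of these conventions come from DOMS itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical conventions (assumptions, not taken from DOMS):
DATA_SUFFIXES = (".jp2", ".pdf")   # suffixes that mark data files
CHECKSUM_SUFFIX = ".md5"           # checksum for "x.jp2" is stored in "x.jp2.md5"


@dataclass
class DomsObject:
    """In-memory stand-in for a Fedora object (not a real DOMS/Fedora API)."""
    identifier: str
    datastreams: Dict[str, str] = field(default_factory=dict)   # metadata file name -> content
    has_part: List[str] = field(default_factory=list)           # sub-directory object ids
    has_file: List["DomsObject"] = field(default_factory=list)  # data file objects
    checksum: Optional[str] = None                              # set on data file objects only


def directory_object(dir_path: str, subdirs: List[str]) -> DomsObject:
    """A directory becomes an object identified by its path, with a
    hasPart relation to each sub-directory object."""
    return DomsObject(identifier=dir_path,
                      has_part=[f"{dir_path}/{d}" for d in sorted(subdirs)])


def group_object(dir_path: str, prefix: str, files: Dict[str, str]) -> DomsObject:
    """Map one prefix group (file name -> file content) to an object.

    Metadata files become datastreams, data files become file objects linked
    via hasFile, and checksum files are not objects of their own: their content
    is stored as the checksum property of the matching data file."""
    group = DomsObject(identifier=f"{dir_path}/{prefix}")  # assumed id scheme
    for name, content in sorted(files.items()):
        if name.endswith(CHECKSUM_SUFFIX):
            continue  # consumed when the matching data file is handled
        if name.endswith(DATA_SUFFIXES):
            # The data itself stays outside DOMS; only a file object is created.
            checksum_text = files.get(name + CHECKSUM_SUFFIX)
            group.has_file.append(DomsObject(
                identifier=f"{dir_path}/{name}",
                checksum=checksum_text.strip() if checksum_text is not None else None))
        else:
            group.datastreams[name] = content  # metadata is stored inside DOMS
    return group
```

For a group `{"page1.jp2": "", "page1.jp2.md5": "d41d8cd98f00b204e9800998ecf8427e\n", "page1.mods.xml": "<mods/>"}` in `/scans/batch1`, the sketch yields one datastream (`page1.mods.xml`) and one file object for `page1.jp2` carrying the checksum; the `.md5` file never becomes an object.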
Pseudo code expressing the above rules
The following pseudo code expresses the above rules more formally.
In the code, the following methods are used:
- groupByPrefix(): returns a list of lists of files, grouped by their common prefix.
- isDataFile(): returns a boolean telling whether the given file is a data file.
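One possible reading of the two helper methods, sketched in Python as `group_by_prefix` and `is_data_file`. This is an illustration only: it assumes a file's prefix is everything before its first `.`, and that data files are recognized by the hypothetical suffixes `.jp2` and `.pdf`.

```python
from itertools import groupby
from typing import List

# Hypothetical data file suffixes (an assumption, not defined by DOMS).
DATA_SUFFIXES = (".jp2", ".pdf")


def prefix_of(name: str) -> str:
    """Assumed prefix rule: everything before the first '.'."""
    return name.split(".", 1)[0]


def group_by_prefix(names: List[str]) -> List[List[str]]:
    """Return a list of lists of file names, grouped by common prefix.

    itertools.groupby only merges adjacent items, so the names are
    sorted by prefix first (and by full name within a prefix)."""
    ordered = sorted(names, key=lambda n: (prefix_of(n), n))
    return [list(group) for _, group in groupby(ordered, key=prefix_of)]


def is_data_file(name: str) -> bool:
    """True if the file name ends in one of the assumed data suffixes."""
    return name.lower().endswith(DATA_SUFFIXES)
```

With these rules, `group_by_prefix(["b.xml", "a.jp2", "a.mods.xml", "a.jp2.md5"])` returns `[["a.jp2", "a.jp2.md5", "a.mods.xml"], ["b.xml"]]`, and `is_data_file` is true only for `a.jp2`. Note that the checksum file `a.jp2.md5` lands in the same group as its data file, which is what allows it to be used during ingest rather than becoming an object.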