Intellectual scope of SIPs

In the previous text, the complicated nature of the representation was highlighted. To understand the complexity in our organizational architecture better we first need to examine how our metadata management systems are modeled.

Intellectual entities in the metadata management systems

Intellectual entities (IE) is a concept we find in the various metadata systems outside the digital preservation environment. In these systems, we tend to operate with a lot of different IEs, usually organized in some sort of hierarchy.

In use-case examples of PREMIS and E-ARK, it is usually the highest level entity from these hierarchies, that is referred to as the IE and used to define intellectual scope of packages/SIPs, i.e. a work or expression. However, we have to define scope differently, using an entity that sits at a lower level of description:

  • SIP scope is defined by the metadata management system IE that holds the UID linking the IE to the SIP.

This is a necessity for keeping all components of our system architecture in sync. The UID sits at specifically defined IEs in our metadata management systems.

Hierarchies and flatness

A change in architecture could open for using a different key UID placed at a different location of these metadata hierarchies. However, we believe it is impractical to do so as it introduces multiple issues related that prohibits scale across our systems.

The higher level entities in such metadata hiearchies tend to describe abstract concepts. Further down the hierarchy the intellectual entities describe increasingly concrete and tangible concepts. The lowest level IE in these hierarchies is thus the most specific and tend to describe tangible physical or digital objects.

Intellectual scope defined by abstract high-level entities introduce different challenges with:

  • Vast package sizes (tens of terabytes)
  • Huge number of representations within SIPs
  • Complex representation-to-representation relationhips
  • Content description metadata changes leading to restructuring of stored data
  • Preservation of unidentified digital objects having no relationships to IEs holding the key UID
  • Increased complexity in keeping our three system domains in sync

Some of this complexity stems from complex metadata structures found in our metadata management systems. We do not want to mirror and manage entire hierarchical catalogs in the preservation environment. This is after all what the metadata managements systems excels at!

We prefer keeping the structure of information packages in a flat structure, with a 1:1 relationship between information packages and an IE in the metadata management systems describing something tangible. This is to avoid operating with unique metadata entities or “sizes” in the preservation environment, as well as to ease the metadata ingest processes.

Intellectual package scope

The ruleset we end up with is thus:

  • A digital object is described as an IE in the metadata management systems.
  • The digital object is packaged as the primary representation in a SIP
  • The SIP and the IE share a URN, managed in the metadata management system.
  • Any new digital object is described as a new IE, which receives a new URN, and thus has to be represented as a primary representation in a new SIP

Due to how URNs link everything together, the ruleset deciding whether you should create a new SIP or a new representation, lies in the metadata management domain.

  • A single URN = a single SIP
  • Multiple URNs = multiple SIPs

If you only have a single URN but still want to preserve a second derivate digital object in the DPS, you will create a second representation in the same SIP as the primary digital object.

Diagram showing relationship between IE in metadata management system and SIP in DPS

IE to SIP relationship

UID, IE and SIP

In our systems the essential UID that holds all our systems together sits or refers to the lowest level of description in our asset management systems.

Our more complex metadata management systems (e.g. Axiell Collections), are advanced asset management systems and describe the actual digital object in technical detail using a carrier1 IE. The URN identifies the carrier IE, and in extension the SIP.

Our MARC-based metadata management systems (e.g. Alma), use the URN to link to the SIP and it’s primary representation, while not actually describing the digital object in the metadata management system. The URN identifies the SIP, but not the record holding the URN.

Using the least abstract IE of description has multiple positive side effects:

  • Package size is kept small
  • Representation ruleset is kept simple
  • Package scope definition sits in producer environment
  • Ingest requirements are lowered (we only need to describe the technical asset, not its intellectual contents)
  • Package restructuring due to descriptive metadata changes are kept to a minimum

  1. The carrier entity in our metadata management systems, is the most specific described entity in these systems. The carrier describes the actual, tangible object - the thing. This goes for both analogue and digital objects. I refer to these spesific, tangible IEs as “carrier”, even though it is not necessarily called “carrier” in all of these systems. ↩︎

Last updated on - Github commit history ↗