PREMIS (PREservation Metadata: Implementation Strategies)
Table 4-A - PREMIS Data Model Entities: Objects, Events, Agents, Rights, and Intellectual Entities. Objects and Agents exist as the two primary nodes, connected to each other through Events and Rights.
The ''Objects'' are what are actually stored and managed in the preservation repository. Most of PREMIS is devoted to describing digital objects, concentrating primarily on their technical characteristics. The information that can be recorded includes:
- a unique identifier for the object (type and value);
- fixity information such as a checksum (message digest) and the algorithm used to derive it;
- the size of the object;
- the format of the object, which can be specified directly or by linking to a format registry;
- the original name of the object;
- information about its creation;
- information about inhibitors;
- information about its significant properties;
- information about its environment;
- where and on what medium it is stored;
- digital signature information;
- relationships with other objects and other types of entities.
PREMIS defines three different kinds of objects and requires implementers to make a distinction between them. These are file objects (the primary unit), representation objects (which are made up of file objects), and bitstream objects (which make up file objects). Some semantic units defined in the PREMIS Data Dictionary are applicable to all three types of object, while others are applicable to only one or two types of object.
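As an illustration of how a few of these properties fit together, the following minimal sketch records an identifier, original name, size, and fixity information for a file object, and later re-checks the fixity. The class and function names (FileObject, describe_file, fixity_ok) are hypothetical and are not PREMIS semantic unit names; the choice of SHA-256 as the default algorithm is also an assumption.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FileObject:
    """Hypothetical record holding a few of the properties listed above."""
    identifier_type: str    # e.g. "local"
    identifier_value: str
    original_name: str
    size: int               # size in bytes
    digest_algorithm: str   # algorithm used to derive the checksum
    digest: str             # the checksum (message digest) itself


def describe_file(identifier: str, name: str, data: bytes,
                  algorithm: str = "sha256") -> FileObject:
    """Build a FileObject, deriving size and fixity from the bytes."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return FileObject("local", identifier, name, len(data), algorithm, digest)


def fixity_ok(obj: FileObject, data: bytes) -> bool:
    """Recompute the digest and compare it with the recorded value."""
    current = hashlib.new(obj.digest_algorithm, data).hexdigest()
    return current == obj.digest
```

A repository might call describe_file at ingest and fixity_ok during periodic audits. Recording the algorithm next to the digest is what makes the later check possible.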
The ''Event'' entity aggregates information about actions that affect objects in the repository. An accurate and trustworthy record of events is critical for maintaining the digital provenance of an object, which in turn is important in demonstrating the authenticity of the object. The information that can be recorded about events includes:
- a unique identifier for the event (type and value);
- the type of event (creation, ingestion, migration, etc.);
- the date and time the event occurred;
- a detailed description of the event;
- a coded outcome of the event;
- a more detailed description of the outcome;
- agents involved in the event and their roles;
- objects involved in the event and their roles.
The Data Dictionary entry for the event type (eventType) provides a 'starter list' of events to help guide implementation.
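To show how a record of events can support digital provenance, the sketch below stores events with the kinds of information listed above and returns the history of one object in chronological order. The Event class, its field names, and the provenance function are hypothetical illustrations, not PREMIS definitions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Event:
    """Hypothetical event record modelled on the list above."""
    identifier: str
    event_type: str          # e.g. "ingestion", "migration"
    date_time: datetime
    outcome: str             # coded outcome, e.g. "success"
    detail: str = ""
    outcome_detail: str = ""
    linked_agents: dict = field(default_factory=dict)   # agent id -> role
    linked_objects: dict = field(default_factory=dict)  # object id -> role


def provenance(events, object_id):
    """Return the events that involve object_id, oldest first."""
    related = [e for e in events if object_id in e.linked_objects]
    return sorted(related, key=lambda e: e.date_time)
```

Because each event names the objects and agents involved together with their roles, the same list can answer both "what happened to this object?" and "what did this agent do?".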
''Agents'' can be people, organizations, or software applications. PREMIS defines only a minimum number of semantic units necessary to identify agents, since there are several external standards that can be used to record more detailed information. A repository could choose to use a separate standard for recording additional information about agents, or it could use the agent identifier to point to externally recorded information. The Data Dictionary includes:
- a unique identifier for the agent (type and value);
- the agent's name;
- designation of the type of agent (person, organization, software).
The ''Rights'' entity aggregates information about rights and permissions that are directly relevant to preserving objects in the repository. Each PREMIS rights statement asserts two things: acts that the repository has a right to perform, and the basis for claiming that right. The information that can be recorded in a rights statement includes:
- a unique identifier for the rights statement (type and value);
- whether the basis for claiming the right is copyright, license, or statute;
- more detailed information about the copyright status, license terms, or statute, as applicable;
- the action(s) that the rights statement allows;
- any restrictions on the action(s);
- the term of grant, or time period in which the statement applies;
- the object(s) to which the statement applies;
- agents involved in the rights statement and their roles.
Most of the information is designed to be actionable (that is, recorded in a controlled form that can be acted upon by a computer program).
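The following minimal sketch suggests what "actionable" could mean in practice: a program checks whether a given act on a given object falls within a rights statement and its term of grant. The RightsStatement class, its field names, and the act labels are hypothetical; restrictions are stored here but not evaluated.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class RightsStatement:
    """Hypothetical rights statement modelled on the list above."""
    identifier: str
    basis: str                  # "copyright", "license", or "statute"
    acts: set                   # actions the statement allows
    object_ids: set             # objects to which the statement applies
    start: date                 # start of the term of grant
    end: Optional[date] = None  # None means the grant is open-ended
    restrictions: list = field(default_factory=list)


def is_permitted(stmt, act, object_id, on):
    """True if stmt allows act on object_id on the given date."""
    in_term = stmt.start <= on and (stmt.end is None or on <= stmt.end)
    return in_term and act in stmt.acts and object_id in stmt.object_ids
```

A check like this only works if the acts and the basis are drawn from controlled vocabularies rather than free text, which is the point of recording them in a controlled form.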
The ''Intellectual Entities'' connect directly to the object. They are conceptual, and might be called 'bibliographic entities'. PREMIS defines an Intellectual Entity as "a set of content that is considered a single intellectual unit for purposes of management and description: for example, a particular book, map, photograph, or database." PREMIS does not actually define any metadata pertaining to Intellectual Entities because there are plenty of descriptive metadata standards to choose from.
National Library of New Zealand Preservation Metadata Extract Tool
Diagram 4-B - NLNZ Data Model Entities: Object, Process, File, Metadata Modification.
''Object'' contains eighteen elements describing the logical object, which may exist as a file or aggregation of associated files. These elements identify the object and describe those characteristics relevant to preservation management.
''Process'' contains thirteen elements that record the complete history of actions performed on the objects. It includes the objectives of a process, who has given permission for the process, critical equipment used, and the outcomes of the actions taken. An audit trail of date/time stamps and responsible persons and/or agencies is constructed.
''File'' contains technical information about the characteristics of each of the files that comprise the logical object identified in the Object entity. Nine elements are common to all file types, and further elements are specified for certain categories of file (e.g., image, audio, video, text).
''Metadata modification'' contains five elements and records information about the history of changes made to the preservation metadata. This acknowledges that the record is itself an important body of data that must be secure and managed over time.
The NLNZ model specifies the following relationship rules:
- An Object may have one or more Processes associated with it;
- An Object may have one or more Metadata Modifications associated with it;
- An Object must have one or more Files associated with it;
- A Process must always be associated with a single Object;
- A Metadata Modification must always be associated with a single Object;
- A File must always be associated with a single Object.
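The relationship rules above can be read as cardinality constraints. The sketch below is one hypothetical way to check them; the PreservationObject class and the validate function are assumptions made for illustration and are not part of the NLNZ tool. Because Files, Processes, and Metadata Modifications are stored inside their Object here, a child with no Object cannot occur, so only the remaining rules are checked.

```python
from dataclasses import dataclass, field


@dataclass
class PreservationObject:
    """Hypothetical Object with identifiers of its associated records."""
    name: str
    files: list = field(default_factory=list)
    processes: list = field(default_factory=list)
    modifications: list = field(default_factory=list)


def validate(objects):
    """Return a list of rule violations (empty if all rules hold)."""
    errors = []
    owners = {}  # (kind, identifier) -> names of the Objects that list it
    for obj in objects:
        # An Object must have one or more Files.
        if not obj.files:
            errors.append(f"{obj.name}: must have one or more Files")
        groups = (("File", obj.files),
                  ("Process", obj.processes),
                  ("Metadata Modification", obj.modifications))
        for kind, children in groups:
            for child in children:
                owners.setdefault((kind, child), set()).add(obj.name)
    # Each File, Process, and Metadata Modification belongs to one Object.
    for (kind, child), names in owners.items():
        if len(names) > 1:
            errors.append(f"{kind} {child}: associated with more than one Object")
    return errors
```

Processes and Metadata Modifications are optional, so an Object with empty lists for both still passes.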
Media Specific Standards for Technical Metadata
NISO Standard Z39.87: Technical Metadata for Digital Still Images
This standard defines a set of metadata elements for raster images only. It does not address other image formats (e.g. vector, animated raster, motion picture). The elements document digital images created through digital photography or scanning, as well as those that have been altered through editing or image transformation. Early versions of the document referred to images maintained in TIFF. The most recent version of the standard has been expanded to include other raster image file formats. The dictionary has been designed to facilitate interoperability between systems, services, and software as well as to support the long-term management of and continuing access to digital image collections. Use of the data dictionary is accomplished primarily through XML encoding. The metadata describes the entire file (including header and other information) rather than the bitstream level.
There are four sections of the data dictionary:
- ''Basic Digital Object Information:'' Contains a cluster of data elements which apply to all digital object files, not just digital image files. This kind of information may be considered more general preservation metadata.
- ''Basic Image Information:'' The items in this section are fundamental to the reconstruction of the digital object as a viewable image on electronic interfaced displays.
- ''Image Capture Metadata:'' This section can best be described as descriptive technical metadata or administrative metadata. Some of the information may be harvested from the file itself while other information will need to be provided by the institution managing the image capture process.
- ''Image Assessment Metadata:'' The operative principle in this section is to maintain the attributes of the image inherent to its quality. These elements serve as metrics to assess the accuracy of output (today's use) and of preservation techniques, particularly migration (future use).
Although Z39.87 itself was designed to be agnostic in terms of implementation, the NISO Metadata for Images in XML Schema (MIX), commissioned by NISO and created by the Library of Congress, has been the dominant form of use for the data dictionary. Because MIX is a METS extension schema, implementation and use of the data dictionary on a local level has been fairly easy to manage. Care has also been taken to ensure that NISO Z39.87 harmonizes with PREMIS.
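As a rough sketch of how the four sections of the data dictionary might be carried in XML, the example below groups values under one element per section. The section element names are simply derived from the section titles above, and the root element (imageMetadata) and child elements (such as imageWidth) are hypothetical. The output is not claimed to be valid against the MIX schema, which should be consulted for the actual element names and namespaces.

```python
import xml.etree.ElementTree as ET

# Section names taken from the four sections of the data dictionary.
SECTIONS = (
    "BasicDigitalObjectInformation",
    "BasicImageInformation",
    "ImageCaptureMetadata",
    "ImageAssessmentMetadata",
)


def build_record(values):
    """Build an XML tree with one child element per dictionary section.

    values maps a section name to a dict of element name -> value.
    Sections with no values are still emitted, as empty elements.
    """
    root = ET.Element("imageMetadata")  # hypothetical root element
    for section in SECTIONS:
        node = ET.SubElement(root, section)
        for name, value in values.get(section, {}).items():
            ET.SubElement(node, name).text = str(value)
    return root


record = build_record({
    "BasicImageInformation": {"imageWidth": 4000, "imageHeight": 3000},
})
xml_text = ET.tostring(record, encoding="unicode")
```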
AudioMD: Audio Technical Metadata Extension Schema
AudioMD is an XML schema to describe the technical characteristics of digital audio archival objects. AudioMD contains five top-level elements:
- ''bits_per_sample:'' Number of bits in a digital audio sample (i.e., the quantization), e.g. 16, 24;
- ''channel:'' Number and information about channels/tracks, e.g., 2trk, 4trk, 8trk, etc.;
- ''data_rate:'' Information about the mode and data rate of audio files in Kb/s, e.g. 16, 44.1, 96 etc.;
- ''duration:'' Duration of audio source material in time, i.e. HH:MM:SSSS format;
- ''sampling_frequency:'' The rate at which the audio was sampled, e.g. 44.1 kHz, 96 kHz, etc.
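Several of these elements are arithmetically related. For uncompressed PCM audio (an assumption; compressed formats do not follow this rule), the data rate is the product of the bits per sample, the number of channels, and the sampling frequency. The function names below are hypothetical and only illustrate the calculation.

```python
def pcm_data_rate_kbps(bits_per_sample, channels, sampling_frequency_hz):
    """Data rate of uncompressed PCM audio in kilobits per second."""
    return bits_per_sample * channels * sampling_frequency_hz / 1000


def pcm_size_bytes(bits_per_sample, channels, sampling_frequency_hz,
                   duration_seconds):
    """Size in bytes of the audio data alone (no header or container)."""
    bits = (bits_per_sample * channels * sampling_frequency_hz
            * duration_seconds)
    return bits // 8


# 16-bit stereo sampled at 44.1 kHz
rate = pcm_data_rate_kbps(16, 2, 44100)   # 1411.2 kb/s
size = pcm_size_bytes(16, 2, 44100, 60)   # 10584000 bytes per minute
```

A check of this kind can flag records whose stated data rate is inconsistent with the other recorded values.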
VideoMD: Video Technical Metadata Extension Schema
VideoMD is an XML schema to describe the technical characteristics of digital video objects. VideoMD contains eight top-level elements:
- ''color:'' Information describing color characteristics and specifications;
- ''compression:'' The type and amount of digital compression, e.g. Predictive 10:1, RLE 2:1;
- ''data_rate:'' The data rate of the video source item in Mb/s, e.g. 4.0, 8.25, 100.0, etc.;
- ''duration:'' Duration of video source item in time, i.e. HH:MM:SSSS format;
- ''frames:'' The number of frames and frame rate of video source item;
- ''resolution:'' The horizontal and vertical dimensions in pixels and aspect ratio of the frame;
- ''sound_field:'' The digital sound format used in the video source item, e.g. mono, stereo, DTS, etc.;
- ''video_format:'' Information describing the format specifications of the video.
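The frames, resolution, compression, data_rate, and duration elements are likewise related. The sketch below derives duration, frame aspect ratio, and an approximate data rate from the other values. The function names are hypothetical. The example values (24 bits per pixel, a 10:1 compression ratio) are assumptions, and the aspect ratio is simply the ratio of the pixel dimensions, so it ignores non-square pixels.

```python
from fractions import Fraction


def duration_seconds(frame_count, frame_rate):
    """Playing time implied by the number of frames and the frame rate."""
    return frame_count / frame_rate


def frame_aspect_ratio(width, height):
    """Ratio of horizontal to vertical pixel dimensions, in lowest terms."""
    return Fraction(width, height)


def data_rate_mbps(width, height, bits_per_pixel, frame_rate,
                   compression_ratio=1.0):
    """Approximate video data rate in megabits per second.

    compression_ratio=10.0 stands for 10:1 compression; 1.0 means none.
    """
    bits_per_second = width * height * bits_per_pixel * frame_rate
    return bits_per_second / compression_ratio / 1_000_000


# 1920 x 1080 frames, 24 bits per pixel, 30 frames per second
uncompressed = data_rate_mbps(1920, 1080, 24, 30)       # about 1493 Mb/s
compressed = data_rate_mbps(1920, 1080, 24, 30, 10.0)   # about 149 Mb/s
```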