5. Metadata

Key Concepts

types and functions

Metadata creation and implementation are resource-intensive processes. Balance costs and benefits in developing a metadata strategy, taking into consideration the needs of current and future users and collection managers. Identify metadata requirements at the outset of an imaging initiative. These requirements should be tightly linked to functions that must be supported (e.g., rights management, resource discovery, and long-term care).

Consider the following issues:

  • Although some metadata elements are static (e.g., date of creation, scanning resolution), certain fields (e.g., migration information) may continue to evolve and require continuous updating and maintenance.
  • The creation and management of metadata are accomplished through manual (e.g., creating a Dublin Core record) and automated (e.g., generating a keyword index from OCR'ed text) techniques. Similarly, metadata quality control will be based on a mix of manual (e.g., evaluating the quality of subject access categories and keywords) and automated (e.g., using an SGML parser to validate tags) processes.
  • Metadata can be internal (file naming, directory structuring, file headers, OCR, SGML) or external (external indexes and databases). The key factor in decision making is evaluating whether the location supports functionality and resource management. For example, TIFF file headers are instrumental in recording metadata internally; however, this metadata is usually lost when the TIFF files are converted to other file formats, such as JPEG or GIF.

  • There are several standards in development to facilitate interoperability among different metadata schemes. The Resource Description Framework (RDF) is an XML-based application to provide a flexible architecture for managing diverse metadata in the networked environment. The goal of the Digital Imaging Group's Metadata For Digital Images (DIG 35) initiative is to define a standard set of metadata that will improve interoperability between devices, services, and software, thus making it easier to process, organize, print, and exchange digital images. The MPEG-7 (Moving Picture Experts Group) initiative targets audio-visual content description and aims to standardize a set of description schemes and descriptors, a language to specify description schemes, and a scheme for coding the description. The Interoperability of Data in E-Commerce Systems (<indecs>) project is an international collaboration to develop a metadata framework that supports network commerce of intellectual property.
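
To make the point about internal metadata concrete, the following is a minimal sketch of how the fields stored in a TIFF file header can be read back with nothing but the Python standard library. The function name read_tiff_tags is hypothetical, the sketch reads only the first image file directory (IFD), and it handles only the handful of tags and field types listed in its tables; the tag numbers themselves come from the TIFF 6.0 specification.

```python
import struct

# Tag numbers defined by the TIFF 6.0 specification.
TAG_NAMES = {
    256: "ImageWidth", 257: "ImageLength", 258: "BitsPerSample",
    259: "Compression", 270: "ImageDescription", 273: "StripOffsets",
    278: "RowsPerStrip", 279: "StripByteCounts", 282: "XResolution",
    283: "YResolution", 296: "ResolutionUnit",
}

# Size in bytes of one value of each supported field type:
# 1 = BYTE, 2 = ASCII, 3 = SHORT, 4 = LONG, 5 = RATIONAL.
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8}


def read_tiff_tags(data: bytes) -> dict:
    """Return {tag name: value} for known tags in the first IFD (hypothetical helper)."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])  # little- or big-endian
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file")

    (count,) = struct.unpack_from(order + "H", data, ifd_offset)
    tags = {}
    for i in range(count):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, ftype, n = struct.unpack_from(order + "HHI", data, entry)
        if tag not in TAG_NAMES or ftype not in TYPE_SIZES:
            continue  # skip tags and types this sketch does not handle
        size = TYPE_SIZES[ftype] * n
        if size <= 4:
            start = entry + 8  # small values are stored inline in the entry
        else:
            (start,) = struct.unpack_from(order + "I", data, entry + 8)
        raw = data[start:start + size]

        if ftype == 2:  # ASCII, NUL-terminated
            value = raw.rstrip(b"\x00").decode("ascii", "replace")
        elif ftype == 5:  # RATIONAL = numerator / denominator
            nums = struct.unpack(order + "II" * n, raw)
            value = [nums[j] / nums[j + 1] if nums[j + 1] else None
                     for j in range(0, 2 * n, 2)]
        else:
            code = {1: "B", 3: "H", 4: "I"}[ftype]
            value = list(struct.unpack(order + code * n, raw))
        if isinstance(value, list) and len(value) == 1:
            value = value[0]
        tags[TAG_NAMES[tag]] = value
    return tags
```

Passing the bytes of a scanned page to read_tiff_tags would return a dictionary of whichever of these fields the file carries. Running the same check on a derivative in another format is one way to confirm which header metadata did not survive conversion and therefore needs to be held externally.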

Example 1
What kinds of metadata will be created for a journal collection that is converted as 600 dpi, 1-bit TIFF 6.0 images? The following metadata tasks might be undertaken. Each is identified by its principal metadata type (S = structural, D = descriptive, A = administrative). Note: The RLG Model RFP provides an example of metadata requirements for a text imaging project.

  • Assign file names and directory structures to the image files and the associated metadata files. (S)
  • Create or update MARC records (Fields 100, 110, 245, 260, 440, 650, etc.). (D)
  • Create Dublin Core records. (D)
  • Use MARC Field 007 to record digital preservation and reformatting information. (A)
  • Use appropriate TIFF 6.0 file headers to record technical information, e.g., ImageWidth, ImageLength, Compression, StripOffsets, RowsPerStrip, StripByteCounts, XResolution, YResolution, ResolutionUnit, BitsPerSample. (A)
  • Assign persistent, globally unique, and location-independent file names (PURL or Handle). (D)
  • Use appropriate TIFF 6.0 file headers for image description (Field 270) to record descriptive elements essential for identifying the file (e.g., project ID, institution, collection, year of publication, title, author, image sequence number). (D)
  • Create a database to store and manage bibliographic information from the cumulative journal indexes to enable structured vocabulary search (e.g., journal volume, issue, title, author, beginning and ending page number). (D, S)
  • Use TEI Lite SGML encoding to map the basic structural elements of the journals, such as volume, issue, title, author name, beginning and ending pages for each article, to facilitate online searching and browsing. (S)
  • OCR the images to provide free-text keyword access. (D)
  • Create HTML tags with Dublin Core information to facilitate resource discovery. (D)
  • Register the Web site with relevant subject directories, specialized subject portals, and gateways to increase coverage by Web search engines. (D)
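
As an illustration of the task of embedding Dublin Core information in HTML, the sketch below generates meta tags from a simple record. It assumes the commonly used convention of naming each tag "DC." plus the element name and declaring the element set with a schema.DC link; the function name dublin_core_meta and the sample record are invented for this example and are not part of any project described here.

```python
from html import escape


def dublin_core_meta(record: dict) -> str:
    """Build HTML head lines for a Dublin Core record (hypothetical helper).

    `record` maps Dublin Core element names (e.g., "title", "creator")
    to a string or a list of strings for repeatable elements.
    """
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, values in record.items():
        if isinstance(values, str):
            values = [values]  # treat a single value as a one-item list
        for value in values:
            # escape() protects quotes and angle brackets in the content
            lines.append(
                f'<meta name="DC.{escape(element)}" content="{escape(value)}">'
            )
    return "\n".join(lines)


# Invented sample record, for illustration only.
sample = {
    "title": "Sample Journal, Volume 1",
    "creator": ["Doe, Jane", "Roe, Richard"],
    "format": "image/tiff",
}
print(dublin_core_meta(sample))
```

Repeatable elements such as creator simply produce one meta tag per value, which keeps the record easy to harvest without a separate parsing rule.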


Example 2
What kinds of metadata will be collected and recorded for a collection of photographs?

In addition to many of the elements suggested above, consider whether to:

  • Enhance an existing finding aid, and SGML-encode it using the EAD (Encoded Archival Description) Document Type Definition to create a map of the collection for searching and presentation. This will facilitate interoperability with other EAD-encoded finding aids. (D, S, A)


Reality Check

Which of the following metadata would be important for preservation reasons? Select all correct answers.

Unique identifiers
Structuring tags
Physical description of source document
Scanner profile


