NISO/CLIR/RLG Technical Metadata for Images Workshop
April 18-19, 1999



Report by David Bearman


Background

Workshop Purpose and Goals

Reports from Image Metadata Projects

Charges to the Working Groups

Working Group Reports

Final Session


Background

NISO (the National Information Standards Organization), CLIR (the Council on Library and Information Resources), and RLG (the Research Libraries Group) sponsored an invitational workshop on April 18 and 19, 1999, in Washington, DC, to examine the technical information needed to manage and use digital still images that reproduce a variety of pictures, documents, and artifacts.

About 60 people attended, bringing a wide range of interests and perspectives on the problem of image metadata. Attendees represented libraries, universities, museums, archives, the digital library community, government, and digital imaging vendors.


Workshop Purpose and Goals

Libraries, academic institutions, archives, museums, historical societies, local and state governments, and private industry are increasingly engaged in document and image conversion and preservation projects that create digital images of textual, visual, and artifact collections. There is, however, little consistency in the metadata that accompanies these images. Other initiatives have focused on descriptive or intellectual metadata, but relatively few address the kinds of information, sometimes grouped under structural and administrative metadata, that describe the capture process and certain technical characteristics of digital images. Recording such information consistently, whether in file headers or in separate associated files, will be critical to ensuring that image files are usable and remain so, and it will require the development of specialized software and tools.

The primary goals of the meeting were:

  1. To define a comprehensive set of metadata elements.
  2. To reach consensus on a subset of metadata elements that should be required in the documentation of digital images.

A secondary goal was to reach a consensus on how to structure and format the metadata, including a determination of which elements belong in image headers and which should be external to the image files.

The meeting planners recognized that this meeting would not cover all types of metadata that may be useful in managing image files and image collections. For example, one area left for future consideration is the full set of metadata needed to describe the structural or hierarchical relationships among sets or groups of image files. Furthermore, although some of the metadata defined at this meeting will support the transformation of image files for migration, preservation, and access purposes, a comprehensive examination of these requirements will await another discussion in the future. The organizers believe that focussing attention on a subset of technical metadata will facilitate the development of standard uses of those metadata and open paths for a detailed examination of related metadata. The findings of this meeting will be communicated to librarians, museum and archive specialists, and digital imaging professionals in order to receive their comments and advice.


I. Reports from Image Metadata Projects

The first session was opened by Jennifer Trant, Executive Director of the Art Museum Image Consortium, who facilitated the workshop and served on the organizing committee. Howard Besser, Associate Professor of Information Science at UCLA and also a member of the organizing committee, spoke to the group about the importance of metadata.

Besser


Besser argued that metadata was needed for the interoperability of systems and the long-term preservation of materials. He described the various categories of metadata and their potential uses, including discovery, administrative, structural, and rights management. He then discussed the problem of images becoming separated from their associated metadata and the need for ways to link them. He also discussed longevity issues, such as old file formats that can no longer be read (or even decoded) because the hardware and software that created them no longer exist. He described two broad approaches to this problem: emulation of old software and hardware, and a cycle of migrating images from old hardware and file formats to current ones. Besser stated that metadata is the key to the problem, and that for metadata to be useful there must be standards and consensus on what metadata should be collected and how. Finally, he proposed a few areas in which the group might reach agreement during the meeting: metadata fields, rules for field content, a core set of required fields, and a possible syntax for expressing the data.

A. Current Projects Using Image Metadata

The following projects illuminate how image metadata is being defined and used. The intent of these presentations was to give the audience a perspective on current activities, problems encountered, and choices made to arrive at solutions. These presentations also elucidate what is common among some of the major initiatives making use of technical image metadata.

Fleischhauer

Carl Fleischhauer described the challenges the Library of Congress has faced with image metadata. The project's initial focus was broadening access to the Library's collections; it has since shifted to address preservation concerns, and with that change the importance of image metadata became apparent.

At the Library of Congress, cataloging tradition has always kept metadata (the catalog) separate from the holdings, so the practice of embedding metadata in digital objects was initially novel. Reviewing the results of several recent Library of Congress projects, Fleischhauer observed that metadata is buried in various places, including file headers, meaningful file names, and lengthy prose descriptions. He noted the benefits that would accrue if all metadata could be explicitly externalized. He stated that the Library "believes that not all the metadata will be in the same place," and that it will therefore need to address architectural issues. Disagreeing with Besser regarding the pertinence of relationships, Fleischhauer explained that the fundamental purpose of such embedded metadata was to record structural metadata crucial for accessing objects linked to each other and to other metadata files. He also noted that at the Library of Congress, layers of entities have, until recently, been related to each other largely through naming conventions, although he does not consider this necessarily a good practice.

According to Fleischhauer, the basic issues in selecting metadata are assessing costs and benefits (what can be done with the metadata and at what price) and determining which elements can or must be parsed to be useful (e.g., useful to machines). In each case, he recommended identifying for whom and for what purpose the metadata is intended. He noted that the Library of Congress typically has not recorded details of the equipment configuration, because they do not appear to affect future usability enough to justify the effort of capturing them. Fleischhauer contrasted this low-benefit case with metadata whose value is well appreciated but which is not captured because the cost is too high. Because the Library of Congress has been scanning many negatives, it would like to track image enhancements. While it can easily record automatic routines, documenting image-by-image decisions (the craftsmanship of the imaging technicians) is prohibitively expensive. He asked, somewhat rhetorically, whether metadata can track the rationale for such decisions, as with printed halftones, where decisions must be made about how to avoid moiré patterns. He then suggested that one benefit of agreeing on metadata might be commercial products that include tools to support such documentation.

Fleischhauer also introduced participants to the "digitization practices preface," an extended prose narrative focussed on technical metadata documentation that has been written for each digitized Library of Congress collection. The kinds of information conveyed this way could, in theory, be placed in image files, but Fleischhauer asked, again rhetorically, where this kind of project-level management decision should be recorded.

Dale

The second presentation was by Robin Dale of the Research Libraries Group. Dale reported on the metadata used by several collaborative imaging projects that RLG has sponsored or participated in over the past several years. Her handouts presented the lists of elements recommended by the guidelines for "Studies in Scarlet" (1996), the Working Group Preservation and Reformatting Information (Feb. 1997), and the Working Group on Preservation of Images (March 1998). Essentially, Dale reported, the preservation community wanted to capture information on file quality and veracity, and its recommendations should be understood in the context of that intent. It also wanted to identify a "core" set of essential elements rather than a comprehensive set or an architecture for extensible elements. In these projects, participants decided that, given the state of technology, it was still too early to decide where to place the metadata.

Dale presented the 16-element set on which the Preservation and Reformatting group agreed. She illustrated some of the early efforts by preservation administrators to identify and record metadata elements considered important for the long-term retention and care of digital files. She showed a dated example of qualified Dublin Core (DC) in HTML dot syntax, and an example of how the proposed metadata elements could be recorded in the USMARC 533 field, which is designed to record information about reproductions and is the traditional location for notes about a reformatted item.

The Preservation of Images project’s metadata set was smaller (12 elements) and included data about sound. Dale noted a variety of other projects with similar or overlapping metadata. Some elements appear to have become "best practice," and developing a core list, especially with reference to core lists developed by others, was proposed as a modus operandi. Dale noted that CEDARS and the National Library of Australia have further refined the 1998 RLG elements, and this set could be seen as a "standard" set of element "content."

Bearman and Newman


In the third presentation, David Bearman of AMICO and Alan Newman of the Art Institute of Chicago discussed the Art Museum Image Consortium’s metadata for documenting a large body of art representations and making them available to educational institutions. The Art Museum Image Consortium (AMICO) is a not-for-profit association of institutions with art collections, and it is currently building a joint digital library.

Bearman primarily spoke about image metadata used by AMICO (a publisher of a secondary art database) and Newman discussed metadata used by a member of the consortium, the Art Institute of Chicago, whose needs are most typical of internal management requirements of museum collections. The point was to examine why each created and managed different metadata about images and why this was necessary to satisfy their different purposes.

AMICO's metadata practices are highly pragmatic: they are based on data that is available from members, cost-effective to acquire and use, and designed to serve end users of the AMICO Library. The metadata consists of a "cataloging" record describing each work of art that is a "source," and a media file metadata record describing each multimedia file associated with that work; in the AMICO dataset, there is one metadata record for each analog or digital resource. All media files and metadata files contain relationship elements and links. The metadata in each file is extended Dublin Core, with qualifications related to format elements. Methods of representing these records in XML/RDF are being explored, but they are currently carried as simple "label: value" pairs in external delimited files.
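To make the arrangement concrete (one record per work, one record per media file, linked through relationship elements and carried as "label: value" pairs), here is a minimal Python sketch. The field labels, identifiers, and the blank-line record separator are hypothetical assumptions for illustration, not AMICO's actual export format.

```python
from collections import defaultdict

# Hypothetical export: one "label: value" pair per line, blank line between records.
SAMPLE = """\
Record-Type: work
Work-ID: WORK-0001
Title: Untitled landscape

Record-Type: media
Media-ID: WORK-0001-IMG01
Related-Work-ID: WORK-0001
Format: image/jpeg
"""


def parse_records(text: str) -> list[dict]:
    """Split a label: value file into one dict per record."""
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:  # a blank line closes the current record
            if current:
                records.append(current)
                current = {}
            continue
        label, _, value = line.partition(":")  # split on the first colon only
        current[label.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def media_by_work(records: list[dict]) -> dict:
    """Follow each media record's relationship element back to its source work."""
    index = defaultdict(list)
    for rec in records:
        if rec.get("Record-Type") == "media":
            index[rec["Related-Work-ID"]].append(rec["Media-ID"])
    return dict(index)


print(media_by_work(parse_records(SAMPLE)))  # {'WORK-0001': ['WORK-0001-IMG01']}
```

Splitting on the first colon only keeps values that themselves contain colons (such as URLs) intact.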

Newman reported that the Art Institute of Chicago collects metadata about its digital files and its analog image resources for asset management. The information is often detailed. It uses authority files (one for "view" was shown) and includes machine-readable, human-readable, and judgmental information. Much of the metadata supports the next stage of decision making in internal workflow but is not very relevant to end users.

In summary, both the Consortium and the museum keep the data their users need, creating one metadata record for each digital file and maintaining explicit relations between the metadata and the images, and between all images and the source. Like Fleischhauer, Bearman stressed that the relationships between metadata sets and the objects they represent, and the unique identification of objects and metadata sets, were critical issues in the architecture of image metadata.

Hurley


In the final presentation, Bernie Hurley, Chief Scientist of the U.C. Berkeley libraries, discussed the "Making of America 2" (MOA2) project, in which the University of California at Berkeley is a participant.

MOA2 is a collaborative project whose metadata practices emphasize interoperability, scalability, and digital preservation. Hurley stated that standards must encode metadata from four categories: descriptive, structural, administrative, and technical. MOA2 metadata is published in an XML DTD. The MOA2 project is exploring how to capture metadata concerning the built-in methods of its objects. Participants are also trying to assess the value of metadata and what services tools can provide with images (e.g., a Java applet for viewing).

Among the lessons learned: collaborators greatly fear capture costs, and there is no clear understanding of how metadata will be used, either in terms of end-user needs or of the program services the metadata will support. (He noted wryly that resistance to "required" metadata disappears if its capture is automatic, if it can be obtained elsewhere, or if it needs to be entered only once per project.) Another lesson is that "best practices" for data values are unclear. Hurley also noted that since not all metadata can go in the image header, every project must decide on an appropriate transfer syntax (obviously an arena for standards). MOA2 created a digitization management software application that generates XML according to the DTD, with "structure" used to define relationships among a group of images, not necessarily between the images and the real world, though both are necessary at times. Some encoding practices of the MOA2 approach are significant only internally. For example, the cluster of metadata inherited from the declared value "hirez jpeg" includes a defined file size, resolution, color space, etc.
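The "hirez jpeg" example amounts to a named profile whose values every image record inherits unless the record states its own. A minimal Python sketch of that inheritance, assuming a hypothetical profile table; the element names and values below are illustrative only and are not MOA2's actual definitions.

```python
# Hypothetical profile table; element names and values are illustrative only.
PROFILES = {
    "hirez jpeg": {
        "format": "image/jpeg",
        "max_file_size_kb": 2048,
        "resolution_ppi": 600,
        "color_space": "sRGB",
    },
}


def expand(record: dict) -> dict:
    """Return the record with its declared profile's values filled in."""
    merged = dict(PROFILES.get(record.get("profile"), {}))
    merged.update(record)  # values stated on the image override inherited ones
    return merged


img = {"id": "img-0042", "profile": "hirez jpeg", "resolution_ppi": 400}
print(expand(img)["color_space"], expand(img)["resolution_ppi"])  # sRGB 400
```

The design choice is that the per-image record stays small and only the declared profile name needs to travel with it; this is also why such values are "significant only internally" unless the profile table is published.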

B. Discussion

Trant invited questions for discussion in order to clarify and shape the agendas of the three breakout groups.

Several speakers noted that certain metadata can be captured automatically. Identifying the metadata that can be captured this way, and developing tools to capture additional useful metadata automatically, would be a valuable product of the workshop. [Alan Newman, Stephen Chapman, Howard Bussey]
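As an illustration of automatic capture, the sketch below reads the fixed-layout IHDR chunk at the start of a PNG file and maps its fields onto several of the element names later used by Group 1 (MIME Type, Pixel Array Size, Tonal Resolution, Photometric Interpretation, Byte Order, Compression). PNG is chosen only because its header layout is simple and fully specified; the dictionary keys are hypothetical labels, not a standard vocabulary.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# PNG color types -> (photometric interpretation, samples per pixel)
COLOR_TYPES = {
    0: ("grayscale", 1),
    2: ("RGB", 3),
    3: ("palette", 1),
    4: ("grayscale + alpha", 2),
    6: ("RGB + alpha", 4),
}


def png_header_metadata(data: bytes) -> dict:
    """Extract Group 1-style technical metadata from a PNG's IHDR chunk."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # The first chunk must be IHDR: a 4-byte length (13) then a 4-byte type.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("IHDR chunk missing or malformed")
    width, height, bit_depth, color_type, compression, _filter, _interlace = (
        struct.unpack(">IIBBBBB", data[16:29])
    )
    photometric, channels = COLOR_TYPES.get(color_type, ("unknown", None))
    return {
        "mime_type": "image/png",
        "pixel_array_size": (width, height),
        "tonal_resolution": bit_depth,  # bits per sample (bits per index for palette)
        "channels": channels,
        "photometric_interpretation": photometric,
        "byte_order": "big-endian",  # fixed by the PNG format itself
        "compression": "deflate" if compression == 0 else "unknown",
    }
```

Note that byte order never needs to be recorded separately for PNG, since the format fixes it; this is the kind of element Group 1 treated as mandatory only when the file format does not already supply it.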

Most participants agreed that relationship links were crucial aspects of metadata and that their persistence and utility depended on permanent, unique naming conventions and persistent services. [Lou Sharpe, Alan Newman, David Bearman, Bernie Hurley]

A number of participants spoke in depth about functional requirements.

Comments included:

Stephen Chapman: Metadata needs to provide a way to get back consistently from the digital image to the analog.

Oya Rieger: What are the kinds of work we want to have metadata do?

"On-the-fly" conversion needs to be supported (in addition to preservation, viewing, etc.)

Bearman: How do we address problems such as statistical files, which can create artifacts if there is no trace back to the source?

Lou Sharpe: An image key is a problem, as it creates "privacy" problems.

Thorny Staples: An absolute identifier for an image (a registry method).

Cliff Lynch: How much, if anything, is unique to images?

- Are we going to be satisfied with recording an allegation of metadata authorship, or do we need to know it more definitively (e.g., through cryptography)?

Nick Eiteljorg: Regarding the unspoken assumption that serious use will be made of images that have passed through multiple hands, is that true, or will users always return to the original source? Isn't scholarship dependent on having only one source?

Calls were made for more industry involvement [Kevin Burns, Lou Sharpe, Robin Wendler, Alan Newman]

Some general principles were discussed as follows:

Lou Sharpe: One class of problems is that metadata needs to be "objectized" so that it is passed through to next use.

Howard Bussey: If you don’t understand it, pass it through. If you do understand it but you've changed the image data, do you have a responsibility to update it?
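Bussey's rule can be sketched for a hypothetical resizing step: fields the converter understands are updated, a field it knows the change invalidates (here, a checksum) is dropped, and everything else passes through unchanged. A minimal Python sketch; the field names are assumptions for illustration.

```python
# Hypothetical field names; a real converter would know its own format's fields.
INVALIDATED_BY_PIXEL_CHANGE = {"checksum"}


def resize_metadata(meta: dict, factor: int) -> dict:
    """Metadata for an image downsampled by an integer factor.

    Fields the converter understands are updated, fields the change makes
    false are dropped, and everything else is passed through untouched.
    """
    out = {k: v for k, v in meta.items() if k not in INVALIDATED_BY_PIXEL_CHANGE}
    out["width"] = meta["width"] // factor
    out["height"] = meta["height"] // factor
    return out


src = {"width": 4000, "height": 3000, "checksum": "9f2c", "x-vendor-note": "keep"}
print(resize_metadata(src, 2))
# {'width': 2000, 'height': 1500, 'x-vendor-note': 'keep'}
```

Dropping the stale checksum, rather than passing it through, answers the second half of Bussey's question in one particular way; recomputing it would be another option.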

Cost-effectiveness and cost avoidance were identified as significant issues. The group discussed problems such as the cost of opening images to find metadata, the creation of multiple sets of metadata, and the omission of metadata, which could lead to costly efforts to locate it in the future. [Bernie Hurley, Oya Rieger, Steve Chapman, Barry Schlesinger, Steven Puglia, Karla Youngs, David Green, Erik Landsberg].

The group discussed functional requirements related to maintaining metadata over time and across copies/migrations. [John Eyre, Ric Foster, Tom Shephard, Lou Sharpe, Scott Houchin]

Often noted was the importance of actual use and user requirements, although no one provided a concrete method for identifying and resolving these issues. However, Christie Stephenson suggested that a matrix consisting of User, Manager, and System would be useful, particularly in distinguishing how a given metadata value might need to be expressed. [Joyce Ray, David Austin, John Eyre, Stephen Chapman, Jennifer Trant, Christie Stephenson]

Participants discussed the relationships between image metadata problems/solutions and those being created for other metadata. [Eric Miller, Angela Spinnaze, Paul Miller, Scott Houchin].

Participants also raised the possibility of simplifying metadata creation by moving away from highly structured metadata and using images (targets) as metadata [Jennifer Trant, Anne Womack, Don D'Amato, Betsy Fanning].


II. Charges to the Working Groups

Afternoon, April 18

Following the discussions, the group subdivided into three working groups:

Group 1: Characteristics and Features of Images
(Steve Puglia session leader)
Group 1 focused on features or aspects of images typically recorded in headers or filename extensions, such as MIME type, spatial and tonal resolution, color space, references to color management information, orientation, compression, dimensions, and special viewer or printer requirements. The group examined elements that would promote interoperability and functionality. The group worked on defining a basic set of core elements, especially ones that could be automatically generated and used by software.

Group 2: Image Production and Reformatting Features
(Stephen Chapman session leader)
Group 2 focused on features describing the provenance of the image, particularly features of the imaging system that generated it, including information about the agent authorizing the production of the image, the make and model of the equipment used, targets or reference to targets used, the calibration of the system, and measures of its capacities in terms of resolution, density, and means of color encoding. The group primarily examined two areas: 1) metadata that would provide information about how an image was made such as hardware, software, and various settings, and 2) metadata that would help managers assess the value of images for particular applications and help them decide whether they are worth keeping and refreshing over time as hardware, software, and file formats change.

Group 3: Image Identification and Integrity
(John Weise session leader)
Image Identification and Integrity relates to names, maker/manager identification, dates made or modified, check-sums or other file integrity devices, and external references (e.g., to intellectual metadata). Group 3 discussed the information needed about a digital image to ensure that its filename, format, and data can be successfully maintained as the file goes through a series of processes, and over time.
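A check-sum works as an integrity device by recording a digest when the image is created, then recomputing and comparing it after each copy or migration. A minimal sketch using Python's standard hashlib module; SHA-256 is simply a modern choice made for this example, and the record layout and file name are hypothetical.

```python
import hashlib


def fixity_record(data: bytes, algorithm: str = "sha256") -> dict:
    """Digest to store alongside the image (in a header or an external file)."""
    return {"algorithm": algorithm, "digest": hashlib.new(algorithm, data).hexdigest()}


def verify(data: bytes, record: dict) -> bool:
    """Recompute the digest after a copy or migration and compare it."""
    return fixity_record(data, record["algorithm"])["digest"] == record["digest"]


def read_file(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()


# Usage (hypothetical file name): rec = fixity_record(read_file("page001.tif"))
# and later, after migration: verify(read_file(new_copy_path), rec)
```

Storing the algorithm name with the digest lets a later system verify the file even if the preferred algorithm has changed in the meantime.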

Morning, April 19

The group continued to examine the issues that are unique to images. The breakout groups were asked to identify other groups that are working on metadata issues relevant to digital files in general, and to determine if any such work could be useful to those working on technical metadata for images.

In drafting a list of possible metadata elements, the groups defined:

  • The purpose of the element
  • The requirements filled by the element
  • Who might be collecting it and when (this helped to determine if the proposed User/Manager/System matrix would be useful)
  • How the metadata elements should be represented

In addition, the groups addressed cost/benefit questions, including:

  • How easy/difficult is a particular element of metadata to get/use?
  • What is absolutely necessary regardless of costs?
  • At what point in the process is metadata capture most cost-efficient?

Trant suggested that groups could also address where metadata should be stored (e.g. in a header, external file, in a referenced specification). Although this won't come from the identification of elements, it needs to be addressed in the requirements.

The groups primarily focused on problems rather than prematurely identifying solutions, as there are many possible solutions. The primary challenge was to focus on IMAGES and distinguish IMAGE requirements from requirements that are general to all digital objects.

The working groups met for approximately three hours and then shared the results of their deliberations.


III. Working Group Reports

Group 1: Characteristics and Features of Images

The group established that it had been given these issues to address:

  • Controlled representation moving from digital to analog.
  • Level of requirements.
  • Cost of omission of data.
  • Source of derived image.
  • Reprocessing.
  • Privacy issues.
  • Absolute image identifier.
  • Minimum required elements, sets and structures.
  • End user issues.
  • Image layering.
  • Interoperability with other domains, use of already existing elements from other domains.

Furthermore, the working group established the following assumptions:

  1. They should focus on long term utility
  2. If an element was not actionable it did not necessarily belong in a header
  3. They were defining elements for dealing with the image at hand
  4. They should consider tradeoffs between cost effectiveness and cost of omission
  5. An item would only be made mandatory if there was no equivalent in current file format headers

The breakout session then examined the metadata comparison charts presented in the main session and considered those elements that were common to the different lists. The group proceeded with compiling a list of common metadata elements and of any additional technical elements necessary for defining characteristics and features. During discussions, the group tried to keep a variety of issues in mind, including:

  • What function is this element useful for?
  • What need will it meet over the short and long term?
  • Who will use it and for what?
  • How is it useful for managers, systems, users, and programs, respectively?
  • How easy is it to get the information?
  • What is the cost-benefit?

Metadata that were considered irrelevant, out-of-scope, or defaultable were not included.

The following guiding principles were agreed upon:

  • Metadata that is not directly "actionable" should not necessarily be in the file header, because of the concern that information will be "lost in translation" during file format conversions.
  • Metadata should have long-term utility.
  • Metadata should specifically deal with the image "at hand."
  • Principles should not be cast in stone.
  • Guidelines should be sensitive to the costs of omission.
  • Those metadata elements described as mandatory are only mandatory if they are not an inherent part of the file format.

The group agreed on the following groupings of elements and their priorities, designating each element as Mandatory (M), Desired (D), Mandatory if Applicable (MA), or Optional (O):

Format Issues:
  • MIME Type (M)
  • File Format (M)
  • Class ID / Genotype (D)

Resolution Issues:
  • Pixel Array Size / Count (M)
  • Spatial Resolution at Capture (MA)
  • Orientation (MA)

Encoding:
  • Tonal Resolution (M)
  • Channels and Layers (M)
  • Byte Order (M)
  • Photometric Interpretation (M)
  • Color Space (M)
  • Color Management (M)
  • Gamma Correction (O)
  • White-point / Black-point (O)

Compression:
  • Compression (M)
  • Layering (M)
  • Sub-sampling (MA)

Others:
  • Date & Time (M) (also forwarded to both of the other groups)
  • Watermark (MA)
  • Encryption (MA)
  • ID (MA)
  • Fill or Padding (MA)
  • External Metadata (MA)
  • File Size (O)
  • Test Charts (O)
  • Platen Color (O)
  • Image Quality (O)
  • Thumbnail (O)

The following issues and elements were passed to the other groups:
  • Date & Time (to both)
  • Image Enhancements (to both)
  • Audit Trail (to both)
  • Dimensions of Objects ("descriptive")
  • Reflective / Transmissive (to production)
  • Lamp / Sensor (to production)
  • Identification (to identification)
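Group 1's obligation levels, together with its principle that a mandatory element is required only when it is not an inherent part of the file format, can be checked mechanically. A minimal Python sketch; the snake_case keys, the subset of elements, and the way "inherent" and "applicable" elements are passed in are assumptions made for illustration.

```python
# Obligation levels from the Group 1 list (subset); keys are hypothetical labels.
ELEMENTS = {
    "mime_type": "M",
    "file_format": "M",
    "pixel_array_size": "M",
    "tonal_resolution": "M",
    "byte_order": "M",
    "color_space": "M",
    "compression": "M",
    "date_time": "M",
    "spatial_resolution_at_capture": "MA",
    "watermark": "MA",
    "class_id": "D",
    "gamma_correction": "O",
}


def missing_required(record: dict, inherent: set, applicable: set) -> list:
    """List required elements found neither in the record nor in the format.

    inherent:   elements the file format itself already carries
    applicable: MA elements that apply to this particular image
    """
    missing = []
    for name, level in ELEMENTS.items():
        required = level == "M" or (level == "MA" and name in applicable)
        if required and name not in record and name not in inherent:
            missing.append(name)
    return missing


record = {"mime_type": "image/tiff", "date_time": "1999-04-19T10:00"}
# Hypothetical: what one format's header is assumed to carry on its own.
inherent = {"file_format", "pixel_array_size", "tonal_resolution", "byte_order", "compression"}
print(missing_required(record, inherent, {"spatial_resolution_at_capture"}))
# ['color_space', 'spatial_resolution_at_capture']
```

Desired (D) and Optional (O) elements never appear in the result; a fuller checker might report them separately as recommendations.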

Group 2: Image Production and Reformatting


Several issues were directed to this breakout session from the larger group. Discussion focused on production processes associated with copy digital photography rather than on creating original ("born digital") images. Erik Landsberg provided a step-by-step description of the digital photography production process at the Museum of Modern Art. This on-the-spot case study prompted the group to list the system features that contribute to image production and control image quality. Decisions about whether to document these features as technical metadata elements were based on the evaluation criteria suggested in the full session:

  • What’s the element going to do? For whom?
  • Will data elements recorded now be useful later? What are the long-term needs that are to be supported?
  • Does the metadata need to be system and human readable? Is it to be used by applications, managers, or users?
  • Will the costs of creating this metadata be justified by benefits, either near term or long term? Are tools available to create this metadata?

In addition to beginning a list of proposed data elements (see Element List, below), the discussion produced three outcomes: a set of principles and functional requirements for production metadata (both given below), and general agreement on the categories in which system features should be documented, even if they are not actually classed as metadata elements with controlled vocabulary. The group concluded that documentation of systems and documentation of rationale (the methodology that informed production) were both key to assessing system performance and controlling image transformations in the future.

The five categories for documenting production metadata are:

  • source attributes
  • hardware
  • software (including the filters/tone curves that were applied to "raw" scanning data)
  • the viewing environment
  • operator judgment/decisions

On the topic of source attributes, the group agreed that these should be limited strictly to the features that are relevant to machine processing of metadata for output — such as h x w dimensions, orientation, or polarity. All other information about the source should reside with the intellectual metadata that is associated with the image.

The group also noted that workflow metadata, such as the name of the scanning operator, might be relevant, although it would be important to justify the long-term benefit of collecting and retaining this information in electronic form.

Principles

If one can create persistent metadata inside the raster image, it will be possible to resolve a major architectural problem intrinsic to other electronic objects: keeping metadata with files over time. The potential solution to use targets or other embedded metadata is unique to digital images.

Point to existing practice in other domains whenever possible. (Note: specific examples were not cited during the discussion; this preliminary list was created by the moderator.)

Document the system and the rationale for how it was used, in order to provide a means of assessment (by managers and users) and a means of controlled representation and transformation (by systems).

Functional Requirements

Production/technical metadata must facilitate digital-to-analog conversions (by systems) that meet the desired objectives for representation. This is essential if images are to be distributed widely and retained for long periods in repositories. It must be assumed that devices will change over time in unforeseeable ways. When controls were used at input (during photography), it is highly desirable to avoid artifacts at output, such as color shifts or loss of highlight and shadow detail, in all analog formats (screen, print, film).

Production/technical metadata must facilitate image assessment (by managers and users). Rationale is a key element to assess system performance and image quality.

In this category, members of the breakout session saw a great deal of potential in recommending two practices (photographing targets with the artwork, and documenting project specifications as a free-text narrative associated with the image) rather than the traditional method of recording metadata as data elements with controlled vocabulary. The group concluded that targets can be a useful tool for documenting the performance of the complete system (all variables working in concert).

Element List

Drawing upon the MoMA case study, the list of issues recommended to the breakout session, and several conference handouts, as well as a suggestion by Carl Fleischhauer, the group drafted data element lists in two categories: metadata that are image-specific and project-specific.

Image Production Element List (Image-Specific)

(R) = required

(O) = optional

In-image target(s):
  • Name (R)
  • Defined values (O)
  • Calibration values (O)

System target(s), associated with object:
  • Name (R)
  • Defined values (O)
  • Calibration values (O)

Responsible agent

Rationale:
  • Free text (see also the Image Production Narrative section below)

Hardware:
  • Make, model, serial # of scanner
  • CCD
  • Camera lens settings (aperture, focal length)
  • Camera/scanner color space information (e.g., ICC profile)
  • Illuminant (type, temperature, filters)

Software:
  • Driver, model, version
  • Filters (R) (the group discussed cases where these might be saved as files external to the image)
  • Shadow (black) point
  • Highlight (white) point
  • Threshold
  • Histogram (?)

Image Production Narrative (Project Specific)

The group discussed recording project- or collection-level production metadata and agreed that free prose might be the most cost-effective format for documenting why imaging was done in a particular way. Carl Fleischhauer used the analogy of scope and content notes as a way of thinking about how to present this information to the user or manager in an ordered, if not fully controlled, way. The moderator drafted the following list for further review and discussion:

Description of the source material

  • Entire object scanned, or only a portion? (e.g., covers and blank pages excluded from books)
  • Single type of material or an aggregate of multiple types? (e.g., text only, text + illustrations)

Description of the approach that was used in scanning

  • Were materials categorized into different production workflows or scanned in the same way regardless of format and content?
  • Which agent(s) were responsible for scanning?
  • When were materials scanned?
  • What system(s) were used?
  • Name(s) of scanner(s), description of their use, quality control
  • Name of processing software, description of its use, quality control
  • Description of filters/enhancements used during or after scanning
  • Were targets used? Which ones and why?
  • Was the viewing environment controlled? Calibrated to a standard?
  • How many digital images were produced for each item scanned?
  • Were texts converted to machine-readable form?
  • To what level of accuracy?
  • Was text marked up? According to guidelines or standards?
  • Was there an intended output for each copy?
  • Examples include: 1:1 publication-quality print, 8x10 photographic print, facsimile reprint, full-screen view, thumbnail view, etc.

The group identified many items of technical metadata beyond those defined by Group 1 (Characteristics and Features of Image). These items were crucial for communicating to each actor in a chain of work how the system had been set at each prior step. Many elements might never travel beyond the institution making the decisions, but all were considered important for long-term understanding of the metadata. This implied that they would be kept locally, and that the ability to refer back to the source of the image was a long-term requirement.

Group 3: Image Identification and Integrity

Group 3 focused on the following issues:

  • The problems associated with identification and integrity are not unique to images, but the particular solutions to them are.
  • Purpose and process dictate the selection of metadata elements. It is difficult to separate technical elements from descriptive elements because they are viewed as a unit. (Many groups contribute metadata. For example, the group carrying out the imaging process might contribute elements about how the image was made (technical details), while the organization or institution that caused the image file to be created is responsible for other, descriptive elements.)
  • The focus needs to expand to include the general public. There will clearly be substantial growth in the quantity and type of original digital images that are created by the general public and then collected by or donated to cultural heritage organizations. Who is responsible for creating the metadata for these files? How do we build a bridge to enable creation of this content?

The group identified the following elements associated with technical integrity:

  • Format of the image
  • Format of the metadata
  • Consistent access to both of these is important throughout the lifecycle of the image
  • Links to descriptive metadata
  • Links to related images. The type of link created implies certain architectural decisions; for example, a link to descriptive metadata presupposes that those data are located outside the image
  • Intrinsic characteristics of the image

 

Essential elements

  • Identification
      • Universal identifier
  • Why
      • For what purpose?
          • Education
          • Preservation
          • Access
      • Provides a means for defining methodology, including documentation and rationale
  • Who is involved with the file
      • Who created the image file?
      • Who commissioned the creation of the image file? (chartering entity)
      • Compared to:
          • Who is the responsible agency?
          • Who is the owner?
  • Where
      • Via subcontractor, photo lab
      • Correlation between who and where
  • What
      • Formatting issues may fit here
  • When
      • Set of necessary dates, including capture date/time, modification date/time, etc.
  • Checksum
      • An integrity measure can help answer questions such as: does the compressed version of the image file contain the same bits as the original file?
      • Helps in articulating the relationship between metadata and image data
  • Navigational aid
      • Where are the technical metadata located?
      • How to get there
  • Encoding tools
      • What software was used to encode the image file?
      • Encoding information needed in order to properly decode the image file
  • Parameters used by the encoding tool
      • What defaults or other constraints/parameters were used or interpreted by the encoding tool? For example: bit depth, color range, etc.
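For the Checksum element, a byte-level fixity check can be sketched as follows. The sketch assumes SHA-256 and a hypothetical dictionary layout for the stored record; the group named "checksum" as an essential element without fixing an algorithm or syntax. Note that a file checksum only shows whether a stored file has changed; it cannot show that a compressed version holds the same pixels as the original, which calls for the canonicalization of the bitmap discussed under generational issues.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 16):
    """Hash a file's bytes in chunks so large image masters need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_fixity_record(path, algorithm="sha256"):
    # Hypothetical record layout, for illustration only.
    return {"file": str(path), "algorithm": algorithm,
            "checksum": file_checksum(path, algorithm)}


def verify_fixity(record):
    """True if the file still hashes to the value recorded earlier."""
    return file_checksum(record["file"], record["algorithm"]) == record["checksum"]
```

In use, a record is made when the file is created and stored with the technical metadata; `verify_fixity` is rerun after each copy or transfer, and a `False` result flags the file for investigation.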

The following issues were beyond the scope of Group 3:

  • Syntax for encoding
  • Migration issues
  • Defining "the work"
  • Logical digital object
  • Descriptive metadata

Points at which these elements can be applied

These elements can be applied only once "the work" has been defined, because that definition provides the framework through which the appropriate metadata elements can be invoked. The definition of the work also determines the integrity issues.

Generational issues

  • It is essential that we identify relationships between generations of the image. Through the lifecycle of the image, the identifier, the format, or the intrinsic characteristics may have changed. It is necessary to identify why these changes occur.
  • What is in the "bag of bits"?
      • Raw image file
      • Pixels (horizontal/vertical)
      • Color scheme?
  • What is required? A canonicalization of the bitmap, which is crucial for the technical integrity of the image over time.
      • It is the means for validating that the compressed version of the file contains the same bits as the uncompressed original.
      • It will aid in articulating the relationship between metadata and image data.
  • What is required? A language for expressing derivations.
      • There is a need for a shared vocabulary that can be used to describe the chain of provenance for the image, divided into logical levels.
      • Layers
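The canonicalization idea can be sketched in a few lines: decode each stored file to its pixels, serialize the pixels in one fixed layout, and checksum that layout rather than the file bytes. Everything here is an assumption for illustration. The two "formats" are toy stand-ins (a raw dump and a zlib-compressed copy of it), the canonical layout (big-endian width, height and channel count followed by row-major 8-bit samples) is invented, and real TIFF or JPEG files would need a proper decoder.

```python
import hashlib
import struct
import zlib


def canonical_bytes(width, height, channels, pixels):
    """Serialize a decoded 8-bit bitmap in one fixed layout: a big-endian
    header (width, height, channels) followed by row-major samples."""
    if len(pixels) != width * height * channels:
        raise ValueError("pixel buffer does not match dimensions")
    return struct.pack(">III", width, height, channels) + bytes(pixels)


def canonical_checksum(width, height, channels, pixels):
    return hashlib.sha256(canonical_bytes(width, height, channels, pixels)).hexdigest()


# Two toy storage formats (hypothetical): raw, and the raw bytes deflated.
def encode_raw(width, height, channels, pixels):
    return struct.pack("<III", width, height, channels) + bytes(pixels)


def decode_raw(data):
    width, height, channels = struct.unpack_from("<III", data)
    return width, height, channels, data[12:]


def encode_deflate(width, height, channels, pixels):
    return zlib.compress(encode_raw(width, height, channels, pixels))


def decode_deflate(data):
    return decode_raw(zlib.decompress(data))


pixels = bytes(range(12))  # a 2 x 2 RGB image
raw = encode_raw(2, 2, 3, pixels)
packed = encode_deflate(2, 2, 3, pixels)
print(raw == packed)  # False: the stored files differ
print(canonical_checksum(*decode_raw(raw)) == canonical_checksum(*decode_deflate(packed)))  # True
```

Under this approach, a lossy derivative decodes to different pixels and therefore gets a different canonical checksum. That is the intended behavior, because it shows that the compressed version does not contain the same bits as the original.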

 


Final Session - Areas of Common Ground

The group agreed that:

  • A preliminary list of technical metadata elements is needed (listed above).
  • Elements should be categorized as mandatory or optional (elements proposed included date, time, external files, test charts, etc.).
  • Metadata can help to evaluate the utility of an image for a particular application or use. Such metadata would likely explain the rationale for and methodology of the creation of the image.
  • Industry standard metrics, when available, should be used for assessing images.
  • Methods of pointing at external test charts are needed.
  • Mechanisms for referring to external metadata file are important.
  • There is a need for image-specific metadata as well as metadata at higher levels, such as metadata that describe the image production process, the object, and the project of which the image was a part. Methods for creating this metadata are needed.
  • The ability to control the transformations of an image is important, as is metadata that enables this control.
  • In many circumstances, which metadata are required, and their content, depend on how the underlying work being described is defined; the choice of metadata flows from making that decision.
  • Any solutions devised must work in a broad array of contexts, not just in formal archives and institutions creating images.

Next steps

The following action items were identified as next steps:

  1. Take the set of metadata elements drafted by the Characteristics and Features of Images Working Group and create a document that contains a definition of each element, examples of its use, a rationale for why it was included, examples of related elements from current file formats, and a description of the category of user it is intended for. Steve Puglia and the working group will create the initial draft, to be reviewed by the Workshop attendees. The target date for completion was set as June 1, 1999. In short: add definitions and examples to the chart of Group 1.

  2. Build examples, showing each element also in the various image file headers where it exists.
    (Lou Sharpe suggested that trying this on a corporate intranet might show cost benefits.)
    Create metadata when files are created, and accept that it will probably become separated from the files.
    Framework for the chart of elements:
      Element
      Definition
      Examples, including existing file types/implementations
      Used by (who)
      Used for (what method: person/system/…)
      Corollary in non-image files
  3. Articulate what tools need to be developed to assess how well an image was made.

  4. Stephen Chapman: tools are needed to assess how well images are made, e.g., their utility (this is where it is important to understand the rationale for creating the file).
  5. Rely on industry metrics for characterizing tone, color, detail, scale, etc.; tools (industry metrics) are needed for each attribute we consider important.
  6. Explore the viability of creating an integrated test chart.
  7. Do an inventory of existing tools and metadata standards. (Lou Sharpe will write about the tools that exist and are needed; Don D'Amato will help.)
  8. Develop guidelines and a template for the kind of data that should go into a project description statement that defines the options chosen and decisions made in a particular imaging project. There is a place for image-specific metadata as well as project/batch metadata. Controlling transformations is a critical issue. How can we best record these: in a target (written by hand), in a header, in a separate metadata file, or in a parsable field? Should we be creating, in a parsable form, metadata that is not currently usable because no instruments exist to use it? (ICC profiles exist and can do this.)
  9. Attempt a first cut at a canonical image format that will express the equivalence of data that may have been stored in multiple image formats. Such a canonical format is useful for such things as checksumming images and digitally signing them. One or more "canonical" formats would allow existing image files to be translated from their current formats so that checksums, etc., can be computed. The purpose is to keep track of the fact that several files are actually the same image "saved" in different file formats, which matters because there are likely to be numerous versions. (Cliff will do this within a month.)
  10. Begin to look at what it would take to define a vocabulary that could be used to express the relationships between images. Note that the recording of various project management data was discussed by Group 2, Image Production and Reformatting; some of those data relate back to usable, parsable metadata. Vocabularies to express relationships between images? Formalisms of relation types? (Howard and John Weise will write a problem statement.)
  11. Develop a document expressing the guidelines for images and image metadata that were enumerated during the meeting, including guidelines for a "Preservation method rationale statement", analogous to a scope and content note, relating to methods of practice.
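As a rough illustration of what a vocabulary of relation types could look like, the sketch below records each generation of an image as a typed link to its predecessor, together with the reason for the change, and walks the links back to the earliest recorded source. The relation terms, identifiers and record structure are all hypothetical, and the sketch assumes each image has at most one recorded predecessor; a real vocabulary would need to handle images derived from several sources.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary of relation types between generations.
RELATION_TYPES = {"scanned_from", "derived_from", "format_migrated_from", "cropped_from"}


@dataclass(frozen=True)
class Relation:
    subject: str   # identifier of the newer image
    relation: str  # one term from RELATION_TYPES
    target: str    # identifier of the earlier image or source object
    reason: str    # why the change occurred

    def __post_init__(self):
        if self.relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {self.relation}")


def provenance_chain(image_id, relations):
    """Follow relations from image_id back to its earliest recorded ancestor."""
    parent = {r.subject: r for r in relations}  # assumes one parent per image
    chain, seen = [], {image_id}
    current = image_id
    while current in parent:
        rel = parent[current]
        chain.append(rel)
        if rel.target in seen:
            raise ValueError("cycle in provenance relations")
        seen.add(rel.target)
        current = rel.target
    return chain


rels = [
    Relation("img-001-master", "scanned_from", "obj-001", "initial capture"),
    Relation("img-001-access", "derived_from", "img-001-master", "access copy for display"),
]
print([r.relation for r in provenance_chain("img-001-access", rels)])  # ['derived_from', 'scanned_from']
```

Restricting relations to a fixed set of terms is what makes the chain comparable across institutions, and the `reason` field records why each change occurred, as the generational-issues discussion calls for.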

Other "Next steps" included:

  • Identification
  • Audit trails
  • Enhancements
  • Tools to understand how images are made

A process was proposed for moving beyond this initial Workshop. First, a meeting report will be drafted for comment, discussion, and review by the organizing committee and participants, with a focus on how to include other stakeholders.

There was agreement that everyone needs to do more work. Each of the group sub-products needed further development and articulation. The organizing committee will take responsibility for directing the next steps.