Many libraries in the United States and in Japan have not yet cataloged all of their holdings, and may not have all of these records in machine-readable format. A shortage of staff, sometimes rocky transitions from manual to automated cataloging workflows, and the "information explosion" have created arrearages, or backlogs, in cataloging departments that most institutions do not publicize. The discipline of cataloging has devised methods and policies to describe physical artifacts such as books, periodicals, microforms, sound recordings, and maps. These descriptions are largely based on the physical "container" in which the information resides, and thus are considered format-based description. (For more on bibliographic description, see AACR2.) Intellectual description, that is, data about the subject of the information in the "container" and a classification number reflecting subject analysis, is created by catalogers. The Library of Congress develops and maintains a very large thesaurus of controlled subject headings with documentation of references and related terms (see LC Subject Headings). The Library of Congress also maintains the LC Classification Schedule and the Dewey Decimal Classification Schedule. Variants of these classification schemes are being used by libraries in Japan. The library community's cooperation in development of these systems has made possible interoperability and interchange of bibliographic information on a global basis. Cooperatively built databases of bibliographic information such as OCLC and RLIN in the United States, and NACSIS in Japan, provide economies of scale as libraries collectively create and share the world's bibliography.
A discussion of "granularity," or the level at which an item is described, is a conceptual key for understanding digital information organization. "Item level" cataloging is probably most familiar to readers as they use online library catalogs to find monographs and multimedia materials. That is, one cataloging record is made for one work. Archives and special collections often catalog at the "collection level," insofar as it is not feasible to individually describe every letter in a huge archive or assign meaningful classification numbers to millions of photographs. With indexed journal articles, the "item" to be cataloged might be the title of the journal along with an accounting of the individual issues, or holdings. Article-level indexing information gives further description of the intellectual content of "pieces" of each issue of the journal. Conversely, one catalog entry might exist only at the title level of the serial publication without the more in-depth indexing information. Clearly, the article-level indexing provides greater access and description; it is also more expensive and labor-intensive to create and maintain. The topic of granularity of description is important because the creation of cataloging data is one of the more expensive aspects of traditional library methods of providing access to materials.
The digital world requires this same cataloging data as well as information necessary to structure and present electronic documents. The library community is cooperating with professionals from the computer science, text encoding, and museum communities to develop the Dublin Core metadata standard, a fifteen-element set to describe digital resources (see Dublin Core Metadata). Generally, metadata as discussed in this context falls into three major categories.
Physical/structural metadata is information about the digital object and its relationship to other digital objects in a repository. Structural metadata might include file location on a server or in a repository; file format; file size; relationship to other files; sequence; or date of creation. For example, a sequence of 35 mm photographic negatives may have been imaged. To present the negatives in the order in which the photographer created them, information is needed to structure the images in the original sequence in addition to the format and size of the file, date of creation, and internal numbering scheme. Extending the analogy to books on library shelves, the equivalent data would be the shelf number (physical location); number of pages (size); the fact that the pages are numbered (sequencing); and binding (defining the item). This information offers a logical way to structure these materials as well as a means to indicate their provenance.
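The negatives example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the StructuralMetadata record, its field names, the file paths, and the in_original_sequence helper are hypothetical and do not come from any cataloging standard or repository system. The sketch shows how sequence information stored with each file lets a system restore the photographer's original order even when the files arrive unordered.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class StructuralMetadata:
    """Hypothetical structural record for one scanned negative."""
    file_location: str    # where the file lives in the repository (assumed path)
    file_format: str      # file format, written here as a MIME type
    file_size_bytes: int  # file size
    created: date         # date the digital file was created
    roll_id: str          # relationship to other files: the roll it belongs to
    frame_number: int     # internal numbering scheme on the roll


def in_original_sequence(records):
    """Return records ordered by roll, then by frame number within the roll."""
    return sorted(records, key=lambda r: (r.roll_id, r.frame_number))


# Illustrative data only; the files are deliberately listed out of order.
frames = [
    StructuralMetadata("rolls/r01/f003.tif", "image/tiff", 18_200_000, date(1997, 3, 4), "r01", 3),
    StructuralMetadata("rolls/r01/f001.tif", "image/tiff", 18_150_000, date(1997, 3, 4), "r01", 1),
    StructuralMetadata("rolls/r01/f002.tif", "image/tiff", 18_310_000, date(1997, 3, 4), "r01", 2),
]

ordered = in_original_sequence(frames)
for record in ordered:
    print(record.frame_number, record.file_location)
```

Sorting on the pair (roll, frame number) rather than on file name or creation date is a deliberate choice in this sketch: names and dates can be changed by later processing, whereas the recorded sequence reflects the order of the original material.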
The term intellectual metadata refers to information that provides access to the subject or content of a digital object. Intellectual metadata can be thesaurus terms associated with a file or item; indexing achieved by full text search and retrieval as described in Dr. Croft's Chapter 5; classification according to standard schema; and associations with related sources of intellectual data such as bibliographies, archival finding aids, or cataloging records. Again using the example of 35 mm negatives from a roll of film, intellectual metadata would consist of who created the images (photographer/author), controlled or uncontrolled vocabulary terms describing the images, and perhaps a classification number. The related data might be the photographer's captions or references to a work in which the images were published.
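As a rough illustration of how intellectual metadata supports subject access, the following Python sketch attaches a creator, vocabulary terms, an optional classification, and related sources to each image, then builds a simple term-to-object index. The IntellectualMetadata record, its field names, the build_subject_index helper, and all sample values are assumptions made for this example; they do not represent a real thesaurus or classification schedule.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class IntellectualMetadata:
    """Hypothetical intellectual record for one digital object."""
    object_id: str
    creator: str                                       # photographer/author
    subject_terms: List[str] = field(default_factory=list)
    classification: Optional[str] = None               # left empty if not assigned
    related_sources: List[str] = field(default_factory=list)  # captions, citations


def build_subject_index(records) -> Dict[str, List[str]]:
    """Map each normalized subject term to the objects described by it."""
    index = defaultdict(list)
    for record in records:
        for term in record.subject_terms:
            index[term.strip().lower()].append(record.object_id)
    return dict(index)


# Illustrative records; names and terms are invented for the example.
records = [
    IntellectualMetadata("neg-001", "Example Photographer",
                         subject_terms=["Bridges", "Rivers"],
                         related_sources=["Photographer's caption for frame 1"]),
    IntellectualMetadata("neg-002", "Example Photographer",
                         subject_terms=["bridges"]),
]

index = build_subject_index(records)
print(index["bridges"])
```

The lower-casing step stands in for vocabulary control in a very small way: without some normalization, "Bridges" and "bridges" would be indexed as different subjects, which is the kind of inconsistency a controlled thesaurus is meant to prevent.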
The term rights and permissions/access management metadata describes, in functional terms, the goal of encoding rights and permissions information at the computer file level for digital objects. For example, an archival collection of photographs may have been made available to researchers, one at a time, in a special collections reading room for many years. However, the literary trustee of that collection may nonetheless object to widespread dissemination of these images on the World Wide Web. Thus, rights and permissions data must be associated with each image to indicate its status for distribution.
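A minimal Python sketch of the reading-room example follows. The Channel values, the RightsMetadata record, and the may_distribute check are hypothetical names chosen for illustration, not part of any rights-management standard; actual rights statements are usually far more detailed. The point is only that a per-image rights record lets software decide distribution status file by file.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Channel(Enum):
    """Hypothetical distribution channels."""
    READING_ROOM = "reading_room"
    WORLD_WIDE_WEB = "world_wide_web"


@dataclass(frozen=True)
class RightsMetadata:
    """Hypothetical rights record attached to a single image file."""
    object_id: str
    rights_holder: str
    allowed_channels: FrozenSet[Channel]


def may_distribute(rights: RightsMetadata, channel: Channel) -> bool:
    """True only if the rights record explicitly permits the channel."""
    return channel in rights.allowed_channels


# Illustrative record: on-site use is permitted, web dissemination is not.
photo_rights = RightsMetadata(
    object_id="photo-0042",
    rights_holder="Literary trustee of the collection",
    allowed_channels=frozenset({Channel.READING_ROOM}),
)

print(may_distribute(photo_rights, Channel.READING_ROOM))
print(may_distribute(photo_rights, Channel.WORLD_WIDE_WEB))
```

The check is written so that anything not explicitly allowed is refused, which seems the safer default when a rights holder's wishes are unknown or incomplete.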
As mentioned previously, the creation of metadata has traditionally been one of the most expensive aspects of making library materials available. At the Library of Congress, one of the key decision factors when selecting collections for digitization is whether or not cataloging information already exists for a collection, especially in terms of intellectual metadata. The costs of scanning and even having text keyed and proofed are minimal in comparison with paying subject experts and professional catalogers to describe materials according to standardized methodology. Strategies for minimal-level cataloging and reducing cataloging access points abound, but the activity remains very expensive.
There are aspects of current library practices in the United States and Japan that affect the feasibility of a global digital library at many levels. The costs and complexity of creating appropriate data needed to present materials reformatted digitally have been discussed. Additionally, if this information is not created accurately and with future presentation needs in mind, the digital materials can be unusable. For example, if a book is mislabeled or misshelved in library stacks, it can still be found through inspection by a deck attendant or by users. Conversely, if a digital file is not linked correctly to its related bibliographic record, finding aid, or previous pages in a sequence, it is essentially irretrievable. Thus, data must be created and checked for quality at a high level to ensure usability (see LC RFP 96-18).
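One form such a quality check could take is sketched below in Python. It is an assumption-laden illustration, not the procedure specified in the RFP: the dictionary keys ("id", "record_id", "previous_page"), the find_broken_links function, and the sample identifiers are all hypothetical. The sketch flags any digital file whose bibliographic record or preceding page cannot be found.

```python
def find_broken_links(files, known_record_ids):
    """Return (file id, problem) pairs for files with dangling links.

    files: list of dicts with hypothetical keys "id", "record_id",
           and "previous_page" (None for the first page in a sequence).
    known_record_ids: set of bibliographic record ids that actually exist.
    """
    file_ids = {f["id"] for f in files}
    problems = []
    for f in files:
        if f["record_id"] not in known_record_ids:
            problems.append((f["id"], "missing bibliographic record"))
        previous = f["previous_page"]
        if previous is not None and previous not in file_ids:
            problems.append((f["id"], "missing previous page"))
    return problems


# Illustrative data: page 3 points at a record and a page that do not exist.
files = [
    {"id": "p1", "record_id": "bib-100", "previous_page": None},
    {"id": "p2", "record_id": "bib-100", "previous_page": "p1"},
    {"id": "p3", "record_id": "bib-999", "previous_page": "p2a"},
]
known_record_ids = {"bib-100"}

problems = find_broken_links(files, known_record_ids)
for file_id, problem in problems:
    print(file_id, problem)
```

A check of this kind is cheap to run over an entire collection, which matters because, unlike a misshelved book, a mislinked file cannot be found by browsing.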