EXECUTIVE SUMMARY
INTRODUCTION
Digital information organization (DIO) refers to methods of rendering large
amounts of information into digital form so it can be stored, retrieved and
manipulated by computer. An example of digital information organization is the
digital library, a storehouse of largely unstructured text documents that is
useful only if it can be searched readily. Another example is the digital
museum, which contains pictorial and three-dimensional objects that are much
more difficult to digitize and search than text, which is susceptible to
scanning and optical character recognition (OCR). Many other applications of
DIO exist, including corporate databases, videotape collections, map
information, census statistics and financial data.
The rapid rise in computer and Internet use has resulted in vast quantities
of digital information being created and transmitted. For
example, virtually all business documents are now created in digital form,
either by computers directly (in the case of machine-generated forms) or by
humans using word processing software. The fact that this material is digitized
makes it amenable to automated storage and retrieval. The sheer volume of it,
however, makes it imperative to develop suitable organizational techniques.
The very health of institutions depends on their ability to manage
information effectively, whether for educational, research, business, military
or governmental purposes. Therefore, DIO is a critical technology for entities
of all sizes, from small corporations to government departments and even entire
nations.
DIO systems employ an amalgam of various technologies, including scanning,
OCR, digital storage techniques, data compression, indexing and search
algorithms, display devices and the Internet. These technologies must be
integrated properly and scaled to enormous proportions to allow humans to deal
effectively with the flood of digital information now being made available.
STUDY OBJECTIVES AND PROCESSES
The purpose of this WTEC study was to investigate Japanese hardware and
systems for DIO, with a focus on digital libraries. The study was supported by
the National Science Foundation (NSF) and the Defense Advanced Research
Projects Agency (DARPA). The panel members were divided into two teams, one of
which visited primarily academic and library sites, while the other focused on
commercial organizations. The WTEC panel visited 18 sites during its one-week
visit to Japan (March 23-27, 1998): nine corporations, five universities, three
libraries and one museum.
The teams were composed of professionals from different disciplines who made
observations in the following areas:
- technological developments in hardware and software
  - system architectures
  - fast search techniques
  - multilingual capabilities
  - multimedia systems
  - practical DIO applications
- cooperation among government, industry and universities
- economic and policy issues
  - government funding
  - models for fee collection for use of digital materials
  - legal issues, particularly copyright
UNITED STATES - JAPAN COMPARISONS
The WTEC team was interested in comparing the relative progress of the
United States and Japan on issues relating to DIO. The panel's conclusions are
informal, because team members, after a literature review, visited only 18
sites during a single week in 1998 and cannot claim to be
familiar with most developments in the United States and Japan.
However, some patterns emerged; they are summarized in Table ES.1.
Table ES.1
State of the Art of Digital Information Organization in Japan Compared to the
United States
| Area | Japan Status | Trend |
|---|---|---|
| Systems | 0 | ↑ |
| Display technology | + | ↑ |
| Virtual reality, immersive technology | + | ↑ |
| Architecture | 0 | ↑ |
| Digitization of content | + | |
| Utilization of digitized content | - | |
| Catalog accessibility | - | |
| Catalog scalability | 0 | |
| Text search | - | |
| Translingual search | 0 | |
| Image/video processing | - | |
| K-12 education using digital techniques | - | |
| Commercialization of digital libraries | + | ↑ |
| Digital library policy | + | |
The notation "+" means Japan is perceptibly ahead of the United States in
capability; "-" means Japan is perceptibly behind; "0" means no significant
difference was observed; and a blank means no conclusion could be drawn. An
upward arrow indicates that any observed difference is likely to increase in
the future, or, if there is no difference, that Japan is believed to be
improving relative to the United States.
For both the United States and Japan, the following issues must be addressed
if digital library efforts are to progress expeditiously:
- digital library policy
- intellectual property rights
- scalability
- translingual and multiple character set capabilities
- architecture for global indexing, search and access
- sharing content; unless institutions are able and willing to share digital
materials, a worldwide library can never be realized
CONCLUSIONS
Systems and Architecture
- DIO systems in the United States and Japan are based on common, integrated
technologies that provide a spectrum of services such as information capture,
cataloging (metadata), indexing, storage, search/query, retrieval, asset/rights
management, security and distribution.
- Japanese businesses are reengineering their operations to take advantage of
advances in information processing, which in turn drives the development of new
architectures.
- Japan concentrates on customized system developments rather than
off-the-shelf or reusable components.
- Japanese corporations are building hardware and system components
specifically directed to capturing the DIO market, incorporating advanced
security features such as digital watermarking, digital notaries and secure
archivers.
- Mission-specific, well-funded digital library efforts are driving
development of commercial systems and architectures. Many different systems and
architectures are employed in Japan's digital libraries and museums, some of
which are comprehensive and innovative.
- Japan's Next Generation Digital Library Project is developing a
multi-tiered reference architecture for future distributed libraries that uses
agent technology, messaging middleware and CORBA object management.
Text Processing
- Japan is producing extremely fast large-capacity hardware/software search
systems.
- Japan attempts to support Chinese, Japanese and Korean (CJK) text
retrieval. Cross-lingual retrieval, however, is limited.
- While Unicode is sometimes used, perceived deficiencies in its Asian
language support lead to the use of specialized representations.
- The Japanese are successfully combining search and browsing technologies
such as text clustering and thesaurus generation.
- Japan lacks significant information retrieval (IR) datasets.
- There is little sharing of IR technology in Japan between organizations,
resulting in much reinvention.
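
The point above about specialized representations can be illustrated with a minimal sketch. The report does not say which representations the visited sites used; the sample string and the choice of Shift_JIS, EUC-JP and ISO-2022-JP are assumptions, picked only because they are well-known Japanese encodings available in Python's standard codecs. The sketch shows that the same text maps to different byte sequences under each encoding, so a retrieval system must know the encoding of every document it indexes.

```python
# Illustrative only: the sample text and the encodings listed here are
# assumptions, not the representations used by any system in the report.
text = "日本語"  # "Japanese language", three kanji characters

# Three legacy Japanese encodings plus UTF-8 (a Unicode encoding).
encodings = ["shift_jis", "euc_jp", "iso2022_jp", "utf-8"]

# Encode the same string under each scheme.
encoded = {name: text.encode(name) for name in encodings}

for name, data in encoded.items():
    # The byte sequences differ even though the characters are identical.
    print(f"{name:>10}: {len(data):2d} bytes  {data.hex()}")
    # Decoding with the matching codec recovers the original text.
    assert data.decode(name) == text
```

A byte-level index built over one encoding would not match queries expressed in another, which is one practical reason multilingual systems normalize text to a single internal representation before indexing.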
Digital Imaging and Multimedia
- Japan leads the United States in digital display development by about two
years.
- Japan is on a par with the United States in digital image acquisition.
- Japan lags behind the United States in Internet use by about two
years.
- Entire businesses and business units in Japan are devoted to multimedia
development. A major theme of this development is kansei, a Japanese
term meaning roughly that the look and feel of the system must harmonize with
the task being performed.
- Japan possesses advanced video storage and retrieval systems, including
such functions as scene segmentation, face tracking, caption recognition,
similarity matching and keyword-based retrieval.
- Japan leads in virtual reality and immersive experience environments.
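
Scene segmentation, one of the video functions listed above, can be sketched with a classic histogram-difference approach. This is a textbook technique shown under stated assumptions, not the method used by any Japanese system in the report; the function names, the bin count, the threshold and the synthetic frames are all hypothetical.

```python
def histogram(frame, bins=8):
    """Normalized intensity histogram of a frame (a list of 0..255 pixels)."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // 256] += 1
    total = len(frame)
    return [c / total for c in counts]


def shot_boundaries(frames, threshold=0.5):
    """Return indices where a new scene is assumed to start.

    A boundary is declared when the L1 distance between the histograms of
    consecutive frames exceeds `threshold` (an assumed, untuned value).
    """
    cuts = []
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        cur = histogram(frames[i])
        diff = sum(abs(a - b) for a, b in zip(prev, cur))
        if diff > threshold:
            cuts.append(i)
        prev = cur
    return cuts


# Synthetic "video": five dark frames, five bright frames, five mid-gray frames.
dark = [[20, 25, 30, 22]] * 5
bright = [[220, 230, 225, 240]] * 5
mid = [[120, 125, 130, 128]] * 5
frames = dark + bright + mid

print(shot_boundaries(frames))  # prints [5, 10]
```

Real footage needs more robust features and adaptive thresholds, since gradual transitions and camera motion also change the histogram.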
Cataloging: Description, Access and Scalability
- Japanese libraries are digitizing catalogs on a grand scale: Kyoto
University expects to have over a million items in its catalog by the year
2000.
- The Japanese are more willing than Americans to handle and scan rare and
fragile documents.
- Reluctance of publishers to make content available digitally, even for a
fee, is a severe barrier to access.
- Digitization efforts in Japan are performed independently by different
organizations; resources are rarely shared.
- Japan is participating in developing international metadata standards.
- Content production and metadata generation, which require substantial human
effort, are inherently non-scalable.
- The cost of scanning and indexing is minimal compared with the cost of
metadata creation.
- Keyword-based search methods do not scale well because the number of query
hits increases with the size of the collection.
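
The scaling claim for keyword search can be illustrated with a small simulation. This is a minimal sketch on synthetic data: the vocabulary, document length, keyword and random seed are assumptions chosen for illustration, and the helper names are hypothetical.

```python
import random


def build_collection(n_docs, vocab, words_per_doc=5, seed=0):
    """Create synthetic documents, each a set of randomly chosen terms."""
    rng = random.Random(seed)
    return [set(rng.choices(vocab, k=words_per_doc)) for _ in range(n_docs)]


def count_hits(collection, keyword):
    """Number of documents that contain the keyword."""
    return sum(1 for doc in collection if keyword in doc)


vocab = [f"term{i}" for i in range(50)]
docs = build_collection(10_000, vocab)

# Query growing prefixes of the same collection with one fixed keyword.
hits = {n: count_hits(docs[:n], "term7") for n in (100, 1_000, 10_000)}

for n, h in hits.items():
    print(f"{n:>6} documents -> {h} hits")
```

Because each document has roughly the same chance of containing the keyword, the hit count grows about in proportion to the collection size, so a query that returns a manageable list on a small collection returns an unmanageable one on a large collection unless results are ranked or filtered.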
Education Using Digital Libraries
- Distributed digital libraries will revolutionize education and learning,
particularly in the area of distance education.
- Digital libraries will provide the following:
  - resources for teaching and curriculum development
  - environments for learning and exploration
  - environments for publishing and broadcasting (digital journals)
- Japan has mandated and funded the development of all-digital libraries.
Nara Institute of Science and Technology (NAIST) maintains an operational
all-digital library.
- Japan is establishing centers that will focus on the creation of digital
multimedia content.
- Japan is creating programs to train professional digital librarians. NAIST
is discussing the possibility of establishing a graduate program for digital
librarians.
- The impact of digital libraries on K-12 education in Japan seems to be
minimal.
Policy, Intellectual Property and Economics
- Japan has a clearly articulated national information infrastructure policy
that views DIO as crucial to an "advanced information society." The policy
attempts to establish the following:
  - worldwide information access in each Japanese home
  - library networks
  - multimedia centers
  - high definition television (HDTV)
- Japan allows and promotes cooperation among different government agencies,
universities and corporations.
- Japan's copyright system is amenable to the digitization and distribution
of digital information for the following reasons:
  - multimedia copyright is more highly developed in Japan
  - fair use is broader in Japan than in the United States
  - Japanese law provides for extensive compulsory licensing
  - more licenses can be obtained from Japanese performing rights
    societies
- Japan is more accepting of automated "meter-click" charging mechanisms
Published: February 1999; WTEC