Site: Fujitsu Laboratories, Ltd.
Multimedia Systems Laboratories (MSL)
4-1-1 Kamikodanaka
Nakahara-ku
Kawasaki 211-88, Japan
http://www.fujitsu.co.jp/hypertext/flab/index-e.html

Date Visited: 23 March 1998

WTEC Attendees: M. Shamos (report author), T. Ager, L. Goldberg, R.D. Shelton

Hosts:

BACKGROUND

Fujitsu Limited is a $36 billion diversified company focused on personal computers, server systems, network computing and electronic devices, including mainframes and disk drives. Fujitsu Laboratories, Ltd. was formed in 1968 as a wholly owned subsidiary of Fujitsu. It has 1,500 employees and seven laboratory divisions. The panel visited the Multimedia Systems Laboratories, which develops high-speed, large-capacity multimedia processing and infrastructure technology. Development of networked multimedia information service products takes place at the Personal Systems Laboratories in Akashi and Fukuoka. This laboratory was represented by Mr. Tanahashi.

Mr. Akimoto gave an introduction to Fujitsu's Media Integration Laboratory. Fujitsu management's view of a digital library goes beyond the academic setting to encompass corporate information management. Digital library development takes place within Fujitsu, not Fujitsu Laboratories.

DIGITIZING CORPORATE INFORMATION

Dr. Naoi presented the research being performed in the Media Integration Laboratory. Fujitsu is working on corporate information management, part of which involves digitizing archival paper records. It is not sufficient to perform OCR on such materials because they must also be indexed. Fujitsu has developed a system for identifying document titles from font, location, size and separation data. It requires about one second per page on a 166 MHz Pentium and identifies the title correctly more than 90% of the time. The correct title is among the first three candidates 97% of the time.
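The report does not describe how the title identifier combines its cues. The following is only a minimal sketch of one way to rank candidate text blocks by size, location and separation. The TextBlock fields, the weights and the function names are all hypothetical, chosen for illustration, and are not Fujitsu's method.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    text: str
    font_size: float   # point size of the text in the block
    top: float         # distance from the top of the page, as a fraction 0..1
    gap_below: float   # white space below the block, in points


def title_score(block, body_font_size):
    """Score a block as a title candidate. The weights are illustrative guesses."""
    size_ratio = block.font_size / body_font_size           # larger type scores higher
    position = 1.0 - block.top                               # nearer the top scores higher
    separation = min(block.gap_below / body_font_size, 3.0) / 3.0  # isolated blocks score higher
    return 0.5 * size_ratio + 0.3 * position + 0.2 * separation


def rank_title_candidates(blocks, body_font_size, k=3):
    """Return the text of the k highest-scoring blocks, best first."""
    ranked = sorted(blocks, key=lambda b: title_score(b, body_font_size), reverse=True)
    return [b.text for b in ranked[:k]]


page = [
    TextBlock("Internal memo", 10, 0.02, 4),
    TextBlock("Quarterly Sales Report", 18, 0.10, 24),
    TextBlock("Sales rose in the third quarter.", 10, 0.20, 2),
]
candidates = rank_title_candidates(page, body_font_size=10)
```

Returning the top three candidates rather than a single answer mirrors the way the accuracy figures above are reported: an operator can confirm the title from a short list when the first choice is wrong.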

If the documents have a tabular rather than free-form structure, such as business forms with known or learnable field locations, the system is able to achieve nearly 100% recognition of the form type in 1.5 seconds at any resolution of 200 dpi or greater, even in the presence of noise or incomplete input. It operates by building a forms dictionary, recognizing pairs of parallel lines and extracting keywords that identify fields. It then matches lines and scanned keywords from the input document to isolate the form type.
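The matching step can be sketched as follows. This is an illustrative approximation only: the dictionary layout, the scoring rule (an average of the keyword hit rate and the ruled-line hit rate) and the position tolerance are assumptions, and the sketch matches individual ruled lines rather than the pairs of parallel lines mentioned above.

```python
def line_matches(ref_line, scanned_lines, tolerance):
    """True if some scanned line has the same orientation and a nearby position."""
    orientation, position = ref_line
    return any(o == orientation and abs(p - position) <= tolerance
               for o, p in scanned_lines)


def form_score(entry, scanned_keywords, scanned_lines, tolerance):
    """Average of the keyword hit rate and the ruled-line hit rate for one form type."""
    keywords, lines = entry["keywords"], entry["lines"]
    keyword_rate = len(keywords & scanned_keywords) / len(keywords)
    line_rate = sum(line_matches(r, scanned_lines, tolerance) for r in lines) / len(lines)
    return (keyword_rate + line_rate) / 2


def identify_form(scanned_keywords, scanned_lines, forms_dictionary, tolerance=0.02):
    """Return (best form type, score); partial input lowers the score but can still match."""
    found = set(scanned_keywords)
    scores = {name: form_score(entry, found, scanned_lines, tolerance)
              for name, entry in forms_dictionary.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical forms dictionary: lines are (orientation, position as a page fraction).
FORMS = {
    "invoice": {"keywords": {"Invoice No.", "Amount", "Due Date"},
                "lines": [("h", 0.10), ("h", 0.30), ("v", 0.50)]},
    "purchase_order": {"keywords": {"Order No.", "Quantity", "Ship To"},
                       "lines": [("h", 0.15), ("h", 0.45), ("v", 0.25)]},
}

# A noisy scan: one keyword missing and line positions slightly shifted.
best, score = identify_form({"Amount", "Due Date"},
                            [("h", 0.11), ("h", 0.30), ("v", 0.49)], FORMS)
```

Scoring against every dictionary entry, instead of requiring an exact match, is what lets a scheme of this kind tolerate noise and incomplete input.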

SECURE ARCHIVER

Mr. Muramatsu explained Fujitsu's digital security technology, which relates to four areas:

Transmission of reliable documents is performed by secure archivers at the transmitting and receiving ends. The devices code and confirm transmission. If necessary, the recipient's copy can now be encoded as an "original" while the sender's version becomes a copy. Through internal clock records, the original of any document can be tracked, and a list of recipients and retransmitters generated. Revisions can be controlled and portions modified after initial creation can be identified.
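The report gives no detail on how the secure archiver identifies modified portions. One common general technique, shown here purely as an assumption and not as Fujitsu's design, is to record a cryptographic digest of each portion when the document is created and to compare digests later. The function names and the choice of SHA-256 are illustrative.

```python
import hashlib


def fingerprint(sections):
    """Return one SHA-256 digest per section of the document."""
    return [hashlib.sha256(s.encode("utf-8")).hexdigest() for s in sections]


def modified_sections(original_fingerprint, current_sections):
    """Return the indices of sections that were changed, added or removed."""
    current = fingerprint(current_sections)
    changed = [i for i, (a, b) in enumerate(zip(original_fingerprint, current)) if a != b]
    shorter = min(len(original_fingerprint), len(current))
    longer = max(len(original_fingerprint), len(current))
    changed.extend(range(shorter, longer))  # sections present in only one version
    return changed


contract = ["Parties: A and B", "Price: 100 yen", "Term: 1 year"]
recorded = fingerprint(contract)  # stored when the document is created
revised = ["Parties: A and B", "Price: 900 yen", "Term: 1 year"]
changed = modified_sections(recorded, revised)
```

A real archiver would also need to protect the recorded digests and timestamps themselves, for example by signing them; that part is outside this sketch.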

The secure archiver is the first commercial technology of its kind. Such technology is critical for digital libraries because of the need to detect impostor documents and those that have been altered without the consent of the original author.

Japan is considering establishment of an electronic notarization office, primarily for dating and archiving contractual agreements between corporations. While it will be inexpensive to have documents electronically notarized, the office is not intended for routine commercial transactions, such as credit card purchases, whose volume would be overwhelming.

DIGITAL LIBRARY PRODUCTS

Mr. Horii explained Fujitsu's digital library program, which consists of government projects, joint research with universities and product development. The Electronic Library Research Group, chaired by Prof. Nagao (now President of Kyoto University), was formed in 1990 to study the functions and problems of digital libraries. Fujitsu joined in 1992 and developed Ariadne, a prototype digital library search and retrieval system, in 1994. The remainder of the group includes 3-4 universities (including the University of Library and Information Science) and about nine companies.

Fujitsu's digital library product is called iLis and incorporates concepts from Prof. Nagao's Ariadne system. An iLis search was demonstrated in a test database of several thousand documents. The system searches bibliographic data, tables of contents and body text. It has the capability of searching for synonyms, translated words and inflected forms. The user's personal search history is maintained. Search results can be viewed via a page-turning system and can be read horizontally or vertically, the latter being more convenient for Japanese. Output can be read aloud by the computer, although the voice is typically mechanical. iLis provides a mechanism for users to communicate with librarians so they can be assisted in search functions. This is referred to as a "question-and-answer" system, although the answering is performed by humans, not software.

A MITI-funded next-generation digital library project at Fujitsu employs about 20-30 people. The project is scheduled to last until 1999. Fujitsu's role is to develop retrieval technologies and integrate them into a new prototype system.

Mr. Matsui presented Fujitsu's Terass (Terabyte Search Server) product, which uses inverted indexing to search a gigabyte file in about 1/20 second. Japanese full-text search is complicated by the fact that the language does not use word separators. Terass can search for any string, regardless of whether it consists of complete words. Part of the Terass technology is its index structure, which stores index pointers as gaps between occurrences and exploits the low frequency of occurrence of Japanese character strings. Japanese kanji characters do not repeat nearly as often as English words, so storing gaps is efficient; the method would not work on English words. On a 1.6 GB Japanese-language patent database, Terass was able to create an index in 4.3 hours that allowed searching in 0.23 seconds for OR queries. For the previous system, creating an index took 22.3 hours and searching took 19.2 seconds for OR queries.
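The details of the Terass index are not given in this report. As a rough illustration of the general idea, the sketch below builds a character-bigram inverted index, stores each postings list as gaps between successive positions, and finds arbitrary substrings without relying on word boundaries. The bigram size, the function names and the sample text are assumptions for illustration only.

```python
from collections import defaultdict


def build_index(text, n=2):
    """Map each character n-gram to its positions, stored as a first position plus gaps."""
    positions = defaultdict(list)
    for i in range(len(text) - n + 1):
        positions[text[i:i + n]].append(i)
    return {gram: [pos[0]] + [b - a for a, b in zip(pos, pos[1:])]
            for gram, pos in positions.items()}


def decode(gaps):
    """Turn a gap-encoded postings list back into absolute positions."""
    out, total = [], 0
    for g in gaps:
        total += g
        out.append(total)
    return out


def search(index, query, n=2):
    """Return the start positions of every occurrence of query in the indexed text."""
    if len(query) < n:
        raise ValueError("query must contain at least n characters")
    candidates = None
    for offset in range(len(query) - n + 1):
        gram = query[offset:offset + n]
        # Shift each n-gram position back to the implied start of the query.
        starts = {p - offset for p in decode(index.get(gram, []))}
        candidates = starts if candidates is None else candidates & starts
        if not candidates:
            return []
    return sorted(candidates)


text = "東京都特許許可局の特許"
index = build_index(text)
hits = search(index, "特許")
```

Because the index is keyed on overlapping character n-grams rather than words, a query such as "京都" is found inside "東京都" even though it is not a word there. This is the kind of arbitrary-string search that a language without word separators calls for.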

Mr. Ushioda explained cross-lingual information retrieval. The problem being worked on is to retrieve English documents related to a given Japanese document. The terms appearing in the source document are ordered by frequency of appearance. Each term is then mapped to a corresponding English term (translation). The set of terms in the source document can be regarded as a vector in a vector space of the corresponding English terms. The magnitude of each coordinate of the vector is related to the frequency of occurrence. Each English document in the collection to be searched can also be viewed as a vector in the same coordinate system. The inner product of the source vector with each vector of the target population is calculated and the resulting hits ordered by decreasing magnitude. This method succeeds in retrieving highly relevant material.
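The procedure can be sketched in a few lines. The bilingual dictionary, the sample documents and the function names below are hypothetical. A real system would need Japanese term extraction and a way to handle terms with several translations, neither of which the report describes.

```python
from collections import Counter


def translate_terms(source_terms, dictionary):
    """Build an English term-frequency vector from the Japanese source terms."""
    vector = Counter()
    for term in source_terms:
        english = dictionary.get(term)
        if english is not None:  # terms with no known translation are dropped
            vector[english] += 1
    return vector


def inner_product(u, v):
    """Inner product of two sparse term-frequency vectors."""
    return sum(weight * v[term] for term, weight in u.items() if term in v)


def rank_documents(source_terms, dictionary, english_docs):
    """Return (name, score) pairs for documents with a nonzero score, best first."""
    query = translate_terms(source_terms, dictionary)
    scored = [(name, inner_product(query, Counter(text.lower().split())))
              for name, text in english_docs.items()]
    return sorted([(n, s) for n, s in scored if s > 0], key=lambda x: (-x[1], x[0]))


dictionary = {"検索": "retrieval", "文書": "document", "索引": "index"}
source_terms = ["検索", "文書", "検索", "索引"]  # terms taken from the Japanese document
english_docs = {
    "a": "document retrieval and retrieval evaluation",
    "b": "index compression for a document collection",
    "c": "weather forecast for tokyo",
}
ranking = rank_documents(source_terms, dictionary, english_docs)
```

In this toy collection document "a" ranks first because it shares the most frequent translated term with the source. Raw inner products favor long documents, so practical systems usually normalize the vectors, a refinement the report does not mention.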


Published: February 1999; WTEC Hyper-Librarian