Site: Fujitsu Laboratories, Ltd.
Multimedia Systems Laboratories (MSL)
Kawasaki 211-88, Japan
Date Visited: 23 March 1998
WTEC Attendees: M. Shamos (report author), T. Ager, L. Goldberg, R.D. Shelton
Fujitsu Limited is a $36 billion diversified company focused on personal computers, server systems, network computing and electronic devices, including mainframes and disk drives. Fujitsu Laboratories, Ltd. was formed in 1968 as a wholly owned subsidiary of Fujitsu. It has 1,500 employees and seven laboratory divisions. The panel visited the Multimedia Systems Laboratories, which develops high-speed, large-capacity multimedia processing and infrastructure technology. Development of networked multimedia information service products takes place at the Personal Systems Laboratories in Akashi and Fukuoka. This laboratory was represented by Mr. Tanahashi.
Mr. Akimoto gave an introduction to Fujitsu's Media Integration Laboratory. Fujitsu management's view of a digital library goes beyond the academic setting to encompass corporate information management. Digital library development takes place within Fujitsu, not Fujitsu Laboratories.
Dr. Naoi presented the research being performed in the Media Integration Laboratory. Fujitsu is working on corporate information management, part of which involves digitizing archival paper records. It is not sufficient to perform OCR on such materials because they must also be indexed. Fujitsu has developed a system for identifying document titles from font, location, size and separation data. It requires about one second per page on a 166 MHz Pentium and identifies the title correctly more than 90% of the time. The correct title is among the first three candidates 97% of the time.
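The report does not describe how the layout cues are combined. As a rough illustration only, the following Python sketch ranks text blocks as title candidates using a weighted score over font size, page position and separation, and returns the top three. The TextBlock fields, the weights and the function names are hypothetical and are not taken from Fujitsu's system.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    text: str
    font_size: float   # type size in points
    top: float         # distance from the top of the page, as a fraction 0..1
    gap_below: float   # white space before the next block, in line heights


def title_score(block, body_font_size):
    """Score a block as a title candidate (weights are illustrative guesses)."""
    size_term = block.font_size / body_font_size     # larger type scores higher
    position_term = 1.0 - block.top                  # nearer the top scores higher
    separation_term = min(block.gap_below, 3.0) / 3.0  # more isolation scores higher
    return 2.0 * size_term + position_term + separation_term


def rank_title_candidates(blocks, body_font_size, k=3):
    """Return the text of the k highest-scoring blocks, best first."""
    ranked = sorted(blocks, key=lambda b: title_score(b, body_font_size),
                    reverse=True)
    return [b.text for b in ranked[:k]]
```

Returning several candidates rather than one mirrors the way the results are reported above, where the correct title is counted both as the top choice and as one of the first three.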
If the documents have a tabular rather than free structure, such as business forms with known or learnable field locations, the system is able to achieve nearly 100% recognition of the form type in 1.5 seconds at any resolution of 200 dpi or greater, even in the presence of noise or incomplete input. It operates by building a forms dictionary, recognizing pairs of parallel lines and extracting keywords that identify fields. It then matches lines and scanned keywords from the input document to isolate the form type.
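The matching step can be pictured with a small Python sketch. It assumes, purely for illustration, that each form in the dictionary is reduced to a set of parallel-line-pair features (orientation and spacing) and a set of field keywords, and that an input is scored by the fraction of each set that it reproduces. The feature encoding, the equal weighting and the names used here are hypothetical.

```python
def overlap(observed, expected):
    """Fraction of the expected features found in the input.

    Extra observed features (noise) are ignored, and missing ones only
    lower the score, so partial input can still match.
    """
    if not expected:
        return 0.0
    return len(set(observed) & set(expected)) / len(expected)


def identify_form(forms, observed_line_pairs, observed_keywords):
    """Return (form name, score) for the best-matching dictionary entry."""
    best_name, best_score = None, 0.0
    for name, entry in forms.items():
        score = (0.5 * overlap(observed_line_pairs, entry["line_pairs"])
                 + 0.5 * overlap(observed_keywords, entry["keywords"]))
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score
```

Scoring against the expected features rather than requiring an exact match is one simple way to tolerate the noise and incomplete input mentioned above.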
Mr. Muramatsu explained Fujitsu's digital security technology, which relates to four areas:
Transmission of reliable documents is performed by secure archivers at the transmitting and receiving ends. The devices code and confirm transmission. If necessary, the recipient's copy can now be encoded as an "original" while the sender's version becomes a copy. Through internal clock records, the original of any document can be tracked, and a list of recipients and retransmitters generated. Revisions can be controlled and portions modified after initial creation can be identified.
The secure archiver is the first commercial technology of its kind. Such technology is critical for digital libraries because of the need to detect impostor documents and those that have been altered without the consent of the original author.
Japan is considering establishment of an electronic notarization office, primarily for dating and archiving contractual agreements between corporations. While it will be inexpensive to have documents electronically notarized, the office is not intended for routine commercial transactions, such as credit card purchases, whose volume would be overwhelming.
Mr. Horii explained Fujitsu's digital library program, which consists of government projects, joint research with universities and product development. The Electronic Library Research Group, chaired by Prof. Nagao (now President of Kyoto University), was formed in 1990 to study the functions and problems of digital libraries. Fujitsu joined in 1992 and developed Ariadne, a prototype digital library search and retrieval system, in 1994. The remainder of the group includes three or four universities (including the University of Library and Information Science) and about nine companies.
Fujitsu's digital library product is called iLis and incorporates concepts from Prof. Nagao's Ariadne system. An iLis search was demonstrated in a test database of several thousand documents. The system searches bibliographic data, tables of contents and body text. It has the capability of searching for synonyms, translated words and inflected forms. The user's personal search history is maintained. Search results can be viewed via a page-turning system and can be read horizontally or vertically, the latter being more convenient for Japanese. Output can be read aloud by the computer, although the voice is typically mechanical. iLis provides a mechanism for users to communicate with librarians so they can be assisted in search functions. This is referred to as a "question-and-answer" system, although the answering is performed by humans, not software.
A MITI-funded next-generation digital library project at Fujitsu employs about 20-30 people. The project is scheduled to last until 1999. Fujitsu's role is to develop retrieval technologies and integrate them into a new prototype system.
Mr. Matsui presented Fujitsu's Terass (Terabyte Search Server) product, which uses inverted indexing to search a gigabyte file in about 1/20 second. Japanese full-text search is complicated by the fact that the language does not use word separators. Terass can search for any string, regardless of whether it consists of complete words. Part of the Terass technology is its index structure and algorithms, which store index pointers in a way that exploits the low frequency of occurrence of Japanese character strings. Japanese kanji characters do not repeat nearly as often as English letters do, so storing the gaps between occurrences is efficient. This method would not work on English words. On a 1.6 GB Japanese-language patent database, Terass created an index in 4.3 hours that allowed OR queries to be answered in 0.23 seconds. The previous system needed 22.3 hours to create its index, and OR queries took 19.2 seconds.
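The internal design of Terass was not described in detail. The following Python sketch shows one common way to obtain the two properties mentioned, searching for arbitrary strings without word boundaries and storing gaps rather than absolute positions. It assumes a character n-gram inverted index with delta-encoded position lists; this choice, and the names build_index, decode and search, are assumptions made for illustration and are not claimed to be the Terass structure.

```python
from collections import defaultdict
from itertools import accumulate


def build_index(text, n=2):
    """Map every character n-gram to its positions, stored as gaps.

    The first number is the first position; each later number is the
    distance from the previous occurrence.
    """
    positions = defaultdict(list)
    for i in range(len(text) - n + 1):
        positions[text[i:i + n]].append(i)
    return {gram: [p[0]] + [b - a for a, b in zip(p, p[1:])]
            for gram, p in positions.items()}


def decode(gaps):
    """Turn a gap list back into absolute positions."""
    return list(accumulate(gaps))


def search(index, query, n=2):
    """Return the start positions of every occurrence of query."""
    if len(query) < n:
        raise ValueError("query must contain at least n characters")
    candidates = None
    for offset in range(len(query) - n + 1):
        gram = query[offset:offset + n]
        # A match starting at s must contain this n-gram at s + offset.
        starts = {p - offset for p in decode(index.get(gram, []))}
        candidates = starts if candidates is None else candidates & starts
        if not candidates:
            return []
    return sorted(candidates)
```

Because the index is keyed on overlapping character sequences rather than words, a query such as a fragment that straddles two words is handled in the same way as a complete word.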
Mr. Ushioda explained cross-lingual information retrieval. The problem being worked on is to retrieve English documents related to a given Japanese document. The terms appearing in the source document are ordered by frequency of appearance. Each term is then mapped to a corresponding English term (its translation). The set of terms in the source document can be regarded as a vector in a vector space of the corresponding English terms. The magnitude of each coordinate of the vector is related to the frequency of occurrence. Each English document in the collection to be searched can also be viewed as a vector in the same coordinate system. The inner product of the source vector with each vector of the target population is calculated and the resulting hits are ordered by decreasing magnitude. This method succeeds in retrieving highly relevant material.
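The procedure can be sketched in a few lines of Python. The sketch assumes that the Japanese document has already been split into terms, that a one-to-one bilingual dictionary is available, and that raw term counts serve as the vector coordinates; the report says only that the coordinates are "related to" frequency, so the exact weighting is an assumption, as are the function names.

```python
from collections import Counter


def translate_vector(source_terms, dictionary):
    """Build an English term-frequency vector from Japanese source terms."""
    vector = Counter()
    for term, freq in Counter(source_terms).items():
        english = dictionary.get(term)
        if english is not None:      # terms without a translation are dropped
            vector[english] += freq
    return vector


def inner_product(u, v):
    """Inner product of two sparse term vectors."""
    return sum(weight * v[term] for term, weight in u.items() if term in v)


def rank_documents(source_terms, dictionary, english_docs):
    """Return names of matching English documents, highest score first."""
    query = translate_vector(source_terms, dictionary)
    scored = []
    for name, words in english_docs.items():
        score = inner_product(query, Counter(words))
        if score > 0:
            scored.append((score, name))
    # Sort by decreasing score; ties are broken by name for repeatability.
    return [name for score, name in sorted(scored, key=lambda x: (-x[0], x[1]))]
```

A practical system would probably normalize for document length and handle terms with several possible translations; neither refinement was described during the visit, so both are left out here.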