In this section, the overall impression of the text and IR research being done in Japan is summarized, and then it is compared to the work being done in the United States.
The first observation is that the Japanese community of computer and information scientists working in the IR and text-related areas is smaller than the comparable communities in the United States and Europe. As a result, Japanese research in these areas tends to follow directions and initiatives begun in the United States. Individual projects are of good quality and are producing interesting technology, but progress has been somewhat impeded by a lack of a Japanese version of TREC or equivalent test collections. Although the value of recall/precision measurements is hotly debated in the IR community, there is no doubt that the culture of experiment and comparison in IR and TREC has led to significant improvements in both the understanding and performance of text access techniques. There have been some efforts to develop test collections for Japanese and this has resulted in a recent Call for Participation for IREX (Japanese Information Retrieval and Extraction Exercise, http://cs.nyu.edu/cs/projects/proteus/irex). IREX is organized by a committee of people from Japanese companies and universities, and is modeled on the TIPSTER and TREC programs. In addition, because TREC has made Chinese collections available, there have been a large number of recent papers on Chinese text retrieval.
Text-related research in Japan covers essentially the same areas as the United States, although there continues to be a strong emphasis on indexing techniques and speed. The differences that arose from the language-dependent aspects of Japanese text are rapidly disappearing.
Japanese companies appear to be focusing on developing the best commercial Asian language search systems for applications in Japanese, Chinese and Korean. There is, however, considerable competition even in this area in that considerable research and development of Chinese IR is underway in China, Singapore, Taiwan and Hong Kong, and Korea has a substantially longer history of IR research than Japan. One general criticism is that there seems to be too much reinvention of basic IR technology in Japan. Nearly every group visited was developing its own search engine (or engines). Licensing of U.S. search engines with Japanese capability such as Verity or Infoseek is limited but may increase as it is demonstrated that search technology is essentially language-independent.
Current Japanese research and text search techniques do not offer significant benefits for English applications. The research is complementary to that being done in the United States, and the results tend to be incremental in nature. As the community of researchers in this area increases, however, we may expect to see more innovation and exploration of new ideas.
A number of groups in Japan are studying information visualization, architectures for scalable IR systems, and the application of natural language processing (NLP) techniques to IR. These are areas that could have a significant impact on the development of text-based systems. For example, the use of NLP techniques for IR has been studied in the United States for some time because of the obvious potential benefits of a system that "understands" the query better than a word-based system. Despite those potential benefits, research using quantitative evaluation based on test collections such as TREC has never demonstrated any retrieval effectiveness improvements from NLP. On the other hand, there is some evidence that language-based techniques may work better in Japanese than in English (Fujii 1997), and this may lead to a better understanding of text retrieval in general. Information visualization is another area where the opportunity exists for substantial innovation and synergy between Japanese research groups. An example of a visualization interface being developed and deployed by IBM Japan is shown in Figure 5.1.
In conclusion, the WTEC panelists' view was that text-related research in Japan has been lagging behind that of the United States and Europe, but that substantial recent investments by companies and universities in this area mean that this gap is rapidly narrowing. One should expect to see substantially more new techniques and research directions originating in Japan in the near future.
Fig. 5.1. IBM information outlining: search, extraction, categorization, and abstraction.