Jaime G. Carbonell, Carnegie Mellon University (Panel Chair)
Elaine Rich, MCC (Panel Cochair)
David Johnson, IBM
Masaru Tomita, Carnegie Mellon University
Muriel Vasconcellos, Pan American Health Organization
Yorick Wilks, New Mexico State University
The goal of the JTEC report on machine translation is to provide an overview of the state of the art of machine translation (MT) in Japan, and to compare Japanese and U.S. technology in this area. The term "machine translation" as used here includes both the science and technology required for automating the translation of text from one human language to another.
In Japan, machine translation is viewed as an important strategic technology that is expected to play a key role in Japan's increasing participation in the world economy. As a result, several of Japan's largest industrial companies are developing MT systems, and many are already marketing their systems commercially. There is also an active MT and natural language processing (NLP) research community at some of the major universities and government/industrial consortia.
The principal use for MT today is in translating technical documentation for products to be sold abroad. The volume is still relatively small but appears to be growing steadily. There is also an increasing use of MT embedded in other applications, such as database retrieval systems, electronic mail, and (in the prototype stage) speech-to-speech translation systems.
Users have reported varying degrees of success with MT. Although a few users have actually experienced lower productivity with MT than with conventional approaches, productivity gains of 30 percent appear to be average. Gains tend to be higher in restricted domains and lower in broader ones. Most uses of MT require some human pre- or post-editing to produce translations of acceptable quality.
In both the U.S. and Japan, total funding for MT appears to be on a gradual but steady rise. Japanese commitment to MT is greater than that of the U.S., though the U.S. commitment is by no means insignificant.
In both Japanese and U.S. markets, MT is gradually gaining acceptance (Fig. 13), with Japan holding, and expected to maintain, the lead. The same situation and trends hold for the integration of MT systems into other text processing software (Fig. 14).
Figure 13. Acceptance of MT
Figure 14. Integration of MT
Improved accuracy appears to be the single most important factor in determining how widely MT will be accepted. Japanese and U.S. efforts are expected to show steady improvement in accuracy between now and the mid- to late 1990s (Fig. 15).
MT requires multiple knowledge sources, which are large and expensive to build and maintain. Consequently, they are valued resources in MT research and are even more important for successful deployment of MT systems. Japan currently leads the U.S. in private knowledge sources, and this lead may be widening (Fig. 16).
Figure 15. Accuracy of MT
Figure 16. Private Knowledge Sources
Although Japan also leads in shared knowledge sources (Fig. 17), the gap may narrow if the Defense Advanced Research Projects Agency (DARPA) and other U.S. government agencies continue their funding, since they are directing some of it specifically toward building shareable knowledge sources.
The basic science and technology underlying MT is natural language processing (or computational linguistics), which is the study of computer processing of language. Traditionally the U.S. has been a bastion of scientific research in this area, but research funds in the U.S. have been decreasing. Funding in Japan and Europe has been increasing and will surpass the U.S. level, if it has not already done so. Thus, the U.S. risks being surpassed (Fig. 18) in the one area where it has traditionally led: computational linguistics, both the basic theory and computational methods.
The U.S. is ahead of Japan in some areas. For example, the U.S. currently leads Japan in technological diversity, that is, the variety of approaches to MT (Fig. 19), and in linguistic diversity, that is, the number of languages for which MT systems are being developed (Fig. 20). Present trends indicate that although the U.S. will maintain its lead in technological diversity, the gap in linguistic diversity will narrow.
The U.S. also maintains a lead in other related research areas. For example, it leads in speech recognition technology (Fig. 21), but both the U.S. and Japan are in the early stages of integrating speech technology into speech-to-speech MT. The U.S. also has a narrow lead in other natural language processing technologies (Fig. 22), such as automatic extraction of knowledge from text, NLP-based human-computer interfaces, and routing and classification of texts for assimilation.
Figure 17. Shared Knowledge Sources
Figure 18. Funding for Basic Research in Natural Language Processing
Figure 19. Technological Diversity
Figure 20. Linguistic Diversity
A substantial amount of MT research is being conducted in Japan. Figure 23 shows that funding for MT R&D in Japan is considerably higher than in the U.S., although U.S. funding is expected to increase. New Japanese corporate funding is increasingly focused on productivity and commercialization, and Figure 24 shows the expected growth in commercial use of MT in Japan in response to this trend.
Figure 21. R&D in Speech Recognition and Speech-to-Speech MT
Figure 22. R&D in Other Natural Language Processing Technologies
Figure 23. Funding for R&D in MT Technology
Figure 24. Commercial Use of MT
While there are unlikely to be any major technology breakthroughs in MT during the next five years, steady progress is expected, especially in the quality of machine translations. As knowledge bases grow in quantity, quality, and comprehensiveness, the sharing of these intellectual properties will become more common. User interfaces are also improving, partly as a result of positive feedback from the growing community of MT system users. As a result, the Japanese fully expect to see a return on the substantial investment that they have made, and are continuing to make, in MT.