Site: NTT
Human Interface Laboratories
1-2356 Take Yokosuka-Shi
Kanagawa 238-03, Japan

Date Visited: May 25, 1995

Report Author: R. E. Kraut



JTEC Attendees:

S. Chipman
J. Foley
E. Glinert
J. Hollan
R. E. Kraut
T. Sheridan
T. Skelly


Hosts:

Rikuo Takano, PhD
Vice President and Executive Manager, NTT Human Interface Laboratories
Hiroshi Ishii, PhD
Senior Research Engineer and Supervisor, Advanced Video Processing Laboratory, NTT Human Interface Laboratories
Akihito Akutsu
Research Engineer, Advanced Video Processing Laboratory, NTT Human Interface Laboratories
Gen Suzuki
Leader, Visual Communication Environment Group, NTT Human Interface Laboratories


NTT, one of the largest telecommunications companies in the world, has research depth in virtually all areas of telecommunications, including human-computer (and communication) interface research and development. Overall, NTT employs about 3,000 research engineers in 13 laboratories. The total research and development budget was approximately $2.4 billion in 1994. NTT attempts to advance science and engineering generally, and encourages its researchers to publish in both Japanese and international conferences and journals. The JTEC team visited the Human Interface Laboratories in Kanagawa, home to approximately 400 researchers. The Human Interface Laboratories are oriented towards developing telecommunications services and systems that customers want and that are natural and easy to use, together with some of the underlying human interface technologies needed to realize these services. The goal is not to improve people's ability to control computers per se, but rather to facilitate the transfer of information and communication with other people through computing and telecommunications.

The Human Interface Laboratories comprise six laboratories: the Visual Communications Laboratory (teleconferencing, computer supported cooperative work, picture coding, virtual reality, and interactive digital video); the Speech and Acoustics Laboratory (speech coding, speech recognition, speech synthesis, acoustic processing, techniques for the evaluation of human hearing, and speech communication); the Multimedia Systems Laboratory (video on demand, multimedia communication, document image processing, and advanced facsimile systems); the Advanced Video Processing Laboratory (architectures for the combination of broadcast and telecommunication services, video compression coding, techniques for retrieving and handling video, and neural networks); the Autonomous Robot Systems Laboratory (robot systems, robot mechanisms, robot vision, and robot action planning); and the Furui Laboratory (speech recognition and speaker identification).


Much of the work in these laboratories is on projects that are close to service deployment or product development (1-3 years away), and researchers often work closely with developers in the implementation process. Few of the projects in these laboratories are basic research. Many of the service-oriented projects show substantial imagination, and have the potential to contribute generally useful techniques that improve people's ability to handle information or to communicate with other people over telecommunications networks. The JTEC team saw several projects that were especially attractive or interesting:

Video Handling

While video provides the ability to record massive amounts of information in detail, it is difficult to skim or handle the video after it has been recorded. Akihito Akutsu has developed two techniques for creating still (or paper) representations of video for indexing and skimming. The PaperVideo Project applies image processing to the abrupt changes in color and pattern that occur at scene cuts in movies in order to detect the beginnings of scenes automatically. Frames at the beginning of the scenes can be printed out to provide a storyboard for the movie or can be used in a hypertext-like system to index and jump to excerpts of the movie. The Video Tomography Project (Akutsu and Tonomura 1994) uses motion detection and tomographic methods to follow a moving image against a fixed background in the face of camera pans, zooms, and tilts. Through these techniques, one can construct a spatiotemporal index to a sequence of video, for example, allowing a strobe-like sequence of images of a central figure within a constant background to serve as the user interface to a video playback system.
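The scene-cut idea can be illustrated with a minimal sketch. The report does not say which image-processing method PaperVideo uses, so the code below substitutes a common stand-in: comparing intensity histograms of consecutive frames and flagging a cut when they differ sharply. The function names, the bin count, and the threshold are all hypothetical, and frames are assumed to be flat sequences of grayscale values from 0 to 255.

```python
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 8, max_value: int = 256) -> List[float]:
    """Normalized intensity histogram of one frame (flat sequence of pixel values)."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    total = len(frame)
    return [count / total for count in counts]


def detect_cuts(frames: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Return the indices of frames that appear to start a new scene.

    Assumption: a cut shows up as a large L1 distance between the histograms
    of consecutive frames. This is an illustrative heuristic, not the
    PaperVideo algorithm itself.
    """
    if not frames:
        return []
    cuts = []
    previous = histogram(frames[0])
    for index in range(1, len(frames)):
        current = histogram(frames[index])
        # The L1 distance ranges from 0 (identical) to 2 (no overlap).
        distance = sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            cuts.append(index)
        previous = current
    return cuts


# Three synthetic "scenes": dark, bright, and mid-gray frames.
frames = [[10] * 16, [12] * 16, [200] * 16, [205] * 16, [90] * 16]
storyboard = [0] + detect_cuts(frames)  # frame 0 always opens the first scene
```

The frames listed in `storyboard` are the ones a system of this kind would print out or use as hypertext-style jump targets.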


MUDs, or multi-user dungeons, are real-time computer-based communication systems, often placed within an imagined setting and configured as a set of rooms. In a typical MUD, people log in, assume identities, and navigate through the setting with commands like "go north" and "go through the door"; manipulate computational objects with commands like "look," "take," and "read"; and communicate with other individuals or groups with commands like "whisper to" (to communicate with a single, named person) or "talk" (to communicate with everyone present in a single room). Communication is done by typing.
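The command model described above can be sketched in a few lines. This is only an illustration of how "go", "talk", and "whisper to" might route text between players; the class names, the message wording, and the exact "whisper to <name> <message>" syntax are assumptions rather than the behavior of any particular MUD.

```python
from typing import Dict, List, Optional, Tuple


class Room:
    def __init__(self, name: str, exits: Optional[Dict[str, str]] = None):
        self.name = name
        self.exits = exits or {}  # direction -> name of the neighboring room


class Mud:
    def __init__(self, rooms: List[Room]):
        self.rooms = {room.name: room for room in rooms}
        self.location: Dict[str, str] = {}  # player -> current room name

    def login(self, player: str, room: str) -> None:
        self.location[player] = room

    def execute(self, player: str, command: str) -> List[Tuple[str, str]]:
        """Run one typed command and return (recipient, text) messages."""
        verb, _, rest = command.partition(" ")
        here = self.location[player]
        if verb == "go":
            target = self.rooms[here].exits.get(rest)
            if target is None:
                return [(player, "You cannot go that way.")]
            self.location[player] = target
            return [(player, f"You are in the {target}.")]
        if verb == "talk":
            # "talk" reaches everyone else present in the same room.
            return [(other, f"{player} says: {rest}")
                    for other, room in self.location.items()
                    if room == here and other != player]
        if verb == "whisper":
            # Assumed syntax: whisper to <name> <message>
            parts = rest.split(" ", 2)
            if len(parts) == 3 and parts[0] == "to" and parts[1] in self.location:
                return [(parts[1], f"{player} whispers: {parts[2]}")]
            return [(player, "Whisper to whom?")]
        return [(player, "Unknown command.")]


mud = Mud([Room("hall", {"north": "library"}), Room("library", {"south": "hall"})])
mud.login("ann", "hall")
mud.login("bob", "hall")
mud.login("cy", "library")
```

Object commands such as "look," "take," and "read" would follow the same dispatch pattern and are omitted to keep the sketch short.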

NTT has implemented a prototype multimedia MUD-like service called Interspace that allows people to navigate through a graphically rendered space and communicate with other people by text, telephone, or video. The system is designed to enable distance education, as participants "attend" lectures and discussions, or catalog shopping, as participants wander through a virtual shop and talk to salespeople. The system is implemented as a client running on a PC that downloads scene renderings from a central server over an ISDN link. Narrowband ISDN is also used to support voice and visual communication among people visiting the same room. In its current implementation, up to 20 people can simultaneously visit the same room and see small (approximately 1 in. x 1 in.) live video images of each other while talking to each other over a telephone handset. Because participants can communicate by voice, the system controls the potential cacophony in crowded rooms by weighting the sound volume between people as a function of their closeness in the simulated space. NTT is prototyping a network service in collaboration with several Japanese universities and with retailers.
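The distance-based volume weighting can be illustrated as follows. The report does not give the weighting function Interspace uses, so this sketch assumes a simple inverse-square falloff in a two-dimensional room; `gain`, `mix_gains`, and the `reference` distance are hypothetical names introduced only for the example.

```python
import math
from typing import Dict, Tuple

Position = Tuple[float, float]


def gain(listener: Position, speaker: Position, reference: float = 1.0) -> float:
    """Volume weight in (0, 1] that falls off with simulated distance.

    Assumed curve: 1 / (1 + (d / reference)^2). A speaker at the listener's
    own position is heard at full volume; one at the reference distance is
    heard at half volume.
    """
    distance = math.dist(listener, speaker)
    return 1.0 / (1.0 + (distance / reference) ** 2)


def mix_gains(listener_name: str, positions: Dict[str, Position],
              reference: float = 1.0) -> Dict[str, float]:
    """Gain to apply to every other participant's voice for one listener."""
    listener = positions[listener_name]
    return {name: gain(listener, position, reference)
            for name, position in positions.items()
            if name != listener_name}


positions = {"a": (0.0, 0.0), "b": (0.0, 1.0), "c": (3.0, 4.0)}
weights = mix_gains("a", positions)  # "b" is nearby, "c" is across the room
```

With these positions, the nearby participant "b" is mixed at half volume while the distant participant "c" is attenuated to about 4 percent, which is the kind of effect that keeps a crowded room intelligible.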


Hiroshi Ishii and his colleagues have implemented a series of collaboration systems (Ishii, Kobayashi, and Arita 1994), including TeamWorkStation and ClearBoard, in order to integrate videoconferencing with shared workspaces. ClearBoard-2, the latest system, integrates interpersonal space and shared workspace by overlaying the live video images of remote participants with the bitmaps of electronic shared workspaces. The effect is of two people conversing through a window, or more accurately, through the glass on the computer monitor; they can see and talk through the window, but also draw or type over each other's images (and each sees all images and text in the appropriate spatial orientation). Studies suggest that people use the system with more whimsy than they do many other computer-supported work applications, for example by drawing mustaches and beards over the faces of their partners. People use the eye contact features of the system to remain aware of what their partner is attending to. This gaze awareness is useful for managing turn taking and references to topics of conversation. The various iterations of this prototype from 1988 to 1994 demonstrated gaze awareness, shared drawing, and improved naturalness. At least one product has come from this stream of research: TeamWorkStation has become a product available for use over ISDN and LAN connections.

Systems For The Hearing Impaired

The Human Interface Laboratories have built systems to test and evaluate human hearing, with applications for diagnosis and the evaluation of hearing aids. They are also working on hearing aids with signal processing that can boost the intensity of speech signals without simultaneously amplifying noise and other nonspeech sounds.


The Human Interface Laboratories are conducting a wide range of imaginative services research on human-to-human telecommunication. In terms of quantity and quality, the work is the equal of that done at AT&T Bell Labs and Bellcore in the United States. Compared to its U.S. counterparts, however, the research is less influenced by psychological and social theories and studies of human behavior, and is less likely to use behaviorally oriented, empirical techniques. The focus is on relatively short-term service development. Again, compared to the United States, the commercialization of services derived from the research seems to happen more rapidly.


Akutsu, A., and Y. Tonomura. 1994. Video tomography: An efficient method for camerawork extraction and motion analysis. Proceedings, ACM Multimedia 94 Oct 15-20:349-356.

Ishii, H., M. Kobayashi, and K. Arita. 1994. Iterative design of seamless collaboration media. Communications of the ACM 37(8):83-97.

Published: March 1996; WTEC Hyper-Librarian