Robert E. Kraut
This chapter describes some major differences in network-based human-computer interaction research in Japan and the United States. It considers the use of computers as communications devices among people, in support of both small groups and a networked nation. It concludes that the United States is leading Japan in both domains.
Human-computer interaction involves the use of computers to support not just individuals but also groups. Computer-supported cooperative work (CSCW) is the research domain investigating the use of computers as if they were telecommunications devices to support small-group, human-to-human communication. Research on this topic is important because groups are a major mechanism that organizations use to get work accomplished. The importance of groups, and of computer technology to support them, is growing in response to two organizational trends. One is the spatial distribution of organizations; the wide geographic distribution of much research and development is typical of this trend. During the 1980s, for example, software development at AT&T often involved system engineering and requirements analysis performed by professionals in New Jersey, who would send their specification documents 2,000 miles away to Colorado for software engineers to code. The success of the software development process depended on the success of communication between those groups as they tried to set priorities and resolve ambiguities in the specifications (cf. Curtis, Krasner, and Iscoe 1988). Software development that is split between the United States and India illustrates the need to coordinate communication across time zones as well as space. The other trend is the disaggregation of organizations at many levels; downsizing, outsourcing, and the use of temporary workers or professional service firms are all examples. These trends accentuate the need to coordinate communication across organizational boundaries as well as within them. In addition, they place a special demand on organizations to better capture organizational memory: the knowledge, procedures, and skills that organizations routinely accumulate as they go about their business.
This knowledge can be as mundane as knowing whom to call to have a purchase order expedited or as mission-critical as knowing why particular development projects have failed in the past. At a time when workers are only loosely tied to their employers, organizations should not want their important information to reside solely in the heads of workers who may not be with them for very long.
Most computer-supported cooperative work applications are designed to improve human communication or to capture organizational memory. The Japanese are conducting little research on CSCW, at least research that is visible to Western eyes. Figure 3.1 graphs the papers published in the major international conferences on CSCW over time, broken down by the authors' country of origin. In the early years (for example, 1986), North American authors did virtually all the published research. By the 1994 conference, their share had dropped to about 60% of the published papers. The share of papers by European authors has been increasing, reaching about 35%. At the most recent CSCW conference, Japanese authors published only 4% of the research articles. This disparity persists even though Japanese participants made up a sizable minority of the attendees at that conference.
Fig. 3.1 The country of origin of articles published in the ACM's Conference on Computer-Supported Cooperative Work.
In comparing Japanese and American progress in computer-supported cooperative work research, it is helpful to differentiate approaches to the research and also to differentiate types of CSCW applications. One can identify two basic research approaches: empirical studies and prototype development. Empirical studies investigate how groups actually work with or without computers. They rely upon both quantitative and qualitative data collection and analysis methodologies derived from psychology, sociology, anthropology, and other social sciences. The second approach to CSCW research is prototype development, that is, building hardware and software applications to support groups. The focus is on improving the state of the art in applications or on identifying applications that serve new functions in supporting groups. Intermediate between empirical studies and prototype development are formative evaluations of CSCW applications. Their goal is to identify the features and applications that succeed or need improvement. These formative evaluations feed back into the redesign of the computer application.
CSCW applications differ along two dimensions. The first is whether they support delayed or real-time communication. Most people are familiar with delayed, or asynchronous, communication applications such as voice mail, electronic mail, or computer bulletin boards, which store messages. These can be contrasted with real-time, or synchronous, conferences, chat-rooms, or MUDs (Multiuser Dimensions), which are virtual spaces in which people can send messages back and forth to each other in almost real time (i.e., delay is measured in fractions of a second). Many asynchronous and synchronous applications, such as electronic mail and chat-rooms, are designed for unrestricted communication and impose very little structure on it. They just let people type at or talk to each other.
CSCW applications also differ in the extent to which they structure human contact (Galegher and Kraut 1990). In contrast to e-mail and MUDs, group memory systems, work flow systems, and group decision support systems do not just use computing as a message channel; they also use computer processing to impose structure on the communication. For example, the Coordinator® (Flores et al. 1988) was an electronic mail system that forced users to classify their messages into categories such as "request" and "refusal of request." This categorization allowed the computer to identify when an answer to a request was overdue and to take remedial action. Some Group Decision Support Systems (Nunamaker et al. 1991) provide a module for stakeholder analysis, a procedure that encourages groups to consider the costs and benefits of a decision for a variety of stakeholders, some of whom might not be represented among the decision makers. A popular module in Group Decision Support Systems helps automate a decision technique known as policy capturing. The module helps groups break a global decision down into underlying criteria, requires them to rate decision alternatives against those criteria, and then algebraically combines their ratings to help them reach a global decision. These mechanisms make decision makers more aware of the factors influencing their decision and help them apply their decision criteria systematically.
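The two structuring mechanisms in the preceding paragraph can be made concrete with a small sketch. It is illustrative only and assumes details the text does not give: the category names, the `reply_by` deadline field, and the rule that a promise or refusal counts as an answer are hypothetical, and the weighted average used to combine ratings is one simple additive rule, not necessarily the algebra any particular Group Decision Support System uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Dict, List, Optional


class Act(Enum):
    # Hypothetical message categories in the spirit of the Coordinator's scheme.
    REQUEST = "request"
    PROMISE = "promise"
    REFUSAL = "refusal of request"


@dataclass
class Message:
    thread_id: str
    act: Act
    sent: datetime
    reply_by: Optional[datetime] = None  # deadline for an answer; used only on requests


def overdue_requests(messages: List[Message], now: datetime) -> List[str]:
    """Thread ids whose request is past its reply-by date with no promise or refusal."""
    answers = {Act.PROMISE, Act.REFUSAL}
    open_requests: Dict[str, Message] = {}
    for msg in sorted(messages, key=lambda m: m.sent):
        if msg.act is Act.REQUEST:
            open_requests[msg.thread_id] = msg
        elif msg.act in answers:
            open_requests.pop(msg.thread_id, None)  # the request has been answered
    return sorted(tid for tid, req in open_requests.items()
                  if req.reply_by is not None and req.reply_by < now)


def combine_ratings(weights: Dict[str, float],
                    ratings: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Weighted-average score per alternative (an assumed additive combining rule)."""
    total = sum(weights.values())
    return {alt: sum(w * scores[c] for c, w in weights.items()) / total
            for alt, scores in ratings.items()}


if __name__ == "__main__":
    t0 = datetime(1994, 10, 1, 9, 0)
    log = [
        Message("po-17", Act.REQUEST, t0, reply_by=t0 + timedelta(days=2)),
        Message("po-18", Act.REQUEST, t0, reply_by=t0 + timedelta(days=2)),
        Message("po-18", Act.PROMISE, t0 + timedelta(days=1)),
    ]
    print(overdue_requests(log, now=t0 + timedelta(days=3)))  # ['po-17']
    weights = {"cost": 3.0, "risk": 1.0}  # higher rating = better on that criterion
    ratings = {"A": {"cost": 4, "risk": 2}, "B": {"cost": 2, "risk": 5}}
    print(combine_ratings(weights, ratings))  # {'A': 3.5, 'B': 2.75}
```

The point of both functions is the same: once messages or judgments carry explicit structure, the computer can act on them (flag an unanswered request, aggregate criteria) rather than merely relay text.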
Of the CSCW research conducted in Japan, virtually none adopts the empirical approach; almost all of the research that does exist is in prototype development. Much of the prototype development focuses on real-time communication rather than delayed communication, and virtually none focuses on the use of computing to structure communications. Virtually all of the Japanese research reported in the West is on video-telephony or video-conferencing of one sort or another.
Research on video-telephony (transmitting pictures of people in conversation along with their voices) originated with AT&T Bell Labs in 1929 (Ives 1930). In the almost seven decades since then, much of the research and development has aimed to improve the technology so that higher-quality images can be transmitted within limited bandwidth and cost. Unfortunately, most behavioral research suggests that being able to see the other person in a conversation is not very important. A twenty-five-year tradition of research shows that communication is about the same, in terms of both process and effectiveness, whether or not people can see their conversational partners (e.g., Chapanis et al. 1972; Fish et al. 1993; Short, Williams, and Christie 1976). The visual channel matters least when people in conversation are simply exchanging information. Better-quality images do not change this basic conclusion. Even face-to-face and telephone conversations are very similar and accomplish the same things.
While it is not very important to see the person you are talking to, especially for many information-oriented, business conversations, it is very important to see what you are talking about. To have a discussion about a document, for example, it helps for all parties to have the document in front of them and to be able to easily identify what any speaker is referring to or pointing at during the course of the conversation. The requirement is that participants in a conversation share not only the document itself, but changes in the document generated during the course of a meeting. In addition, it is very useful if they can share pointing and other gestures over it. Currently, several computer applications are commercially available to allow people to share computer-based documents of many types (e.g., Farallon Computing 1988).
There are no good commercial products for sharing noncomputer-based artifacts at a distance. Many of the CSCW video-conferencing prototypes coming out of Japanese labs demonstrate innovative ways to solve this problem. Some of the best research in this genre comes from Hiroshi Ishii's lab at NTT, the giant Japanese telecommunications company. The research labs of NTT are essentially indistinguishable from those of the major U.S. telecommunications companies in terms of the breadth and depth of the research portfolio.
Ishii's lab has designed and tested a series of related video-conferencing prototypes over the past six years. The fundamental goal of Ishii's research program is to determine how to seamlessly merge different streams of data onto a single frame or screen so that people conversing at a distance have all of them available. The streams include the facial displays of their conversational partners, much as in a traditional video-teleconferencing or video-telephony format, and a shared workspace for discussion or manipulation. In the first stage of this work, known as Team Workstation (see Fig. 3.2), the shared workspace came from a camera focused on pieces of paper. Using a system of transparent overlays, people in a conversation could see their partners' hands as they wrote or drew, and also see the ink they left behind. Using this procedure, two people could draw on paper simultaneously: one participant wrote directly on the paper and the other wrote on a transparent overlay superimposed on it. There were three major problems with Ishii's technique of using video for a shared workspace: (1) the resolution was insufficient for most document-oriented work; (2) the video consumed substantially more bandwidth than necessary; and (3) the shared workspace never existed except as a temporary overlay, and therefore disappeared as soon as the telecommunication session was over.
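To make the overlay idea concrete, here is a minimal sketch of superimposing two paper-camera images in software. It does not reproduce how Team Workstation's video equipment actually combined the signals; the grayscale `Frame` type, the function name, and the darker-pixel rule are assumptions, chosen because dark ink on light paper then survives from either camera.

```python
from typing import List

Frame = List[List[int]]  # grayscale pixel rows: 0 = black ink, 255 = blank white paper


def superimpose(paper: Frame, overlay: Frame) -> Frame:
    """Combine two same-sized camera frames, keeping the darker pixel at each position.

    Dark ink on light paper survives from either frame, so both partners' marks
    (and hands) appear in one composite image. The result is only a transient
    frame: nothing here stores the shared workspace once the session ends.
    """
    if len(paper) != len(overlay) or any(len(a) != len(b) for a, b in zip(paper, overlay)):
        raise ValueError("frames must have identical dimensions")
    return [[min(p, o) for p, o in zip(row_p, row_o)]
            for row_p, row_o in zip(paper, overlay)]


if __name__ == "__main__":
    site_a = [[255, 0, 255],
              [255, 255, 255]]   # partner A has drawn at top center
    site_b = [[255, 255, 255],
              [0, 255, 128]]     # partner B's ink and a grey shadow of a hand
    print(superimpose(site_a, site_b))  # [[255, 0, 255], [0, 255, 128]]
```

Because the composite is recomputed from live frames, the sketch also mirrors the third problem noted above: the shared workspace exists only while both cameras are running.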
Fig. 3.2. Team Workstation.
In later iterations of this work, known as Clearboard (see Fig. 3.3), video images of the participants in a conversation are overlaid with bitmaps of shared computer workspaces. The effect is of two people conversing through a window or, more accurately, through the glass of the computer monitor; they can see and talk through the window, but also draw or type over each other's image (and each sees all images and text in the appropriate spatial orientation). Studies suggest that people use the system with more whimsy than they use many other computer-supported work applications, for example by drawing mustaches and beards over their partners' faces. People use the facial overlay to remain aware of what their partner is attending to. This gaze awareness is useful for managing turn-taking and references to topics of conversation. Although this design solves the three problems discussed previously, it is limited to sharing computer-based documents.
Fig. 3.3. Clearboard application with systems architecture.
The ATR virtual video-teleconferencing project is another prototype designed to allow people in a conversation to share the artifacts that are the focus of their conversation. ATR is a research consortium on telecommunication founded after the privatization of NTT and funded by both industry and government. It conducts advanced research relevant to the telecommunications industry. The ATR virtual video-teleconferencing project attempts to use virtual reality as a way of transmitting information about both the faces of the people in a conversation and the 3-dimensional objects they are talking about. Imagine engineers and designers in several parts of the world discussing a model of a new automotive part, or several chemists in a multinational corporation collaborating on synthesizing a new molecule by trying to determine whether a chemical fragment fits into the already formed framework of the molecule. In these cases, the participants in a teleconference would want to see, touch, and manipulate 3-dimensional objects or synthesized models of them.
Figure 3.4 shows the ATR prototype in operation: a man is talking to a simulated head on the conferencing screen. Each party wears virtual reality gear (glasses and a data glove). Although the two parties are in different locations, they jointly manipulate a model of an aircraft while discussing it. The goal of researchers at the ATR labs is to substitute computer cycles for telecommunications bandwidth. Rather than transmit a video image of the people and the object they are discussing, the system transmits commands that allow the computer at each remote location to render the objects in their current states. Because the participants interact with a virtual reality model of the object, people at multiple sites can manipulate it simultaneously.
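The "computer cycles for bandwidth" idea can be sketched as command replication: each site keeps its own copy of the model, and only small manipulation commands cross the network. This is a hypothetical illustration, not ATR's protocol; the JSON message format, the `move`/`rotate` operations, the pose-only scene, and the 640x480 comparison frame are all assumptions.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List

# Size of one uncompressed 640x480 frame at 3 bytes per pixel, for comparison.
RAW_FRAME_BYTES = 640 * 480 * 3  # 921,600 bytes


def move_command(obj: str, dx: float, dy: float, dz: float) -> str:
    return json.dumps({"op": "move", "obj": obj, "delta": [dx, dy, dz]})


def rotate_command(obj: str, degrees: float) -> str:
    return json.dumps({"op": "rotate", "obj": obj, "degrees": degrees})


@dataclass
class SceneReplica:
    """One site's local copy of the shared model; each site renders from its own copy."""
    poses: Dict[str, List[float]] = field(default_factory=dict)  # id -> [x, y, z, yaw]

    def apply(self, message: str) -> None:
        cmd = json.loads(message)
        x, y, z, yaw = self.poses.get(cmd["obj"], [0.0, 0.0, 0.0, 0.0])
        if cmd["op"] == "move":
            dx, dy, dz = cmd["delta"]
            self.poses[cmd["obj"]] = [x + dx, y + dy, z + dz, yaw]
        elif cmd["op"] == "rotate":
            self.poses[cmd["obj"]] = [x, y, z, (yaw + cmd["degrees"]) % 360.0]
        else:
            raise ValueError(f"unknown op {cmd['op']!r}")


def broadcast(message: str, sites: List[SceneReplica]) -> None:
    # Every site, including the sender's, applies the same commands in the same order,
    # so all replicas stay identical without any video being transmitted.
    for site in sites:
        site.apply(message)


if __name__ == "__main__":
    site_a, site_b = SceneReplica(), SceneReplica()
    for msg in (move_command("aircraft", 1.0, 0.0, 0.5), rotate_command("aircraft", 90.0)):
        broadcast(msg, [site_a, site_b])
        print(len(msg.encode()), "bytes vs", RAW_FRAME_BYTES, "bytes per raw frame")
    print(site_a.poses == site_b.poses, site_a.poses)
```

Each command is a few dozen bytes, against roughly 0.9 MB for a single uncompressed frame. Over a real network the sites would also need an agreed ordering of commands (for example, a central sequencer), which the local loop here gets for free.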
Fig. 3.4. Virtual teleconferencing at ATR Laboratories.
Another way to share 3-dimensional objects is to show them with video cameras. This differs from traditional video conferencing primarily in where the camera points: rather than at the speaker, it is aimed at the object being discussed. A prototype from the University of Tokyo (Kuzuoka 1992) implements this idea by having a field worker at a remote location wear a helmet fitted with a small display and a small camera. The output from the camera goes to a screen at another location; a person at that location interacts with the screen, and the resulting image is transmitted back to the head-mounted display. This type of arrangement could be valuable, for example, for field technicians getting advice from office-based experts. Imagine a telephone craftsperson on top of a telephone pole having trouble with forty-year-old parts. She might call a supervisor at the garage who has more experience. They can both see and point to the equipment on the pole as they trade advice (Fig. 3.5).
Fig. 3.5. SharedView head-mounted camera and display.
The user interface for camera positioning in this early prototype was head movement. The camera followed what the person wearing it was looking at. This reduced the effort that the person controlling the camera had to expend in pointing it. However, there are many conditions under which it would be useful for the remote party to have telecontrol of the camera (e.g., for the remote party to search the periphery, while the person wearing the camera was concentrating on a task). Telecontrol would allow the remote participant in the conversation to have direct control of his or her visual environment. A later iteration, called the GestureCam, provides this teleoperation (Kuzuoka, Kosuge, and Tanaka 1994). The GestureCam is shown in Figure 3.6.
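The trade-off between head-following and telecontrol can be expressed as a small arbitration rule. This is a hypothetical illustration of the design choice, not a description of how SharedView or GestureCam is actually built; the class, its method names, and the pan/tilt limits are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical mechanical limits of a motorized camera mount, in degrees.
PAN_LIMIT = 90.0
TILT_LIMIT = 45.0


def clamp(value: float, limit: float) -> float:
    """Restrict an angle to the range [-limit, +limit]."""
    return max(-limit, min(limit, value))


@dataclass
class CameraController:
    """Decides where the camera points: follow the wearer's head, or obey the remote party."""
    remote_override: bool = False
    remote_target: Tuple[float, float] = (0.0, 0.0)

    def set_remote_target(self, pan: float, tilt: float) -> None:
        # The remote expert takes control, e.g. to scan the periphery.
        self.remote_override = True
        self.remote_target = (clamp(pan, PAN_LIMIT), clamp(tilt, TILT_LIMIT))

    def release(self) -> None:
        # Hand control back to the wearer's head movements.
        self.remote_override = False

    def target(self, head_pan: float, head_tilt: float) -> Tuple[float, float]:
        """Pan/tilt to command this cycle, given the wearer's current head orientation."""
        if self.remote_override:
            return self.remote_target
        return (clamp(head_pan, PAN_LIMIT), clamp(head_tilt, TILT_LIMIT))
```

By default the camera costs the wearer no extra effort, because it simply tracks the head; calling `set_remote_target` lets the remote participant look elsewhere while the wearer concentrates on the task, until `release` returns control.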
Fig. 3.6. Telecontrol in GestureCam.
If these innovative video-telephony prototypes characterize the Japanese style of CSCW research, they are as instructive for what they reveal about Japan's deficits in this domain as for what they reveal about its research strengths. There are substantial subareas of CSCW research in which the Japanese have no presence. Consider research on group decision support systems, a CSCW application for structuring group meetings that is common in North American business schools. Discussions with Japanese researchers and a reading of annual reports from Japanese firms show that the Japanese are using group decision support systems in their business practices, especially in software development (e.g., Fig. 3.7). However, the JTEC panel uncovered no Japanese research programs to improve the state of the art in this domain.
Fig. 3.7. Use of a group decision support facility at NTT.
A bigger gap is in basic research on human communication processes, which provides the theoretical understanding on which applied research projects are grounded. In the United States, for example, work by Herbert Clark and his students (Clark 1992) attempts to articulate the rules that people use to coordinate conversation. This research has been highly influential in providing design guidelines to determine when different communication modalities will be useful (e.g., Clark and Brennan 1991) and in understanding the failures of new communication systems to improve communication (e.g., Whittaker and O'Conaill 1995). Our impression of Japanese research, in contrast, is that Japan's communication scholars are not in the forefront of the discipline, and that prototype development is relatively uninfluenced by communications theory from either Japan or the West.