Site: Nippon Telegraph and Telephone (NTT) Corporation
Yokosuka R&D Center
Hikarinooka, Yokosuka-Shi
Kanagawa, 239 Japan

Date Visited: 25 March 1998

WTEC Attendees: R. Chellappa (report author), B. Davis-Brown, R. Larsen, J. Mendel, H. Morishita, R. Reddy



NTT, celebrating its 50th anniversary, is undergoing a transformation from a telecommunications company to an information communications business, and eventually to an information distribution business. Three major thrusts are being pursued to realize this transformation: "Electrum Cyber Society (ECS)," "Megamedia," and "Next Generation Infrastructure."

As all the hosts and research demonstrations were drawn from the ECS thrust area, this report addresses that area only. NTT's vision of ECS, eloquently expressed by Mr. Toshiharu Aoki, Senior Executive Vice President and Senior Executive Manager of R&D Headquarters (NTT n.d.(a)), is the electronic exchange of information products and money through secure networks. NTT's activities are focused on becoming a center of excellence in multimedia research, through R&D and through active participation in several national and international collaborative consortia and standardization efforts. Notable activities include involvement in the Asian multimedia forum and the Photonic Network Forum, the creation of ECS test-beds, cyber-society open experiments, "An Open Lab," and contributions to national social projects such as the medical information network.


The budget of the research and development headquarters is approximately 5% of net sales. Over the last seven years, R&D expenditures have been around ¥3 billion. Roughly half of R&D expenditures are allotted to research laboratories.

Organization and Staffing

NTT R&D Headquarters is divided into three Laboratory Groups (NTT n.d.(b)).

The hosts, led by Mr. Shinichiro Yoshida, represent the Multimedia Systems Laboratory Group. This group is divided into seven laboratories:

  1. multimedia systems development
  2. multimedia networks
  3. information and communication systems
  4. human interface
  5. software
  6. wireless systems
  7. integrated information and energy systems

The laboratories are split across different R&D centers, and researchers and engineers use video conferencing facilities to keep abreast of related activities. The total workforce has held steady at 8,500 over the last seven years. Of these, approximately 3,000 are engaged in research and the rest in development. Approximately 150 new staff are hired every year, replacing those lost to academia, other subsidiaries, and retirement.


The site visit team was shown several demonstrations representing ongoing work in some of the laboratories in the Multimedia Systems Laboratory Group. These are as follows:

The network library system provides multimedia services over a broadband ATM network. The network is served by hi-fi music, MPEG-1, MPEG-2, and digital library servers, and it provides processing engines for voice recognition, search, Japanese/English translation, and text-to-speech. A key component of this network is a super-high definition display with a resolution of 2,048 x 2,048 pixels at 24 bits/pixel, operating at 60 frames/sec for video. The network library is being used for doctors viewing medical images, sightseeing tours, teleconferences, and on-the-fly machine translation between Japanese and English.
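The display figures above imply a substantial raw data rate. The short Python calculation below multiplies them out. The 155.52 Mbit/s link used for comparison is an assumption (a common ATM line rate of the period); the report does not state the network's actual link speed.

```python
# Uncompressed data rate of the super-high definition display described above.
WIDTH_PX = 2048
HEIGHT_PX = 2048
BITS_PER_PIXEL = 24
FRAMES_PER_SECOND = 60

# Assumption for comparison only: a 155.52 Mbit/s (OC-3c/STM-1) ATM link.
# The report does not say what link rate the network library actually used.
ASSUMED_LINK_BPS = 155_520_000


def raw_bit_rate(width: int, height: int, bpp: int, fps: int) -> int:
    """Bits per second needed to carry every pixel of every frame uncompressed."""
    return width * height * bpp * fps


rate = raw_bit_rate(WIDTH_PX, HEIGHT_PX, BITS_PER_PIXEL, FRAMES_PER_SECOND)
links_needed = rate / ASSUMED_LINK_BPS
print(f"{rate:,} bit/s (about {rate / 1e9:.2f} Gbit/s)")
print(f"about {links_needed:.1f} times the assumed link rate")
```

The uncompressed stream works out to about 6.04 Gbit/s, roughly 39 times the assumed link rate.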

Text- and content-based retrieval of video is a critical component of a digital library, enabling automatic indexing and retrieval. Two demonstrations in this area were shown. The first reads the Japanese captions in TV broadcasts so that topic- or concept-based video retrieval can be accomplished. This work is expected to be commercially available by the end of 1998. The key algorithmic steps are detecting frames that contain text, extracting the text region, and segmenting and recognizing the characters. Details of these steps may be found in Kurakake et al. (1997).

The second demonstration was of ExSight, a multimedia retrieval system (Yamamuro et al. 1998; Kon'ya and Kushima 1998) based on object-based image matching. Unlike pixel- or impression-based approaches, object-based approaches such as ExSight search a large database by the content of the objects in each image. The steps involved include automatic object extraction, feature extraction (color, shape, etc.), and high-speed similarity matching. Query fusion (as a union of image objects) and high-speed browsing are provided as Java applets. Potential commercial applications are in electronic commerce, digital museums (e.g., "show all the pictures of a boy with a dog"), and digital photo albums. Although primarily driven by image content, the system can also accommodate keyword-based retrieval.
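ExSight's internals are not described beyond the steps listed above. As a minimal sketch of the general idea of object-based similarity matching with an optional keyword filter, the Python below ranks pre-extracted objects by color-histogram intersection. The feature (a coarse RGB histogram), the similarity measure, and all names (`color_histogram`, `IndexedObject`, `search`) are hypothetical illustrations, not ExSight's actual design; object extraction and shape features are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel 0-255


def color_histogram(pixels: List[Pixel], bins: int = 4) -> List[float]:
    """Quantize each channel into `bins` levels; return a normalized bins**3 histogram."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # c * bins // 256 maps 0..255 onto 0..bins-1 for any bins <= 256
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1.0
    n = len(pixels) or 1
    return [h / n for h in hist]


def intersection(h1: List[float], h2: List[float]) -> float:
    """Histogram intersection: 1.0 for identical normalized histograms, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))


@dataclass
class IndexedObject:
    """Hypothetical index entry for one automatically extracted image object."""
    name: str
    feature: List[float]  # precomputed at indexing time
    keywords: Set[str] = field(default_factory=set)


def search(query: List[Pixel], index: List[IndexedObject],
           keyword: Optional[str] = None, top_k: int = 3) -> List[Tuple[float, str]]:
    """Rank indexed objects by color similarity, optionally filtered by a keyword."""
    q = color_histogram(query)
    pool = [o for o in index if keyword is None or keyword in o.keywords]
    ranked = sorted(((intersection(q, o.feature), o.name) for o in pool), reverse=True)
    return ranked[:top_k]


# Usage: three toy "extracted objects" and a mostly red query region.
red, blue = (250, 10, 10), (10, 10, 250)
index = [
    IndexedObject("red ball", color_histogram([red] * 100), {"ball", "toy"}),
    IndexedObject("blue kite", color_histogram([blue] * 100), {"kite", "toy"}),
    IndexedObject("red-and-blue flag", color_histogram([red] * 50 + [blue] * 50), {"flag"}),
]
query = [(240, 20, 20)] * 80 + [blue] * 20
results = search(query, index)
toys = search(query, index, keyword="toy")
```

The mostly red query ranks the red ball first, the mixed flag second, and the blue kite last; restricting the search to the keyword "toy" drops the flag. Precomputing features at indexing time is what keeps query-time matching cheap.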

Electronic commerce is viewed as a promising opportunity in the ECS thrust area. The major concerns in making it feasible are guaranteeing security, protecting copyrights, and maintaining the timeline of transactions. The highlights of the electronic commerce activities were two demonstrations over the network: one showed how electronic money can be moved securely between interested parties, and the other showed how copyrights can be protected in the sale and distribution of digital objects. In the electronic money demonstration, a smart card is used to make purchases from anywhere, as long as one is connected to the network, illustrating how secure transactions can be achieved.

When digital objects are marketed over the network, sellers need to ensure that their copyrights are protected. The InfoProtect project demonstrates the secure distribution of images. The owner of the digital content first creates a partial (semi-disclosed) image and its descrambling key. The descrambling key is registered with the system center, and the partial image is transmitted to the potential buyer. The buyer decides whether to purchase by inspecting the scrambled image and buys the descrambling key via a secure key transmission protocol, known as InfoKey, developed at NTT. The key is then used to descramble the image. The buyer's ID is embedded in the image using digital watermarking, providing protection against copyright violation.
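The scramble, descramble, and watermark steps can be illustrated with a toy sketch. The Python below is an assumption-laden stand-in, not NTT's InfoProtect or InfoKey: it treats an image as a flat byte string, leaves the first part in the clear as the "semi-disclosed" preview, XOR-scrambles the rest with a keystream derived from the key by SHA-256 in counter mode, and embeds a buyer ID in the least significant bits of the first pixels. The secure key purchase is not modeled, and an LSB mark like this is fragile, unlike watermarks designed to survive compression or editing. All function names are hypothetical.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from `key` using SHA-256 in counter mode.
    Illustration only: this is not a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def scramble(pixels: bytes, key: bytes, clear_len: int) -> bytes:
    """Keep pixels[:clear_len] viewable (the semi-disclosed preview) and XOR the rest
    with the keystream. XOR is its own inverse, so the same call also descrambles."""
    ks = keystream(key, len(pixels) - clear_len)
    hidden = bytes(p ^ k for p, k in zip(pixels[clear_len:], ks))
    return pixels[:clear_len] + hidden


def embed_buyer_id(pixels: bytes, buyer_id: int, n_bits: int = 32) -> bytes:
    """Write buyer_id into the least significant bit of the first n_bits pixels."""
    out = bytearray(pixels)
    for i in range(n_bits):
        out[i] = (out[i] & 0xFE) | ((buyer_id >> i) & 1)
    return bytes(out)


def extract_buyer_id(pixels: bytes, n_bits: int = 32) -> int:
    """Read back the ID written by embed_buyer_id."""
    return sum((pixels[i] & 1) << i for i in range(n_bits))


# Usage with a toy 1,024-pixel 8-bit grayscale "image".
image = bytes(range(256)) * 4
key = b"hypothetical-descrambling-key"
preview = scramble(image, key, clear_len=256)     # sent to the prospective buyer
restored = scramble(preview, key, clear_len=256)  # after the key is purchased
delivered = embed_buyer_id(restored, buyer_id=0xC0FFEE)
```

Because the preview and the key travel separately, a buyer who never obtains the key sees only the clear portion, while the delivered copy carries an ID that can be read back to trace redistribution.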

The high presence video teleconference system is centered around two large projection displays, each measuring 110 inches diagonally. The resolution is four times that of high-definition TV, enabling interaction with life-size images of people. Display quality was demonstrated using 2-D monocular and stereo still images. The monocular images were viewed at a resolution of 6 million pixels/frame, and the stereo pairs each had about 3 million pixels/image, giving excellent quality to the stereo images. Although the system as a whole is expensive, key components of the display technology have been commercialized. Using sound localization, an enhanced multimedia presentation is possible, with applications to remote museums and education.

When audio books and video are collected and bound as digital objects, it is critical to provide user-friendly interfaces for accessing them. In the CyberShelf project, books created from HTML documents are accessible using a book metaphor description language.

Another interesting demonstration was an image mosaicking system that produces a panoramic view from a sequence of translating images. User-friendly interfaces to the mosaicking algorithms have been provided. Details of the mosaicking algorithms are in Akutsu et al. (1995) and Taniguchi et al. (1997).
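The mosaicking algorithms themselves are in the cited papers. As a simplified illustration of the underlying idea only, the Python below assumes a purely horizontal camera pan over grayscale frames of equal size, estimates each frame-to-frame shift by minimizing the mean squared difference over the overlap, and appends the newly exposed columns. The function names are hypothetical, and the exhaustive shift search stands in for whatever registration method NTT used.

```python
from typing import List

Image = List[List[int]]  # grayscale frame as a list of pixel rows


def best_horizontal_shift(prev: Image, cur: Image, min_overlap: int = 2) -> int:
    """Return dx such that column 0 of `cur` aligns with column dx of `prev`.

    Tries every shift that leaves at least `min_overlap` shared columns and keeps
    the one with the lowest mean squared difference over the overlap.
    """
    width = len(prev[0])
    best_dx, best_err = 0, float("inf")
    for dx in range(width - min_overlap + 1):
        overlap = width - dx
        err = 0
        for row_p, row_c in zip(prev, cur):
            for x in range(overlap):
                diff = row_p[dx + x] - row_c[x]
                err += diff * diff
        mean_err = err / (overlap * len(prev))
        if mean_err < best_err:
            best_dx, best_err = dx, mean_err
    return best_dx


def stitch(frames: List[Image]) -> Image:
    """Build a panorama by appending each frame's newly exposed columns."""
    panorama = [row[:] for row in frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        dx = best_horizontal_shift(prev, cur)
        overlap = len(prev[0]) - dx
        for pano_row, cur_row in zip(panorama, cur):
            pano_row.extend(cur_row[overlap:])
    return panorama


# Usage: a synthetic 5 x 14 scene viewed through an 8-pixel-wide window that
# pans right by 3 pixels per frame.
scene = [[(7 * x * x + 13 * y) % 101 for x in range(14)] for y in range(5)]
frames = [[row[off:off + 8] for row in scene] for off in (0, 3, 6)]
panorama = stitch(frames)
```

Real footage would also need vertical and sub-pixel alignment, exposure blending, and robustness to moving objects; the exhaustive search is chosen here only for clarity.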


References

Akutsu, A., Y. Tonomura and H. Hamada. 1995. Videostyler: multidimensional video computing for eloquent media interface. In Proc. Intl. Conf. on Image Processing. Washington, D.C. October.

Kon'ya, S. and K. Kushima. 1998. A rotation invariant shape representation based on wavelet transform. In Proc. Workshop on Image Retrieval. University of Northumbria at Newcastle. Feb: 1-9.

Kurakake, S., H. Kuwano and K. Odaka. 1997. Recognition and visual feature matching of text region in video for conceptual indexing. In Proc. SPIE on Storage and Retrieval for Image and Video Databases V. San Jose, CA. Feb: 368-379.

NTT. n.d.(a). Corporate Technology, Research and Development. (Brochure.)

NTT. n.d.(b). Yokosuka R and D Center Guide. (Brochure.)

Taniguchi, Y., A. Akutsu and Y. Tonomura. 1997. Panorama excerpts: extracting and packing panoramas for video browsing. In Proc. ACM Multimedia 97. Seattle, Washington: 429-436.

Yamamuro, M., K. Kushima, H. Kimoto, H. Akama, S. Kon'ya, J. Nakagawa, K. Mii, N. Taniguchi and K. Curtis. 1998. ExSight: multimedia information retrieval system. In Proc. 20th Annual Pacific Telecommunications Conference. Honolulu, Hawaii. Jan: 734-739.

Published: February 1999; WTEC Hyper-Librarian