Text- and content-based retrieval of video is a critical component of a DIVL for automatic indexing and retrieval. This is one of the most active research areas in the United States; published papers and prototype software systems are too numerous to list here. Good overviews of recent efforts can be found in DLII n.d., ACM 1997a, ACM 1997b, AAAI 1997, and IEEE 1998. More detailed discussions of text-related search can be found in Chapter 5. Several Japanese companies are also actively involved in this area. NTT researchers showed two demonstrations in the image area. One involves reading the Japanese captions from TV broadcasts so that topic- or concept-based video retrieval can be accomplished. The key algorithmic steps are detection of frames that contain text, extraction of text regions, character segmentation, and character recognition. Details of these steps can be found in Kurakaka, Kuwano and Odaka (1997). The other demonstration was of ExSight, a multimedia retrieval system using object-based image matching and keyword-based retrieval (Yamamuro et al. 1998). Unlike pixel- or impression-based approaches, object-based approaches such as ExSight search a large database by content. The steps involved include automatic object extraction, feature extraction (color, shape, etc.), and high-speed similarity matching. Query fusion (as a union of image objects) and high-speed browsing are provided as Java applets. Potential commercial applications are in electronic commerce, digital museums (e.g., "show all the pictures of a boy with a dog"), and digital photo albums. Although primarily image-content driven, the system can accommodate keyword-based retrieval. A functional diagram of ExSight is shown in Figure 6.3.

Fig. 6.3. Functional diagram of ExSight (NTT).
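
The object-based matching steps described above (feature extraction, similarity matching, and query fusion over several image objects) can be illustrated with a minimal sketch. It is not ExSight's actual algorithm, which is described in Yamamuro et al. 1998. The sketch assumes that objects have already been extracted as lists of RGB pixels, that the only feature is a coarse color histogram, that similarity is histogram intersection, and that a fused query is scored by averaging each query object's best match in a database image. All function names are hypothetical.

```python
from collections import Counter

def color_histogram(pixels, bins=4):
    """Normalized coarse RGB histogram of one object's pixels.

    pixels: list of (r, g, b) tuples with components in 0..255.
    """
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {cell: n / total for cell, n in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical normalized histograms."""
    return sum(min(v, h2.get(cell, 0.0)) for cell, v in h1.items())

def fused_score(query_objects, image_objects):
    """Assumed fusion rule: average, over the query objects, of the best
    similarity to any object found in the database image."""
    if not query_objects or not image_objects:
        return 0.0
    best = [max(histogram_intersection(q, o) for o in image_objects)
            for q in query_objects]
    return sum(best) / len(best)

def search(query_objects, database, top_k=3):
    """Rank database images (name -> list of object histograms) by score."""
    scored = [(fused_score(query_objects, objs), name)
              for name, objs in database.items()]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(name, score) for score, name in scored[:top_k]]

# Toy data: each "object" is a patch of a single color.
RED = [(250, 10, 10)] * 10
BROWN = [(150, 80, 30)] * 10
GREEN = [(10, 200, 10)] * 10

database = {
    "boy_and_dog": [color_histogram(RED), color_histogram(BROWN)],
    "dog_only": [color_histogram(BROWN)],
    "landscape": [color_histogram(GREEN)],
}

# Fused query: a union of two image objects.
query = [color_histogram(RED), color_histogram(BROWN)]
results = search(query, database)
```

In this toy database the image containing both query objects ranks first, the image containing only one of them ranks second, and the unrelated image ranks last. A practical system would add shape and other features and an index structure to make matching fast over a large database.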

When audio books and video are collected and bound as digital objects, it is critical to provide user-friendly interfaces for accessing them. In the CyberShelf project, books created from HTML documents are accessible using a book metaphor description language. Another interesting demonstration was an image mosaicking system that produces a panoramic view from a sequence of translating images. User-friendly interfaces to the mosaicking algorithms have been provided. Details of the mosaicking algorithms are found in Akutsu et al. 1995 and in Taniguchi et al. 1997.
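
To make the mosaicking idea concrete, the following is a minimal sketch of building a panorama from translating frames. It is not the method of Akutsu et al. 1995 or Taniguchi et al. 1997. It assumes grayscale frames stored as lists of rows, a camera that moves only to the right by a whole number of pixels between frames, and a brute-force search for the shift that minimizes the mean squared difference over the overlap. The names `estimate_shift` and `build_mosaic` are hypothetical.

```python
def estimate_shift(prev, curr, max_shift):
    """Return dx >= 0 such that curr[y][x] best matches prev[y][x + dx]."""
    width = len(prev[0])
    best_dx, best_err = 0, float("inf")
    for dx in range(0, min(max_shift, width - 1) + 1):
        overlap = width - dx
        # Mean squared difference over the overlapping columns.
        err = sum((prev_row[x + dx] - curr_row[x]) ** 2
                  for prev_row, curr_row in zip(prev, curr)
                  for x in range(overlap)) / (overlap * len(prev))
        if err < best_err:
            best_dx, best_err = dx, err
    return best_dx

def build_mosaic(frames, max_shift):
    """Append the newly revealed columns of each frame to the panorama."""
    mosaic = [list(row) for row in frames[0]]
    offset = 0  # column of the current frame's left edge in the mosaic
    for prev, curr in zip(frames, frames[1:]):
        offset += estimate_shift(prev, curr, max_shift)
        width = len(curr[0])
        for y, row in enumerate(curr):
            needed = offset + width - len(mosaic[y])
            if needed > 0:
                mosaic[y].extend(row[width - needed:])
    return mosaic

# Synthetic scene; the frames are 6-pixel-wide windows at columns 0, 2, and 5.
scene = [[(x * 7 + y * 13) % 31 for x in range(12)] for y in range(3)]
frames = [[row[o:o + 6] for row in scene] for o in (0, 2, 5)]
mosaic = build_mosaic(frames, max_shift=4)
```

With these synthetic frames the recovered mosaic reproduces the first 11 columns of the scene. Real footage would need sub-pixel registration, vertical motion handling, and blending at the seams.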

