SEARCH

The problem of locating items in libraries is frequently referred to as "search," although that word tends to imply that one knows in advance what one is looking for, and possesses handles, indicators or index terms to serve as finding aids. This narrow view ignores the activity of browsing or even the higher-level function of becoming acquainted in general with a library's holdings. Browsing in a traditional library is a physical activity: it involves scanning shelves on which related works have been placed in proximity, and occasionally withdrawing them from the shelves for examination. Browsing in a digital library is a logical activity mediated by a computer. It does not require physical proximity in any sense; indeed, two consecutive items examined may be stored on different continents. The question, then, is how can a library user (not to say the library staff) become familiar with the whole of recorded human information in a way that makes it accessible and useful?

We adopt the term "navigation" to mean moving about in a digital collection. Search is a directed form of navigation in which the goal is defined in advance with reasonable clarity. The result of a search may be an item, a collection of items, or any part of an item, even down to a single glyph. Tools must be provided that enable users to move about at varying levels of granularity within the corpus.

The usual requirement for a search is that the user is looking for a specific piece of information or a summary of what is available about a certain topic. A common case is that the user wants the answer to a specific question, such as when the postcard was invented. Only rarely does such a question translate naturally into a keyword query. Such retrieval is indirect in the sense that the user wants to learn A, but formulates a query B, to which he receives a set of retrieved documents that must be scanned to determine whether the answer to A is among them. It would be far better simply to allow the user to ask question A instead of requiring him to convert it to some query language.

Non-Textual Matter

The existence of Web searchers proves that text can be searched without being indexed or cataloged. At least on a microscopic level, documents can be located purely by their content. Many documents consist of text plus other information such as mathematical equations, tables and drawings that themselves cannot be searched directly but can often be located by the presence of related text. Purely non-textual matter is very different. Although substantial progress is being made on video searching (through the use of extensive captioning cues, speech recognition and other aids), content searching of music and visual materials is non-existent or in its infancy. The problem is further complicated by the existence of work that combines media in various ways.

Translingual Issues

Most library items, particularly in non-English-speaking countries, are not in English. The central translingual library question is how users may navigate through materials in foreign languages and make effective use of them. Translingual search is currently a research problem for which obvious solutions do not work. A keyword search cannot be made multilingual merely by translating the keywords one at a time. The number of possible translations of each word may be very large, so an explosion in the number of hits may result. This approach also takes no account of idiomatic uses, untranslatable words such as particles, and numerous other language-related phenomena.
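
The growth in candidate queries under word-by-word translation can be illustrated with a minimal Python sketch. The dictionary entries, the keyword list and the function name translated_queries are hypothetical and chosen only for illustration; they do not come from any real system described here.

```python
from itertools import product

# Hypothetical toy English-to-French dictionary (illustrative entries only).
TRANSLATIONS = {
    "bank": ["banque", "rive", "berge", "talus"],
    "note": ["note", "billet", "remarque"],
    "draft": ["brouillon", "traite", "courant d'air", "ébauche"],
}

def translated_queries(keywords, dictionary):
    """Return every query formed by translating each keyword independently."""
    # A keyword with no dictionary entry is passed through unchanged.
    options = [dictionary.get(word, [word]) for word in keywords]
    # The Cartesian product yields one query per combination of translations.
    return [" ".join(combo) for combo in product(*options)]

queries = translated_queries(["bank", "note", "draft"], TRANSLATIONS)
print(len(queries))  # 4 * 3 * 4 = 48 candidate queries from three keywords
```

Because the count is the product of the number of translations per keyword, even a short query multiplies quickly, and most of the resulting combinations pair senses that were never intended together.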

An interim solution is the use of translation assistants: programs that offer dictionary entries or partial or suggested translations of text portions. These show great promise for users who are at least partially familiar with the language of the retrieved document.

Synthetic Text

A user who is looking for general information on a particular topic is constrained in traditional libraries to go to an encyclopedia (which may have no entry or an outdated one on the topic of interest) or to refer to books that are generally about the subject under consideration. The time necessary for the user to obtain an overview at the appropriate level may be large because of the volume of repetitive material obtained. Programs are needed that are able to scan hits with the particular query in mind and produce abstracts, summaries, translations or analyses of the retrieved material.
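
As a rough illustration of scanning a hit "with the particular query in mind," the following Python sketch selects the sentences of a retrieved text that share the most words with the query. It is a deliberately naive extractive approach under stated assumptions (simple word overlap, no stemming, no language understanding); the function name query_biased_summary and the sample text are hypothetical.

```python
import re

def query_biased_summary(text, query, max_sentences=2):
    """Pick the sentences that overlap most with the query, in original order."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = []
    for position, sentence in enumerate(sentences):
        words = set(re.findall(r"\w+", sentence.lower()))
        overlap = len(words & query_terms)
        if overlap > 0:  # ignore sentences with no query words at all
            scored.append((overlap, position, sentence))
    # Highest overlap first; earlier sentences win ties.
    best = sorted(scored, key=lambda item: (-item[0], item[1]))[:max_sentences]
    # Restore document order so the summary reads naturally.
    return " ".join(sentence for _, _, sentence in sorted(best, key=lambda item: item[1]))

sample_hit = (
    "Digital libraries store items on many servers. "
    "Browsing is a logical activity. "
    "Translingual search remains a research problem. "
    "Search is a directed form of navigation."
)
summary = query_biased_summary(sample_hit, "search navigation")
print(summary)
```

A practical tool would need far more than word overlap, for example removal of repetitive material across many hits, but the sketch shows how the query can steer which parts of a document are surfaced.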


Published: February 1999; WTEC Hyper-Librarian