SCALABILITY: THE BILLION-USER PROBLEM

A major problem encountered in digital library development is scalability: the expansion of system capabilities by many orders of magnitude. For example, a Web site, even one with huge capacity, may be choked if many people access it at the same time. Assuming that before long approximately a billion people will be able to connect to the Internet, if only one percent of them are interested in a topic (a number that is far too low for subjects of global concern such as the death of Princess Diana), that is a collection of 10 million people. If a server requires 100 milliseconds to grant access to a Web page, then the population would have to wait 12 days for everyone to see the same page. Therefore, technology that seems instantaneous when used on a small scale may become impossibly cumbersome when expanded.
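The 12-day figure follows directly from the numbers in the paragraph; a minimal sketch of the arithmetic, using only the figures stated above:

```python
# Back-of-envelope check of the serial-access estimate.
# All figures are taken from the text: a billion users, one percent
# interested, 100 ms per page access, one server.
users = 1_000_000_000
interested = int(users * 0.01)   # 10 million interested readers
access_s = 0.100                 # 100 milliseconds per access

total_s = interested * access_s
days = total_s / 86_400          # seconds in a day
print(f"{interested:,} readers served serially: {days:.1f} days")
```

The exact result is about 11.6 days, which the text rounds to 12.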

One can imagine speeding up access to a page by adding more servers in response to anticipated demand, but even this numerical solution does not scale. If the problem is delivery of an HDTV movie (which takes 10 seconds to download at 10 gigabits per second), distributing the film to even one million people (a tenth of a percent of anticipated net users and fewer than the attendance at a major film during its first weekend of release) would require 120 days. Increasing the number of servers by an order of magnitude would not make the delay even remotely tolerable.
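The same style of calculation, again using only the figures stated in the paragraph, shows why adding servers does not rescue the movie-distribution case:

```python
# Serial delivery of one HDTV movie (10 s to download at 10 Gb/s,
# per the text) to one million viewers, from a single server.
download_s = 10
viewers = 1_000_000

total_days = download_s * viewers / 86_400
print(f"one server: {total_days:.0f} days")

# An order of magnitude more servers only divides the delay by ten.
print(f"ten servers: {total_days / 10:.0f} days")
```

The single-server total comes to roughly 116 days (the text rounds to 120); even ten servers leave a delay of well over a week.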

Bandwidth scalability is largely a hardware and networking problem. Keyword searching presents a problem of an entirely different sort. The commercial Web searchers now index approximately 50 million documents. A search can easily return 1,000 hits. This is a number small enough that a user could consider glancing at all of them to find what he wants. If the corpus being searched contained 50 billion pages (less than the number of pages in all books), a search might return a million hits, which would instead require a lifetime of effort to review. Therefore, building a digital library index, particularly one to be shared among many libraries, is not simply a matter of making the index larger. Access methods, screening and navigation tools must also be provided.
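The hit-count explosion is simple proportionality; a sketch, where the corpus and hit figures come from the text but the per-hit review time is an illustrative assumption, not a figure from the source:

```python
# Hit counts grow in proportion to corpus size for a fixed query.
indexed_now = 50_000_000        # documents indexed by commercial searchers
hits_now = 1_000                # typical result-set size today
corpus = 50_000_000_000         # 50 billion pages

hits_then = hits_now * corpus // indexed_now
print(f"{hits_then:,} hits")    # a thousandfold increase

# Assumed: ~30 seconds to glance at each hit (hypothetical figure).
review_s = hits_then * 30
print(f"about {review_s / 3_600:,.0f} hours of review")
```

Even at a glance per hit, reviewing a million results consumes thousands of working hours, which is why screening and navigation tools, not raw index size, are the bottleneck.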

Even if a library has a few million books, its staff members can be generally familiar with the nature and extent of its holdings. A library with a billion books and several billion other items would be qualitatively different and probably beyond the ability of any person to master. The sheer volume of transactions, catalog records, new acquisitions and help requests would be overwhelming. This is particularly true if the library permits access by computer programs as well as humans. It is apparent, then, that new organizational concepts on a grand scale will be required if digital information systems are to scale properly.


Published: February 1999; WTEC Hyper-Librarian