SCALABILITY: THE BILLION-USER PROBLEM

A major problem encountered in digital library development is scalability: the expansion of system capabilities by many orders of magnitude. For example, a Web site, even one with huge capacity, may be choked if many people access it at the same time. Assuming that before long approximately a billion people will be able to connect to the Internet, if only one percent of them are interested in a topic (a number that is far too low for subjects of global concern such as the death of Princess Diana), that is an audience of 10 million people. If a server requires 100 milliseconds to grant access to a Web page and handles requests one at a time, then the population would have to wait about 12 days for everyone to see the same page. Therefore, technology that seems instantaneous when used on a small scale may become impossibly cumbersome when expanded.
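The arithmetic behind the 12-day figure can be checked with a short calculation. This is only a back-of-the-envelope sketch: it assumes a single server that handles requests strictly one at a time, with no caching or parallelism, and the function and variable names are illustrative rather than taken from any real system.

```python
# Back-of-the-envelope estimate of how long one server needs to serve
# every interested reader in turn (assumption: strictly serial service).

SECONDS_PER_DAY = 24 * 60 * 60

def serial_wait_days(num_users: int, seconds_per_request: float) -> float:
    """Days needed to serve num_users requests one after another."""
    return num_users * seconds_per_request / SECONDS_PER_DAY

internet_users = 1_000_000_000        # "approximately a billion people"
interested = internet_users // 100    # one percent of them: 10 million
page_days = serial_wait_days(interested, 0.1)   # 100 ms per page access

print(f"{interested:,} readers -> {page_days:.1f} days")
```

The product is 1,000,000 seconds, a little under 11.6 days, which the text rounds up to 12.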

One can imagine speeding up access to a page by adding more servers in response to anticipated demand, but even this numerical solution does not scale. If the problem is delivery of an HDTV movie (which takes 10 seconds to download at 10 gigabits per second), distributing the film to even one million people (a tenth of a percent of anticipated net users and fewer than the attendance at a major film during its first weekend of release) would require roughly 120 days if the copies were sent one after another. Increasing the number of servers by an order of magnitude would still leave a delay of about 12 days, which is not even remotely tolerable.
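The movie example can be worked through the same way. The sketch below assumes that each server sends one copy at a time at the full 10 gigabits per second and that viewers are divided evenly among servers; these are simplifying assumptions for illustration, and the names are hypothetical.

```python
# Rough estimate of the time to deliver one movie copy to every viewer
# (assumption: each server sends a single copy at a time at full line rate).

SECONDS_PER_DAY = 24 * 60 * 60

def distribution_days(viewers: int, seconds_per_copy: float, servers: int = 1) -> float:
    """Days to deliver one copy per viewer with the load split evenly."""
    copies_per_server = -(-viewers // servers)   # ceiling division
    return copies_per_server * seconds_per_copy / SECONDS_PER_DAY

# Size implied by the text: 10 seconds at 10 gigabits per second.
movie_gigabytes = 10 * 10 / 8                    # 100 gigabits = 12.5 gigabytes

one_server = distribution_days(1_000_000, 10)
ten_servers = distribution_days(1_000_000, 10, servers=10)

print(f"movie size: {movie_gigabytes} GB")
print(f"1 server:   {one_server:.0f} days")
print(f"10 servers: {ten_servers:.1f} days")
```

With one server the total is 10 million seconds, about 116 days, in line with the text's round figure of 120 days. Ten servers reduce this only to about 12 days, so under these assumptions it would take on the order of a thousand servers to bring the delay down to a few hours.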

Bandwidth scalability is largely a hardware and networking problem. Keyword searching presents a problem of an entirely different sort. The commercial Web searchers now index approximately 50 million documents. A search can easily return 1,000 hits. This is a number small enough that a user could consider glancing at all of them to find what is wanted. If the corpus being searched contained 50 billion pages (less than the number of pages in all books), a search might return a million hits, which would instead require a lifetime of effort to review. Therefore, building a digital library index, particularly one to be shared among many libraries, is not simply a matter of making the index larger. Access methods, screening and navigation tools must also be provided.
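The jump from 1,000 hits to a million follows if the fraction of documents matching a query stays constant as the corpus grows. The sketch below makes that proportionality explicit. The 30 seconds allowed for glancing at each hit is purely an assumed figure chosen for illustration, not a number from the text, and the names are hypothetical.

```python
# Proportional scaling of search hits with corpus size
# (assumption: the share of documents matching a query stays constant).

def projected_hits(hits_now: int, corpus_now: int, corpus_future: int) -> int:
    """Scale the current hit count linearly with the size of the corpus."""
    return hits_now * corpus_future // corpus_now

def review_hours(hits: int, seconds_per_hit: float = 30.0) -> float:
    """Hours needed to glance at every hit (30 s per hit is an assumed figure)."""
    return hits * seconds_per_hit / 3600

hits_today = 1_000
hits_future = projected_hits(hits_today, 50_000_000, 50_000_000_000)

print(f"hits today:  {hits_today:,} ({review_hours(hits_today):.1f} hours to review)")
print(f"hits future: {hits_future:,} ({review_hours(hits_future):,.0f} hours to review)")
```

Whatever per-hit time is assumed, the reviewing effort grows a thousandfold along with the corpus, which is why screening and navigation tools matter more than raw index size.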

Even if a library has a few million books, its staff members can be generally familiar with the nature and extent of its holdings. A library with a billion books and several billion other items would be qualitatively different and probably beyond the ability of any person to master. The sheer volume of transactions, catalog records, new acquisitions and help requests would be overwhelming. This is particularly true if the library permits access by computer programs as well as humans. It is apparent then that new organizational concepts on a grand scale will be required if digital information systems are to scale properly.


Published: February 1999; WTEC Hyper-Librarian