A Library Is Not a Box of Books.

If Google has millions of digitized books, it is not a library. It’s a resource and a tool. A library, your library, is an integrated suite of services optimized for you and your community, the Cornell community.

Sarah E. Thomas

Google. Everybody uses it. Most Internet users couldn’t live without it. A tech start-up that made good (Google’s market value is estimated at $50 billion), Google has repeatedly captured the world’s attention with new releases and announcements of breathtaking innovations. Google Print links publishers with readers, enabling searchers to buy works that turn up in their Google search results. Google Scholar filters out the unwashed chaos of the open Web and zeroes in on texts from scientific journals and academic and government sites. December’s headlines confirming a partnership between Google and five of the largest libraries in the world jolted readers into a new concept of the library. For some it was a vision of bibliographic bliss. Others were more cautious.

The advantages of Google’s library digitization program are compelling. Users would gain instant access to the texts of millions of books. As Google co-founder Larry Page explained in the company’s December 12, 2004, press release, “Our work with libraries further enhances the existing Google Print program, which enables users to find matches within the full text of books, while publishers and authors monetize that information.” Page added, “Google’s mission is to organize the world’s information, and we’re excited to be working with libraries to help make this mission a reality.” With this impressive accomplishment on the horizon, it is no wonder that users began to question the need for libraries.

Google’s initiative to digitize library books from Harvard, Michigan, the New York Public Library, Oxford, and Stanford would indeed contribute to the democratization of scholarship. With some of the world’s finest collections opened for discovery, searchers would have access to a treasure trove of knowledge. New connections would be made, and the pace of understanding would accelerate. Why, then, would anyone find fault with this rosy scenario?

Let’s look at the facts of the project, insofar as they are known. Google intends to digitize the entire collections of Michigan and Stanford, about 15 million volumes in all. The other libraries are beginning with smaller portions of their collections. Works in the public domain will be freely available. Titles under copyright will be presented in accordance with copyright law, which might limit results to bibliographic data or to snippets of text. Librarians at Michigan and Stanford expect to be able to make the full text of digitized items available to their faculty and students.

The announcement, along with scraps of information gleaned from blogs, newspaper accounts, and press conferences, has led to speculation about how Google will accomplish the transformation, what it will cost, and how long it will take. At $10 per book, a figure that seems rock-bottom to those with digitization experience, the project will exceed $150 million for Stanford’s and Michigan’s collections alone. Digitizing 8 million books will take a while. Converting a million books per year would mean digitizing almost 20,000 volumes each week. Assuming each scanner captures 1,000 pages per hour and the average book runs 300 pages, Google would have to operate 34 scanners, 24 hours a day, to capture the 300 million page images produced each year.
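The estimates in the paragraph above can be checked with a short back-of-the-envelope calculation. This is a sketch in Python that uses only the figures stated in the text ($10 per book, 15 million volumes, a million books per year, 300 pages per book, 1,000 pages per scanner-hour); the assumption that scanners run every hour of every day of a 365-day year, with no downtime, is ours and not a reported detail of Google’s operation.

```python
# Back-of-the-envelope check of the digitization figures quoted in the article.
# Assumption (not from Google): scanners run 24 hours a day, 365 days a year, with no downtime.
HOURS_PER_YEAR = 24 * 365
WEEKS_PER_YEAR = 52


def estimate(volumes, cost_per_book, books_per_year, pages_per_book, scanner_pages_per_hour):
    """Return total cost, books per week, page images per year, and scanners needed."""
    cost = volumes * cost_per_book                        # dollars for the whole collection
    books_per_week = books_per_year / WEEKS_PER_YEAR      # weekly throughput required
    pages_per_year = books_per_year * pages_per_book      # page images captured per year
    pages_per_hour = pages_per_year / HOURS_PER_YEAR      # round-the-clock capture rate
    scanners = pages_per_hour / scanner_pages_per_hour    # machines needed at that rate
    return cost, books_per_week, pages_per_year, scanners


cost, per_week, pages, scanners = estimate(
    volumes=15_000_000,          # Michigan and Stanford combined, per the article
    cost_per_book=10,            # dollars; the article's "rock-bottom" figure
    books_per_year=1_000_000,
    pages_per_book=300,
    scanner_pages_per_hour=1_000,
)

print(f"Cost: ${cost:,}")                     # Cost: $150,000,000
print(f"Books per week: {per_week:,.0f}")     # Books per week: 19,231
print(f"Page images per year: {pages:,}")     # Page images per year: 300,000,000
print(f"Scanners needed: {scanners:.1f}")     # Scanners needed: 34.2
```

The result, about 34.2 scanners, matches the article’s figure of 34; any real-world downtime for maintenance or handling fragile volumes would push the number higher.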

Critics have warned of many potential pitfalls with the project. Some worry that the quality of the scans will be too low for serious scholars. Skeptics point to the gap between the boldness of the announcement and what Google will actually deliver, since converting these libraries will likely take at least five years, and only a fraction of what is converted will be in the public domain. According to Ann Wolpert, the director of the MIT Libraries, “Of the 30 million works copyrighted in the U.S. since 1790, only about 12% are clearly in the public domain. Some 62% are protected by copyright, and an additional 26% may or may not be copyrighted” (“Google at the Gate,” American Libraries, March 2005, p. 42). Google watchers are concerned that information freely available at the outset could be restricted later. Rumors abound that the company is negotiating with publishers to create a pay-per-view service or to let users purchase works. This would be a convenient service for those able to afford it, but a far cry from the vision of universal democratic access. Other complaints include the crudeness of Google’s search engine, which returns great gobs of information to wade through. Google’s business model depends on advertising, which means there is no free lunch: interspersed with the information Google serves up will be links to advertisers. Libraries, by contrast, have traditionally been billboard-free information highways. Privacy advocates also fear the monitoring of personal data and search histories, information that the law protects within libraries.

Another key observation is that, as wonderful as it might be to have access to the full texts of leading libraries, the information presented will still not be comprehensive. Compare Cornell’s collections with those of the participating libraries, for example. None of these institutions has particular strength in agriculture, hospitality, human ecology, or veterinary medicine, all areas where Cornell excels. Cornell also has special depth in subjects as diverse as Icelandic literature, the Andean region of Peru, anti-slavery, the French Revolution, the indigenous peoples of North and South America, Southeast Asia, and historical witchcraft persecutions, to name but a few. Moreover, today’s scholars increasingly work with materials other than books. Primary source documents are essential, and photographs, prints, maps, and audiovisual materials are rising in importance. The French national librarian has also argued that digitizing the contents of leading English and American libraries will present history through a distorted lens, with a bias toward views expressed in English and a distinctly Anglo-Saxon perspective on the world. Will books not indexed by Google become invisible?

Is it churlish to harp on these deficiencies? Having access to millions of titles would be an enormous gift. But when libraries first converted their card catalogs to online form and shared them with the world, the excitement of knowing where physical copies were located soon gave way to a demand for the actual text. Users wanted more: access to full text, access to special collections, and faster access. Our appetite is insatiable.

What librarians want Google enthusiasts to know is that the library is much more than a box of books. Collections are only one part of a library’s value. Faculty commonly place a higher premium on buying more books or licensing electronic journals than on any staff function or on the physical library itself. “Just get more books,” declares the humanist when asked about renovating Olin Library. From inside the library, however, we see the competing needs of various constituencies. Students benefit from the welcoming environment of our libraries, which offer a more disciplined atmosphere for study than their dorm rooms or apartments. The search for information today is also far more complex than when the card catalog was the starting point. Because the tradition of the codex spans centuries, the reading of books has become standardized and their organization predictable. We lack comparable familiarity with online resources, whose database structures vary wildly. Librarians and educators have been devoting increasing time to information literacy and fluency to ensure that graduates have the skills to find, evaluate, and manipulate information. President Jeffrey Lehman has described one of Cornell’s great challenges as “wisdom in the age of digital information.” Google will offer information, but knowledge and wisdom will arise from the intelligence that people bring to interpreting that information. People at the top of their game, such as professors with decades of experience, are far more self-sufficient than undergraduates, graduate students, or beginning professionals. Yet readers at every level benefit from the librarian, whose responsibility is to be expert in navigating databases and in tracking new resources, particularly those that cross disciplines.

And as curators of the cultural heritage of our civilization, we have a commitment that has endured for centuries. Businesses may come and go, but universities offer a stability that is priceless when it comes to preserving ideas, whether captured on paper, film, or digital media.

Despite these reservations, Google is a potent force to be reckoned with. A user study conducted by the Cornell Library two years ago showed that 95% of faculty surveyed consulted Google or Yahoo daily or weekly, while only 84% used the library’s online resources and only 53% visited the physical library that frequently. If Google is the resource of choice, then it behooves the library to do its best to ensure that users find reliable, accurate information through their Google searches. To start, libraries must make sure their holdings and services are easy to locate. This entails employing metadata that will facilitate Google’s indexing of materials of scholarly value. Since Cornell’s division of Rare and Manuscript Collections adopted Encoded Archival Description (EAD) and converted more than 1,000 finding aids for manuscript collections, inquiries about, and use of, those collections have risen sharply. OCLC, which manages the infrastructure of a union catalog known as WorldCat and holds the bibliographic records of 57 million titles from 23,000 libraries, has entered into a partnership with Google to display WorldCat titles in Google. Searchers can link to WorldCat and find a library in a given geographic area that holds the title they have found. Cornell and other libraries need to strengthen their collaboration so that materials located this way can be delivered conveniently to scholars and other users.

Google’s chief attractions are ease of use and wealth of content. Libraries, convinced that both their content and their service are in some ways superior to Google’s, can take a page from Google’s (and Amazon’s) book by introducing more of the features users love and adding others. What’s needed are faster results (response times of fractions of a second rather than whole seconds); more material (more than a single institution’s holdings, but less of a kitchen sink than Google); more helpful information (“people who looked at this title also checked out these works,” as well as abstracts, tables of contents, book reviews, and blurbs); and more timely information (books and other materials processed immediately so current information is readily available). Unique holdings, often marooned in special collections for lack of staff to process them, need to become a higher cataloging priority so they can be found and used. Librarians must also promote their services and show people why using the library alongside Google will increase their productivity. Studies of Google Scholar, for example, show that its searches fail to bring up many important resources that a search of the Cornell Library Gateway will yield. At the same time, Google Scholar indexes literature not included in the Library Gateway. The two are complementary, and neither alone is sufficient.

When the news first broke that Google Print would include library content, many libraries that were not participating responded defensively. The Cornell Daily Sun, in an editorial on January 20, asked why our library was not among the chosen. Humanists (yes, even humanists!) questioned why Cornell even needed a library if all those books would be online. With the first wave of excitement receding, we can see that although Google’s initiative is a great leap forward, it in no way obviates the need for libraries, which offer many specialized services beyond their collections. Still, Google’s bold venture will affect libraries in many ways. Many small and medium-sized colleges and universities will be able to offer their students and faculty collections that others have spent billions assembling, so mere access to information will no longer be a barrier to scholarly advances. Institutions can rethink their need to hold items in the public domain. It is likely that only national libraries and a few universities, perhaps those in the top twenty, will commit to storing and preserving large print collections. Others can reduce the space they devote to housing books and the funds they spend acquiring, processing, and preserving items that will be predictably accessible through Google. The leading institutions will position themselves strategically so they are not vulnerable to a Google business model that increases the market share of certain publishers. There will be opportunities to work together to provide access to copyrighted works held by these universities in a legal but more democratic fashion. Universities will distinguish themselves through their special collections and their value-added services, such as library instruction effectively integrated into courses.

Just as having MIT’s course syllabi on the Internet does not replace the value of studying at MIT, and assigning the textbook that Cornell faculty use does not make another school’s courses identical to those taught in Goldwin Smith Hall, having millions of digitized books online in Google will not substitute for the Cornell University Library. Google’s contribution to the spread of knowledge, transforming words on paper into searchable text, is revolutionary. Librarians will offer valuable services that complement Google, and the result will be greater scholarly productivity.