One Hundred Years of Access to Digital Documents
Draft for Discussion
Sarah E Thomas
University Librarian, Cornell University
September 17, 1999
An author creating a work today expects to communicate not only to his contemporaries, but also to future generations of world citizens. Our cultural institutions venerate our oldest documents and objects as windows on past civilizations and for their ability to reveal, through their form and their content, information that advances knowledge and betters society. These ancient paper, clay, and stone materials have endured for centuries, and we anticipate they will greet the third millennium, barring some universal catastrophe. True, brittle paper endangers the paper publications of the last century, but a variety of solutions reduce the risk of their total eclipse from the human record, and we are increasingly confident that a strong representation of the most significant titles will be preserved for further use. By contrast, digital objects produced as recently as last month seem far more ephemeral. Electronic media, extolled for the positive attributes of ease and rapidity of dissemination, dynamism, and a score of other attractive features, are cursed with evanescence. Yet because we are reluctant to discard information, we are alert to the need to preserve electronic materials. The Task Force on Archiving of Digital Information, commissioned by the Commission on Preservation and Access and The Research Libraries Group, set forth eloquently the reasons why long-term access is a cultural imperative. Its report, Preserving Digital Information (1996), reminds us: "The ability of a culture to survive into the future depends on the richness and acuity of its members' sense of history." (p. 1)
Technological developments in the transmission of ideas and information have had an unsettling effect on the accepted practices of authors, publishers, scholarly societies, book sellers, librarians, readers, and our governments. Opportunities for rethinking existing roles abound, with a blurring of roles occurring in some instances. Some revolutionaries have suggested that intermediaries, such as publishers and libraries, are superfluous. This paper proceeds from the premise that the value-added services currently offered through intermediaries would continue to be of merit throughout the next century. Although publishers and libraries will certainly evolve, there will still be a need for editing, review, dissemination, selection, analysis, organization, and preservation, among many other services these organizations have traditionally offered. Nonetheless, the changing technology of communication and the consequent ripple effect of innovation will create new patterns of relationships and services.
Our challenge is to identify the ways in which we can assure that the thoughts and findings encoded in electronic documents in 1999 are at least as accessible as those published in the Scientific American or Harper's Weekly of the end of the last century. Several assumptions underlie the discussion of long-term retention in this paper:
1. Universities, libraries, publishers, societies, and governments will survive, albeit with changes, through 2099.
2. The objects under consideration are electronic journals published by "mainstream publishers." They represent only a subset of electronic documents, but constitute an important segment of the publishing market. As a consequence of the review and editorial process these publications have undergone, they will be perceived as a higher-quality product than most other issuances not subjected to such scrutiny. Not included are the growing categories of self-published and grey literature flooding the web. Although the nature of journals may become more fluid, with subsequent atomization at the article or even image level, there will continue to be a large body of published material for which archiving solutions are essential.
3. Technological details of archiving problems are soluble in a cost-effective manner.
4. Our goal is to eliminate the current duplication inherent in the dual publication of print and electronic formats and to move to an order in which the electronic journal is the standard, coupled with a print-on-demand feature for as long as it is desired.
5. The transformation from paper-only or paper plus digital versions of journals to digital only/digital and print-on-demand is desirable to reduce production, acquisition, and storage costs currently associated with parallel formats, and will be hastened by removing the uncertainty about long-term accessibility of electronic journals.
6. The chief reason to address issues of long-term retention is to ensure ongoing access to works; preservation of an object in itself is an insufficient end.
Our task is to consider how to ensure that a researcher in 2099 may reasonably consult texts, images, programs and the whole array of digital creations produced in the wrapper of an electronic journal in 1999. This assignment is complex, given the number of parties with vested interests in document survival. Outlined below are descriptions of categories of stakeholders, their missions, their interest in long-term retention, and their relationship to other stakeholders. Categories considered are author, university, publisher, scholarly society, information service provider, library, and government agency. In several cases there is overlap within the sets of stakeholders, but there are also differentiated goals that separate them from one another in some instances. These goals are not always congruent: they can be in overt or in silent conflict.
Author (Scholarly communicator)
Mission: to advance the frontiers of knowledge through recording research findings and scholarly analysis for transmission through an enduring medium to the public.
Interest in Long-Term Retention: As the creator of the work, the author has an intellectual investment and numerous grounds for wanting to see that his ideas live on. The author can take satisfaction in knowing that his influence continues beyond his lifetime and extends into the future. He, or his heirs, may derive financial benefit from his intellectual property. Association with the publication of something of lasting significance enhances the reputation of the scholar and confers status and material advantages, such as career advancement, higher salaries, and awards. Publication in a peer-reviewed journal offers the author the opportunity to disseminate his research or ideas conveniently to a broad population of interested readers. Although communication may be the primary objective for publication, the slowness with which authors have embraced digital publishing suggests a tangle of other motives, since Web-based materials demonstrably are capable of reaching orders of magnitude higher "circulation" than paper equivalents.
Relationship to other stakeholders: The author is often an employee of a university or other organization of higher learning, a user of libraries, an editor of a journal, a client of a publisher, a member of professional societies, and a citizen of a country.
University
Mission: to educate, to conduct research, and to reach out to society
"As we approach the twenty-first century, the University is committed to preserving the quest for knowledge as more than simply a practical pursuit. Through its broad range of innovative multidisciplinary programs, and through the earnest exploration of difficult questions, Columbia provides students from the United States and around the world with the depth of understanding and intellectual flexibility they need to respond to the challenges we all will face in the years to come."--Columbia University
"I would found an institution where any person can find instruction in any study."--Ezra Cornell
Interest in Long-Term Retention: Universities have an interest in sustaining scholarly research across a variety of disciplines to further the creation of the building blocks of knowledge. Having a permanent record of achievement readily accessible ensures that future generations of researchers and students can work efficiently to extend our understanding. They can avoid inadvertently covering the same ground. Institutions of higher learning have, up to this point, operated in an open, mostly non-proprietary manner, and are typically non-profit organizations. Nonetheless, there is a competitive aspect to universities, and having leading researchers associated with the university is an asset. One way in which this association manifests itself is through affiliations documented in journal articles. There is also an implicit, and sometimes explicit, co-investment by the university in the work which yields a journal article. For the university, as for the author, the publication results in higher status, and indirectly (and in concert with other factors) may serve as an enticement in the recruiting of other faculty and students and a legitimatization of proposals for funding of future research. Bibliographies of faculty publications are signs of their productivity, and a source of pride of accomplishment. It stands to reason that institutions that are centuries old expect to accrete more value through publication of research and scholarship conducted under their auspices, and that such efforts would need to last to be considered truly valuable.
Relationship with other stakeholders: The relationship between the faculty author and the university is changing with the weakening of support for tenure; the increase in faculty mobility with the attendant higher loyalty toward a discipline over a single institution; and faculty wariness about the commodification of learning. Universities view distance learning as a market opportunity, and intellectual property ownership issues are a subtext of the discussion about maximizing investment in this area. Universities relate to other stakeholders as well. They subsidize the library as a public good for their community, but, as part of the trend for universities to become more businesslike, are increasingly reviewing costs for value. Universities also subsidize employee participation in learned society activities, and support editorial roles by provision of space, clerical and technical infrastructure, and other tangible and intangible benefits.
Publisher
Mission: to disseminate quality publications to a broad audience.
"For nearly 60 years AP has been serving the information needs of scientists, researchers, engineers, and other professionals in industry and academia."--Academic Press
"Our strategy is to focus on delivering valuable information through innovative products and services, whether in print or electronic form."--Elsevier
Commercial publishers, like authors and universities, depend on the quality of their products for distinction. Unlike authors, societies, or universities, however, they focus almost exclusively on publishing: it is their core business. They rely on the sale of the product to sustain themselves financially, whereas for the above-named stakeholders, with the exception of societies, publications are usually not a major source of income. The safeguarding of this valuable asset thus becomes a driving force for the publisher as long as the title is economically viable.
Interest in Long-Term Retention: Publishers are interested in the archiving of their electronic publications because they represent a source of income. Electronic documents are versatile, and the information in them may be restructured and recombined to create new entities with their own economic value. With the change in dissemination of content offered through electronic media, most publishers retain ownership of the publication and physical control of the manifestation of the work, whereas with printed publications, multiple copies were dispersed and housed in libraries and elsewhere. The opportunity to exercise greater control over the physical package has enabled publishers to achieve tighter control over their intellectual property. As a result, they have a greater sense of responsibility for the product.
Relationship with other Stakeholders: Publishers have a business relationship with authors and universities; they provide a service for authors, sometimes in exchange for a fee, as in page charges. They also have contractual relationships with information service providers, who may aggregate their publications with others to increase access. Editors drawn from the university may serve as voluntary or paid labor in the processing of manuscripts.
Library
Mission: To support the teaching, research, and outreach goals of the University
"To support the instructional, research, and service goals of the University by working collaboratively, creatively, and efficiently to provide services that support physical and intellectual access to information resources in fulfillment of the present and anticipated needs of Cornell students, faculty, and staff, and as appropriate, to alumni and the broader public."--Cornell University Library
"Advance scholarship and science, foster excellence in teaching and learning, and promote service to the public through: developing and providing continuous access to shared collections; and applying appropriate digital technologies to influence and support innovations in scholarly communication. Provide collaborative leadership in selecting, designing, building, managing, and preserving high-quality digital collections."--California Digital Library
Interest in Long-Term Retention: Libraries represent a huge capital investment on the part of the parent organization, and unlike many other capital investments, much of their content does not depreciate or become obsolete over time. Both the artifact and the information it holds can grow in value either through scarcity or through its role as a source of new discoveries. University libraries have always had a special relationship with faculty and have attempted to collect everything published by authors affiliated with their university.
In this capacity, librarians envision a role as custodians of digital information for future generations, although they do not "own" copies of the data, but only channel access for their patrons through licensing. Based on prior experience, librarians know that some publications which seem to be of ephemeral value are often critical resources. Important scientific discoveries of the nineteenth century remain in demand today, sometimes to be reviewed from an historical perspective, but sometimes combed for new clues about the universe. Librarians tend to see information as a public good, rather than a commodity, and commonly hold free and democratic access to information as part of their value system.
Relationship with other stakeholders: Libraries are subunits of universities that serve as the crossroads between research and teaching. Their repositories are often the university's most valuable assets, worth more than any building or laboratory. Over the years, libraries have had a close relationship with publishers because of the centrality of the publications they issued for the service initiatives of the library. In many cases, however, the bond with the publisher has frayed, either through the imposition of the book vendor as the intermediary or through increasing anxiety over rising prices or perceived loss of control over product containers, with a concomitant destabilization of their organization. Information service providers have commanded increasingly larger shares of library budgets, demonstrating the confidence the organizations have in them. At the same time, there are undercurrents of apprehension about monopolistic control, exacerbated by the very large stakes at risk. In this changing environment, libraries have seen societies as closer kin because they draw their membership from the university faculty and seem to hold a more compatible value system (for example, non-profit status). Libraries alternatively look to governments for relief, as they develop, issue, and maintain standards that facilitate interoperability, or enact or uphold legislation that favors fair use or makes information more widely accessible.
Scholarly Society
Mission: To support scholarship and professional development in a particular field.
"The mission of the American Chemical Society is to encourage in the broadest and most liberal manner the advancement of the chemical enterprise and its practitioners."--American Chemical Society
"The APS has always viewed itself to be in a partnership with libraries, and indeed other publishers, in the mission of promoting and disseminating the knowledge of physics."--American Physical Society
Interest in Long-Term Retention: Societies exist to nurture their members, who band together to accomplish collectively objectives which serve the discipline or domain with which they are affiliated. A means to achieve this end is the scholarly journal, which promotes the work of its members or work of interest to the membership, and publicizes issues and events of interest to the society. An important by-product of this is revenue derived from subscriptions and advertisements, which in many cases sustains the society. The journal is an asset for the society in much the same way as it is for the university. It is a service for members, and a tangible product that concretizes the work of the society. The existence of the journal, whether one of venerable tradition or one of large circulation, adds luster to the society. The journal is one of the attractions for society membership. In addition to the journal's role as documentation of the solidity of the society, the importance of the intellectual property as a source of continuing revenue has grown in recent years.
Relationship with Other Stakeholders: Scholarly societies have a close association with faculty of universities, who are their members, and the authors and editors of articles in their publications. Libraries are their customers. Commercial publishers can be their competitors.
Information Service Provider
"In the broadest sense, JSTOR's mission is to help the scholarly community take advantage of advances in information technologies. In pursuing this mission, JSTOR has adopted a system-wide perspective, taking into account the sometimes conflicting needs of libraries, publishers, and scholars."--JSTOR
"Furthering Access to the World's Information. As set forth in OCLC's Articles of Incorporation, the objectives of the organization are to: Establish, maintain and operate a computerized library network and to promote the evolution of library use, of libraries themselves and of librarianship, and to provide processes and products for the benefit of library users and libraries, including such objectives as increasing availability of library resources to individual library patrons and reducing the rate-of-rise of library per-unit costs, all for the fundamental public purpose of furthering ease of access to and use of the ever-expanding body of worldwide scientific, literary and educational knowledge and information."--OCLC
"HighWire was founded to ensure that its partners - scientific societies and responsible publishers - would remain strong and able to lead the transition toward use of new technologies for scientific communication. Concerned that scientific societies separately would lack the resources and expertise to lead a major technical infrastructure shift in publications, Stanford University, in founding HighWire, accepted the role of partner, agent of change, and advisor."--HighWire Press
Interest in Long-Term Retention: The Information Service Provider (ISP) is a relative newcomer to the field of scholarly communication, but its star is ascending, thanks to its ability to harness technological innovation with intellectual content and services that meet scholarly needs. ISPs can be for-profit or not-for-profit. Their interest in the longevity of documents is a function of their customers' demand for ongoing access to content. For some, publications are a core business, as in the case of JSTOR or HighWire; for others, they are one item in a suite of services, as they are for OCLC.
Relationship to Stakeholders: ISPs have contractual relationships with clients, who are either consumers of their goods and services or suppliers of those products. Universities, libraries, and individuals are mostly consumers, whereas publishers and societies fall into the category of supplier.
Government Agency
Mission: to govern the people of a nation in a way that promotes their economic, social, and cultural well-being
Interest in Long-Term Retention: Governments have an interest in the long-term accessibility of publications because they are a cornerstone of knowledge and an essential ingredient in creating an informed citizenry. Legal deposit requirements enable them to maintain comprehensive national collections that surpass the holdings of any other library in the nation.
Relationship to Other Stakeholders: Governments relate to other stakeholders in unique ways. Most universities in a country derive some sort of government support, and authors are often beneficiaries of the government through grants and awards. All institutions and citizens must comply with government regulations, of which laws on copyright seem to have the greatest bearing on publishing.
Options for Ensuring Long-Term Access to Electronic Journals
An examination of intertwined relationships of the major stakeholders reveals that it is unlikely that a single player will dominate or that the chief participants will arrive at a uniform solution. A collaborative plan will be more complex, but in all probability, more effective. Multiple approaches may be appropriate, given the proliferation of publications from manifold sources. Since there will be many beneficiaries, it is appropriate that the significant workload and financial and moral responsibility for ongoing access to publications be shouldered by all.
Several options present themselves. No doubt many more permutations exist than are presented here. But these provide an indication of some of the manifold possibilities. The goal is not to replicate the imperfect process we presently have for the preservation of books, but to try to be prescient enough to create a system that is more economical, more effective, and more reliable than our current practices. To satisfy the interests of the various stakeholders and to promote communication about and compliance with the recommended practices, there should be an oversight group with representatives drawn from the different communities. This oversight group would certify that publications were being archived in a secure, responsible manner according to agreed-upon standards.
All proposals require a high degree of collaboration to take advantage of the expertise of the various partners. For example, libraries have a traditional core service value of preserving materials for use by future generations and a history of standards development. They would be key players in developing the functional specifications for archiving. Additionally, libraries would be strong candidates to prioritize what should be archived, in the eventuality that selection of a subset of publications was necessary on economic grounds. Since libraries have considerable experience in selecting which materials should be added to collections, it is logical that they would develop policies for determining what to archive.
1. Government Agency
In this centralized repository model, which would be replicated for each country, publishers deposit a copy at a designated government agency. In the United States, for publications licensed to U.S. customers, this could be the Library of Congress. The publisher would be obliged to update the archive on a regular basis, perhaps annually. The publisher would provide the agency with the software, including proprietary software, required to access the files. Specifications for software would accompany the data. The agency could make information in the file publicly available:
a) upon expiration of copyright restrictions.
b) if a publisher ceased operation without a successor organization.
c) if a publisher ceased contact with the agency for a certain number of years, perhaps 3.
d) when the publisher determines the publication is no longer financially viable and transfers copyright.
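As an illustrative aside, conditions (a) through (d) can be read as a simple decision rule: the agency may open a deposited title to the public as soon as any one of them holds. The Python sketch below is only one possible reading. The record fields, the function name, and the three-year silence threshold (the text says only "perhaps 3") are assumptions introduced for illustration and do not describe any existing agency system.

```python
from dataclasses import dataclass


@dataclass
class DepositRecord:
    """Hypothetical status of one deposited journal; all field names are assumptions."""
    copyright_expired: bool = False
    publisher_defunct: bool = False      # publisher has ceased operation
    has_successor: bool = False          # a successor organization exists
    years_since_contact: int = 0         # years since the publisher last contacted the agency
    copyright_transferred: bool = False  # publisher judged the title non-viable and transferred copyright


def may_release(record: DepositRecord, silence_limit_years: int = 3) -> bool:
    """Return True if any of conditions (a)-(d) permits public release."""
    return (
        record.copyright_expired                                     # (a)
        or (record.publisher_defunct and not record.has_successor)   # (b)
        or record.years_since_contact >= silence_limit_years         # (c)
        or record.copyright_transferred                              # (d)
    )
```

Under these assumptions, a title whose publisher has closed but has a successor organization stays restricted, while the same title with no successor becomes eligible for release. Keeping the silence threshold as a parameter reflects the fact that the number of years is left open in the proposal.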
Advantages:
Governments are fairly stable.
Legal deposit laws create an existing flow of information to LC.
Commitment to public good, even as a subsidy.
Disadvantages:
Legislation prolonged and contentious.
Bureaucracies sometimes sluggish.
Government funding can be capricious.
2. Non-Profit Repository
Publishers contribute titles annually to a repository. The repository could be administered by an individual entity such as OCLC, RLG or JSTOR or could be a consortium of universities, societies, or publishers.
Advantages:
Economies of scale.
Disadvantages:
Collaboration can be difficult and expensive.
3. Commercial repositories
Publishers contract with commercial repositories to archive.
Advantages:
Businesses would be subject to accountability.
Businesses would be motivated to be cost-effective.
Economies of scale.
Disadvantages:
Few businesses survive 100 years.
Unprofitable functions subject to divestment.
4. Libraries as Centers of Responsibility
Libraries agree to collect, make accessible (per licensing agreement) and archive e-journals.
Responsibility for access passes to the library:
a) upon expiration of copyright restrictions.
b) if a publisher ceased operation without a successor organization.
c) if a publisher ceased contact with the library for a certain number of years, perhaps 3.
d) when the publisher determines the publication is no longer financially viable and transfers copyright.
Advantages:
Continues traditional preservation role.
Disadvantages:
Conflict between publisher's list and library's selection policy may exist.
Cooperative collection development not a successful model for emulation.
Economies of scale unlikely.
Libraries may not be structured for economical delivery of services on a widespread basis.
Fragmentation of materials.
5. Publishers
Publishers commit to archive their publications.
Advantages:
Source remains constant with owners of original material.
Disadvantages:
Fragmentation of materials.
Economies of scale unlikely.
Long-term retention not core business.
Archiving not profitable for most materials.
Of the alternatives listed above, the nonprofit repository operated under the collective guidance of universities, societies, and publishers seems to offer the most promise as a feasible solution. The nonprofit repository would serve as the archive for published material deposited by publishers. Universities and their subunit libraries, publishers, and societies would participate in the governance of the repository, which would be guided by policies on selection, specifications for access, technical standards for ensuring enduring access, and economic management.
Selected Alternatives for Funding Long-Term Retention
1. Publishers pay initial/annual fee.
2. Publishers pay as long as material generates revenue.
3. Publishers receive tax incentives to archive.
4. Government funding supports archiving by federal agency.
5. Libraries pay surcharge in subscription or use fees.
6. Libraries reallocate materials budget to archiving in exchange for free or discounted access.
7. Libraries reallocate funds from storage of paper to digital archiving.
8. Libraries reallocate preservation microfilming funds to digital archiving.
9. Universities subsidize for faculty or for libraries in exchange for favorable conditions of access.
10. Societies tax members through membership dues.
11. Authors pay per page or per article.
12. Users pay per use or through subscription.
References
Beebe, Linda and Barbara Meyers. 1999. "The Unsettled State of Archiving." The Journal of Electronic Publishing. June. http://www.press.umich.edu/jep/04-04/beebe.html
Cox, John. 1999. "Publisher/Library Relationships in the Digital Environment." An STM White Paper. April.
Research Libraries Group. 1996. Preserving Digital Information: Report of the Task Force on Archiving of Digital Information. (Donald Waters and John Garrett, Task Force Chairs). http://www.rlg.org/ArchTF/index.html
National Library of Australia. Statement of Principles for the Preservation of and Long-Term Access to Australian Digital Objects. http://www.nla.gov.au/niac/digital/princ.html