Project Harvest

The Cornell University Library's
Proposal to
The Andrew W. Mellon Foundation
To Develop a Repository for E-Journals


Submitted by

Sarah E. Thomas
Principal Investigator
Carl A. Kroch University Librarian
Cornell University
15 October 2000


______

TABLE OF CONTENTS
_______

Overall objectives for the project
Importance to Cornell
Cornell's leadership in digital preservation
Cornell's interest in electronic publication
Cornell's leadership in the preservation of agricultural literature
The appeal of a subject-based approach
Planning year focus
Detailed work plan and budget
Work with publishers to develop an archiving policy
Investigate how to ensure scholarly acceptance of the repository
Develop a technical model for the repository
Develop acquisition and growth plans
Identify an organizational and staffing model
Negotiate access policies for the prototype repository
Develop a plan for the long-term funding of an e-journal repository
Project Staff
Citations
Appendix A: Chronological Workplan



Overall Objectives for the Project

In response to the call from the Mellon Foundation in a letter of 14 August 2000 from Don Waters, Scholarly Communications Project Officer, the Cornell University Library proposes to develop a plan for a repository of electronic journals in the field of agriculture. Project Harvest, as the project will be known, would build on Cornell's historic excellence in preservation in general and the preservation of agricultural literature in particular. We propose to initiate a dialogue with a number of agriculture publishers with whom we have successfully cooperated on other projects. During these discussions, we will identify the elements of a compelling preservation strategy and negotiate a mutually acceptable approach that Cornell could implement and which the publishers could accept. As a product of the negotiations, we will develop a model agreement that could be used as the basis for negotiations with other publishers in agriculture as well as publishers in other disciplines.

In parallel with the negotiations with the publishers, we will also be at work on the considerable technical design issues associated with an e-journal repository. We will develop a design that addresses the various functions needed in an e-journal repository, including ingest, storage, management, migration, and access.

At the end of the planning year, we will have negotiated with a number of publishers over the inclusion of their materials in a library-based repository. We hope that the negotiations will lead to the development of a model agreement that other publishers could readily accept. In addition, we will have modeled the architecture for a long-term repository based on the best thinking in the digital preservation community, tempered by the realities of what our publisher partners are willing to accept. We will have developed an RFP, to be distributed to vendors when further funding is secured, for the purchase of equipment and services needed to implement the repository. Finally, we will have planned how to address other issues associated with the successful implementation of a long-term e-journal repository, including how to gain community support for the project, how it might grow, what organizational model we would need to follow to develop the e-journal archive, and what our long-term budget plan might be.

Importance to Cornell

There are a number of reasons why Cornell University wishes to undertake the difficult task of organizing, designing, and implementing a digital e-journal repository. First, it is a natural outgrowth of our long-standing interest in preservation of research library materials. In addition, the proposal builds on our interest in issues surrounding electronic publication. Finally, the subject matter of the proposal - the preservation of electronic agricultural literature - has special importance to us. The Mann Library at Cornell has spearheaded several national initiatives to ensure that essential agricultural literature is preserved; Project Harvest would extend this interest to the realm of electronic publication. Fortunately, our past experience in all three areas (preservation, electronic publishing, and agricultural subject areas) has equipped us well to undertake Project Harvest.

Cornell's leadership in digital preservation

For over a decade the Cornell University Library has been a leader in the preservation field in general, and in the application of digital technologies to library preservation in particular. Through a series of research grants and award-winning publications, Anne Kenney and the other staff of the Library's Preservation Department have developed much of the conceptual basis for our understanding of the place of digital technologies in the conversion of analog material. Over the last several years, she and Oya Y. Rieger have collaborated on a number of research projects to advance Cornell's understanding of digital preservation requirements. One of these projects, funded by the Institute for Museum and Library Services, has resulted in the development of a preservation strategy for Cornell's digital image collections, including requirements for the establishment of a central depository. A second research project, conducted in collaboration with Mann Library staff and funded by the Council on Library and Information Resources, assessed risks associated with file format migration (Lawrence, 2000). Currently Kenney, Rieger and other members of the Department of Preservation are collaborating with colleagues in the Computer Science Department on Project PRISM, a Digital Library Initiative, Phase Two project funded by the National Science Foundation and other interested funding agencies. Project PRISM is exploring issues around information integrity in the development and growth of digital libraries, with particular emphasis on preservation and security requirements. Cornell's growing expertise in digital preservation is evidenced by staff appointments to important initiatives. Anne Kenney, for example, has been named to the RLG/OCLC Working Group on Attributes of a Digital Archive. Oya Rieger serves on the Preservation Metadata Working Group of the RLG/DLF Task Force on Policies and Practice for the Long-Term Retention of Digital Materials and is co-chairing the NISO Technical Committee on Metadata for Digital Still Images. Another library staff member, Peter Hirtle, was a member of the seminal CPA/RLG Task Force on Digital Archiving (Task Force, 1996) and is serving on the Institutional Records Working Group of the RLG/DLF Task Force.

In short, Cornell University has had extensive interest in and commitment to the preservation of research library materials. The preservation of "born digital" material is the next logical area for Cornell to address. Our previous experience has uniquely positioned Cornell to explore and implement an archival repository for electronic journals.

Cornell's interest in electronic publication

Cornell University has also developed a strong interest in electronic publication. During the past two years, and with the support of The Andrew W. Mellon Foundation, the Cornell University Library has planned and undertaken a project in the electronic publication of mathematical journals. Project Euclid is intended to serve the needs of both mathematics publishers and mathematician end-users. We intend to develop Euclid into a primary channel for the publication and exchange of mathematical scholarship.

Project Euclid demonstrates Cornell University's interest in issues surrounding electronic publication. Project Euclid, however, is intended to be an experiment in publication, not archiving. Nevertheless, the policies, practices, and procedures that are developed as part of Project Harvest may benefit the journals included in Project Euclid. Because a long-term repository is a key component of the scholarly exchange process in the online environment, such a repository, specifically tailored to the requirements of mathematics publications, would be a further service that we would like to offer those publishers who make use of Euclid to publish their journals.

It is in Cornell's interest, therefore, to support the creation of a protocol for the creation and maintenance of an e-journal archive. Such a protocol could be adapted for mathematics and other disciplines. It would attract other mathematics publishers to the project who are not publishing their journals through Euclid, and foster the support of the Euclid project. In this manner, our currently-funded investigation of new models for scholarly communication and the proposed new investigation into methods by which we can provide long-term access to electronic scholarly information can support each other.

Cornell's leadership in the preservation of agricultural literature

The literature of agriculture is a natural body of materials for a test of the preservation of electronic literature. For over a decade the Cornell University Library, the land grant library for the state of New York, has taken a leadership role in the preservation of the literature of agriculture. Librarians at Cornell's Mann Library, specializing in agricultural and life sciences, have worked with other land grant university libraries and the National Agricultural Library to establish a national preservation plan for agriculture (Gwinn, 1993). As part of the preservation plan, the information components of the agricultural sciences were identified and primary responsibilities for the coordination of preservation activities were assigned to different institutions, as indicated in Figure 1. Cornell currently coordinates several parts of this ongoing preservation work. On behalf of USAIN (United States Agricultural Information Network), Mary Ochs and Joy Paulson at Cornell University's Mann Library manage the NEH-funded cooperative microfilming project to preserve state and county agricultural documents. Cornell also continues to develop the Core Historical Literature of Agriculture digital library. In September 2000, Cornell University received an IMLS National Leadership Grant to digitize the core historical literature of the related field of home economics. As part of the project, Mann Library will implement the guidelines established in the IMLS-funded preservation strategy project mentioned above. Part of the project will experiment with the Endeavor Encompass software to increase the interoperability of the home economics digital library with other digital repositories at Cornell, such as Making of America. Finally, the project will define a set of model workflows for capturing metadata for access and preservation of digital materials.

Agricultural literature is well served by a carefully developed and coordinated program, the National Preservation Program for Agricultural Literature. The USAIN National Preservation Special Project Committee oversees the plan, and Mary Ochs of Mann Library serves as a member of that committee. As indicated in Figure 1, preservation of electronic publications in agriculture was not a significant component of the plan when it was developed. To address one part of the changing landscape of publishing, Cornell University co-sponsored with the National Agricultural Library in March 1997 a meeting on government electronic publications in agriculture. A product of that meeting was the publication by Paul Uhlir, Project Consultant for the National Research Council, of the "Framework for the Preservation of and Permanent Public Access to USDA Digital Publications." This framework was later adopted by the National Agricultural Library in conjunction with the USDA Office of the Chief Information Officer. No similar meeting for electronic journals in agriculture has yet been held. The growing importance of e-journals in the field of agriculture makes the secure archiving of, and long-term access to, these "born digital" files of central importance to the evolving preservation plan. As Cornell's representative on the USAIN special projects committee, Mary Ochs will inform the committee about the work on this planning project and seek its input. Connections with USAIN and the agricultural library community offer links for building trust within the user community.

The appeal of a subject-based approach

The absence of e-journals from the National Agricultural Preservation Plan is a gap that clearly needs to be addressed, but we are also proposing to study e-journals in agriculture because of the tactical advantages the field offers. An agricultural subject-based archive would include journals from a wide variety of publishers, with a wide variety of file formats and multiple contractual agreements required. Appendix B contains a list of the core journals in agriculture and represents a starting point in the search for journals for possible inclusion in the archive. This list of journals is based on the seven-volume bibliography, The Literature of the Agricultural Sciences, edited by Wallace Olsen of Cornell University's Mann Library, which identifies the core journals in the seven sub-disciplines of agriculture (Olsen, 1991-1996). Preliminary investigations (which will be expanded during the planning year) suggest that as many as 75% of the journals are now available in electronic form. As Appendix B indicates, many publishers are represented, and a number of the journals already have in place arrangements with nominal archival repositories such as JSTOR, OCLC's ECO (Electronic Collections Online) project, or HighWire Press. It is unclear, however, whether any of these repositories meet the requirements outlined in the "Minimum Criteria for an Archival Repository of Digital Scholarly Journals" developed by CLIR, DLF, and CNI.

In sum, there are a large number of agricultural journals that are available in electronic form; these journals represent a high percentage of the core serial literature in agriculture; they are produced by a number of different publishers and publishing arrangements; and some of the journals have a quasi-archival arrangement in place. Taken together, the agricultural publishing arena offers the real potential in the second phase of the project for the creation of a large, varied, and robust e-journal repository that reflects much of the diversity found in scholarly communication.

Planning year focus

We heartily endorse the assertion in the "Minimum Criteria for an Archival Repository of Digital Scholarly Journals" that an archival repository that acts to preserve digital scholarly publications must be a trusted party that conforms to certain minimum requirements agreed to by both scholarly publishers and libraries. The most serious challenges and impediments to the creation of an e-journal repository are political: they have to do not with how the technology is designed, but rather with how the essential stakeholders (publishers, libraries, the scientific societies that support both, and users) relate to and work with one another. Indeed, how they relate will to no small extent determine how the technology evolves. What precisely is stored in such a repository, how access to it is guaranteed, who owns it, how and under what circumstances it is accessed, who authorizes such access, how the entire operation is securely and regularly funded--these and similar questions must be answered jointly by the stakeholders before the building of a fail-safe repository can commence. During the planning year, we will work with our target publishers to formulate and develop provisional answers to these basic business and technical questions. Negotiations with publishers over the design, organization, and operation of the digital repository will therefore be the primary activity during the planning year.

While political issues may be the greatest challenge to the successful implementation of an e-journal repository, serious technical challenges also confront us. There are a number of technical issues that must be identified and addressed in conjunction with the negotiation with publishers. Some concern the nature of the e-journal archive itself. Is it to be, for example, a fail-safe repository of last resort whose contents are shaped by a desire to ensure the longest possible lifespan, or should it try to offer the full range of functionality found in the e-journal itself? Can the repository be built so that both options are possible? Once the nature of the archive is defined, what systems are to be used for the ingest, organization, maintenance, migration, and delivery of the e-journal files? What is the place of redundancy in the system? The development of a technical model for the e-journal repository will be the second focus of the planning year.

In addition to our negotiations with publishers and the development of a technical work plan, we will also use the planning year to develop mechanisms for convincing the scholarly community of the validity of the repository, explore organizational and staffing models for any implemented repository, and explore long-term funding options and growth plans for the repository.

Detailed work plan and budget

The planning year will be divided into seven separate but related activities.

Work with publishers to develop an archiving policy

It is our belief that effective archiving of electronic journals can only be accomplished through a publisher/librarian partnership. The Mellon planning grant would allow us to work with publishers to establish a set of responsibilities for both Cornell, as the archiving institution, and the publishers, as archival depositors. Along with those responsibilities, we must establish conditions for inclusion, including copyright clearance, that are broadly acceptable to the publishers, but allow Cornell, as the archiving institution, the flexibility to establish technical specifications and access policies that serve users well. To be successful, these negotiations must identify the benefits and drawbacks of the different configurations of an e-journal archive and find an acceptable common ground for all parties. This will then form the basis of the contracts with publishers depositing files in the archive.

The first step in the acquisitions plan would be to develop selection criteria for deciding which publishers of the journals listed in Appendix B we should first invite to take part in Project Harvest. Twelve publishers are obvious candidates with which to work. These publishers issue a large number of the core titles; they are prominent in the field (and hence likely to serve as models for others); and they represent a wide variety of publishing models, including both for-profit and non-profit. They include:

  • Elsevier, with 16 journals on the list
  • Either the Tri-Societies (American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America) or Springer, which produces the journals
  • The National Research Council of Canada. At one point, their archiving policy was to maintain material "until they ran out of room on their server"
  • Annual Reviews. These titles are available via HighWire Press and offer a way of working with another university
  • University of Chicago Press. Titles include Economic Development and Cultural Change, International Journal of Plant Sciences, and American Naturalist
  • American Agricultural Economics Association
  • Federation of American Societies for Experimental Biology
  • Cambridge University Press, with 7 journals on the list
  • Entomological Society of America. We have already negotiated permission with them to include several titles in the online Core Historical Literature of Agriculture
  • Kluwer, with 5 journals on the list
  • Oxford University Press, with 3 journals on the list
  • Blackwell Science, with 9 journals on the list

After identifying the potential publisher partners, we will then ask a pilot group to participate in Cornell's Project Harvest. From those publishers expressing interest in participating, we would gather a small development team to consider the issues outlined above. The goal for the development team would be to create, through an iterative process, a standard agreement for archival deposit. Topics that would be identified in the agreement include:

  • The general responsibilities of the publishers and Cornell
  • Characteristics of the data, accompanying metadata, and any additional documentation that are to be deposited
  • Guidelines on transmission methods and media for deposit
  • Procedures for the deposit
  • Procedures and protocols Cornell will use to verify the arrival and completeness of the data
  • Rights of the depositing organizations to audit the repository
  • The respective roles, responsibilities, and rights of Cornell and the data producers with regard to the data
  • Articulation of Cornell's responsibilities and capabilities with regard to the accessioning, description, management, and even transformation of the deposited data
  • Access policies for users of the repository, and how they may vary over time
  • Conditions on the use of the data, and again how they may vary over time
  • Fees (if any) associated with the deposit
  • Cornell's ability to share the data with partners to create an agreed-upon level of redundancy
  • Clarification of issues surrounding copyright retained by authors
  • Other key issues defined by the development team

Assuming the implementation phase of the project is funded, we anticipate contacting all the publishers from the list in Appendix B to assess their interest in participating in the project. Through the planning process, we would need to determine the number of titles we can handle in the first years of the project.

Cornell has experience with this type of negotiation. We worked with 68 publishers, including Elsevier, Kluwer, and others, to secure rights to use material in TEEAL (The Essential Electronic Agricultural Library) and The Core Historical Literature of Agriculture. In the process of negotiations, staff members developed a standard agreement similar in function (if not in content) to the agreement we are proposing to develop for Project Harvest. In Project TEEAL, once one major publisher agreed to the TEEAL basic agreement, many other publishers followed. We anticipate a similar development with Project Harvest. Project Euclid, like TEEAL, has been built on a partnership of publishers and librarians. In the case of Project Euclid, many of the publishers are scientific societies, which has given us experience in understanding the concerns of a different group of publishers. We would use the lessons we have learned from developing Euclid in shaping the discussions for Project Harvest.

We would also draw on the lessons others have learned in negotiating with publishers. The CLIR/DLF draft model license found on the LIBLICENSE web site at Yale University <http://www.library.yale.edu/~llicense/>, for example, is a natural model on which to draw for our similar effort to develop a model archiving agreement. The data depository program of the Arts and Humanities Data Service <http://ahds.ac.uk/deposit/depintro.html> will also provide information on what is needed for a digital archive and what creators are likely to be willing to deposit.

Investigate how to ensure scholarly acceptance of the repository

The Cornell repository will only be successful if the scholarly community is convinced that the journals deposited at Cornell will remain accessible and readable over time. An important component of the planning year therefore will be assessing how scholars feel about e-journals and identifying methods to build trust in the community.

In the matter of trust, Project Harvest is in a favored position. Mann Library within the Cornell library system has had a long history of preserving and making available to the scholarly community the core literature of agriculture. An ongoing and significant electronic initiative is the USDA Economics and Statistics System with its statistical and textual reports from the Agriculture Department's Economic Research Service, National Agricultural Statistics Service, and World Agricultural Outlook Board. Scholars know that Cornell has a vested interest in the preservation of the literature of agriculture, making this project mission-driven, rather than external to the overall goals of the institution.

Given the confidence that the university already enjoys with publishers, librarians, and scholars, some in the scholarly community may be willing to accept whatever the university proposes to do just because it comes from Cornell. However, it will also be important to develop formal methods of representing the organizational and technical competencies Cornell plans to build during the course of Project Harvest. To meet this need, the project team will develop a plan to outline the organizational and technical components of the repository. We assume that the success of the journal deposit system developed during the course of the project will be heavily dependent on the reliability and credibility of the organizational and technical work plan. We will convince the repository's customers that materials in the repository are in good hands by articulating for them our plans for the building, maintenance, and management of the repository.

One component of our information campaign will be a mission statement for Project Harvest that can be shared with the appropriate scholarly communities. The mission statement will include the information recommended by the "Minimum Criteria for an Archival Repository of Digital Scholarly Journals," including the scope and nature of the materials to be included in the repository, the strategy and methods we will adopt to attract materials, and the user community we hope to serve.

A second means of building scholarly acceptance of Project Harvest will be to ensure that the archive conforms to generally accepted standards for digital repositories. One of the recommendations of the highly influential report of the Task Force on Archiving of Digital Information was that standards and criteria for the certification of digital information repositories be developed. Several national and international projects are exploring the process and methodology in defining the requirements for a certified repository. Among the key initiatives are:

  • During the October 1999 ISO Archiving Workshop Series, certification of archives (specifically within the framework of the "Reference Model for an Open Archival Information System" (OAIS)) was one of the key areas for workshop focus and possible standardization efforts.
  • The upcoming Preservation 2000 conference, which is sponsored by the UK's Cedars Project, RLG, and OCLC, will provide a platform to continue the discussion of criteria for certification at an international level.
  • In March 2000 the Research Libraries Group (RLG) and the Online Computer Library Center (OCLC) announced that they will cooperate to create infrastructures for digital archiving. One of their goals is to establish best practices and document the attributes of digital archives for research repositories.

The Library's Digital Imaging and Preservation Research Unit is an active participant in all of these initiatives. During the planning phase of Project Harvest, the staff will closely monitor these and related efforts in the certification of repositories and will actively contribute to them by sharing Cornell's empirical experience. As certification standards emerge, Cornell will publicize its adherence to the standards as one more way of building trust in the user community.

Develop a technical model for the repository

Cornell will invest in a five-pronged effort that will focus on: 1) establishing a baseline of e-journal software and file format needs; 2) specifying the archival repository; 3) specifying monitoring tools that will flag documents within the repository that require migration; 4) specifying a baseline hardware and software infrastructure to house the repository; and 5) exploring the need and implementation models for redundancy in the repository.

1) Establish a baseline of formats and related software.
Cornell will inventory file formats and software in use today to store and manage e-journals in agriculture. We will collect conversion routines that permit modifying these formats. We will explore whether there is one "least common denominator" format that has minimum software dependencies, and that can be used to create one parallel copy of each journal in that format. Whether or not there is such a format, we will also look at how we might maintain the formats in use in the current live system. One area we want to explore in particular is whether we can maintain both systems: a system with high functionality based on current software as well as one based on a more limited, but likely more enduring, format.
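
As an illustration only, the search for a "least common denominator" format can be framed as a reachability question over the inventory of conversion routines: which formats can every format in use be converted to, directly or through a chain of routines? The sketch below is hypothetical. The format names, the conversions table, and the function names are invented for the example and do not describe any actual inventory.

```python
from collections import deque


def reachable_formats(start, conversions):
    """Return every format reachable from `start` through chained conversions."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for target in conversions.get(current, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def common_denominators(formats_in_use, conversions):
    """Formats that every format in use can be converted to, possibly in several steps."""
    candidates = None
    for fmt in formats_in_use:
        reach = reachable_formats(fmt, conversions)
        candidates = reach if candidates is None else candidates & reach
    return candidates or set()


# Hypothetical inventory: source format -> formats a known routine can produce.
conversions = {
    "SGML": ["XML", "ASCII"],
    "XML": ["ASCII"],
    "PDF": ["TIFF", "ASCII"],
    "TIFF": [],
    "ASCII": [],
}
formats_in_use = ["SGML", "XML", "PDF"]

print(sorted(common_denominators(formats_in_use, conversions)))
```

An empty result would indicate that no single target format exists for the inventory, in which case the question of maintaining the formats of the live system becomes the more pressing one.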

2) Specify the archival repository.
Cornell will investigate potential architectures and design criteria for the archive repository, and will choose an approach that is the essence of simplicity. The repository will be based on the OAIS reference model and compliant with Open Archives Initiative protocols and other initiatives in the subject domain of agriculture. (Cornell is already planning to implement OAI protocols in Project Euclid.) The repository model will provide for redundancy of instances. The repository architecture needs to support establishing relationships among the e-journal components without depending on specialized software that is itself subject to technological obsolescence. An example of a possible architecture would be one that relates internal components based on sequence and naming conventions. The repository files will contain metadata for each journal complying with contemporary standards and files in multiple formats. It will include at least the file format in common use for that journal today and an additional "least common denominator" version, as well as associated conversion software.
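
To make the idea of relating components through sequence and naming conventions concrete, the following is a minimal sketch that assumes one hypothetical file-naming convention (journal code, volume, issue, article sequence, and format extension). The convention, the pattern, and the sample file names are assumptions made for illustration; the actual convention would be defined during the planning year.

```python
import re
from collections import defaultdict

# Hypothetical convention: <journal>_v<volume>_i<issue>_a<article>.<format>
NAME_PATTERN = re.compile(
    r"^(?P<journal>[a-z0-9]+)_v(?P<volume>\d{3})_i(?P<issue>\d{2})"
    r"_a(?P<article>\d{3})\.(?P<ext>[a-z0-9]+)$"
)


def group_by_article(filenames):
    """Relate files to one another using only their names, with no database."""
    articles = defaultdict(list)
    for name in filenames:
        match = NAME_PATTERN.match(name)
        if match is None:
            continue  # files outside the convention are skipped in this sketch
        key = (
            match["journal"],
            int(match["volume"]),
            int(match["issue"]),
            int(match["article"]),
        )
        articles[key].append(match["ext"])
    # Sorting the keys recovers journal, volume, issue, and article sequence.
    return {key: sorted(exts) for key, exts in sorted(articles.items())}


files = [
    "agronj_v092_i05_a002.txt",
    "agronj_v092_i05_a001.pdf",
    "agronj_v092_i05_a001.txt",
    "README",
]

for key, formats in group_by_article(files).items():
    print(key, formats)
```

Because the relationships are carried entirely by the names, any future system that can list a directory and match a pattern can reassemble an issue, which is the kind of independence from specialized software described above.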

3) Specify a monitoring system.
Cornell will specify a software application to manage the status of each member of the repository. The tool will hold a record for each member of the repository with the information needed to establish its age, migration status, and technological dependencies (standards, software, etc.). This system will be used as a prediction tool. Criteria describing changes in standards or versions of software will be fed to the system, which will then list the specific e-journals in the repository affected by those criteria. These e-journals will then require review to determine whether they need migration.
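
The record-and-criteria approach described above might be sketched as follows. This is a hedged illustration, not a specification: the RepositoryMember fields, the dependency labels, the identifiers, and the flag_for_review function are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class RepositoryMember:
    """One monitoring record per repository member (hypothetical fields)."""
    identifier: str
    deposited: date
    migration_status: str  # e.g. "original" or "migrated"
    dependencies: set = field(default_factory=set)

    def age_in_days(self, today):
        return (today - self.deposited).days


def flag_for_review(members, changed_dependencies, min_age_days=0, today=None):
    """Return identifiers of members that depend on any changed standard or
    software version and are at least `min_age_days` old."""
    today = today or date.today()
    changed = set(changed_dependencies)
    return [
        m.identifier
        for m in members
        if m.dependencies & changed and m.age_in_days(today) >= min_age_days
    ]


# Hypothetical records and criteria.
members = [
    RepositoryMember("journal-a/v12", date(1999, 3, 1), "original", {"PDF 1.2", "SGML"}),
    RepositoryMember("journal-b/v04", date(2000, 6, 15), "original", {"XML 1.0"}),
    RepositoryMember("journal-c/v30", date(1998, 1, 10), "migrated", {"PDF 1.2"}),
]

print(flag_for_review(members, ["PDF 1.2"], today=date(2000, 10, 15)))
```

The function only produces candidates for review. As the text notes, whether a flagged e-journal actually needs migration remains a separate decision.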

In investigating and developing the specification of such a monitoring system, Cornell will build upon its previous and current digital preservation investigations. For example, the Risk Management of Digital Information project, which was sponsored by CLIR, equipped the library with a better understanding of the organizational and technical threats that need to be monitored and controlled to ensure the longevity of digital resources (report available at <http://www.clir.org/pubs/abstract/pub93abst.html>). The library's current DLI2 project focuses on digital preservation. Particularly relevant to this proposal is a Web profiling tool that is being developed by the library's Digital Imaging and Preservation Unit and the Cornell Computer Science Department. This web profiling software will attempt to gather information on various characteristics of digital resources to support digital preservation monitoring and decision-making. This tool provides a technical background for the development of the proposed assessment tool. Another library project, sponsored by an IMLS grant, helped the library to develop a better understanding of the role of preservation metadata in supporting the long-term management of digital collections. The library is developing guidelines for preservation procedures and metadata for digital image collections that are to be deposited in a central digital repository.

4) Establish baseline hardware infrastructure.
Cornell will specify hardware with modular storage components to accommodate massive growth in the amount of material stored and identify an architecture for data and system backup that is automatic and self-reporting. Reliability and redundancy of internal hardware components, combined with growth and migration potential, will be priority attributes in the hardware plan. Cornell will develop an RFI to distribute to hardware vendors for their comment before the end of the planning year.

5) Investigate need for and approach to redundancy.
Along with the addition of new journals to the repository, there is the possibility of mirroring and/or distributing some of the repository functions to library collaborators. The land grant community has strong ties and a history of cooperative preservation efforts. Other institutions within the land grant community could provide redundancy for the system Cornell develops, or they might duplicate the procedures followed by Cornell with other publishers and subjects. In either event, the workload would be shared among committed partners. During the planning year, we will explore further the need for redundancy in the repository and begin to work with potential partners. Cornell is a partner in the LOCKSS program from Stanford University and HighWire Press. LOCKSS (Lots of Copies Keeps Stuff Safe) is intended to be a revolutionary, distributed archiving model. We will want to see whether any of the lessons learned from the LOCKSS project can be applied to Project Harvest.

Develop acquisition and growth plans

During the planning year, we will develop a two-phased acquisition and growth plan. The first phase will focus on the addition of journals to the pilot agricultural repository. This work will continue during the implementation phase. As new journals are published in the field of agriculture, or as older journals become more important, publishers could request to have a journal included in Project Harvest. Journals could also be nominated, possibly by an advisory board of agricultural scholars who would recommend whether to seek out that journal for Project Harvest. We may also wish to work with the agricultural library community to ensure that at least one print copy of all e-journals that also have a printed manifestation is retained. This process would be explored fully during the planning process.

More importantly, the implementation phase would give us hard data on how the pilot could be expanded to other disciplines and/or publishers in a second phase. Our experience may indicate that future repositories should be developed around a subject discipline, as with Project Harvest. We may also find that while the subject approach proves useful in the pilot phase, when the primary task is negotiating a general agreement with publishers (and Cornell's good relationship with agricultural publishers makes this task possible), future repositories would be better organized around publishers and their specific publishing systems than by subject. One of the elements we will want to assess during the planning year (and possibly after) is whether a subject-based approach is appropriate for a repository, or whether we should use the agreements we have developed with our agricultural publishing partners as the basis for a general agreement regarding the deposit of all of our partners' publications, regardless of subject matter.

Identify an organizational and staffing model

We can already see that the project will require collaboration across normal institutional boundaries. We are structuring the planning phase so that it will be a cooperative project drawing on the expertise found in Mann Library, the Preservation Department, the Digital Libraries and Information Technologies (DLIT) department, and the Cornell Institute for Digital Collections (CIDC). Project Harvest will be overseen in the planning phase by a steering committee consisting of representatives from Mann Library, DLIT, and the Preservation Department, with the inclusion of a faculty member to represent the interests of users.

The staffing model of the planning phase is based on the functional activities suggested in the OAIS reference model. Staff will be assigned to work in each of these four areas:

Submission

· identify and contact publishers seeking collaboration
· negotiate terms for submission, access, updates, and other conditions
· plan future growth and acquisitions
· coordinate the role of Cornell in agricultural cooperative preservation efforts

The submission activities of the planning phase will be the primary responsibility of the Collection Development unit in Mann Library. They will be assisted by a working group drawn from the staff from the Preservation Department and license librarians in Mann Library and CUL Central Technical Services. Legal advice from the university's General Counsel's office will be sought as appropriate when working out the details of the contract.

Ingestion

· prepare data for archiving
· profile resources - identify characteristics
· choose standards, develop procedures

Planning for ingest will be a collaborative effort between the Mann Library's Information Technology Section and the Digital Library and Information Technology division in the Cornell University Library system. A minimum of one half FTE will work in this area and the subsequent area. The work will be informed by the findings of the Submission group and the preservation requirements identified by Preservation Department staff, particularly in the area of standards.


Data Management, Archival Storage, and Access
· determine hardware and software needs
· conduct requirements analysis to determine system infrastructure
· design the archival system (both ingest and access components)

Again, Mann Library's Information Technology Section and the Digital Library and Information Technology division in the Cornell University Library system will collaborate on the design of this aspect of the system.

Policy Development

· facilitate the interaction of the different groups within the library
· contribute to the development of criteria for the certification of archival repositories
· develop economic models to ensure the long-term sustainability of the repository
· work closely with the technology team and the collection development team to develop strategies for standards, file formats used, preservation metadata, preservation strategies, etc.

Staff of the Preservation Department will take the lead in identifying the policy framework for the project. Their investigations will be tempered by the work of the Submission group and the technical requirements identified by the Ingest and Data Administration groups.

Overall policy will be approved by a Steering Committee for the project. The Steering Committee will be composed of senior administrators in the library (the directors of Mann Library and the Digital Library and Information Technology division, the Associate Director of the Preservation Department, and the University Librarian serving as PI) and one faculty member, representing the interests of some of the users.

A key question to explore during the planning year will be whether digital repository functions can be absorbed within our existing organizational model, or whether a new organizational unit that cuts across current administrative, subject, and functional lines is needed.

Negotiate access policies for the prototype repository
Publishers have been unwilling in the past to maintain large print archives of back issues of their journals. Often libraries hold the only complete back-run of print titles. E-journals, while they do not require large warehouses or library shelves for storage, do require electronic storage space and maintenance that assures the integrity of the digital content. It is unclear whether publishers intend to maintain their own archives, but libraries are requiring this assurance when they sign e-journal contracts. Many e-journal publishers are relying on OCLC ECO (Electronic Collections Online) for archiving, but OCLC cannot do it all, nor is reliance on a single archive sound practice. Research libraries are, not surprisingly, unwilling to discard print issues without long-term guarantees that e-journal files will be available.

This planning grant would allow us to explore two major scenarios for an e-journal archive, the "dark archive" and the "living archive." The "dark archive" model creates an archive where stored files would only be used in an emergency. This model is similar to the model of storing microfilm in the National Underground Storage facility. In order to minimize the cost of maintaining a "dark archive," e-journal content might be converted on ingest to some common, stable, minimal format (albeit with a concomitant loss of functionality). A "living archive" of agricultural scholarship, in contrast, would be modeled after JSTOR or OCLC and would provide access to back files of e-journals that publishers no longer wished to maintain or to which publishers are willing to provide additional access. The publishers would of course still be able to provide access to recent issues if they desired. As part of the planning process, the development team would need to investigate the staffing, contractual, economic and technical implications of both options.

The living archive presents the greater challenge in that publishers may be less willing to allow open access to their material. The development team would have to carefully evaluate the implications of the various access policies for the publishers and the users. Issues that must be addressed include when files would be made available, the mechanisms for allowing access, and comparability with the original files.

Develop a plan for the long-term funding of an e-journal repository
Libraries have traditionally assumed the cost of storing and preserving paper copies of agricultural journals. While libraries may be willing to absorb the cost of preserving electronic copies of the same publications, it is more likely that a business model that can make the preservation of e-journals self-sustaining must be found. During the planning year, we will investigate several approaches for making the repository economically self-sufficient over the long term. This requires that we account for the capital costs associated with building and expanding the repository infrastructure over time. We must also account for the operating costs associated with maintaining and providing access to the repository [Guthrie, 2000].

There are several possible sources of funds that could be used to maintain and grow the repository over time. They include:

  • agencies and foundations supportive of the need to preserve the agricultural literature
  • publishers, who may be willing to pay on a per-journal basis the cost of archiving the journal (perhaps by including an archival surcharge with the electronic access surcharge common among major publishers)
  • free or reduced-price subscriptions from publishers, acquired in exchange for archiving their journals
  • fees charged for access to the archival repository

The last three options require the agreement and cooperation of the publishers. Based on the results of the negotiations with them, we anticipate being able to develop a business model that will indicate how much, if anything, archiving agricultural literature will cost Cornell University.

Project Staff

Project Harvest will be a Library-wide effort. The following individuals will play key roles in its implementation.


Sarah Thomas, University Librarian, will serve as Principal Investigator of Project Harvest.

Peter B. Hirtle, Co-Director, Cornell Institute for Digital Collections, will serve as Project Coordinator.

Three working groups will work directly with the Coordinator. Each will be chaired by a senior library staff member. Mary Ochs, Head, Collection Development and Preservation at Mann Library, will chair the Publisher Relations Group. Tim Lynch, Head, Information Technology Section at Mann Library, will chair the Technical Design Group. Oya Y. Rieger, Acting Assistant Director of Preservation for Digital Imaging and Preservation Research, will chair the Preservation Policy Group.

A Publisher Relations Specialist, Preservation Policy Advisor, and Administrative Assistant will be hired to work with Cornell staff on Project Harvest.

A Steering Committee will be established to provide general oversight. Anne R. Kenney, Co-Director, CIDC and Associate Director of the Department of Preservation, will chair the Steering Committee. Other members will include: Sarah Thomas, University Librarian, Janet McCue, Director of Mann Library, and H. Thomas Hickerson, Associate University Librarian for Digital Libraries, Information Technology and Special Collections.

Citations

Guthrie, Kevin. 2000. "Developing a Digital Preservation Strategy for JSTOR, an interview with Kevin Guthrie." RLG DigiNews 4:4 (15 August 2000) <http://www.rlg.org/preserv/diginews/diginews4-4.html#feature1>

Gwinn, Nancy E. 1993. A national preservation program for agricultural literature. S.l. : s.n.

Lawrence, Gregory W., William R. Kehoe, Oya Y. Rieger, William H. Walters, and Anne R. Kenney. 2000. Risk Management of Digital Information: A File Format Investigation. Washington, D.C. : Council on Library and Information Resources.

Olsen, Wallace C., editor. 1991-1996. The Literature of the Agricultural Sciences. Ithaca, N.Y. : Cornell University Press.

Task Force on Archiving of Digital Information. 1996. Preserving digital information: Report of the Task Force on Archiving of Digital Information. Washington, D.C. : Commission on Preservation and Access.

Uhlir, Paul. 1997. Framework for the preservation of and permanent public access to USDA digital publications. S.l. : s.n.

Appendix A: Chronological Workplan

Note: lead participants are identified in parentheses after each task.

(Prior to start of project)
· Advertise and interview for project-funded positions: Publisher relations specialist; Administrative support person (Project coordinator, administrative staff)
· Identify space and equipment for new Project Harvest staff (Project coordinator, Library administration)

Jan. 2001 - March 2001
· Hold Project Harvest organization meeting. Bring together Project Harvest Team, Advisory Committee. Create mission statement for the Project Harvest plan (Project leader, Project Harvest team)
· Develop selection criteria to allow prioritization of possible partners (Publisher relations specialist, collection development staff)
· Contact an initial group of potential partners to identify partners interested in the problem (Publisher relations specialist, collection development staff)
· Establish, based on the OAIS model and the "Minimum criteria," what we feel are the ideal component parts of an e-journal preservation system (Preservation policy advisor)
· Establish a baseline of formats and software used in pilot e-journals (Publisher relations specialist, Technology design group)
· Advisory Committee will meet to review progress (Project coordinator)

April 2001 - May 2001
· Hold negotiations with the pilot group of publishers on the issues we have identified as core to a successful e-journal archival policy (Publisher relations specialist, Preservation policy advisor)
· Investigate potential architectures for e-journal repository that are both open and compatible with the needs identified in the negotiations with the publishers (Technology Design Group)
· Identify the organizational and staffing model the Library would follow in implementing Project Harvest (Project leader, Project Harvest team)
· Advisory Committee will meet to review progress (Project coordinator)

June 2001 - July 2001
· Develop a model license agreement based on the results of the negotiations with the pilot group of publishers (Project coordinator, Publisher relations specialist, Preservation policy advisor, Legal counsel)
· Contact additional publishers lower on the priority list in order to field test the license agreement (Publisher relations specialist)
· Specify a software application to manage the status of each member of the repository (Technology Design Group)
· Advisory Committee will meet to review progress (Project coordinator)

August - October 2001
· Contact remainder of the publishers of the core journals in agriculture to solicit interest in possible participation in the project (Publisher relations specialist)
· Establish the baseline hardware needed to implement Project Harvest (Technology Design Group)
· Investigate the place of redundancy in the archiving system (Technology Design Group, Preservation policy advisor, Publisher relations specialist)
· Given the needed technological and organizational environment, develop a business model that can make Project Harvest financially acceptable to the Library (Project Coordinator, Preservation policy advisor, Publisher relations specialist)
· Advisory Committee will meet to review progress (Project coordinator)

November - December 2001
· Assuming a sustainable business model can be identified, prepare a grant application for the implementation of Project Harvest based on the findings of the previous year (Project Coordinator)
· Develop an RFP for the hardware and software needed to implement Project Harvest in a manageable, scalable fashion. The RFP will be ready to distribute as soon as implementation funding is received (Technology Design Group)
· Develop methods for representing the organizational and technology competencies developed during the design of Project Harvest to the scholarly and user communities (Preservation policy advisor, Publisher relations specialist)
· Develop formal acquisition and growth plans to guide the implementation of Project Harvest. The plan will determine how new journals are to be added to the implementation (Publisher relations specialist, Project Coordinator)
· Advisory Committee will meet to review progress (Project coordinator)

Throughout the course of the project:
· Share information about the design and implementation of Project Harvest with relevant preservation and agricultural information communities (Entire project team).

 
 

last updated February 2002