Submitted by
Sarah E. Thomas
Principal Investigator
Carl A. Kroch University Librarian
Cornell University
15 October 2000
______
TABLE OF CONTENTS
_______
Overall objectives for the project
Importance to Cornell
Cornell's leadership in digital preservation
Cornell's interest in electronic publication
Cornell's leadership in the preservation of agricultural literature
The appeal of a subject-based approach
Planning year focus
Detailed work plan and budget
Work with publishers to develop an archiving policy
Investigate how to ensure scholarly acceptance of the repository
Develop a technical model for the repository
Develop acquisition and growth plans
Identify an organizational and staffing model
Negotiate access policies for the prototype repository
Develop a plan for the long-term funding of an e-journal repository
Project Staff
Citations
Appendix A: Chronological Workplan
Overall Objectives for the Project
In response to the call from the Mellon Foundation in a letter of 14
August 2000 from Don Waters, Scholarly Communications Project Officer,
the Cornell University Library proposes to develop a plan for a
repository of electronic journals in the field of agriculture. Project
Harvest, as the project will be known, would build on Cornell's
historic excellence in preservation in general and the preservation
of agricultural literature in particular. We propose to initiate
a dialogue with a number of agriculture publishers with whom we
have successfully cooperated on other projects. During these discussions,
we will identify the elements of a compelling preservation strategy
and negotiate an approach that Cornell could implement and
that the publishers could accept. As a product of
the negotiations, we will develop a model agreement that could be
used as the basis for negotiations with other publishers in agriculture
as well as publishers in other disciplines.
In parallel with the negotiations with the publishers, we will also
be at work on the considerable technical design issues associated
with an e-journal repository. We will develop a design that addresses
the various functions needed in an e-journal repository, including
ingest, storage, management, migration, and access.
At the end of the planning year, we will have negotiated with a number
of publishers over the inclusion of their materials in a library-based
repository. We hope that the negotiations will lead to the development
of a model agreement that other publishers could readily accept.
In addition, we will have modeled the architecture for a long-term
repository based on the best thinking in the digital preservation
community tempered by the realities of what our publisher/partners
are willing to accept. We will have developed an RFP for the purchase
of equipment and services needed for the implementation of the repository
that we will distribute to vendors when further funding is secured.
Finally, we will have planned how to address other issues associated
with the successful implementation of a long-term e-journal repository,
including how to gain community support for the project, how it
might grow, what organizational model we would need to follow to
develop the e-journal archive, and what our long-term budget plan
might be.
Importance to Cornell
There are a number of reasons why Cornell University wishes to undertake
the difficult task of organizing, designing, and implementing a
digital e-journal repository. First, it is a natural outgrowth of
our long-standing interest in preservation of research library materials.
In addition, the proposal builds on our interest in issues surrounding
electronic publication. Finally, the subject matter of the proposal
- the preservation of electronic agricultural literature - has special
importance to us. The Mann Library at Cornell has spearheaded several
national initiatives to ensure that essential agricultural literature
is preserved; Project Harvest would extend this interest to the
realm of electronic publication. Fortunately, our past experience
in all three areas (preservation, electronic publishing, and agricultural
subject areas) has equipped us well to undertake Project Harvest.
Cornell's leadership in digital preservation
For over a decade the Cornell University Library has been a leader in
the preservation field in general, and in the application of digital
technologies to library preservation in particular. Through a series
of research grants and award-winning publications, Anne Kenney and
the other staff of the Library's Preservation Department have developed
much of the conceptual basis for our understanding of the place
of digital technologies in the conversion of analog material. Over
the last several years, Kenney and Oya Y. Rieger have collaborated
on a number of research projects to advance Cornell's understanding
of digital preservation requirements. One of these projects, funded
by the Institute for Museum and Library Services, has resulted in
the development of a preservation strategy for Cornell's digital
image collections, including requirements for the establishment
of a central depository. A second research project, conducted in
collaboration with Mann Library staff and funded by the Council
on Library and Information Resources, assessed risks associated
with file format migration (Lawrence, 2000). Currently Kenney, Rieger
and other members of the Department of Preservation are collaborating
with colleagues in the Computer Science Department on Project PRISM,
a Digital Library Initiative, Phase Two project funded by the National
Science Foundation and other interested funding agencies. Project
PRISM is exploring issues around information integrity in the development
and growth of digital libraries, with particular emphasis on preservation
and security requirements. Cornell's growing expertise in digital
preservation is evidenced by staff appointments to important initiatives.
Anne Kenney, for example, has been named to the RLG/OCLC Working
Group on Attributes of a Digital Archive. Oya Rieger serves on the
Preservation Metadata Working Group of the RLG/DLF Task Force on
Policies and Practice for the Long-Term Retention of Digital Materials
and is co-chairing the NISO Technical Committee on Metadata for
Digital Still Images. Another library staff member, Peter Hirtle,
was a member of the seminal CPA/RLG Task Force on Digital Archiving
(Task Force, 1996) and is serving on the Institutional Records Working
Group of the RLG/DLF Task Force.
In short, Cornell University has had extensive interest in and commitment
to the preservation of research library materials. The preservation
of "born digital" material is the next logical area for
Cornell to address. Our previous experience has uniquely positioned
Cornell to explore and implement an archival repository for electronic
journals.
Cornell's interest in electronic publication
Cornell University has also developed a strong interest in electronic publication.
During the past two years, and with the support of The Andrew W.
Mellon Foundation, the Cornell University Library has planned and
undertaken a project in the electronic publication of mathematical
journals. Project Euclid is intended to serve the needs of both
mathematics publishers and mathematician end-users. We intend to
develop Euclid into a primary channel for the publication and exchange
of mathematical scholarship.
Project Euclid demonstrates Cornell University's interest in issues surrounding
electronic publication. Project Euclid, however, is intended to
be an experiment in publication, not archiving. Nevertheless, the
policies, practices, and procedures that are developed as part of
Project Harvest may benefit the journals included in Project Euclid.
Because a long-term repository is a key component of the scholarly
exchange process in the online environment, such a repository, specifically
tailored to the requirements of mathematics publications, would
be a further service that we would like to offer those publishers
who make use of Euclid to publish their journals.
It is in Cornell's interest, therefore, to support the development of
a protocol for the creation and maintenance of an e-journal archive.
Such a protocol could be adapted for mathematics and other disciplines.
It would attract to the project other mathematics publishers who
are not publishing their journals through Euclid, and foster
support for the Euclid project. In this manner, our currently-funded
investigation of new models for scholarly communication and the
proposed new investigation into methods by which we can provide
long-term access to electronic scholarly information can support
each other.
Cornell's leadership in the preservation of agricultural literature
The literature of agriculture is a natural body of materials for a test
of the preservation of electronic literature. For over a decade
the Cornell University Library, the land grant library for the state
of New York, has taken a leadership role in the preservation of
the literature of agriculture. Librarians at Cornell's Mann Library,
specializing in agricultural and life sciences, have worked with
other land grant university libraries and the National Agricultural
Library to establish a national preservation plan for agriculture
(Gwinn 1993). As part of the preservation plan, the information
components of the agricultural sciences were identified and primary
responsibilities for the coordination of preservation activities
were assigned to different institutions, as indicated in Figure
1. Cornell currently coordinates several parts of this ongoing preservation
work. Mary Ochs and Joy Paulson at Cornell University's Mann Library
manage on behalf of USAIN (United States Agricultural Information
Network) the NEH-funded cooperative microfilming project to preserve
state and county agricultural documents. Cornell also continues
to develop the Core Historical Literature of Agriculture digital
library. In September 2000, Cornell University received an IMLS
National Leadership Grant to digitize the core historical literature
of the related field of home economics. As part of the project,
Mann Library will implement the guidelines established in the IMLS-funded
preservation strategy project mentioned above. Part of the project
will experiment with the Endeavor Encompass software to increase
the interoperability of the home economics digital library with
other digital repositories at Cornell, such as Making of America.
Finally, the project will define a set of model workflows for capturing
metadata for access and preservation of digital materials.
Agricultural literature is well-served by a carefully developed and coordinated
program, the National Preservation Program for Agricultural Literature.
The USAIN National Preservation Special Project Committee oversees
the plan, and Mary Ochs of Mann Library serves as a member of that
committee. As indicated in Figure 1, preservation of electronic
publications in agriculture was not a significant component of the
plan when it was developed. To address one part of the changing
landscape of publishing, Cornell University co-sponsored with the
National Agricultural Library in March 1997 a meeting on government
electronic publications in agriculture. A product of that meeting
was the publication by Paul Uhlir, Project Consultant for the National
Research Council, of the "Framework for the Preservation of
and Permanent Public Access to USDA Digital Publications."
This framework was later adopted by the National Agricultural Library
in conjunction with the USDA Office of the Chief Information Officer.
No similar meeting for electronic journals in agriculture has yet
been held. The growing importance of e-journals in the field of
agriculture makes the secure archiving and long-term access to these
"born digital" files of central importance to the evolving
preservation plan. As Cornell's representative on the USAIN special
projects committee, Mary Ochs will inform the committee about the
work on this planning project and seek input from them. Connections
with USAIN and the agricultural library community offer links for
building trust within the user community.
The appeal of a subject-based approach
While the absence of e-journals from the National Agricultural
Preservation Plan is a gap that cries out to be addressed, we are
also proposing to study e-journals in agriculture because of the
tactical advantages the field offers. An agricultural
subject-based archive would include journals from a wide variety
of publishers, with a wide variety of file formats and multiple
contractual agreements required. Appendix B contains a list of the
core journals in agriculture and represents a starting point in
the search for journals for possible inclusion in the archive. This
list of journals is based on the seven-volume bibliography, The
Literature of the Agricultural Sciences, edited by Wallace Olsen
of Cornell University's Mann Library, which identifies the core
journals in the seven sub-disciplines of agriculture (Olsen, 1991-1996).
Preliminary investigations (which will be expanded during the planning
year) suggest that as many as 75% of the journals are now available
in electronic form. As Appendix B indicates, many publishers are
represented, and a number of the journals already have in place
arrangements with nominal archival repositories such as JSTOR, OCLC's
ECO (Electronic Collections Online) project, or HighWire Press.
It is unclear, however, whether any of these repositories meet the
requirements outlined in the "Minimum criteria for an archival
repository of digital scholarly journals" developed by CLIR,
DLF, and CNI.
In sum, there are a large number of agricultural journals that are
available in electronic form; these journals represent a high percentage
of the core serial literature in agriculture; they are produced
by a number of different publishers and publishing arrangements;
and some of the journals have a quasi-archival arrangement in place.
Taken together, these factors give the agricultural publishing arena real
potential in the second phase of the project for the creation of
a large, varied, and robust e-journal repository that reflects much
of the diversity found in scholarly communication.
Planning year focus
We heartily endorse the assertion in the "Minimum Criteria for
an Archival Repository of Digital Scholarly Journals" that
an archival repository that acts to preserve digital scholarly publications
must be a trusted party that conforms to certain minimum requirements
agreed to by both scholarly publishers and libraries. The most serious
challenges and impediments to the creation of an e-journal repository
are political: they have to do not with how the technology is designed,
but rather with how the essential stakeholders (publishers, libraries,
the scientific societies that support both, and the user) relate to
and work with each other. Indeed, how they relate will to no small
extent determine how the technology evolves. What precisely is stored
in such a repository, how access to it is guaranteed, who owns it,
how and under what circumstances it is accessed, who authorizes
such access, how the entire operation is securely and regularly
funded--these and similar questions must be answered jointly by
the stakeholders before the building of a fail-safe repository can
commence. During the planning year, we will work with our target
publishers to formulate and develop provisional answers to these
basic business and technical questions. Negotiations with publishers
over the design, organization, and operation of the digital repository
will therefore be the primary activity during the planning year.
While political issues may be the greatest challenge to the successful
implementation of an e-journal repository, serious technical challenges
also confront us. There are a number of technical issues that must
be identified and addressed in conjunction with the negotiation
with publishers. Some concern the nature of the e-journal archive
itself. Is it to be, for example, a fail-safe repository of last
resort whose contents are shaped by a desire to ensure the longest
possible lifespan, or should it try to offer the full range of functionality
found in the e-journal itself? Can the repository be built so that
both options are possible? Once the nature of the archive is defined,
what systems are to be used for the ingest, organization, maintenance,
migration, and delivery of the e-journal files? What is the place
of redundancy in the system? The development of a technical model
for the e-journal repository will be the second focus of the planning
year.
In addition to our negotiations with publishers and the development
of a technical work plan, we will also use the planning year to
develop mechanisms for convincing the scholarly community of the
validity of the repository, explore organizational and staffing
models for any implemented repository, and explore long-term funding
options and growth plans for the repository.
Detailed work plan and budget
The planning year will be divided into seven separate but related activities.
Work with publishers to develop an archiving policy
It is our belief that effective archiving of electronic journals
can only be accomplished through a publisher/librarian partnership.
The Mellon planning grant would allow us to work with publishers
to establish a set of responsibilities for both Cornell, as the
archiving institution, and the publishers, as archival depositors.
Along with those responsibilities, we must establish conditions
for inclusion, including copyright clearance, that are broadly acceptable
to the publishers, but allow Cornell, as the archiving institution,
the flexibility to establish technical specifications and access
policies that serve users well. To be successful, these negotiations
must identify the benefits and drawbacks of the different configurations
of an e-journal archive and find an acceptable common ground for
all parties. This will then form the basis of the contracts with
publishers depositing files in the archive.
The first step in the acquisitions plan would be to develop selection
criteria that will allow us to prioritize from the list of journals
in Appendix B which of the publishers we wish to ask to be part
of Project Harvest. Twelve publishers are obvious candidates with
which to work. These publishers issue a large number of the core
titles; they are prominent in the field (and hence likely to serve
as models for others); and they represent a wide variety of publishing
models, including both profit and non-profit. They include:
- Elsevier, with 16 journals on the list
- Either the Tri-Societies (American Society of Agronomy, Crop Science
Society of America, and Soil Science Society of America) or Springer,
which produces the journals
- The National Research Council of Canada. At one point, their archiving
policy was to maintain material "until they ran out of room
on their server"
- Annual Reviews. Its titles are available via HighWire and offer a way
of working with another university
- University of Chicago Press. Titles include Economic Development and
Cultural Change, International Journal of Plant Sciences, and American
Naturalist
- American Agricultural Economics Association
- Federation of American Societies for Experimental Biology
- Cambridge University Press, with 7 journals on our list
- Entomological Society of America. We have already negotiated permission
with them to include several titles in the online Core Historical
Literature of Agriculture
- Kluwer, with 5 titles on the list
- Oxford University Press, with 3 titles on the list
- Blackwell Science, with 9 titles on the list
After identifying the potential publisher partners, we will then ask a
pilot group to participate in Cornell's Project Harvest. From those
publishers expressing interest in participating, we would gather
a small development team to consider the issues outlined above.
The goal for the development team would be to create, through an
iterative process, a standard agreement for archival deposit. Topics
that would be identified in the agreement include:
- The general responsibilities of the publishers and Cornell
- Characteristics of the data, accompanying metadata, and any additional
documentation that are to be deposited
- Guidelines on transmission methods and media for deposit
- Procedures for the deposit
- Procedures and protocols Cornell will use to verify the arrival and
completeness of the data
- Rights of the depositing organizations to audit the repository
- The respective roles, responsibilities, and rights of Cornell
and the data producers with regard to the data
- Articulation of Cornell's responsibilities and capabilities with regard
to the accessioning, description, management, and even transformation
of the deposited data
- Access policies for users of the repository, and how they may vary over
time
- Conditions on the use of the data, and again how they may vary over time
- Fees (if any) associated with the deposit
- Cornell's ability to share the data with partners to create an
agreed-upon level of redundancy
- Clarification of issues surrounding copyright retained by authors
- Other key issues defined by the development team
Assuming the implementation phase of the project is funded, we anticipate
contacting all the publishers from the list in Appendix B to assess
their interest in participating in the project. Through the planning
process, we would need to determine the number of titles we can
handle in the first years of the project.
Cornell has had experience with this type of negotiation. We worked with
68 publishers, including Elsevier, Kluwer, and others, to secure
rights to use material in TEEAL (The Essential Electronic Agricultural
Library) and The Core Historical Literature of Agriculture. In the
process of negotiations, staff members developed a standard agreement
similar in function (if not in content) to the agreement we are
proposing to develop for Project Harvest. In Project TEEAL, once
one major publisher agreed to the TEEAL basic agreement many other
publishers followed. We anticipate a similar development with Project
Harvest. Project Euclid, like TEEAL, has been built on a partnership
of publishers and librarians. In the case of Project Euclid, many
of the publishers are scientific societies, providing us with experience
in learning and understanding the concerns of a different group
of publishers. We would use the lessons we have learned from developing
Euclid in shaping the discussions for Project Harvest.
We would also draw on the lessons others have learned in negotiating
with publishers. The CLIR/DLF draft model license found on the LIBLICENSE
web site at Yale University <http://www.library.yale.edu/~llicense/>,
for example, is a natural model on which to draw for our similar
effort to develop a model archiving agreement. The data depository
program of the Arts and Humanities Data Service <http://ahds.ac.uk/deposit/depintro.html>
will also provide information on what is needed for a digital archive
and what creators are likely to be willing to deposit.
Investigate how to ensure scholarly acceptance of the repository
The Cornell repository will only be successful if the scholarly
community is convinced that the journals deposited at Cornell will
remain accessible and readable over time. An important component
of the planning year therefore will be assessing how scholars feel
about e-journals and identifying methods to build trust in the community.
In the matter of trust, Project Harvest is in a favored position. Mann
Library within the Cornell library system has had a long history
of preserving and making available to the scholarly community the
core literature of agriculture. An ongoing and significant electronic
initiative is the USDA Economics and Statistics System with its
statistical and textual reports from the Agriculture Department's
Economic Research Service, National Agricultural Statistics Service,
and World Agricultural Outlook Board. Scholars know that Cornell
has a vested interest in the preservation of the literature of agriculture,
making this project mission-driven, rather than external to the
overall goals of the institution.
Given the confidence that the university already enjoys with publishers,
librarians, and scholars, some in the scholarly community may be
willing to accept whatever the university proposes to do just because
it comes from Cornell. However, it will also be important to develop
formal methods of representing the organizational and technical
competencies Cornell plans to build during the course of Project
Harvest. To meet this need, the project team will develop a plan
to outline the organizational and technical components of the repository.
We assume that the success of the journal deposit system developed
during the course of the project will be heavily dependent on the
reliability and credibility of the organizational and technical
work plan. We will convince the repository's customers that materials
in the repository are in good hands by articulating for them our
plans for the building, maintenance, and management of the repository.
One component of our information campaign will be to develop a
mission statement for Project Harvest that can be shared with the
appropriate scholarly communities. The mission statement will include
the information recommended by the "Minimum criteria for an
archival repository of digital scholarly journals," including
the scope and nature of the materials to be included in the repository,
the strategy and methods we will adopt to attract materials, and
the user community we hope to serve.
A second means of building scholarly acceptance of Project Harvest
will be to ensure that the archive conforms to generally accepted
standards for digital repositories. One of the recommendations of
the highly influential report of the Task Force on Archiving of
Digital Information was that standards and criteria for the certification
of digital information repositories be developed. Several national
and international projects are exploring the process and methodology
in defining the requirements for a certified repository. Among the
key initiatives are:
· During the October 1999 ISO Archiving Workshop Series, certification
of archives (specifically within the framework of the "Reference
Model for an Open Archival Information System" (OAIS)) was
one of the key areas for workshop focus and possible standardization
efforts.
· The upcoming Preservation 2000 conference, which is sponsored
by the UK's Cedars Project, RLG, and OCLC, will provide a platform
to continue the discussion of criteria for certification at an international
level.
· In March 2000 the Research Libraries Group (RLG) and the
Online Computer Library Center (OCLC) announced that they will cooperate
to create infrastructures for digital archiving. One of their goals
is to establish best practices and document the attributes of digital
archives for research repositories.
The Library's Digital Imaging and Preservation Research Unit is an active
participant in all of these initiatives. During the planning phase
of Project Harvest, the staff will closely monitor these and related
efforts in the certification of repositories and will actively contribute
to them by sharing Cornell's empirical experience. As certification
standards emerge, Cornell will publicize our adherence to the standards
as one more way of ensuring trust in the user community.
Develop a technical model for the repository
Cornell will invest in a five-pronged effort that will focus on:
1) establishing a baseline of e-journal software and file format
needs; 2) specifying the archival repository; 3) specifying monitoring
tools that will flag documents within the repository that require
migration; 4) specifying a baseline hardware and software infrastructure
to house the repository; and 5) exploring the need and implementation
models for redundancy in the repository.
1) Establish a baseline of formats and related software.
Cornell will inventory file formats and software in use today
to store and manage e-journals in agriculture. We will collect
conversion routines that permit modifying these formats. We will
explore whether there is one "least common denominator"
format that has minimum software dependencies, and that can be
used to create one parallel copy of each journal in that format.
Whether or not there is such a format, we will also look at how
we might maintain the formats in use in the current live system.
One area we want to explore in particular is whether we can maintain
both systems: a system with high functionality based on current
software as well as one based on a more limited, but likely more
enduring, format.
2) Specify the archival repository.
Cornell will investigate potential architectures and design criteria
for the archive repository, and will choose an approach that is
the essence of simplicity. The repository will be based on the
OAIS reference model and compliant with Open Archives Initiative
protocols and other initiatives in the subject domain of agriculture.
(Cornell is already planning to implement OAI protocols in Project
Euclid.) The repository model will provide for redundancy of instances.
The repository architecture needs to support establishing relationships
among the e-journal components without depending on specialized
software that is itself subject to technological obsolescence.
An example of a possible architecture would be one that relates
internal components based on sequence and naming conventions.
For each journal, the repository will hold metadata complying
with contemporary standards and files in multiple formats. The
files will include at least the format in common use for that journal
today and an additional "least common denominator" version,
as well as associated conversion software.
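As an illustration only, an architecture that relates components through sequence and naming conventions might look like the following sketch. The file-naming pattern, the field names, and the group_components function are hypothetical; they are not part of the proposal's specification, and any real convention would be settled with the publishers during the planning year.

```python
import re
from collections import defaultdict

# Hypothetical convention (an assumption for this sketch):
#   <journal>_v<volume>_n<issue>_a<article>_<sequence>.<extension>
NAME_PATTERN = re.compile(
    r"^(?P<journal>[a-z0-9]+)_v(?P<volume>\d+)_n(?P<issue>\d+)"
    r"_a(?P<article>\d+)_(?P<sequence>\d+)\.(?P<extension>\w+)$"
)


def group_components(filenames):
    """Group files into articles and order each article's parts by sequence.

    Relationships are recovered from the names alone, so no specialized
    software is needed to reassemble an article.
    """
    articles = defaultdict(list)
    for name in filenames:
        match = NAME_PATTERN.match(name)
        if match is None:
            continue  # names outside the convention are ignored here
        key = (
            match["journal"],
            int(match["volume"]),
            int(match["issue"]),
            int(match["article"]),
        )
        articles[key].append((int(match["sequence"]), name))
    return {key: [name for _, name in sorted(parts)]
            for key, parts in articles.items()}


files = [
    "agron_v92_n3_a007_002.tif",
    "agron_v92_n3_a007_001.xml",
    "agron_v92_n3_a008_001.xml",
    "README.txt",
]
grouped = group_components(files)
```

In this sketch the second component of article 7 is listed after the first even though the files arrive out of order, because the ordering is carried entirely by the sequence number in the name.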
3) Specify a monitoring system.
Cornell will specify a software application to manage the status
of each member of the repository. It will be a tool that includes
a record for each member of the repository with information needed
to establish its age, migration status, and technological dependencies
(standards, software, etc). This system will be used as a prediction
tool. Criteria will be fed to the system to identify changes in
standards or versions of software. The system will present specific
e-journals in the repository related to those criteria. These e-journals
will then require review to determine whether they need migration.
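The record-keeping and flagging behavior described above might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a design: the RepositoryRecord fields, the dependency labels, and the flag_for_review function are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Set


@dataclass
class RepositoryRecord:
    """One record per repository member (hypothetical fields)."""
    identifier: str
    deposited: date
    migration_status: str  # e.g. "original" or "migrated" (assumed values)
    dependencies: Set[str] = field(default_factory=set)

    def age_in_days(self, today: date) -> int:
        """Age of the member, measured from its deposit date."""
        return (today - self.deposited).days


def flag_for_review(records, changed_dependencies):
    """Return identifiers of members that depend on any changed standard
    or software version; these would then be reviewed for migration."""
    changed = set(changed_dependencies)
    return [r.identifier for r in records if r.dependencies & changed]


records = [
    RepositoryRecord("journal-a/v1", date(2000, 1, 15), "original",
                     {"PDF 1.3", "SGML"}),
    RepositoryRecord("journal-b/v7", date(1999, 6, 1), "original",
                     {"TIFF 6.0"}),
    RepositoryRecord("journal-c/v2", date(2000, 3, 9), "migrated",
                     {"PDF 1.3"}),
]

# Criterion fed to the system: a change affecting "PDF 1.3".
flagged = flag_for_review(records, ["PDF 1.3"])
```

The sketch only selects candidates; whether a flagged e-journal actually needs migration remains a matter for review, as the plan states.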
In investigating and developing the specification of such a monitoring
system, Cornell will build upon its previous and current digital
preservation investigations. For example, the Risk Management of
Digital Information project, which was sponsored by CLIR, equipped
the library with a better understanding of the organizational and
technical threats that need to be monitored and controlled to ensure
the longevity of digital resources (report available at <http://www.clir.org/pubs/abstract/pub93abst.html>).
The library's current DLI2 project focuses on digital preservation.
Particularly relevant to this proposal is a Web profiling tool that
is being developed by the library's Digital Imaging and Preservation
Unit and the Cornell Computer Science Department. This web profiling
software will attempt to gather information on various characteristics
of digital resources to support digital preservation monitoring
and decision-making. This tool provides a technical background for
the development of the proposed assessment tool. Another library
project, sponsored by an IMLS grant, helped the library to develop
a better understanding of the role of preservation metadata in supporting
the long-term management of digital collections. The library is
developing guidelines for preservation procedures and metadata for
digital image collections that are to be deposited in a central
digital repository.
4)
Establish baseline hardware infrastructure.
Cornell will specify hardware with modular storage components to
accommodate massive growth in the amount of material stored and
identify an architecture for data and system backup that is automatic
and self-reporting. Reliability and redundancy of internal hardware
components, combined with growth and migration potential, will be
priority attributes in the hardware plan. Cornell will develop an
RFI to distribute to hardware vendors for their comment before the
end of the planning year.
5)
Investigate need for and approach to redundancy.
Along with the addition of new journals to the repository, there
is the possibility of mirroring and/or distributing some of the
repository functions to library collaborators. The land grant community
has strong ties and a history of cooperative preservation efforts.
Other institutions within the land grant community could provide
redundancy for the system Cornell develops, or they might duplicate
the procedures followed by Cornell with other publishers and subjects.
In either event, the workload would be shared among other committed
partners. During the planning year, we would want to explore further
the need for redundancy in the repository, and begin to work with
potential partners. Cornell is a partner in the LOCKSS program from
Stanford University and Highwire Press. LOCKSS - Lots of Copies
Keeps Stuff Safe - is intended to be a revolutionary, distributed
archiving model. We will want to see if any of the lessons learned
from the LOCKSS project can be applied to Project Harvest.
Develop
acquisition and growth plans
During the planning year, we will develop a two-phased acquisition
and growth plan. The first phase will focus on the addition of journals
to the pilot agricultural repository. This work will continue during
the implementation phase. As new journals are published in the field
of agriculture, or as older journals become more important, publishers
could request to have a journal included in Project Harvest. Journals
could also be nominated, possibly by an advisory board of agricultural
scholars who would recommend whether to seek out that journal for
Project Harvest. We may also wish to work with the agricultural
library community to ensure that at least one print copy of all
e-journals that also have a printed manifestation is retained. This
process would be explored fully during the planning process.
More
importantly, the implementation phase would give us hard data on
how the pilot could be expanded to other disciplines and/or publishers
in a second phase. Our experience may indicate that future repositories
should be developed around a subject discipline, as with Project
Harvest. We may also find that while the subject approach proves
useful in the pilot phase when the primary task is negotiating a
general agreement with publishers (and Cornell's good relationship
with agricultural publishers makes this task possible), future repositories
would be better organized around publishers and their specific publishing
systems than by subject. One of the elements we will want to assess
during the planning year (and possibly after) is whether a subject-based
approach is appropriate for a repository, or whether we should use
the agreements we have developed with our agricultural publishing
partners as the basis for a general agreement regarding the deposit
of all of our partners' publications, regardless of subject matter.
Identify
an organizational and staffing model
We can already see that the project will require collaboration across
normal institutional boundaries. We are structuring the planning
phase so that it will be a cooperative project drawing on the expertise
found in Mann Library, the Preservation and Digital Libraries and
Information Technologies (DLIT) departments, and the Cornell Institute
for Digital Collections (CIDC). Project Harvest will be overseen
in the planning phase by a steering committee consisting of representatives
from the Mann Library, DLIT, and the Preservation Department, with
the inclusion of a faculty member to represent the interest of users.
The
staffing model of the planning phase is based on the functional
activities suggested in the OAIS reference model. Staff will be
assigned to work in each of these four areas:
Submission
· identify and contact publishers seeking collaboration
· negotiate terms for submission, access, updates, and other
conditions
· plan future growth and acquisitions
· coordinate the role of Cornell in agricultural cooperative
preservation efforts
The
submission activities of the planning phase will be the primary
responsibility of the Collection Development unit in Mann Library.
They will be assisted by a working group drawn from the staff from
the Preservation Department and license librarians in Mann Library
and CUL Central Technical Services. Legal advice from the university's
General Counsel's office will be sought as appropriate when working
out the details of the contract.
Ingestion
· prepare data for archiving
· profile resources - identify characteristics
· choose standards, develop procedures
Planning
for ingest will be a collaborative effort between the Mann Library's
Information Technology Section and the Digital Library and Information
Technology division in the Cornell University Library system. A
minimum of one half FTE will work in this area and the subsequent
area. The work will be informed by the findings of the Submission
group and the preservation requirements identified by Preservation
Department staff, particularly in the area of standards.
Data Management, Archival Storage, and Access
· determine hardware and software needs
· conduct requirements analysis to determine system infrastructure
· design the archival system (both ingest and access components)
Again,
Mann Library's Information Technology Section and the Digital Library
and Information Technology division in the Cornell University Library
system will collaborate on the design of this aspect of the system.
Policy
Development
· facilitate the interaction of the different groups within
the library
· contribute to the development of criteria for the certification
of archival repositories
· develop economic models to ensure the long-term sustainability
of the repository
· work closely with the technology team and the collection
development team to develop strategies for standards, file formats
used, preservation metadata, preservation strategies, etc.
Staff
of the Preservation Department will take the lead in identifying
the policy framework for the project. Their investigations will
be tempered by the work of the Submission group and the technical
requirements identified by the Ingest and Data Administration groups.
Overall
policy will be approved by a Steering Committee for the project.
The Steering Committee will be composed of senior administrators
in the library (the directors of Mann Library and the Digital Library
and Information Technology division, the Associate Director of the
Preservation Department, and the University Librarian serving as
PI) and one faculty member, representing the interests of some of
the users.
A
key question to explore during the planning year will be whether
digital repository functions can be absorbed within our existing
organizational model, or whether a new organizational unit that
cuts across current administrative, subject, and functional lines
is needed.
Negotiate
access policies for the prototype repository
Publishers have been unwilling in the past to maintain large print
archives of back issues of their journals. Often libraries hold
the only complete back-run of print titles. E-journals, while they
do not require large warehouses or library shelves for storage,
do require electronic storage space and maintenance that assures
the integrity of the digital content. It is unclear whether publishers
intend to maintain their own archives, but libraries are
requiring this assurance when they sign e-journal contracts. Many
e-journal publishers are relying on OCLC ECO (Electronic Collections
Online) for archiving, but OCLC cannot do it all, nor is the reliance
on one sole archive sound practice. Research libraries are, not
surprisingly, unwilling to discard print issues without long-term
guarantees that e-journal files will be available.
This
planning grant would allow us to explore two major scenarios for
an e-journal archive, the "dark archive" and the "living
archive." The "dark archive" model creates an archive
where stored files would only be used in an emergency. This model
is similar to the model of storing microfilm in the National Underground
Storage facility. In order to minimize the cost of maintaining a
"dark archive," e-journal content might be converted on
ingest to some common, stable, minimal format (albeit with a concomitant
loss of functionality). A "living archive" of agricultural
scholarship, in contrast, would be modeled after JSTOR or OCLC and
would provide access to back files of e-journals that publishers
no longer wished to maintain or to which publishers are willing
to provide additional access. The publishers would of course still
be able to provide access to recent issues if they desired. As part
of the planning process, the development team would need to investigate
the staffing, contractual, economic and technical implications of
both options.
The
living archive presents the greater challenge in that publishers
may be less willing to allow open access to their material. The
development team would have to carefully evaluate the implications
of the various access policies on the publishers and the users.
Issues such as when files would be made available, mechanisms for
allowing access, and comparability with the original files, among
other issues, must be addressed.
Develop
a plan for the long-term funding of an e-journal repository
Libraries have traditionally assumed the cost of storing and preserving
paper copies of agricultural journals. While libraries may be willing
to absorb the cost of preserving electronic copies of the same publications,
it is more likely that a business model that can make the preservation
of e-journals self-sustaining must be found. During the planning
year, we will investigate several approaches for making the repository
economically self-sufficient over the long term. This requires that
we account for the capital costs associated with building and expanding
the repository infrastructure over time. We must also account for
the operating costs associated with maintaining and providing access
to the repository. [Guthrie, 2000]
There
are several possible sources of funds that could be used to maintain
and grow the repository over time. They include:
-
agencies and foundations supportive of the need to preserve the
agricultural literature
- publishers,
who may be willing to pay on a per-journal basis the cost for
archiving the journal (perhaps by including an archival surcharge
with the electronic access surcharge common among major publishers)
- acquiring
free or reduced subscriptions from publishers in exchange for
archiving their journals
- charging
fees for access to the archival repository
The
last three options require the agreement and cooperation of the
publishers. Based on the results of the negotiations with them,
we anticipate being able to develop a business model that will indicate
how much, if anything, archiving agricultural literature will cost
Cornell University.
Project Staff
Project
Harvest will be a Library-wide effort. The following individuals
will play key roles in its implementation.
Sarah Thomas, University Librarian, will serve as Principal Investigator
of Project Harvest.
Peter
B. Hirtle, Co-Director, Cornell Institute for Digital Collections,
will serve as Project Coordinator.
Three
working groups will work directly with the Coordinator. Each will
be chaired by a senior library staff member. Mary Ochs, Head, Collection
Development and Preservation at Mann Library, will chair the Publisher
Relations Group. Tim Lynch, Head, Information Technology Section
at Mann Library, will chair the Technical Design Group. Oya Y. Rieger,
Acting Assistant Director of Preservation for Digital Imaging and
Preservation Research, will chair the Preservation Policy Group.
A
Publisher Relations Specialist, Preservation Policy Advisor, and
Administrative Assistant will be hired to work with Cornell staff
on Project Harvest.
A
Steering Committee will be established to provide general oversight.
Anne R. Kenney, Co-Director, CIDC and Associate Director of the
Department of Preservation, will chair the Steering Committee. Other
members will include: Sarah Thomas, University Librarian, Janet
McCue, Director of Mann Library, and H. Thomas Hickerson, Associate
University Librarian for Digital Libraries, Information Technology
and Special Collections.
Citations
Guthrie,
Kevin. 2000. "Developing a Digital Preservation Strategy for
JSTOR, an interview with Kevin Guthrie." RLG DigiNews 4:4 (15
August 2000) <http://www.rlg.org/preserv/diginews/diginews4-4.html#feature1>
Gwinn,
Nancy E. 1993. A national preservation program for agricultural
literature. S.l. : s.n.
Lawrence,
Gregory W., William R. Kehoe, Oya Y. Rieger, William H. Walters,
and Anne R. Kenney. 2000. Risk Management of Digital Information:
A File Format Investigation. Washington, D.C. : Council on Library
and Information Resources.
Olsen,
Wallace C., editor. 1991-1996. The Literature of the Agricultural
Sciences. Ithaca, N.Y. : Cornell University Press.
Task
Force on Archiving of Digital Information. 1996. Preserving digital
information: Report of the Task Force on Archiving of Digital Information.
Washington, D.C. : Commission on Preservation and Access.
Uhlir,
Paul. 1997. Framework for the preservation of and permanent public
access to USDA digital publications. S.l. : s.n.
Appendix A: Chronological Workplan
Note:
lead participants are identified in italics after each task.
(Prior
to start of project)
· Advertise and interview for project-funded positions: Publisher
relations specialist; Administrative support person (Project coordinator,
administrative staff)
· Identify space and equipment for new Project Harvest staff
(Project coordinator, Library administration)
Jan.
2001 - March 2001
· Hold Project Harvest organization meeting. Bring together
Project Harvest Team, Advisory Committee. Create mission statement
for the Project Harvest plan (Project leader, Project Harvest team)
· Develop selection criteria to allow prioritization of possible
partners (Publisher relations specialist, collection development
staff)
· Contact an initial group of potential partners to identify
partners interested in the problem (Publisher relations specialist,
collection development staff)
· Establish, based on the OAIS model and the "Minimum
criteria," what we feel are the ideal component parts of an
e-journal preservation system (Preservation policy advisor)
· Establish a baseline of formats and software used in pilot
e-journals (Publisher relations specialist, Technology design group)
· Advisory Committee will meet to review progress (Project
coordinator)
April
2001 - May 2001
· Hold negotiations with the pilot group of publishers on
the issues we have identified as core to a successful e-journal
archival policy (Publisher relations specialist, Preservation policy
advisor)
· Investigate potential architectures for e-journal repository
that are both open and compatible with the needs identified in the
negotiations with the publishers (Technology Design Group)
· Identify the organizational and staffing model the Library
would follow in implementing Project Harvest (Project leader, Project
Harvest team)
· Advisory Committee will meet to review progress (Project
coordinator)
June
2001 - July 2001
· Develop a model license agreement based on the results
of the negotiations with the pilot group of publishers (Project
coordinator, Publisher relations specialist, Preservation policy
advisor, Legal counsel)
· Contact additional publishers lower on the priority
list in order to field test the license agreement. (Publisher relations
specialist)
· Specify a software application to manage the status of
each member of the repository (Technology Design Group)
· Advisory Committee will meet to review progress (Project
coordinator)
August
- October 2001
· Contact remainder of the publishers of the core journals
in agriculture to solicit interest in possible participation in
the project (Publisher relations specialist)
· Establish the baseline hardware needed to implement Project
Harvest (Technology Design Group)
· Investigate the place of redundancy in the archiving system
(Technology Design Group, Preservation policy advisor, Publisher
relations specialist)
· Given the needed technological and organizational environment,
develop a business model that can make Project Harvest financially
acceptable to the Library (Project Coordinator, Preservation policy
advisor, Publisher relations specialist)
· Advisory Committee will meet to review progress (Project
coordinator)
November
- December 2001
· Assuming a sustainable business model can be identified,
prepare a grant application for the implementation of Project Harvest
based on the findings of the previous year (Project Coordinator)
· Develop an RFP for the hardware and software needed to
implement Project Harvest in a manageable, scalable fashion. The
RFP will be ready to distribute as soon as implementation funding
is received (Technology Design Group)
· Develop methods for representing the organizational and
technology competencies developed during the design of Project Harvest
to the scholarly and user communities (Preservation policy advisor,
Publisher relations specialist)
· Develop formal acquisition and growth plans to guide the
implementation of Project Harvest. The plan will determine how new
journals are to be added to the implementation (Publisher relations
specialist, Project Coordinator)
· Advisory Committee will meet to review progress (Project
coordinator)
Throughout
the course of the project:
· Share information about the design and implementation of
Project Harvest with relevant preservation and agricultural information
communities (Entire project team).