Cornell’s Project Harvest
CNI Fall 2001 Task Force Meeting
Anne R. Kenney and Nancy Y. McGovern

Project Harvest Overview
Subject-based approach: agriculture
National Preservation Plan
USAIN
Mann Library
Core Historical Literature
TEEAL
USDA
75% of core journals now available in electronic form

Focus of Planning Year
Investigating conditions under which publishers are willing to participate in the development of a Subject-Based Digital Archive (SBDA)
Two-pronged iterative cycle:
Explore (potential of SBDA, business model, broader preservation matrix)
Build (using agriculture as pragmatic application)

PBDA

SBDA

Intersection of Digital Archives

USAIN Survey
Access
45% indicated need for both print and electronic
55% indicated e-journal already substituted for print
84% would cancel print if reliable archives built
JSTOR study: 78% of faculty think hard copy should be retained even if reliable digital archives exist

USAIN Survey
Observed loss in e-journals:
45% don’t know
22% yes, noted difference
22% no, no difference
What to preserve (priority order):
1. Preserve content plus journal “look and feel” plus publisher functionality
2. Preserve content plus journal “look and feel”
How to preserve:
Over 90% rejected single solution; prefer multiple custodians or third party

Sept. 6 Publishers’ Meeting
American Dairy Science
Academic/Elsevier
American Phytopathological Society
BioOne
CABI
NRC-Canada
Wiley
NLA and USAIN representation

What’s the Publisher Incentive to Archive?
Protect assets, continuing value of material as it ages
Low additional overhead
Satisfy customers
Risk tolerance; sustainable loss
As calling card for or by-product of services

Meeting Results
All publishers intend to establish archives
Shift from content currency to database development
Publishers see revenue stream in retrospective holdings
Publishers less concerned than librarians about “artifactual” archiving

Meeting Results
Differing perceptions around who should do digital preservation
Librarians want trusted third-party archiving
Publishers insufficiently aware that others don’t trust them to safeguard materials and insufficiently aware of what it takes to archive
Distrust of government (competition)

Meeting Results
Publishers not enthusiastic about “lit” archives; some would consider them if revenue returned to publisher
Convergence in formats
Reluctance to force authors to conform
Unwilling to share proprietary publisher DTD
Willing to consider archival DTD as another output

Trigger Events
None acknowledged by publishers
Technology watersheds:
Retrofitting legacy digital files
When paper no longer represents an access and preservation alternative to electronic

SBDA triggers
Different subject domains have different half-lives
When common interests outweigh individual interests
Stakeholder pressure: when detrimental not to participate

Access and Funding
Publishers and librarians went into the meeting presuming different things
Publishers differed on access issues
Librarians asserted that publishers would have to finance dark archives

SBDA Distinguishes Between Metadata and Data
Dark metadata/dark data
Light metadata/light data
Light metadata/dark data
Light metadata/no data
Multiple options for different publishers and audiences
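
The four combinations above can be read as a small access matrix. A minimal sketch in Python, assuming "dark" means held by the archive but not openly accessible and "light" means openly accessible; the names `Visibility`, `AccessState`, and `SBDA_STATES` are hypothetical and do not come from Project Harvest:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Visibility(Enum):
    DARK = "dark"    # assumed: held in the archive, not openly accessible
    LIGHT = "light"  # assumed: openly accessible


@dataclass(frozen=True)
class AccessState:
    metadata: Visibility
    data: Optional[Visibility]  # None means the archive holds no data

    def label(self) -> str:
        data = "no" if self.data is None else self.data.value
        return f"{self.metadata.value.capitalize()} metadata/{data} data"


# Illustrative encoding of the four combinations listed on the slide.
SBDA_STATES = [
    AccessState(Visibility.DARK, Visibility.DARK),
    AccessState(Visibility.LIGHT, Visibility.LIGHT),
    AccessState(Visibility.LIGHT, Visibility.DARK),
    AccessState(Visibility.LIGHT, None),
]

for state in SBDA_STATES:
    print(state.label())
```

Keeping metadata and data as separate fields is what lets different publishers and audiences be given different combinations.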

SBDA Hybrid Model
Ultimate goal is lightness
Comprehensiveness and buy-in trumps lightness
Commonality over distinctiveness emphasized
Hybrid model enables combinations of light to dark metadata and data
Access to metadata/data will change over time and in response to particular circumstances
Offers win/win possibilities
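
One way to picture access changing over time is a per-publisher schedule of state changes. The sketch below is illustrative only: the `Holding` class, the publisher name, the dates, and the dark/dark default are all assumptions, not part of the SBDA design.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple


@dataclass
class Holding:
    publisher: str
    # Hypothetical schedule of (effective date, metadata state, data state).
    schedule: List[Tuple[date, str, str]] = field(default_factory=list)

    def state_on(self, day: date) -> Tuple[str, str]:
        """Return the (metadata, data) state in force on the given day."""
        current = ("dark", "dark")  # assumed default before any scheduled change
        for effective, metadata, data in sorted(self.schedule):
            if effective <= day:
                current = (metadata, data)
        return current


holding = Holding(
    publisher="Example Society",  # made-up publisher
    schedule=[
        (date(2002, 1, 1), "light", "dark"),
        (date(2005, 1, 1), "light", "light"),
    ],
)

print(holding.state_on(date(2001, 6, 1)))
print(holding.state_on(date(2003, 6, 1)))
print(holding.state_on(date(2006, 6, 1)))
```

A change made in response to particular circumstances could be modeled the same way, by appending a new entry to the schedule when the circumstance arises.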

Possible Sustainability Models
Preservation surcharge on subscription
Preservation endowment
Bartered access privileges for preservation
Business insurance policy model
Government support

Possible Sustainability Models
Develop new markets
Harness the free riders
Charge for services, not content and archiving
Build value-adds on the SBDA

Next Steps
Developing subject domain profile
Surveying agricultural publishers to determine level of cooperation in SBDA
Evaluating existing architectural models
Writing CLIR report on the significance of the SBDA

Subject-based Profile
Who are the stakeholders? How many publishers? Research demographics of new user groups?
How big is the field? How structured and defined is it? What’s important? Why? Change driven by discipline and by technology
How standardized is the literature? (XML, etc.)
How complex/fixed is it? (database, virtual)
Who owns rights for re-use? Assessment of economic, first-use, citations, second use, technology

How Willing to Cooperate?
Pre- and post-competitive collaboration
Standardized, normalized, and limited number of formats
Preservation from conception (requirements of authors; shut-off point for non-cooperation)
Archival DTD
Preservation metadata

How Willing to Cooperate?
Self-certification/external certification
Light (and common) metadata, move toward light data (monitoring with scheduling)
Economy of scale
Willing to financially support the effort