Cornell’s Project Harvest
CNI Fall 2001 Task Force Meeting
Anne R. Kenney and Nancy Y. McGovern
Project Harvest Overview
Subject-based approach: agriculture
National Preservation Plan
USAIN
Mann Library
Core Historical Literature
TEEAL
USDA
75% of core journals now available in electronic form
Focus of Planning Year
Investigating the conditions under which publishers are willing to participate in the development of a Subject-Based Digital Archive (SBDA)
Two-pronged iterative cycle:
Explore (potential of SBDA, business model, broader preservation matrix)
Build (using agriculture as pragmatic application)
PBDA
SBDA
Intersection of Digital Archives
USAIN Survey
Access
45% indicated need for both print and electronic
55% indicated e-journal already substituted for print
84% would cancel print if reliable archives built
JSTOR study – 78% of faculty think hard copy should be retained even if reliable digital archives exist
USAIN Survey
Observed loss in e-journals:
45% don’t know
22% yes, noted difference
22% no, no difference
What to preserve (priority order):
1. Preserve content plus journal “look and feel” plus publisher functionality
2. Preserve content plus journal “look and feel”
How to preserve:
Over 90% rejected a single solution; prefer multiple custodians or a third party
Sept. 6 Publishers’ Meeting
American Dairy Science
Academic/Elsevier
American Phytopathological Society
BioOne
CABI
NRC-Canada
Wiley
NLA and USAIN representation
What’s the Publisher Incentive to Archive?
Protect assets, continuing value of material as it ages
Low additional overhead
Satisfy customers
Risk tolerance; sustainable loss
As calling card for or by-product of services
Meeting Results
All publishers intend to establish archives
Shift from content currency to database development
Publishers see revenue stream in retrospective holdings
Publishers less concerned than librarians about “artifactual” archiving
Meeting Results
Differing perceptions around who should do digital preservation
Librarians want trusted third-party archiving
Publishers insufficiently aware that others don’t trust them to safeguard materials and insufficiently aware of what it takes to archive
Distrust of government (competition)
Meeting Results
Publishers not enthusiastic about “lit” archives—some would consider it if revenue returned to publisher
Convergence in formats
Reluctance to force authors to conform
Unwilling to share proprietary publisher DTD
Willing to consider archival DTD as another output
Trigger Events
None acknowledged by publishers
Technology watersheds:
Retrofitting legacy digital files
When paper no longer represents an access and preservation alternative for electronic
SBDA triggers
Different subject domains have different half-lives
When common interests outweigh individual interests
Stakeholder pressure: when detrimental not to participate
Access and Funding
Publishers and librarians went into the meeting presuming different things
Publishers differed on access issues
Librarians asserted that publishers would have to finance dark archives
SBDA Distinguishes Between Metadata and Data
Dark metadata/dark data
Light metadata/light data
Light metadata/dark data
Light metadata/no data
Multiple options for different publishers and audiences
SBDA Hybrid Model
Ultimate goal is lightness
Comprehensiveness and buy-in trump lightness
Commonality over distinctiveness emphasized
Hybrid model enables combinations of light to dark metadata and data
Access to metadata/data will change over time and in response to particular circumstances
Offers win/win possibilities
Possible Sustainability Models
Preservation surcharge on subscription
Preservation endowment
Bartered access privileges for preservation
Business insurance policy model
Government support
Possible Sustainability Models
Develop new markets
Harness the free riders
Charge for services, not content and archiving
Build value-adds on the SBDA
Next Steps
Developing subject domain profile
Surveying agricultural publishers to determine level of cooperation in SBDA
Evaluating existing architectural models
Writing CLIR report on the significance of the SBDA
Subject-based Profile
Who are the stakeholders? How many publishers? Research demographics of new user groups?
How big is the field? How structured and defined is it? What’s important? Why? Change driven by discipline and by technology
How standardized is the literature? (XML, etc.)
How complex/fixed is it? (database, virtual)
Who owns rights for re-use? Assessment of economic, first-use, citations, second use, technology
How Willing to Cooperate?
Pre- and post-competitive collaboration
Standardized, normalized, and limited number of formats
Preservation from conception (requirements of authors; shut-off point for non-cooperation)
Archival DTD
Preservation metadata
How Willing to Cooperate?
Self-certification/external certification
Light (and common) metadata, move toward light data (monitoring with scheduling)
Economy of scale
Willing to financially support the effort