CUL Prism Research Components and Related Work

Preservation Risk Management for Web Resources
(CUL and Computer Science Department Project Prism Research Teams), 2001-2003

The main thrust of the CUL portion of Project Prism is to characterize the nature of preservation risks in the Web environment, develop a risk management methodology for establishing a preservation monitoring and evaluation program, and create management tools and policies for virtual remote control. The approach will demonstrate how Web crawlers and other automated tools and utilities can be used to identify and quantify risks; to implement appropriate and effective measures to prevent, mitigate, and recover from damage to and loss of Web-based assets; and to support post-event remediation. Project Prism is producing a framework for developing an ongoing, comprehensive monitoring program that is scalable, extensible, and cost-effective.

See the January 2002 issue of D-Lib Magazine for an overview of the approach, entitled "Preservation Risk Management for Web Resources: Virtual Remote Control in Cornell's Project Prism," by Anne R. Kenney, Nancy Y. McGovern, Peter Botticelli, Richard Entlich, Carl Lagoze, and Sandra Payette.

Additional Risk Management resources.

Digital Information Longevity Study
(Richard Entlich), 2001-2003
Since Web technology was introduced in the early 1990s, Web sites have had a huge impact on the Internet and on the ways in which organizations and individuals can transmit and receive information. The functionality and range of formats available to Web designers have grown dramatically. The longevity study is using quantitative and qualitative methods to evaluate a sample of online electronic journals that were included in the Directory of Electronic Journals, Newsletters and Academic Discussion Lists, published by the Association of Research Libraries (ARL) from 1991 to 1997. The study is looking at technological change, as well as organizational, administrative, and economic issues that may have affected the persistence of the resources. Where possible, the study is identifying the nature and causes of loss, as well as the characteristics that may have enabled resources to survive. For example, the shift from FTP and Gopher to Web sites, and the current shift from HTML to various pure and hybrid forms of XML, have implications for Internet resources of enduring value. See the longevity study methodology and results for more information.

Risk Management Experiments for Web Resources
(CUL and Computer Science Project Prism Research Teams), 2001-2003
The complexity of Web documents has increased dramatically as Web site content has shifted from stand-alone HTML files displayed in a browser to clusters of files that contain text, images, video, etc., and provide access to database content (the so-called "deep Web") or downloadable files. Such innovations make these resources substantially harder to manage, especially in the highly distributed organizational environment of the Web. The Prism experiments combine automated and manual tools and techniques to identify and counter risks to Web resources. The automated approaches rely primarily on a Web crawler that captures instances of selected Web pages and tracks changes in those pages through iterative captures. The crawler also supports the analysis of selected characteristics of target Web sites and pages. We are also using Web site management tools to assess the health of the servers on which Web sites reside. The data captured in the experiments drives the risk analysis, which in turn informs the policies, procedures, and practices of the risk management program for Web resources discussed above.
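
As an illustration of tracking changes through iterative captures, the following is a minimal sketch in Python. It is not the Prism crawler. The `Capture` record, its fields, and the three outcome labels are assumptions made for this example. It compares a content digest and the HTTP status across two successive captures of the same page.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Capture:
    """One capture of a page (hypothetical record layout, not Prism's)."""
    url: str
    captured_at: str  # ISO 8601 timestamp supplied by the caller
    status: int       # HTTP status code observed at capture time
    digest: str       # SHA-256 hex digest of the response body


def make_capture(url, captured_at, status, body):
    """Build a Capture from raw response bytes."""
    return Capture(url, captured_at, status, hashlib.sha256(body).hexdigest())


def classify_change(previous, current):
    """Compare two successive captures of the same URL."""
    if previous.url != current.url:
        raise ValueError("captures must refer to the same URL")
    if current.status >= 400:
        return "unavailable"  # page could not be retrieved this time
    if previous.digest != current.digest:
        return "changed"      # body bytes differ from the earlier capture
    return "unchanged"
```

A digest comparison only detects that the bytes differ; it does not say whether the change matters for preservation, which is where the manual techniques mentioned above would come in.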

Related Project Prism work

Event-based Metadata Research
In the Harmony project, Cornell researchers are helping to develop an event-based metadata model that is expressible and searchable in RDF. Such a model may make it possible to precisely record state transitions and transformations of digital resources, an approach that may prove to be critical for preserving documents in the fluid environment of the Web.
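
To make the idea of recording state transitions concrete, here is a small Python sketch. It does not use RDF or the Harmony vocabulary. The `Event` fields and the `state_at` helper are hypothetical names chosen for this example. Each event records the state of a resource before and after a transformation, so that the state at any date can be recovered from the event history.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """A transformation of a resource (hypothetical fields for illustration)."""
    resource: str               # identifier of the resource that changed
    when: str                   # ISO 8601 date; string order is chronological
    action: str                 # e.g. "created", "migrated"
    from_state: Optional[str]   # state before the event (None at creation)
    to_state: str               # state after the event


def state_at(events, resource, when):
    """Return the resource's state as of `when`, or None if it did not exist yet."""
    state = None
    for event in sorted(events, key=lambda e: e.when):
        if event.resource == resource and event.when <= when:
            state = event.to_state
    return state
```

For example, a journal created as Gopher text and later migrated to HTML would be described by two events, and `state_at` would report the format in use on any given date.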

See the main Project Prism Web site for information on the full set of Prism research projects.



 

© 2002 Cornell University Library Instruction, Research, and Information Services