Preservation Risk Management for Web Resources (CUL and Computer Science Department Project Prism Research Teams), 2001-2003
The main thrust of
the CUL portion of Project Prism is to characterize the nature of preservation
risks in the Web environment, develop a risk management methodology for
establishing a preservation monitoring and evaluation program, and create
management tools and policies for virtual remote control. The approach
will demonstrate how Web crawlers and other automated tools and utilities
can be used to identify and quantify risks; to implement appropriate and
effective measures to prevent, mitigate, and recover from damage to and loss
of Web-based assets; and to support post-event remediation. Project Prism
is producing a framework for developing an ongoing, comprehensive monitoring
program that is scalable, extensible, and cost-effective.
See the January 2002 issue of D-Lib Magazine for an overview of the approach
in the article "Preservation Risk Management for Web Resources: Virtual Remote Control
in Cornell's Project Prism" by Anne R. Kenney, Nancy Y. McGovern,
Peter Botticelli, Richard Entlich, Carl Lagoze, and Sandra Payette.
Additional Risk Management resources.

Digital Information Longevity Study (Richard Entlich), 2001-2003
Since Web technology was introduced in the early 1990s, Web sites have
had a profound impact on the Internet and on the ways in which organizations
and individuals transmit and receive information. The functionality
and range of formats available to Web designers have grown dramatically.
The longevity study is using quantitative and qualitative methods to evaluate
a sample of online electronic journals that were included in the Directory
of Electronic Journals, Newsletters and Academic Discussion Lists,
published by the Association of Research Libraries (ARL) from 1991 to 1997.
The longevity study is looking at technological change, as well as organizational,
administrative, and economic issues that may have affected the persistence
of the resources. The study is identifying the nature and causes of loss
when possible, as well as the characteristics that may have enabled some resources to survive.
For example, the shift from FTP and Gopher to Web sites and the current
shift from HTML to various pure and hybrid forms of XML both have
implications for Internet resources of enduring value. See the longevity
study methodology and results for more information.
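One quantitative measure a study like this can report is the share of sampled titles still available, broken down by the protocol each title originally used (FTP, Gopher, or the Web). The sketch below shows that tally under stated assumptions: the `TitleCheck` record layout and the `survival_by_protocol` helper are hypothetical, and this is not the study's actual method or data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TitleCheck:
    """One sampled e-journal title (hypothetical record layout)."""
    title: str
    original_protocol: str  # e.g. "ftp", "gopher", "http", as listed in the directory
    still_available: bool   # outcome of a later manual or automated check


def survival_by_protocol(checks):
    """Return {protocol: fraction of sampled titles still available}."""
    totals = Counter()
    survivors = Counter()
    for check in checks:
        totals[check.original_protocol] += 1
        if check.still_available:
            survivors[check.original_protocol] += 1
    # Counter returns 0 for protocols with no survivors, so every rate is defined.
    return {proto: survivors[proto] / totals[proto] for proto in totals}
```

A tally like this only answers "how many survived"; the qualitative side of the study (why a title was lost or kept) still has to be recorded separately for each title.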

Risk Management Experiments for Web Resources (CUL and Computer Science Project Prism Research Teams), 2001-2003
The complexity of Web documents has increased dramatically as Web site
content has shifted from stand-alone HTML files displayed in a browser
to clusters of files that contain text, images, video, etc., and provide
access to database content (the so-called "deep Web") or downloadable
files. Such innovations make these resources substantially harder to manage,
especially in the highly distributed organizational environment of the
Web. The Prism experiments combine automated and manual tools and techniques
to identify and counter risks to Web resources. Automated approaches primarily
use a powerful Web crawler to capture instances of selected Web pages
and track changes in those pages through iterative captures. The Web crawler
supports the analysis of selected characteristics of target Web sites
and pages. We are also using Web site management tools to assess the health
of the servers upon which Web sites reside. The data captured in the experiments
drives the risk analysis that informs the policies, procedures, and practices
of the risk management program for Web resources discussed above.
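Tracking changes through iterative captures comes down to comparing each new capture of a page with the previous one. A minimal sketch, assuming each capture is fingerprinted with a SHA-256 digest of the page body, is shown below; `fingerprint` and `compare_captures` are hypothetical helpers, not the interface of the crawler used in Prism.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """SHA-256 digest of a captured page body."""
    return hashlib.sha256(content).hexdigest()


def compare_captures(previous, current):
    """Compare two crawl captures of the same set of target pages.

    previous: {url: digest} recorded during the earlier crawl
    current:  {url: bytes or None}; None marks a page the crawler could not retrieve
    Returns lists of URLs keyed by "unchanged", "changed", "new", and "missing".
    """
    report = {"unchanged": [], "changed": [], "new": [], "missing": []}
    for url, body in current.items():
        if body is None:
            report["missing"].append(url)
        elif url not in previous:
            report["new"].append(url)
        elif fingerprint(body) == previous[url]:
            report["unchanged"].append(url)
        else:
            report["changed"].append(url)
    # Pages seen in the earlier crawl but absent from this one are also at risk.
    for url in previous:
        if url not in current:
            report["missing"].append(url)
    return report
```

Hashing raw bytes flags every change, including trivial ones such as a rotating timestamp or hit counter; a monitoring program would likely normalize the content (or hash only selected characteristics) before comparing, so that alerts reflect changes that matter for preservation.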

Related Project Prism Work
Event-based Metadata Research
In the Harmony
project, Cornell researchers are helping to develop an event-based
metadata model that is expressible and searchable in RDF. Such a model
may make it possible to precisely record state transitions and transformations
of digital resources, an approach that may prove to be critical for preserving
documents in the fluid environment of the Web.
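To make the idea of recording a state transition concrete, the sketch below describes one transformation event (for example, a format migration) as RDF statements serialized in N-Triples. The `http://example.org/event#` vocabulary, the `TransformationEvent` class, and the `to_ntriples` helper are all hypothetical placeholders; they do not reproduce the terms of the Harmony model.

```python
from dataclasses import dataclass

# Hypothetical vocabulary for illustration; the Harmony model's actual terms differ.
EV = "http://example.org/event#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class TransformationEvent:
    event_id: str      # URI identifying the event
    input_state: str   # URI of the resource state before the event
    output_state: str  # URI of the resource state after the event
    action: str        # e.g. "format-migration"
    date: str          # ISO 8601 date


def to_ntriples(event):
    """Serialize the event as N-Triples lines (URIs in <>, literals quoted)."""
    def uri(u):
        return f"<{u}>"

    def literal(s):
        # Escape backslashes, quotes, and newlines as N-Triples requires.
        escaped = s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"'

    subject = uri(event.event_id)
    triples = [
        (subject, uri(RDF_TYPE), uri(EV + "Event")),
        (subject, uri(EV + "hasInput"), uri(event.input_state)),
        (subject, uri(EV + "hasOutput"), uri(event.output_state)),
        (subject, uri(EV + "action"), literal(event.action)),
        (subject, uri(EV + "date"), literal(event.date)),
    ]
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
```

Because the event, rather than the document, is the subject of each statement, the before and after states stay distinct resources, and a query can ask which events produced a given state.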
See the main Project
Prism Web site for information on the full set of Prism research
projects.