Digital Information Longevity Study

This funded project has ended, and this web site is no longer active or being updated. It is being retained for historical purposes only.

 

 

 

Digital Information Longevity Study
(Richard Entlich), 2001-2003

The Longevity Study approach to characterization and analysis of loss is described below:

Sample resource set

The universe of resources being analyzed consists of electronic journal titles included in the seven-year set of the Directory of Electronic Journals, Newsletters and Academic Discussion Lists, published by the Association of Research Libraries (ARL). From 1991 to 1997, this annual directory chronicled the growth of electronic publishing from the early days of ASCII publications distributed via e-mail through the appearance of mainstream, scholarly journals on the World Wide Web.

The longevity study is using the original machine-readable files, generously provided by ARL, to analyze the fate of this diverse group of publications. We have consolidated seven years' worth of files into a single database, eliminated duplicates and other anomalous data, and focused on the subset of titles classified by ARL as 'journals' rather than 'newsletters,' 'magazines' or 'zines.' Though ARL's categorization was not fully consistent (for example, titles classified as journals were not necessarily scholarly or peer-reviewed), it still provides a reasonable basis for sampling, leaving a collection of about 1800 unique titles. These titles run the gamut from fairly obscure and iconoclastic self-publishing efforts to widely read and distributed commercial titles.
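The consolidation step described above can be illustrated with a minimal sketch. The study's actual database schema and matching rules are not documented here, so the record layout (year, title, category), the title-matching key, and all sample entries below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One directory listing (hypothetical layout, not the ARL file format)."""
    year: int
    title: str
    category: str


def title_key(title):
    """Case- and whitespace-insensitive key used to detect duplicate titles."""
    return " ".join(title.lower().split())


def consolidate(entries, category="journal"):
    """Merge yearly listings into one record per unique title in a category."""
    merged = {}
    for entry in entries:
        if entry.category != category:
            continue  # skip newsletters, magazines, zines
        key = title_key(entry.title)
        record = merged.setdefault(key, {"title": entry.title, "years": set()})
        record["years"].add(entry.year)
    return merged


# Invented sample data for illustration only.
sample = [
    Entry(1993, "Example Journal of Testing", "journal"),
    Entry(1994, "example  journal of testing", "journal"),
    Entry(1994, "Sample Newsletter", "newsletter"),
]
unique = consolidate(sample)
```

Keeping the set of years in which each title appears preserves the information needed to sample titles from a range of directory editions.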

Analysis

The entire universe of titles is being analyzed to produce a profile of electronic publishing during the pivotal period of change in the size, sophistication, and usability of the Internet. Some profiles will come directly from the data reported in the directories, such as the means of distribution and file formats employed. This analysis will build on and extend work already conducted by ARL (see http://dsej.arl.org/dsej/2000/mogge.html). We are also examining the current status of sites previously used for storage and dissemination of journal content, using a high-speed Web crawler to test a normalized set of URLs derived from the directory listings.
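As an illustration of what normalizing a set of URLs might involve, the sketch below uses Python's standard urllib.parse module. The normalization rules (lowercasing the scheme and host, dropping default ports, user information and fragments, and supplying a root path) are assumptions, not the rules the study applied, and the sample URLs are invented.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports for schemes common in early-1990s directory listings.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "gopher": 70}


def normalize_url(raw):
    """Return a canonical form of a URL, or None if it has no scheme or host."""
    parts = urlsplit(raw.strip())
    if not parts.scheme or not parts.hostname:
        return None
    scheme = parts.scheme.lower()
    host = parts.hostname.lower()  # hostname excludes user info and port
    port = parts.port
    # Keep an explicit port only when it differs from the scheme's default.
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    # The fragment is dropped because it never reaches the server.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def normalized_set(raw_urls):
    """Normalize every URL, discard unusable ones, and remove duplicates."""
    return sorted({u for u in map(normalize_url, raw_urls) if u})


# Invented sample listings for illustration only.
listings = [
    "HTTP://WWW.Example.ORG:80/journal/#top",
    "http://www.example.org/journal/",
    "gopher://gopher.example.org",
    "not a url",
]
targets = normalized_set(listings)
```

Normalizing before crawling means each distinct location is tested once, even when the directories recorded it with different capitalization or a redundant port.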

A smaller subset is being examined in much more depth in order to determine whether the journal content can still be found on the Internet, or in any other form. These titles, taken from a range of years within the published history of the ARL directories, will be profiled in great detail. In addition to documenting various levels of loss (from complete disappearance to minor encoding problems), we are also broadly characterizing each publication and looking for characteristics that seem to correlate with either increased or decreased vulnerability to loss. Though most of this data collection requires interpretation and is not suitable for automation, portions of each profile, such as the complete enumeration of link status, use of authoring and programming tools, and MIME types, are being collected using a Web crawler.
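The automated portion of a profile could be summarized along the following lines. This is a sketch under stated assumptions: the crawl records are taken to be (URL, HTTP status, Content-Type header) tuples, with a status of None when no response was received, and the status categories are illustrative rather than the study's own classification of loss. The sample records are invented.

```python
from collections import Counter


def link_state(status):
    """Map an HTTP status code (or None) to a coarse link-status label."""
    if status is None:
        return "unreachable"  # no response: DNS failure, timeout, refusal
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirected"
    if status in (404, 410):
        return "missing"
    return "error"


def mime_type(header):
    """Extract the bare MIME type from a Content-Type header value."""
    if not header:
        return "unknown"
    return header.split(";", 1)[0].strip().lower()


def profile(results):
    """Tally link states for all records and MIME types for successful ones."""
    states = Counter(link_state(status) for _url, status, _ctype in results)
    types = Counter(
        mime_type(ctype)
        for _url, status, ctype in results
        if link_state(status) == "ok"
    )
    return states, types


# Invented crawl records for illustration only.
crawl = [
    ("http://a.example/", 200, "text/html; charset=ISO-8859-1"),
    ("http://a.example/fig.gif", 200, "image/gif"),
    ("http://a.example/old", 404, "text/html"),
    ("http://b.example/", None, None),
]
states, types = profile(crawl)
```

MIME types are counted only for successful responses, since the Content-Type of an error page describes the error message rather than the journal content.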

An even smaller subset of titles will be subject to a further stage of analysis, adding richness to their profiles through interviews with publishers, editors and technology staff originally involved in their creation and dissemination. These interviews will attempt to better understand institutional practices, and the organizational, economic or political factors that may have influenced the preservation status of the title and that are not discernible through passive examination of existing Internet sites.

Additionally, we will attempt to assess not only the degree and causes of loss, but also user perception of the value and impact of completely and partially lost content. What is the significance of the material that's been lost? How serious an obstacle to use and interpretation are problems that result in some loss of fidelity to the original presentation?

Finally, we will examine the role and fate of some of the efforts that arose during the early to mid-1990s to bring some order to the chaos of early electronic publishing. What effect did initiatives such as CICNet (an aggregator of electronic journals) and attempts by individual libraries to collect and catalog electronic journals have on their long-term availability? What lessons can be learned from their success or failure?

Outcomes

The longevity study will add to our understanding of the life-cycle of electronic publications and the factors that influence the survival of their content. Though the focus on electronic journals may at first seem limiting, the titles chosen for analysis cover a wide range of issues and presentation styles, and incorporate a variety of textual, audio and pictorial content. They also come from a particularly volatile period of Internet development, and should shed some light on the impact of rapid technological change on the survival of machine-readable content. Most significantly, however, we expect to begin quantifying both the extent and importance of information loss and move beyond what has primarily been an anecdotal and speculative enterprise to one based on a more formal analysis.


 

© 2002 Cornell University Library Instruction, Research, and Information Services