Digital Information Longevity Study
(Richard Entlich), 2001-2003
The Longevity Study
approach to characterization and analysis of loss is described below:
Sample resource set
The universe of resources
being analyzed consists of electronic journal titles included in the seven-year
set of the Directory of Electronic Journals, Newsletters and Academic
Discussion Lists, published by the Association of Research Libraries
(ARL). From 1991-97, this annual directory chronicled the growth of electronic
publishing from the early days of ASCII publications distributed via e-mail
through the appearance of mainstream, scholarly journals on the World
Wide Web.
The longevity study
is using the original machine-readable files, generously provided by ARL,
to analyze the fate of this diverse group of publications. We have consolidated
seven years' worth of files into a single database, eliminated duplicates
and other anomalous data, and focused on the subset of titles classified
by ARL as 'journals' rather than 'newsletters,' 'magazines' or 'zines.'
Though ARL's categorization was somewhat lacking in consistency (for example,
titles classified as journals were not necessarily scholarly or peer-reviewed),
it still provides a reasonable basis for sampling, leaving a collection
of about 1800 unique titles. These titles run the gamut from fairly obscure
and iconoclastic self-publishing efforts to widely read and distributed
commercial titles.
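
The consolidation step described above (merging the yearly files, dropping duplicates, and keeping only titles classified as journals) can be illustrated with a minimal sketch. The record layout (dictionaries with 'year', 'title' and 'category' keys), the function names, and the matching rule (case-insensitive titles with whitespace collapsed) are assumptions made for illustration; they are not the study's actual schema or de-duplication method.

```python
def normalize_title(title: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(title.lower().split())


def consolidate(records):
    """Merge yearly directory records into one entry per journal title.

    Each record is a hypothetical dict with 'year', 'title' and 'category'
    keys. Records whose category is not 'journal' are skipped.
    """
    merged = {}
    for rec in records:
        if rec["category"] != "journal":
            continue
        key = normalize_title(rec["title"])
        # Keep the first spelling seen and accumulate the years listed.
        entry = merged.setdefault(key, {"title": rec["title"], "years": set()})
        entry["years"].add(rec["year"])
    return merged
```

Keying on a normalized title is the simplest possible rule. Titles that changed name between editions of the directory would need manual review or a fuzzier comparison.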
Analysis
The entire universe
of titles is being analyzed to produce a profile of electronic publishing
during the pivotal period of change in the size, sophistication, and usability
of the Internet. Some profiles will come directly from the data reported
in the directories, such as the means of distribution and file formats
employed. This analysis will build on and extend work already conducted by
ARL (see http://dsej.arl.org/dsej/2000/mogge.html). We are also examining
the current status of sites previously used for storage and dissemination
of journal content, using a high-speed Web-crawler to test a normalized
set of URLs derived from the directory listings.
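
One way to derive a normalized set of URLs from directory listings is sketched below using Python's standard urllib.parse module. The specific rules (lowercasing the scheme and host, dropping default ports and fragments, supplying "/" for an empty path) and the DEFAULT_PORTS table are assumptions chosen for illustration; the study does not state which normalization rules it applies.

```python
from urllib.parse import urlsplit, urlunsplit

# Well-known default ports for schemes common in early-1990s listings.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "gopher": 70}


def normalize_url(raw: str) -> str:
    """Return a canonical form of a URL so equivalent listings compare equal."""
    parts = urlsplit(raw.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # note: drops any user:password@ prefix
    netloc = host
    # Keep the port only when it differs from the scheme's default.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{parts.port}"
    path = parts.path or "/"
    # The fragment is discarded: it is never sent to the server.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def unique_urls(raw_urls):
    """Normalize every URL and return the distinct results in sorted order."""
    return sorted({normalize_url(u) for u in raw_urls})
```

Each distinct normalized URL then needs to be tested only once by the crawler, however many directory entries or yearly editions listed it.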
A smaller subset is
being examined in much more depth in order to determine whether the journal
content can still be found on the Internet, or in any other form. These
titles, taken from a range of years within the published history of the
ARL directories, will be profiled in great detail. In addition to documenting
various levels of loss (from complete disappearance to minor encoding
problems), we are also broadly characterizing each publication and looking
for characteristics that seem to correlate with either increased or decreased
vulnerability to loss. Though most of this data requires interpretation
and is not suitable for automated collection, portions of each profile,
such as complete enumeration of link status, use of authoring and programming
tools, and MIME types, are being gathered using a Web crawler.
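
As a rough illustration of the automated part of a profile, the sketch below tallies link status and MIME types from crawl results that have already been collected. The input format (pairs of HTTP status code and Content-Type header, with None for a host that could not be reached) and the status categories are hypothetical; they are not the categories used by the study's crawler.

```python
from collections import Counter


def link_status(status_code):
    """Map an HTTP status code (or None for no response) to a coarse category."""
    if status_code is None:
        return "unreachable"
    if 200 <= status_code < 300:
        return "ok"
    if 300 <= status_code < 400:
        return "redirected"
    if status_code in (404, 410):
        return "missing"
    return "error"


def mime_type(content_type):
    """Extract the bare media type from a Content-Type header value."""
    if not content_type:
        return "unknown"
    # Strip parameters such as "; charset=..." and normalize case.
    return content_type.split(";", 1)[0].strip().lower()


def summarize(results):
    """Count link statuses for all results and MIME types for successful ones."""
    statuses = Counter(link_status(code) for code, _ in results)
    mimes = Counter(
        mime_type(ctype)
        for code, ctype in results
        if code is not None and 200 <= code < 300
    )
    return statuses, mimes
```

Counts like these cover only the mechanical portion of a profile. Judging whether a retrieved page still carries the original journal content remains a matter of interpretation.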
An even smaller subset
of titles will be subject to a further stage of analysis, adding richness
to their profiles through interviews with publishers, editors and technology
staff originally involved in their creation and dissemination. These interviews
will attempt to better understand institutional practices, and the organizational,
economic or political factors that may have influenced the preservation
status of the title and that are not discernible through passive examination
of existing Internet sites.
Additionally, we will
attempt to assess not only the degree and causes of loss, but also user perception
of the value and impact of completely and partially lost content. What
is the significance of the material that's been lost? How serious an obstacle
to use and interpretation are problems that result in some loss of fidelity
to the original presentation?
Finally, we will examine
the role and fate of some of the efforts that arose during the early to
mid-1990s to bring some order to the chaos of early electronic publishing.
What effect did initiatives such as CICNet (an aggregator of electronic
journals) and attempts by individual libraries to collect and catalog
electronic journals have on the long-term availability of those titles? What lessons
can be learned from their success or failure?
Outcomes
The longevity study
will add to our understanding of the life-cycle of electronic publications
and the factors that influence the survival of their content. Though the
focus on electronic journals may at first seem limiting, the titles chosen
for analysis cover a wide range of issues and presentation styles, and
incorporate a variety of textual, audio and pictorial content. They also
come from a particularly volatile period of Internet development, and
should shed some light on the impact of rapid technological change on
the survival of machine-readable content. Most significantly, however,
we expect to begin quantifying both the extent and importance of information
loss and move beyond what has primarily been an anecdotal and speculative
enterprise to one based on a more formal analysis.