A collaborative project of the University of Michigan Library, Cornell University Library,
and the State and University Library Göttingen

PROJECT DESCRIPTION


Interoperable Access to Multiple Systems across a Broad Discipline


The State and University Library Göttingen, Cornell University Library, and the University of Michigan Library propose the creation of a digital library of historical mathematical literature, enhancing already rich, standards-based digital library systems at each of the institutions with mechanisms for interoperability. Together, the three institutions will digitize or enhance already digitized resources to build a combined collection of nearly 2,000 volumes of mathematical literature from the 19th and early 20th century, as well as of dissertations and related materials. The three institutions commit themselves to ongoing support of the system and collection. Using robust digital library systems from each of the institutions, access will be enhanced by introducing an interoperability layer that capitalizes on recent developments in the Dienst protocol in order to produce a unified “virtual” collection.

Research libraries have been undertaking important digitization projects for several years, and we believe that these collections are underutilized because the information, even within a single discipline, is distributed throughout a number of different repositories and is thus fragmented. Through our separate efforts, many large, unique, and essential collections are available online to benefit students and researchers. Historical literature is being digitized by many different institutions and is accessed in unique systems at those separate institutions, each offering a different user interface, a different navigational methodology, and various levels of functionality. Only in rare cases are these collections linked. Consequently, users have difficulty both in discovering and in using this literature online. They must discover (and maintain a knowledge of) the different repositories, and they must become proficient in using the different systems. Users must also contend with a lack of efficient, distributed printing options, a problem that is particularly significant for book-length materials, especially when these books are otherwise out of print or inaccessible.

Digital library efforts in research libraries have long recognized the need to develop interoperable mechanisms, but have found no viable options for the relatively complex, standards-based material being created. Work in the Digital Library Federation, for example, has focused on interoperability for several years, without identifying satisfactory mechanisms to achieve this end. And while it is certainly unrealistic to assume that different institutions, with different needs, resources, and users, will adopt a common access and delivery system, there is a clear sense of the need for mechanisms that heterogeneous systems can use to communicate with each other. By defining and implementing an effective level of interoperability, we can aid users by creating “meta-repositories” and uniform access mechanisms for focused subject collections such as that proposed here.

Libraries and librarians have demonstrated a fundamental interest in the development of mechanisms to provide users with unified access to resources; similarly, they have shown a strong and continuing interest in building digital libraries with long-term viability. The goal of this project is to contribute to the fundamental knowledge required to achieve meaningful interoperability. In the course of the project, we will develop a mechanism capable of unifying standards-based digital materials from a single discipline (i.e., mathematics) distributed among different access systems at different research institutions. After thorough testing and evaluation, the system will be maintained by the three libraries to allow users to access and exploit these collections in a new and more efficient way, and the results of our work (i.e., an interoperability layer for each library’s digital library system) will be shared broadly.

Partnership of Three Institutions

As a focus for our efforts in the area of interoperability, we will concentrate on historical monographs in mathematics, an area for which the three participating institutions have extraordinary collection strengths and digital library activities.

State and University Library Göttingen

Context

Since the founding of the Library in 1734-37, mathematics has been one of the core collections of the Library and has always been supported by extra funds of the State and generous gifts of famous scientists of the University. For example, the splendid scientific library of Gauss is now part of the library, and the mathematicians Klein and Hilbert always requested special funds for the mathematical collections of the central library during their negotiations with the Prussian Ministry. Springer Verlag is one of several publishers that supported the Göttingen Library in previous years, and as a consequence, the Göttingen mathematical collection is now the strongest one in Europe. The central library holds approximately 60,000 mathematical monographs and about 1,100 current mathematical journal titles, as well as older journal volumes, mathematical preprints, and mathematical dissertations. In addition to the large collections in the Central Library, Göttingen holds the famous Mathematical Library of the Mathematical Department, founded as the “Mathematische Lesezimmer” by Felix Klein, and these volumes too can be used for digitization purposes.

Digital Collection

The Göttingen State and University Library began its digitization of mathematical literature in 1997. Project funding from ERAM (Electronic Research Archive for Mathematics), along with support from the DFG, made it possible to convert the journal “Jahrbuch über die Fortschritte der Mathematik” (1868-1943) into a database. The ERAM project began in 1998 and is funded through 2003. During the first years of the project, we implemented the “Jahrbuch” and have now completed 80% of the conversion. The ultimate aim of the project is to create a digital library for mathematics, storing the most relevant publications from each period electronically. This full-text portion of the archive will be created over the next six years and will contain approximately 1,200,000 pages of mathematical literature. The titles to be digitized will be chosen by mathematicians who are reviewing parts of the “Jahrbuch über die Fortschritte der Mathematik.” Their criteria will result in all types of mathematical literature being digitized, including monographs, journal articles, dissertations, and conference proceedings. Several hundred monographs from the ERAM project will be part of the distributed digital library of mathematical monographs.

University of Michigan

Context

The Michigan and Cornell mathematical collections are among the strongest in the United States, a strength long recognized by the mathematics community and reflected in the Research Libraries Group collections “Conspectus.” The mathematics collection at the University of Michigan is one of the earliest founded collections at the university and one of the most comprehensively developed. In the late 19th and early 20th centuries, the library received many gifts of rare and core mathematics books and journals from university faculty who traveled to Europe to collect mathematics literature. The library established many long-standing exchange and gift agreements with mathematics institutions around the world. The university has traditionally provided generous funding for the mathematics collection. The strength of the mathematics collection was a factor in Mathematical Reviews’ decision to move their offices to Ann Arbor in 1964, from Providence, Rhode Island and Brown University. The library currently holds approximately 33,000 bound mathematics serial volumes, and 32,000 mathematics monographs. These numbers do not include the mathematics titles held in the University Library Special Collections Library or in its engineering collection. Many of the key works in the development of non-Euclidean Geometry, as outlined in histories and bibliographies of non-Euclidean geometry, are held by the University of Michigan Library.

Digital Collection

The University of Michigan will fund the digitization of a thousand monographic volumes in mathematics and will contribute this material for the purposes of the proposed project. A tentative list of items has been prepared and is being reviewed by scholars in appropriate fields. The mathematics monographs suggested for digitization share the following characteristics:


• Held by the University of Michigan University Library
• Published between 1803 and 1923
• Printed on brittle paper (many volumes identified as brittle during mass-deacidification of mathematics collection, 1999-2000)
• Not currently available in digital form
• Printed in English, French, German, Dutch, Russian, Spanish, or Italian
• In bound book format
• Works of mathematicians who contributed to the development of non-Euclidean geometry (authors are included in the Bibliography of Non-Euclidean Geometry by D.M.Y. Sommerville, 2nd edition, Chelsea Publishing Company)

Although the authors in the Michigan list are all in the bibliography of non-Euclidean geometry by D.M.Y. Sommerville, this collection is much broader than that suggests. The collection of books includes the collected works of many of the most influential mathematicians of the 19th century: A. Cayley, P. G. L. Dirichlet, E. Galois, H. Grassmann, C. G. J. Jacobi, J. L. Lagrange, L. Kronecker, B. Riemann, J. J. Sylvester, and K. Weierstrass, among others. There are also books by Euler, Gauss, Hadamard, Hermite, Hilbert, Kelvin, Klein, Legendre, Leibniz, Lie, Plücker, Poincaré, and Stokes. In addition, the collection contains a large number of turn-of-the-century European theses and other mathematical publications that had small print runs and that are currently extremely difficult to find. The ready electronic availability of this collection will have a significant impact on the work of both mathematicians and historians of mathematics. Prior to the proposed funding, work will continue to refine the list by soliciting feedback from scholars primarily at Michigan and Cornell, and ultimately the list will be made publicly available for broader review.

Cornell University

Context

The Cornell Mathematics Library dates from the founding of the university, and the collection was enhanced with the purchase of the Kelly Collection in 1871 through Ezra Cornell’s personal intervention. Through the remainder of the 19th and early 20th century the collection grew by the addition of several other collections, and then was complemented by Mathematics Department and University Library purchases, producing a very comprehensive collection. In 1953 the Math Library became a fully integrated part of the University Library. Close personal involvement of the faculty remains very important. The Mathematics Faculty Library Endowment, started in 1990, has by direct gifts and solicitation raised $260,000 to date for the purchase of library materials. The University Library has continued a high level of support for the Mathematics Library with regular increases of its appropriation and by assigning the Class of 1938 Endowment to the Math Library in 1994. The new facility, significant endowment, renowned comprehensive collection, and the high level of use are a source of pride for the University Library. The Math Library currently holds 53,000 volumes that represent the core of the university’s outstanding mathematics collection. Titles related to specific applications reside with appropriate subject collections elsewhere on campus, and some rare materials are in the Kroch Library.

Digital Collection

In 1991, 576 mathematics monographs were digitized as part of a preservation project. The titles were carefully selected by library staff and reviewed by a faculty advisory committee with an eye to their mathematical significance. Since this was a preservation project, many very worthwhile candidate titles were not digitized because microfilm or reprint editions secured their preservation status. Even though this collection was constrained in its selection of titles, it has proven to be very popular. From the beginning there has been a steady demand from libraries and individuals for printed copies of these books. With no active marketing effort, more than 1,000 volumes have been sold to over 200 customers. For project details and ordering information, see: http://www.math.cornell.edu/%7Elibrary/reformat.html. A free, online book browsing interface to Cornell University Library's digitized Historical Math Book Collection (http://cdl.library.cornell.edu/math.html) has generated a high level of use and has helped win recognition for the content of the collection.

Digital Library Efforts of the Partners

Our three institutions are also internationally recognized for outstanding accomplishments in the field of digital library research and production.

State and University Library Göttingen

Göttingen State and University Library has initiated a wide range of digital library projects. Because it holds many DFG-funded special collections in a variety of disciplines, resource discovery in a global environment is one of the key issues for the daily work in Göttingen. Göttingen has mounted significant efforts in making scholarly information available through Internet-based subject gateways (including, for example, MathGuide, GeoGuide, and Anglo-American Culture and History).

Göttingen has undertaken significant projects in the field of mathematics both on the national level with Math-Bib-Net (Corporate Information Services of Libraries and Mathematical Departments), and on the international level with EULER (EUropean Libraries and Electronic Resources in Mathematical Sciences). These efforts are especially important because Göttingen Library has the primary collecting responsibility for mathematics for Germany. Offering a “one-stop shopping site,” EULER will make it possible for users to search for topics in different databases (e.g., bibliographic databases, library online public access catalogues, and indexes of mathematical Internet resources) in a single pass. A Dublin Core based metadata description plays a central role in achieving interoperability in all of these projects.

Göttingen has digitized a number of collections, with significant activities in the fields of historical travel literature and North Americana, as well as in mathematics. The Jahrbuch project, building up an Electronic Research Archive for Mathematics (ERAM), is a joint effort of Göttingen and the Department of Mathematics at Berlin University (Prof. Wegener). In the DIEPER (DIgitised European PERiodicals) project, Göttingen Library is leading a consortium of eight European libraries, testing decentralized scanning production and unified access over local repositories. Another goal of DIEPER is the establishment of a European database for digitized documents, which, like the EROMM (European Register of Microform Masters), will serve as a central reference to avoid duplicate digital conversion; the database will be located at Göttingen.

Göttingen Library moved from digital object description to digitizing the documents themselves in 1996, when it became the base for a new funding activity for the Deutsche Forschungsgemeinschaft. Within the program frame of establishing a distributed digital research library in Germany, a new program was initiated to support retrospective digitization of library holdings. The coordination for the initial phase was undertaken in Göttingen, and the results of the task forces on “Digital technique” and “Content selection” were published under the direction of the project officer in Göttingen.

In May 1997 the Göttingen Digitization Center (GDZ) was established. Göttingen is one of two national supply centers for digitization in Germany, with the second center located at the Bavarian State Library in Munich. The focus of the activities at the GDZ has been on the different fields of technology required to build a digital library.

Following the recommendations of the DFG technical task force, the GDZ chose a strategy of collaboration with an industrial software partner to create a Document Management System (DMS) as a key component for the digital library. The main requirements of the task force were to use open, standard formats to ensure a high degree of scalability and interoperability of the prospective digital collections: to this end, the GDZ worked with a database driven system for data import and export, and for handling highly structured documents and metadata. A prototype of Agora, the new DMS, was presented in Göttingen in April 1999 and is now in production at the GDZ. There are currently five Agora installations in Germany and there is an increasing interest outside Germany in the system.

The Agora system is an RDB (relational database) driven EDMS based on an extensible metadata model. The model can be implemented on different RDB platforms (e.g., Oracle, DB2, and Sybase). An administrative tool (AdminTool), running on Windows NT, controls all functions. In order to allow for maximum interoperability with other metadata sources, the system works with an import/export format that is based on RDF and XML. The Java servlet of Agora acts as the interface between the RDB, web server, and browser. Communication with the RDB is handled through JDBC. HTML templates for the user interface are used to flexibly achieve different views; they are associated with collections through the AdminTool. Based on the structured information in the underlying RDB, elaborate search functionality can be offered to the user. Advanced searches in metadata for different document types (e.g., monographs, multi-volume works, and journals) can be combined with searching in document structures such as chapters, articles, and figures. It is also possible to browse single or multiple collections.
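As a rough illustration of how a metadata search might be combined with a search in document structures in a relational store, the following sketch uses Python's built-in sqlite3 module. The table and column names (document, structure, doc_type, kind, label) and all sample rows are hypothetical and chosen only for illustration; they do not reproduce Agora's actual metadata model, which runs on platforms such as Oracle, DB2, and Sybase and is reached through JDBC.

```python
import sqlite3

# Hypothetical two-table schema: one row of bibliographic metadata per
# document, plus one row per structural element (chapter, article, figure).
SCHEMA = """
CREATE TABLE document (
    id         INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    doc_type   TEXT NOT NULL,   -- e.g. 'monograph', 'journal'
    collection TEXT NOT NULL
);
CREATE TABLE structure (
    id          INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES document(id),
    kind        TEXT NOT NULL,  -- e.g. 'chapter', 'article', 'figure'
    label       TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO document VALUES (?, ?, ?, ?)", [
        (1, "Sample Treatise on Geometry", "monograph", "mathematics"),
        (2, "Sample Mathematical Journal", "journal", "mathematics"),
        (3, "Sample Travel Account", "monograph", "travel"),
    ])
    conn.executemany("INSERT INTO structure VALUES (?, ?, ?, ?)", [
        (1, 1, "chapter", "On parallel lines"),
        (2, 1, "figure", "Diagram of parallel lines"),
        (3, 2, "article", "A note on parallel transport"),
        (4, 3, "chapter", "Parallel journeys"),
    ])
    return conn


def search(conn, doc_type, kind, label_term, collections):
    """Combine a metadata filter (document type, collection) with a
    structure filter (element kind, label substring) in one query."""
    placeholders = ",".join("?" for _ in collections)
    sql = f"""
        SELECT d.title, s.kind, s.label
        FROM document AS d
        JOIN structure AS s ON s.document_id = d.id
        WHERE d.doc_type = ?
          AND s.kind = ?
          AND s.label LIKE ?
          AND d.collection IN ({placeholders})
        ORDER BY d.title, s.id
    """
    params = [doc_type, kind, f"%{label_term}%", *collections]
    return conn.execute(sql, params).fetchall()


conn = build_demo_db()
one_collection = search(conn, "monograph", "chapter", "parallel", ["mathematics"])
two_collections = search(conn, "monograph", "chapter", "parallel",
                         ["mathematics", "travel"])
print(one_collection)
print(two_collections)
```

Passing more than one collection name to search() mirrors the idea of searching across multiple collections; user-supplied values are always bound as query parameters rather than pasted into the SQL text.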

Agora developers recently integrated the Verity Information Server, a powerful full-text search engine used in a number of significant digital library efforts. The administrator can now offer to users the search capabilities of both the RDB and Verity, making possible not only traditional SQL queries, but also the range of search functionality that is part of Verity (e.g., fuzzy search and ranking). The inclusion of Verity now makes it possible to offer effective searching in metadata such as bibliographic fields and the titles of chapters, articles, and figures. Later, Göttingen will offer full-text searching as well, a feature that becomes increasingly important as Göttingen moves from digitization of older text material (often in Fraktur type) to 20th century works. Verity is able to search a wide range of document formats, from MS Word and PDF to XML files.

Agora’s flexible export functions contribute significantly to interoperability. From its inception, Agora was able to export all data in the RDF/XML format. A recent addition, especially promising for users, was the ability to export PDF files with integrated bookmarks, created automatically from the structural metadata in the database. Because of the modularity of Agora, the GDZ has been able to add external features to Agora, as demonstrated by the successful integration of the on-the-fly conversion of images with the Tif2Gif program, developed at the University of Michigan [TIF2GIF, 1997]. In a related (import) effort, the GDZ has recently developed a tool to convert bibliographic data (Pica/GBV) into Agora’s RDF/XML format, and the tool is freely available as a Java applet from the GDZ’s web site.
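To give a sense of what an RDF/XML exchange of bibliographic metadata can look like, the sketch below serializes a single record and reads it back using Python's standard xml.etree.ElementTree module. It is only an illustration: the use of Dublin Core elements inside an rdf:Description, the function names, the identifier, and the sample values are all assumptions, and the output is not Agora's actual export format.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and the Dublin Core element set.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def record_to_rdf(identifier, fields):
    """Serialize (element name, value) pairs as one rdf:Description.

    `fields` is a list of pairs such as ("title", "...") so that
    repeated elements and their order are preserved.
    """
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": identifier}
    )
    for name, value in fields:
        ET.SubElement(description, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


def rdf_to_record(xml_text):
    """Parse the output of record_to_rdf back into (identifier, fields)."""
    root = ET.fromstring(xml_text)
    description = root.find(f"{{{RDF_NS}}}Description")
    identifier = description.get(f"{{{RDF_NS}}}about")
    # Strip the "{namespace}" prefix that ElementTree puts on tag names.
    fields = [(child.tag.split("}", 1)[1], child.text) for child in description]
    return identifier, fields


# Invented sample record, used only to exercise the round trip.
sample_fields = [
    ("title", "Sample Treatise on Geometry"),
    ("creator", "A. Author"),
    ("date", "1900"),
    ("type", "monograph"),
]
xml_text = record_to_rdf("urn:example:doc-1", sample_fields)
print(xml_text)
print(rdf_to_record(xml_text))
```

Because both directions go through a neutral, namespace-qualified format, an importing system only needs to understand the shared element set, not the exporting system's database schema.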

University of Michigan

The University of Michigan Library has been host to or a major participant in a number of significant digital library efforts, including JSTOR [Guthrie, 1997], PEAK [Bonn1, 1999], and a broad campus-wide digital library initiative begun in 1993 [Lougee, 1998]. In 1996, the Digital Library Production Service was formed within the University Library to provide a permanent support framework for the University’s production-level digital library services and collections. Through its Digital Library eXtension Service (DLXS), the University of Michigan’s Digital Library Production Service offers a suite of resources designed to aid educational and non-profit institutions in mounting a broad variety of digital library collections [Price-Wilkin, 1999]. The UM DLXS includes a powerful search engine, middleware, and a set of tools for mounting many types of digital library resources. The search engine, XPAT, is specially designed to handle large and highly structured documents found in digital library efforts. The tools that DLXS makes available are designed to tap the power of the XPAT engine for broad “classes” of resources found in most digital libraries. The University of Michigan invests significant production-level financial and staff resources in the ongoing development, maintenance, and support activities directly associated with the DLXS. There are currently nearly two dozen DLXS institutions, primarily in North America but also in Europe and Africa. Most are actively involved in the creation of digital library collections with significant historical value.

A notable feature of DLXS is its support for rich document formats favored by libraries and archives involved in digital library development. This focus on “durable document” formats includes mechanisms to support materials complying with a number of national and international standards, including the ITU TIFF G4 format for bitonal page images, and XML and SGML encoding (e.g., both TEI and EAD). In addition to this support of standards-compliant formats, the DLXS systems support powerful mechanisms such as wavelet compression, enabling support for higher resolutions of continuous tone images. The XPAT engine is an SGML/XML-aware search engine that the University of Michigan has deployed with an extremely diverse set of digital library resources. XPAT provides excellent support for word and phrase searching, indexing of encoded text (i.e., SGML and XML) elements and attributes, fast retrieval, and open systems integration. As part of the DLXS, the University of Michigan Digital Library Production Service has launched a continuous development process in which we hope to add a number of features to XPAT, including better support for XML and support for Unicode. These approaches, focusing on standards and rich document formats, have allowed DLXS institutions to capitalize on the growing array of services developing around the creation of these formats while ensuring a better long-term investment in conversion and creation of digital documents. [Bonn2, 1999; Price-Wilkin, 1997]

In addition to DLXS, the University of Michigan’s Digital Library Production Service is responsible for a number of other production-oriented digital library efforts. It includes approximately twenty full-time staff working in areas such as digitization, information retrieval, and architecture. The digitization group within DLPS provides high-volume OCR services (approximately 2 million pages per year), text encoding, continuous tone imaging (approximately 10,000 photographic quality images per year), and bitonal scanning for book and journal collections. The information retrieval group is responsible not only for mounting production systems for the University of Michigan, but also for developing a number of “host” services for non-profits and academic enterprises. In this role, DLPS develops and supports online subscription resources for such organizations as the Association of Asian Studies and the University of Michigan Press. DLPS has been responsible for the development and deployment of significant systems for authentication, usage analysis, and fee-based transactions.

Cornell University

Cornell University Library has been a leader in digital library efforts for over a decade now. It has created and maintained more than a dozen major digital library collections across a wide range of formats and disciplines from the SGML/XML Encoded Archival Description, through the Cornell University Geospatial Information Repository, to the Core Historical Collection of Agriculture. In the process of developing and maintaining these collections and services Cornell University Library has contributed to the creation of best practices in areas such as text and image conversion. Awards received include the Scout Award for the digital math books collection and the USDA Secretary’s 1999 Honor Award for the USDA-Cornell Economics and Statistics System. Current major initiatives include the Mellon-funded Project Euclid (ProjectEuclid.org), a scholarly communication initiative to enable independent mathematics and statistics journals to publish their issues effectively and efficiently on the web as part of an aggregation.

Other Cornell units, namely Computer Science’s Digital Library Research Group and Cornell Information Technologies, have also contributed substantially to the research and practice of digital libraries through such initiatives and systems as Dienst and CUPID. The library has a strong history of partnering with these units to leverage their expertise and to balance the different perspectives that they bring to the table.

The Library’s latest cooperative effort with Computer Science is Project PRISM, a four-year, $2.2 million, DLI2-funded effort to investigate and develop the policies and mechanisms needed for information integrity in digital libraries. The project will focus on five key areas: preservation, reliability, interoperability, security, and metadata. Project PRISM will undertake research on the policies and mechanisms to ensure information integrity in the context of a component-based digital library architecture. Such an architecture allows the seamless federation of distributed content and services, facilitating extensibility through the addition of new technologies and services. Supporting integrity in a digital library implemented as a distributed system poses new technical challenges. Known preservation, reliability, and security solutions were not intended for and are not sufficient for handling the novel characteristics of such digital libraries. Project PRISM is a collaboration of uniquely skilled librarians, computer scientists, evaluation experts, and international testbed participants.

The Dienst protocol, at the heart of the currently proposed interoperability project, was first developed at the Cornell Digital Library Research Group. Dienst is a system for configuring a set of individual services running on distributed servers to cooperate in providing the services of a digital library. The open architecture of the Dienst system—exposure of the functionality through a defined protocol—makes it possible to combine Dienst services in flexible ways and augment the existing services with other mediator services, which build on the functionality of the existing services. The Dienst system was first implemented in the Computer Science Technical Reports Project, a DARPA-funded collaboration to establish a digital library of computer science technical reports (NCSTRL). Within the Cornell University Library, Dienst has more recently been used as the basis of Project Euclid, a publishing initiative in mathematics and statistics. For this work, Dienst has been modified and extended, and some of this development work will inform the current project.
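The idea of a mediator service that builds on existing services can be sketched in a few lines of Python. This is an illustration of the general pattern only: the class names, the search method, the in-memory repositories, and the sample titles are hypothetical, and the sketch does not use the actual verbs or message formats of the Dienst protocol. Each repository keeps its own implementation; the mediator relies solely on the small interface they all expose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str       # name of the repository that returned the hit
    identifier: str   # identifier local to that repository
    title: str


class InMemoryRepository:
    """Stand-in for one institution's system. A real repository would be
    reached over the network through the agreed protocol."""

    def __init__(self, name, records):
        self.name = name
        self._records = dict(records)  # identifier -> title

    def search(self, term):
        needle = term.lower()
        return [
            Hit(self.name, identifier, title)
            for identifier, title in sorted(self._records.items())
            if needle in title.lower()
        ]


class Mediator:
    """Presents several heterogeneous repositories as one virtual collection."""

    def __init__(self, repositories):
        self._repositories = list(repositories)

    def search(self, term):
        hits, failed = [], []
        for repository in self._repositories:
            try:
                hits.extend(repository.search(term))
            except Exception:
                # One unavailable partner should not hide the others' results;
                # a real system would log the error and perhaps retry.
                failed.append(repository.name)
        hits.sort(key=lambda hit: (hit.title.lower(), hit.source))
        return hits, failed


# Invented sample holdings for the three partner sites.
repositories = [
    InMemoryRepository("goettingen", {"g1": "Lectures on Geometry"}),
    InMemoryRepository("michigan", {"m1": "Elements of Geometry",
                                    "m2": "Treatise on Algebra"}),
    InMemoryRepository("cornell", {"c1": "Geometry of Curves"}),
]
hits, failed = Mediator(repositories).search("geometry")
for hit in hits:
    print(hit.source, hit.identifier, hit.title)
```

Tagging every hit with its source lets a user interface show where an item lives while still presenting a single merged result list, and returning the list of failed repositories makes partial results explicit rather than silent.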

CUPID, another building block of the distributed library of mathematical monographs that we want to build, was developed at Cornell Information Technologies. It is an architecture and protocol for high-end, distributed network printing. An implementation of the system built at Cornell allows users to stipulate custom finishing operations such as double-sided printing, stapling, covers, or binding. The architecture has the capacity to facilitate billing, assess a usage or copyright fee, and access online documents from web sites, ftp servers, and document archive systems (e.g., Dienst) without requiring that users download the document to their workstation. Globally distributed printing is being commercially developed by Netpaper.com. Cornell University Library is presently in negotiations for the design of a digital library printing capacity to be implemented worldwide.

A History of Collaboration

The partners have a long history of national and international collaboration. The University of Michigan Library and Cornell University Library cooperated on the creation of the Making of America, a multi-year project that has produced a substantial online collection of nineteenth century material. The two institutions are currently working with the Library of Congress to make the Making of America available through LC’s American Memory project. Both institutions are active members of the Digital Library Federation, a body whose steering committee was chaired last year by Cornell’s University Librarian, Sarah E. Thomas. The Michigan PI, John Price-Wilkin, serves on the advisory group of Cornell University Library’s IMLS-funded preservation project, directed by Anne Kenney. Cornell University Library has had two major recent international projects. The first one is with the National and University Library of Iceland. This project, called SagaNet, was funded by the Andrew W. Mellon Foundation and the Icelandic Government and Research Council. The other collaboration was the creation and maintenance at Cornell University Library of a mirror site of the major European mathematical indexing and reviewing database, Zentralblatt. The three institutions also have a great deal in common, including their rich collections in mathematics, their strong digital library efforts, their interest in the problems of access (especially with regard to multi-lingual collections of a single discipline), and their involvement in national and international collaboration (including with each other).

The establishment of the Digitization Center (GDZ) at Göttingen State and University Library is closely connected to the DLPS at Michigan and to Cornell University Library. During the establishment of the Center in May 1997, Norbert Lossau, head of the GDZ, together with Frank Klaproth, visited a number of libraries in the U.S. with a focus on digital library activities. Cornell and Michigan were of special interest for Göttingen because of their extensive experience in fundamental research in the field of digitization techniques and their success in organizing the production of the digital conversion process. Subsequently, in order to share U.S. experiences with a larger audience of German librarians, Anne R. Kenney from Cornell was invited to the first national workshop of the two German Digitization Centers (Göttingen, January 1998). Later, for the third German workshop (Göttingen, October 1999), John Price-Wilkin from Michigan and Sandra Payette from the Cornell Department of Computer Science’s Digital Library Research Group discussed their successful efforts in Michigan and Cornell. As part of the continuing exchange with their U.S. colleagues, Norbert Lossau again visited Michigan and Cornell in November 1999. During that visit and subsequently, the operations at Göttingen and Michigan have continued to exchange information and experience regarding procedures for sustaining a high quality of metadata capture and with regard to techniques to make digital documents available via the WWW. At Cornell, Norbert Lossau gave a lecture about “Document Management for digitized Books: RDF/XML as solution for mirroring complex metadata- and document structures,” and participated in intensive discussions about features of the Göttingen Agora and the various Cornell delivery systems. Beyond these personal contacts, the GDZ remains in close communication with both Cornell and Michigan with regard to various topics of digital library research and production.
In January 1998, in collaboration with Anne R. Kenney, Lossau published an article about the GDZ in RLG DigiNews [Lossau, 1998]. Lossau and Klaproth also contributed a sidebar (“TIFF Header: A Reference Stamp for Image Files” to Anne R. Kenney’s and Oya Rieger’s new publication, Moving Theory into Practice: Digital Imaging for Libraries and Archives. From Göttingen’s perspective, the proposed formal project-cooperation with Michigan and Cornell is the result of significant ongoing dialogue and should prove of great value for digital library efforts in Germany and the U.S.

Objectives and Significance

This project will be a significant step toward a unified view of the growing number of digital collections hosted by research libraries. Within the Digital Library Federation, attention has been paid to the problem of developing an architecture capable of supporting distributed collections among the member institutions. By focusing our efforts on this large and important body of historical mathematical materials in our three institutions, and by continuing to insist on a high level of functionality and support for standards, we hope to demonstrate a viable path forward for our peer institutions.

Objectives for the grant period

The major objectives proposed by the partners are:

1. We will integrate the basic Dienst protocol in each of the systems maintained at the three institutions, while ensuring that we continue to maintain each of those systems as highly functional and heterogeneous. In doing this, we will create a mechanism whereby each of those unique and highly functional systems can retrieve information from the others. As a consequence, users will be able to access all three of the collections simultaneously from any of the systems mounted at the three institutions, thus simplifying both discovery and use.
2. By using the Dienst protocol with the three separate systems, we will implement a distributed repository system for richly encoded, standards-based historical literature. This will allow us to grow and maintain the large body of materials through loose coordination.
3. The project participants will integrate OCR in the document repositories, and will use this as a basis for evaluating the value and problems of full-text searching across multi-lingual text collections at each of the three institutions.
4. Based on the results of the evaluation of full-text searching across multi-lingual text collections at the three institutions, we will develop, implement, and evaluate mechanisms for cross-collection searching in the Dienst protocol.
5. Should cross-collection full-text searching prove impractical, we will explore the costs and value of adding searchable table of contents information to the collections.
6. In addition to providing local printing through the availability of PDF, the project partners will attempt to integrate support for distributed printing (perhaps through CUPID) in each of the three systems.
7. The project partners will work to ensure that the digitized monographs are linked to available online reviews in Mathematical Reviews, Zentralblatt, and the Jahrbuch.
8. The project will create a coordination point for historical mathematical information.

Significance of the Developments

Accomplishing the above objectives promises significant benefits both for research and practice. It will add to our knowledge and understanding of an internationally important area of research: interoperable access and delivery systems. Lessons learned and solutions can be generalized to other disciplines and materials. Free access to this information will allow a diverse group of users to consult a large and historically significant collection of materials in mathematics. The resulting system will have major benefits for the mathematics community, as well as for the general public. By providing the collection without access restrictions, the effort will benefit not only large European and U.S. research libraries, but also persons at institutions without these historically rich collections. Notably, small colleges, such as many of the historically black institutions in the U.S., and other institutions world-wide will benefit from access to this sizable and thematically focused collection. We also hope that ready access to a large, unified body of digitized monographs will help eliminate duplication in future digitization efforts at other libraries; that is, the raised profile of items in this combined collection should help make institutions aware that the items have already been converted.

The work on interoperability is a key element of the proposed activity. In the past, digital library efforts at institutions such as Michigan, Cornell, and Göttingen have not adopted protocols such as Dienst because those protocols could not support rich document encoding such as XML or standards-based storage and delivery formats. We have also insisted on high levels of functionality such as full-text searching across repositories, and the protocols’ limitations in this respect have been cited as well. At our institutions, parallel work has progressed in which highly functional systems have been developed around requirements to support complex, standards-based objects and high degrees of functionality (e.g., full-text searching). The Dienst protocol has matured significantly, and the current version offers support for many of the requirements of the institutions hosting these rich collections [Dienst, 1999]. The incorporation of the Dienst protocol in the systems at Michigan and Göttingen, and the further development of the Euclid system at Cornell to support greater functionality, will be extremely influential in communities such as the Digital Library Federation. Moreover, the ability of the systems at these institutions to interoperate with initiatives such as the Open Archives Initiative will bridge an important gap between current and historical literature.

Plan of Work

Phase 1: Staff at Michigan and Göttingen will perform a basic mapping of Dienst protocol functions to DLXS TextClass functions and Agora system functions. Differences between those in the two systems and the functions in the protocol will be analyzed, and functions not found in the two systems will be identified. Basic functions found in the systems at Michigan and Göttingen but not found in Dienst will be identified and discussed for possible expansion of the Dienst protocol. Simultaneously, Cornell will begin generating OCR for its math collection, and Michigan will begin converting materials in its collection.

Phase 2: Michigan and Göttingen will incorporate the basic functions of the Dienst protocol in the DLXS and Agora systems, and Cornell will implement the current Math collection using a separate instance of the Euclid system. Dienst protocol mechanisms that support searching across bibliographic data, browsing metadata, and page/document retrieval will be incorporated in the DLXS and Agora systems. While this will be accomplished by developing a Dienst layer for each of the two systems, we expect that both DLXS’s TextClass system and Agora will need to be modified to take the protocol layer into account.
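
The protocol layer described in Phase 2 can be pictured as a thin adapter placed in front of each local system. The following is a minimal sketch under stated assumptions, not the Dienst protocol itself: the names RepositoryAdapter, InMemoryAdapter, Record, and search_all are hypothetical, and the in-memory class merely stands in for a real system such as DLXS or Agora.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Minimal bibliographic record; field names are illustrative only."""
    identifier: str
    title: str
    repository: str


class RepositoryAdapter(ABC):
    """Hypothetical common interface exposed by each site's protocol layer."""

    @abstractmethod
    def search_bibliographic(self, term: str) -> list[Record]:
        """Return records whose bibliographic data match the term."""

    @abstractmethod
    def retrieve_page(self, identifier: str, page: int) -> str:
        """Return a locator for one page of one document."""


class InMemoryAdapter(RepositoryAdapter):
    """Stand-in for a local system; a real adapter would call that system."""

    def __init__(self, repository: str, titles: dict[str, str]) -> None:
        self.repository = repository
        self._titles = titles

    def search_bibliographic(self, term: str) -> list[Record]:
        needle = term.casefold()
        return [
            Record(identifier, title, self.repository)
            for identifier, title in sorted(self._titles.items())
            if needle in title.casefold()
        ]

    def retrieve_page(self, identifier: str, page: int) -> str:
        if identifier not in self._titles:
            raise KeyError(identifier)
        return f"{self.repository}/{identifier}/page/{page}"


def search_all(adapters: list[RepositoryAdapter], term: str) -> list[Record]:
    """Send the same query to every repository and concatenate the hits."""
    hits: list[Record] = []
    for adapter in adapters:
        hits.extend(adapter.search_bibliographic(term))
    return hits
```

Because every site answers the same small set of calls, any one interface can query all three collections while each underlying system keeps its own storage and retrieval machinery.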

Phase 3: Project participants will begin testing basic interoperability between the three systems, using those titles that have been converted at Michigan, as well as the full collection at Cornell and any available titles in Göttingen. Basic deficiencies in cross-collection access will be identified. It is expected that, at the outset, each of the three sites will use mirroring as needed to address problems of network latency.

Phase 4: The work of providing specifications for full-text access will begin in parallel to work that seeks to address deficiencies identified in Phase 3. Specifications for full-text access should address issues of query formulation (e.g., should we use a subset of Z39.50 to encode query operators and search terms?) and the form in which results are returned. The problem of returning meaningful results, especially in collections of book-length materials, is especially important to address.
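
As an illustration of the two questions raised in Phase 4 (how operators and search terms are encoded, and in what form results come back), the sketch below uses a small query tree and returns matching page numbers for a single volume. It is an assumption-laden example only: it does not use Z39.50, and the names Term, And, Or, matches, and search_volume are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Term:
    """A single search word."""
    word: str


@dataclass(frozen=True)
class And:
    """Both sub-queries must match the same page."""
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class Or:
    """Either sub-query may match the page."""
    left: "Query"
    right: "Query"


Query = Union[Term, And, Or]


def matches(query: Query, text: str) -> bool:
    """Evaluate a query tree against the whitespace-separated words of a page."""
    words = set(text.casefold().split())
    if isinstance(query, Term):
        return query.word.casefold() in words
    if isinstance(query, And):
        return matches(query.left, text) and matches(query.right, text)
    if isinstance(query, Or):
        return matches(query.left, text) or matches(query.right, text)
    raise TypeError(f"unsupported query node: {query!r}")


def search_volume(query: Query, pages: dict[int, str]) -> list[int]:
    """Return the page numbers within one volume that satisfy the query."""
    return sorted(number for number, text in pages.items() if matches(query, text))
```

Returning page numbers rather than whole volumes is one way to tell a user where in a book-length work the hits occur; ranking and multi-lingual tokenization are deliberately left out of the sketch.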

Phase 5: The systems at the three institutions will be released to the public for basic browsing and bibliographic searching. Implementation of full-text functionality in the developing protocol will begin. The Michigan and Göttingen systems will work to incorporate support for CUPID in order to provide distributed printing.

Phase 6: Full-text searching across the three repositories will be released. Evaluation and documentation of the distributed system will begin, and results will be shared with the appropriate communities, especially the Digital Library Federation in the U.S. and the VDF (Verteilte Digitale Forschungsbibliothek) libraries in Germany.

State and University Library Göttingen

The Agora system of the GDZ has its strengths in a rich metadata mechanism, supported by a robust system architecture based on a relational database and a powerful full-text search engine (Verity Information Server). Specialized search capabilities are offered for a variety of types of metadata, and both objects and metadata are managed and accessed in a distributed environment, connected via HTTP. To date, resources on the Agora document server are available both directly via the Web sites of the GDZ and via the PICA/GBV Online Union Library Catalog. Integrated access is provided at the bibliographic level for digital documents through the traditional library catalog, a key issue for German DFG-funded digitization projects. Because it relies on RDF/XML as both an import and export format, the Agora system is designed for data interoperability.

Project work at Göttingen will focus on developing a software module to make the current version of the Dienst protocol available to Agora resources. The result of the development will be a gateway providing all Dienst protocol functions using the strength of RDF/XML as metadata format and TEI/XML as encoding for full text. One focus of the work at Göttingen will be the representation and mapping of different character sets found in a multilingual distributed repository. Serving requests from the distributed Dienst protocol will be an important addition to the sophisticated retrieval functionality of the highly structured digital objects supported by the Agora system. Enabling different ways of access to a single digital object repository, as well as a unified means of access to different decentralized document repositories, will be a very important accomplishment for internationally distributed libraries.

University of Michigan

Although the University of Michigan DLXS middleware offers significant functionality in repository retrieval and navigation, even in a distributed environment, it is not interoperable with other non-DLXS digital library systems. To date, DLXS components have been deployed in multi-machine environments within a single institution, with indexes distributed among several different functionally specialized servers [Weise, 2000]. DLXS resources have also been deployed in a multi-institutional environment, for example with cross-machine full-text searching of EAD-encoded finding aids demonstrated at five institutions, including Oxford University [DFAS, 1999]. Although the DLXS methods for retrieval are highly generalized (e.g., allowing many external projects to by-pass the interface and point to individual pages, sections, or works within collections), and although development has been heavily influenced by object-oriented design, the browse and search mechanisms are not interoperable with any existing higher level protocol (e.g., Z39.50).

Project work at Michigan will focus on developing a software module to make the current version of the Dienst protocol available to DLXS resources. For example, DLXS middleware will continue to use local database applications to manage resolution of object identifiers to their location(s), but will accept and interpret requests for those same objects through the Dienst protocol. Similarly, although digital objects (and parts of digital objects) will be delivered via the protocol, they will continue to capitalize on the digital object management strategies (e.g., the strong reliance on standards-based formats) underlying DLXS systems. The addition of this layer will ensure that a baseline of interoperability will be in place for large digital library projects without compromising the high level of functionality provided by the DLXS.
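
The division of labor described above, in which a local database keeps resolving identifiers while a protocol layer accepts and interprets incoming requests, might look roughly like the following sketch. The request path format, the class LocalResolver, and the function handle_request are invented for illustration; they are not Dienst syntax and not part of DLXS.

```python
from __future__ import annotations


class LocalResolver:
    """Stand-in for a local database mapping object identifiers to locations."""

    def __init__(self, table: dict[str, list[str]]) -> None:
        self._table = table

    def resolve(self, identifier: str) -> list[str]:
        # Unknown identifiers resolve to an empty list of locations.
        return list(self._table.get(identifier, []))


def handle_request(resolver: LocalResolver, path: str) -> list[str]:
    """Interpret a hypothetical request path and answer from the local resolver.

    Accepted shapes (invented for this sketch):
        /object/<identifier>
        /object/<identifier>/page/<number>
    """
    parts = path.strip("/").split("/")
    if len(parts) == 2 and parts[0] == "object":
        return resolver.resolve(parts[1])
    if len(parts) == 4 and parts[0] == "object" and parts[2] == "page":
        page = int(parts[3])  # raises ValueError for a non-numeric page
        return [f"{location}#page={page}" for location in resolver.resolve(parts[1])]
    raise ValueError(f"unsupported request: {path!r}")
```

The point of the design is that the resolver is untouched: only the thin request handler knows about the external protocol, so local object management strategies remain as they are.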

Cornell University

Cornell University Library has significant history and experience in implementing Dienst, an interoperable digital library architecture and protocol developed by the Cornell Digital Library Research Group in Cornell's Computer Science Department. In 1996, the library developed and implemented a document browse system based on Dienst 3.5. This layered system architecture included a user interface in Perl communicating with low-level repository information via the Dienst protocol. In 1998, library personnel experimented with Dienst 4.0, providing feedback to Dienst developers. Later, the library implemented a revised Dienst-based user interface, called Hunter. Built on Dienst 5.1 and developed by Cornell Information Technologies, Hunter offered increased user functionality by mediating access to a collection of Dienst services, such as repository information now expressed in XML, an indexing and searching service, and hand-off to full-text printing services. More recently, Project Euclid has modified and extended Dienst so that it supports an even more complete range of digital library services and functions.

The library is thus committed to the development of an open digital library architecture, with system functionality exposed through a defined protocol such as Dienst. Project work at Cornell will concentrate on building even greater functionality into the system developed for Project Euclid. A redesigned interface service is anticipated, extending functionality to allow for such services as true cross-collection searching of both bibliographic metadata and full-text OCR output, improved search results delivery and navigation, and enhanced navigation of internal document structure.

Plans for Documentation and Sharing of Content

The interoperability modules developed for each of the separate systems will be documented as part of their maintenance, support, and distribution. At the University of Michigan and at Göttingen, the modules will be supplied with formally supported digital library systems (i.e., DLXS and Agora), and thus will require formal documentation for customers. We will work with customers to determine the adequacy of the documentation, though both the module and the documentation will be made freely available.

The successful digital library of mathematical monographs will be maintained as a production system after the funded project phase. Each participating library considers its collection of digitized math books to be part of its core mission and consequently each institution is committed to maintaining its collection through its library budget. The proposed activity at each institution will be supported by permanently funded, production-level digital library operations, thus enabling us to allocate resources to this end. The three partnering institutions will continue to communicate with each other with the goal of coordinating further efforts.

All digitized material will also be linked from or to different databases, according to their publication year:

Up to 1943: Jahrbuch über die Fortschritte der Mathematik will include links to all online project items.
1933-present: Zentralblatt für Mathematik; reviews will be linked from project resources, and we will request that editors add links to project resources in the database.
1943-present: Mathematical Reviews; reviews will be linked from project resources, and we will request that editors add links to project resources in the database.

This will ensure another important avenue for “discovery” for mathematicians. In addition, at Göttingen State and University Library, materials will be available through two European internet-based projects, EULER and REYNARDUS and by means of MATHNET.
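
The year-based linking rule listed above can be expressed as a small lookup. This is a sketch only: the function name review_sources is hypothetical, and treating the boundary years 1933 and 1943 as inclusive is an assumption, since the ranges as stated overlap.

```python
from __future__ import annotations


def review_sources(publication_year: int) -> list[str]:
    """Return the review databases relevant to a given publication year.

    Boundary years are treated as inclusive (an assumption of this sketch).
    """
    sources: list[str] = []
    if publication_year <= 1943:
        sources.append("Jahrbuch über die Fortschritte der Mathematik")
    if publication_year >= 1933:
        sources.append("Zentralblatt für Mathematik")
    if publication_year >= 1943:
        sources.append("Mathematical Reviews")
    return sources
```

For example, a monograph from 1900 would be linked only with the Jahrbuch, while one from 1935 would be linked with both the Jahrbuch and Zentralblatt.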

Evaluation, Dissemination and Maintenance of the Results

The work of the project will be evaluated to determine the extent to which the digital library systems at the respective institutions can interoperate using the Dienst protocol, and the extent to which that protocol can accommodate the high degree of functionality in those existing systems. Criteria include successful searching of bibliographic data (including appropriate mechanisms to balance precision and recall), reliable browsing of bibliographic data (including multi-lingual sorting), and successful browsing of books containing page images. This must be accomplished through each interface at the three participating institutions, operating with the three separate systems. A further criterion, especially with regard to extending the capabilities of the current Dienst protocol, is successful full-text searching across the three repositories, especially with regard to providing users with information on relevance and location of results within monographic volumes (i.e., to aid in user navigation). Support for both local and distributed printing will also be evaluated.

The partners will share the experience and lessons learned from the project with the profession through papers and conference presentations, and especially with Digital Library Federation partners (e.g., in the Architecture Committee). Further, access to the content of the collection will remain available free of charge to the international public. The modules written to accomplish interoperability will be demonstrated to participants of the Open Archives Initiative and will be made available to any requesting institution; further, we anticipate that eventually the entire Euclid system, developed at Cornell, will be made freely available. Each of the institutions will begin work to incorporate the interoperability layer in other collections.

Management Plan

Management of this project has the added challenge of coordinating the activities of developers at three institutions in two countries. Communication must be encouraged through the informal use of email, regularly scheduled telephone conferences, and occasional site visits by all of the major participants. A development infrastructure must be set up to allow easy sharing of code, data, and documentation. Project management software tools for timeline management and documentation must be agreed upon and used at all three institutions. There must be a coordinated review of milestones for each individual institution at least once a month.

This project will be split into three overlapping phases:

1. Research & Design: Evaluation of the functionality of the Dienst protocol, assessment of the features of Michigan’s DLXS middleware, Cornell’s Euclid system, and Göttingen’s Agora middleware, identification & design of the needed enhancements to the Dienst protocol, design of the addition of the Dienst protocol layer to each middleware.

Milestones

Development Infrastructure in place: 1 month from inception of project
Evaluation of each institution's requirements for Dienst protocol: 2 months from inception of project
Partitioning of development responsibilities: 2 months from inception of project
Initial High Level Design complete: 3 months from inception of project
Design of Full-Text Searching complete: 12 months from inception of project

2. Development: Software development at three institutions including specified milestones for interim testing of interoperability.

Milestones

Detailed Development Timeline: 4 months from inception of project
Preliminary interoperability testing: 9 months from inception of project
Initial development complete: 12 months from inception of project
Additional functionality (Full-Text Searching) complete: 18 months from inception of project

3. Integration Release: Release of fully interoperable system involving three institutions, testing, necessary enhancements, overall system evaluation, and final documentation.

Milestones

Initial public release from three institutions: 13 months from inception of project
Preliminary Evaluation of System: 18 months from inception of project
Public release of Full-Text Searching: 19 months from inception of project
Final Evaluation & Documentation of System: 24 months from inception of project

The Research, Design, and Development work will be partitioned between the three institutions to take advantage of the existing middleware development at each of the digital libraries. Michigan, Cornell, and Göttingen will focus on integrating the Dienst software layer into each of their middleware implementations. Because the Dienst protocol is a project of the CDLRG (Cornell Digital Library Research Group) and not a part of Cornell University Library, we will work to ensure that the addition of functionality to the Dienst software is communicated to CDLRG, and their feedback taken into account. Finally, the benefits of bringing the development team from all three institutions together at one location are invaluable, and it is anticipated that the full development team will meet once at each institution to coincide with each of the three project phases.

Site visit one (three months from inception of project): project manager and project programmer from Göttingen and the University of Michigan will travel to Ithaca, NY.
Site visit two (twelve months from inception of project): project manager and project programmer from Cornell University and the University of Michigan will travel to Göttingen.
Site visit three (eighteen months from inception of project): project manager and project programmer from Cornell University and Göttingen will travel to Ann Arbor, MI.

REFERENCES CITED

[Bonn1, 1999] Bonn, Maria; Wendy P. Lougee, Jeffrey K. MacKie-Mason, and Juan F. Riveros. “A Report on the Peak Experiment: Context and Design.” D-Lib Magazine, June 1999, online at http://www.dlib.org/dlib/june99/06bonn.html

[Bonn2, 1999] Bonn, Maria. “University of Michigan Policies and Practice for the Long Term Retention of Locally Produced Digital Projects and Materials: A Report Prepared for the Joint RLG/TASK Force on Digital Preservation,” online at http://www.umdl.umich.edu/um-rlg.html

[DFAS, 1999] “Supporting Access to Diverse and Distributed Finding Aids: A Final Report to the Digital Library Federation on the Distributed Finding Aid Server Project,” available online at http://www.umdl.umich.edu/dfas/dfas-final.html

[Dienst, 1999] “Dienst, Overview and Introduction,” available online at http://www.cs.cornell.edu/cdlrg/dienst/DienstOverview.htm

[Lossau, 1998] Lossau, Norbert; Klaproth, Frank "Digitization Efforts at the Center for Retrospective Digitization, Göttingen University Library." RLG DigiNews, 3:1 (1998), 7-10.

[Lougee, 1998] Lougee, Wendy P. “The University of Michigan Digital Library Program: A Retrospective on Collaboration within the Academy.” Library Hi-Tech, 16:1 (1998), 51-59.

[Price-Wilkin, 1997] Price-Wilkin, John. “Just-in-time Conversion, Just-in-case Collections: Effectively leveraging rich document formats for the WWW.” D-Lib Magazine, May 1997, online at http://www.dlib.org/dlib/may97/michigan/05pricewilkin.html

[Price-Wilkin, 1999] Price-Wilkin, John. “Moving the Digital Library from ‘Project’ to ‘Production.’” DLW99, Tsukuba, Japan. March 1999.

[TIF2GIF, 1997] TIF2GIF web site at http://kalex.engin.umich.edu/tif2gif/

[Weise, 2000] Weise, John, Alan Pagliere, and Matthew Stoeffler. “Integrating Heterogeneous Databases in a Distributed Environment: The ‘Pictures of Record’ System.” Proposed for publication in mid-2000.

[Guthrie, 1997] Guthrie, Kevin, and Wendy Lougee. “The JSTOR Solution: Accessing and Preserving the Past.” Library Journal 122:2 (1997), 42-44.

© Copyright 2004

The University of Michigan Library
Cornell University Library
the State and University Library Göttingen