Digital Mathematics Library

NSF Award Number: DUE-0206640

Digital Mathematics Library Planning Meeting

July 29-30, 2002

National Science Foundation

Meeting co-chairs:
Peter Michor, University of Vienna
David Mumford, Brown University

Meeting Minutes

Monday, July 29
Defining the issues:
[Content]
[Format]
[Copyright / intellectual property]
[Archiving]
Roles of participants:
[Mathematicians]
[Publishers]
[Technical specialists]
[Librarians]
[Monday summaries]
Tuesday, July 30
[Structure: process and organization]
[Next steps / working groups]
[Timetable]

Monday, July 29

Welcome and Opening

John Hunt (Mathematical and Physical Sciences Directorate, NSF) welcomed participants to NSF. He expressed the hope that the DML would make the mathematical sciences more useful to other sciences, noting that as it enables the other sciences, mathematics grows in response to the challenges that these pose.

Philippe Tondeur (MPS Division of Mathematical Sciences, NSF) outlined the goals and anticipated problems of a Digital Math Library, drawing on the "manifesto" he presented at the Joint Mathematics Meeting in San Diego last January (at DML website in San Diego summaries). [View presentation].

Jean Poland (Cornell University Library; Co-PI on DML planning grant) welcomed participants and hosts and thanked Philippe Tondeur for his vision that inspired plans for the DML. She noted that the meeting agenda had been drawn up following points laid out in Tondeur's manifesto and Ewing's white paper. She welcomed meeting co-chairs David Mumford and Peter Michor.

David Mumford recalled his pleasure at first accessing scientific papers electronically. He acknowledged the important work done by small groups, such as the LANL, in this regard, but said that to continue to advance, digital projects must be coordinated internationally. Mumford asserted that, particularly in light of the work of the CEIC, the IMU would play a logical role in this kind of international coordination of efforts. He expressed his excitement at the potential of the DML to fuse past, present, and future scholarship in mathematics.

Peter Michor recalled conversations with Bernd Wegner that led to the development of the EMIS library. He recognized that wider involvement was needed to take the idea of a digital math library further. Michor expressed the hope that the current DML project would generate enough interest from commercial publishers that these would provide material.

Introductions around the table.

Defining the issues

Content

Bernd Wegner (TU Berlin/Zbl) outlined the "First Problems Related to Contents and Access" in building a DML, focusing on the question of the collection's scope and on technical matters relating to access and structure. [View presentation].

On the question of DML's scope, Jonathan Borwein (Simon Fraser U) urged that the group begin by eliminating the notion that the collection could be built without a comprehensive strategy. The work of defining the scope and setting priorities is necessary from the outset. A list is needed.

John Ewing (AMS) proposed a view of collection content in four dimensions (each posing its own problems):

  1. Up / Down (advanced / elementary) -- what about the middle?
  2. Horizontal (geographic) -- what becomes of non-Western mathematics, etc.?
  3. Side-to-side (along disciplinary boundaries) -- relationship of math to physics, economics, etc.
  4. Time (past / present) -- does DML contain current as well as past materials? How old is "past" literature?
It is fine to talk about developing a list, but first we need principles to apply to each of these four dimensions.

Tim Cole (Urbana-Champaign): Like in the library, we need to start with collection development guidelines, then see what is available that fits the guidelines.

Mumford observed that the reviewing journals have already done the work of prioritization. It is a question of whether we want to accept the scope they represent.

Robby Robson (Eduworks): The question should be rephrased to ask what our audience wants. The reviewing journals' audience of research mathematicians is one audience, but not the only possible one.

David Tranah (Cambridge UP) noted that publishers tag materials by keyword and subject content and suggested this could aid selection.

Ewing responded that this would not suffice for reviewing and selecting materials.

Wegner added that the Jahrbuch project had shown that selection is not always cost effective. For instance, when working with a series it is more efficient to digitize the series as a whole, even if it includes work in math's border disciplines.

Michor: It is important to start immediately, so that the project has something to show. We can start with core mathematics materials, and then determine whether adjacent subjects should be added. In terms of deciding which level of materials to include, textbooks might be a dangerous area for the DML, since this would potentially cut into publishers' business model.

Poland: Yes, the DML needs something to show, but defining the collection is essential. We must identify the mathematics literature while we proceed with production.

Robson: With regard to Wegner's first question ("all mathematical publications" or a selection?) -- assuring quality implies a selection process.

Gertraud Griepke (Springer) recommended we start with material for which we have extensive metadata.

Cole: We must leverage what exists -- material that is already digitized and non-digital material for which we have metadata.

Robson: A focus on metadata and quality criteria should define the project, not what we can digitize. Others will digitize. The collection will be an aggregation of collections.

Borwein articulated what he called the "invisible item" on the agenda -- the question of funding and business models. How is this collection to be made available to the user community? (And how do we define this community?)

Michor: Preservation is a "cultural must" and should not be guided solely by business model. University libraries should begin to digitize and a registry should be developed of what is being digitized and what remains to be done. Legacy materials (manuscripts, etc.) should be a priority. Metadata can be at the collection level initially for some of the material and details left for future editorial work.

Poland reminded the group that libraries do not operate without funding.

Cole noted the importance of version control in building and maintaining the collection.

With regard to funding, Tondeur advised the group that support could be expected from NSF, DFG, EEC, etc. if it can be demonstrated that the project is useful for science. He suggested that funding on the scale of $100M was feasible, if the funding agencies see a project that justifies the expenditure. Long-term sustainability is essential to the project and for this reason libraries are the natural repositories. Tondeur laid out three phases:

  1. design phase (relatively easy)
  2. capital project phase (needs a good format to attract a distributed network of funders)
  3. sustainable phase (research libraries)
Borwein noted the possibility of parochial instincts influencing the national funding agencies. How can the national agencies be tied to an international project?

Conceding that he could speak only for the NSF, Tondeur insisted that the international nature of the project is compelling. He offered CERN as a model -- here, taxpayer funds from many countries support an international scientific project.

Marcy Rosenkrantz (Cornell University Library) added that the Cornell and Göttingen libraries are submitting a joint proposal to NSF and DFG for a math archiving project. DFG and NSF's joint funding program for International Digital Libraries Research requires international collaborative activities.

Sigrun Eckelmann (DFG) agreed that her agency has a strong interest in supporting international cooperative projects and global research. She said the DML should demonstrate:

With 3-5 years' experience, the project can expand. She encouraged the group to think in terms of a long-range timescale.

Tranah: Focus on works of exposition or application?

Robson: The question of the business model is not only a strictly financial question. Need to consider the cost-benefit analysis of a distributed digital library. Need to consider incentives (not only financial) for libraries to prioritize long-term sustainability.

Poland: Libraries are developing models for long-term sustainability of digital archives.

Pierre Bérard (NUMDAM): CNRS wants to support French projects, but is also interested in the DML as an international endeavor. Multilateral and bilateral efforts are seen favorably. There is even direct support for other countries in some cases (i.e. developing countries).

Hans Becker (SUB Göttingen): The question of users/audience is essential. For example, historians and educators have different needs. A survey of users is needed to define needs.

Robson: Electronic materials are now becoming important to the educational mission of the university. This can provide another avenue for funding the DML.

Griepke: Developing countries, which have had limited access to print materials, can be better reached via electronic means. They are not represented at this meeting, but they need access and they need funds. The World Health Organization's Health InterNetwork Access to Research Initiative (HINARI) provides developing countries access to leading medical journals through a medical and life sciences portal.

Poland concluded the session, reminding participants that this is an initial meeting in which questions will be defined, rather than solved. Hope to set up working groups that would meet in Beijing, etc. and produce interim reports.

Format

Ulf Rehmann (Bielefeld) presented on the DjVu file format as "A Suitable Format for the DML." [View presentation at http://www.mathematik.uni-bielefeld.de/~rehmann/DML/Planning/]. DjVu is at once a file format, an image compression technique, and a delivery method. It offers much smaller file size than comparable formats, and excellent searching and linking capabilities. It is compatible with other formats and its code is available in the public domain. Rehmann compared file sizes for DjVu and other formats using examples of digitized mathematics. He explained that an application has been set up on a public server (DjVuLibre) that will convert other formats to DjVu and run OCR.

Rehmann recommended that as part of the DML infrastructure, the national groups involved in DML set up public servers for DjVu conversion and OCR, optimized for the national language. Individual mathematicians should be encouraged to link papers on their homepages to DML. Need to establish tools -- like public servers for OCR -- for mathematicians who want to contribute. DML would need to make recommendations for submission to homogenize the presentation. It should be easy for working mathematicians to contribute their own work, or work that interests them. DML should be open -- not only for digitization centers.

Cole: DML should offer the user conversion to various formats, e.g. DjVu to PDF.

Rehmann: Links would be lost in the conversion.

Michor: It is too early to make DjVu the exclusive format for DML -- not everyone has it. PDF should be offered as the de facto standard.

Rehmann: Yes, offer multiple formats, but DjVu as the format of choice.

Ewing: DjVu is designed with the purpose of making scanned images searchable. PDF can be used in this way, but it is not set up for the purpose and is not optimal. Still should be cautious, however, because the superior technology is not always the one that survives as the standard -- example of Betamax and VHS in 1980s.

Robson asked whether DjVu would support a MathML layer. Asked whether DjVu is a suitable format for long-term storage. Does it support structured metadata? Can it support audio (issue of compliance with Americans with Disabilities Act)?

Keith Dennis (Cornell) said he believed the DjVu layers are extensible.

Robson: The client side is attractive, but the storage side is unclear.

Becker: Important to distinguish between archiving and representation formats. Today PDF is adequate for representation, but bad for archiving. Older versions are not compatible.

Ewing: And PDF is not an open standard.

Rosenkrantz: PDF is not a real standard. It is a de facto standard.

Bérard: DjVu is not useful when beginning from TeX. NUMDAM's policy is to keep the TIF files. These are more appropriate for archiving and maybe in the future we can run better OCR. Today (in production mode) the vendor for NUMDAM is unwilling or does not know how to run OCR for DjVu. Thus, NUMDAM's PDF is OCR'd and our DjVu is not. We need to distinguish between experimentation and production.

Borwein: If we want to adopt DjVu, at what stage is it appropriate to approach LizardTech and consider co-development? R and D and production can be intertwined.

Copyright / Intellectual Property

Bérard began by acknowledging the complexity of intellectual property matters. One complicating factor is that various kinds of media are governed by the same laws, which often reflect the interests of powerful players (such as the music industry) that are not immediately involved in scholarly communication. The international Berne Convention protects an author in his or her own country as well as in all signatory countries, according to the national laws of the individual signatories. Thus, a publisher can place an action in the country whose laws are most favorable to the publisher. Europe recognizes an author's moral rights to the intellectual product. For digitization efforts, this means permission must be sought from author as well as publisher. Adding links may constitute an alteration of the original. Must make clear that links are an annotation, the equivalent of footnotes. European law allows countries to make exceptions for not-for-profit, educational purposes. In France, this has not happened due to pressures from the entertainment industry. Need for pressure from the scholarly community.

It is essential that the DML effort work out the intellectual property question. While cases of individuals posting papers to the web against copyright are usually unproblematic, a large-scale effort must be careful to stay above reproach. Necessary to get permissions -- mathematicians are typically happy to grant permission (more complicated to track down authors of less recent material). This can provide leverage with publishers who may not be able to get permissions from authors as readily as the math community can.

Bérard cautioned that the DML group must not underestimate the political aspect of copyright or the work involved in complying with the law. But it is necessary to put in a good faith effort to comply.

Ewing: Also necessary to contact an author's heirs -- 70 years after the author's death. A real difficulty.

Bérard: Important to show good-faith effort. If NUMDAM is still unable to contact author, the work is posted, then removed if there is a complaint.

Ewing: Like NUMDAM, AMS followed Springer policy of asking permission three times, then assuming it was OK to proceed. One can take risks, but must be careful.

Mumford asked for clarification about differences in intellectual property among different countries.

It was pointed out that in the German situation an institute of authors, rather than individual authors (as in France), handles authors' permissions, so it is easier to secure permission.

Robson: In some cases, publishers may allow electronic posting, but not printing. Or posting in low resolution. DML must draw clear lines about what it is and isn't willing to do.

Tranah: In the UK, the publisher has the publication rights. Tranah suggested an article for the Notices on copyright. Ask publishers to make the future transfer of copyright to DML explicit. CUP would be willing to do this -- it is not making money on old material. [Corrigendum 6 Sep 2002 (David Tranah): "I don't believe I said that publishers would transfer copyright to DML, rather that publisher would lease rights to DML."]

Ewing: There is a split between the British sphere (UK, US, Canada, etc.) and rest of the world in this regard. The European notion of "moral rights" makes it impossible for the author to sign away copyright (as DML needs). For instance, JSTOR has assumed that as long as a publisher with which it has an agreement owns the copyright, JSTOR can digitize.

Borwein: Intellectual property issues in moving print to digital are complex. Also complex within electronic format: now being considered whether conversion from LaTeX to MathML requires permission of author.

Robson (?): What if author or publisher grants access (free) to DML, but insists on owning material on own server? If we are willing to allow publishers to keep material on their servers and DML points to the material, this is easier. All we need is the right to transmit. And the right to index.

Tranah / Ewing / Cole: Material must be distributed for archiving purposes.

Borwein: Otherwise, there is danger of the emergence of one large math publisher with exclusive control and the ability to switch off the archive.

Tranah: Authors needn't assign pure patrimonial rights: nonexclusive rights.

Ewing: It is in authors' interest to grant publishers as many rights as are necessary for publishers to steward the scholarship. Publishers should then give back rights to the author -- to post work, etc. When authors grant as much control as possible to publishers, we only have to deal with one body when we wish to digitize and distribute the work.

Bérard: NUMDAM is now asking authors to transfer rights to the society that owns the journal; the society then assigns partial rights to the publisher.

Robson: Much mathematics is generated by research in corporations, etc. Some is protected by patent law. What about copyrighted algorithms, etc.?

Tranah: It is not possible to copyright facts, e.g. theorems…

Bérard: We need to formulate guidelines taking different countries' laws into account. Must convince publishers that access to past literature is good for them. We must persuade the mathematics community to give rights responsibly and to go to publishers that follow guidelines.

Tranah: If we can educate authors about these issues, they will select publishers who support access.

Ewing: This is difficult -- mathematicians are not typically interested in copyright issues.

Tranah: This will change. Analogy to growth of environmental consciousness.

Wegner: Our recommendations need to take into account materials that are being published today.

Wegner / Tranah: We need to create a model rights contract and publicize it widely.

Griepke: With a well-defined project it will be easier to negotiate an agreement with publishers.

Archiving

Hans Becker and Heike Neuroth (SUB Göttingen) presented on Göttingen's experience with digital archiving, particularly with the ERAM project, focusing on metadata, long-term archiving, and access. They offered a set of theses concerning the development of the DML [See handout].

Neuroth raised the question of whether archiving amounts to storage or whether it also implies access and retrieval.

Michor: Need to provide for migration of data.

Neuroth: This is expensive.

Ewing: The question of when formats become obsolete is not obvious.

Mumford: Perhaps a stabilization in the evolution of formats is on the horizon.

Dennis: It is impossible to migrate all the documents in the world. Backward compatibility needs to be assured with each new development.

Michor predicted that the rate of change of formats will slow. Doesn't think TeX will stop being useful.

Tranah: If we charge even a nominal fee this could help pay for updates. This is what CUP does.

Mumford: Doesn't believe TIF will become unreadable. It is like ink on paper: patterns of dots.

Ewing: Willing to guarantee that TIF will become unreadable. Agree with Keith Dennis that backwards compatibility must be assured, but this amounts to a huge buildup over time.

Wegner: Long-term readability must be addressed as part of DML project. Not optimistic about the sustainability of formats, especially considering the unreadability of old PDF files with recent versions of Acrobat Reader.

Becker: Every library owns technologies bought on confidence from reputable companies that have become problematic. E.g. microfilm formats without corresponding reader. E.g. the loss of the digitized Domesday Book in Britain.

Robson: Agreed in part with David Mumford's point about TIF. It is important to distinguish between archiving the physical representation of material and its semantic content. These are different layers that must be cleanly separated.

Cole: We must pick formats with longevity. Best to use multiple formats, so that there is backup.

Griepke: Important to involve as many people as possible in whatever solution is chosen. Even if the solution is not perfect, it will be more powerful with buy-in.

Borwein: Must make DML project part of mathematics culture…

Roles of participants

Mathematicians

Keith Dennis opened the discussion with two questions, posed in lieu of a presentation:
  1. What do mathematicians want/need from the DML project?
  2. What can mathematicians contribute to the project?
Jean Poland added question 3: Who are mathematicians? Who is the DML for?

Dennis: DML is for "mathematical scientists" -- scholars who make use of mathematics.

Ewing: People who teach mathematics at institutions of higher education? Or research mathematicians? There are many more of the former than the latter.

Borwein: What about other consumers of math, e.g. Wall Street? …

Mumford: Graduate textbooks should be "in" (because they are consulted by researchers); college textbooks "out."

Ewing: Note that the Monthly is by far the most used mathematics title at JSTOR.

Dennis: JSTOR is now digitizing Math Magazine and similar. The MathDL project is focusing on the educational community.

Cole: It is easier to secure funding for projects with educational focus.

Tondeur: For NSF there is no contradiction between educational and research materials. As in the case of the relationship to the border disciplines of mathematics, NSF accepts a broad, open definition.

Michor: In terms of what mathematicians can contribute, they can promote the enterprise and lobby editorial boards to give license for digitization and give permission via IMU for scanning and distribution.

Mumford: When will the DML be deliverable to the average mathematician?

Dennis: Some of the material is already there. Timeline estimate: In nine months, complete a proposal. Perhaps one year later funding would be available for a multi-year project. So, two years till funding, plus perhaps two years till a significant amount is available.

Borwein: Yes, some of the material is available now, but there is no common look and feel.

Mumford suggested a white paper on what is available today.

Becker: Maybe in one year search engines / portals can be available to point to the different projects. We now have tools to start combining projects.

Dennis: For most mathematicians, digitized material only exists if it is linked from MR/Zbl (for example, there is little awareness of JSTOR). Either material is "there" or it is not -- most mathematicians are not interested in the details.

Michor: EMIS could serve as the link between projects.

Cole: DOI could improve linking.

Wegner: Being covered by a reviewing database is considered a kind of authorization. This establishes work as standard mathematics literature. What is out is out. MR, Zbl, Jahrbuch, and Russian reviewing service should serve as the navigational mechanism for DML.

Dennis: Maybe European mathematicians know EMIS. But maybe three on the Cornell faculty know it. After three years, CU will become a MathNet member.

Wegner: EMIS is a good service, but not suited for DML. There is not enough manpower. EMIS needs to focus on its core tasks.

Borwein: It is counterproductive to publicize something before it is ready. Necessary to prepare the ground, but have something to show before making public.

Ewing: Can we clarify: Why digitize mathematics?

Tondeur: Mathematics is an enabling discipline. It is pervasive in all of science, a tool for doing science. To make it an active tool, mathematics must be universally accessible. Today, individuals have access to only a piece of the literature (and some have no access). Science, enabled by mathematics, is a project for development. Access to resources empowers people.

Cole: Digitizing math is also important as a model for other disciplines -- it will be useful for future digital library activities. Math is a small enough discipline that the project is doable. Math is enabling, but it is also prototypical for other disciplines.

Tondeur: The project can overcome the isolation of scientists and help form networks. Why mathematics? Because we believe we can do it. Because we are already doing it. Because we use older materials.

Ewing: We must address ourselves to non-mathematicians: What is the tangible benefit of spending public money on this project?

Tondeur: Public money is spent on such things as the worldwide network of telescopes. DML is modest in comparison. It is consistent with the scale of scientific infrastructure paid for by public funds.

Borwein: Is there a model for something like this being funded by a grant to an institution?

Tondeur: NSF likes to fund projects rather than institutions. But CERN is an example.

Borwein: The DML marks an epochal event in the capture and transfer of one of the world's great literatures.

Publishers

Gertraud Griepke presented on the stakes for publishers in participating in DML. [View presentation]

She emphasized differences among publishers in terms of their aims, scope, size, etc. Some publishers seek to control channels of distribution, others to develop within a community. Need to determine what a publisher's agenda is, find what the publisher has in common with the objectives of DML, and identify possible points for collaboration.

Ewing: The key is to develop the project such that each party (including publishers) has something to gain by participating. So far, there is not much to entice publishers. It will not work to shame them into cooperation. Need to find ways to make cooperation beneficial. In exchange for open access to material older than 5 years, it may be necessary to offer to digitize the material and hand it back to the publisher to integrate the back file with the most recent subscription material. This adds value to the current subscription. But we would need to be sure of proper archiving and means of access should the publisher go out of business. The question of where the material resides is not as important as its accessibility. If open access is guaranteed, it does not hurt DML to have materials integrated into publisher's site -- and it provides a real incentive. This strategy might not work with an Elsevier.

Tranah: It might even work with Elsevier: publishers like traffic. What a publisher does is to help catalyze the production of information, help measure its quality, and promote its dissemination. Publishers add value to information and it is fair that they should benefit from their work. For publishers, it is important that they not give away that which has realizable value. What are publishers willing to give away? Content without realizable value and content for which they receive quid pro quo. There are two types of digital content: scanned images of slow-moving (out of print) material and material that is born digital. Born-digital material typeset in a fully-tagged manner can be used to produce print or electronic versions. Born-digital materials in math have no apparent opportunities for versioning because MathML is not yet available. In connection with DML, CUP would like: a list of tasks, a finished or at least stabilized MathML, research and development results (CUP not big enough to fund R&D in math).

James Crowley (SIAM): SIAM has put journals online, but has not digitized back literature. SIAM did discuss with JSTOR, but the cost would have been several hundred thousand dollars. Cost of maintenance seems prohibitive and SIAM does not know how to prepare for problem of format obsolescence. SIAM is interested in participating in DML -- but in a way in which costs are not overwhelming and in which SIAM can maintain connection to the literature.

Cole: Other societies also lack adequate prospects for research and development. How do we guarantee that publishers do not pull out if they maintain control of materials?

Ewing: DML needs irrevocable contract, which should not be a big problem. More difficult in Europe.

Wegner: The contract between author and publisher says that material will be given visibility. DML satisfies this agreement. This should be remembered as a principle.

Borwein: Open access for materials that predate 5-year window.

Ewing / Crowley: 5-year window from time of original publication.

Ewing: Books have a longer life. Would not agree to 5-year window for books. AMS is making books openly accessible online with a button for purchase of the print version. The increased visibility offsets the give-away of content. AMS might be willing to put many books online if DML would do the digitization (which costs AMS $1000 per book). Marketing tool. AMS has thousands of downloads for every ten books sold and it is still worth it. The point is, we can convince even book publishers that DML is worth their participation.

Griepke: Issue of how to measure the meaning of a download. We must proceed from the assumption that every download is "serious" and take the figures seriously into account.

Michor (summarizing Ewing, et al.): How can publishers profit from DML?

Here is another model: Ewing: Publishers are not interested in the whole library, only their own material. Model proposed by Ewing, et al. is not to suggest that materials reside only on publisher servers. Rather, the point is that ownership of the materials should be clear. With a distributed collection, "strange things" can happen to files, metadata, etc. In some ways, distribution can make files less secure.

Robson: Looking at what publishers do, owning content seems less important than services:

Ewing: Owning content is not unimportant. Holding and keeping 2500 titles in print is an asset.

Tranah: Journals are different -- a publisher may publish them on behalf of another entity.

Rosenkrantz: If the digitizer gives content back to the publisher, how is the cost of digitizing recovered?

Cole: The digitizer gets free access after five years.

Rosenkrantz: But so does everyone else. With Project Euclid, CUL digitizes for publishers who cannot afford to do it themselves. Some publishers pay CUL, some ask for pay-per-view option…

Cole: Euclid hasn't considered what happens five years from now.

Technical Specialists

Robby Robson opened the discussion with a PowerPoint presentation [View Presentation]. Robson discussed hardware, software, and finances as technical issues. He cautioned participants not to "get involved in religions" with regard to hardware. He considered three aspects of software: "content lifecycle tools," architecture, and standards. Discussion of finances included the question of the economic worth of spin-offs from research and development conducted as part of the project (the potential for the equivalent of the space program's "Teflon and Tang," as one participant put it).

Neuroth: Possible for project to develop harvesting tools for non-English languages? Do we have the experience to indicate whether a distributed or centralized model is better?

Robson gave his own opinion: distributed system is more manageable, stable, and reliable (modular). But maybe problematic from the standpoint of ownership.

Robson: Project should put out RFI: "these are our requirements." Developing metadata is a priority:

  1. Standardizing on metadata schema
  2. Building taxonomies (such as have been developed for educational mathematics)

Each of the dimensions Ewing proposed has a metadata component.

Cole: A distributed or centralized system? We must live with a hybrid. The developers of Dienst (a distributed system) are now working on OAI harvesting (centralized). Perhaps metadata needs to be stored centrally.

Borwein: A proposal: We are aiming to capture refereed book and journal materials reviewed in MR/Zbl (from 1940). After two years, scope can be expanded.

Ewing: Should be cautious about getting too involved in research and development. We should keep our eyes on the objective to digitize and use what is already available whenever possible. Important to set standards, yes, but not develop technologies.

Borwein: Because math is a niche, some things will not be done without the DML's initiative. Will we not pay to have anything developed?

Robson: If we develop a standard and enunciate requirements then the commercial world may respond. We should do minimal research and development to show what we want.

Wegner: Setting standards is already too much. How can we hook on to developments in related areas? Euler. EMANI. We need a working group to propose a strategy and schedule, not lose time in this meeting.

OCR discussion:

Becker: FineReader (now version 6.0) handles OCR in 177 languages. Especially good for Cyrillic characters. A European project, METAe (http://meta-e.uibk.ac.at//), is working on OCR for old German Fraktur script. METAe will be in the public domain.

Librarians

Jean Poland: Libraries store, preserve, and provide access to materials, whether books, journals, or other formats. Library collections change as formats change. Libraries have long experience with collection development. The move toward the digital library poses a number of challenges. Libraries want to make growing digital collections available to users, but often have problems tracking which materials they are providing via subscription. Libraries are grappling with the issue of sustainability of digital materials.

CUL is now the custodian of the physics ArXiv. Other activities include providing a live mirror for the American Physical Society's Physical Review Online Archive (PROLA) and now also for Numerical Recipes. CUL mirrors PROLA in exchange for access -- a new business model. CUL is thinking about digital repositories: subject-based and institutional (and subject-based matrices of institutional repositories).

Cole: An attractive aspect of the DML project is its wide scope. Library digital projects have tended to be about rare materials -- but the DML is material for daily use. We need a well-specified linking standard.

Ewing: The reviewing services are committed to providing links.

Cole: The library involvement in DML addresses the archival question -- mathematicians have not been comfortable relying on publishers to archive materials, and libraries play a logical role here.

Steve Rockey (CUL): DML cannot be too idiosyncratic to mathematics. Math is a small piece of the library budget and an idiosyncratic project will be difficult to sustain.

Becker: Need to link DML to national cultural heritage preservation movements.

Monday summaries

Mumford: For Tuesday, what is the best order in which to proceed?

Michor: Poland: Rosenkrantz and Neuroth: A working group on archiving issues is necessary.

Borwein: Also, need to deal with content boundaries up front.

Dennis: Math is a small player with little clout.

Cole: On the contrary, math as a discipline provides the perfect scale of project to test OAIS model. But important to keep in mind scalability to other disciplines.

Tranah: But are math issues scalable to other disciplines?

Rosenkrantz: We should not worry about scalability -- project provides prototypicality.

Ewing: We cannot move forward without agreement on metadata standards.

Tranah: If math is supposed to enable other fields, we need to run decisions past engineers, etc.

Griepke: This is no longer the beginning of the digital era. All participants are building on past experience with handling digital materials. Springer is building on more digital publishing knowledge than has been revealed today, since this is not the immediate topic of today's meeting.

Rosenkrantz: The CUL-Göttingen NSF/DFG project is seen as a prototype. Also a funding concern: the prototype function disallows finger-pointing like "this is math, so mathematicians should pay for development."

Borwein: Yes, but as a mathematician, I am loyal to the discipline -- not willing to say what is good for engineers is good for mathematicians.

Robson: Project should function as a reference model, rather than a prototype. A working model used by the community that can be scaled, adapted by other communities.

Wegner: We can provide links, but access depends on agreements with the providers.

Dennis: The project must support archiving development, but it cannot solve digital archiving issues -- this is ultimately the responsibility of librarians.

Tuesday, July 30

Structure: process and organization

Day 2 opened with a conference call with Phillip Griffiths (Institute for Advanced Study). Griffiths, current secretary of the IMU, discussed the IMU and its possible role in the DML project. The IMU is happy to provide any aid or to act as an umbrella for the DML structure, if desired. The IMU is a branch of the UN's UNESCO and functions as a not-for-profit entity. Since the secretary is now in the US, the IAS facility is used for IMU administrative functions. The IMU is tax-exempt. The DML would need to be a separate entity. Griffiths recommends 501(c)(3) public charity status; the criterion for 501(c)(3) status is support from a broad spectrum (here, the international math community). Role of the IMU: it would appoint a board that would in essence report to the IMU but would not be a part of the IMU.

Mumford: Can public charity receive licenses?

Griffiths: Aware of a precedent.

Tranah: Is there any possibility that IMU could own DML or the digital rights of books/journals now out of copyright (which would come back into copyright once digitized)?

Borwein: Please clarify -- none of these suggestions is contingent on IMU secretariat remaining in US?

Griffiths: No, not contingent on this.

Ewing: If DML is an entity that owns rights, etc., it would need to exist in perpetuity -- is that what we want?

Griffiths: Better if DML is separate from IMU -- or a watertight department if a part (to avoid potential debt issues).

Borwein: IMU will support in a moral and organizational, but not in a financial fashion, correct?

Griffiths: Correct.

Ewing: The IMU can facilitate in a moral fashion. But the DML must organize itself and come to IMU with a proposal.

Griffiths signs off.

Dennis: Is it necessary to have a central organization? Or cooperation among local projects?

Borwein: If DML is only an aggregate of local projects, no one will be looking after integration (coordination of finances, etc.).

Ewing: We may be trapped by our own language: "Digital Mathematics Library." We want to digitize and link math literature, not build a brick-and-mortar library, perhaps not even an electronic-and-mortar one that owns rights in perpetuity, etc.

Mumford presented a chart:

-------------------
What is DML?

Coordinates:

Holds/manages rights

Board
Staff
Consortium of members
-------------------

Robson: Can W3C serve as a model?

Ewing: A loose structure…

Robson: It feeds other organizations, e.g. ISO.

Wood: But it is a bureaucracy with offices in Italy, Japan…

Eckelmann: A set of tasks -- coordinating the ongoing discussion about standards, metadata, etc. -- may be best for a board. Rights are a separate question.

Tranah: In most cases, rights would remain with the publisher. Only in certain cases would rights need to belong to DML.

Crowley: JSTOR as cautionary example. If we create a bureaucracy, it will cost more to perpetuate.

Mumford: Yes. IMU is positive example: no staff, no budget.

Ewing: We are approaching the issue backwards -- discussing structure before making decisions about what we want the DML to do.

Cole: Need to decide what we want to have after five years. Probably we do not want the organization to exist. For that reason, the organization should not own rights.

Robson: For DML to exist in perpetuity, someone has to manage technical support and records. Could this be passed on to a publisher? Need to coordinate funding. There are no other strong needs as long as DML holds the metadata, and not the materials.

Wegner: Agree with Ewing. Also in the case of EMANI, the structure is being kept loose, open, and focused on the task of building the collection. Perhaps a more formal organization will be created later.

Mumford: Seems to be consensus for light-weight organization / consortium.

Rosenkrantz: We can have a consortium without formal structure: an agreement to agree.

Borwein: Need for working groups.

Ewing: Need to cut down project -- not cut goal, but divide off phases. Propose phase 1: journal literature. Now ask: what do we need to begin phase 1?

Need for realism.

Griepke: Should concentrate on a distributed architecture, because no single entity can oversee so many different sources. Concentrate on finding, not holding. Must ask publishers what they can give away, not what they are willing to give: they are bound by contracts.

Tranah: Publishers will not give rights away, but will grant permissions. In some cases, however, DML will need to be able to own rights.

Wegner: Must respect the fact that some (national) bodies will want to handle their own materials.

Robson: Agree with Ewing: we need to start, not endlessly discuss. We need a light organization and working groups.

Borwein: What about the relationship between countries in a position to fund and those where the literature sits?

Mumford: Russian example.

Tondeur: This is a good reason for a distributed funding model. EC is more likely to fund Russia, for instance. NSF is not likely to play a big role there.

Wegner: There is already a project involving the Russian Academy of Sciences, Russian libraries, and German partners.

Bérard: The first issue is funding. Funding requires people and a project. Need to identify groups and regions that can fund. Where is interest? Good to make a list of priorities for digitization, but must also allow local groups to set priorities. CNRS will put money on the table. Has money to fund developing countries, Eastern Europe. We need quality standards, quality control -- not central control, but guidelines. Individual groups also want freedom to develop their own software.

Borwein: Difference between presentation and production software.

Cole: Back to question of target audience…

Dennis: Hadn't group concluded that the project is focused on "the sort of material that would be listed in MR and Zentralblatt (and Jahrbuch)"?

Robson: Yes, as long as we can agree what this criterion would mean for 13th-century material, for instance. Can we start with Antiquity?

Dennis: This could be a local project for Greece.

Ewing: We would need to convince the funding agencies that this would make research more efficient. Digitizing 13th-century algebra is a mathematician's project -- not convincing to a congressman. Best to start with focus on journals.

Borwein: The bulk of what we want is from the relatively recent past: a large amount of literature used by a large part of the math community.

Robson: Backwards/forwards can be decided later -- the core is fairly clear. But we should put away 5% for non-core projects -- e.g. non-Western or historical material.

Dennis: Yes, test cases.

Robson: Note: Fostering research mathematics is not the same as fostering research by mathematicians.

Mumford drew up a chart:

-------------------
Stage 1:
Journals in MR, Zentralblatt, Jahrbuch and local choices
-------------------

Cole and Mumford: The central organization does legal work, helps local groups in negotiating rights.

Mumford: Need to announce to entire world mathematics community -- show digital offerings so far, explain rights situation, ask for mathematicians to give non-exclusive rights to DML. This is a "vote" for DML.

Borwein: DML would need to exist first.

Griepke: Such action would lose the participation of publishers. It would seem that DML was first asking for publishers' cooperation, and then chasing authors.

Ewing: Publishers understand copyright as providing economic incentive. DML must use gentle persuasion with publishers, not a club.

Borwein: Would come across as a fait accompli. Pulling rug out from under publishers.

Griepke: It gives the impression of a hidden agenda.

Bérard: Would not necessarily view Mumford's proposal as a threat to publishers. For digitization, individual authors' permission is required -- also for publishers who own the rights to the print. Letters to authors could be a service to publishers.

Cole: Also need to be cautious vis-à-vis funding agencies, who may be wary of a big (US-based) organization exerting pressure.

Tranah: It is a mistake to publicize the DML and ask for permissions in a single letter. In any case, this would have no legal status without reference to specific works.

Ewing: Need legal entity that can sign contracts. Publisher hands back files to a body to digitize them with the agreement that they will be returned. Publisher needs to sign agreement -- but with whom?

Cole: Propose working group: project management.

Becker: Libraries constitute the archiving institution. What does the central organization (IMU) have to do with libraries?

Neuroth: Dublin Core presents a model: decentralized working groups.

Ewing: In terms of standards, no need for centralization. But we are doing more: exchange of intellectual property -- rights, contracts.

Cole: Propose working group to address question.

Ewing: Pass on that most participants want whatever decentralized structure emerges to be light.

Robson: Presentation to outside may be different from the light structure that we all seem to agree on for internal purposes.

Next Steps / Working Groups

Working groups were established and names of members were proposed. [DML Working Groups]

Warchall: Need to concentrate on what the group wants to create. In order to request funding from NSF, it will be necessary to write a proposal.

Dennis: When a structure is laid out, this can be submitted to NSF and other agencies. Specific proposals will be submitted for specific projects as part of DML.

Need to multitask. Proposals will move forward in parallel.

Dennis: A short-term goal: within one year, a document to submit to NSF. The working groups write appendices to the document.

Rosenkrantz: Can the group write or contribute to NSF's RFP?

Warchall: No -- there is no plan to write another RFP. The NSF only funds grant proposals.

Poland: We have a planning grant to put together a proposal to the NSF.

Cole: The Institute of Museum and Library Services' "Framework of Guidance for Building Good Digital Collections" might be a useful model for DML's recommendations (http://www.imls.gov/pubs/forumframework.htm).

Warchall: The main task is to get the project off the ground. The NSF proposal flows from this.

Bérard: Need document that can be shown to various funding agencies. Now, we need money to make the immediate work possible. Willing to ask CNRS. Something to demonstrate DML's international scope.

Becker: At the end of nine months, need to show international structure to national funding agencies. Must show the interoperability of the parts of DML.

Robson: At the end of nine months, should have:

This requires time and meetings for detailed proposals. Planning to plan is not enough.

Borwein: The pieces that are mature need to be developed right away, to take advantage of funding opportunities as they arise.

Tranah: Does the NSF grant funds to "DML" or does DML formulate guidelines that determine which projects are eligible for funding?

Poland: Current funding is for the present meeting and for a second planning meeting. Can other international agencies (or corporations) help fund the planning process?

Cole: Can a planning document be prepared by the end of 2002?

Eckelmann: May not make sense to ask for funding for this planning stage.

Bérard: Would like to ask CNRS for funding now.

Michor: The most important thing is to produce the document for NSF. Needs to address:

Rosenkrantz: CUL only digitizes when it receives funding. There is no separate budget for this.

Robson: We are not reinventing standards; we are gathering them. But this is an expensive process.

Ewing: Agree. Each of these topics is complicated. Must prioritize or no progress can be made.

Cole: This prioritization can be accomplished with what we have.

Robson: Then there is no need for six committees with ten members each to make a priority list.

Wegner: Need to clarify tasks and obligations for the groups.

Dennis: At the end of the process, we need a planning document. Can we have a draft by the end of 2002?

Robson: Or by ICM in Beijing?

Dennis and Poland: This is not realistic.

Poland: Need to identify tasks and timelines for the working groups.

Possible sites for face-to-face meetings of working groups:

Next meeting of large group: Baltimore (mid-January)?

Toward a draft DML vision statement:
"To make digitally available for online [consumption] all past math [literature|scholarship|heritage] in the form of a distributed and persistent library"
[Add: affordable, sustainable, universal (addressing disability), fidelity. To move math research forward. Math as enabling discipline.]

Timetable
DC meeting                          29-30 July 2002
Set date for spring 2003 meeting    15 September 2002
Vision statement available          15 October 2002
Request for supplementary funds     15 October 2002
Interim working group reports       15 October 2002
Final working group reports         1 February 2003
Next meeting                        March or May 2003
Proposal submissions                anytime

Warchall: NSF's fiscal year starts in October. Money is spent by May. Proposals (unsolicited) should come in fall or winter.

Tondeur: Each proposal to a national funding agency funds a specific piece of DML; each proposal is likely to be more successful as part of the whole. Each piece of the project should reference the others in the proposal. Important not to drop the project on the agency -- rather, prepare in discussion with the program officer.

What status does the group need to have to apply effectively as a group?

Robson: What if we want to move forward with something earlier than the timeline?

Borwein: Who authorizes an individual to go back to a national funding agency to ask for funding on behalf of the DML? If I want to apply for funds for a digitization project, how do I establish belonging to the DML?

Poland: Letters of support will be available from the steering committee. When the interim reports are ready, this establishes some structure.

Robson: The steering committee of what?

Mumford and Poland: For the time being, the steering committee of the current planning grant.

The group agreed that working group memberships and descriptions of working group tasks would go out as soon as possible. E-mail discussion lists will be established.

Jean Poland closed the meeting, noting the energy and enthusiasm for the project that had been apparent throughout the sessions.


DML website maintained by Kizer Walker, Cornell University Library (kw33@cornell.edu)
Last updated: 12 December 2002