We've heard the arguments that mathematics is global, and therefore it makes sense for mathematical information or content to be readily available worldwide; that it's fundamental to many other disciplines, and so should be readily available to practitioners of those; that it lasts much longer than, say, physics or medicine, so mathematical information needs to remain available; that it has a history of rediscovery, and so ready access could avoid this. All these therefore point to the conclusion that mathematics is an excellent test case for the more general problem of digitisation.
But is it?
To address this question, let me ask 'what is the point of digitisation?'.
Is it to (electronically) archive for historical purposes, i.e. to ensure that information remains available indefinitely, but in a form that's more portable than paper? (One has to assume that teams of experts are on standby to switch information to new formats as older ones are superseded.) Should value be added to the information to make it more electronically useful; should that be a second stage in electronically archiving, secondary to the crucial one of establishing a plain store?
Or is the point of digitisation to build a repository of relatively current digital content that users can subscribe to, or buy from, or pay to view, or even access at no cost?
These two are not synonymous; information may perhaps be stored in a similar way, but the business model for each is different. In my superficial survey below, I shall ignore issues of copyright, except to distinguish between libraries, which are a sort of content warehouser, and the organisations that own or supply information, such as learned societies or publishers, which I call (copyright) owners. However, in the real world, this issue cannot be ignored, and unless the work in question is out of copyright, then warehousers need clearance before they can digitise and distribute information, for example over a network. Suffice it to say that clearance can be very problematic, simply because owners may not hold electronic rights in published work.
Back to the first model then. The plain historic archive, i.e. one for which there's no electronic enhancement, might require a large investment on the part of the warehouser or the owner to set up. But, the argument goes, it will be worth it because, for both the warehouser and the owner, it will be easier to use and maintain, safer, and cheaper to store than paper. Content can be printed on demand or locally for ease of use. The investment can pay for itself by reducing overheads. An enhanced archive benefits from all the above and has added value by permitting documents to be online, for example. An historic archive might be a single library. Or it might be something like the Cornell University enterprise, which digitises old mathematics works that are either out of copyright or out of print and for which digital reprint rights can be obtained at no cost, based on the assumption that such works have no value to the owner, or are out of copyright.
But really, as far as the plain historic archive is concerned, there is nothing special about mathematics here. All manner of content could be usefully archived in this way. However, there are peculiarities in mathematics (and other disciplines in which mathematical notation is used such as physics or econometrics) that make optical character recognition (OCR) problematic, and so it might well be that archives would not be fully enhanced without more work and expenditure.
With a plain historic archive, publishers will be able to print on demand, and so keep books in print indefinitely. For example, Cambridge University Press has already electronically archived some 2300 backlist titles, and has brought back into print some 600 titles, including 300 history titles. All our books, in principle, stay in print, under our imprint, indefinitely. Based on our own experience we reckon that annually, millions of dollars' worth of orders are placed for out-of-print books: on-demand printing has helped us to commercially realise the potential of our own out-of-print books. This is evidence that simply because a book is out of print does not mean to say it has no commercially realisable value; moreover there is evidence that this value is better realised by publishers if only because they know about commerce.
The investment in an historic archive has paid for itself partly through reduced overheads, and partly through new sales. Ironically, Cambridge currently only sells 'digitised' books in paper form, because historic content is stored as scanned images, a rather plain electronic archive; there is no OCR involved. There is no a priori reason why we couldn't derive income via pay-per-view or by selling subscriptions to an electronic archive; we'd need a system for capturing content, for checking it, and for selling it, at a price that the market would pay. And the system would need to be as simple as possible, because each element of the archive -- each book -- almost by definition generates only a little income. Our programme may not be technologically fancy, but it actually works, and works commercially too. All manner of Cambridge's content is usefully, if plainly, archived in this way, not just mathematics. An example of a less plain archive, but one that does not house mathematics, is the Electronic Text Center at the University of Virginia Library (see http://etext.lib.virginia.edu). In this electronic library are 1800 digital books, mostly out of copyright, mostly fiction. In the period August 2000 to May 2002, some 6.4 million Microsoft Reader and Palm Pilot ebooks were shipped. In the Virginia model, free access is limited to some degree.
The cost of an historic archive can be reduced by warehousers or owners joining forces to spread the initial investment. Alternatively, the owner can lease rights to a third party who can take over the responsibility for archiving and distributing information. This is already happening with some societies and JSTOR, for example. I should add that publishers are also leasing rights to third parties such as Netlibrary or Ebrary who, to start with, digitised content without adding much functionality, and distributed books to institutions. Again, such activity was by no means limited to mathematics. Currently both Ebrary and Netlibrary require publishers to submit files, and undertake no digitisation themselves. They require mathematics books as PDF files.
In the second model, that is content that is born digital, there will of course be archival benefits in the future but the real aim is to create digital content from the outset in order to exploit it more fully. The technology required to create the content is different from that required to create an historic archive from existing printed material. The investment is needed on the part of the information supplier (perhaps a university, or learned society, or research establishment, or publisher, or author, or some combination of them). For some of these suppliers it might be essential for there to be an income stream in order to pay for the initial investment, or to provide for the continued existence of the supplier. No special investment is needed on the part of the warehouser, though users might need to establish different ways of paying for information. For example, rather than relying on central library budgets or grants to pay for books or subscriptions, users might need to be able to pay online with departmental credit cards.
The digital content can be used to create paper or electronic products, or to create different bundles of information. At Cambridge, documents are XML-coded, and, so the theory goes, can be versioned as required. The theory does not yet extend to mathematics, but of course, all subjects should in principle be covered. Many of those do not involve much mathematics in an essential way, but might have other features that don't normally crop up in mathematical publications, such as half-tones. It might well be that the force driving the second model requires primarily that the system works for such other subjects; that it works for mathematics is of secondary interest, and indeed making it work might not be commercially worth the effort.
In the two models I've described, it's clear that there's nothing special about mathematics. Both models are required for other sorts of information that are just as valid and universal. Would a system that works for one subject be scalable to others? Mathematics might just complicate the issue unnecessarily. And what works for mathematics might well not be complicated enough to scale to other areas, for which there are greater arguments for ready access, such as medicine. In medicine or genetics, the demand for an historic archive is surely far outweighed by the need for born-digital content, and the volume and value of material in those areas mean that commercial effort is not being expended on mathematics.
If mathematics is to have an enhanced historic archive that relates smoothly to the born-digital content currently being created, then this probably has to be centrally planned and funded. Cooperation between the relevant parties is also necessary, in order to guarantee standards and cross-platform usability, to create universal tools and technology, to give guidelines and procedures for establishing ownership of and for clearing copyright, to make it easy to switch between archive types, to transfer ownership of archives, and to create sensible business models for the products. The research and investment required could not be recouped, or even allocated, commercially, and so must be funded governmentally. Once the various archives are set up, one would hope that their running costs can be paid for out of income generated or costs saved.
In sum, mathematics is different: it's small, but universal and long-lasting; it's important to many other disciplines. Its characteristics mean that whereas simple technology can create electronic historical archives and run them commercially, enhanced archives that parallel the born-digital content currently being created are only marginally viable. A market for historic non-mathematical content evidently exists, whether that information is archived in plain or in enhanced electronic form. It's not too much of an extrapolation to say that the same is true for mathematical content.