Cornell University Library Staff Web Page
Cornell University Library
Prepared for
Sarah Thomas, Carl A. Kroch Librarian
by
Martha J. Crowe
May 26, 1998
CONTENTS
Benefits of Electronic Dissertations
Implications for the Cornell Library
The purpose of this report is to explore the issues surrounding the electronic publication of dissertations (ETDs) at Cornell and to examine the feasibility of submission, storage, and distribution of ETDs at the initiative of the University Library. It will sample the progress made at other universities in this direction, describe the Virginia Tech initiative, and detail the service now offered by UMI.
Conversations with representatives of other Cornell units whose support and cooperation would be necessary for a successful project helped round out the view of the factors that would need to be considered (see appendix [1] for a list of contacts' names and their units). A consideration of both the benefits and concerns that derive from offering ETDs is followed by a discussion of their implications for the library. In many cases the issues that one would assume to be relatively straightforward prove to be complex, and some that are just surfacing have not yet been dealt with.
In 1992 the Coalition for Networked Information (CNI), Virginia Tech, the Council of Graduate Schools, and UMI announced the project "The Capture and Storage of Electronic Theses and Dissertations" and issued a call for participation. CNI agreed to sponsor Cornell, the University of Michigan, and Penn State's joint proposal (called DAISY) and Virginia Tech's SGML project. Since sponsorship did not include any financial support, DAISY had to make its own way. Funding to further Tech's project to develop and disseminate a standard method using SGML to make dissertations available online came through a grant from the Southeastern Universities Research Association (SURA) for 1996/1997. In January 1996 Tech also received a three-year federal grant from the Department of Education to create the Networked Digital Library of Theses and Dissertations (NDLTD). Thus Tech is actively recruiting other institutions, internationally, to join in the initiative.
At Cornell five dissertations from the College of Engineering were initially chosen to be scanned and mounted on a gopher server. The Department of Mathematics was also approached, but the chairman was unwilling to commit to the project without some clear policy statements from the library about the role of the Digital Library. From May 1993 through May 1995 CIT scanned 114 engineering dissertations in TIFF format, which are currently stored on library servers but not viewable by the public.
ETDs were a seminal idea at the time, and at Cornell they were an idea whose time had not yet come; the prevailing environment failed to provide sufficient impetus to continue the project. Even though the initiators of the pilot project at Cornell were themselves enthusiastic, and they had secured letters of support from three departments in the College of Engineering (Computer Science, Civil Engineering, and Electrical Engineering), broad institutional interest was lacking.
Documentation of the project's goal, the chronology of steps taken from its inception, and the exacting procedures devised show in hindsight that the participants were on track. In sum, those working directly on DAISY were innovative and committed, but the project was not as well supported as it would have been if Cornell had fielded a full team.
The publication of electronic theses and dissertations is for everyone still a work in progress; even Virginia Tech considers itself not yet in production mode. The consensus is, however, that the ongoing change in graduate education and scholarly publishing will inevitably take the dissertation genre with it. The level of current involvement of other institutions ranges from "mild interest" through a multitude of pilot or "would-be" pilot efforts to the full commitment of requiring that all dissertations be submitted in electronic format. Of eight peer institutions that were asked to inform us of their current stance, six responded, reflecting a diverse sample.
Berkeley discussed the possibility of ETDs a while back and agreed that it was worth pursuing. Cornell's inquiry has prompted the Digital Library R&D Committee to consider whether Berkeley should begin to move on the issue now. The Stanford Graduate School has determined that it would prefer not to lead in this area. Since the feeling is that the idea would require a lot of championing there, the library believes that UMI's service is its current best choice. There is mild interest at Columbia but too many competing priorities and no "champion" there now. That could change at any time, however, and it would be interested in Cornell's assessment. The library at Illinois has proposed to the Graduate School that it join with the NDLTD and has asked for approval for an experimental program. MIT has recently appointed a project leader and hopes to have a pilot project within a year. It will probably not follow the NDLTD model but will have a number of similarities. Some aspects might be modeled on the National Computer Science Technical Reports Library (NCSTRL) infrastructure. MIT would be happy to discuss ETD issues further with Cornell. The University of Michigan has partnered with the NDLTD and is currently helping test one of the proposed standardized coding models. It will mount the test dissertations to demonstrate the advantages and disadvantages to the Graduate School. Penn and Penn State did not respond.
On the Canadian scene the University of Waterloo already has a small functioning ETD pilot site of Acrobat files and was invited to collaborate with the University of Toronto and York University. Before embarking on its project, Waterloo conducted a survey of universities to which it received twenty-nine responses. All but five answered yes to the question: "If not involved now, is your institution considering electronic submission, storage, distribution of theses and dissertations?" When one extrapolates from this sampling, ETD publishing among the larger research libraries appears to be at the stage of a "hot topic" or nascent pilot project.
As an individual institutional model, the Virginia Tech initiative (624 titles
available online, 137 withheld from access, mandated electronic format for
all submissions) is unequivocally the most advanced and aggressive.
As the foundation of the NDLTD, the Tech initiative has taken its impetus from that national project. At a funding level of $210,000 from the Department of Education and over $1,000,000 from corporations that include Adobe, IBM, and Microsoft, it has unparalleled financial support. The federal grant, by officially creating the NDLTD, also assured Tech's position as the anointed national effort, which is now guided by an international steering committee since its scope has broadened. The NDLTD's focus on developing and testing models through collaborative efforts to arrive at standards for document formats and interoperability makes it by nature a pace-setting undertaking.
Improvement of Graduate Education
Tech sees as its primary aim the improvement of graduate education by making students information literate, so that, as the scholars of the future, they understand the issues and technologies relating to electronic publishing and digital libraries. According to Ed Fox, the director of the project, they learn this firsthand by publishing their initial academic credentials in electronic format. To this end, Tech has packaged an extensive, integrated Web-based program that leads students through the steps of creating a satisfactory digital document and submitting it electronically to the graduate school. It had also established a New Media Center quite independently of the ETD initiative, which is now codirected by a librarian, with staff, software, and equipment that can help students with every stage of ETD preparation and submission.
Standards and Interoperability
Developing standards in collaboration with its member libraries is the underpinning of the creation of the NDLTD. For a long-lived archive, standards should be easily supportable and nonproprietary in nature, so there will be less difficulty in refreshing the archive when current standards and technologies become obsolete. At present the NDLTD accepts dissertations prepared in PDF format (Acrobat, a published standard developed by Adobe as a successor to PostScript) and in a version of SGML. Multimedia should be in other applicable standards such as JPEG for images and MPEG for video. Currently most students use PDF, but ultimately both types will be collected (see appendix [2] for a discussion of progress in establishing standard representations).
The NDLTD makes use of technology that is modular and distributed, allowing
for federated searches, that is, parallel queries across multiple search
sites. Adaptation of Cornell's Dienst server to access ETDs would mean that
end-users would have a single view of the distributed sets of ETDs and would
be able to search the full text of metadata (including the abstract) with
one query to all sites in parallel. Members must demonstrate interoperability
with Tech's ETD server via Z39.50 or, optionally, use Cornell's Dienst software,
as with NCSTRL. Tech currently has two options for searching ETDs, and it
is looking into other technology that may be easier to maintain than
Dienst.
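The idea of a federated search, one query sent to all sites in parallel with the results merged into a single view, can be sketched in a few lines of Python. This is only an illustration of the concept: the site names, sample records, and function names are hypothetical, and the sketch does not use Dienst or Z39.50.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for the metadata indexes of two member sites.
SITES = {
    "site_a": [{"title": "Digital libraries and SGML", "abstract": "markup for theses"}],
    "site_b": [{"title": "Soil mechanics", "abstract": "a digital model of clay"}],
}


def search_site(site, term):
    """Return (site, record) pairs whose title or abstract contains the term."""
    term = term.lower()
    return [
        (site, rec)
        for rec in SITES[site]
        if term in rec["title"].lower() or term in rec["abstract"].lower()
    ]


def federated_search(term):
    """Send the same query to every site in parallel and merge the results."""
    with ThreadPoolExecutor() as pool:
        # list() forces all site queries to finish before the pool closes.
        result_lists = list(pool.map(lambda s: search_site(s, term), sorted(SITES)))
    return [hit for hits in result_lists for hit in hits]
```

Calling `federated_search("digital")` queries both hypothetical sites and returns the matching records from each, so the end-user sees one merged result list.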
Membership in the NDLTD
Currently there are twenty-two member libraries, including four foreign institutions. Membership in the initiative entails
A letter of commitment
Collaboration with other members in establishing standards and sharing of information to ensure interoperability
Sharing all ETD MARC bibliographic records
Member libraries are responsible for serving and maintaining their own dissertation files, as well as for implementing their preferred method of archiving. Although the NDLTD welcomes participation in, for example, developing templates and testing software, there is no obligation to do so. A sample letter of commitment is available on the Web, but a library need not follow that format. Basically it asks for a statement that the institution intends to establish a pilot program for serving ETDs with the ultimate goal of requiring electronic submission at some time in the future. There are many options; choices depend on local politics, practice, economics, and other individual institutional needs.
Parallel to the Web site for students, a Web site detailing all the aspects of the NDLTD aids members step-by-step to develop an ETD initiative on their own campuses. On Tech's Scholarly Communications Project Web site information is available on hardware and software requirements, including instructions on how to implement them. Tech's training materials may be freely adopted, as well as the submission software by which students upload their documents to the graduate school. The sites even include public relations materials such as fliers to faculty and students, and a slide presentation for prospective members. Ed Fox, Gail McMillan (University Libraries), and John Eaton (dean of the Graduate School) are ready to provide copies of videotapes, papers, and a CD-ROM copy of the Tech Web site. They will also visit, or conduct phone discussions or a videoconference.
The NDLTD presents a high-level vision of an international digital library for theses and dissertations. From the standpoint of the academic community, if a research library has as its intention the free dissemination of the scholarly product without an intermediary, which also provides the academy with more control over its own intellectual product, then this is the definitive initiative.
UMI became the initiator of the movement toward digital dissertations by
convening a meeting in 1987 to discuss the concept. It has a representative
on the NDLTD's steering and technology committees and has cooperated with
Tech since the start of the project, continuing to be active with the NDLTD
while developing its own service. In addition, UMI is committed to using
and promoting the applications and standards developed by the NDLTD.
Beginning with 1997 dissertations, UMI's ProQuest Digital Dissertations program will assure digital formatting for all submissions, either by accepting dissertations in electronic format or by scanning and digitizing paper or microfilm submissions. Submissions in electronic format will be printed out and microfilmed, and the digital format will be entered into the digital archive for distribution online. Documents on CD-ROM will be distributed only on CDs. Compound documents, consisting of both text on paper and an electronic format, will be processed in a manner similar to paper documents but will be distributed as hardbound copies only. A library binding is considered the only secure way to accommodate printed text with a CD or floppy disk attached to the binding.
Institutions that do not subscribe to the UMI service pay $19.50 for a copy of the digital format of a dissertation (with a discount for additional titles). Those libraries that do subscribe receive the following:
Web access to Dissertation Abstracts
Preview of the first twenty-four pages of all digital dissertations
Full-text copies. Access to all ETDs from the U.S. and Canada from 1997 onward is included in the subscription price. For downloading a dissertation, UMI opens up an FTP line for the requested title for forty-eight hours (with two forty-eight-hour extensions if necessary).
Online access to MARC bibliographic records for the parent library
Current awareness service, "Current Research @," which gives users access to citations, abstracts, and previews for the most-recent twelve months of dissertations from participating schools. Each institution receives a unique URL for its postings.
Consistent indexing and quality standards
One easy-ordering interface
The quoted subscription price for full text for Cornell is $34,590 for the first year, because the file is not yet complete (UMI is adding about 55,000 items per year), and $47,090 for the second year, based on 500 or more dissertations submitted per year and fifteen simultaneous users. A thirty-day free trial is available. A detailed description of UMI's digital dissertation services is available through its pilot site at <http://wwwinfo.umi.com/solutions/>.
UMI may well be the choice of smaller institutions that cannot easily implement their own publishing mechanisms. Indeed, even large institutions that see their current goal strictly as providing access to the scholarly record might find UMI's vendor model preferable. Since UMI has only recently begun its digital service, its potential success in the marketplace is currently an imponderable.
Access twenty-four hours a day, seven days a week; text delivery is immediate (the average turnaround time for a shrink-wrapped paper copy from UMI is three to four days).
The dissertation is ready for the public as soon as it is mounted on the server; paper copies are not available until three to four months after conferral.
Files are easily searchable and indexed.
Authors can exercise more creativity, such as the use of interactive elements, hypertext, raw data files, virtual reality, and multimedia.
Students who create their own digital dissertations learn the basic skills of scholarly publishing in electronic format.
Storage space is saved in the library; no staff are needed for circulation and reshelving.
Library processing can be simplified through use of metadata mapped to MARC (author and title information, etc. is coded to correspond to fields in the bibliographic records), and records enhanced by the addition of the abstracts.
Cooperative ventures in publishing and archiving can save universities costs in the future.
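The point about metadata mapped to MARC can be made concrete with a minimal sketch. The MARC tags used here are the standard ones for a personal-name main entry (100), the title statement (245), and a summary or abstract (520). The metadata field names and the function name are hypothetical, and a real mapping would also handle indicators and subfields.

```python
# MARC 21 tags: 100 = main entry (personal name), 245 = title statement,
# 520 = summary/abstract. The metadata keys on the left are hypothetical.
FIELD_TO_MARC = {"author": "100", "title": "245", "abstract": "520"}


def to_marc_fields(metadata):
    """Return (tag, value) pairs, sorted by tag, for the fields that have a mapping."""
    pairs = [
        (FIELD_TO_MARC[key], value)
        for key, value in metadata.items()
        if key in FIELD_TO_MARC
    ]
    return sorted(pairs)
```

Fields without a mapping (for example a page count) are simply skipped in this sketch, so the student-supplied abstract arrives in the record without rekeying.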
The costs of implementation are substantial.
Access to technology is still limited, especially at smaller libraries. But ETDs can still be placed with UMI, allowing researchers the option of ordering them in print or microform.
Students are concerned that distributing dissertations on the Web constitutes prior publication and thus that they will not be able to publish them later in a journal. However, the content of a full dissertation is usually sufficiently different from what would appear as an article in a scholarly journal that most publishers would not consider the dissertation to be the same publication. Students who are still uneasy about publication can restrict access to the online version.
Some claim that dissertations would not be widely read, but this belief stems from the current situation, in which they are not easy to procure. Tech's server logs reveal that its most-popular dissertation in 1997 was accessed 9,920 times; its second most popular, 7,656 times. In that year Tech received an average of 685 requests per day.
Copyright legislation is still incomplete concerning electronic publications, and, until regulations become more definite, questions could arise about students' economic rights to their work.
Long-term archiving technology is still unsatisfactory.
Technical Assessment
The following is a preliminary assessment of the technology needed for the library to get up and running, serving ETDs either on its own or as a member of the NDLTD.
| Server Requirements | CUL Current Status | Necessary to Acquire | Cost |
|---|---|---|---|
| Server (note 1) | Sun Sparc & Ultrasparc | Sun w/ 200 MHz processor & minimum 128M RAM | $11,000 |
| RAID disk space (note 2) | | 9Gb w/ single partition | $5,000 |
| 9Gb disk drive | | Recommended | $250 |
| Tape drive for back-ups (note 3) | | 8mm SCSI tape drive for 170m tapes w/ compressed storage capacity of 40Gb | $3,500 |
| Web server (note 4) | Netscape Enterprise | Probably need an added license | ? |
| Perl (script) (note 5) | | Relatively new version | Freeware |
| CGI.pm (script) | | Yes | Freeware |
| Dienst software (note 6) | Developed at Cornell | Yes | Free |
| CD-ROM recorder (note 7) | | Optional | $1,000 |
| Acrobat 3.0 (2) | | Yes | $80 |
These estimated costs show that for about $20,800
the library could acquire the necessary hardware and software for serving
ETDs. These products are only suggestions; costs would differ if CUL
chose other software or equipment. Total costs of implementation, including
migration and digital and/or microfilm storage can only be calculated when
a particular institution has decided on its level of involvement.
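As a check on the figure of about $20,800, the priced items in the table above can be summed. This sketch assumes each dollar amount in the table is the total for its line, and it leaves out the Web server license, whose cost is listed as unknown.

```python
# Priced items from the server requirements table (dollar amounts as listed).
# The Web server license is excluded because its cost is given as "?".
costs = {
    "Server": 11000,
    "RAID disk space": 5000,
    "9Gb disk drive": 250,
    "Tape drive for back-ups": 3500,
    "CD-ROM recorder": 1000,
    "Acrobat 3.0": 80,
}

total = sum(costs.values())
print("Estimated hardware and software total: $%d" % total)
```

The sum is $20,830, which agrees with the rounded estimate in the text.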
1. Server. The server should have enough memory to process the work load, that is, to handle dozens of cgi posts and thousands of hits each day. The two library servers are now running at over 86% capacity, which means that CUL would have to purchase either more disk space or a new server. George Kozak believes a new server would be preferable, as it would be "cleaner." In addition, it could also be put to other library uses.
2. Disk space. This would give 40Gb storage. It needs to be scalable, so that drives can be added as needed. A large drive (9Gb) with a single partition is ideal. Redundant drives for backups or mirroring are a plus.
3. Tape drive. Ideally the tape drive should be able to back up the entire system in a few hours and store not only the current contents of the collection, but a significant amount of added material as the collection grows. It makes sense to purchase this in the beginning, rather than add it on later.
4. Web server. The Web server should be robust, fairly easy to configure, and have a sizable knowledge base among the online community. The Netscape Enterprise Server allows wildcards to be used for security, making it possible to restrict access to all URLs that specify a particular type of material, for example, dissertations with publication restrictions for a period of time.
5. Scripts. UNIX flavors are the best supported.
6. Dienst software. The newest version (5) will have CUPID running native, allowing distributed printing over the Internet. The software should be available at no cost, since it was developed at Cornell.
7. CD-ROM recorder. The recorder is optional, for annual database captures for archival storage.
8. Acrobat 3.0. For creating PDF documents and for helping students who
are having problems.
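The wildcard restriction described in note 4 can be illustrated with a small Python sketch. This is not Netscape Enterprise Server configuration syntax; it only shows the general idea of matching URL paths against a wildcard pattern. The directory layout and names are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical URL layout: withheld dissertations live under one directory.
RESTRICTED_PATTERNS = ["/theses/restricted/*"]


def is_restricted(path):
    """Return True if the URL path matches any restricted wildcard pattern."""
    return any(fnmatchcase(path, pattern) for pattern in RESTRICTED_PATTERNS)
```

With a layout of this kind, a single pattern covers every dissertation under publication restrictions, and lifting a restriction amounts to moving the file out of the restricted directory.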
Library Staffing
Programmer. .25 FTE. Tech estimates that its programmer spends .5 to 1 hour per day during nonpeak times on maintenance and development. During peak periods he or she may spend 8 hours per day on problems, development, and system improvements.
Student assistant. .25 FTE. Tech estimates that the student (who knows programming) spends a maximum of 2 hours per week during nonpeak times and up to 10 hours per week during the periodic peaks.
Knowledge of PDF and HTML desirable
Programming experience desirable
Librarian. .25 FTE (Tech's estimate).
Supervise staff, draft policies, collaborate with system maintenance and developers, monitor workflow
Work with faculty, staff, students, departments, and colleges so they become familiar with the processing, accessing, and archiving of ETDs
Conduct workshops, write articles, participate in graduate student seminars, prepare handouts and Web pages, collaborate with other universities and libraries
Graduate School Staffing
Although it is not an immediate concern of the library, the Graduate School has only 1 FTE for receiving and approving the format of theses and dissertations: the thesis coordinator. The Graduate School has long been very short on human resources, and any increase in the time required for approval of submissions in electronic format could not possibly be absorbed by that one position. That would need to be considered in enlisting the support of this important stakeholder.
Student Support
Optional, but recommended, is a facility with staff, software, and equipment to help students prepare their electronic documents. It might well be expanded as a general center of support for faculty and other academic electronic publishing projects as such activity increases.
Costs for the Facility
| Item | Cost |
|---|---|
| CD-ROM recorder | $1,000 |
| Scanner | $5,000 |
| LaserJet and ColorPrinter | $3,000 |
| Digital camera | $500 |
| VCR, DVD | $1,200 |
| Drawing tablet | $600 |
| Acrobat 3.0 (2) | $80 |
| Microsoft Word | Site license |
| Photoshop (2) | $300 |
Printing possibilities would vary at the different CUL libraries because the ability to download, which is governed by individual library policy, affects printing choices. In O/K/U the public kiosks are blocked so that patrons cannot download from the Internet. In the past, when downloading was allowed, there were too many virus problems and tampering with the machines. In Mann, and perhaps in some unit libraries, downloading is possible because, the staff say, they have heavy virus protection on the machines and they just "deal with" the results of hacking.
There are two printing services that patrons could use for ETDs: Net-Print and CUPID. Both of these allow printing on demand, but only CUPID can print without downloading.
Net-Print is a for-fee laser printing service that has printers in Olin and Uris libraries ($.08 per page) and in various ATS (Academic Technology Services) computer labs and the Residence Hall Network ($.06 per page). For this service the charges must be billable to a bursar's account, either as a regular charge or cash in advance (i.e., only students or former students can use it; faculty and staff do not have accounts). Within a year it should allow departmental charges by faculty. Patrons can request the print job to appear on any printer on the network. A large dissertation, however, would outrun the paper capacity of many of the printers (although O/K/U's hold 500 pages). In addition, printing a large dissertation would tie up a public printer for considerable time (depending on the RAM available on the computer and the type of communications lines). Because patrons would have direct access to the Acrobat files in the NDLTD, they could initiate printing at a library kiosk. UMI's documents, however, because they are converted to FTP when a customer orders them, can't be viewed online and must be downloaded to the customer's machine before they can be opened and read for printing. O/K/U's patrons would thus not be able to print UMI dissertations via Net-Print under current policy.
CUPID is a protocol for high-end, distributed network printing that does not require downloading the target document. The software accesses the remote file and prints to a location that is designated in advance by the user. Thus CUPID could also provide printing from UMI (if it can access a PDF file) as well as from the NDLTD. In the case of dissertations it would be ideal because it can handle the larger jobs, there would be no tie-up of the public printers, and the user can also stipulate custom output such as double-sided printing, stapling, or binding ($.03 per page and $1 for a tape binding). It has the capacity to process billing and assess a usage or copyright fee. CUPID is intended to work with high-end printers such as the Docutech, of which Cornell has one at Media Services and two at the Print Shop on Judd Falls Road. Rich Marisa is also negotiating with Kinko's to link to Collegetown. In addition, it would be possible to enable a library printer, although it would have to be a fairly high-functioning machine. The next version of the Dienst server will have CUPID running native.
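A rough comparison of what a patron would pay under each service can be worked out from the rates quoted above. The sketch reads the $.03 figure as CUPID's per-page charge, which is an interpretation of the text, and the 250-page length is a hypothetical example. Amounts are kept in cents to avoid rounding problems.

```python
NET_PRINT_CENTS_PER_PAGE = 8   # Olin and Uris rate quoted above
CUPID_CENTS_PER_PAGE = 3       # assumed to be CUPID's per-page charge
CUPID_TAPE_BINDING_CENTS = 100


def net_print_cents(pages):
    """Cost in cents of printing a document on a Net-Print library printer."""
    return pages * NET_PRINT_CENTS_PER_PAGE


def cupid_cents(pages, bound=True):
    """Cost in cents of printing through CUPID, optionally with a tape binding."""
    return pages * CUPID_CENTS_PER_PAGE + (CUPID_TAPE_BINDING_CENTS if bound else 0)


pages = 250  # hypothetical dissertation length
print("Net-Print: $%.2f" % (net_print_cents(pages) / 100))
print("CUPID (bound): $%.2f" % (cupid_cents(pages) / 100))
```

Under these assumptions a 250-page dissertation costs $20.00 through Net-Print and $8.50 bound through CUPID.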
Archiving and preservation of original digital documents is a thorny issue. The Association of Research Libraries (ARL) standards for archiving digital materials at this time specify one of two formats: HTML or PDF. No one has come up with satisfactory provisions for handling multimedia.
UMI still considers microfilming to be its dependable archival medium. Tech runs a script (similar to a macro) that automatically generates an e-mail message to UMI whenever a new dissertation has been mounted, giving the URL, author, and title. UMI then downloads whenever it chooses, prints, microfilms, and adds the file to its own digital database. Tech considers UMI to be its "emergency" backup, since it does not keep an archival paper copy itself. Michigan also intends to continue to deposit a copy of its dissertations with UMI, but more in the spirit of continuing to make its publications available to those institutions who may not be able to acquire digital material themselves.
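A notification of the kind Tech sends could be composed along the following lines. This is a minimal sketch, not Tech's actual script: the addresses, subject wording, and function name are hypothetical, and the message is only built, not sent.

```python
from email.message import EmailMessage


def build_notice(url, author, title,
                 sender="etd@example.edu", recipient="intake@example.com"):
    """Compose a notice that a new dissertation has been mounted."""
    msg = EmailMessage()
    msg["From"] = sender        # hypothetical library address
    msg["To"] = recipient       # hypothetical vendor address
    msg["Subject"] = "New ETD mounted: " + title
    # The body carries the three items the report mentions: URL, author, title.
    msg.set_content("URL: %s\nAuthor: %s\nTitle: %s\n" % (url, author, title))
    return msg
```

A server-side process could call `build_notice` each time a dissertation is mounted and hand the result to the local mail system.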
In deciding on archival procedures, the following factors need to be considered:
Making and maintaining tape backups of ETDs should be standard operating procedure; however, magnetic media are not secure from data loss.
Keeping an archival paper copy, at least for an initial period, could ease the transition to electronic format for those stakeholders who would resist abandoning print completely.
Writing the dissertations to CD-ROMs might provide more-secure long-term preservation.
For long-term preservation, which brings up the question of technological obsolescence, one must think of not only the content but also the medium. The two common methods of digital preservation are refreshing and migration. Refreshing data (copying the content periodically, such as from an old tape to a new one) is not likely to solve the long-term problem. For complex files, critical functionality in the original file can eventually be lost. The technique of migration, however, holds more promise. It is possible to migrate older files through newer stages of technology when the new system has been designed to emulate the old so that it can accept the old files. Migration preserves the data and form of the original. Most current hardware and software designers take care to ensure that new versions will accept older files.
UMI assures that it will maintain the usability of digital files. The bibliographic utilities OCLC and RLG have begun to position themselves to provide digital storage services for members if a market develops. But the question remains: Who will pay for such a service? Will it be the members of the organization or the users? No one knows yet what new preservation techniques may develop or what the cost will be to ensure that ETDs remain intact and accessible. (Tech published documentation on its archiving policies just recently, too late to be discussed in this report.)
Current Costs
$110 Filing fee (of this, $9.50 is for binding; $35 for microfilming; $45 for UMI storage)
$35 Copyright fee (optional for registration of copyright)
varies 2 photocopies of dissertation on archival paper
$.05 per page at Olin Copy Center
$.065 per page at Media Services
$.07 per page at Day Hall Copy Center
Digital Submission Costs
$40 Copy of Acrobat (or the university supplies a site license in the computer labs)
$35 Copyright fee (optional for registration of copyright)
$4.25 For binding (if the Graduate School wants to keep an archival print copy)
varies 1 photocopy of dissertation on archival paper (photocopy prices as above)
?? Filing fee by Graduate School to cover paper work? To this could be added a small
Currently the Graduate School accepts only text-based dissertations. If a student requests permission to submit an electronic format, he or she is told that the basic dissertation must consist of written text and that extraneous pieces may be attached in the back.
Most students already have their dissertations completed on a word processor, and the conversion to PDF format with Acrobat can be learned in a one- to two-hour training session. Academic Technology Services (ATS) in CIT has developed a self-taught course, "Help Writing a Thesis or Dissertation in Microsoft Word 6.0" and a dissertation template that includes most of the necessary formatting. It would be helpful if ATS could be enlisted to prepare materials instructing students in digital formatting.
Tech's policy of mandating electronic submission all at once is aggressive, but that does not discount the value of an incremental approach. Even Tech provides exceptions to its regulations in cases of hardship.
Improved access to dissertations is likely to increase their value as resources,
and the imperative for electronic publishing at academic institutions is
coming from the traditional suppliers of those resources, their libraries.
Standards of formatting, archiving, and distributing are becoming the domain
of the libraries that are taking the initial steps.
For a successful library-driven electronic publication effort, CUL would have to be prepared to take on new responsibilities for its planning and coordination. It might also have to assume a public-relations role toward students and faculty to obtain their buy-in (unless this role is better taken by the Graduate School and a representative who is influential with the faculty) and to undertake collaborative efforts with the Graduate School and CIT. It would also need to secure the support of ATS to prepare training and instructional materials for the students.
Whether the library chooses to provide ETDs through the UMI service, by joining the NDLTD, or by publishing on its own, it will not be an immediate cost-saving venture for either the library or the university; it will be a service. Any immediate savings would be on the part of the students for photocopying and the filing fee, since the Graduate School currently recovers all its costs through the fee. It would also cut the cost to researchers because they could download the dissertations at no charge. The cost now for a shrink-wrapped paper copy from UMI is $24.50, and the turnaround time is three to four days.
Providing ETDs will also require increased staff time in terms of technical support from LTD and from the librarians involved. Establishing electronic publishing programs must be seen as a means of cost savings to the library in the future.
The archival copies of dissertations grow about 117 linear feet per year (35 shelves) in the Library Annex. In both Olin and Mann the circulating copies grow at about the same rate (36 and 32 linear feet per year, respectively); both estimate 10 shelves. That means that 19 remaining linear feet (about 5 shelves) of circulating copies are distributed among the unit libraries.
In the last four years the number of dissertations submitted annually at
Cornell has ranged from 500 to 565. According to studies done at Tech, the
following are the common characteristics of their digital dissertations:
median = 1M; mean = 2M
a dissertation with images and other media can take 5-10M
a video can take up to twice the space
45% of Tech dissertations contain images; only 5% have video
Tech estimates that the disk space to store 1,000 dissertations costs less than $3,000 (but students are assessed a $20 storage charge as part of the filing fee). Future developments in compression technology may increase storage capacity significantly, further reducing the cost.
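As a rough illustration of what these figures might mean for Cornell, the arithmetic can be sketched as follows. The sketch assumes that "M" means megabytes, that Cornell dissertations would resemble Tech's in size, and that Tech's cost figure scales linearly with the number of dissertations; none of these assumptions has been tested, and the function names are illustrative only.

```python
# Figures quoted in this report; everything else is an assumption.
MEAN_SIZE_MB = 2                # Tech's mean dissertation size ("2M", read as megabytes)
COST_PER_1000_USD = 3000        # Tech's upper bound for storing 1,000 dissertations
CORNELL_PER_YEAR = (500, 565)   # range of annual Cornell submissions, last four years


def annual_storage_mb(count, mean_size_mb=MEAN_SIZE_MB):
    """Estimated new storage needed per year, in megabytes."""
    return count * mean_size_mb


def annual_cost_ceiling_usd(count, cost_per_1000=COST_PER_1000_USD):
    """Upper bound on yearly disk cost, scaling Tech's figure linearly (assumed)."""
    return count * cost_per_1000 / 1000


if __name__ == "__main__":
    for count in CORNELL_PER_YEAR:
        print(count, "dissertations:",
              annual_storage_mb(count), "MB, under",
              annual_cost_ceiling_usd(count), "USD")
```

Under these assumptions the yearly intake would need roughly 1,000 to 1,130 megabytes of disk space, at a cost below about $1,500 to $1,700. Dissertations heavy in images or video would push the storage figure higher.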
Interlibrary Loan
The Cornell Library lends to all institutions but charges $15 (cost recovery) to non-RLG members. There is a reciprocal agreement on lending and a program for shared resources for which Cornell pays as part of its RLG membership. The library is credited with $7 for each loan and debited $7 for each item borrowed. Usually Cornell lends more than it borrows. CUL keeps no statistics on the number of dissertations lent or borrowed, but if it had no copies to lend, that could affect its consortial agreements. It would have to consider that many of the libraries to which it now lends have no access to electronic formats.
Digital dissertations would change the method of processing considerably. At present the unbound archival copies are stored in the basement of Olin Library until the circulating copies have been microfilmed at Challenge Industries. After UMI has received and verified the contents of the microfilm, the library sends both copies to the commercial bindery (only seventy-five can be sent at one time), where they stay about a month. During this time the preliminary bibliographic records are created from a list supplied by the Graduate School. When the bound volumes return, cataloging is completed and they are physically processed. The library assistant in Central Technical Services spends approximately ten weeks a year (.2 FTE) tracking and processing the items.
For an ETD, the submission form filled in by the student could be designed so that the metadata map directly to a MARC record. The library assistant could access the online file and transfer (if a script could be developed) or cut and paste the data directly into a bibliographic record. Students would be encouraged to include keywords with their submission data so that fields could be generated on the record to create subject access. Currently the library assistant does not supply subject headings, so this feature would give added value to the record. In addition, the abstract could be included in the catalog record. There would no longer be a circulation copy to security strip, bookmark, or shelve; the entire process would be streamlined.
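A minimal sketch of such a transfer script is given below. The form field names (author, title, degree_note, abstract, keywords) are hypothetical, as is the sample data; the MARC tags are the standard ones for personal-name main entry (100), title statement (245), dissertation note (502), summary (520), and uncontrolled index terms (653). Indicators and the other subfields a cataloger would supply are omitted, so this is an outline of the mapping rather than a complete record.

```python
# Hypothetical submission-form field names mapped to (MARC tag, subfield code).
FORM_TO_MARC = {
    "author": ("100", "a"),       # main entry, personal name
    "title": ("245", "a"),        # title statement
    "degree_note": ("502", "a"),  # dissertation note
    "abstract": ("520", "a"),     # summary / abstract
}
KEYWORD_TAG = ("653", "a")        # index term, uncontrolled (student keywords)


def form_to_marc(form):
    """Turn a submission form (a dict) into a list of (tag, code, value) fields."""
    fields = []
    for name, (tag, code) in FORM_TO_MARC.items():
        value = str(form.get(name, "")).strip()
        if value:  # skip fields the student left blank
            fields.append((tag, code, value))
    for keyword in form.get("keywords", []):
        keyword = keyword.strip()
        if keyword:  # one 653 field per non-empty keyword
            fields.append((KEYWORD_TAG[0], KEYWORD_TAG[1], keyword))
    return fields


def render(fields):
    """Display the fields in a simple 'tag $code value' layout for review."""
    return "\n".join(f"{tag} ${code} {value}" for tag, code, value in fields)


if __name__ == "__main__":
    sample_form = {  # placeholder data for illustration only
        "author": "Doe, Jane",
        "title": "An Example Dissertation",
        "degree_note": "Thesis (Ph.D.)",
        "abstract": "Placeholder abstract text.",
        "keywords": ["digital libraries", "electronic publishing"],
    }
    print(render(form_to_marc(sample_form)))
```

The library assistant would still review the result before it entered the catalog; the script only removes the retyping.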
The electronic dissertation activity should be seen as one facet of the
university's electronic publishing plan. Close coordination between the Graduate
School, the colleges, and the library is essential to ensure the reliability,
longevity, and success of this enterprise.
1. Cornell Contacts
Gould Colman, University Archivist, retired
Susan Currie, Access Services
Elaine Engst, Rare and Manuscripts Collections
Minnie Empson, Graduate School
George Kozak, LTD
Carl Lagoze, Computer Science
Richard Marisa, CIT
David Munden, CTS
John Saylor, Engineering/Physical Sciences
Ron Watkins, Graduate School
Steve Worona, CIT
2. Standard Electronic Formats
Agreement on the standard representations for ETDs
is particularly important, for fewer authorized formats mean more-widespread
compatibility and access, as well as fewer difficulties in developing effective
archiving techniques. At this time the most popular format for ETDs is portable
document format (PDF), although a number of publishers believe that SGML
would prove more flexible over time.
Adobe Acrobat. Acrobat is software that converts documents created on a word processor into PDF files that can be made available on the Web. PDF files retain all the formatting and graphics of the original WordPerfect or Microsoft Word product. Larger and more-complex documents, especially in physics or mathematics, are often produced in LaTeX or TeX format, which use PostScript fonts because they are useful for rendering mathematical formulae. These can be converted to PDF files by using Acrobat Distiller. Documents can be downloaded and read using the Acrobat Reader, obtainable at no charge on the Web from Adobe. PDF is a widely used format in electronic publishing.
SGML. Standard Generalized Markup Language, which uses tags to embed
formatting codes in documents, has great potential because it is platform
independent, converts easily to other modes of presentation, and would handle
multimedia better than PDF. The files are automatically converted to HTML
when accessed via the Web.
One of the distinctive, and most useful, characteristics of this language
is its application of the concept of document type definition (DTD),
which defines particular structural units for a particular type of document.
For example, a novel has an author, title, and text as part of its DTD; if
it doesn't, it's not a novel to the computer. Likewise, a dissertation has
an author, title, statement of degree-granting institution, abstract,
etc.; if it doesn't, it's not a dissertation. This definition of a set
of tags for a particular document type specifies exactly what the document
will look like. The process is of course much more sophisticated than this
simple example.
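The idea that a document missing a required unit "is not a dissertation" can be illustrated with a short check. This is only a stand-in for real DTD validation: it uses well-formed XML rather than full SGML, the element names are hypothetical and are not taken from Tech's or Michigan's DTD, and an actual DTD would also constrain the order, nesting, and repetition of elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names for the structural units named in the text.
REQUIRED_ELEMENTS = ("author", "title", "institution", "abstract")


def missing_elements(document_text):
    """List the required units that are absent directly under the root element."""
    root = ET.fromstring(document_text)
    return [name for name in REQUIRED_ELEMENTS if root.find(name) is None]


def is_dissertation(document_text):
    """True only if the root is <dissertation> and no required unit is missing."""
    root = ET.fromstring(document_text)
    return root.tag == "dissertation" and not missing_elements(document_text)


if __name__ == "__main__":
    incomplete = (
        "<dissertation>"
        "<author>Doe, Jane</author>"
        "<title>An Example Dissertation</title>"
        "</dissertation>"
    )
    print(is_dissertation(incomplete))
    print(missing_elements(incomplete))
```

The sample document lacks an institution and an abstract, so the check rejects it and reports which units are missing.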
Therein lies one difficulty in using SGML: there is not yet a completely
reliable DTD for dissertations. Tech has developed one that it says is basically
functional, but no template has been devised yet that would make it easily
applicable by students. Michigan is concentrating on applying the most-common
DTD (Text Encoding Initiative, or TEI) in a specific way to dissertations.
Michigan has had notable success with SGML and TEI publishing initiatives
over the past seven or eight years and is experienced and comfortable with
these applications. It is very committed to its project.
Probably the greatest disadvantage of SGML is its complexity. Even with a
suitable DTD, SGML would remain difficult for most students to apply on their own,
as there is no simple editing/authoring tool for turning a word processor
document into SGML. A very recent development, XML (Extensible Markup Language),
which is a simplified form of SGML, is becoming popular and may help to provide
a solution to the difficulty.
rev. 10/13/98 dih