A Demonstration Project 1994-1996
National Endowment for the Humanities PS-20781-94
Principal Investigator
Department of Preservation and Conservation Ithaca, NY 14853
TABLE OF CONTENTS
INTRODUCTION
THE CORNELL AND YALE STUDIES
SUMMARY FINDINGS OF THE CORNELL PROJECT
BACKGROUND ON THE CORNELL DIGITAL TO MICROFILM PROJECT
FINDINGS AND RECOMMENDATIONS: QUALITY
FINDINGS AND RECOMMENDATIONS: COST
FINDINGS AND RECOMMENDATIONS: PROCESS
Appendices:
ACKNOWLEDGMENTS There are many individuals who contributed to the success of this project. It was truly a collaborative one, involving three divisions within Cornell, the imaging service bureau, Image Graphics Recording Center, and the Technical Advisory Committee. Cornell is particularly grateful to the National Endowment for the Humanities for its financial support and faith in our ability to conduct this important research and demonstration project. Five nationally recognized experts in the fields of imaging science, micrographics, and standards development comprised the Technical Advisory Committee to the project. The advisory committee members included: Paul Conway (Head, Preservation Department, Yale University and Principal Investigator of Project Open Book); Nancy Elkington (Assistant Director, Preservation Services, Research Libraries Group and the guru of preservation microfilming standards); Michael Lesk (Division Manager, Computer Sciences Research, Bellcore, who had served as a technical advisor to Mann Library in the CORE digital conversion project); Don Williams (Senior Image Engineer, Eastman Kodak Research Laboratories, who chaired the AIIM Committee that developed the technical report, TR-26-1993, Resolution as it Relates to Photographic and Electronic Imaging); and Don Willis (then Vice President of Electronic Publishing, INET, who had authored the influential publication, A Hybrid Systems Approach to Preservation of Printed Materials). The committee met at the beginning of the project when they traveled to Cornell to review responses to the RFP, to inspect sample COM and affirm its potential viability, and to select a vendor. The committee met for a second time at the end of the project at Yale University where they were able to compare and contrast the findings from the Digital-to-Microfilm Project and Yale’s Project Open Book. 
Many individuals at Cornell participated in this project: The Department of Preservation and Conservation served as the principal host, responsible for book scanning and project coordination. John Dean, Director of the Department, and Barbara Berger, Preservation Reformatting Librarian, provided invaluable administrative guidance. Five scanning technicians participated in this project over the course of two and a half years: Michael Friedman, Tami Williams, Tom Tierney, Mary Moon, and Allen Quirk. Marti Hanson (Syracuse University) and Steve Chapman (Harvard University) both served as preservation interns in the Department during the first year of this project, and were responsible for developing the Request for Proposal for Computer Output Microfilm (COM) recording services. Steve continued in the department beyond his internship, and played a significant role as liaison between the scanning team, Mann Library staff, and our COM provider, Image Graphics. He also coordinated the data gathering and initial analysis for the cost study component. The participation from the Albert R. Mann Library, the agriculture and life sciences library at Cornell, was substantial. Jan Olsen (Director), Sam Demas (Head, Collection Development and Preservation), Rich Entlich (Preservation Librarian), Marjorie Proctor (Preservation Manager), Stephanie Lamson (Preservation Assistant), Eniko Farkas (Conservation Technician), and Janet McCue (Head, Technical Services) were responsible for the discipline-based selection of the core agricultural historical literature included in the project; for the preparation of the volumes for scanning and the quality control of the Computer Output Microfilm as well as the paper facsimiles produced directly from the digital files; for providing guidance in the development of procedures and recommended practices; for the investigation into copyright clearance; and for the cataloging of the digital files and the resulting COM. 
Mann Library has also committed to making the digital files from this project accessible via a Web-based user system, and Ted Wong is to be thanked for his efforts in this area. Cornell Information Technology provided technical oversight for the project. Steve Worona (Assistant to the Vice President for Information Technology) served as the technical coordinator, and the following programmer/analysts, Sal Gurnani, Dave Fielding, Bill Fenwick, George Kozak, and Pela Varodoglu, contributed systems support and programming applications that enabled us to arrange and package the digital images for COM production and for the storage of the digital files. The Image Graphics Recording Center (IGRC) of Shelton, Connecticut served as the sole COM producer for this project. IGRC was very helpful in working with Cornell staff to undertake this research and demonstration project. We are particularly grateful for the support of Michael Beno (Customer Service Manager), Jeff Driscoll (Sales Representative), and Putnam Morgan (Marketing Manager). The staff of Yale University’s Project Open Book, and especially Paul Conway and Bob Halloran, were extremely cooperative in working with Cornell to make quality, cost, and process comparisons between the two projects. Yale staff graciously scanned microfilm and COM for Cornell, hosted the second meeting of the Technical Advisory Committee as well as several visits from Cornell staff, and provided process/data work forms and preliminary cost figures at timely intervals.
INTRODUCTION Digital technology holds great promise for the world’s research libraries, for it could revolutionize how we capture, store, preserve, and access information. From the preservation perspective, digital technology offers important reformatting advantages over photocopy and microfilm, including its capability to create a higher quality reproduction of a deteriorating original, the ability to reproduce digital images over and over again with no loss of image quality, great flexibility in terms of output and distribution, and potential cost savings associated with storage and distribution. Most important, digital technology offers unprecedented opportunities for access and use, since it could facilitate the expansion of scholarship by providing timely, distributed access to a variety of sources from a variety of locations. Although the advantages to digital technology for preservation reformatting and access enhancement are numerous, there are drawbacks as well. These center on the obsolescence associated with the rapid changes occurring in the development of hardware/software system design, a lack of experience on the part of institutions and service bureaus with digital imaging for preservation, and issues of permanency and standards. Digital technology has the potential to redefine preservation reformatting, but until the concerns associated with maintaining long-term accessibility to material stored in digital image form can be resolved, many libraries and archives are loath to initiate digital projects beyond the pilot phase. (1) In 1992, the Commission on Preservation and Access published a highly influential report by Don Willis, entitled A Hybrid Systems Approach to Preservation of Printed Materials. In this report, Willis argued convincingly for the creation of microfilm for preservation and digital images for access. 
He discussed the various options for creating both film and digital files, noting the advantages and tradeoffs associated with filming first and scanning from the film, or scanning first and creating computer output microfilm (COM) from the digital files. Willis predicted that the costs of producing both microfilm and digital images would be roughly the same in either approach, and that a hybrid system could serve as a viable preservation strategy until research institutions developed and implemented digital preservation programs. In the event that the digital master were to become unreadable, the microfilm (or COM) could be scanned to regenerate the digital copy (presumably at lower costs than the original capture process). The real issue, Willis concluded, would be determining the circumstances under which the "film first" approach or the "scan first" approach should be pursued. THE CORNELL AND YALE STUDIES It may seem ironic that microfilm, which has become the principal means for preserving information endangered by the "slow fires" of acidic paper, could become an important legacy measure for coping with the "fast fires" of digital obsolescence. Nonetheless, in 1994 the National Endowment for the Humanities funded two important and complementary projects, designed to test and evaluate the interrelationship between microfilm and digital imagery. NEH supported the production phase of Yale University’s Project Open Book, a comprehensive feasibility study on the digital conversion of microfilmed library materials. In partnership with the Xerox Corporation, Yale built a networked, multi-workstation conversion system to convert 2,000 microfilmed books to digital image files (representing 430,000 images). These books, chosen from the fields of American history, Spanish history, and the history of communism, socialism, and fascism, had been microfilmed in the late 1980s according to standards adopted by the Research Libraries Group, Inc. 
(2) Project Open Book studied the means, costs, and benefits of such an approach. The results of that project are summarized in Paul Conway’s final project report to NEH. (3)
CORNELL'S DIGITAL TO MICROFILM CONVERSION PROJECT
Cornell conducted a two and a half year demonstration project to test and evaluate the use of high resolution bitonal (1-bit, black and white) imaging to produce computer output microfilm (COM) that could meet national preservation standards for quality and permanence. (4) In the course of the project, 1,270 volumes and accompanying targets (representing 450,000 images) were scanned and recorded onto 177 reels of film. The volumes selected for the project represented core holdings in 19th and 20th century agricultural history. All paper scanning was conducted in-house, and Cornell contracted the COM production to Image Graphics, Inc. of Shelton, Connecticut. With the assistance of a Technical Advisory Committee of outside experts (see acknowledgments), the project led to an assessment of quality, process, and costs, and to the development of recommendations for the creation and inspection of preservation quality microfilm produced from digital imagery. Both Cornell and Yale recognized the significance and complementary nature of each other's projects. The projects had in common:
These two projects benefit the larger preservation community as it seeks to understand the circumstances under which scanning first or filming first is most appropriate in achieving the twin goals of preservation and enhanced access through the use of digital technology. SUMMARY FINDINGS OF THE CORNELL PROJECT The following findings and recommendations have been reached as a result of the project:
Recommendation:
Recommendation:
Recommendation:
BACKGROUND ON THE CORNELL DIGITAL TO MICROFILM PROJECT
Since 1990, Cornell University has advocated the use of 600 dpi 1-bit scanning to capture the informational content of 19th and 20th century brittle books. This position is based on the use of a digital Quality Index approach to benchmarking resolution requirements; on an extensive assessment of common printer’s type sizes used by publishers from 1850 to 1950; and on visual inspection of digital facsimiles produced from over 100 different type fonts (including Roman and non-Roman script) used during this period. Until the mid-twentieth century, commercial books were produced using metal type, which had a tendency to spread with large print runs, so printers were limited in how small or closely spaced letters could be. All common typefaces used during this period were produced at 5 or 6 point type and above. Six hundred dpi 1-bit scanning adequately captures the fine detail, elaborate serifed script, italics, and small body heights that characterize these fonts. Cornell, therefore, concluded that 600 dpi 1-bit scanning was sufficient to capture fully the textual monochrome information contained in virtually all books published during the period of paper’s greatest brittleness. (5) These findings have been confirmed through quality inspection of over a million pages scanned in-house in the Preservation Department.
In 1993, Cornell conducted a preliminary test to record digital files for one brittle book onto computer output microfilm. This test led Cornell to conclude that COM could be produced from these digital files to meet ANSI/AIIM standards for image quality. With funding from NEH the following year, Cornell sought to evaluate the feasibility of producing preservation quality COM for a significant volume of brittle books. A Request for Proposal (RFP) for the COM production was developed and distributed to 27 service bureaus by June 1994. (6) Of those, 14 expressed an interest in the project.
Each was asked to prepare a sample roll of film from Cornell-produced digital image files for 5 books representing the range of material to be converted in this project. A number of the vendors could meet all requirements, excepting the need to produce film on 35 mm format. Most companies produce COM on 16 mm or 105 mm film. Others were able to record onto 35mm film, but could not handle the 600 dpi image files or the small reduction ratios. (7) In the end, the number of vendors who could actually perform the work as specified was very small, and only one company submitted a response that the Technical Advisory Committee would approve. In August 1994, this committee of nationally recognized experts in the fields of imaging science, micrographics, and standards development met at Cornell. They reviewed the responses to the RFP, inspected sample COM and affirmed its potential viability, and selected a vendor based on the overall quality of the proposal, technical capabilities, quality control measures, price, and consumer/vendor relations. Cornell awarded the COM production contract to Image Graphics Incorporated of Shelton, Connecticut (http://www.igraph.com/) in September 1994. Mann Library staff members prepared the 1,270 books chosen for this project and also assumed responsibility for local inspection of the COM. (See Appendix I for staff procedures for book preparation and COM inspection.) The actual scanning of the brittle books occurred in-house, in the Department of Preservation and Conservation, using the Xerox Document on Demand R2.x system (XDOD), though prior configurations of the prototype Xerox CLASS system were used in the early stages of the project. The pages were captured as a collection of TIFF files (with accompanying targets), compressed prior to storage using CCITT Group IV compression, then paginated and structured using the XDOD software. 
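The storage implications of 600 dpi 1-bit capture can be illustrated with a short calculation. The sketch below estimates the uncompressed size of a single bitonal page image; the 6 x 9 inch page dimensions and the function name are hypothetical and are not taken from the project. CCITT Group IV compression would reduce this figure by an amount that depends on page content, which the sketch does not attempt to model.

```python
def bitonal_page_bytes(width_in: float, height_in: float, dpi: int = 600) -> int:
    """Uncompressed size in bytes of a 1-bit image, with each row padded to whole bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = (width_px + 7) // 8  # 8 pixels per byte, rounded up
    return bytes_per_row * height_px


if __name__ == "__main__":
    # Hypothetical 6 x 9 inch page; not a measurement from the project.
    size = bitonal_page_bytes(6.0, 9.0)
    print(f"{size} bytes ({size / 1_000_000:.2f} MB) before compression")
```

For the assumed page this comes to 3,600 x 5,400 pixels, or roughly 2.4 MB per page before compression.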
Following scanning, all images were sent to a Xerox Docutech printer to create printouts that supported two functions: quality control of the scanning and creation of facsimile volumes to replace the embrittled, disbound originals. Once the digital files had passed inspection, the XDOD images and accompanying metadata were exported from the proprietary Xerox database structure based on the ODA standard (Open Document Architecture) that is not in common use in the United States, to an open Cornell Digital Library format. Copies of the book images and accompanying targets were reel programmed, using a Cornell-designed "tape generation program," created to accommodate the requirements of the COM recorder used at Image Graphics. The files were then quality checked and read out to Exabyte tapes for shipping to Image Graphics. (See Appendix II for information on the reel programming). A relatively new division of Image Graphics, the Image Graphics Recording Center, carried out the COM conversion services. Using the Micrographics EBR System 3000 electron beam recorder manufactured by Image Graphics, the Center recorded the digital images directly from 8 mm Exabyte tapes to 35mm Kodak Image Link HQ microfilm. Electron beam recorders tend to offer better resolution, speed, and dynamic range than other COM recorders (utilizing laser and CRT technology). The electron beam produces a smaller spot (4-6 microns) and is capable of supporting higher resolutions. The equipment also has fewer moving parts and greater flexibility (e.g., multiple film formats). The EBR software controls the reduction ratio, density, image placement, and required spacing (between images, frames, and volumes) on each roll of film. Although the EBR is capable of processing digital data up to 1000 dpi at 24x, the specifications for Cornell were set to 600 dpi, with variable reduction ratios ranging from 6x to 10x. 
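As a rough illustration of how scanning resolution and reduction ratio relate to the recorder's spot size, the sketch below computes the width that one scanned pixel occupies on film. This is simple geometry under the stated settings (600 dpi, 6x to 10x), not a measurement reported by the project, and the function name is hypothetical.

```python
MICRONS_PER_INCH = 25_400.0


def pixel_pitch_on_film_um(dpi: float, reduction_ratio: float) -> float:
    """Width in microns that one scanned pixel occupies on film after reduction."""
    pitch_on_paper_um = MICRONS_PER_INCH / dpi
    return pitch_on_paper_um / reduction_ratio


if __name__ == "__main__":
    for ratio in (6, 10):
        pitch = pixel_pitch_on_film_um(600, ratio)
        print(f"600 dpi at {ratio}x: {pitch:.2f} microns per pixel on film")
```

At 600 dpi the pitch on film works out to about 7.1 microns at 6x and about 4.2 microns at 10x, which can be compared with the 4-6 micron spot cited for the electron beam.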
With these EBR settings, Image Graphics reported recording speeds of less than 4 seconds per page. The Center also assumed responsibility for the initial quality control inspection on a PEPCO MFI-Type R Microfiche Inspector for roll film; conducted density readings using a TD 504, 2mm densitometer; and submitted sample film for third party testing for the presence of residual thiosulfate (methylene blue tests). Cornell conducted a visual inspection of all microfilm on a light box under magnification, took resolution readings and density checks, and returned to IGRC any film that failed to pass inspection. (See Appendix III for information on Image Graphics and Micrographics EBR System 3000.) The scanning of 450,000 images was completed in June 1996, and the last of the COM produced by August 1996. The members of the Technical Advisory Committee (TAC) met for a second time in September at Yale University where they were able to compare and contrast the findings from the Cornell and Yale projects. The TAC reached a number of findings and recommendations at this meeting. FINDINGS AND RECOMMENDATIONS: QUALITY QUALITY FINDING NO. 1: 35MM COMPUTER OUTPUT MICROFILM CREATED FROM 600 DPI 1-BIT IMAGES SCANNED FROM BRITTLE BOOKS CAN MEET OR EXCEED ANSI/AIIM STANDARDS FOR IMAGE QUALITY AND PERMANENCE. The ANSI/AIIM standards cover a wide range of issues, including: the preparation of documents; composition of the film stock; quality of image capture as defined by reduction ratio, image placement, resolution, and density; film processing; and storage. Although there were a number of technical and procedural problems encountered in the process, Cornell’s inspection of the COM revealed a body of film that was of high quality, both in its overall consistency and faithful representation of text, line art, and halftones. 
The COM compared favorably with, and in some cases exceeded, the quality of film produced via traditional, high contrast processes, particularly in the rendering of halftones. Throughout the course of this project, 18 of 177 reels (10%) failed to pass inspection. The majority of problems were encountered in the first 50 reels, and once these problems were resolved, the reject rate was remarkably low (4%). All reels eventually passed inspection as the functional equivalent of standard preservation microfilm.
QUALITY FINDING NO. 2: NO DETECTABLE LOSS OF RESOLUTION WAS OBSERVED IN RECORDING THE DIGITAL IMAGES ONTO COM. Having determined that 600 dpi bitonal scanning could produce digital files that faithfully rendered all textual information contained in brittle books, Cornell was interested in determining whether there was any loss of detail in recording those files onto COM. Cornell used three resolution test targets during scanning to evaluate scanner performance and recorded those targets onto the COM. The targets included were the RIT Alphanumeric Test Object superimposed on the IEEE Std 167-A-1987, and AIIM Scanner Test Chart #2. The RIT target, which consists of block characters and numbers represented in two directions, was judged to be the most useful target for measuring the effective resolution achieved on the COM. (9) Cornell staff also conducted subjective evaluation of the COM rendering of the smallest lower-case "e" contained in a volume, using the ANSI/AIIM Quality Index rating for microfilm inspection. We visually inspected the COM on a light box under 100x magnification. In all cases, the images met the "high quality" standard for Quality Index (8.0) in the rendering of the smallest "e" and RIT target readings on the COM ranged from line 8 through line 15, which proved identical to those read on-screen during quality control of the digital images. (10) Cornell did not discern any drop in resolution from the digital images to the microfilm copy. Given the capabilities of the COM recording device, the Electron Beam Recorder from Image Graphics, to record extremely fine resolution with excellent image acuity, virtually all of the information in the 600 dpi 1-bit images could be represented on the 35mm microfilm at reduction ratios between 5x and 10x. This is in sharp contrast to other forms of copying, where one can expect image degradation of 10% or greater when reformatting from one medium to the next or one generation to the next. 
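The Quality Index benchmark referred to above can be sketched numerically. This report does not state the formula; the sketch assumes the bitonal digital Quality Index relation QI = (dpi x 0.039h) / 3, where h is the height in millimeters of the smallest significant character and 0.039 is a rounded millimeter-to-inch conversion. The formula, the constant, and the function names should all be read as assumptions. Only the 8.0 "high quality" threshold is taken from this report.

```python
MM_TO_INCH = 0.039  # rounded conversion factor in the assumed formula


def bitonal_quality_index(dpi: float, char_height_mm: float) -> float:
    """Assumed bitonal digital QI: (dpi * 0.039 * h) / 3, with h in millimeters."""
    return dpi * MM_TO_INCH * char_height_mm / 3


def min_char_height_mm(dpi: float, target_qi: float = 8.0) -> float:
    """Smallest character height (mm) that reaches target_qi at the given dpi."""
    return 3 * target_qi / (dpi * MM_TO_INCH)


if __name__ == "__main__":
    print(f"QI at 600 dpi for a 1.0 mm character: {bitonal_quality_index(600, 1.0):.1f}")
    print(f"Height reaching QI 8.0 at 600 dpi: {min_char_height_mm(600):.2f} mm")
```

Under these assumptions, a 1.0 mm character scanned at 600 dpi scores about 7.8, close to the 8.0 threshold, and characters of roughly 1.03 mm and above reach it.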
To measure the COM's ability to duplicate halftones, we inspected the AIIM target with a 10x loupe to determine the number of distinguishable halftone wedges, and recorded the appropriate rating. Staff noted wide variation on the AIIM target readings, from 110 to 100, 85, 65, and even 0. Investigating the cause of these inconsistencies, staff inspected the paper facsimiles corresponding to the titles with low AIIM readings. It was discovered that those titles had consistently low levels of contrast between the text and background, with light text or darkened paper or both. Staff also inspected the quality of the halftones in those facsimiles, and closely examined detail in the same halftones appearing on the COM. It appeared that the illustrations in the facsimiles and the COM were represented as well or better than one would expect from a traditional camera producing high contrast preservation microfilm. Cornell concluded that a consistent AIIM target reading was not an effective measure of the COM's ability to faithfully duplicate halftones when the scanner had been set to optimize capture for volumes with low contrast. Cornell recommends further inquiry into ways to measure the ability of a COM recorder to accurately duplicate halftones. One possibility would be to define a distinct range of acceptable AIIM target readings for high-contrast, medium-contrast, and low-contrast material. QUALITY FINDING NO. 3: THE QUALITY OF DIGITAL IMAGES CREATED AT THE SAME RESOLUTION AND BIT DEPTH WILL BE SUPERIOR WHEN BRITTLE BOOKS ARE SCANNED FROM PAPER RATHER THAN FROM MICROFILM COPIES. With the assistance of Yale University, Cornell conducted a comparison of the quality of digital files produced from scanning paper versus film in bitonal mode. Challenge Industries, an Ithaca-based firm that produces preservation quality 35mm microfilm, filmed the same five books that had been scanned as the sample test for COM production in the RFP process. 
This roll of conventional negative film was sent to Yale, which produced a 2N copy and scanned both versions using the Mekel M400 scanner. The TIFF files created in this film-to-digital process were saved to optical disk, and printouts from these digital files, as well as those scanned directly from the paper originals, were produced at Cornell on the Xerox Docutech at 600 dpi resolution. Staff conducted an on-screen, side-by-side evaluation of the two digital files at full pixel display (100%) and compared the images to the original books. They also compared the printouts. Finally, the two sets of images were processed through the Xerox TextBridge 2.0 Optical Character Recognition (OCR) program. The OCR process resulted in 100% and 99.3% text accuracy for the paper-scanned versions of the two pages. The microfilm-scanned versions resulted in text accuracy rates of 98.7% and 99% respectively for the same two pages. These differences are slight, but if one were interested in creating text files from digital images, they suggest that direct scanning from paper may prove more accurate and less expensive than running OCR on digital images scanned from microfilm. Figures 1 through 4 reflect the difference in the quality of recording text, fine detail, and halftone information from paper and film. The difference in the presentation of text-based information is most evident in the "thickening" of stroke widths on images scanned from the film. Some smaller hand-produced characters, measuring 0.6 mm and 0.4 mm, are inadequately rendered in digital images from film, while faithfully represented in the digital version scanned from paper (Figures 1 and 2). The most obvious difference is seen in the production of the halftones (Figures 3 and 4). High contrast microfilm, and current bitonal film scanners, cannot do justice to many halftones, as the following illustrations indicate.
The version created directly from paper, however, retains much of the detail present in the original book. In this case, the halftone had received special "windowing" and processing on the XDOD scanner utilizing a descreening and rescreening filter to capture the halftone while treating the text portion of the page as text/lineart. (11) No such enhancement capabilities exist for conventional high contrast microfilming or are yet available in film scanners. Yale’s study confirmed these findings. "Scanning from the original, if permitted by the condition of the original and its size," Paul Conway noted in the final report, "will almost always produce better quality results than scanning from a microfilm intermediary." (12) Yale’s goal in Project Open Book was to aim for legibility and to determine what quality could be produced in a production environment where costs could be minimized. Cornell’s aim was slightly different: to create digitized images of sufficient quality in order to produce COM that could serve as the functional equivalent of preservation microfilm. QUALITY FINDING NO. 4: COM CAN BE SCANNED TO REPRODUCE HIGH QUALITY DIGITAL IMAGES IN THE EVENT THAT THE ORIGINAL DIGITAL FILES BECOME UNREADABLE. Staff at Yale University graciously scanned samples of the COM using the Mekel M400 film scanner. The quality of the digital images converted from COM was comparable to the quality of digital images produced from conventional microfilm, especially for printed text and line art, although there were aliasing and moire patterns introduced in the reproduction of some halftone information. Conventional microfilm suffers from similar problems when scanned in bitonal mode. (13) As noted above, bitonal film scanners currently do not have the same enhancement capabilities that are available on flatbed scanners, especially the XDOD scanner used in the COM project, although work in this area is underway. 
(14) An alternative approach may be to use grayscale film scanning for microfilm or COM that contains significant halftone, photographic, or fine lineart information. Preliminary tests utilizing grayscale film scanning offer promising results, but the resulting file sizes and costs are likely to be significantly higher in the near term than those incurred with bitonal scanning (see process section). Given its consistent image size, placement, frame spacing, and density, the use of COM can expedite the rescanning process, which suggests that film quality and consistency may have a great impact on the costs of conversion to create suitable digital images from film. (15)
RECOMMENDATION: STANDARDS FOR COM PRODUCTION AND INSPECTION MUST BE DEVELOPED AND ADHERED TO BY INSTITUTIONS AND SERVICE BUREAUS ALIKE. THE TECHNICAL ADVISORY COMMITTEE TO THE PROJECT RECOMMENDS THAT QUALITY STANDARDS FOR DIGITAL IMAGING OF PAPER SOURCE DOCUMENTS BE DEVELOPED, AND THAT MODIFICATIONS BE MADE TO THE STANDARD MICROFILM QUALITY CONTROL PRACTICES FOR EVALUATING DENSITY, RESOLUTION, REDUCTION RATIOS, TARGETS, FILM SIZE, AND BIBLIOGRAPHIC COMPLETENESS OF COM.
Although COM can meet preservation microfilm standards, procedures for production and inspection of the COM will differ from those appropriate to conventional microfilm. Significant changes in film creation and quality control are introduced in COM recording. Images are generated digitally, not photographically, and the factors affecting image quality, such as resolution and density, are determined upstream, at the point of scanning, and not at the point of filming. This has significant ramifications for final film inspection. The quality of the resulting COM will in large measure be determined by the quality of the initial scanning, not the film recording.
It is imperative, therefore, that adequate settings (e.g., resolution and bit depth) be established and used to capture fully the significant information contained in the source documents, and that a rigorous scanning quality control process be instituted, with visual inspection occurring both on-screen and via printouts from the digital images. Although there are currently no formal standards governing quality for digital imaging, work on this front is occurring. The Research Libraries Group, Inc. has established a working group on digital image capture requirements, and a report on their findings is expected by July 1998. A number of institutions and organizations have produced internal guidelines for digital image capture. (16) These efforts should be assessed by the broader preservation community so as to develop quality standards for digital imaging of paper source documents. Rigorous quality control processes should also be established. The Association for Information and Image Management has published guidelines for quality control of image scanners, which include information on the use of targets, and Cornell’s Department of Preservation has published recommendations on verifying image quality. (17) In reviewing the findings on image quality and COM inspection from this project, the Technical Advisory Committee recommends that the following modifications be made to the technical and bibliographic inspection procedures for preservation microfilming, as defined in the RLG Preservation Microfilming Manual: (18)
Bibliographic Characteristics:
FINDINGS AND RECOMMENDATIONS: COST
COST FINDING NO. 1: IN A HYBRID PROGRAM TO CREATE BOTH MICROFILM AND DIGITAL IMAGES, THE COSTS ASSOCIATED WITH THE SCAN FIRST APPROACH APPEAR TO BE LESS THAN THOSE INCURRED IN THE FILM FIRST APPROACH. IF EXTANT FILM IS SCANNED, AS WAS THE CASE IN PROJECT OPEN BOOK, THEN THE COSTS FAVOR THE FILM FIRST APPROACH. IF ONLY DIGITAL IMAGES ARE TO BE PRODUCED, THE COSTS OF SCANNING FROM PAPER VERSUS FILM ARE COMPARABLE. HOWEVER, THE COST FIGURES PRODUCED BY BOTH CORNELL AND YALE REFLECT THE NATURE OF DEMONSTRATION PROJECTS RATHER THAN FULL PRODUCTION PROCESSES.
Yale conducted a very extensive cost study, the approach and findings of which are presented in its final report to the National Endowment for the Humanities. Early in both projects, Cornell and Yale agreed to collect data on the primary sub-processes of digital conversion. Yale took the lead in establishing a cost study model, and gathered time and cost statistics for the first 600 volumes scanned from microfilm. Yale also calculated the costs of equipment purchase, lease, maintenance, and replacement. Cornell undertook a more modest data gathering effort in March and July 1995 during a typical production phase of the project. Data was collected in the following categories: preparation, scanning, file management, tape creation, and COM inspection. These categories roughly correspond to the categories used in the Yale Cost Study. For purposes of comparison with Yale’s Project Open Book, Cornell calculated a "Yale adjusted" mean time to reflect the difference in the average size of books scanned at Yale (216 pages) and Cornell (341 pages). Cornell also calculated costs based on Yale’s combined hourly wage/benefits rate of $15.38 ($0.2563/minute). Equipment and maintenance costs were calculated for the XDOD scanning system only, even though the first 600 volumes were scanned using the prototype CLASS system.
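The labor cost arithmetic described above can be sketched as follows. The hourly rate and the mean page counts come from this report. The linear scaling by page count is an assumption about how a "Yale adjusted" time might be derived, since the report does not spell out the adjustment method, and the 60-minute sample time is hypothetical.

```python
YALE_HOURLY_RATE = 15.38   # dollars per hour, wage plus benefits (from the report)
YALE_MEAN_PAGES = 216      # average volume size at Yale (from the report)
CORNELL_MEAN_PAGES = 341   # average volume size at Cornell (from the report)


def per_minute_rate(hourly_rate: float) -> float:
    """Convert an hourly labor rate to a per-minute rate."""
    return hourly_rate / 60


def yale_adjusted_minutes(cornell_minutes: float) -> float:
    """ASSUMPTION: scale a per-volume time linearly by the ratio of mean page counts."""
    return cornell_minutes * YALE_MEAN_PAGES / CORNELL_MEAN_PAGES


def labor_cost(minutes: float, hourly_rate: float = YALE_HOURLY_RATE) -> float:
    """Labor cost in dollars for a task lasting the given number of minutes."""
    return minutes * per_minute_rate(hourly_rate)


if __name__ == "__main__":
    print(f"Rate per minute: ${per_minute_rate(YALE_HOURLY_RATE):.4f}")
    sample_minutes = 60.0  # hypothetical per-volume time, not project data
    adjusted = yale_adjusted_minutes(sample_minutes)
    print(f"Adjusted time: {adjusted:.1f} min, cost: ${labor_cost(adjusted):.2f}")
```

The per-minute rate reproduces the $0.2563 figure quoted above; the adjusted time and cost depend entirely on the assumed scaling and sample input.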
Cornell also calculated equipment costs for file management but did not calculate equipment costs associated with the production of COM, which are reflected in the per frame charges. Cornell used Yale’s method of equipment calculations for purchase, maintenance, and replacement. The equipment costs reported here and in Project Open Book document costs for a specific project conducted during a specific period and may not be generalizable to other projects or to the current costs of comparable equipment. The costs of COM production were based on a fixed project rate negotiated with Image Graphics, plus additional charges incurred for shipping and one-up recording of certain targets and page fold outs. The costs associated with project management, systems programming, facilities, and equipment down time due to the conversion from CLASS to XDOD scanners were not calculated, as these were seen as specific to the early ramp up phase of the project. (Yale did not record this kind of information either.) Finally, Cornell included typical costs associated with the creation of preservation quality microfilm in order to provide comparative data on an end-to-end hybrid process for a scan first versus a film first approach. Five tables are presented here that detail the findings from the Cornell cost study and offer comparisons to Yale’s findings in Project Open Book. Table 1 presents time and cost figures associated with the labor to collate, scan, index, and prepare digital files, as well as to inspect COM (the charges for COM recording are reflected here as well). The time figures were based on the number of volumes that could be processed at the various stages during a specific period. For instance, preparation figures are based on a sample of 150 volumes; scanning on a sample of 45 volumes; reel programming on 21 reels containing 120 volumes; and COM inspection on 11 reels, containing 4-10 volumes each. 
(20) Table 1 indicates that the labor cost in an end-to-end process to create microfilm from digital imagery averaged around $.30/image in this project.
TABLE 1: CORNELL DIGITAL TO MICROFILM CONVERSION PROCESSES: TIME AND COSTS
Table 1 Notes: 1) Mean time. The mean time for each process has been calculated, although it is worth noting that the difference between the high and low figures in each process can be significant. Mean time (Yale adjusted) was calculated based on the difference in the average number of pages/volume in the Cornell and Yale studies. Cornell volumes averaged 341 pages; Yale volumes averaged 216 pages (63.3% of the size of the Cornell volumes). When volume size was relevant, the Yale adjusted mean times were used. 2) Process Cost. Again for comparison purposes, the translation of time spent to cost was based on the Yale combined hourly wage/benefits rate of $15.38 per hour or $0.2563 per minute. (See Conway, Appendix 7, "Cost Study Model and Principal Data.") The per image costs are calculated by dividing the per book costs by 216, the average page length of books scanned at Yale. 3) Preparation. Cornell recorded times for activities distinctive to preparing volumes for scanning (as opposed to microfilming). Each of the volumes had to be disbound and the binder’s margin trimmed for scanning on the XDOD. The structuring/indexing information was put on a work form that accompanied each volume. Reel programming was done at this stage as well, but could have been handled by Image Graphics at the point of COM recording. See Section on Scanning Preparation. The basic preparation costs common to both microfilming and scanning (e.g., selection, retrieval, collation, target preparation, etc.) were derived from times reported by Patti McClung in her landmark study "Costs Associated with Preservation Microfilming." Library Resources & Technical Services (Oct/Dec 1986, p. 363-374). 4) Scanning. Includes the costs for capturing technical and bibliographic targets, book set up, quality control, and two forms of scanning. The first form of scanning was done in an auto-mode in which standard settings were used to capture all pages of the volume. 
The second form of scanning, "manual mode," involved windowing halftone information on a page and treating it differently from the surrounding text. Only books containing halftones that were considered significant to the meaning of the text received such treatment. Mann Library staff devised guidelines for determining which books would receive such treatment (see Appendix I on Scanning Preparation). 5) Indexing. The proprietary Xerox Documents on Demand software was utilized to "structure" and paginate each book. Cornell used two hierarchical levels of tags. The base-level tag was used to paginate a book, matching image file names with the page numbers that appeared in the original book (including Roman numerals, no pagination, duplicate page numbers). The second-level tag clustered the pages into groups, such as title page, table of contents, text, index, back matter. Appendix I contains an example of the structuring tags applied to a typical book in this project, and the list of standard terms used to refer to these units. 6) File Management. Activities associated with this step involve setting up for batch move of image files and moving the relevant targets. 7) Tape Creation. Includes creating work form, moving images to staging area, preparing and running tape creation script and log file for quality control, and running the tape generation script. See Appendix II on Reel Programming. 8) COM Production. Cornell held a contract with Image Graphics to record the digital images onto COM at $0.09/image. Additional costs included shipment of the completed COM and costs associated with one-up imaging of certain targets and foldouts. The per image cost therefore totaled $0.096. 9) COM Inspection. Staff recorded the time spent in performing density and resolution checks as well as the visual inspection over the light box. 
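The conversions described in the Table 1 notes can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the worksheet used in the project: the function names are hypothetical, and the 100-minute sample time is invented for the example. The hourly rate, the two average page counts, and the divisor of 216 come from the notes above.

```python
# Illustrative sketch of the Table 1 arithmetic; all names are hypothetical.
YALE_HOURLY_RATE = 15.38   # combined wage/benefits rate, dollars per hour
YALE_PAGES = 216           # average pages per volume scanned at Yale
CORNELL_PAGES = 341        # average pages per volume scanned at Cornell


def rate_per_minute(hourly_rate=YALE_HOURLY_RATE):
    """Convert the hourly rate to dollars per minute (about $0.2563)."""
    return hourly_rate / 60


def yale_adjusted_minutes(cornell_minutes):
    """Scale a Cornell mean time to the size of an average Yale volume."""
    return cornell_minutes * YALE_PAGES / CORNELL_PAGES


def process_cost(minutes):
    """Labor cost in dollars for a process taking the given minutes."""
    return minutes * rate_per_minute()


def per_image_cost(per_book_cost):
    """Spread a per-book cost over the 216 pages of the Yale-adjusted book."""
    return per_book_cost / YALE_PAGES


if __name__ == "__main__":
    sample_minutes = 100.0  # invented value, for illustration only
    adjusted = yale_adjusted_minutes(sample_minutes)
    book_cost = process_cost(adjusted)
    print(f"adjusted time: {adjusted:.1f} min")
    print(f"cost per book: ${book_cost:.2f}")
    print(f"cost per image: ${per_image_cost(book_cost):.4f}")
```

The adjustment applies only where volume size affects the time taken, as note 1 states; a process whose time does not depend on page count would skip `yale_adjusted_minutes`.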
Table 2 details the costs of hardware, software, maintenance, network connection, and optical storage media for the paper scanning and reel programming to create COM. Costs associated with equipment replacement are based on the Yale model that assumes an additional 50 percent replacement surcharge for increased functionality. The equipment costs are then calculated on a per book and per image basis, using the adjusted Yale figures for a 216 page book. The costs of COM recording equipment are subsumed under the per frame charge, reported in Table 1. The bottom line is that the average per image cost for equipment ranges from 6.6 cents to 8.1 cents.
TABLE 2: CORNELL ANNUAL EQUIPMENT COSTS
Table 2 Notes: 1. Per book costs are based upon estimates of scanning production times for a year, for a single daily shift. A scanning technician works 215 days/year x 7.3 hours/day = 1,570 hours/year. The actual scanning production time is estimated at 75% of capacity, allowing for some down time, meetings, phone calls, etc. We assumed, therefore, full production capacity to be 1177.5 hours/year. A book scanned and indexed in auto mode takes 1.078 hours (64.7 minutes), so 1,092 books/year can be scanned in auto mode. A book scanned and indexed in manual mode takes 1.36 hours (81.8 minutes), so 864 books/year can be scanned in manual mode. 2. Per image costs are calculated by dividing the per book costs by 216, the average number of pages in books scanned at Yale. 3. Equipment. Cornell used Yale’s methods for calculating equipment costs. A system is assumed to have a five-year life, so the annual equipment cost is one-fifth the purchase price. The XDOD system, including scanner, optical storage subsystem, computer, monitor, and software came to $26,930, with a partnership discount of $10,000. The scanning equipment costs would be 37% higher without this discount. Annual charges for maintenance agreements reflect actual costs of the Xerox contract covering the XDOD scanning system. 4. Hardware replacement calculated at fifty percent of annual equipment cost. Assumes decline in costs for similar functionality upon replacement. 5. File management and tape creation was handled on a Sun Sparc workstation, which cost $15,000 with peripherals. If the system operates 1,570 hours/year, and each volume takes 9.8 minutes (or .163 hours) for file management and tape creation, then the equipment cost is the annual equipment cost ($3,000) divided by number of hours per year (1,570) times the amount of time per volume for file management and tape creation (.163), which equals $0.31 per book. Maintenance and replacement costs are similarly calculated. 6. Optical media. 
Cornell uses 600 MB Magneto Optical Disks, which cost $80 each. Each volume requires around 15 MB, for a total of $2/book. 7. Ethernet connection. The monthly charge for a line is $9, for a total of $108/year.
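The Table 2 notes describe a chain of small calculations that is easier to follow when written out. The sketch below reproduces that arithmetic in Python using only the figures given in the notes. The function names are hypothetical, and it assumes the rounded 1,570 hours/year from note 1 rather than the exact product of 215 days and 7.3 hours.

```python
# Illustrative sketch of the Table 2 arithmetic; all names are hypothetical.
HOURS_PER_YEAR = 1570      # 215 days x 7.3 hours/day, as rounded in note 1
PRODUCTION_SHARE = 0.75    # share of working time spent actually scanning
SYSTEM_LIFE_YEARS = 5      # assumed equipment life in the Yale method


def production_hours():
    """Annual scanning hours after allowing for down time and meetings."""
    return HOURS_PER_YEAR * PRODUCTION_SHARE


def books_per_year(minutes_per_book):
    """Books one technician can scan and index in a year."""
    return round(production_hours() / (minutes_per_book / 60))


def annual_equipment_cost(purchase_price):
    """Purchase price spread evenly over the assumed system life."""
    return purchase_price / SYSTEM_LIFE_YEARS


def workstation_cost_per_book(purchase_price, hours_per_book):
    """Share of the annual workstation cost used by one volume (note 5)."""
    return annual_equipment_cost(purchase_price) / HOURS_PER_YEAR * hours_per_book


def optical_media_cost_per_book(disk_price=80, disk_mb=600, mb_per_book=15):
    """Cost of the disk space one volume occupies (note 6)."""
    return disk_price / disk_mb * mb_per_book


if __name__ == "__main__":
    print("auto mode books/year:  ", books_per_year(64.7))
    print("manual mode books/year:", books_per_year(81.8))
    print("Sun workstation $/book:",
          round(workstation_cost_per_book(15_000, 0.163), 2))
    print("optical media $/book:  ",
          round(optical_media_cost_per_book(), 2))
```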
Table 3 compares the labor and equipment costs reported in the Cornell and Yale projects to produce digital images. The costs to create either the microfilm or the COM are not included here. The Cornell/Yale findings indicate that the costs to produce digital images from either microfilm or paper are comparable. This table also indicates that there are great disparities in time and costs for the various steps involved in each approach. For instance, preparation time at Cornell was 78.8 minutes per volume, nearly 15 times the amount spent in preparation at Yale. This is because the Cornell project began with the original books and a good deal of the time was spent in identification, assemblage, collation, and physical preparation. With extant microfilm, many of those costs were incurred at the point of microfilming, not at the point of scanning. Second, the time and costs associated with the actual scanning indicate that it takes somewhere between 50% and 90% longer to scan from paper than it does from film. On the other hand, the time and costs associated with indexing, file management, and equipment were significantly greater when scanning film than scanning paper. In the end, the time and costs associated with the two approaches were remarkably similar.
TABLE 3. PRODUCING DIGITAL IMAGES FROM PAPER VS. MICROFILM
Table 3 Notes: 1. The time and costs are adjusted for comparison purposes to represent a 216 page book and an average salary/benefits rate of $.2563/minute, used at Yale. 2. Figures for producing digital images from microfilm are taken from Conversion from Microfilm to Digital Imagery, Performance Report, Appendix 7, Digital Image Conversion Processes: Time and Costs. 3. Preparation for paper scanning from Table 1, minus the time to reel program the pages. 4. Indexing. Cornell structured the self-referencing portions of a book (e.g., title page, table of contents, list of illustrations, index, bibliography); Yale structured to the chapter level, one level deeper. 5. Other. For Cornell includes File Management (does not include tape creation or COM inspection). 6. Equipment costs. Yale provided two equipment rates, based on high and low capacity production.
Table 4 compares the labor and equipment costs reported in the Cornell and Yale projects. Note that the costs in the Cornell project ran 50-60% more than the costs of the Yale project, although the costs differed greatly at various stages of production. The Yale Project began with extant microfilm and resulted in the production of digital images. The Cornell project began with the original paper documents and resulted in the production of both COM and digital images. This table does not include the costs of creating the microfilm in the first place (Table 5 compares costs of full, end-to-end projects).
TABLE 4. COMPARING CORNELL’S COM PROJECT TO YALE’S PROJECT OPEN BOOK
Table 4 Notes: 1. The time and costs are adjusted for comparison purposes to represent a 216 page book and an average salary/benefits rate of $.2563/minute, used at Yale. 2. Figures for producing digital images from microfilm are taken from Yale’s final report. 3. Indexing. Cornell structured the self-referencing portions of a book (e.g., title page, table of contents, list of illustrations, index, bibliography); Yale structured to the chapter level, which was one level of indexing deeper than Cornell’s. 4. Other category includes for Cornell: File Management, Tape Creation, and COM Inspection. For Yale, it includes: Quality Control, Registration, File Management. 5. Equipment costs. Yale provided two equipment rates, based on high and low capacity production.
Table 5 presents costs for a full, end-to-end project to create both microfilm and digital imagery, comparing the scan first versus the film first approach, when beginning with the original source documents. The figures for labor, equipment, and contractual services are drawn from Cornell and Yale, but they may be more indicative than concrete regarding the relative costs associated with each approach. Obviously, these costs may vary dramatically as research institutions move from demonstration to production levels, and as more of the work is outsourced. Nonetheless, this table indicates that the costs of the film first hybrid approach averaged 20% more than those of the scan first approach, when all costs associated with microfilming and digital imaging were taken into consideration.
TABLE 5. COSTS ASSOCIATED WITH THE HYBRID APPROACH: SCAN FIRST VS FILM FIRST
Table 5 Notes: 1. The time and costs are adjusted for comparison purposes to represent a 216 page book and an average salary/benefits rate of $.2563/minute, used at Yale. 2. Figures for producing digital images from microfilm are taken from Conversion from Microfilm to Digital Imagery, Performance Report, Appendix 7, Digital Image Conversion Processes: Time and Costs. 3. Preparation. For Yale, this would include the costs of preparing the microfilm; figure based on Patti McClung’s landmark study, "Costs Associated with Preservation Microfilming," Library Resources & Technical Services (Oct/Dec 1986, p. 363-374). 4. Microfilm costs are based on costs for creating preservation quality microfilm at $.28/frame, which is what Cornell paid for filming contract costs in 1994-95. 5. Microfilm inspection costs are calculated on the mean inspection time as reported in Patti McClung’s study, with costs calculated on the Yale rate of $.2563/minute. 6. Indexing. Cornell structured the self-referencing portions of a book (e.g., title page, table of contents, list of illustrations, index, bibliography); Yale structured to the chapter level, which was one level of indexing deeper than Cornell’s. 7. Other category includes for Cornell: File Management, Tape Creation, and COM Inspection. For Yale, it includes: Quality Control, Registration, File Management. 8. Equipment cost. Yale provided two equipment rates, based on high and low capacity production.
COST FINDING NO. 2: CREATING IN-HOUSE SCANNING SERVICES MAY NOT BE AS COST-EFFECTIVE AS OUTSOURCING THE WORK, PROVIDED CLEAR GUIDELINES ARE DEVELOPED TO ENSURE COMPLIANCE WITH QUALITY (AND PRICING) REQUIREMENTS SUITABLE TO LIBRARY AND ARCHIVAL APPLICATIONS. In the fall of 1995, the Association of Research Libraries distributed a questionnaire to its 119 member libraries, inquiring about the extent of digital imaging for preservation purposes. The results of this survey were published as SPEC Kit 214, Digitizing Technologies for Preservation (March 1996). Of the 78 responding institutions, 29 (37%) reported undertaking digital projects for preservation. Most projects relied on in-house conversion, but in 8 of the 42 projects cited, digital conversion was outsourced to a service bureau. Although this figure represents only 19% of all projects, the number of documents converted through outsourcing comprised over 60% of the total volume. The largest digital imaging efforts in the humanities currently underway—the Mellon-inspired JSTOR project undertaken by the University of Michigan (approaching 2 million images), the Library of Congress’ National Digital Library Project (5 million images), and the Cornell/University of Michigan Making of America Project (1.5 million images)—are all outsourcing a range of digital imaging services. According to the editor of Imaging Service Bureau News, there are over 2,000 imaging bureaus in the United States, and well over half offer digital conversion capabilities. As more of them begin to provide products and services that meet the specific requirements of libraries and archives, the outsourcing of digital imaging services will likely follow the trend set in preservation microfilming. Two decades ago, many libraries operated their own in-house microfilming facilities to ensure a uniform product. Today most rely on the excellent service provided by those vendors who specifically cater to cultural institutions. 
Imaging is now in a similar transitional period. The majority of vendors are still marketing their services to the business community, but a number are beginning to view the library/archival community as a special market niche. As quality definitions are developed and "best practices" adopted, the move toward outsourcing may well become economically compelling. In his final report on the production phase of Project Open Book, Paul Conway argued that in-house conversion costs could be lowered if:
Unfortunately, few institutions are in a position to make an on-going commitment to develop, maintain, and upgrade scanning production facilities over a long period of time. During the past 9 years, Cornell has scanned over a million images in-house, and has developed considerable expertise in bitonal scanning. Two years ago, however, The Andrew W. Mellon Foundation funded the Making of America Project noted above. Cornell’s contribution to this project was to scan and make accessible 3,000 volumes of 19th century U.S. journals and monographs. For this project, Cornell used an outside service bureau to scan and index 900,000 pages within the course of 14 months. If the scanning had been done in-house, all in auto-mode, with the two XDOD workstations available, this project would have taken 2 years to complete, assuming that there was no staff turnover or equipment downtime. The in-house cost to perform this work would have averaged $.0149/image (excluding preparation costs, see Table 3). If Cornell were to add its indirect cost recovery rate of 60% to cover the cost of rent, utilities, and overhead, the cost would increase to $.0238/image. In the Making of America Project, Cornell is paying the service bureau $.0125/image for a product equivalent to the quality obtained in auto mode on the XDOD scanner.
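The in-house estimate above can be checked with a short calculation. The sketch below is an illustration only, with hypothetical function names. It assumes that the auto-mode throughput of 1,092 books per year per workstation (Table 2, note 1) refers to the 216-page Yale-adjusted book, which the report does not state explicitly, and it takes the per image cost and the 60% indirect rate exactly as quoted in the text.

```python
# Illustrative sketch; names are hypothetical and the throughput basis
# (216-page books) is an assumption, not a figure stated in the report.

def years_in_house(total_pages, workstations, books_per_year, pages_per_book):
    """Years needed to scan total_pages at the given annual throughput."""
    pages_per_year = workstations * books_per_year * pages_per_book
    return total_pages / pages_per_year


def with_indirect_costs(direct_cost, indirect_rate):
    """Add an indirect cost recovery rate to a direct cost."""
    return direct_cost * (1 + indirect_rate)


if __name__ == "__main__":
    duration = years_in_house(900_000, workstations=2,
                              books_per_year=1092, pages_per_book=216)
    print(f"estimated in-house duration: {duration:.1f} years")

    loaded = with_indirect_costs(0.0149, 0.60)
    print(f"in-house cost with indirect costs: ${loaded:.4f}/image")
```

Under these assumptions the duration comes out just under two years, in line with the estimate given above.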
RECOMMENDATION: THE FINDINGS FROM THE CORNELL COM PROJECT REPRESENT A FINANCIAL BENCHMARK AGAINST WHICH TO MEASURE COSTS ASSOCIATED WITH DEVELOPING AND MAINTAINING A DIGITAL ARCHIVING PROGRAM. Numerous conferences and reports have been dedicated to issues associated with digital archiving—ensuring continuing access to digital materials across hardware/software configurations and subsequent generations of computer technology. (21) The clearest articulation of these issues is provided in the Joint Task Force Report of the Research Libraries Group and the Commission on Preservation and Access, entitled Preserving Digital Information: Final Report and Recommendations. As the report makes clear, currently there are no agreed-upon processes or model institutional programs for preserving digital collections over time. (22) There is even less consensus on the costs of such efforts. Cost estimates for digital preservation range wildly, with some arguing that declines in storage costs will make archiving an economically compelling process, while others are claiming that the costs of digital migration will dwarf the costs of conversion in the first place. Most would agree with the position taken by the National Research Council in its 1995 report on preserving scientific information resources that "a successful archive [must be] affordable, durable, extensible, evolvable, and readily accessible." (23) Obtaining reliable information on costs of digital image archiving is difficult, and there are few published figures. The CPA/RLG joint report contains a Yale cost model for digital archives that is based on Project Open Book. The costs to store the digital files for one book are calculated at $2.58 for year one. The model projects that costs will decline annually, so that by decade’s end, the figure becomes $.82. The total cost over the ten year period for digital storage is estimated to be $15.37. 
(24) In a paper presented at a conference in Australia last year, Michael Lesk of Bellcore calculated that $4.00 would cover all future copying requirements to preserve a digital book on tape. (25) This led him to conclude that "if you can afford the first copy, you’ll be able to afford all the rest." A contradictory position is being offered by some government agencies. Unofficial figures from studies conducted by the Environmental Protection Agency peg the total costs of supplies, services, and hardware to maintain digital material for ten years at 4 to 7 times the cost of creation. For instance, if the total costs to create a digital image (preparation, contract service, indexing, and equipment) runs $20, then the cost to maintain that image over a decade would average $100. (26) If these figures prove true, it will probably be wiser to rescan the material at a later date when necessary than to try to migrate the digital images over time. The National Archives and Records Administration (NARA) Electronic Records Division has calculated the cost of archiving machine-readable databases (as opposed to digital images) at $30 per instance of refreshing. If the average database represents 60-70 megabytes (Mb) of information and it must be refreshed every five years, then the cost per gigabyte (Gb) of storage and maintenance would be $923.40 for ten years. This figure is comparable to estimates offered by an imaging service bureau that predicted the digital archiving of 1Gb of digital image information to be $100/year or $1000 for a decade. (27) How do these figures compare to the cost of creating COM as an archival backup? We can derive the costs from Table 1 by isolating those activities exclusively associated with COM creation: reel programming, tape generation, COM production, and COM inspection. These figures (for the Yale-adjusted book) would run $23.87 per volume, or $.116 per page image. 
If we included the costs of storing the COM in a secure, environmentally controlled vault, the per book cost would be $1.43 for a decade ($1 per year per reel, 7 books per reel), for a total cost to create and store a book on COM of $25.30, or $.117 per image. (28) This figure is 65% higher than the digital archiving figure of $15.37 for one book in the Yale cost model cited above, and over 6 times the cost for archiving a digital book on tape ($4) quoted by Michael Lesk. If we assume that the digital files for one book equal .015Gb of space (as reported in the Yale cost model, p. 57), and 1Gb of information costs $1,000 to archive for a decade, estimated by Northern Micrographics, Inc. (NMI), we could store 66 books in 1Gb and each book would cost $15.15—a figure remarkably similar to the Yale projection. If we applied the NARA estimate for database migration of $923.40 per Gb of information, then the price for each digital book drops to $13.99 for the decade. On the other hand, if we were to assume that the costs of archiving for a decade ran 5 times the amount of creation, as reported by the EPA, then the cost of maintaining the digital images for one book for a decade would be 5 times the cost to create the digital images (averaging $50/book, see Table 3)—or $250. This figure is nearly 10 times the cost of creating and storing a COM version. Table 6 below compares the relative costs of creating and storing COM to the various estimates for digital archiving offered by Yale, NARA, and others.
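The comparison in the preceding paragraphs rests on a handful of divisions, which the following Python sketch lays out side by side. It is an illustration under stated assumptions rather than the model behind the report's table: the names are hypothetical, 1 Gb is taken as 1,000 Mb, and the NARA database size is taken as 65 Mb, the midpoint of the 60-70 Mb range quoted. Computed that way the NARA figure comes to about $923 per Gb; the report's $923.40 appears to result from rounding the number of databases per Gb before multiplying.

```python
# Illustrative sketch of the ten-year cost comparison; names are hypothetical.
PAGES_PER_BOOK = 216

# COM: creation cost per book plus ten years of vault storage.
COM_CREATION = 23.87
COM_STORAGE = 10 * 1.00 / 7        # $1 per reel per year, 7 books per reel
COM_TOTAL = COM_CREATION + COM_STORAGE

# Whole books that fit in one gigabyte at .015 Gb per book.
BOOKS_PER_GB = int(1 / 0.015)


def per_book(cost_per_gb):
    """Ten-year archiving cost per book for a given cost per gigabyte."""
    return cost_per_gb / BOOKS_PER_GB


# NARA: $30 per refresh, about 65 Mb per database, two refreshes in ten years.
NARA_PER_GB = (1000 / 65) * 2 * 30

ESTIMATES = {
    "Lesk (tape)": 4.00,
    "NARA rate": per_book(923.40),
    "NMI rate": per_book(1000),
    "Yale cost model": 15.37,
    "EPA (5 x creation)": 5 * 50,
}

if __name__ == "__main__":
    print(f"COM, create and store: ${COM_TOTAL:.2f} per book, "
          f"${COM_TOTAL / PAGES_PER_BOOK:.3f} per image")
    for name, cost in ESTIMATES.items():
        print(f"{name:20s} ${cost:7.2f}  ({cost / COM_TOTAL:.2f} x COM)")
```

Printing each estimate as a multiple of the COM figure makes the spread visible at a glance, from well under the COM cost to roughly ten times it.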
TABLE 6: COMPARISON OF COM COSTS TO DIGITAL ARCHIVING
So what are we to make of this? It appears that cost estimates for archiving a digital book for a decade range from one sixth the cost of COM to ten times that amount. In this age of uncertainty regarding technical processes and associated costs, the costs to create and maintain COM can help put into perspective widely varying predictions for digital archiving. (29) If the lower figures prove true, as represented by Michael Lesk or even by the Yale cost model, then it makes good economic sense to develop and implement digital archiving programs now. If, on the other hand, the costs run ten times the amount of COM production, then COM may represent a viable interim preservation solution. The COM can serve as an analog safeguard to the digital images. If all goes well, the film will reside in a properly controlled environment in perpetuity, and the digital images will be migrated over time. In the event of a "digital disaster," digital files can be re-created by scanning the COM. The cost of COM scanning should be lower than rescanning the originals, provided that efficiencies of film scanning and indexing are introduced (see below).
FINDINGS AND RECOMMENDATIONS: PROCESS
PROCESS FINDING NO. 1: THE FILM FIRST AND SCAN FIRST APPROACHES ARE BOTH VIABLE SOLUTIONS IN A HYBRID PROGRAM. THE DECISION TO GO WITH ONE APPROACH OVER THE OTHER WILL DEPEND ON A RANGE OF VARIABLES ASSOCIATED WITH THE ATTRIBUTES OF THE ORIGINALS, INSTITUTIONAL CAPABILITIES, AND THE AVAILABILITY OF APPROPRIATE IMAGING PRODUCTS AND SERVICES. The film first approach offers the principal advantages of meeting preservation objectives on the film according to well-defined standards, and determining digital image quality requirements to meet current and future access objectives. As has been shown, however, the costs associated with the film first approach may be greater than those incurred in the scan first approach, and the quality may not be as high. 
Nonetheless, considerable cost savings can be achieved by scanning extant film, dependent on the quality and condition of the film itself. In the future, film should be consciously created with digital conversion in mind. A number of institutions, including the Deutsche Forschungsgemeinschaft (German Research Foundation), Harvard University, and the Australian Cooperative Digitization Project (ACDP) are preparing recommendations for the creation of microfilm for subsequent digitization. (30) Their findings should inform the preservation community in its efforts to determine the viability of the film first approach (see final recommendation for international conference on the hybrid approach).
Advantages of the Film First Approach
Drawbacks to the Film First Approach
Key challenges in using this approach will be to work with vendors to develop means for effectively scanning older film, to increase enhancement capabilities of bitonal scanning, to reduce costs of grayscale scanning, and to create new film in a manner that will expedite its eventual conversion to digital form.
The scan first approach offers the principal advantages of creating high quality digital images directly from originals, and producing microfilm from the images with little to no detectable loss in quality. With image enhancement capabilities, today’s document scanners can provide a cost-effective means to capture text and illustrations that are faithful to the originals. As a result, both the digital images and the COM can meet exacting image quality requirements. Depending upon the items selected for conversion, the scan first approach may result in higher quality, and lower costs, for creating both microfilm and digital images. The principal drawbacks are in converting oversize/bound volumes, ramping up institutional capabilities to manage digital imaging processes (both in scanning and COM production), and working with vendors to develop COM standards for preservation.
Advantages of the Scan-First Approach
Drawbacks to the Scan First Approach
Key challenges in using this approach will be to develop and adopt standards to cover image quality and permanence for both digital imagery and COM, and to encourage a range of vendors to offer COM services that can meet these specifications.
Table 7 compares the key conversion steps in the film first and scan first approaches.
PROCESS FINDING NO. 2: THE CORNELL AND YALE PROJECTS EVALUATED THE USE OF HIGH RESOLUTION BITONAL SCANNING TO PRODUCE DIGITAL REPRESENTATIONS OF BRITTLE BOOKS. FURTHER INVESTIGATION INTO THE QUALITY, PROCESSES, AND COSTS ASSOCIATED WITH GRAYSCALE AND COLOR SCANNING SHOULD BE CONDUCTED. There is great promise for improving image quality in grayscale or color scanning of film or paper, particularly for the capture of illustrated material. The same holds true for COM recording of grayscale or color image files. This improved quality, however, comes at the price of increased cost and image file sizes. Experience to date with grayscale/color film scanning or COM recording is limited—the Library of Congress, the Beinecke Library (Yale), Cornell, Harvard, the National Archives and Records Administration, the Swedish National Archives, the University of Florida, and other research institutions have conducted experimental tests—but as of this writing, only the Library of Congress has undertaken a production film scanning effort involving grayscale film scanning. (32) Many imaging service bureaus are in a similar experimental state, and are loath to quote specific prices for production work. As noted earlier, Image Graphics reported COM recording times of under 4 seconds per 600 dpi bitonal image. To record a 300 dpi 8-bit version of that image can take 18-20 seconds, with the per image price running 2 to 4 times the amount of the bitonal image. (33) Informal conversations with a number of film scanning vendors indicate that grayscale paper scanning can cost 4 times more than bitonal scanning, while grayscale film scanning can take 16-20 times longer than bitonal capture. These increases are generally attributable to the larger file sizes produced in grayscale and color imaging. 
The file size for an uncompressed grayscale image will run 8 times that of a bitonal image at the same resolution; an uncompressed color image will be 24 times larger than a bitonal image at the same resolution. The proportional file size differences will likely increase as these images are compressed. Because of the significant increase in expense and file size associated with grayscale/color imaging, bitonal film scanning and COM recording have received more attention. Unfortunately, bitonal film scanners currently lack the enhancement capabilities that paper scanners offer. There is great promise in the development of bitonal film scanning enhancement capabilities, especially "grayscale algorithms" that are optimized for the capture of illustrations. And, as noted earlier, Picture Elements has developed an Imaging Subsystem Engine (ISE) Board to be used with a range of scanners, including film scanners, that will add an array of features, such as grayscale deskew, grayscale scaling, highest quality binarization, as well as sophisticated manipulation of halftones and other illustrations. The resulting image quality and speeds of film scanning should improve as a result of this effort. Additional research into the quality/cost tradeoffs associated with grayscale/color film scanning and COM recording for library and archival materials must be conducted (see recommendations for conference, below).
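The file size ratios quoted above follow directly from bit depth, as a short calculation shows. The sketch below is a minimal illustration with hypothetical names. It assumes 1-bit bitonal, 8-bit grayscale, and 24-bit color pixels, ignores file headers and row padding, and uses a 6 x 9 inch page at 600 dpi purely as an example size that does not come from the report.

```python
# Illustrative sketch; the page size and function name are assumptions.

BIT_DEPTHS = {"bitonal": 1, "grayscale": 8, "color": 24}


def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw image size in bytes, ignoring headers and row padding."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel / 8


if __name__ == "__main__":
    base = uncompressed_bytes(6, 9, 600, BIT_DEPTHS["bitonal"])
    for mode, bits in BIT_DEPTHS.items():
        size = uncompressed_bytes(6, 9, 600, bits)
        print(f"{mode:10s} {size / 1_000_000:6.2f} MB  ({size / base:.0f} x bitonal)")
```

The ratios hold at any resolution as long as all three images are captured at the same one. They describe uncompressed files only, so they say nothing about how the gap changes after compression.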
PROCESS FINDING NO. 3: CORNELL ADOPTED A "REASONABLENESS" STANDARD FOR DETERMINING THE COPYRIGHT STATUS OF TWENTIETH CENTURY BRITTLE BOOKS TO BE INCLUDED IN A HYBRID APPROACH. Because this project focused on scanning and COM production for core agricultural literature, Mann Library wanted to include 20th century titles, which potentially could fall under copyright protection. Staff investigated alternative procedures for determining the copyright status of books published after 1920. Project staff conducted copyright searches both in Washington, DC in the files of the U.S. Copyright Office, and at Cornell in the printed Catalog of Copyright Entries (CCE), and compared results to determine the most efficient procedure. Searches (averaging 7 minutes per title) in the CCE were 97% in agreement overall with the results obtained from the more time-consuming searching (averaging 13 minutes per title) at the Copyright Office. CCE searches were 100% in agreement concerning instances of renewal of copyright. This finding calls into question the assumption that it is necessary to conduct such searches, at considerable cost, in the complex files of the Copyright Office. Mann Library staff have developed a CCE search procedure that they propose as a "reasonableness standard" for copyright searching (see Appendix IV). It is their belief that this approach demonstrates a legally responsible effort to respect the rights of copyright holders while advancing preservation aims. (34)
RECOMMENDATION: CORNELL AND YALE RECOMMEND THAT THE NATIONAL ENDOWMENT FOR THE HUMANITIES SUPPORT A HIGH LEVEL CONFERENCE TO ASSESS THE FINDINGS OF THEIR PROJECTS; TO MAKE RECOMMENDATIONS FOR BEST PRACTICES IN THE CREATION AND USE OF CONVENTIONAL MICROFILM AND COM IN A HYBRID APPROACH; TO CONSULT WITH VENDORS OF IMAGING SERVICES AND PRODUCTS IN ADOPTING THESE PRACTICES; TO IDENTIFY AREAS NEEDING ADDITIONAL RESEARCH AND DEVELOPMENT; AND TO EVALUATE THE ROLE OF THE HYBRID APPROACH IN BROADER DIGITAL PRESERVATION EFFORTS. THE PROCEEDINGS OF THIS CONFERENCE SHOULD BE PUBLISHED AND MADE WIDELY ACCESSIBLE IN PRINT AND VIA THE INTERNET.
These two projects have laid important groundwork in determining preservation requirements in a digital age, but there are remaining issues that must be resolved which will require the support and attention of the broader preservation community. As a first step in that direction, Cornell and Yale recommend that a high level conference be held in Washington, DC in early 1998, with representatives from: institutions that have undertaken hybrid projects, imaging service providers, key industry and technology developers, and preservation and cultural organizations. Given the amount of interest and effort this issue has generated world-wide, the conference should include representatives from other countries that have instituted hybrid projects, including Australia, Canada, Germany, Sweden, and the United Kingdom. The Technical Advisory Committee to the COM project strongly endorses this proposal. Among issues to be addressed by this conference are:
1. The report of the Joint RLG/CPA Task Force on Archiving of Digital Information provides the clearest articulation of issues associated with providing continuing access to digital information: http://www.rlg.org/ArchTF/
2. Nancy Elkington, editor, RLG Preservation Microfilming Handbook, The Research Libraries Group, Inc. (March 1992).
3. Paul Conway, Conversion of Microfilm to Digital Imagery: A Demonstration Project. Performance Report on the Production Conversion Phase of Project Open Book, Yale University Library, August 1996. See also Paul Conway, "Yale University Library’s Project Open Book: Preliminary Research Findings," D-Lib Magazine, February 1996 (http://www.dlib.org/dlib/february96/yale/02conway.html).
4. Elkington, op. cit., and ANSI/AIIM MS23-1991, Practice for Operational Procedures/Inspection and Quality Control of First-generation, Silver Microfilm and Documents, Association for Information and Image Management.
5. Anne R. Kenney, "Digital-to-Microfilm Conversion: An Interim Preservation Solution," Library Resources & Technical Services (Oct 1993) pp. 380-402; (January 1994) pp. 87-95; ANSI/AIIM TR26-1993, Resolution as it Relates to Photographic and Electronic Imaging, Association for Information and Image Management.
6. Cornell University Department of Preservation and Conservation, RFP for Digital COM Services, available at ftp://lyra.stanford.edu/pub/diglib/cucomrfp.txt.
7. We attended the 1994 AIIM show in New York City and talked with a number of vendors about our concerns. Many of them immediately assured us they could do the work, but it was clear in further discussions that they did not have a clear grasp of what we wanted. When I mentioned to one vendor that we were looking to create 35 mm silver halide microfilm at 8-10x reduction ratios, his first concern was, "how are you going to read anything at such low reduction?"
8. Image Graphics achieved variable reduction ratios by recording all pixels across the width of an image onto 15 mm of the film. There was a 3 mm spacing between images in the 2A position, and 3 mm of space reserved between frames. The physical page dimensions of foldouts were recorded on the production note. If foldouts exceeded 11" x 17", they were reduced via preservation photocopy and the photocopy scanned, excepting in cases where significant information would be lost by the reduction process. To maintain information on the actual size of the foldouts, and to calculate the reduction ratio used, the size of the reduced photocopy was also recorded (the pixels representing the smaller dimension of the foldout were always recorded on 32 mm of film).
9. ANSI/AIIM MS44-1988, "Recommended Practice for Quality Control of Image Scanners," advises against the use of standard resolution test patterns for scanning at resolutions less than or equal to 600 dpi because of "(1) the problems associated with the random placement of samples, and (2) the conflicting requirement placed on the threshold."
10. The readings on the RIT target when scanned on the XDOD at settings optimized for its capture represented at least line 15 legibility in all four quadrants. However, when the settings optimized for the brittle books were used, the RIT readings differed considerably, with lower readings seeming to correlate to the capture of low density originals. The quality of the resulting COM was excellent in all cases. This led us to suspect that the target was not a sufficiently accurate indicator of resolution when its density varied considerably from that of the original book. Many of these books exhibit low contrast between text and background. The RIT target used in this project was a high contrast target (density of 1.9). We subsequently scanned three different versions of the RIT target with high density (1.9), medium density (1.3), and low density (0.7) at various settings analogous to ones we would use to capture high, medium, and low contrast books. The best readings were uniformly observed on the low density (0.7) RIT target, with the exception of the instance when the "autosegmentation" feature was used, which interpreted portions of the low density RIT target as a halftone and applied descreening and rescreening filters to it.
11. The use of the windowing feature adds considerable time to the scanning process, as noted in the section on costs. To minimize costs, Cornell relied on curatorial review of illustrated materials to determine scanning requirements (automatic versus best quality) at the time of scanning. Appendix I contains a copy of the "Guidelines for Autosegmentation/Manual Windowing."
12. Conway, op. cit., p. 19.
13. This finding was reached by Yale: "Bitonal scanning is not appropriate for preservation microfilm containing materials with rich tonal qualities, such as photographs, halftones, and dense line art, even if the microfilm containing these types of illustrations is of high quality." See Conway, op. cit., p. 10.
14. Picture Elements is currently developing an Imaging Subsystem Engine (ISE) Board that manipulates grayscale data from a variety of specialized scanners, adding to them an array of features such as grayscale deskew, grayscale scaling, highest quality binarization, as well as sophisticated manipulation of halftones and other illustrations. This board has been integrated with a number of flatbed scanners, and Picture Elements is currently working with Minolta to integrate the ISE Board with the Minolta PS-3000 Book Scanner. In future projects, the Board will be integrated into film scanners as well. The resulting image quality and speeds of film scanning should improve as a result.
15. Ron Whitney, Manager of Electronic Production, Primary Source Media, scanned the COM using the SunRise SRI-50 film scanner. He noted that it was "a pleasure working with this film overall." Its consistent density and image placement resulted in "flawless edge detection and distinction between frames," and made film scanning "a snap." Conway supports this finding: "as a general rule, preservation microfilm created with the utmost in technical rigor will yield higher quality results at a lower cost" (p. 11). His experience with Project Open Book, however, led him to conclude that film characteristics have relatively little impact on conversion costs "even if they can make or break digital image quality… We can obtain or exceed quality conversion from ‘poor film’ with only a marginal increase in the overall conversion costs of scanning ‘good film.’" For this reason, he argues "that significant investment in improving the quality of new film will probably not pay off in terms of reduced conversion costs" (pp. 17-18, Final Report). This conclusion may not hold true if the same quality is expected in the digital files. Preliminary tests by the Swedish National Archives using the SunRise SRI-50 grayscale scanner suggest that digital resolution requirements are higher when scanning low resolution film than high resolution film. In other words, ensuring high quality in the initial conversion step may pay off in higher quality and lower costs at subsequent stages.
16. See Roger S. Bagnall, Digital Imaging of Papyri: A Report to the Commission on Preservation and Access (September 1995); Janet Gertz, Oversize Color Images Project, 1994-95 Final Report of Phase I (August 1995); Anne R. Kenney and Stephen Chapman, Tutorial: Digital Resolution Requirements for Replacing Text-Based Material: Methods for Benchmarking Image Quality (April 1995); Picture Elements, Inc., Guidelines for Electronic Preservation of Visual Materials, Part I (March 1995); German Research Institute, "Digitization as a Method of Permanent Preservation" (German edition available at http://www.lad-bw.de/dfgdigh1.htm); National Archives and Records Administration, "Technical Specifications" for the Electronic Access Project (to be made available at http://www.nara.gov).
17. ANSI/AIIM MS44-1988, Recommended Practice for Quality Control of Image Scanners (updated to MS50-199x, in process), and Anne R. Kenney and Stephen Chapman, Digital Imaging for Libraries and Archives, Cornell University Library, June 1996, pp. 28-31.
18. Elkington, op. cit., Appendix 18, pp. 160-176.
19. Cornell adopted fifteen terms as standard structure elements, which are included in Appendix I. Yale also was rigorously consistent in the use of terms at the highest level of structuring. Harvard is investigating means to alter the preparation of microfilm for production scanning, including the creation of a Collation Target to gather information on physical features and page numbering. See: http://preserve.harvard.edu/resources/digitization/index.html.
20. The scanning sample contained 42 volumes scanned in "manual mode" and 3 volumes scanned in "auto mode." Fewer "auto mode" times were recorded because they confirmed the figures for scanning identified in an earlier cost study conducted during the Cornell/Xerox Project (1990-1991), as reported in Anne R. Kenney and Lynne K. Personius, Joint Study in Digital Preservation. Report: Phase I (Commission on Preservation and Access, 1992), pp. 25-30, and Appendixes II and III.
21. See for example the Web site for "Documenting the Digital Age Conference" sponsored by the National Science Foundation, MCI Communications, Microsoft Corporation, and History Associates Incorporated (Http://dtda.mci.com). This conference, held in February 1997, included a session "How Do We Archive Digital Records?" with presentations by Don Waters and Brewster Kahle. The National Historical Publications and Records Commission of the National Archives funded an electronic records conference last summer at the University of Michigan which made recommendations for revising the NHPRC research agenda, including the need to support more research to address issues of long-term maintenance and preservation and to develop methods to estimate costs of preserving electronic records (http://www.si.umich.edu/e-recs/Report/FR.report.html). The National Library of Australia maintains a Web site, Preserving Access to Digital Information, on which is gathered a range of information on ensuring long-term access to digital information, including issues of costs (http://www.nla.gov.au/dnc/tf2001/padi/padi.html).
22. Task Force on Archiving of Digital Information, Preserving Digital Information: Final Report and Recommendations, May 20, 1996 (http://www.rlg.org/ArchTF/). The report distinguishes between the technical processes of "refreshing" and "migration" in a digital archiving program: "The purpose of migration is to preserve the integrity of digital objects and to retain the ability for clients to retrieve, display, and otherwise use them in the face of constantly changing technology. Migration includes refreshing as a means of digital preservation but differs from it in the sense that it is not always possible to make an exact digital copy or replica of a database or other information objects as hardware and software change and still maintain the compatibility of the object with the new generation of technology." (p. 6)
23. Preserving Scientific Data on Our Physical Universe: A New Strategy for Archiving the Nation’s Scientific Information Resources, National Research Council (National Academy Press, Washington, DC 1995), p. 6. The American Astronomical Society has begun to factor archival maintenance costs into the subscription price of their scholarly journals. They plan to update their journals every five years, moving them to the current standard technology to ensure continued easy access (reported in PADI site).
24. Preserving Digital Information, Report of the Task Force on Archiving of Digital Information, by The Commission on Preservation and Access and The Research Libraries Group, Inc., Appendix 2, pp. 57-59. The report compares the cost of digital archiving to the cost of storing and maintaining access to the physical source documents.
25. Michael Lesk, "Preserving Digital Objects: Recurrent Needs and Challenges," p. 7 (http://community.bellcore.com/lesk/auspres/aus.html).
26. Mike Miller, formerly of the EPA and now at the National Archives, phone conversation, July 8, 1997 with Anne R. Kenney. In 1991, the EPA conducted a feasibility study for the Superfund Document Management System (SDMS). Costs were based on a six year life cycle, including development, implementation, and operation. The report recommended a hybrid solution of digital imaging and microforms to minimize risk, although this recommendation was not implemented. Email, Steven Puglia to Anne R. Kenney, July 2, 1997, and fax, Mike Miller to Anne R. Kenney, 7/24/97 summary for Benefit-Cost Analysis, SDMS. The National Archives has estimated digital image acquisition costs at $8.00-$10.00 per image (for a master image, 1 derivative, and 1 thumbnail version), and document preparation costs at $10.00 per item; phone conversation, Dan Jansen (NARA) and Anne R. Kenney, July 3, 1997.
27. Telephone conversation, Bruce Ambacher (NARA) and Anne R. Kenney, July 2, 1997; conversation with John Sarnowski, Technical Support, Northern Micrographics Inc., La Crosse, Wisconsin, June 13, 1997.
28. This figure is one half to one quarter the cost for the creation of a microfilm version from the original book.
29. The idea of measuring the costs for electronic preservation against the costs of producing COM or microfilm was suggested by Don Waters (Yale University) during the second Technical Advisory Committee meeting.
30. See Digitalisierung gefährdeten Bibliotheks- oder Archivguts, Abschlußbericht der Arbeitsgruppe "Digitalisierung" des Unterausschusses Bestandserhaltung der Deutschen Forschungsgemeinschaft (7. Oktober 1996), http://www.lad-bw.de/dfgdigh1 (an English translation will soon be available from the Commission on Preservation and Access); Harvard Preservation Center, "Investigation: Preparing Microfilm for Production Scanning," http://preserve.harvard.edu/resources/digitization/index.html; and the Australian Cooperative Digitization Project (http://www.nla.gov.au/ferg/). Project staff have investigated scanning extant film and creating new film that will expedite digital conversion. It was determined that 20% of existing film stock is suitable for scanning. The prototype for the ACDP for digitized film is http://www.nla.gov.au/acdp/serials.html.
31. For a discussion of this point, see Paul Conway, "Yale University Library’s Project Open Book: Preliminary Research Findings," D-Lib Magazine, February 1996 (http://www.dlib.org/dlib/february96/yale/02conway.html).
32. The Library of Congress issued a Request for Proposals for Conversion of Microfilm to Digital Images for the National Digital Library Program (RFP96-5) earlier this year, which calls for both bitonal and grayscale film scanning.
33. These figures were reported by Mike Beno, Customer Service Manager, Image Graphics, Inc., at the Technical Advisory Committee meeting, September 26, 1997.
34. Samuel Demas and Jennie L. Brogdon, "Determining Copyright Status for Preservation: Defining Reasonable Effort," submitted for publication. For more information on this study, contact Sam Demas, email: sgd1@cornell.edu.
© 2001-2002 Cornell University Library