Volume 1, no.1 (Spring 2004)
 
Raiders of the lost MARC:
Mining the Voyager database for fun and profit

When CUL migrated to Endeavor’s Voyager LMS in June 2000, the change affected everyone who handled or accessed bibliographic data, staff and users alike. Whether staffing the reference desk, approving invoices in accounting, placing items on reserve, checking in serials or cataloging, all CUL staff had to adapt, and sometimes completely revamp, their own jobs to accommodate the capabilities and the limitations of the system.

Among the most significant positive changes has been the greatly enhanced reporting capability. In the NOTIS environment, only library systems staff could write queries against the database. The work was time-consuming and complex. With limited staff time available, requests for reports had to be queued and handled in priority order system-wide. But with Voyager, the ability to get at the underlying data was democratized. Anyone with a PC, the proper software and some training could begin writing custom reports for themselves or their units.

In Technical Services, we’ve used a number of programs to perform data analysis, data mining and reporting. Three of them (Microsoft Access, VgerSelect, and Harvest) represent our primary tools for these purposes. Each has particular strengths and shortcomings, but in combination they give us a powerful array of tools for getting at hidden data to support decision-making, planning, reporting and even end-user applications.

Microsoft Access is the most flexible of these tools. From their own workstations, staff can create queries that can look at virtually any record in the Voyager database, whether it be a MARC bibliographic or holdings record or a purchase order, invoice, or item record. Linking fields from the various Voyager tables, TS staff have written sophisticated queries that count backlogs by category, generate new acquisitions lists, determine what we are spending on electronic resources, or tell us how many items we’ve cataloged for a particular location in a given time span. Often, these queries have been written with the assistance of our resident Access guru, Lydia Pettis of Library Systems. Many TS staff have taken Lydia’s excellent Access class and have become adept at finding creative ways to use it in their daily work (see sidebar).
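As a rough illustration of the kind of linked-table query described above, here is a minimal sketch in Python using an in-memory SQLite database. The table names, column names, location codes and dates are all hypothetical stand-ins; they are not the actual Voyager schema or our Access queries. The sketch counts records created for one location in a given time span.

```python
import sqlite3

# Hypothetical, simplified tables for illustration only.
# These are NOT the real Voyager table or column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location (location_id INTEGER PRIMARY KEY, location_code TEXT);
CREATE TABLE holdings (holdings_id INTEGER PRIMARY KEY,
                       location_id INTEGER,
                       create_date TEXT);
INSERT INTO location VALUES (1, 'olin'), (2, 'mann');
INSERT INTO holdings VALUES
  (10, 1, '2004-01-15'),
  (11, 1, '2004-02-03'),
  (12, 2, '2004-02-10'),
  (13, 1, '2003-12-30');
""")


def count_cataloged(conn, location_code, start, end):
    """Count holdings created for one location between two ISO dates (inclusive)."""
    row = conn.execute(
        """
        SELECT COUNT(*)
        FROM holdings h
        JOIN location l ON l.location_id = h.location_id
        WHERE l.location_code = ?
          AND h.create_date BETWEEN ? AND ?
        """,
        (location_code, start, end),
    ).fetchone()
    return row[0]


# Made-up sample data: how many records for 'olin' in the first quarter of 2004?
print(count_cataloged(conn, "olin", "2004-01-01", "2004-03-31"))
```

The join on a shared key is the same idea as linking fields between tables in the Access query designer; only the syntax differs.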

As useful as it is, Access does have a number of limitations when working with bibliographic data. Getting at specific fields or subfields in a MARC record is often difficult, and performance can be very slow. When item records or acquisitions data aren’t needed, VgerSelect often comes to the rescue. VgerSelect, a freeware tool developed by Gary Strawn of Northwestern University Library, harvests MARC data from Voyager bibliographic and holdings records. Although it can extract only bib and holdings data, in many ways it is more useful, and faster, than MS Access. VgerSelect allows a user to pinpoint exact data in a bibliographic or holdings record, down to the subfield level. Results can be output in text format, or the full MARC record can be written to a file. VgerSelect has been used extensively in database cleanup projects, in tracking cataloging errors, and in preparing data for our automated e-journal maintenance routines.
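To make "down to the subfield level" concrete, the following Python sketch pulls one subfield out of a small set of records and writes the matches as tab-delimited text. It does not show how VgerSelect itself works; the in-memory record layout, the function name and the sample data are assumptions made purely for illustration.

```python
# Illustrative layout only: each record has an id and a list of fields, and
# each field is (tag, indicators, [(subfield_code, value), ...]).
records = [
    {"id": 1, "fields": [
        ("245", "10", [("a", "Raiders of the lost MARC :"),
                       ("b", "mining the database")]),
        ("856", "40", [("u", "http://example.org/ejournal"),
                       ("z", "Connect to full text")]),
    ]},
    {"id": 2, "fields": [
        ("245", "00", [("a", "A title without a URL")]),
    ]},
]


def extract_subfield(records, tag, code):
    """Yield (record id, value) for every occurrence of the given tag and subfield."""
    for record in records:
        for field_tag, _indicators, subfields in record["fields"]:
            if field_tag != tag:
                continue
            for sub_code, value in subfields:
                if sub_code == code:
                    yield record["id"], value


# Text output: one tab-delimited line per matching subfield (here, 856 $u).
for rec_id, value in extract_subfield(records, "856", "u"):
    print(f"{rec_id}\t{value}")
```

A record with no matching field simply contributes no lines, which is what makes this kind of extract handy for spotting missing or malformed data during cleanup.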

Harvest is a locally developed, Web-based tool created by Peter Hoyt of Library Systems. Harvest can examine bibliographic, holdings or authority records. Queries are created using a simple interface, and users can easily customize or revise the requests. Harvest queries can also be linked to one another, depending on need. Harvest has been used extensively in database cleanup projects, in the e-journal maintenance work, for various statistical reports in support of LARIS, and for other planning in technical services, such as the implementation of classification on receipt.

Use of the three primary tools isn’t mutually exclusive; often they are used in conjunction with one another to produce highly customized reports. For example, a set of results retrieved from Access or Harvest may be re-run through VgerSelect. And follow-on processing can be multi-faceted as well. A VgerSelect result may be imported into an Access table, an Excel spreadsheet, or even converted to an XML or HTML file. Once the data is in another format, charts, reports or Web pages may be generated from the data.
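The follow-on conversion step can be sketched briefly. The Python below turns tab-delimited text, such as a text-format extract, into a simple HTML table. The function name, column headers and sample rows are hypothetical, and this is only one of many ways such a conversion might be done.

```python
import csv
import html
import io


def tsv_to_html_table(tsv_text, headers):
    """Convert tab-delimited text into a minimal HTML table string."""
    # QUOTE_NONE treats quotation marks in titles as ordinary characters.
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                      quoting=csv.QUOTE_NONE)
    parts = ["<table>"]
    parts.append(
        "<tr>" + "".join(f"<th>{html.escape(h)}</th>" for h in headers) + "</tr>"
    )
    for row in rows:
        if not row:  # skip blank lines in the extract
            continue
        parts.append(
            "<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>"
        )
    parts.append("</table>")
    return "\n".join(parts)


# Made-up sample extract: record id, then title.
sample = "1\tJournal of A & B\n2\tAnother title\n"
print(tsv_to_html_table(sample, ["Bib ID", "Title"]))
```

Escaping each cell keeps characters such as "&" or "<" in titles from breaking the generated page.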

 
CUL TS miners of the Voyager Database
Anna Korhonen
Jean Pajerek
Scott Wicks
David Banush
Lois Purcell
 

 


©Cornell University, 2004