Volume 1, no.1 (Spring 2004)
 
Taking it to the HILCC:
Automated classification and subject analysis under study in CTS

Providing enhanced subject analysis of library resources by automated means is a long-standing research problem in the cataloging community. "Self-declaring resources", automated classification from subject headings, and other ways of using machines to analyze information content and provide access via subjects or index terms have been studied both in theory and in practice. Some practical applications already exist. For example, OCLC has offered mapping from Library of Congress Subject Headings to Dewey Decimal Classification numbers via WorldCat for several years. For the most part, though, the task of distilling the "aboutness" of a given resource and assigning it to a particular category or classification scheme has resisted complete automation.

Two librarians in CTS are currently working with a new tool, the Hierarchical Interface to LC Classification, or HILCC, to explore the possibilities of using automation to expand and potentially customize subject access to library resources. HILCC was developed at Columbia University Libraries and is currently used there to generate a structured menu system for subject access to electronic resources. Library of Congress call number ranges are mapped to a table of related subject terms. From that table, a browsable subject category "tree" can be generated to help users navigate e-resource subject content on the Web. Columbia currently uses HILCC in its browsable E-Journal lists and in an A-Z subject list for electronic resources of all kinds.
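The range-to-subject mapping described above can be pictured with a short Python sketch. It is only an illustration: the table rows, the number ranges, the "Sciences" label, and the function names subject_path and build_tree are hypothetical, and none of it is taken from the actual HILCC table or from Columbia's software.

```python
import re

# Hypothetical excerpt of a range-to-subject table. The number ranges and
# subject labels are illustrative only, not copied from the real HILCC table.
RANGES = [
    # (class letters, first number, last number, subject path)
    ("PR", 1, 9680, ("Languages & Literatures", "English", "English Literature")),
    ("PS", 1, 3626, ("Languages & Literatures", "English", "American Literature")),
    ("QA", 1, 939, ("Sciences", "Mathematics")),
]

# Leading class letters followed by the class number, e.g. "PR4034" or "QA76.9".
CALL_NUMBER = re.compile(r"^\s*([A-Z]{1,3})\s*(\d+(?:\.\d+)?)")


def subject_path(call_number):
    """Return the subject path for an LC call number, or None if no range matches."""
    match = CALL_NUMBER.match(call_number.upper())
    if match is None:
        return None
    letters, number = match.group(1), float(match.group(2))
    for cls, first, last, path in RANGES:
        if cls == letters and first <= number <= last:
            return path
    return None


def build_tree(paths):
    """Fold subject paths into a nested dict that a menu could be rendered from."""
    tree = {}
    for path in paths:
        node = tree
        for label in path:
            node = node.setdefault(label, {})
    return tree
```

Calling subject_path("PR4034 .P7 2003") with this sample table returns the three-level "English Literature" path, and build_tree turns the table's paths into nested categories. A linear scan is used for clarity; a table with thousands of ranges would probably call for a sorted lookup instead.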

The HILCC Web site at Columbia

In early February, Karen Calhoun, Associate University Librarian for Technical Services, Adam Chandler, CTS Information Technology Librarian, and Jim LeBlanc, head of CTS Post-Cataloging Services, traveled to New York to visit colleagues at Columbia and talk about the HILCC system. Columbia library staff and administrators graciously agreed to share HILCC with CUL for research purposes, and CTS is now working with HILCC to explore its potential in a different way. Currently, Adam and Jim are comparing the Columbia HILCC classification to the call numbers contained in the Uris collection. Their goal is to see whether it is feasible to use the HILCC topics as a navigation aid to an undergraduate print collection. Uris Library currently holds about 150,000 titles. One of the research questions under study is whether HILCC scales up as a browsing method for a collection of this size. Initial observations suggest that, at least in its current form, it does not. One reason is a mismatch between the HILCC subject distribution and that of the Uris materials. For example, two Uris call number ranges contain over 13,000 titles, while dozens contain none or only a few; the working assumption is that the distribution should be more even for browsing to work well.
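The kind of frequency distribution discussed above can be sketched in a few lines of Python. This is a minimal illustration, assuming each title has already been assigned a subject label (or None when no range matched); the function names and the "(unmapped)" label are hypothetical, and the sketch does not reflect the actual method or data used in the study.

```python
from collections import Counter


def subject_histogram(assigned_subjects):
    """Count titles per subject; titles with no matching range are grouped together."""
    return Counter(
        subject if subject is not None else "(unmapped)"
        for subject in assigned_subjects
    )


def summarize(histogram, all_subjects):
    """Report the largest category size and the categories that hold no titles."""
    counts = {subject: histogram.get(subject, 0) for subject in all_subjects}
    return {
        "largest": max(counts.values(), default=0),
        "empty": sorted(s for s, n in counts.items() if n == 0),
    }
```

Passing the full list of subjects to summarize matters because categories with no titles never appear in the counts on their own, yet they are half of the mismatch described above.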

The next phase of the plan involves analyzing the Uris frequency distribution to see whether HILCC can be adjusted to better suit the Uris collection. Adam and Jim are working to match the Uris collection HILCC histogram to the complete LC Classification tables for certain HILCC subjects. It seems clear that HILCC would need to be modified at the high end, to break subjects like "Languages & Literatures -- English -- English Literature" into smaller pieces. To keep this work moving, a working assumption about the likely upper browse threshold (300 titles? 500?) needs to be established. One possible outcome of the HILCC research would be a user-friendly, Web-based interface that would help undergraduates find print resources appropriate to their needs and interests. If that model works, it might be possible to further divide the catalog into other subject-focused browse lists, so that specialists in a particular subject area could look for resources tailored to their needs. Adam and Jim plan to publish the findings of their work in a forthcoming article.
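Once a browse threshold is chosen, finding the subjects that need to be broken up is a simple filter over the histogram. The Python sketch below is illustrative only: the function name oversized is hypothetical, and the default of 500 is just one of the candidate values mentioned above, not a decision reached by the project.

```python
def oversized(counts, threshold=500):
    """Return (subject, title count) pairs above the threshold, largest first."""
    return sorted(
        ((subject, n) for subject, n in counts.items() if n > threshold),
        key=lambda pair: pair[1],
        reverse=True,
    )
```

Running the same counts through both candidate thresholds (300 and 500) would show how many additional subjects the stricter limit forces to be subdivided.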

For more about HILCC, visit the HILCC site at:

http://www.columbia.edu/cu/libraries/inside/projects/metadata/hilcc/

 

CUL's investigations into HILCC can be viewed here:

http://www.library.cornell.edu/cts/browseandextend/

 


©Cornell University, 2004