Every word on the Internet can be fleeting – which presents unique challenges for Cornell’s digital archivists, whose mission is to pin down information and preserve it so that researchers can use it in the future.
At the library, staff members from two departments – Library Technical Services and the University Archives in the Division of Rare and Manuscript Collections – are trying to capture the university’s intellectual output by archiving all of the websites in the cornell.edu domain.
The process began in 2011. Now, several collections are available through Archive-It, a paid service of the Internet Archive, which developed the Wayback Machine. Both the Internet Archive and Archive-It rely on the Wayback Machine for public access to archived websites.
The archive allows users to find older versions of Web pages and content that is no longer available online, such as the popular “Dear Uncle Ezra” column.
Because websites are not static entities, the group had to set up a process that can be repeated at regular intervals. Much of the cornell.edu domain has already been archived twice, with plans to repeat the process every January and June.
“We will continue to improve the archiving process each time we run it,” said Jason Kovari, metadata librarian for humanities and special collections. “Web archiving allows us to archive cornell.edu as it appears at determined moments in time, so that researchers can view the progression of change.”
Visit the Cornell Chronicle to read the full story.