
Glossary of E-Text Terms
ASCII text: Machine-readable text where letters and punctuation are stored, but no additional information is saved. Sometimes called simple or unstructured text.
Bit-Mapped Images: Scanned images that present the appearance of a page or illustration. The content of bit-mapped images cannot be searched unless OCR software converts it into machine-readable text.
Corpus, corpora: A corpus is a collection of texts. Corpora is the plural form and refers to multiple such collections.
Electronic Text: Text available in machine-readable or computerized form. The text may be saved on magnetic media, such as diskettes or hard drives, or on optical media, such as CD-ROMs.
HTML [Hypertext Markup Language]: The tagging system used to turn text into Web pages.
OCR [Optical Character Recognition]: Software that converts scanned, or bit-mapped, images of text into machine-readable form. Individual words in OCRed text can be found and displayed using searching software.
SGML [Standard Generalized Markup Language]: The mother language of HTML, SGML is used to tag complex texts for preservation, searching, and display. SGML tags are organized into sets for use with various document types: poetry, plays, letters, or technical manuals, for example. SGML tagging may be preserved in both electronic and printed form. It is hierarchical in structure, but allows great flexibility in incorporating links to other documents as well as variant textual readings, marginalia, and other ways of marking specific textual characteristics for future analysis or retrieval.
TACT [Text Analysis Computing Tools]: A DOS-based text analysis program.
Tags: Labels added to ASCII text to add value to the text: searchability, display formatting, hypertext links, scholarly notes, and preservation information, for example.
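Example: The difference between ASCII text and tagged text, as defined in this glossary, can be sketched in a few lines of Python. This is an illustration only. The tag names (poem, title, l) and the sample text are hypothetical and are not taken from any actual SGML document type, and Python's standard html.parser module is used here as a convenient stand-in; it is not an SGML parser.

```python
from html.parser import HTMLParser

# Hypothetical tagged sample; the element names are illustrative only.
TAGGED = (
    "<poem><title>Sample</title>"
    "<l>First line of verse</l>"
    "<l>Second line of verse</l></poem>"
)


class TagSplitter(HTMLParser):
    """Separates the text from its tags and groups text by enclosing tag."""

    def __init__(self):
        super().__init__()
        self.stack = []    # tags that are currently open
        self.by_tag = {}   # tag name -> list of text pieces found inside it
        self.plain = []    # every text piece, with tags discarded

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        self.plain.append(data)
        if self.stack:
            self.by_tag.setdefault(self.stack[-1], []).append(data)


def split_tags(tagged):
    """Return (plain_text, text_grouped_by_tag) for a tagged string."""
    parser = TagSplitter()
    parser.feed(tagged)
    parser.close()
    return " ".join(parser.plain), parser.by_tag


plain, by_tag = split_tags(TAGGED)
print(plain)        # the untagged text: words can be searched, nothing more
print(by_tag["l"])  # the tagged text: verse lines can be retrieved as a group
```

With the tags removed, only word searching is possible. With the tags kept, a program can also ask structural questions, such as which pieces of text are verse lines and which is the title.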
Revised March 6, 1998
http://www.library.cornell.edu/olinuris/ref/cet/eglossary.html
Michael Engle
Olin and Uris Libraries, Cornell University, Ithaca, NY 14853
