• "Q i-jtb the Raven": Taking Dirty OCR Seriously

    Author(s):
    Ryan Cordell
    Date:
    2017
    Group(s):
    Digital Humanities, LLC 19th-Century American, TM Bibliography and Scholarly Editing, TM Book History, Print Cultures, Lexicography
    Subject(s):
    Bibliography, Bibliography, Critical, Books, History, Digital humanities, Mass media--Study and teaching, Archaeology
    Item Type:
    Article
    Tag(s):
    Descriptive bibliography, Book history, Media archaeology
    Permanent URL:
    http://dx.doi.org/10.17613/M6WG2S
    Abstract:
    This article argues that scholars must understand mass digitized texts as assemblages of new editions, subsidiary editions, and impressions of their historical sources, and that these various parts require sustained bibliographic analysis and description. To adequately theorize any research conducted in large-scale text archives—including research that includes primary or secondary sources discovered through keyword search—we must avoid the myth of surrogacy proffered by page images and instead consider directly the text files they overlay. Focusing on the OCR (optical character recognition) from which most large-scale historical text data derives, this article argues that the results of this "automatic" process are in fact new editions of their source texts that offer unique insights into both the historical texts they remediate and the more recent era of their remediation. The constitution and provenance of digitized archives are, to some extent at least, knowable and describable. Just as details of type, ink, or paper, or paratext such as printer's records can help us establish the histories under which a printed book was created, details of format, interface, and even grant proposals can help us establish the histories of corpora created under conditions of mass digitization.
    Metadata:
    Published as:
    Journal article
    Status:
    Published
    Last Updated:
    6 years ago
    License:
    Attribution-NonCommercial-ShareAlike

    Downloads

    Item Name: 2017-bookhistory-qitjbtheraven.pdf
    Activity: Downloads: 1003