• Training Algorithms to Read Complex Collections: Handwriting Classification for Improved HTR Models

    Author(s):
    Bhagawat Acharya, Katherine Faull, Brian King, Carrie Pirmann
    Date:
    2020
    Group(s):
    DH2020
    Subject(s):
    Artificial intelligence, Digital humanities, Research, Methodology, Machine learning, Transcription
    Item Type:
    Conference paper
    Conf. Title:
    DH2020
    Conf. Org.:
    ADHO (Alliance of Digital Humanities Organizations)
    Conf. Loc.:
    Virtual
    Conf. Date:
    July 20-24, 2020
    Tag(s):
    Digital humanities research and methodology, Text transcription
    Permanent URL:
    http://dx.doi.org/10.17613/73kh-7g63
    Abstract:
    This paper presents a new handwriting grouping algorithm developed to decrease the Character Error Rate (CER) for a collection of manuscript documents written in various hands and in multiple languages. The Moravian Lives project (moravianlives.org) focuses on tens of thousands of handwritten ego-documents; to facilitate their transcription, the team has been using Transkribus. The numerous and varying handwriting styles found in the documents make it difficult to create highly accurate HTR models. Human identification of similarities in handwriting is unreliable; automated scribe identification or grouping of handwriting styles could result in much more accurate models. An undergraduate computer science student and a professor of computer science are experimenting with deep learning to build a grouping model designed to group or sort memoirs by handwriting style. These groupings should enable the creation of more accurate models in Transkribus, as well as more accurate transcription outputs.
    Notes:
    Please read the slide notes in the PowerPoint file.
    Metadata:
    Status:
    Published
    Last Updated:
    3 years ago
    License:
    Attribution-NonCommercial

    Downloads

    Item Name: pptx pirmann-acharya-king-faull-dh2020-slides.pptx