-
Training Algorithms to Read Complex Collections: Handwriting Classification for Improved HTR Models
- Author(s):
- Bhagawat Acharya, Katherine Faull, Brian King, Carrie Pirmann
- Date:
- 2020
- Group(s):
- DH2020
- Subject(s):
- Artificial intelligence, Digital humanities, Research, Methodology, Machine learning, Transcription
- Item Type:
- Conference paper
- Conf. Title:
- DH2020
- Conf. Org.:
- ADHO (Alliance of Digital Humanities Organizations)
- Conf. Loc.:
- Virtual
- Conf. Date:
- July 20-24, 2020
- Tag(s):
- Digital humanities research and methodology, Text transcription
- Permanent URL:
- http://dx.doi.org/10.17613/73kh-7g63
- Abstract:
- This paper will present a new handwriting grouping algorithm developed to decrease the Character Error Rate (CER) for a collection of manuscript documents written in various hands and in multiple languages. The Moravian Lives project (moravianlives.org) focuses on tens of thousands of handwritten ego-documents; to facilitate their transcription, the team has been using Transkribus. The numerous and varying handwriting styles found in the documents make it difficult to create highly accurate HTR models. Human identification of similarities in handwriting is tenuous; automated scribe identification or grouping of handwriting styles could result in much more accurate models. An undergraduate computer science student and a professor of computer science are experimenting with deep learning to build a grouping model designed to group or sort memoirs by handwriting style. These groupings should enable the creation of more accurate models in Transkribus, as well as more accurate transcription outputs.
- Notes:
- PLEASE read slide notes in PowerPoint
- Status:
- Published
- Last Updated:
- 3 years ago
- License:
- Attribution-NonCommercial