PoetryLab. An Open Source Toolkit for the Analysis of Spanish Poetry Corpora

    Author(s):
    Javier de la Rosa, Elena González-Blanco, Álvaro Pérez, Salvador Ros
    Date:
    2020
    Group(s):
    DH2020, Digital Humanists
    Subject(s):
    Natural language processing (Computer science), Poetry, Rhetoric
    Item Type:
    Presentation
    Meeting Title:
    Digital Humanities 2020 Virtual Conference
    Meeting Org.:
    Carleton University and the University of Ottawa
    Meeting Loc.:
    Virtual Conference
    Meeting Date:
    July 20-24, 2020
    Tag(s):
    Infrastructure, Natural language processing
    Permanent URL:
    http://dx.doi.org/10.17613/rsd8-we57
    Abstract:
    The study of the poetic features of text, especially its rhythmic structure when forming verses, pertains to different traditions, whose scholars established the rules that might govern poetry. Within this context, the POSTDATA Project formalized a network of ontologies able to represent any poetic expression and its analysis at the European level, enabling scholars all over Europe to interchange their data using Linked Open Data. However, varied research interests result in corpora that might not share the same facets of an analysis. To alleviate this concern and foster the completeness of the interchanged corpora, our team set out to build a software toolkit to assist in the analysis of poetry. This paper introduces PoetryLab, an extensible open-source toolkit for syllabification, scansion (extraction of stress patterns), enjambment detection (syntactical units split across two lines), rhyme detection, and historical named entity recognition for Spanish poetry. Our toolkit achieves state-of-the-art performance in the tasks for which reproducible alternatives exist.
    Metadata:
    Status:
    Published
    Last Updated:
    3 years ago
    License:
    Attribution

    Downloads

    Item Name: dh2020-poetrylab-presentation.pptx
    Activity: Downloads: 79