Toiling with the Pāli Canon
- Author(s):
- David Alfter, Frederik Elwert, Jürgen Knauth, Manuel Pachurka, Sven Sellmer, Sven Wortmann
- Date:
- 2015
- Subject(s):
- Religions, Linguistics
- Item Type:
- Conference paper
- Conf. Title:
- Corpus-based Research in the Humanities
- Conf. Org.:
- Institute of Computer Science, Polish Academy of Sciences
- Conf. Loc.:
- Warsaw, Poland
- Conf. Date:
- 10 December 2015
- Tag(s):
- buddhist studies, computational linguistics, Comparative religion
- Permanent URL:
- http://dx.doi.org/10.17613/M6P023
- Abstract:
- The paper describes the preparation of a Buddhist corpus in the Middle Indo-Aryan language Pāli, which is available only in a flat TEI format, for content-based analysis. This task includes transforming the file into a hierarchical TEI P5 representation, followed by tokenisation (including sandhi resolution), lemmatisation, and POS tagging.
- Published as:
- Conference proceeding
- Pub. Date:
- 2015
- Proceeding:
- Proceedings of the Workshop on Corpus-Based Research in the Humanities
- Page Range:
- 39 - 48
- Status:
- Published
- Last Updated:
- 7 years ago
- License:
- All Rights Reserved
Downloads
Item Name: elwert-et-al_2015_toiling-with-the-pāli-canon.pdf