Quantifying Commentator Interest (as a feature of Biblical text)
- Author(s):
- Joshua Waxman
- Date:
- 2021
- Group(s):
- CSDH-SCHN 2021: Making the Network
- Subject(s):
- Philology, Digital humanities, Natural language processing (Computer science)
- Item Type:
- Lecture
- Tag(s):
- Biblical studies, Natural language processing
- Permanent URL:
- http://dx.doi.org/10.17613/v6xy-3g14
- Abstract:
- The Hebrew Biblical corpus comprises texts of different genres: narrative sections, genealogical accounts, poetry, commands regarding daily and seasonal ritual practices, legal systems, details of sacrifices, and instructions for constructing the Tabernacle. The content, and often the style, of each genre differs from the others, and the genres might be distinguished by features such as tf-idf vectors, sentence length, and grammatical structure. This corpus has been the focus of many Biblical commentators. Focusing on just a few classical Jewish Biblical commentators (Rashi, Ibn Ezra, Ramban, Rashbam, Seforno) who composed commentary across the Pentateuch, we quickly observe that these commentators often pay attention to different subsets of Biblical verses. This attention, or "interest", is influenced by the commentator's purpose and vision. For instance, Rashi is concerned with presenting a consistent and comprehensive interpretation drawn from earlier traditional sources, even where these are not the most literal, straightforward readings. However, there is not much for him to say regarding genealogical verses. Ibn Ezra is concerned with literal interpretation and with grammatical analysis of difficult words and constructions, and is less concerned with details of sacrifices, either because he agrees with Rashi (who preceded him) or because he does not read as much into every word choice. To quantify commentator interest, for each chapter we divide the number of verses discussed by a particular commentator by the total number of verses in that chapter. Each chapter is thus represented by a vector of commentator interest ratios, one per commentator. We label each chapter with multiple tags corresponding to content type, since different spans of verses within a chapter might correspond to different content, for instance a narrative section ending with a genealogical list. We trained a binary Logistic Regression classifier for each genre using these vectors as features, and obtained some good results.
- Status:
- Published
- Last Updated:
- 2 years ago
- License:
- All-Rights-Granted
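
The interest ratio and the per-genre binary classifiers described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the chapter records, verse sets, and tags are made-up toy values rather than data from the lecture, the names `interest_vector` and `train_binary_logreg` are hypothetical, and a hand-written gradient descent stands in for whatever Logistic Regression implementation was actually used.

```python
import math

COMMENTATORS = ["Rashi", "Ibn Ezra", "Ramban", "Rashbam", "Seforno"]

def interest_vector(total_verses, commented):
    """One ratio per commentator: verses discussed / total verses in the chapter."""
    return [len(commented.get(name, set())) / total_verses for name in COMMENTATORS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_binary_logreg(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

# Toy chapters (invented for illustration): total verse count, the set of verse
# numbers each commentator discusses, and the chapter's content tags.
chapters = [
    {"total": 10, "tags": {"narrative"},
     "commented": {"Rashi": set(range(1, 9)), "Ibn Ezra": set(range(1, 7)),
                   "Ramban": {1, 2, 3, 4}, "Rashbam": {1, 2, 3, 4, 5},
                   "Seforno": {1, 2, 3}}},
    {"total": 20, "tags": {"genealogy"},
     "commented": {"Rashi": {1, 2}, "Ibn Ezra": {1}}},
    {"total": 10, "tags": {"narrative"},
     "commented": {"Rashi": set(range(1, 10)), "Ibn Ezra": set(range(1, 8)),
                   "Ramban": {1, 2, 3, 4, 5}, "Rashbam": {1, 2, 3, 4},
                   "Seforno": {1, 2, 3, 4}}},
    {"total": 10, "tags": {"narrative", "genealogy"},
     "commented": {"Rashi": {1, 2, 3, 4, 5}, "Ibn Ezra": {1, 2, 3},
                   "Ramban": {1, 2}, "Rashbam": {1, 2}, "Seforno": {1}}},
]

# Feature matrix: one interest-ratio vector per chapter.
X = [interest_vector(ch["total"], ch["commented"]) for ch in chapters]

# Multi-label setup: one independent binary classifier per genre tag.
genres = sorted({tag for ch in chapters for tag in ch["tags"]})
models = {}
for genre in genres:
    y = [1 if genre in ch["tags"] else 0 for ch in chapters]
    models[genre] = train_binary_logreg(X, y)

for genre, (w, b) in models.items():
    print(genre, [round(wj, 2) for wj in w], round(b, 2))
```

Training a separate binary classifier per genre, rather than a single multi-class model, lets a chapter carry several tags at once, which matches the multi-tag labeling described in the abstract.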