Evaluating a Machine Learning Approach to Identifying Expressive Content at Page Level in HathiTrust
- Author(s):
- Nikolaus Parulian
- Contributor(s):
- Stephen Downie, Ryan Dubnicek, Kristina Hall, Yuerong Hu
- Date:
- 2020
- Group(s):
- DH2020
- Subject(s):
- Digital libraries, Machine learning, Natural language processing (Computer science)
- Item Type:
- Conference proceeding
- Conf. Title:
- Digital Humanities 2020
- Conf. Org.:
- ADVANCE ISSUE OF DIGITAL SCHOLARSHIP IN THE HUMANITIES
- Conf. Loc.:
- Carleton University and the University of Ottawa, Ottawa, Canada
- Conf. Date:
- 22-24 July 2020
- Tag(s):
- Natural language processing
- Permanent URL:
- http://dx.doi.org/10.17613/3nfw-tx25
- Abstract:
- HathiTrust currently provides metadata, scanned images, and full text for all public domain volumes. However, the front matter of most volumes, regardless of rights status, likely contains content that is of interest to scholars and free from restriction. For example, the title page or table of contents may contain information that is likely non-expressive and useful for understanding the content’s structure and subject matter. Some volumes likely also include expressive/creative content in the first 20 pages, so front matter cannot be made open for all volumes without first understanding which types of content occur most frequently within those pages. Exploring this entirely by hand is time-prohibitive, so we seek to evaluate a machine learning approach for this task.
- Status:
- Published
- Last Updated:
- 3 years ago
- License:
- All Rights Reserved
Downloads
Item Name: ischool_poster_htrc_parulian.pdf