Statistics of Numerals in the Text: Development of a New Method of Stylometry
- DOI
- 10.2991/aebmr.k.200114.106
- Keywords
- stylometry, attribution of texts, text processing, numerals, first significant digit
- Abstract
Two approaches to the statistical analysis of texts are proposed, both based on the occurrence of numerals in coherent literary texts. The first approach analyzes the frequency distribution of the first significant digits of the numerals occurring in a text. The frequency of the digit 1 and, to a lesser extent, of the digits 2 and 3 is usually a characteristic feature of an author’s style that manifests itself consistently in all sufficiently long literary texts by that author. This approach is convenient for quickly testing whether a group of texts shares common authorship: common authorship is doubtful if the frequency distributions differ sufficiently. The second approach extends the first and examines the frequency distribution of the numerals themselves rather than of their first significant digits. It yields non-trivial information about the peculiarities of an author’s style and is suited to the advanced study of authorial texts. Both approaches are illustrated by computer analysis of literary works by L. Dobychin and A. Platonov.
- Copyright
- © 2020, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
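The first approach outlined in the abstract, tallying the first significant digits of the numerals in a text and comparing the resulting frequency distributions between texts, can be illustrated with a minimal sketch. This is not the author's implementation. It assumes that only numerals written with digits are counted (numerals spelled out as words would need a language-specific lexicon), and the names `first_digit_frequencies` and `total_variation`, as well as the choice of total variation distance as the measure of difference, are hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Assumption: a numeral is a run of digits, optionally with internal
# "." or "," separators (e.g. "3.14", "1,000"). Word-form numerals are ignored.
NUMERAL_RE = re.compile(r"\d+(?:[.,]\d+)*")
FIRST_SIGNIFICANT_RE = re.compile(r"[1-9]")


def first_digit_frequencies(text):
    """Return the relative frequency of each first significant digit 1..9."""
    counts = Counter()
    for token in NUMERAL_RE.findall(text):
        match = FIRST_SIGNIFICANT_RE.search(token)
        if match:  # tokens made only of zeros have no significant digit
            counts[int(match.group())] += 1
    total = sum(counts.values())
    if total == 0:
        return {d: 0.0 for d in range(1, 10)}
    return {d: counts[d] / total for d in range(1, 10)}


def total_variation(p, q):
    """Half the L1 distance between two digit distributions.

    Hypothetical measure of difference: 0 means identical distributions,
    1 means they share no digit at all.
    """
    return 0.5 * sum(abs(p[d] - q[d]) for d in range(1, 10))


if __name__ == "__main__":
    text_a = "In 1905 he bought 12 apples, 3 pears and 170 plums."
    text_b = "She counted 2 cats, 25 dogs and 0.05 grams."
    freq_a = first_digit_frequencies(text_a)
    freq_b = first_digit_frequencies(text_b)
    print(freq_a)
    print(freq_b)
    print(total_variation(freq_a, freq_b))
```

The abstract notes that the digit frequencies are stable only for sufficiently long texts, so toy inputs like those above serve only to show the mechanics. Any threshold for calling two distributions "sufficiently different" would have to be calibrated on real corpora.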
Cite this article
TY - CONF
AU - Andrei V. Zenkov
PY - 2020
DA - 2020/01/18
TI - Statistics of Numerals in the Text: Development of a New Method of Stylometry
BT - Proceedings of the First International Volga Region Conference on Economics, Humanities and Sports (FICEHS 2019)
PB - Atlantis Press
SP - 448
EP - 451
SN - 2352-5428
UR - https://doi.org/10.2991/aebmr.k.200114.106
DO - 10.2991/aebmr.k.200114.106
ID - Zenkov2020
ER -