Abstract
This article analyzes how various combinations of mixed-level stylometric characteristics affect the quality of authorship verification for Russian, English, and French prose texts. The study covers both low-level stylometric characteristics based on words and characters and higher-level structural ones. All stylometric characteristics are calculated automatically using the ProseRhythmDetector program, which makes it possible to analyze large volumes of text by many writers at the same time. Each text is associated with character-level, word-level, and structure-level stylometric vectors. In the experiments, the parameter sets of these three levels were combined with each other in all possible ways. The resulting vectors of stylometric characteristics were fed to various classifiers to perform verification and to identify the classifier best suited to the problem. The best results were obtained with the AdaBoost classifier: the average F-measure across all languages was over 92%. Detailed verification quality assessments are given and analyzed for each author. The use of high-level stylometric characteristics, in particular the frequency of N-grams of POS tags, opens the prospect of a more detailed analysis of authors’ styles. The experimental results show that the most accurate authorship verification for literary texts in Russian, English, and French is obtained when structure-level characteristics are combined with word-level and/or character-level characteristics. The authors also conclude that stylometric characteristics influence the quality of authorship verification to different degrees in different languages.
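As a rough illustration of the feature-combination step described in the abstract, the following Python sketch enumerates every non-empty combination of the three feature levels, builds a combined vector for one text, and computes relative frequencies of POS-tag N-grams. It is a sketch under stated assumptions, not the authors' implementation: the abstract does not say how the per-level vectors are merged, so plain concatenation is assumed here, and all function names, level labels, and sample values are hypothetical. It does not use or reproduce ProseRhythmDetector.

```python
from collections import Counter
from itertools import combinations

# Hypothetical labels for the three feature levels named in the abstract.
LEVELS = ("character", "word", "structure")


def level_combinations(levels=LEVELS):
    """Return every non-empty subset of feature levels, smallest subsets first."""
    return [
        combo
        for size in range(1, len(levels) + 1)
        for combo in combinations(levels, size)
    ]


def combine_vectors(features_by_level, combo):
    """Concatenate one text's per-level vectors in the order given by combo.

    Concatenation is an assumption; the abstract does not specify the method.
    """
    combined = []
    for level in combo:
        combined.extend(features_by_level[level])
    return combined


def pos_ngram_frequencies(pos_tags, n=2):
    """Relative frequency of each POS-tag N-gram in a sequence of tags."""
    ngrams = [tuple(pos_tags[i:i + n]) for i in range(len(pos_tags) - n + 1)]
    if not ngrams:
        return {}
    counts = Counter(ngrams)
    return {gram: count / len(ngrams) for gram, count in counts.items()}


if __name__ == "__main__":
    # Toy values for a single text, invented purely for illustration.
    text_features = {
        "character": [0.12, 0.08],
        "word": [0.30],
        "structure": [0.05, 0.02],
    }
    for combo in level_combinations():
        print(combo, combine_vectors(text_features, combo))
    print(pos_ngram_frequencies(["DET", "NOUN", "VERB", "DET", "NOUN"], n=2))
```

With three levels this yields seven combinations. Each combined vector could then be passed to a classifier of choice, for example scikit-learn's `AdaBoostClassifier`, and scored with the F-measure. That step is omitted here because the abstract gives no training details.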
Ethics declarations
The authors declare that they have no conflicts of interest.
Additional information
Translated by A. Kolemesin
About this article
Cite this article
Manakhova, A.M., Lagutina, N.S. Analysis of the Influence of Mixed-Level Stylometric Characteristics on the Verification of Authors of Literary Works. Aut. Control Comp. Sci. 56, 744–761 (2022). https://doi.org/10.3103/S0146411622070148