I found an interesting review here.

UDC: 51-74

 

Murodov P.S., Prutzkow A.V.

 

Mathematical methods are widely used to formalize solutions to problems of automatic text processing, including text classification. Texts are classified with the naive Bayes classifier, k-nearest neighbors, decision trees, support vector machines, distributions of letter combinations (character n-grams), logistic regression, and approaches based on artificial neural networks. These methods are used in modern computational linguistics.

Classical computational linguistics considers text as a carrier of meaning. The purpose of the study is to formalize, by means of a mathematical model, the determination of the topics of scientific articles using syntactically related words, and thus to work within classical computational linguistics. We propose an abstract mathematical model of fuzzy classification based on common objects. The model assumes that some objects belong to only one class and serve as class identifiers. The result of a fuzzy classification can be one or more classes; for each class, a degree of membership is determined. We propose a specification of the model in which the common objects are syntactically related pairs of words. We conclude that syntactically related words are more promising for research than word bigrams determined by word order in a sentence. We propose to use the model in our international study. The study involves the creation of a corpus of scientific articles and their distribution by topic. The corpus will be used to classify scientific articles by topic.
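The fuzzy classification described above can be illustrated with a minimal Python sketch. The abstract does not state how the degree of membership is computed, so the sketch assumes it is the share of a text's identifier pairs that point to a given class. The function names `build_identifiers` and `classify`, the toy corpus, and the representation of syntactically related words as (head, dependent) tuples are all hypothetical and are not taken from the article.

```python
from collections import Counter, defaultdict


def build_identifiers(corpus):
    """Map each word pair that occurs in exactly one class to that class.

    corpus: dict mapping a class label to a list of documents, where each
    document is a list of (head, dependent) word pairs (assumed format).
    """
    classes_of_pair = defaultdict(set)
    for label, documents in corpus.items():
        for pairs in documents:
            for pair in pairs:
                classes_of_pair[pair].add(label)
    # Pairs seen in a single class act as identifiers of that class.
    return {
        pair: next(iter(labels))
        for pair, labels in classes_of_pair.items()
        if len(labels) == 1
    }


def classify(pairs, identifiers):
    """Return {class: degree of membership} for one document.

    Assumption: the degree is the fraction of the document's identifier
    pairs that belong to the class. An empty dict means no identifier
    pair was found.
    """
    hits = Counter(identifiers[p] for p in pairs if p in identifiers)
    total = sum(hits.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in hits.items()}


# Toy corpus: ("solve", "equation") occurs in both classes,
# so it is not a class identifier.
corpus = {
    "mathematics": [[("prove", "theorem"), ("solve", "equation")]],
    "linguistics": [[("parse", "sentence"), ("solve", "equation")]],
}
identifiers = build_identifiers(corpus)
result = classify(
    [("prove", "theorem"), ("parse", "sentence"), ("prove", "theorem")],
    identifiers,
)
print(result)
```

In this toy example the classified text receives two classes with different degrees of membership, which matches the statement that the result of a fuzzy classification can be one or more classes.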

 

Keywords: natural language text processing, text classification, syntactic analysis, mathematical model, fuzzy classification, scientific article, text corpus.

 

Information about the authors

 

Murodov Parviz Saizhafarovich – Tajik National University,

Ph.D. student at the Department of Informatics.

Address: 734025, Dushanbe, Republic of Tajikistan, Rudaki Avenue, 17.

Phone: (+992) 904-49-77-11.

 

Prutzkow Alexander Viktorovich – Ryazan State Radio Engineering University named after V.F. Utkin,

Doctor of Engineering, Professor at the Department of Computational and Applied Mathematics.

Address: 390005, Ryazan, Russian Federation, Gagarina str., 59/1.

Phone: +7 (4912) 72-03-64.


 

Reviewer: Komilien F.S., Doctor of Physical and Mathematical Sciences, Professor

 


 

 

Article received: 19.12.2023

Approved after review: 26.02.2024

Accepted for publication: 29.03.2024

 

   