K.V. Nenausnikov – Post-graduate Student, Junior Research Scientist, St. Petersburg Institute for Informatics and Automation of RAS
V.V. Alexandrov – Dr.Sc.(Eng.), Professor, Chief Research Scientist, St. Petersburg Institute for Informatics and Automation of RAS
Natural language words are ambiguous. Resolving the ambiguity of a word in a text is one of the main directions in the development of automatic text processing. The task has not been fully solved: under certain conditions some approaches resolve ambiguity with high accuracy, but the requirements they place on the analyzed sources then increase significantly.
The aim is to develop an algorithm for representing word meanings for small corpora. A method based on the associative ontological approach is developed.
In this paper a new method for the automatic construction of word meanings based on the associative ontological approach is proposed and tested. The approach can be applied to both small and large text corpora without preliminary preparation (corpus annotation, compilation of thesauri, etc.). The model of a word meaning is represented by a knowledge graph. The meaning of the central word is described by the connections between it and the surrounding words; the central word is the word one of whose meanings is reflected in the model. The connections reflect the associative closeness of words: all the words in the model frequently co-occur in context. Words on the first level are words whose connection with the central meaning is unambiguous. Words on the second level are also associated with the central meaning, but are more strongly connected to the first-level, describing words; these may be words associated with the described meaning, collocations, syntactic constructions, etc.
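The two-level graph described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the window-based co-occurrence counting, and the top-k cut-offs are all assumptions, since the abstract does not specify the exact measure of associative closeness.

```python
from collections import defaultdict

def build_meaning_graph(sentences, central, window=5, top1=3, top2=2):
    # Approximate "associative closeness" by co-occurrence counts
    # within a sliding window (an assumption made for illustration).
    cooc = defaultdict(lambda: defaultdict(int))
    for sent in sentences:
        tokens = sent.lower().split()
        for i, w in enumerate(tokens):
            for v in tokens[i + 1:i + 1 + window]:
                if w != v:
                    cooc[w][v] += 1
                    cooc[v][w] += 1

    def top(word, k):
        # k strongest associates of a word, by raw co-occurrence count
        return [w for w, _ in sorted(cooc[word].items(),
                                     key=lambda x: -x[1])[:k]]

    # Level 1: direct associates of the central word;
    # level 2: associates of each first-level word (central word excluded).
    graph = {central: {}}
    for first in top(central, top1):
        graph[central][first] = [w for w in top(first, top2) if w != central]
    return graph

sentences = ["bank approved loan today",
             "bank raised loan rate",
             "river bank flooded badly"]
graph = build_meaning_graph(sentences, "bank", top1=2, top2=2)
```

In a real setting the sentences would come from an unannotated corpus, and stop-word filtering or a normalized closeness score (e.g. PMI) would replace the raw counts used here for brevity.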
This model does not provide a formal description of a word, either by definition or by constructing an ontology, which complicates its formal analysis by an expert. However, given the non-discreteness of word meanings and the absence of definitions for all possible meanings, a formal description of all word meanings may not be attainable at all.
The method automatically extracts from a given set of texts both descriptions of specific entities (objects, facts, etc.) and a general description. It is up to the expert to decide when the obtained concepts are sufficiently distant from each other: with a small number of merge iterations, specific entities (objects, facts, etc.) are obtained, while a large number of iterations yields abstract concepts of the word, i.e. its various lexical meanings. The procedure is convergent and therefore reaches a definite final state even on a large amount of input data.
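The iterative union of concepts can be illustrated by a greedy merge of context clusters. This is a hypothetical sketch: Jaccard similarity and a fixed threshold stand in for the unspecified closeness criterion, and the loop terminates because each merge reduces the number of clusters, mirroring the convergence claim.

```python
def merge_step(clusters, threshold):
    # One iteration: merge the two most similar clusters of context
    # words (Jaccard similarity over their word sets).
    best, pair = 0.0, None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            a, b = clusters[i], clusters[j]
            sim = len(a & b) / len(a | b)
            if sim > best:
                best, pair = sim, (i, j)
    if pair is None or best < threshold:
        return clusters, False          # converged: no pair close enough
    merged = clusters[pair[0]] | clusters[pair[1]]
    rest = [c for k, c in enumerate(clusters) if k not in pair]
    return rest + [merged], True

def merge_until_converged(clusters, threshold=0.2):
    # Each step removes one cluster, so the loop is finite.
    changed = True
    while changed:
        clusters, changed = merge_step(clusters, threshold)
    return clusters

clusters = [{"bank", "loan"}, {"bank", "rate", "loan"}, {"river", "water"}]
result = merge_until_converged(clusters, threshold=0.2)
```

Raising the threshold stops merging earlier (specific entities remain separate); lowering it lets the process run longer and collapse clusters into broader, more abstract senses, which is the trade-off left to the expert in the text above.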
The assumption that a single Internet page carries a single concept turned out to be erroneous; therefore, in future work, texts will first be divided into thematic areas before processing.