Publishing house Radiotekhnika


Tel.: +7 (495) 625-9241


Extraction of significant information from files of unstructured text


A.I. Zaharenkov, A.V. Sokolov

Extracting facts from natural-language data should be one of the basic functions of an information-analytical system (IAS). To support it, the IAS must include a fact model and a knowledge base model. These models should meet the following requirements: a thesaurus is available; the data to be processed is rubricated (classified by subject heading); the data is stored in a way that supports contextual reference search; and there is a rule base that can be extended while the IAS is in operation.

Facts are described by a model made up of slots: a slot with the name of the subject of the fact (the subject initiates the action); a slot with the name of the object of the fact (the object describes the result of the action); and a slot with a predicate (the semantic relation between the subject and the object).

Existing fact extraction methods [1–4] extract given entities from the text (names of organisations, names of settlements, etc.) and then search for relationships between them. However, these methods have the following limitations: they do not support generating summaries and excerpts from the text documents under consideration; and they cannot resolve uncertainty when identical factors are present in different sources of the initial data.

A method has been developed for extracting facts from text documents in Russian and English, based on rules defined by experts. To implement the method, a knowledge base defining the rules for extracting entities and the relations between them is built in advance. The method is a sequence of stages that identify minimal syntactic units and establish links between them, with automatic generation of summaries.
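As an illustration only, the subject, object and predicate slots and the expert-defined rule base can be sketched in Python as follows. This is not the authors' implementation: the `Fact` class, the `RULES` list, the `extract_facts` function, the regular-expression rule format and the sample sentences are all assumptions made for this sketch, and real rules for Russian and English text would need proper linguistic analysis rather than surface patterns.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """Hypothetical fact record with the three slots named in the abstract."""
    subject: str    # name of the subject (initiates the action)
    predicate: str  # semantic relation between subject and object
    obj: str        # name of the object (describes the result of the action)


# Hypothetical expert rule base: each rule pairs a surface pattern with a
# predicate name. It is a plain list, so rules can be appended at run time.
RULES = [
    (re.compile(r"(?P<subj>[A-Z]\w+) acquired (?P<obj>[A-Z]\w+)"), "acquire"),
    (re.compile(r"(?P<subj>[A-Z]\w+) is located in (?P<obj>[A-Z]\w+)"), "located_in"),
]


def extract_facts(text, rules=RULES):
    """Apply every rule to the text and return the matched facts.

    Facts are returned rule by rule, and in text order within each rule.
    """
    facts = []
    for pattern, predicate in rules:
        for match in pattern.finditer(text):
            facts.append(Fact(match.group("subj"), predicate, match.group("obj")))
    return facts


# Usage with made-up sentences.
sample = "Acme acquired Globex. Globex is located in Springfield."
for fact in extract_facts(sample):
    print(fact)
```

Keeping the rules in data rather than in code mirrors the requirement that the rule base can be extended while the system is in operation: adding a rule is an append to `RULES`, not a change to `extract_facts`.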

© Radiotekhnika Publishing House, 2004-2017