natural language data
semantic network
logical inference
data relevance estimation
subject-domain thesaurus
One of the basic functions performed by an IAS is the association (linking) of facts extracted from different documents. Implementing this function requires a logical inference mechanism that takes as its input the results of fact extraction from Russian- or English-language texts of various documents.
Based on an analysis of possible ways to implement the logical inference mechanism, we propose building the inference on semantic networks. The facts extracted from documents form a set of nodes (vertices), and the relations between them are represented by arcs (links) that define their semantics.
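As an illustration of inference over such a network, the sketch below stores facts as nodes joined by relation-labelled arcs and treats a query as a search for a chain of arcs linking two nodes. This is a minimal sketch under our own assumptions: path search (breadth-first) stands in for logical inference, and the class name, method names and example facts are hypothetical, not taken from the described IAS.

```python
from collections import defaultdict, deque
from typing import Optional

Triple = tuple[str, str, str]


class SemanticNetwork:
    """Nodes (facts, entities) joined by arcs labelled with a relation."""

    def __init__(self) -> None:
        # adjacency list: node -> list of (relation, neighbour)
        self.arcs: defaultdict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_arc(self, source: str, relation: str, target: str) -> None:
        self.arcs[source].append((relation, target))

    def infer_path(self, start: str, goal: str) -> Optional[list[Triple]]:
        """Breadth-first search for a chain of arcs leading from start to goal.

        Returns the chain as (source, relation, target) triples, [] if
        start == goal, or None if the goal is unreachable.
        """
        parent: dict[str, tuple[str, str]] = {}
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                chain: list[Triple] = []
                while node != start:
                    prev, relation = parent[node]
                    chain.append((prev, relation, node))
                    node = prev
                return chain[::-1]
            for relation, nxt in self.arcs.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    parent[nxt] = (node, relation)
                    queue.append(nxt)
        return None


# Usage with hypothetical facts taken from two different documents
net = SemanticNetwork()
net.add_arc("Company A", "acquired", "Company B")  # fact from document 1
net.add_arc("Company B", "owns", "Plant C")        # fact from document 2
print(net.infer_path("Company A", "Plant C"))
# [('Company A', 'acquired', 'Company B'), ('Company B', 'owns', 'Plant C')]
```

Breadth-first search returns the shortest chain, which keeps the derived link between two documents as direct as the network allows; a full inference engine would also apply rules to the arc labels.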
Implementing the function of associating facts extracted from different documents relies on a mathematical apparatus that requires developing a method for constructing a semantic network. One of the main requirements for the generated network is that it must describe the facts extracted from different documents.
A method has been developed for constructing a semantic network that describes the links between facts. Its input is the output of the procedure that extracts facts from documents containing Russian- or English-language texts.
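One way to picture how per-document extraction results become a single network is sketched below: every fact is merged into a shared set of arcs, each arc and node remembers which documents it came from, and nodes mentioned by more than one document become candidate links between those documents. This is an illustrative sketch only; the function names, the provenance bookkeeping and the sample facts are hypothetical assumptions, not the method's actual data structures.

```python
from collections import defaultdict

Fact = tuple[str, str, str]  # (subject, predicate, object)


def build_network(docs: dict[str, list[Fact]]):
    """Merge per-document facts into one network, remembering provenance."""
    arcs: defaultdict[Fact, set[str]] = defaultdict(set)      # arc -> source documents
    node_docs: defaultdict[str, set[str]] = defaultdict(set)  # node -> source documents
    for doc_id, facts in docs.items():
        for subject, predicate, obj in facts:
            arcs[(subject, predicate, obj)].add(doc_id)
            node_docs[subject].add(doc_id)
            node_docs[obj].add(doc_id)
    return arcs, node_docs


def cross_document_nodes(node_docs: dict[str, set[str]]) -> dict[str, set[str]]:
    """Nodes mentioned in facts from more than one document: candidate links."""
    return {node: ids for node, ids in node_docs.items() if len(ids) > 1}


# Hypothetical extraction results for two documents
docs = {
    "doc1": [("Ivanov", "works_at", "Institute X")],
    "doc2": [("Institute X", "located_in", "Moscow"),
             ("Petrov", "works_at", "Institute X")],
}
arcs, node_docs = build_network(docs)
links = cross_document_nodes(node_docs)
print(sorted(links))  # ['Institute X']
```

Exact string matching of node names is the simplest possible merge criterion; in practice the subject-domain thesaurus would be needed to unify synonymous names before merging.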
This procedure requires that knowledge bases be present in the IAS.
The generalized model of the knowledge base (BZ) of the PC IAS is defined as follows:
where the components are: the thesaurus of the subject domain described in the BZ; the set of rubrication (categorization) rules; and the model of the rule base.
The BZ model is represented by the following expression:
where the components are: the set of descriptions of the objects of research; the set of descriptions of objects and the object of research; and the set of descriptions of the data sources.
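Because the symbols of both expressions are not recoverable here, the sketch below only mirrors the components named in the text as plain data structures. All field names are hypothetical, and the keyword-matching `rubricate` method is our own simplified assumption of how rubrication rules might be applied; it is not the rule format used in the BZ.

```python
from dataclasses import dataclass, field


@dataclass
class GeneralizedKB:
    """Generalized BZ model: thesaurus, rubrication rules, rule-base model."""
    thesaurus: dict[str, set[str]] = field(default_factory=dict)            # term -> related terms
    rubrication_rules: list[tuple[str, str]] = field(default_factory=list)  # (lowercase keyword, rubric)
    rule_base: list[str] = field(default_factory=list)                      # rules kept opaque here

    def rubricate(self, text: str) -> set[str]:
        """Return every rubric whose lowercase trigger keyword occurs in the text."""
        lowered = text.lower()
        return {rubric for keyword, rubric in self.rubrication_rules if keyword in lowered}


@dataclass
class KBModel:
    """BZ model: the three sets of descriptions listed in the text."""
    research_objects: set[str] = field(default_factory=set)
    related_objects: set[str] = field(default_factory=set)
    data_sources: set[str] = field(default_factory=set)


# Hypothetical contents
kb = GeneralizedKB(
    thesaurus={"merger": {"acquisition", "takeover"}},
    rubrication_rules=[("merger", "Economics"), ("reactor", "Energy")],
)
print(kb.rubricate("The merger was announced today"))  # {'Economics'}
model = KBModel(research_objects={"Company A"}, data_sources={"news feed"})
```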
To implement the required functionality, facts are described by the model MF, defined by the following expression:
where SlotS is the slot containing the set of names of the subjects of the facts; SlotO is the slot containing the set of names of the objects of the facts; and SlotP is the slot containing the set of predicates that define the relation between elements of SlotS and SlotO.
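The three slots of MF map naturally onto a small frame-like structure. The sketch below assumes, which the text does not state explicitly, that each SlotP entry is stored as a (subject, predicate, object) triple so that the relation it defines can be checked against SlotS and SlotO; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FactModel:
    """MF as three slots: subject names, object names, predicates linking them."""
    slot_s: set[str] = field(default_factory=set)                   # SlotS
    slot_o: set[str] = field(default_factory=set)                   # SlotO
    slot_p: set[tuple[str, str, str]] = field(default_factory=set)  # SlotP: (subject, predicate, object)

    def add_fact(self, subject: str, predicate: str, obj: str) -> None:
        self.slot_s.add(subject)
        self.slot_o.add(obj)
        self.slot_p.add((subject, predicate, obj))

    def is_consistent(self) -> bool:
        """Each SlotP entry must relate an element of SlotS to an element of SlotO."""
        return all(s in self.slot_s and o in self.slot_o for s, _, o in self.slot_p)


mf = FactModel()
mf.add_fact("Company A", "acquired", "Company B")
print(mf.slot_s, mf.slot_o)  # {'Company A'} {'Company B'}
```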
The basic procedures required to extract and describe facts from Russian- or English-language texts are the following:
Structuring («marking up») the text into paragraphs and sentences;
Structuring («marking up») sentences into components (words, spaces, punctuation marks, etc.);
Forming the sets SlotS and SlotO;
Forming the facts, i.e. the relations between the sets (SlotP), which can be of two kinds: subject-predicate-object and object-predicate-property;
Describing the facts.
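The procedures above can be illustrated with a deliberately naive pipeline: regular-expression segmentation into paragraphs and sentences, tokenization into words, spaces and punctuation, and a toy subject-predicate-object rule. The predicate lexicon and the "words before / words after the predicate" rule are hypothetical assumptions standing in for real morphological and syntactic analysis, and for brevity the sketch forms the triples first and derives SlotS and SlotO from them.

```python
import re

# Hypothetical predicate lexicon; a real system would rely on morphological
# and syntactic analysis of Russian or English text instead.
PREDICATES = {"acquired", "owns", "heads"}
Fact = tuple[str, str, str]


def split_paragraphs(text: str) -> list[str]:
    """Paragraphs are separated by blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph: str) -> list[str]:
    """Naive split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    """Words, runs of whitespace and single punctuation marks (\\w covers Cyrillic too)."""
    return re.findall(r"\w+|\s+|[^\w\s]", sentence)


def extract_facts(text: str) -> list[Fact]:
    """Take the first lexicon predicate in a sentence, the words before it
    as the subject and the words after it as the object."""
    facts: list[Fact] = []
    for paragraph in split_paragraphs(text):
        for sentence in split_sentences(paragraph):
            words = [t for t in tokenize(sentence) if re.fullmatch(r"\w+", t)]
            for i, word in enumerate(words):
                if word.lower() in PREDICATES and 0 < i < len(words) - 1:
                    facts.append((" ".join(words[:i]), word.lower(), " ".join(words[i + 1:])))
                    break
    return facts


def form_slots(facts: list[Fact]) -> tuple[set[str], set[str]]:
    """SlotS and SlotO as the sets of subject and object names."""
    return {s for s, _, _ in facts}, {o for _, _, o in facts}


text = "Company A acquired Company B. The deal closed.\n\nCompany B owns Plant C."
facts = extract_facts(text)
print(facts)
# [('Company A', 'acquired', 'Company B'), ('Company B', 'owns', 'Plant C')]
```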
The BZ model can contain arbitrary sets of weakly structured documents. After the fact-extraction functionality has been applied, each BZ document must, on the basis of its extracted facts, be correlated with one or more lexical sets of the subject domain, and the documents must be grouped into the corresponding feature-fact sets according to their degree of semantic affinity. In other words, the relevance of documents must be estimated on the basis of the extracted facts.
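To make the relevance estimation and grouping step concrete, the sketch below scores each document against each lexical set of the subject domain and assigns it to every set whose score reaches a threshold. The text does not specify the semantic-affinity measure: Jaccard similarity over the words of a document's facts is swapped in here as a simple stand-in, and the threshold, lexical sets and sample facts are all hypothetical.

```python
Fact = tuple[str, str, str]


def fact_terms(facts: list[Fact]) -> set[str]:
    """Lowercased words occurring in a document's extracted facts."""
    terms: set[str] = set()
    for subject, predicate, obj in facts:
        terms.update(w.lower() for w in f"{subject} {predicate} {obj}".split())
    return terms


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|, used here as a stand-in affinity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_documents(doc_facts: dict[str, list[Fact]],
                    lexical_sets: dict[str, set[str]],
                    threshold: float = 0.2) -> dict[str, list[str]]:
    """Assign each document to every lexical set whose similarity reaches the threshold."""
    groups: dict[str, list[str]] = {name: [] for name in lexical_sets}
    for doc_id, facts in doc_facts.items():
        terms = fact_terms(facts)
        for name, lexicon in lexical_sets.items():
            if jaccard(terms, lexicon) >= threshold:
                groups[name].append(doc_id)
    return groups


lexical_sets = {
    "economics": {"acquired", "company", "deal", "shares"},
    "energy": {"plant", "reactor", "power"},
}
doc_facts = {
    "doc1": [("Company A", "acquired", "Company B")],
    "doc2": [("Plant C", "produces", "power")],
}
print(group_documents(doc_facts, lexical_sets))
# {'economics': ['doc1'], 'energy': ['doc2']}
```

Allowing a document to join several groups matches the requirement that a document may correlate with one or several lexical sets; a stricter variant would keep only the best-scoring set.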
The facts extracted from each document are, in essence, metadata formulated on the basis of the given subject domain and represented in the form of a semantic network.