Creating Directed Weighted Network of Terms Based on Analysis of Text Corpora
This work considers one of the most pressing problems in natural language processing: the formalization and creation of ontological models of subject domains from text corpora. A new approach is proposed for determining the weights of links in a network of terms, where each term corresponds to a concept of the subject domain under consideration. To test the approach, a terminological ontology was created for the subject domain of the ecological footprint. Further analysis of the resulting model made it possible to identify the most influential and significant links between nodes in the network of terms. The proposed approaches and methods were implemented in software, and the resulting directed networks of terms were visualized with Gephi, a tool for modeling and visualizing graphs. Weighted directed networks of terms built with the proposed approach can be used to create terminological ontologies of subject domains automatically, with the participation of experts. The results can also be used to build personalized search interfaces for users of information retrieval systems and navigation systems for databases, which should make it easier for users of such systems to find relevant information.
Keywords: information space, information flow, text corpus, terminological ontology, subject domain, horizontal visibility graph, undirected networks of terms, directed weighted networks of terms