Dmytro Lande and Oleh Dmytrenko
Methodology for Extracting of Key Words and Phrases and Building Directed Weighted Networks of Terms with Using Part-of-speech Tagging
//
Selected Papers of the XX International Scientific and Practical
Conference "Information Technologies and Security" (ITS 2020).
CEUR Workshop Proceedings (ceur-ws.org). - Vol-2859. - pp 168-177. ISSN 1613-0073.
Today, the rapid globalization of the information space is producing
huge arrays of text data on information resources, including unstructured
data. Therefore, developing new methods and improving existing methods and techniques
for finding necessary and relevant information in this text data is important.
This article addresses the task of conceptualizing unstructured data contained in thematic information flows distributed on the Internet and then formalizing that data as a network of terms.
This work proposes a new method for extracting key words and phrases
from thematic information flows and a new method, based on Part-of-speech (PoS)
tagging, for determining the directions of links between nodes in undirected
networks of terms. It also presents an idea for determining the weights of links
between nodes in the directed network of terms, together with a holistic methodology
for computerized text corpora processing and for building directed weighted networks
of terms (key words and phrases). The terms are extracted after the words are first
classified into parts of speech on the basis of the syntactic context of the phrase,
that is, by PoS tagging. Based on PoS tagging, statistical term
weighting is applied as the next step. The proposed methodology is tested on the
example of a children's allegorical story-tale, "The Little Prince" by Antoine de
Saint-Exupéry. Applying the proposed method, the key terms were extracted and
the directed weighted network of words and phrases related to single key concepts in the studied text was built.
Keywords:
Text Corpus, Natural Language Processing, Part-of-Speech (PoS)
Tagging, Terminological Ontology, Network of Terms
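
The pipeline summarized in the abstract (PoS tagging, term extraction, statistical weighting, directed weighted links) can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not state the rules it relies on. Everything specific here is an assumption made for illustration: the hand-tagged toy input `SENTENCES`, the tag names `ADJ` and `NOUN`, the rule that a term is a noun or an adjective+noun pair, raw frequency as the term weight, and the rule that a link points from an earlier term to the next term in the same sentence, weighted by how often that ordered pair occurs.

```python
from collections import Counter

# Hypothetical, hand-tagged toy input: one list of (word, PoS tag) pairs per
# sentence. In practice a PoS tagger would produce these pairs.
SENTENCES = [
    [("little", "ADJ"), ("prince", "NOUN"), ("loved", "VERB"), ("rose", "NOUN")],
    [("little", "ADJ"), ("prince", "NOUN"), ("met", "VERB"), ("fox", "NOUN")],
    [("fox", "NOUN"), ("tamed", "VERB"), ("little", "ADJ"), ("prince", "NOUN")],
]


def extract_terms(tagged_sentence):
    """Return the terms of one sentence in reading order.

    Assumed rule (not from the paper): a noun is a term; a noun directly
    preceded by an adjective is merged with it into a two-word phrase.
    """
    terms = []
    prev_word, prev_tag = None, None
    for word, tag in tagged_sentence:
        if tag == "NOUN":
            if prev_tag == "ADJ":
                terms.append(f"{prev_word} {word}")
            else:
                terms.append(word)
        prev_word, prev_tag = word, tag
    return terms


def build_network(tagged_sentences):
    """Return (term frequencies, directed edge weights).

    Assumed rules (not from the paper): a term's weight is its raw
    frequency; an edge runs from each term to the next term in the same
    sentence, and its weight counts how often that ordered pair occurs.
    """
    term_freq = Counter()
    edges = Counter()
    for sentence in tagged_sentences:
        terms = extract_terms(sentence)
        term_freq.update(terms)
        for src, dst in zip(terms, terms[1:]):
            if src != dst:  # skip self-loops
                edges[(src, dst)] += 1
    return term_freq, edges


term_freq, edges = build_network(SENTENCES)

for term, freq in term_freq.most_common():
    print(f"term {term!r}: frequency {freq}")
for (src, dst), weight in sorted(edges.items()):
    print(f"{src!r} -> {dst!r}: weight {weight}")
```

Keeping the edges in a `Counter` keyed by ordered pairs makes the direction explicit: `("fox", "little prince")` and `("little prince", "fox")` are distinct links with separate weights.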