Dmytro Lande and Oleh Dmytrenko
Methodology for Extracting of Key Words and Phrases and Building Directed Weighted Networks of Terms with Using Part-of-speech Tagging

// Selected Papers of the XX International Scientific and Practical Conference "Information Technologies and Security" (ITS 2020). CEUR Workshop Proceedings (ceur-ws.org). - Vol-2859. - pp 168-177. ISSN 1613-0073.

Today, the rapid globalization of the information space leads to the rise of huge arrays of text data on information resources, including unstructured data. Therefore, developing new and improving existing methods and techniques for finding necessary and relevant information in this text data is important. This article addresses the task of conceptualizing unstructured data contained in thematic information flows distributed on the Internet and then formalizing it as a network of terms.

This work proposes a new method for extracting key words and phrases from thematic information flows and a new method for determining the directions of links between nodes in undirected networks of terms using part-of-speech (PoS) tagging. It also presents an idea for determining the weights of links between nodes in the directed network of terms, together with a holistic methodology for computerized processing of text corpora and for building directed weighted networks of terms (key words and phrases). The terms are extracted after a preliminary classification of words into parts of speech based on their syntactic context within the phrase, that is, PoS tagging. Statistical term weighting is then applied as the next step, based on the PoS tags. The proposed methodology is tested on the children's allegorical tale "The Little Prince" by Antoine de Saint-Exupéry. Applying the proposed method, the key terms were extracted and a directed weighted network of words and phrases related to single key concepts in the studied text was built.
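The abstract does not state the concrete extraction or link-direction rules, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes hand-tagged input instead of a real PoS tagger, treats nouns and adjectives as candidate terms, forms phrases from adjacent adjective and noun pairs, directs each link from a word to a phrase that contains it, and uses the phrase frequency as the link weight. All names (`TAGGED`, `extract_terms`, `build_edges`) and all of these rules are hypothetical.

```python
from collections import Counter

# Hand-tagged toy sentence (hypothetical data); a real pipeline would obtain
# these (word, tag) pairs from a PoS tagger.
TAGGED = [
    ("the", "DET"), ("little", "ADJ"), ("prince", "NOUN"),
    ("loved", "VERB"), ("the", "DET"), ("little", "ADJ"),
    ("rose", "NOUN"), ("and", "CONJ"), ("the", "DET"),
    ("little", "ADJ"), ("prince", "NOUN"), ("watered", "VERB"),
    ("the", "DET"), ("rose", "NOUN"),
]

# Assumption: only nouns and adjectives are kept as single-word terms.
CONTENT_TAGS = {"NOUN", "ADJ"}


def extract_terms(tagged):
    """Return single-word terms and ADJ+NOUN phrases in order of appearance."""
    terms = []
    for i, (word, tag) in enumerate(tagged):
        if tag in CONTENT_TAGS:
            terms.append(word)
        # Assumption: a phrase is an adjective immediately followed by a noun.
        if tag == "ADJ" and i + 1 < len(tagged) and tagged[i + 1][1] == "NOUN":
            terms.append(f"{word} {tagged[i + 1][0]}")
    return terms


def build_edges(term_counts):
    """Map (word, phrase) -> weight, directing links from a word to each
    phrase that contains it; the weight is the phrase frequency."""
    edges = {}
    for term, freq in term_counts.items():
        parts = term.split()
        if len(parts) > 1:
            for part in parts:
                edges[(part, term)] = freq
    return edges


term_counts = Counter(extract_terms(TAGGED))
edges = build_edges(term_counts)
for (source, target), weight in sorted(edges.items()):
    print(f"{source} -> {target} (weight {weight})")
```

In this toy setup the frequency counts stand in for the statistical term weighting mentioned above; any other weighting scheme could be substituted in `build_edges` without changing the network structure.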

Keywords: Text Corpus, Natural Language Processing, Part-of-Speech (PoS) Tagging, Terminological Ontology, Network of Terms
