Lande, Dmitry; Kuzminskyi, Viktor
Automatic Extraction and Analysis of Direct Speech from Texts Using Large Language Models

Available at SSRN: https://ssrn.com/abstract=4953304,
DOI: http://dx.doi.org/10.2139/ssrn.4953304 (Oct 17, 2024). - 15 p.

This paper proposes an approach for the automatic extraction of direct quotes from texts using large language models (LLMs) and their analysis to build semantic networks of authors and concepts. After retrieving relevant documents, LLMs are employed to extract quotes, their authors, and metadata, which are stored in a structured JSON format. Based on this data, a semantic network is constructed, which is then clustered using LLMs. The concept of a "swarm of virtual experts" is introduced for more precise extraction of key concepts. The model illustrates how authors form groups based on shared interests and discussion topics. One of the innovative aspects of the approach is the automatic generation of cluster names.
Keywords: Quote extraction, Semantic network, Clustering, Swarm of virtual experts, Automatic text analysis, Large language models (LLMs)
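
The step from structured JSON records to a semantic network of authors and concepts, as described in the abstract, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the field names (author, quote, source, concepts), the sample records, and the rule that two authors are linked when their quotes share at least one concept. The paper's actual schema and linking criteria may differ, and the LLM-based extraction and clustering steps are not reproduced.

```python
import json
from collections import defaultdict
from itertools import combinations

# Hypothetical extracted records. The field names are assumptions for this
# sketch, not the schema used in the paper.
RECORDS_JSON = """
[
  {"author": "Author A", "quote": "First sample quote.", "source": "doc-1",
   "concepts": ["security", "networks"]},
  {"author": "Author B", "quote": "Second sample quote.", "source": "doc-2",
   "concepts": ["networks", "policy"]},
  {"author": "Author C", "quote": "Third sample quote.", "source": "doc-3",
   "concepts": ["economy"]}
]
"""


def build_author_network(records):
    """Link two authors when their quotes share at least one concept.

    Returns a dict mapping an alphabetically ordered author pair to the set
    of concepts the two authors have in common.
    """
    # Collect every concept attributed to each author across all quotes.
    concepts_by_author = defaultdict(set)
    for rec in records:
        concepts_by_author[rec["author"]].update(rec["concepts"])

    # Compare each pair of authors once and keep pairs with an overlap.
    edges = {}
    for a, b in combinations(sorted(concepts_by_author), 2):
        shared = concepts_by_author[a] & concepts_by_author[b]
        if shared:
            edges[(a, b)] = shared
    return edges


records = json.loads(RECORDS_JSON)
edges = build_author_network(records)
for (a, b), shared in edges.items():
    print(a, "--", b, sorted(shared))
```

In this toy data only Author A and Author B are connected, through the shared concept "networks". A clustering step, which the paper performs with LLMs, would then operate on edges of this kind.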