Digital Library NAES of Ukraine

Integration of large language models with semantic processing tools as an instrument for knowledge digitization

- Сініцин, Ігор Петрович (orcid.org/0000-0002-4120-0784), Рогушина, Юлія Віталіївна (orcid.org/0000-0001-7958-2557) and Юрченко, Костянтин Юрійович (orcid.org/0000-0003-3150-0027) (2025) Integration of large language models with semantic processing tools as an instrument for knowledge digitization. Проблеми програмування, 2, pp. 63-76. ISSN 1727-4907


Abstract

The paper addresses the task of automating the analysis, generation, and management of complex natural language documents by integrating generative artificial intelligence with semantic technologies, in particular Semantic MediaWiki. It analyzes how ontological models of subject domains and semantic markup make it possible to prevent critical shortcomings of large language models, such as the tendency toward “hallucinations” (generation of false statements) and the lack of transparency in explaining decisions. This integration is explored using the example of the instrumental system “LINZA,” which is being developed for automated intelligent processing of content from heterogeneous documents with complex, weakly formalized structure, with the aim of generating natural language reports according to specified requirements in domains such as public administration, jurisprudence, certification, and standardization. The system combines the flexibility and adaptability of large language models with formalized ontological knowledge and with semantic queries that retrieve pertinent facts from the Semantic MediaWiki environment or from external sources (Retrieval-Augmented Generation). The proposed approach will significantly reduce the risk of errors typical of generative models and ensure factual accuracy and transparency in the decision-making process. Special attention is paid to mechanisms for transparency, reliability, and human control, which increase trust in automatically created documents and are especially important in areas with high information security requirements.
The multi-level architecture of the system defines the tasks of the agents and services that perform the specialized functions of data collection, analysis, transformation, and verification. It also ensures that the system remains flexible, scalable, and adaptable to changes in input data and requirements.
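
The retrieval step described in the abstract (semantic queries for pertinent facts in Semantic MediaWiki, followed by generation grounded in those facts) can be illustrated with a minimal sketch. It is not the LINZA implementation: the wiki URL, the category `Standard`, the property `Has status`, and all function names are hypothetical, and the response shape follows the general layout of the Semantic MediaWiki `ask` API module as an assumption. The sketch only builds the query URL, flattens a response into fact triples, and assembles a prompt that asks the model to cite the retrieved facts; it performs no network or model calls.

```python
from urllib.parse import urlencode


def build_ask_url(api_url, conditions, printouts, limit=10):
    """Build a URL for the Semantic MediaWiki `ask` API module.

    `conditions` and `printouts` are hypothetical examples supplied by the
    caller, e.g. ["Category:Standard"] and ["Has status"].
    """
    query = "".join(f"[[{c}]]" for c in conditions)
    query += "".join(f"|?{p}" for p in printouts)
    query += f"|limit={limit}"
    params = {"action": "ask", "query": query, "format": "json"}
    return f"{api_url}?{urlencode(params)}"


def extract_facts(ask_response):
    """Flatten an `ask` JSON response into (page, property, value) triples."""
    results = ask_response.get("query", {}).get("results", {})
    # Defensive check: an empty result set may arrive as a list, not a dict.
    if not isinstance(results, dict):
        return []
    facts = []
    for page, data in results.items():
        for prop, values in data.get("printouts", {}).items():
            for value in values:
                # Page-typed values are objects; keep their display title.
                if isinstance(value, dict):
                    value = value.get("fulltext", str(value))
                facts.append((page, prop, str(value)))
    return facts


def build_grounded_prompt(task, facts):
    """Assemble a prompt that restricts the model to the retrieved facts."""
    numbered = [
        f"[{i}] {page} | {prop} = {value}"
        for i, (page, prop, value) in enumerate(facts, start=1)
    ]
    return (
        "Use only the numbered facts below and cite them by number. "
        "If the facts are insufficient, say so.\n\n"
        "Facts:\n" + "\n".join(numbered) + "\n\n"
        "Task: " + task
    )


# Hypothetical sample data standing in for a real wiki response.
sample_response = {
    "query": {
        "results": {
            "Standard A": {"printouts": {"Has status": ["Active"]}},
            "Standard B": {"printouts": {"Has status": ["Withdrawn"]}},
        }
    }
}

url = build_ask_url(
    "https://wiki.example.org/w/api.php", ["Category:Standard"], ["Has status"], limit=5
)
facts = extract_facts(sample_response)
prompt = build_grounded_prompt("List the active standards.", facts)
```

Numbering the facts and requiring citations is one simple way to make generated statements traceable to the knowledge base, which is the kind of transparency and human control the abstract emphasizes.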

Item Type: Article
Keywords: agent technologies, large language models, LLM, Semantic MediaWiki, semantic technologies, knowledge base, formalized documents.
Subjects: Science and knowledge. Organization. Computer science. Information. Documentation. Librarianship. Institutions. Publications > 00 Prolegomena. Fundamentals of knowledge and culture. Propaedeutics > 004 Computer science and technology. Computing. Data processing > 004.4 Software > 004.42 Computer programming. Computer programs
Science and knowledge. Organization. Computer science. Information. Documentation. Librarianship. Institutions. Publications > 00 Prolegomena. Fundamentals of knowledge and culture. Propaedeutics > 004 Computer science and technology. Computing. Data processing > 004.4 Software > 004.43 Computer languages
Science and knowledge. Organization. Computer science. Information. Documentation. Librarianship. Institutions. Publications > 3 Social Sciences > 37 Education > 37.01/.09 Special auxiliary table for theory, principles, methods and organization of education > 37.09 Organization of instruction
Divisions: Institute for Digitalisation of Education > Department of Digital Transformation of the NAES of Ukraine
Depositing User: Researcher Х.В. Середа
Date Deposited: 26 Jan 2026 17:17
Last Modified: 26 Jan 2026 17:17
URI: https://lib.iitta.gov.ua/id/eprint/748320
