Machine Learning-powered Taxonomization: AI Lends Taxonomists a Hand | Alena Vasilevich
Author: Connected Data
Uploaded: 2025-05-08
Views: 208
Description:
In the realm of data-driven businesses, formalized knowledge is a valuable resource for AI projects, created at great expense.
IATE, with almost one million concepts storing multilingual terms and metadata, holds a large part of the textual knowledge of the EU. However, it can only be accessed lexically, and the database concepts stand alone.
Taxonomization is the linking of a flat set of concepts into a hierarchical knowledge graph. If IATE were converted into a full-fledged ontology, its data could not only be consumed by linguists but would also become accessible to machines, e.g. through a SPARQL endpoint.
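As a rough illustration of what linking a flat set of concepts into a hierarchy means, here is a minimal Python sketch. It is not the method presented in the talk: the concept IDs, labels, and broader-than links are all invented for illustration, and the helpers `build_taxonomy` and `ancestors` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical flat concept list; IDs and labels are invented for illustration.
concepts = {
    "c1": "disease",
    "c2": "infectious disease",
    "c3": "COVID-19",
    "c4": "vaccine",
}

# Hypothetical child -> parent ("broader") links, e.g. proposed by a model
# and then confirmed or corrected by a linguist.
broader = {"c2": "c1", "c3": "c2"}


def build_taxonomy(concepts, broader):
    """Return (roots, children) for a flat concept set plus broader links."""
    children = defaultdict(list)
    for child, parent in broader.items():
        children[parent].append(child)
    # Concepts without a broader link remain top-level nodes.
    roots = [cid for cid in concepts if cid not in broader]
    return roots, dict(children)


def ancestors(concept_id, broader):
    """Walk the broader links upward, nearest ancestor first."""
    path = []
    seen = set()  # guards against accidental cycles in the links
    while concept_id in broader and concept_id not in seen:
        seen.add(concept_id)
        concept_id = broader[concept_id]
        path.append(concept_id)
    return path


roots, children = build_taxonomy(concepts, broader)
```

In this toy data, "vaccine" stays a standalone root because no link has been assigned to it yet, which is the state every concept is in before taxonomization.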
In this talk, we will present our approach to the semi-automatic generation of taxonomized concept maps, elevating a sub-domain of IATE terminology into a multilingual knowledge graph. We taxonomized a flat list of concepts within the COVID sub-domain, benchmarking two approaches to this task: automatic concept map creation using an enhanced ML-powered language model, and manual creation of the graph by an expert linguist.
We will dwell on the performance and resource-saving advantages of our collaborative method, made easy by Coreon's user-friendly UI, and show how the achieved productivity rate can make the taxonomization of even larger terminology databases economically viable.
To demonstrate empirically the effectiveness of the semi-automatic approach in a typical industry use case, the resulting IATE/COVID graph was used to initialize a CNN for a multilingual document classification task. Leveraging the created taxonomy, we achieved a classification granularity that is not reachable by state-of-the-art models such as non-initialized CNNs and zero-shot classifiers.
--
Alena Vasilevich. Computational Linguist, Coreon GmbH
Alena Vasilevich holds an international MSc degree in Language Science and Technology from Saarland University. At Coreon, she focuses on pragmatic data conversion, hands-on natural language processing, and analytics.
Having dived into trees and graphs, she concentrates on leveraging structured data in typical NLP scenarios. Alena's interests revolve around multilingual NLP and NLU and all things Python.
--
Welcome to Connected Data London's #ThrowbackThursday
Every Thursday at 3pm GMT, we are releasing gems from our vault on #YouTube
Tune in and learn from leaders and innovators; subscribe to our channel and watch premieres as they are released! 
#ontology #taxonomy #AI #MachineLearning #DataScience #DeepLearning                
                