Meet the Experts: Preprocessing text data with TextPrep (KODAQS Toolbox)
Author: GESIS - Leibniz-Institut für Sozialwissenschaften
Uploaded: 2026-02-13
Views: 18
Description:
The KODAQS Data Quality Toolbox, developed by the Competence Center for Data Quality (KODAQS), equips researchers with practical tools and tutorials for assessing and improving data quality across survey, digital behavioral, and linked data. In this talk, we highlight TextPrep, a tool designed to assess how preprocessing methods, such as automated translation, minor text operations, and stopword removal, can significantly improve the quality of social media data, depending on use case, data types, and methods. By systematically evaluating and comparing different approaches (e.g., different stopword lists), the tool highlights how they can alter textual content and impact data interpretation and quality. Text similarity measures, such as word count or cosine similarity, are used to document differences between the various preprocessing strategies and packages. Structural Topic Modeling is also applied to compare different preprocessing stages using semantic coherence and exclusivity. With TextPrep, all of this can be assessed and implemented in an automated process through commented R code, which can be adapted and transferred to different use cases.
Presenter: Yannik Peters
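The description mentions using word count and cosine similarity to document how different stopword lists change a text. A minimal sketch of that idea follows. It is written in Python for illustration only and is not TextPrep's own R code; the two stopword lists, the sample post, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Two hypothetical stopword lists (illustrative only, not taken from any package).
STOPWORDS_SMALL = {"the", "a", "is", "on"}
STOPWORDS_LARGE = STOPWORDS_SMALL | {"not", "this", "very"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens, stopwords):
    """Return the tokens that are not in the given stopword set."""
    return [t for t in tokens if t not in stopwords]


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between the term-frequency vectors of two token lists."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_a * norm_b)


# Made-up example post; note that the larger list also removes the negation "not".
post = "This is not a very good day on the platform"
raw = tokenize(post)
small = remove_stopwords(raw, STOPWORDS_SMALL)
large = remove_stopwords(raw, STOPWORDS_LARGE)

for label, tokens in [("raw", raw), ("small list", small), ("large list", large)]:
    # Report word count and similarity to the unprocessed text.
    print(label, len(tokens), round(cosine_similarity(raw, tokens), 3))
```

In this toy case the larger list drops "not", which is one way a stopword choice can change what a post appears to say; a lower similarity to the raw text flags that more content was removed.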