How to Read Messy Tab-Delimited .DAT Files with Grouped Lines in R

Author: vlogommentary

Uploaded: 2026-01-23

Views: 1

Description: Learn a clean and efficient method to read irregular tab-delimited .DAT files in R by grouping related lines and parsing them into structured data.
---
This video is based on the question https://stackoverflow.com/q/79376510/ asked by the user 'afleishman' ( https://stackoverflow.com/u/4424306/ ) and on the answer https://stackoverflow.com/a/79376876/ provided by the user 'margusl' ( https://stackoverflow.com/u/646761/ ) on the Stack Overflow website. Thanks to these users and the Stack Exchange community for their contributions.

Visit these links for original content and any more details, such as alternate solutions, latest updates/developments on topic, comments, revision history etc. For example, the original title of the Question was: Reading .DAT file with odd tab-delimited structure in r

Also, content (except music) is licensed under CC BY-SA https://meta.stackexchange.com/help/l...
The original Question post is licensed under the 'CC BY-SA 4.0' ( https://creativecommons.org/licenses/... ) license, and the original Answer post is licensed under the 'CC BY-SA 4.0' ( https://creativecommons.org/licenses/... ) license.

If anything seems off to you, please feel free to drop me a comment under this video.
---
Introduction

When a .DAT file is nominally tab-delimited but also contains irregular lines (such as free text without tabs), standard readers like readr::read_tsv() may fail or produce incorrect output, because they treat every physical line as one row. This typically happens when a record spans multiple lines or has notes embedded beneath the main record.

The Challenge

You have a .DAT file where:

Each record should have five columns:

Numeric ID

Date (MM/DD/YYYY)

Time (HH:MM or HH:MM:SS)

Free text field

Free text field

However, the file also contains lines without tabs that belong to the previous record's last column.

For example:

[[See Video to Reveal this Text or Code Snippet]]

Here, the lines without tabs ("UNKNOWN", "CONTRAINDICATION, STOP") are continuation lines for the first record's last column.
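
To make the structure concrete, here is a small hypothetical input with this shape. Only the two continuation strings come from the description above; the IDs, dates, times and other text are invented for illustration. Counting tab-separated fields per line is one quick way to see which lines are records and which are continuations.

```r
# Hypothetical sample with the layout described above (not the original data).
sample_lines <- c(
  "1\t01/15/2024\t08:30\tASPIRIN\tHEADACHE",
  "UNKNOWN",
  "CONTRAINDICATION, STOP",
  "2\t01/16/2024\t14:05:10\tIBUPROFEN\tFEVER"
)

# Split each line on literal tabs and count the pieces:
# record lines yield 5 fields, continuation lines yield 1.
n_fields <- lengths(strsplit(sample_lines, "\t", fixed = TRUE))
n_fields
```

For a real file, the same check could be run on the result of readLines() before deciding how to group the lines.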

The Solution: Group and Collapse Related Lines

We can solve this by:

Reading all lines as strings using readLines() or readr::read_lines().

Identifying record starts: Lines containing tabs indicate a new record start.

Grouping lines: Use cumulative sums on presence of tabs to group related lines.

Collapsing lines in each group: Concatenate all lines belonging to the same record, separating continuation lines with ", ".

Parsing the cleaned data: Apply readr::read_tsv() on the collapsed strings.
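
The grouping steps above can be sketched in base R on a tiny hypothetical input. The sample values and variable names are assumptions made for illustration; the sketch stops just before the parsing step.

```r
# Hypothetical lines standing in for readLines("your_file.DAT").
lines <- c(
  "1\t01/15/2024\t08:30\tASPIRIN\tHEADACHE",
  "UNKNOWN",
  "CONTRAINDICATION, STOP",
  "2\t01/16/2024\t14:05:10\tIBUPROFEN\tFEVER"
)

# TRUE for lines that contain a tab, i.e. lines that start a record.
has_tab <- grepl("\t", lines, fixed = TRUE)

# The running count of record starts gives every line its record number.
record_id <- cumsum(has_tab)

# Join the lines of each record, separating the pieces with ", ".
collapsed <- unname(vapply(
  split(lines, record_id),
  paste, character(1),
  collapse = ", "
))
```

Here record_id is 1, 1, 1, 2, so the two continuation lines are attached to the first record and collapsed holds one string per record.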

Concise R Code Implementation

[[See Video to Reveal this Text or Code Snippet]]
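
The snippet itself is not reproduced on this page. What follows is a minimal sketch that is consistent with the explanation in the next section, not necessarily the code from the original answer. It assumes dplyr and readr (version 2.0 or later, for the I() literal-data wrapper) are installed, and it uses invented sample lines in place of a real file.

```r
library(dplyr)
library(readr)

# Hypothetical stand-in for read_lines("your_file.DAT").
raw_lines <- c(
  "1\t01/15/2024\t08:30\tASPIRIN\tHEADACHE",
  "UNKNOWN",
  "CONTRAINDICATION, STOP",
  "2\t01/16/2024\t14:05:10\tIBUPROFEN\tFEVER"
)

collapsed <- tibble(line = raw_lines) %>%
  # A line with a tab starts a new record; cumsum() numbers the records.
  group_by(record = cumsum(grepl("\t", line))) %>%
  # One string per record, continuation lines appended after ", ".
  summarise(line = paste(line, collapse = ", "), .groups = "drop") %>%
  pull(line)

# Parse the cleaned text; I() marks the string as literal data, not a path.
records <- read_tsv(
  I(paste(collapsed, collapse = "\n")),
  col_names = FALSE,
  show_col_types = FALSE
)
```

With this input, records has two rows and five columns named X1 to X5, and the first row's X5 holds the original fifth field followed by both continuation lines.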

Explanation

grepl("\t", line) returns a logical vector identifying lines with tabs (record starts).

cumsum() turns this into a grouping integer that increments only when a new record starts.

summarise(paste(...)) joins all lines of a record into one string with comma-separated continuation texts.

Finally, read_tsv() easily parses the well-structured tab-delimited data.

Result

The output dataframe will have five columns:

X1: Numeric identifier

X2: Date

X3: Time

X4: Free text

X5: Free text, with any continuation lines appended (separated by ", ")
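
readr's default type guessing may leave MM/DD/YYYY dates as plain text, so it can help to state the column types explicitly. The sketch below assumes the lines have already been collapsed to one string per record; the column names and sample values are hypothetical and chosen only for readability.

```r
library(readr)

# Hypothetical collapsed records, one string per record.
collapsed <- c(
  "1\t01/15/2024\t08:30\tASPIRIN\tHEADACHE, UNKNOWN, CONTRAINDICATION, STOP",
  "2\t01/16/2024\t14:05:10\tIBUPROFEN\tFEVER"
)

typed <- read_tsv(
  I(paste(collapsed, collapse = "\n")),
  col_names = c("id", "date", "time", "text_a", "text_b"),
  col_types = cols(
    id     = col_integer(),
    date   = col_date(format = "%m/%d/%Y"),  # MM/DD/YYYY
    time   = col_time(),                     # accepts HH:MM and HH:MM:SS
    text_a = col_character(),
    text_b = col_character()
  )
)
```

Declaring the types up front also means a malformed date or ID shows up as a parsing problem instead of silently turning the column into text.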

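This method is robust as long as continuation lines never contain tabs themselves and the first line of the file is a record line; a leading line without a tab would end up in a separate group numbered 0.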
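
One way to test that assumption before parsing is to count the tabs on every line: with five columns, a record line should contain exactly four tabs and a continuation line none. This is a hedged sketch on invented data; the third line deliberately contains a stray tab.

```r
# Hypothetical lines; line 3 is a continuation line with a stray tab.
lines_chk <- c(
  "1\t01/15/2024\t08:30\tASPIRIN\tHEADACHE",
  "UNKNOWN",
  "SEE NOTE\tSTOP",
  "2\t01/16/2024\t14:05:10\tIBUPROFEN\tFEVER"
)

# Number of literal tabs on each line.
n_tabs <- lengths(regmatches(lines_chk, gregexpr("\t", lines_chk, fixed = TRUE)))

# Lines that are neither clean records (4 tabs) nor clean continuations (0 tabs).
suspicious <- which(!n_tabs %in% c(0L, 4L))
suspicious
```

Any line numbers reported here would be treated as record starts by the tab-based grouping, so they are worth inspecting by hand first.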

Summary

Handling irregular tab-delimited files with continuation lines can be tricky, but simple grouping based on tab presence combined with collapsing lines enables clean parsing into tidy data frames.

Keep this pattern handy when your data doesn't fit neatly into standard delimited formats!
