7. Databricks optimization || OPTIMIZE and VACUUM command. (Hands on)

Author: MOHAN SHARMA

Uploaded: 2024-11-07

Views: 619

Description: OPTIMIZE Command:
This command compacts small files. In Big Data processing, neither too many small files nor a few very large files are desirable; files should be close to an optimal size.
By default, OPTIMIZE targets a file size of 1 GB, so running it on data files smaller than 1 GB combines them into files of up to about 1 GB each.
X In short: avoid having too many small files or a few very big files, and use the OPTIMIZE command to compact the small ones.
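
To make the compaction idea concrete, here is a toy sketch in plain Python. It is not how Delta Lake implements OPTIMIZE internally; it only shows, under a simple greedy grouping assumption, how many output files a set of small files would collapse into with a 1 GB target. The table name and the file sizes are hypothetical.

```python
from typing import List

MB = 1024 ** 2
TARGET_BYTES = 1024 ** 3  # 1 GB, the default target size mentioned above

# The real command is a single SQL statement (hypothetical table name):
OPTIMIZE_SQL = "OPTIMIZE sales_db.orders"
# In a Databricks notebook you would run: spark.sql(OPTIMIZE_SQL)


def plan_compaction(file_sizes: List[int], target: int = TARGET_BYTES) -> List[List[int]]:
    """Greedily group file sizes so each group stays within `target` bytes.

    A single file that is already larger than `target` gets a group of its own.
    This is an illustrative model only, not Delta Lake's actual algorithm.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_total = 0
    for size in sorted(file_sizes, reverse=True):
        # Close the current group when adding this file would overshoot the target.
        if current and current_total + size > target:
            groups.append(current)
            current, current_total = [], 0
        current.append(size)
        current_total += size
    if current:
        groups.append(current)
    return groups


small_files = [200 * MB] * 12  # twelve hypothetical 200 MB files
plan = plan_compaction(small_files)
print(len(small_files), "files ->", len(plan), "files after compaction")
```

With these sizes, five 200 MB files fit under the 1 GB target, so the twelve inputs end up as three groups.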

VACUUM Command:
(*) Delta tables, by default, keep historical data files to support time travel.
(*) If those data files are never cleaned up, they keep piling up, which is bad for both maintenance and storage costs.
(*) So we must clean them up periodically with the VACUUM command, which removes obsolete files that are no longer part of the latest version of the Delta table (by default, only those older than the 7-day retention threshold).
X When data files become obsolete, they are not removed from storage automatically. To keep storage costs down, run the VACUUM command on your tables periodically, but make sure to do a DRY RUN before actually executing it, because vacuumed files can no longer be used for time travel.
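
The dry-run-first workflow can be sketched as below. This is a minimal example that only builds the SQL text; the helper `build_vacuum_sql` and the table name are hypothetical, while `VACUUM ... RETAIN n HOURS` and `DRY RUN` are the clauses of the Delta Lake SQL command. Running the statements assumes a Databricks notebook where a `spark` session already exists.

```python
from typing import Optional


def build_vacuum_sql(
    table_name: str,
    retain_hours: Optional[int] = None,
    dry_run: bool = False,
) -> str:
    """Build a VACUUM statement for a Delta table (illustrative helper)."""
    statement = f"VACUUM {table_name}"
    if retain_hours is not None:
        # Without RETAIN, the table's default retention period applies.
        statement += f" RETAIN {retain_hours} HOURS"
    if dry_run:
        # DRY RUN only lists the files that would be deleted.
        statement += " DRY RUN"
    return statement


# Hypothetical table name, used only for illustration.
preview_stmt = build_vacuum_sql("sales_db.orders", dry_run=True)
delete_stmt = build_vacuum_sql("sales_db.orders")

print(preview_stmt)
print(delete_stmt)

# In a Databricks notebook:
# spark.sql(preview_stmt).show(truncate=False)  # step 1: review the file list
# spark.sql(delete_stmt)                        # step 2: actually delete
```

Reviewing the dry-run output first matters because the delete step cannot be undone.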
