1.2 Billion Records Per Hour High Performance Kafka and Spark - End to End Data Engineering Project

Author: CodeWithYu

Uploaded: 2024-12-03

Views: 18939

Description: PART 2: End to End Monitoring of High Performance ...

Ever wondered how to process 1 billion records per hour seamlessly? In this video, we break down the architecture and tools to make it happen:

✅ Apache Kafka: The backbone of real-time data streaming.
✅ Apache Spark: Lightning-fast processing for massive data pipelines.
✅ ELK Stack: Gain visibility with Elasticsearch, Logstash, and Kibana.
✅ Grafana & Prometheus: Real-time monitoring and performance insights.
✅ Kafka Schema Registry & Control Center: Streamlined management and schema validation.

🎯 What You'll Learn:
✅ How to design a robust architecture for high-throughput data pipelines.
✅ Insights into Python vs. Java Kafka Producers: Which one performs better?
✅ Real-time logging, monitoring, and debugging strategies.

🔥 Why This Matters: If you're in data engineering or want to level up your skills, this video showcases everything you need to build, monitor, and scale an ultra-high-performance streaming platform.

Timestamps:
0:00 Introduction
2:31 High Level Architecture Whiteboard
12:55 Data Storage Estimation with workings!
29:33 Clean Architecture
30:39 System Architecture
36:27 System Architecture Setup and Coding
58:21 Python Producer 😩
1:29:27 Java Producer (yay! 😁)
1:33:17 300,000 records per second!
1:36:21 Apache Spark Consumer
2:03:50 Spark Job Optimisation and Statistics
2:15:26 Cluster Health issues
2:15:38 Part 1 Outro
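As a back-of-envelope check on the headline figures (1.2 billion records per hour in the title, 300,000 records per second at 1:33:17), here is a minimal sketch of the kind of arithmetic behind a throughput and storage estimate. It is not the calculation from the video: the 200-byte average record size and replication factor of 3 are hypothetical assumptions chosen for illustration, so substitute your own measured values.

```python
# Back-of-envelope throughput and storage arithmetic for the figures quoted
# in this description. RECORD_SIZE_BYTES and REPLICATION_FACTOR below are
# hypothetical assumptions, not values taken from the video.

SECONDS_PER_HOUR = 3600


def per_second(records_per_hour: float) -> float:
    """Convert an hourly record count into a sustained per-second rate."""
    return records_per_hour / SECONDS_PER_HOUR


def per_hour(records_per_second: float) -> float:
    """Convert a sustained per-second rate into an hourly record count."""
    return records_per_second * SECONDS_PER_HOUR


def storage_bytes(records: float, record_size_bytes: int, replication_factor: int = 1) -> float:
    """Raw, uncompressed bytes needed to keep `records` records,
    multiplied by the number of replicas stored (e.g. a Kafka topic's replication factor)."""
    return records * record_size_bytes * replication_factor


if __name__ == "__main__":
    RECORD_SIZE_BYTES = 200   # hypothetical average serialized record size
    REPLICATION_FACTOR = 3    # hypothetical replication factor

    print(f"1.2B records/hour = {per_second(1.2e9):,.0f} records/s")
    print(f"300k records/s    = {per_hour(300_000):,.0f} records/hour")

    hourly = storage_bytes(1.2e9, RECORD_SIZE_BYTES, REPLICATION_FACTOR)
    print(f"Storage per hour  = {hourly / 1e9:,.0f} GB (decimal, uncompressed)")
    print(f"Storage per day   = {hourly * 24 / 1e12:,.2f} TB")
```

With these inputs, 1.2 billion records per hour works out to about 333,333 records per second, while a sustained 300,000 records per second amounts to 1.08 billion records per hour. The storage estimate scales linearly with record size and replica count, so it is only as good as the record size you measure.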

👀 Don't just watch, build it! 🚧

👍 Like, Comment, & Subscribe for more cutting-edge data engineering content!

Resources:
Full Source Code:
https://buymeacoffee.com/yusuf.ganiyu...
Kafka Documentation: https://kafka.apache.org/documentation/
Apache Spark Documentation: https://spark.apache.org/documentatio...

#ApacheKafka, #ApacheSpark, #DataEngineering, #BigData, #RealTimeProcessing, #ELKStack, #Grafana, #Prometheus, #KafkaStreams, #BigDataAnalytics, #DataPipeline, #StreamingData, #KafkaMonitoring, #SparkStreaming, #DataArchitecture, #HighPerformanceComputing

