Bringing Apache Iceberg to Low-Latency Workloads: Rapid Queries Through Iceberg-Rust with Cheetah
Author: Apache Iceberg
Uploaded: 2025-04-29
Views: 676
Description:
#icebergSummit 2025 breakout session delivered by Scott Donnelly of Balyasny Asset Management.
Session Description:
#ApacheIceberg has become the go-to table format for managing petabyte-scale data lakes, with query engine implementations usually responding in single-digit seconds at best. In a hedge fund environment, our users demand faster responses than this: sub-second query performance on Iceberg tables. This talk covers the journey we took in developing Cheetah, a high-performance distributed Arrow Flight service that can serve responses to Iceberg table queries in as little as 60 ms. To get to this point, a swathe of contributions were made to iceberg-rust, such as Parquet page skipping, which made iceberg-rust the first of the core Apache Iceberg libraries to implement this capability.
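As a rough illustration of what Parquet page skipping means, the sketch below keeps only the pages whose min/max statistics can overlap a range filter. It is a simplified model, not the iceberg-rust implementation: `PageStats`, `pages_to_read`, and the sample values are hypothetical names and data invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageStats:
    """Hypothetical per-page statistics, loosely modelled on a Parquet page index."""
    first_row: int   # index of the first row stored in the page
    row_count: int   # number of rows in the page
    min_value: int   # smallest column value in the page
    max_value: int   # largest column value in the page


def pages_to_read(pages, lower, upper):
    """Return the pages that may contain a value x with lower <= x <= upper.

    A page is skipped when its [min_value, max_value] range cannot
    overlap the filter range, so its bytes never need to be fetched.
    """
    return [p for p in pages if p.max_value >= lower and p.min_value <= upper]


# Assumed sample data: three pages of a column with non-overlapping value ranges.
pages = [
    PageStats(first_row=0, row_count=1000, min_value=1, max_value=100),
    PageStats(first_row=1000, row_count=1000, min_value=101, max_value=200),
    PageStats(first_row=2000, row_count=1000, min_value=201, max_value=300),
]
selected = pages_to_read(pages, lower=150, upper=180)
```

With this sample data only the middle page survives the filter, so two thirds of the column would not need to be read or decoded.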
Cheetah goes beyond this, implementing caching of parsed Parquet metadata and raw data sections, and a sharded distributed cache, to deliver the performance levels mentioned above.
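The abstract does not say how Cheetah assigns cached items to shards. As one possible sketch of a sharded cache, the example below uses rendezvous (highest-random-weight) hashing, a technique swapped in here purely for illustration, so that every node agrees on which node owns a given cache key. The function `shard_for`, the node names, and the key format are all assumptions.

```python
import hashlib


def shard_for(key: str, nodes: list[str]) -> str:
    """Pick the owning node for a cache key using rendezvous hashing.

    Each node gets a score derived from hashing the node name together
    with the key; the node with the highest score owns the key.
    """
    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}|{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(nodes, key=score)


# Hypothetical node names and cache key (file path plus byte range).
nodes = ["cache-a", "cache-b", "cache-c"]
key = "s3://bucket/table/data/file-0001.parquet:4096-8192"
owner = shard_for(key, nodes)
```

A useful property of this scheme is that removing a node other than the owner leaves the key's placement unchanged, so only the keys held by a departed node need to be refetched.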
This talk will explore how Cheetah extends Iceberg’s capabilities, the optimizations that make sub-second querying possible, and the evolution of iceberg-rust to support these new workloads. Attendees will come away with insights into optimizing Iceberg for real-time analytics, best practices for performance tuning, and the case for integrating these innovations back into Apache Iceberg-rust itself.