ClickHouse for Observability: Federated Queries to Slash High-Cardinality Data Egress Costs
Author: ClickHouse
Uploaded: 2024-09-13
Views: 1365
Description:
Sean Gillespie, Software Engineer at Temporal
Learn how Temporal leverages ClickHouse to supercharge their observability efforts! At this Bellevue Meetup, Sean Gillespie, Staff Engineer at Temporal, dives into how they handle high-cardinality data across 14 regions using ClickHouse Cloud. Discover how ClickHouse powers real-time metrics, logs, and queries to give Temporal deep insights into their multi-tenant cloud product. From cost-efficient data transit with AWS Private Link to the use of materialized views and global query distribution, see how ClickHouse helps Temporal deliver exceptional observability at scale! https://www.meetup.com/clickhouse-sea...
Temporal's observability platform, built on ClickHouse, offers a high-performance, cost-effective solution for high-cardinality logs and metrics. It handles unsampled, raw event data, eliminating the quadratic cost scaling issues common in traditional time-series databases. The system unifies logs and metrics by storing raw, wide-structured events where each row represents a single request. This approach treats high-cardinality fields like tenant IDs as first-class dimensions, allowing engineers to slice telemetry data by any tenant to debug customer-specific issues.

The platform achieves real-time query performance, with common dashboard queries returning in approximately 200 milliseconds and complex heatmap queries completing in 50 milliseconds. Ingestion on ClickHouse Cloud reaches 45,000 to 60,000 rows per second per core. Async inserts with a 5-second wait period are used to substantially increase data throughput.

To minimize data egress costs, which range from $0.02 to $0.09 per gigabyte, the solution employs a federated query architecture. Data from 14 production regions is collected into five regional ClickHouse Cloud services. A global ClickHouse instance uses the `remoteSecure` table function to push query predicates down to the regional databases, so data is processed in place and only the results are merged. This design, combined with AWS PrivateLink for a 50% bandwidth cost reduction to $0.01/GB, keeps data stationary and lowers operational costs.

A dual-pipeline ingestion model provides both deep investigation and fast dashboarding. Raw events are inserted into a primary table for ad-hoc queries, while ClickHouse materialized views perform write-time aggregation to populate the tables that power instantaneous dashboards.

Query performance is tuned through a specific sorting key (`ORDER BY cluster, namespace, timestamp`).
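The federated pattern described above (filter and aggregate inside each region, then merge only the small partial results globally) can be illustrated with a toy Python model. This is a sketch of the idea only, not Temporal's implementation and not ClickHouse code: the region names, the row layout `(cluster, namespace, status)`, and all function names are hypothetical, and the per-gigabyte prices are the figures quoted in the summary.

```python
from collections import Counter

# Hypothetical raw events per region: (cluster, namespace, status).
# In the real system these rows would live in regional ClickHouse services.
REGIONS = {
    "us-west-2": [
        ("c1", "tenant-a", "ok"),
        ("c1", "tenant-a", "error"),
        ("c1", "tenant-b", "ok"),
    ],
    "eu-west-1": [
        ("c2", "tenant-a", "ok"),
        ("c2", "tenant-b", "ok"),
    ],
    "ap-south-1": [
        ("c3", "tenant-b", "error"),
    ],
}


def regional_partial(rows, namespace):
    """Runs 'inside' one region: apply the pushed-down predicate, then aggregate."""
    counts = Counter()
    for _cluster, ns, status in rows:
        if ns == namespace:  # predicate evaluated where the data lives
            counts[status] += 1
    return counts


def global_query(regions, namespace):
    """Merge per-region partial aggregates; raw rows never leave their region."""
    merged = Counter()
    rows_shipped = 0
    for rows in regions.values():
        partial = regional_partial(rows, namespace)
        rows_shipped += len(partial)  # only aggregate rows cross regions
        merged.update(partial)
    return merged, rows_shipped


def egress_cost_usd(gigabytes, usd_per_gb):
    """Simple transfer cost estimate: volume times the per-gigabyte price."""
    return gigabytes * usd_per_gb
```

For example, `global_query(REGIONS, "tenant-a")` ships three aggregate rows instead of all six raw rows, and `egress_cost_usd(1000, 0.02)` versus `egress_cost_usd(1000, 0.01)` shows the halving attributed to PrivateLink. With real event volumes the gap between raw rows and aggregate rows is what drives the savings.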
The platform achieves a 36.59x compression ratio by using the `LowCardinality(String)` data type for all identifiers, enabling effective dictionary encoding. For numeric data points, the `Decimal` type with ZSTD compression yielded a 2.63x compression ratio, outperforming both Gorilla and DoubleDelta codecs for this workload.
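To see why dictionary encoding helps with repetitive identifiers, here is a minimal Python sketch of the general technique. It is a toy model rather than ClickHouse's actual `LowCardinality` storage format: the function names, the sample namespace values, and the one-byte code width are all assumptions chosen for illustration, and the resulting ratio is not meant to reproduce the 36.59x figure.

```python
def dictionary_encode(values):
    """Return (dictionary, codes): unique strings in first-seen order, one code per row."""
    index = {}
    dictionary = []
    codes = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


def raw_size(values):
    """Bytes needed to store every string verbatim (UTF-8, no length prefixes)."""
    return sum(len(value.encode("utf-8")) for value in values)


def encoded_size(dictionary, codes, code_width=1):
    """Bytes for the dictionary plus a fixed-width code per row (assumed width)."""
    return raw_size(dictionary) + code_width * len(codes)


# Hypothetical identifier column: few distinct values, many rows.
NAMESPACES = ["payments-prod", "search-prod", "payments-prod"] * 1000
```

With the sample column, each distinct string is stored once and every row shrinks to a small integer, so the encoded size is a fraction of the raw size. The benefit grows with the number of rows per distinct value, which is why the approach suits identifiers such as cluster or namespace names.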