Hyperspace: An Indexing Subsystem for Apache Spark

Author: Databricks

Uploaded: 2020-07-16

Views: 2528

Description: At Microsoft, we store datasets (from both internal teams and external customers) ranging from a few GBs to 100s of PBs in our data lake. The scope of analytics on these datasets ranges from traditional batch-style queries (e.g., OLAP) to exploratory, ‘finding a needle in a haystack’ queries (e.g., point lookups, summarization). Resorting to linear scans of these large datasets with huge clusters for every simple query is prohibitively expensive and not the top choice for many of our customers, who are constantly exploring (and demanding!) ways to reduce their operational costs; incurring unchecked expenses is their worst nightmare. Over the years, we have seen a huge demand for bringing the ‘indexing’ capabilities that come de facto in the traditional database systems world into Apache Spark.

Among the many ways to improve query performance and lower resource consumption in database systems, indexes are particularly effective at accelerating certain workloads, since they can reduce the amount of data scanned for a given query and thus also lower resource costs. In this talk, we present our experiences in designing, implementing and operationalizing Hyperspace, an indexing subsystem for Apache Spark that lets users build, maintain (through a multi-user concurrency model) and leverage indexes (automatically, without any changes to their existing code) on their data (e.g., CSV, JSON, Parquet) for query/workload acceleration. We will cover the necessary foundations behind our indexing infrastructure, including the API design and how we leveraged Spark’s Catalyst optimizer to provide a transparent user experience, and discuss our development roadmap as we work towards open sourcing our work for the benefit of the broader community. Through presentation, benchmarks, code examples and notebooks, this will be one fun session, so come join us as we get started on this journey.
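To make the "less data scanned" claim concrete, here is a toy sketch in plain Python. It is not the Hyperspace API and does not use Spark; the function names (`linear_scan`, `build_index`, `index_lookup`) and the sample data are hypothetical, chosen only to illustrate how an index that stores the filtered column together with the columns a query needs can answer a point lookup while touching far fewer rows than a full scan.

```python
from collections import defaultdict


def linear_scan(rows, column, value):
    """Filter by checking every row; the cost is always the full dataset size."""
    matches = [row for row in rows if row[column] == value]
    return matches, len(rows)


def build_index(rows, indexed_column, included_columns):
    """Toy index: maps each indexed value to the projected rows that carry it.

    Storing the included columns alongside the key means a query that only
    needs those columns never has to go back to the source data.
    """
    index = defaultdict(list)
    for row in rows:
        projected = {c: row[c] for c in (indexed_column, *included_columns)}
        index[row[indexed_column]].append(projected)
    return index


def index_lookup(index, value):
    """Answer a point lookup from the index; the cost is only the matching rows."""
    matches = index.get(value, [])
    return matches, len(matches)


# Hypothetical sample data: 10,000 rows spread evenly over 100 departments.
rows = [{"id": i, "dept": f"dept{i % 100}", "salary": 1000 + i} for i in range(10_000)]

scan_matches, scan_cost = linear_scan(rows, "dept", "dept7")
index = build_index(rows, "dept", ["salary"])
idx_matches, idx_cost = index_lookup(index, "dept7")

print(f"rows examined by scan:  {scan_cost}")
print(f"rows examined by index: {idx_cost}")
```

Both paths return the same salaries for the requested department, but the scan examines every row while the lookup examines only the matching ones. Building the index costs one pass over the data up front, which is why indexing pays off for workloads that repeat selective queries.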

About:
Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
Read more here: https://databricks.com/product/unifie...
