Using Databricks as an Analysis Platform
Author: Databricks
Uploaded: 2020-08-07
Views: 4284
Description:
Over the past year, YipitData spearheaded a full migration of its data pipelines to Apache Spark via the Databricks platform. Databricks now empowers our 40+ data analysts to independently create data ingestion systems, manage ETL workflows, and produce meaningful financial research for our clients. Today, YipitData analysts own production data pipelines end-to-end, working with over 1,700 databases and 51,000 tables without dedicated data engineers.
This talk explains how to identify key areas of data infrastructure that can be abstracted with Databricks and PySpark to allow data analysts to own production workflows. At YipitData, we pinpointed sensitive steps in our data pipelines and built abstractions that let our analyst team easily and safely transform, store, and clean data. Attendees will find code snippets of utilities built with Databricks and Spark APIs that give data analysts a clear interface to run reliable table/schema operations, reusable data transformations, scheduled jobs on Spark clusters, and secure processes to import third-party data and export data to clients.
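The table/schema utilities mentioned above are not reproduced in this description, so the following is only a minimal sketch of what such a guard might look like. The function names (`parse_table_name`, `create_table`) and the naming rule are hypothetical; `spark` is assumed to be a `pyspark.sql.SparkSession` (PySpark 3.3 or later, for `Catalog.tableExists`) and `df` a Spark DataFrame.

```python
import re

# Assumed naming rule for this sketch: plain identifiers only.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def parse_table_name(full_name):
    """Split 'database.table' and reject anything that is not two plain identifiers."""
    parts = full_name.split(".")
    if len(parts) != 2 or not all(_IDENTIFIER.match(p) for p in parts):
        raise ValueError(f"expected 'database.table', got {full_name!r}")
    return parts[0], parts[1]


def create_table(spark, df, full_name, overwrite=False):
    """Save df as a table, refusing to replace an existing one unless asked to.

    Hypothetical helper: `spark` is assumed to be a SparkSession and `df`
    a Spark DataFrame; neither is imported here, so the module loads
    without PySpark installed.
    """
    database, table = parse_table_name(full_name)
    if spark.catalog.tableExists(table, database) and not overwrite:
        raise ValueError(f"{full_name} already exists; pass overwrite=True to replace it")
    mode = "overwrite" if overwrite else "errorifexists"
    df.write.mode(mode).saveAsTable(f"{database}.{table}")
```

In a notebook an analyst would call something like `create_table(spark, cleaned_df, "research.daily_sales")`, where the table name is made up for illustration. Keeping the name check and the overwrite guard in one place means individual notebooks do not have to repeat them.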
The talk will also showcase our system for integrating Apache Airflow with Databricks, so analysts can rapidly construct and deploy robust ETL workflows within the Databricks workspace. System administrators and engineers will also learn to use Databricks and Airflow metadata to discover large-scale optimizations of pipelines managed by analysts and create business value. Attendees will walk away with concrete strategies, tools, and architecture to drive their data analyst team to own production data pipelines and, as a result, scale their engineering team and business.
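The Airflow integration itself is not shown in this description. As a rough sketch of the underlying idea, the snippet below orders a set of notebooks by their dependencies and builds one request body per notebook in the shape used by the Databricks Jobs API 2.0 `runs/submit` endpoint. The notebook paths, the cluster ID, and the function names are all hypothetical and are not taken from the talk.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each notebook path maps to the notebooks it depends on.
PIPELINE = {
    "/pipelines/demo/ingest": [],
    "/pipelines/demo/clean": ["/pipelines/demo/ingest"],
    "/pipelines/demo/report": ["/pipelines/demo/clean"],
}


def notebook_run_payload(notebook_path, cluster_id, params=None):
    """Build a request body shaped like a Jobs API 2.0 `runs/submit` call."""
    return {
        "run_name": notebook_path.rsplit("/", 1)[-1],
        "existing_cluster_id": cluster_id,
        "notebook_task": {
            "notebook_path": notebook_path,
            "base_parameters": params or {},
        },
    }


def plan_runs(pipeline, cluster_id):
    """Return one payload per notebook, with dependencies ordered first."""
    order = TopologicalSorter(pipeline).static_order()
    return [notebook_run_payload(path, cluster_id) for path in order]


if __name__ == "__main__":
    for payload in plan_runs(PIPELINE, "hypothetical-cluster-id"):
        print(payload["run_name"], payload["notebook_task"]["notebook_path"])
```

In an actual Airflow deployment, a dictionary like this could be passed to `DatabricksSubmitRunOperator` (from the Airflow Databricks provider) through its `json` argument, with Airflow's own task dependencies replacing the explicit topological sort used here.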
About:
Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
Read more here: https://databricks.com/product/unifie...
Connect with us:
Website: https://databricks.com
Facebook: / databricksinc
Twitter: / databricks
LinkedIn: / databricks
Instagram: / databricksinc
Databricks is proud to announce that Gartner has named us a Leader in both the 2021 Magic Quadrant for Cloud Database Management Systems and the 2021 Magic Quadrant for Data Science and Machine Learning Platforms. Download the reports here: https://databricks.com/databricks-nam...