3H6 - Big Data Orchestration on Spark, Flink and DataFlow using Apache Beam
Author: Apache Hop
Uploaded: 2021-05-04
Views: 2398
Description:
Discover what Apache Beam is and how you can use Apache Hop to visually design big data pipelines that run on Apache Spark, Apache Flink and Google Dataflow over Apache Beam.
We start with an overview of the technology before we explore the Beam runtime configurations in Hop in more detail.
This is a session for a rather technical audience: as accessible as possible, but expect technical deep dives and discussions.
As always, there's plenty of time for Q&A.
About Apache Beam:
Apache Beam is an open-source, unified programming model for defining and executing data processing pipelines, including ETL, batch and stream processing.
Apache Beam: https://beam.apache.org
Apache Hop: https://hop.apache.org
Check the full 3Hx schedule: http://hop.apache.org/community/events/
Join our chat https://chat.project-hop.org
Follow Hop on Twitter: / apachehop
Follow Hop on LinkedIn: / apachehop
00:00 Hop pipelines & Beam
09:00 set up local Spark cluster, run Hop pipelines on Spark over Beam
29:00 run Hop pipelines on Google Dataflow over Beam
40:00 Q&A 1: Pipeline & workflow unit testing
43:00 Q&A 2: support for Parquet, ORC, Avro
45:00 Q&A 3: support for AWS EMR, Google Dataproc, Databricks
49:00 Q&A 4: Fat jar: tune, trim down, build incrementally, deployment options
56:00 Q&A 5: deployment options, deployment overview
59:00 Q&A 6: Hop Web
1:12:45 Q&A 8: custom transforms in Java, Scala, Python
1:21:35 Q&A 9: Neo4j integration
1:24:50 Q&A 10: Airflow integration
1:26:25 Q&A 11: parallel workflow execution
1:28:45 Q&A 12: porting Pentaho plugins
1:31:30 Q&A 13: Protobuf
1:38:30 Q&A 14: OCR/RPA (read PDF invoices, audio, video, etc.)
1:45:30 Q&A 15: Hop 0.99/1.0, ASF graduation