Train Machine Learning Model with SparkML (...and Python) | Hands-on tutorial

Author: Data Science Garage

Uploaded: 2022-12-06

Views: 1392

Description: Building and training a Machine Learning (#ML) model with Spark is not hard. In this tutorial we build a simple binary classification ML model with Spark. We use Spark's built-in Logistic Regression algorithm and then evaluate the model by computing its performance metrics.

There are some differences from how we do it in Scikit-Learn. Spark provides a built-in SparkML engine with a rich #SparkML API which you can leverage to build your own Machine Learning model.

In this tutorial we are using SparkUI v.3.2.1 with pyspark-shell.

The critical points to pay attention to are:
Data types (DTypes).
StringIndexer and One-Hot-Encoding for categorical features.
VectorAssembler.
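
To make these three points concrete, here is a plain-Python sketch of the idea behind each step. It is not the Spark API: the helper names (string_index, one_hot, assemble) and the toy data are hypothetical. It assumes Spark's documented defaults, namely that StringIndexer gives the most frequent category index 0 and that OneHotEncoder drops the last category.

```python
from collections import Counter


def string_index(values):
    """Map each category to an index, most frequent first (ties alphabetical)."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {label: idx for idx, label in enumerate(ordered)}


def one_hot(index, num_categories, drop_last=True):
    """Encode a category index as a 0/1 list; the last category maps to all zeros."""
    size = num_categories - 1 if drop_last else num_categories
    vec = [0.0] * size
    if index < size:
        vec[index] = 1.0
    return vec


def assemble(*parts):
    """Concatenate scalars and lists into one flat feature list, in order."""
    features = []
    for part in parts:
        if isinstance(part, (list, tuple)):
            features.extend(float(x) for x in part)
        else:
            features.append(float(part))
    return features


# Hypothetical toy column of categorical values.
colors = ["red", "blue", "red", "green", "red", "blue"]
mapping = string_index(colors)

# One feature row: the encoded category followed by a numeric column.
row_features = assemble(one_hot(mapping["blue"], len(mapping)), 4.5)
print(mapping, row_features)
```

The point of the sketch is the order of operations: strings become indices, indices become 0/1 vectors, and everything is concatenated into the single feature vector that the model consumes.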

All these parts are explained and demonstrated in detail in this tutorial. You will also learn what SparkContext and SparkSession are and how they differ. You will then be able to check the data schema, handle data types in a Spark DataFrame, and select features within your data. As required for ML modelling, you will also learn how to split your data into train and test sets.

Here you also learn how to set up ML stages with Spark and build a custom ML Pipeline to train your Machine Learning model.

At the end, you will learn how to get model performance metrics, such as Precision, Recall, or ROC curve values.
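
For readers who want to see what these metrics measure, here is a plain-Python sketch that computes them by hand. It does not use Spark's own evaluation API; the function names, labels, scores, and the 0.5 threshold are hypothetical, and it assumes both classes are present in the labels.

```python
def confusion_counts(labels, predictions):
    """Return (tp, fp, fn, tn) for binary 0/1 labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall(labels, predictions):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    tp, fp, fn, _ = confusion_counts(labels, predictions)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_points(labels, scores):
    """(false positive rate, true positive rate) at each distinct score threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        predictions = [1 if s >= threshold else 0 for s in scores]
        tp, fp, _, _ = confusion_counts(labels, predictions)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical true labels and predicted probabilities of the positive class.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

hard_predictions = [1 if s >= 0.5 else 0 for s in scores]
precision, recall = precision_recall(labels, hard_predictions)
curve = roc_points(labels, scores)
print(precision, recall, curve)
```

Precision and recall depend on the chosen threshold, while the ROC curve summarizes the trade-off across all thresholds.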

The tutorial is prepared in a Jupyter Notebook using the Python programming language, so all the steps are executed with #pyspark.

The content of the video:
0:00 - Intro
0:32 - Start of Hands-on with Jupyter Notebook
0:46 - 1. Import main dependencies for Spark and Python
1:14 - Theory: Spark Session vs. Spark Context
3:10 - 1. Continuing importing dependencies
3:28 - 2. Load External CSV data to Spark (as Spark DataFrame)
5:40 - 3. Train and Test splits
6:39 - 4. Check Data Types
8:27 - 5. One-Hot-Encoding with Spark
10:07 - Theory: StringIndexer and One-Hot-Encoder
11:01 - 5. Continuing with StringIndexer hands-on
12:19 - 6. Vector Assembling
12:55 - Theory: Vector Assembling in Spark
13:53 - 6. Continuing with Vector Assembling
15:24 - 7. Make Spark ML Pipeline
18:31 - 8. Train ML Model with Spark
20:07 - 9. Get Model Performance Metrics

Spark API and SparkML API methods used in the tutorial (incl. documentation):
Spark Datatypes (https://spark.apache.org/docs/latest/...)
PySpark SQL DataFrame Random Split (https://spark.apache.org/docs/3.1.3/a...)
StringIndexer (https://spark.apache.org/docs/latest/...)
OneHotEncoder (https://spark.apache.org/docs/3.1.1/a...)
VectorAssembler (https://spark.apache.org/docs/latest/...)
Spark DataFrame aggregation (https://spark.apache.org/docs/latest/...)
Count Distinct values from Spark DataFrame (https://spark.apache.org/docs/3.1.2/a...)
Group by to check feature distribution (https://spark.apache.org/docs/latest/...)
SparkML Pipelines (https://spark.apache.org/docs/latest/...)
Logistic Regression in Spark (https://spark.apache.org/docs/1.6.1/m...)

Link to the GitHub repo to try everything hands-on yourself (the data file is included there): https://github.com/vb100/spark_ml_tra...

Thank you for watching!

Please subscribe to this channel - @DataScienceGarage to get more high-quality videos about #DataScience, #Python, #AI, #MachineLearning, #DeepLearning and much more!
