Data scaling | feature scaling | normalization | Data Science
Author: data science Consultancy
Uploaded: 2023-05-20
Views: 56
Description:
Data scaling, also known as feature scaling or normalization, is the process of transforming the numerical features in a data set to a common scale.
00:10
It is an important preprocessing step in many machine learning algorithms, as it can improve the performance and stability of the models.
00:18
Data scaling puts all features on a comparable scale, which prevents features with larger magnitudes from dominating the learning process.
00:29
There are a few common techniques for data scaling.
00:32
Min-max scaling (normalization).
00:35
This technique scales the data to a fixed range, usually between zero and one.
00:41
It is achieved by subtracting the feature's minimum value from each value and then dividing by the range (maximum value minus minimum value).
00:50
The formula for min-max scaling is:
00:53
$$X_{\text{scaled}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$
01:01
Min-max scaling preserves the shape of the original distribution but compresses it into a fixed range.
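As a rough illustration of the formula above, here is a minimal pure-Python sketch. The helper name `min_max_scale`, its `feature_range` parameter, and the sample values are hypothetical. The sketch also assumes that a constant feature (zero range) should map to the lower bound, since the formula itself would divide by zero in that case.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Rescale a list of numbers into feature_range using min-max scaling."""
    lo, hi = feature_range
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # Assumption: a constant feature is mapped to the lower bound
        return [lo for _ in values]
    # (x - min) / (max - min) gives [0, 1]; then stretch/shift to [lo, hi]
    return [lo + (x - x_min) / span * (hi - lo) for x in values]


print(min_max_scale([10, 20, 30, 50]))  # [0.0, 0.25, 0.5, 1.0]
```

The smallest value always maps to the lower bound and the largest to the upper bound, so every scaled feature ends up in the same fixed interval.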
01:08
Standardization (z-score scaling).
01:11
Standardization transforms the data to have a mean of 0 and a standard deviation of 1.
01:18
It is accomplished by subtracting the feature's mean from each value and dividing by the feature's standard deviation.
01:24
The formula for standardization is:
01:27
$$X_{\text{scaled}} = \frac{X - X_{\text{mean}}}{X_{\text{std}}}$$
01:34
Standardization centres the data around zero and gives it a standard deviation of 1.
01:40
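It makes the data more suitable for algorithms that assume zero-centred, Gaussian-like features, although it only shifts and rescales the data and does not change the shape of its distribution.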
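A minimal sketch of z-score standardization in plain Python. It assumes the population standard deviation (dividing by n); the sample standard deviation (dividing by n - 1) is another common convention. The helper `standardize` and the sample data are hypothetical, and a constant feature is mapped to zeros to avoid dividing by zero.

```python
import statistics


def standardize(values):
    """Return z-scores (x - mean) / std, using the population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # Assumption: a constant feature becomes all zeros
        return [0.0 for _ in values]
    return [(x - mean) / std for x in values]


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, population std 2
z = standardize(scores)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

The resulting values have mean 0 and standard deviation 1, while their relative spacing is unchanged.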
01:45
Robust scaling.
01:47
Robust scaling is similar to standardization, but uses the median and interquartile range (IQR) instead of the mean and standard deviation.
01:57
It is less sensitive to outliers in the data and is suitable when the data set contains extreme values.
02:04
The formula for robust scaling is:
02:07
$$X_{\text{scaled}} = \frac{X - X_{\text{median}}}{\text{IQR}}, \qquad \text{IQR} = Q_3 - Q_1$$
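The sketch below applies robust scaling to a small hypothetical list containing one extreme value. It assumes quartiles computed with linear interpolation (`statistics.quantiles(..., method="inclusive")`); other quartile conventions give slightly different IQR values. `robust_scale` is a hypothetical helper name, and a zero IQR is mapped to zeros as an illustrative choice.

```python
import statistics


def robust_scale(values):
    """Scale values as (x - median) / IQR, where IQR = Q3 - Q1."""
    median = statistics.median(values)
    # method="inclusive" interpolates linearly between sorted data points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # Assumption: a feature with zero IQR becomes all zeros
        return [0.0 for _ in values]
    return [(x - median) / iqr for x in values]


data = [1, 2, 3, 4, 100]  # 100 is an extreme outlier
print(robust_scale(data))  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

The outlier ends up far from zero, while the other points keep a spread between -1 and 0.5, because neither the median nor the IQR is pulled toward the value 100.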
02:13
The choice of data scaling technique depends on the specific requirements of the data set and the machine learning algorithm being used.
02:21
It is generally recommended to scale the data before applying algorithms that rely on distance calculations or gradient descent optimization.
02:31
Additionally, algorithms such as support vector machines (SVMs) and k-nearest neighbours (k-NN) are sensitive to the scale of the features and can benefit from scaling.
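To make the distance argument concrete, the following sketch finds the nearest neighbour of one point with and without min-max scaling. The customers, the feature ranges, and the helper names `euclidean` and `scale_point` are all hypothetical; the ranges stand in for minima and maxima that would normally be computed from a training set.

```python
import math


def euclidean(p, q):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def scale_point(point, ranges):
    """Min-max scale each coordinate using a (min, max) pair per feature."""
    return tuple((x - lo) / (hi - lo) for x, (lo, hi) in zip(point, ranges))


# Hypothetical customers: (age in years, annual income in dollars)
a = (25, 50_000)
b = (60, 51_000)  # very different age, similar income
c = (26, 52_000)  # almost the same age, slightly different income

# Assumed feature ranges, standing in for values from a training set
ranges = [(18, 80), (20_000, 120_000)]

raw_nearest = min((b, c), key=lambda p: euclidean(a, p))
scaled_a = scale_point(a, ranges)
scaled_nearest = min((b, c), key=lambda p: euclidean(scaled_a, scale_point(p, ranges)))

print("nearest to a (raw):   ", raw_nearest)     # (60, 51000): income dominates
print("nearest to a (scaled):", scaled_nearest)  # (26, 52000): both features count
```

In raw units a 1,000-dollar income gap outweighs a 35-year age gap, so a k-NN model would treat b as the closer customer; after scaling, c is closer.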
02:41
It's important to note that the scaling parameters (for example, the mean and standard deviation) should be computed from the training data only and then applied unchanged to the test or validation data.
02:55
This ensures consistency and prevents leakage of information from the test set into the training process.
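A minimal sketch of the fit-on-train, apply-to-test pattern, assuming standardization as the scaling method. `StandardScalerSketch` is a hypothetical class written for illustration, not a library API, although the fit/transform split follows the same idea as scikit-learn's scalers.

```python
import statistics


class StandardScalerSketch:
    """Hypothetical minimal scaler: learns mean/std from training data only."""

    def fit(self, train):
        self.mean_ = statistics.mean(train)
        # If the training feature is constant, fall back to 1.0 to avoid division by zero
        self.std_ = statistics.pstdev(train) or 1.0
        return self

    def transform(self, values):
        # Reuse the stored training parameters; never recompute them here
        return [(x - self.mean_) / self.std_ for x in values]


train = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, population std 2
test = [3, 11]

scaler = StandardScalerSketch().fit(train)  # parameters come from train only
print(scaler.transform(train))
print(scaler.transform(test))  # [-1.0, 3.0]
```

The test value 11 maps to 3.0 even though it lies outside the training range; because the scaler is never re-fitted on test data, no information from the test set leaks into the learned parameters.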