Top 5 distance metrics in K-Means
Author: John Eidenham
Uploaded: 2025-11-14
Views: 34
Description:
Distance metrics play a much bigger role in K-Means than most people realize. While K-Means is often introduced as a simple clustering algorithm, the definition of “closeness” completely shapes how the algorithm groups data. In this video, we explore the five most important distance metrics used in K-Means and show exactly how each one changes the geometry of the resulting clusters.
We begin with Euclidean distance, the standard measure that K-Means is built around. Because it calculates straight-line distance, it creates circular or spherical clusters and works best when features are scaled similarly. Through simple visual examples, you’ll see how this metric causes points to form smooth, rounded regions around their centroids.
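As a rough illustration, here is a minimal sketch of the K-Means assignment step with Euclidean distance, assuming the points and centroids are NumPy arrays (assign_euclidean is just an illustrative helper name, not part of any library):

import numpy as np

def assign_euclidean(points, centroids):
    # pairwise straight-line distances, shape (n_points, n_centroids)
    diffs = points[:, None, :] - centroids[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # each point goes to its nearest centroid
    return dists.argmin(axis=1)

points = np.array([[0.0, 0.0], [1.0, 1.0], [9.0, 9.0]])
centroids = np.array([[0.5, 0.5], [9.0, 9.0]])
print(assign_euclidean(points, centroids))  # [0 0 1]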
We then shift to Manhattan distance, which acts more like navigating a grid of city blocks. Since you can only move horizontally or vertically, cluster boundaries take on diamond-shaped patterns instead of circles. This makes Manhattan distance useful for grid-like data or situations where different features vary independently.
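A sketch of the same assignment step with Manhattan distance, under the same NumPy assumptions, only swaps the distance computation; note that Manhattan-based variants often pair this with a median centroid update, as in K-Medians:

import numpy as np

def assign_manhattan(points, centroids):
    # city-block distance: sum of absolute coordinate differences
    dists = np.abs(points[:, None, :] - centroids[None, :, :]).sum(axis=-1)
    return dists.argmin(axis=1)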
Cosine distance offers a completely different perspective. Instead of measuring straight-line distance or grid-like paths, it focuses on the angle between points. Two points pointing in the same direction from the origin are considered similar, even if their magnitudes differ. This makes cosine distance especially powerful for text data, embeddings, and any high-dimensional dataset where magnitude isn’t the important factor. In the visualization, you’ll see how clusters form like slices of a pie, each capturing a direction rather than a location.
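One common way to sketch the cosine version is to normalise every vector to unit length first, so only direction matters (this is the idea behind spherical K-Means; the helper below is illustrative, not a library function):

import numpy as np

def assign_cosine(points, centroids, eps=1e-12):
    # scale rows to unit length so magnitude drops out
    p = points / (np.linalg.norm(points, axis=1, keepdims=True) + eps)
    c = centroids / (np.linalg.norm(centroids, axis=1, keepdims=True) + eps)
    # cosine distance = 1 - cosine similarity
    dists = 1.0 - p @ c.T
    return dists.argmin(axis=1)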
Minkowski distance generalizes both Euclidean and Manhattan by introducing a parameter, p, that controls how distances are computed. When p equals 1, you get Manhattan distance; when p equals 2, you get Euclidean. By adjusting p, Minkowski lets you smoothly transition between different interpretations of distance, giving more control over how K-Means perceives similarity.
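A minimal sketch of the Minkowski version with a tunable p, again assuming NumPy arrays and a hypothetical helper name:

import numpy as np

def assign_minkowski(points, centroids, p=3):
    # (sum |x_i - c_i|^p)^(1/p); p=1 gives Manhattan, p=2 gives Euclidean
    diffs = np.abs(points[:, None, :] - centroids[None, :, :])
    dists = (diffs ** p).sum(axis=-1) ** (1.0 / p)
    return dists.argmin(axis=1)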
Finally, we explore Mahalanobis distance, which adjusts based on the data’s variance and correlation structure. Instead of assuming features are independent and equally scaled, Mahalanobis takes into account how the data stretches or tilts along certain directions. This results in elliptical clusters aligned with the true distribution of the data. It’s particularly valuable when features are correlated or when the raw geometry of the data is not spherical.
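A minimal sketch of the Mahalanobis version, assuming the covariance matrix has been estimated from the data beforehand (for example with np.cov); the inverse covariance effectively "whitens" the space so stretched, correlated directions no longer dominate the distance:

import numpy as np

def assign_mahalanobis(points, centroids, cov):
    cov_inv = np.linalg.inv(cov)
    # difference vectors, shape (n_points, n_centroids, n_features)
    diffs = points[:, None, :] - centroids[None, :, :]
    # sqrt(d^T * cov_inv * d) for every point-centroid pair
    dists = np.sqrt(np.einsum('nkd,de,nke->nk', diffs, cov_inv, diffs))
    return dists.argmin(axis=1)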
By the end of the video, you’ll understand how each distance metric changes the shape of K-Means clusters, why different metrics lead to completely different results, and how to choose the right one depending on your dataset. This visual explanation is designed to make abstract concepts intuitive, helping you build a deeper understanding of clustering and similarity in machine learning.
If you enjoy clear, visual explanations of AI and machine learning concepts, consider subscribing for more content.