Fixing a Broken AI: Battling Class Imbalance with SMOTE & XGBoost
Author: BioniChaos
Uploaded: 2025-12-16
Views: 3
Description:
We are back with another intense coding session, working on our Kaggle competition entry for detecting Body-Focused Repetitive Behaviors (BFRBs). In this video we face a classic machine-learning nightmare: a model that looks amazing on the surface but is failing badly under the hood.
We start by running our full training pipeline on over 8,000 sensor sequences, using data from IMU, thermopile, and time-of-flight sensors. Initial indicators look incredible: our binary classification model (the "Bouncer") hits a 98% F1 score, perfectly identifying when a gesture occurs. But when we ask the model which gesture is happening, things fall apart.
We analyze the Confusion Matrix to find that our model has learned to predict only two gestures out of ten, resulting in a disastrous Gesture F1 score. Through a mix of technical analysis and a comedy skit breakdown (featuring the "King of Idiots" meta-model), we realize our previous high scores were due to data leakage during grid search. The reality is we are dealing with severe class imbalance.
Join us as we debug the code live. We implement a two-front strategy to fix the imbalance:
Data Level: Aggressively tuning SMOTE parameters (increasing k-neighbors) to generate better synthetic data for rare gestures.
Algorithm Level: Implementing Class Weights in XGBoost to heavily penalize the model for missing rare classes.
We also squash a critical bug where our safe_fit function was silently ignoring our weight parameters, effectively rendering our fixes useless. Watch as we patch the code, handle the data leakage, and relaunch the training to chase a competitive spot on the Kaggle leaderboard.
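The bug class described is easy to reproduce: a wrapper that accepts keyword arguments but never forwards them to `fit`. This `safe_fit` is a reconstruction for illustration, not the video's actual code:

```python
def safe_fit(model, X, y, **fit_kwargs):
    """Fit a model, returning None instead of raising on failure.

    The fix is forwarding **fit_kwargs to model.fit; the buggy version
    called model.fit(X, y) and silently dropped sample_weight.
    """
    try:
        model.fit(X, y, **fit_kwargs)
        return model
    except Exception:
        return None
```

Because `**fit_kwargs` swallows unknown keywords without error, the broken version trains "successfully" and the missing weights only show up as unchanged scores.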
All code and project updates are available at BioniChaos.com.
#MachineLearning #DataScience #Python #Kaggle #XGBoost #BioniChaos #AI #Coding #ClassImbalance #SMOTE #BiomedicalAI #WebDevelopment
00:00
Intro and model expectations: Anticipating an F1 score above 0.9.
01:00
Explaining the "Safe Fit" fix and how early stopping handles bad models without crashing.
02:46
Launching the full training run on 8,151 sequences and monitoring feature extraction.
06:03
The Skit: A Comedian and Data Scientist break down the "Bouncer" model (Binary Classification).
08:10
The Confusion Matrix disaster: Why the model is only guessing two gestures.
12:08
Live training update: Binary F1 hits 98%, but Gesture Classification is the real test.
19:30
Clarifying the dataset structure: Target gestures vs. Non-target noise.
22:54
Analyzing the Kaggle Leaderboard: Comparing our target against the top public and private scores.
26:04
The reality check: Why our full run dropped to a 76% average and the issues with "Rare" classes.
33:23
Implementing the fixes: Tuning SMOTE k-neighbors and adding computed Class Weights.
36:00
Plain language explanation: How we are forcing the model to pay attention to minority classes.
40:30
Skit Part 2: Realizing the previous 95% score was "cheating" due to data leakage in grid search.
50:50
Investigating the raw data: Identifying the 4:1 imbalance ratio between common and rare gestures.
53:44
Bug fix: Correcting the safe_fit function to properly pass sample weights to XGBoost.
55:00
Restarting the training pipeline with the new imbalance strategy and monitoring initial progress.
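The 4:1 imbalance figure from the raw-data investigation is a ratio of class counts; a quick way to check it, with made-up gesture labels standing in for the real dataset:

```python
from collections import Counter

# Illustrative label counts, not the actual BFRB dataset.
labels = ["scratch"] * 400 + ["pull"] * 380 + ["pinch"] * 100 + ["flick"] * 95
counts = Counter(labels)

# Ratio of the most common class to the rarest one.
ratio = max(counts.values()) / min(counts.values())
print(f"imbalance ratio ≈ {ratio:.1f}:1")  # ≈ 4.2:1
```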
Check out the tools we develop at https://bionichaos.com
Support BioniChaos on Patreon: / bionichaos
Become a channel member to get exclusive perks: / @bionichaos