BFGS & L-BFGS: The Algorithms Behind Modern Machine Learning
Author: Synapsara
Uploaded: 2026-01-17
Views: 213
Description:
In this video, we dive deep into BFGS and L-BFGS - the quasi-Newton optimization algorithms found in many modern machine learning and scientific computing libraries. If you've ever wondered why SciPy and PyTorch offer L-BFGS alongside plain gradient descent and Newton's method, this is your answer.
WHAT YOU'LL LEARN:
Why gradient descent fails for ill-conditioned problems
How second-order methods use curvature information
The computational bottleneck of Newton's method (O(n³) complexity!)
How BFGS approximates the inverse Hessian without computing it
Why L-BFGS only needs O(mn) memory instead of O(n²)
The elegant two-loop recursion algorithm
When to use GD vs BFGS vs L-BFGS in practice
TECHNICAL DETAILS:
BFGS (Broyden-Fletcher-Goldfarb-Shanno) builds an approximation of the inverse Hessian matrix iteratively using only gradient information. This gives you superlinear convergence without the O(n³) cost of computing and inverting the full Hessian.
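For reference, here is a minimal NumPy sketch of the rank-two inverse-Hessian update at the heart of BFGS (function and variable names are my own, not from the video):

import numpy as np

def bfgs_update(H, s, y):
    """One BFGS update of the inverse-Hessian approximation H.
    s = x_{k+1} - x_k (step), y = grad_{k+1} - grad_k (gradient change).
    The updated H satisfies the secant equation H_new @ y = s."""
    rho = 1.0 / (y @ s)              # curvature condition requires y @ s > 0
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)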
L-BFGS extends this by storing only the last m vector pairs (typically m=3-20), reducing memory from O(n²) to O(mn). For a neural network with 1 million parameters, this is the difference between 8 terabytes and 80 megabytes.
The two-loop recursion is an elegant algorithm that computes the search direction in O(mn) time using only stored history - no matrix storage required.
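To make that concrete, here is a short sketch of the two-loop recursion in NumPy (illustrative only; the scaling choice for the initial Hessian is a common convention, not necessarily the one used in the video):

import numpy as np

def two_loop_direction(grad, s_list, y_list):
    """L-BFGS two-loop recursion: returns the search direction -H_k @ grad
    using only the last m stored pairs (s_i, y_i); no matrix is ever formed."""
    q = grad.copy()
    alphas, rhos = [], []
    # First loop: newest pair to oldest
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / (y @ s)
        alpha = rho * (s @ q)
        q -= alpha * y
        rhos.append(rho)
        alphas.append(alpha)
    # Initial Hessian scaling gamma = (s_k @ y_k) / (y_k @ y_k)
    gamma = (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1]) if s_list else 1.0
    r = gamma * q
    # Second loop: oldest pair to newest
    for (s, y), rho, alpha in zip(zip(s_list, y_list), reversed(rhos), reversed(alphas)):
        beta = rho * (y @ r)
        r += (alpha - beta) * s
    return -r    # descent direction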
KEY CONCEPTS COVERED:
Ill-conditioned optimization problems
Hessian matrix and curvature
Quasi-Newton methods
Secant equation and rank-two updates
Limited-memory approximations
Computational complexity analysis
WHERE YOU'LL SEE THESE ALGORITHMS:
scipy.optimize.minimize(method='L-BFGS-B') (usage sketch after this list)
PyTorch: torch.optim.LBFGS
L-BFGS in TensorFlow Probability (tfp.optimizer.lbfgs_minimize)
scikit-learn's LogisticRegression (solver='lbfgs')
Maximum likelihood estimation in statistics
Parameter fitting in scientific computing
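As a quick illustration of the first entry above, a minimal sketch using SciPy's built-in Rosenbrock test function:

import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

# Minimize the Rosenbrock function with L-BFGS-B, supplying the analytic gradient.
x0 = np.zeros(5)
result = minimize(rosen, x0, jac=rosen_der, method='L-BFGS-B')
print(result.x)      # should be close to all ones
print(result.nit)    # number of iterations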
RELATED TOPICS:
If you enjoyed this video, check out my series on optimization algorithms:
Part 1: Gradient Descent Deep Dive
Part 2: Newton's Method and Second-Order Optimization
Part 3: BFGS and L-BFGS (this video)
Part 4: Conjugate Gradient Methods (coming soon)
WHO IS THIS FOR?
This video is designed for:
Machine learning engineers wondering what's under the hood
PhD students in computational science
Anyone who's seen solver='lbfgs' or method='L-BFGS-B' and thought "what is that?"
Optimization enthusiasts who want to understand modern algorithms
You should have basic calculus knowledge (gradients, derivatives) and some familiarity with optimization. If you know what gradient descent is, you're ready for this video.
PERFORMANCE COMPARISON:
For a typical ML problem with n=100,000 parameters:
Gradient Descent: ~50,000 iterations, O(n) memory
BFGS: ~500 iterations, O(n²) memory = 80GB (impractical!)
L-BFGS: ~500 iterations, O(mn) memory = 40MB (perfect!)
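For anyone who wants to check those memory figures, a quick back-of-the-envelope calculation (m = 20 is my assumption here; the 40MB above corresponds to a slightly larger history):

# Back-of-the-envelope memory check (double precision = 8 bytes per float)
n, m = 100_000, 20                 # m is an assumed history size
bfgs_bytes  = n * n * 8            # full inverse-Hessian approximation
lbfgs_bytes = 2 * m * n * 8        # m pairs of length-n vectors (s_i, y_i)
print(bfgs_bytes / 1e9, "GB")      # 80.0 GB
print(lbfgs_bytes / 1e6, "MB")     # 32.0 MB, i.e. tens of MB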
WHY THIS MATTERS:
L-BFGS is the secret weapon of computational science. When you fit a logistic regression model in scikit-learn, train certain neural networks, or run maximum likelihood estimation, there's a good chance L-BFGS is doing the heavy lifting. Understanding how it works makes you a better ML engineer and computational scientist.
CONNECT WITH ME:
🔗 GitHub: https://github.com/psychedelic2007
🔗 LinkedIn: / satyam-sangeet-a8a604119
🔗 Twitter: https://x.com/satyam_san20
💬 Personal Page: https://psychedelic2007.github.io/
TOOLS:
Animations created with Manim Community Edition
Simulations in Python (NumPy, Matplotlib, SciPy)
If you found this helpful, please like, subscribe, and share with anyone learning optimization or ML! Questions? Drop them in the comments - I read and respond to every one.
#MachineLearning #Optimization #BFGS #LBFGS #DeepLearning #ComputationalScience #Algorithm #Python #DataScience #AI #NumericalMethods #QuasiNewton #GradientDescent #PyTorch #TensorFlow #ScikitLearn