BFGS & L-BFGS: The Algorithms Behind Modern Machine Learning
Author: Synapsara
Uploaded: 2026-01-17
Views: 213
Description:
In this video, we dive deep into BFGS and L-BFGS - the quasi-Newton optimization algorithms found in many modern machine learning and scientific computing libraries. If you've ever wondered why SciPy and PyTorch offer L-BFGS alongside plain gradient descent and Newton's method, this is your answer.
WHAT YOU'LL LEARN:
Why gradient descent fails for ill-conditioned problems
How second-order methods use curvature information
The computational bottleneck of Newton's method (O(n³) complexity!)
How BFGS approximates the inverse Hessian without computing it
Why L-BFGS only needs O(mn) memory instead of O(n²)
The elegant two-loop recursion algorithm
When to use GD vs BFGS vs L-BFGS in practice
TECHNICAL DETAILS:
BFGS (Broyden-Fletcher-Goldfarb-Shanno) builds an approximation of the inverse Hessian matrix iteratively using only gradient information. This gives you superlinear convergence without the O(n³) cost of computing and inverting the full Hessian.
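For reference, here is a minimal NumPy sketch of the rank-two inverse-Hessian update at the heart of BFGS (function and variable names are my own, not from the video):

import numpy as np

def bfgs_update(H, s, y):
    """One BFGS update of the inverse-Hessian approximation H.
    s = x_{k+1} - x_k (step), y = grad_{k+1} - grad_k (gradient change).
    The updated H satisfies the secant equation H_new @ y = s."""
    rho = 1.0 / (y @ s)              # curvature condition requires y @ s > 0
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)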
L-BFGS extends this by storing only the last m vector pairs (typically m=3-20), reducing memory from O(n²) to O(mn). For a neural network with 1 million parameters, this is the difference between 8 terabytes and 80 megabytes.
The two-loop recursion is an elegant algorithm that computes the search direction in O(mn) time using only stored history - no matrix storage required.
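To make that concrete, here is a short sketch of the two-loop recursion in NumPy (illustrative only; the scaling choice for the initial Hessian is a common convention, not necessarily the one used in the video):

import numpy as np

def two_loop_direction(grad, s_list, y_list):
    """L-BFGS two-loop recursion: returns the search direction -H_k @ grad
    using only the last m stored pairs (s_i, y_i); no matrix is ever formed."""
    q = grad.copy()
    alphas, rhos = [], []
    # First loop: newest pair to oldest
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / (y @ s)
        alpha = rho * (s @ q)
        q -= alpha * y
        rhos.append(rho)
        alphas.append(alpha)
    # Initial Hessian scaling gamma = (s_k @ y_k) / (y_k @ y_k)
    gamma = (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1]) if s_list else 1.0
    r = gamma * q
    # Second loop: oldest pair to newest
    for (s, y), rho, alpha in zip(zip(s_list, y_list), reversed(rhos), reversed(alphas)):
        beta = rho * (y @ r)
        r += (alpha - beta) * s
    return -r    # descent direction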
KEY CONCEPTS COVERED:
Ill-conditioned optimization problems
Hessian matrix and curvature
Quasi-Newton methods
Secant equation and rank-two updates
Limited-memory approximations
Computational complexity analysis
WHERE YOU'LL SEE THESE ALGORITHMS:
scipy.optimize.minimize(method='L-BFGS-B') (usage sketch after this list)
PyTorch: torch.optim.LBFGS
L-BFGS in TensorFlow Probability (tfp.optimizer.lbfgs_minimize)
scikit-learn's LogisticRegression (solver='lbfgs')
Maximum likelihood estimation in statistics
Parameter fitting in scientific computing
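As a quick illustration of the first entry above, a minimal sketch using SciPy's built-in Rosenbrock test function:

import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

# Minimize the Rosenbrock function with L-BFGS-B, supplying the analytic gradient.
x0 = np.zeros(5)
result = minimize(rosen, x0, jac=rosen_der, method='L-BFGS-B')
print(result.x)      # should be close to all ones
print(result.nit)    # number of iterations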
RELATED TOPICS:
If you enjoyed this video, check out my series on optimization algorithms:
Part 1: Gradient Descent Deep Dive
Part 2: Newton's Method and Second-Order Optimization
Part 3: BFGS and L-BFGS (this video)
Part 4: Conjugate Gradient Methods (coming soon)
WHO IS THIS FOR?
This video is designed for:
Machine learning engineers wondering what's under the hood
PhD students in computational science
Anyone who's seen solver='lbfgs' or method='L-BFGS-B' and thought "what is that?"
Optimization enthusiasts who want to understand modern algorithms
You should have basic calculus knowledge (gradients, derivatives) and some familiarity with optimization. If you know what gradient descent is, you're ready for this video.
PERFORMANCE COMPARISON:
For a typical ML problem with n=100,000 parameters:
Gradient Descent: ~50,000 iterations, O(n) memory
BFGS: ~500 iterations, O(n²) memory = 80GB (impractical!)
L-BFGS: ~500 iterations, O(mn) memory = 40MB (perfect!)
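For anyone who wants to check those memory figures, a quick back-of-the-envelope calculation (m = 20 is my assumption here; the 40MB above corresponds to a slightly larger history):

# Back-of-the-envelope memory check (double precision = 8 bytes per float)
n, m = 100_000, 20                 # m is an assumed history size
bfgs_bytes  = n * n * 8            # full inverse-Hessian approximation
lbfgs_bytes = 2 * m * n * 8        # m pairs of length-n vectors (s_i, y_i)
print(bfgs_bytes / 1e9, "GB")      # 80.0 GB
print(lbfgs_bytes / 1e6, "MB")     # 32.0 MB, i.e. tens of MB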
WHY THIS MATTERS:
L-BFGS is the secret weapon of computational science. When you fit a logistic regression model in scikit-learn, train certain neural networks, or run maximum likelihood estimation, there's a good chance L-BFGS is doing the heavy lifting. Understanding how it works makes you a better ML engineer and computational scientist.
CONNECT WITH ME:
🔗 GitHub: https://github.com/psychedelic2007
🔗 LinkedIn: / satyam-sangeet-a8a604119
🔗 Twitter: https://x.com/satyam_san20
💬 Personal Page: https://psychedelic2007.github.io/
TOOLS:
Animations created with Manim Community Edition
Simulations in Python (NumPy, Matplotlib, SciPy)
If you found this helpful, please like, subscribe, and share with anyone learning optimization or ML! Questions? Drop them in the comments - I read and respond to every one.
#MachineLearning #Optimization #BFGS #LBFGS #DeepLearning #ComputationalScience #Algorithm #Python #DataScience #AI #NumericalMethods #QuasiNewton #GradientDescent #PyTorch #TensorFlow #ScikitLearn