Stabilizing PDE-ML Systems || How Does Neural Network Training Work || Oct 17, 2025
Author: CRUNCH Group: Home of Math + Machine Learning + X
Uploaded: 2025-10-17
Views: 65
Description:
Speakers, institutes & titles
1) Saad Qadeer, Pacific Northwest National Laboratory (PNNL), Stabilizing PDE-ML Coupled Systems
Abstract: A long-standing obstacle to using machine-learned surrogates within larger PDE systems is the onset of instabilities when the coupled system is solved numerically. Efforts to ameliorate these have mostly concentrated on improving the accuracy of the surrogates or imbuing them with additional structure, and have met with limited success. In this talk, we shall present some insights obtained from studying a prototype problem and show how they can help with more complex systems. In particular, we shall focus on a viscous Burgers'-ML system and, after identifying the cause of the instabilities, prescribe strategies to stabilize the coupled system. Next, we will discuss methods based on the Mori--Zwanzig formalism to improve the accuracy of the stabilized system. We shall also draw analogies with more complex systems and discuss how these prescriptions generalize to those settings.
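To make the setup concrete, here is a minimal sketch, assuming a standard finite-difference discretization of the viscous Burgers' equation with a placeholder machine-learned closure term; the grid size, time step, and the `ml_closure` hook are illustrative assumptions and not the formulation used in the talk.

```python
import numpy as np

# Illustrative sketch (an assumption, not the talk's formulation):
# forward-Euler, central-difference solver for the viscous Burgers' equation
#   u_t + u u_x = nu * u_xx + s(u),
# where s(u) stands in for a machine-learned closure/surrogate term.
# In PDE-ML coupled systems, instabilities typically appear as unbounded
# growth of u after many coupled time steps.

nx, nu_visc, dt = 256, 0.01, 1e-3
x = np.linspace(0.0, 2.0 * np.pi, nx, endpoint=False)
dx = x[1] - x[0]
u = np.sin(x)  # smooth initial condition on a periodic domain

def ml_closure(u):
    # Hypothetical surrogate hook; a trained model would be evaluated here.
    return np.zeros_like(u)

def step(u):
    # Central differences with periodic boundaries via np.roll.
    ux = (np.roll(u, -1) - np.roll(u, 1)) / (2.0 * dx)
    uxx = (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx**2
    return u + dt * (-u * ux + nu_visc * uxx + ml_closure(u))

for n in range(5000):
    u = step(u)
    if not np.isfinite(u).all():  # crude blow-up check
        print(f"instability detected at step {n}")
        break
```

In practice the closure would be a trained network, and the kind of instability the abstract refers to shows up as non-physical growth in u that a check like the one above flags.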
2) Chulhee (Charlie) Yun, Korea Advanced Institute of Science & Technology (KAIST), How Does Neural Network Training Work: Edge of Stability, River Valley Landscape, and More
Abstract: Traditional analyses of gradient descent (GD) state that GD monotonically decreases the loss as long as the “sharpness” of the objective function—defined as the maximum eigenvalue of the objective's Hessian—is below a threshold $2/\eta$, where $\eta$ is the step size. Recent works have identified a striking discrepancy between traditional GD analyses and modern neural network training, referred to as the “Edge of Stability” phenomenon, in which the sharpness at GD iterates increases over time and hovers around the threshold $2/\eta$, while the loss continues to decrease rather than diverging. This discovery calls for an in-depth investigation into the underlying cause of the phenomenon as well as the actual inner mechanisms of neural network training. In this talk, I will briefly overview the Edge of Stability phenomenon and recent theoretical explanations of its underlying mechanism. We will then explore where learning actually occurs in the parameter space, discussing a recent paper that challenges the idea that neural network training happens in a low-dimensional dominant subspace. Based on these observations, I propose the hypothesis that the training loss landscape resembles a “river valley.” I will also present an analysis of the Schedule‑Free AdamW optimizer (Defazio et al., 2024) through this river-valley lens, including insights into why schedule‑free methods can be advantageous for scalable pretraining of language models.
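As background for the $2/\eta$ threshold (a standard one-dimensional illustration, not material from the talk itself): for the quadratic $f(\theta) = \tfrac{\lambda}{2}\theta^2$, whose sharpness is exactly $\lambda$, gradient descent gives $\theta_{t+1} = \theta_t - \eta f'(\theta_t) = (1 - \eta\lambda)\,\theta_t$, which contracts precisely when $|1 - \eta\lambda| < 1$, i.e. when $\lambda < 2/\eta$; at $\lambda = 2/\eta$ the iterates oscillate without decaying, and above it they diverge.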