Scaling PyTorch: Distributed Data Parallel & Model Parallelism
Author: EuroCC 2 and EuroCC4SEE
Uploaded: 2026-02-24
Views: 86
Description:
As datasets and models grow in complexity, mastering distributed training becomes vital. In this video, Casper van Leeuwen from NCC Netherlands breaks down the technical implementation of PyTorch Distributed Data Parallel (DDP) to synchronise training across multiple nodes. Complementing this, Gyula Ujlaki from NCC Hungary presents the strategies behind Model Parallelism, demonstrating how to train massive architectures that exceed the memory of a single GPU.
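As a rough illustration of the DDP workflow described above, here is a minimal, self-contained sketch; the model, dataset, and hyper-parameters are placeholders, not taken from the video. It assumes a torchrun launch (which sets the RANK, LOCAL_RANK, and WORLD_SIZE environment variables) and the NCCL backend:

    # Minimal DDP sketch; placeholder model/data, assumes torchrun + NCCL.
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

    def main():
        # torchrun provides the rendezvous info via environment variables.
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        model = torch.nn.Linear(128, 10).cuda(local_rank)  # placeholder model
        model = DDP(model, device_ids=[local_rank])

        data = TensorDataset(torch.randn(1024, 128), torch.randint(0, 10, (1024,)))
        sampler = DistributedSampler(data)  # shards the dataset across ranks
        loader = DataLoader(data, batch_size=32, sampler=sampler)

        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
        loss_fn = torch.nn.CrossEntropyLoss()

        for epoch in range(2):
            sampler.set_epoch(epoch)  # reshuffle each rank's shard per epoch
            for x, y in loader:
                x, y = x.cuda(local_rank), y.cuda(local_rank)
                optimizer.zero_grad()
                loss = loss_fn(model(x), y)
                loss.backward()  # gradients are all-reduced across ranks here
                optimizer.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

Launched with, e.g., torchrun --nproc_per_node=4 train_ddp.py, each process trains on its own data shard while DDP keeps the replicas synchronised by averaging gradients during backward().

And a minimal sketch of the model-parallel idea: splitting a module across two GPUs so that the parameters only need to fit across the devices combined, not on any one of them. The layer sizes and device names (cuda:0, cuda:1) are illustrative assumptions, not the architecture from the talk:

    # Minimal model-parallel sketch: two halves of a model on two GPUs.
    import torch
    import torch.nn as nn

    class TwoGPUModel(nn.Module):
        def __init__(self):
            super().__init__()
            # First half lives on GPU 0, second half on GPU 1.
            self.part1 = nn.Sequential(nn.Linear(128, 4096), nn.ReLU()).to("cuda:0")
            self.part2 = nn.Linear(4096, 10).to("cuda:1")

        def forward(self, x):
            x = self.part1(x.to("cuda:0"))
            return self.part2(x.to("cuda:1"))  # activations hop between devices

    model = TwoGPUModel()
    out = model(torch.randn(32, 128))
    print(out.device)  # cuda:1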
CASTIEL 2 has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 101102047. The JU receives support from the European Union's Digital Europe Programme and Germany, Italy, Spain, France, Belgium, Austria, Estonia.