Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions
Author: Summarize that research paper for me!
Uploaded: 2025-09-24
Views: 42
Description:
Title:
Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions
Source:
https://arxiv.org/pdf/2502.06768
Summary:
This paper, "Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions," explores the trade-offs between Masked Diffusion Models (MDMs) and Autoregressive Models (ARMs) in discrete generative modeling. The paper won the Outstanding Paper award at ICML 2025.
MDMs face a more challenging training process compared to ARMs because they must learn to solve an exponentially large number of "infilling" or masking problems in an "order-agnostic" way. This training complexity can lead to performance imbalances, where the model struggles with harder subproblems. However, this rigorous training provides significant flexibility during inference.
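To make the "exponentially large number" of subproblems concrete, here is a minimal counting sketch. It is an illustration only, not the paper's formal analysis, and the function names are hypothetical: a left-to-right ARM faces one prediction problem per position, while an order-agnostic model can be asked to fill in any non-empty set of masked positions.

```python
from itertools import combinations

def arm_subproblems(length):
    """Left-to-right ARM: predict position i from the prefix 0..i-1."""
    return [(tuple(range(i)), i) for i in range(length)]

def mdm_mask_patterns(length):
    """Every non-empty set of masked positions an order-agnostic
    model may be asked to fill in (illustrative enumeration)."""
    positions = range(length)
    return [
        set(combo)
        for k in range(1, length + 1)
        for combo in combinations(positions, k)
    ]

length = 4
print(len(arm_subproblems(length)))    # 4 prefix problems
print(len(mdm_mask_patterns(length)))  # 15 = 2**4 - 1 mask patterns
```

The gap grows quickly: for a sequence of length 10 the same enumeration yields 1023 mask patterns against 10 prefix problems.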
The key insight of the paper is that this inference flexibility can be leveraged to overcome the drawbacks of complex training. By using adaptive inference strategies, which choose the order of token generation at inference time, MDMs can sidestep the difficult subproblems they were not well-trained on. The paper proposes two such strategies, "Top probability" and "Top probability margin," which select the next position to unmask based on the model's certainty: the first ranks masked positions by their highest predicted token probability, the second by the gap between their two highest token probabilities.
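The two selection rules can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the `predict` callable, the toy probabilities, the `MASK` sentinel, and the greedy choice of token at the selected position are all hypothetical choices made for illustration.

```python
MASK = None  # hypothetical sentinel for a still-masked position

def top_probability(probs):
    """Confidence = largest token probability at this position."""
    return max(probs)

def top_probability_margin(probs):
    """Confidence = gap between the two largest token probabilities."""
    first, second = sorted(probs, reverse=True)[:2]
    return first - second

def pick_next_position(predictions, score):
    """predictions maps each masked position to its token probabilities.
    Returns the masked position the model is most certain about."""
    return max(predictions, key=lambda pos: score(predictions[pos]))

def adaptive_decode(tokens, predict, score):
    """Unmask one position per step, most confident position first."""
    tokens = list(tokens)
    order = []
    while MASK in tokens:
        predictions = predict(tokens)
        pos = pick_next_position(predictions, score)
        probs = predictions[pos]
        tokens[pos] = probs.index(max(probs))  # greedy pick: an assumption
        order.append(pos)
    return tokens, order

# Toy stand-in for a model: fixed distributions over a 3-token vocabulary.
TOY = {0: [0.55, 0.44, 0.01], 1: [0.40, 0.30, 0.30], 2: [0.50, 0.25, 0.25]}

def toy_predict(tokens):
    return {pos: TOY[pos] for pos, tok in enumerate(tokens) if tok is MASK}

print(adaptive_decode([MASK] * 3, toy_predict, top_probability)[1])
print(adaptive_decode([MASK] * 3, toy_predict, top_probability_margin)[1])
```

In this toy setup the two rules disagree on the first step. Position 0 has the highest single probability (0.55), but its top two candidates are nearly tied, so the margin rule prefers position 2, where the leading token is clearly ahead of the runner-up.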
Experiments show that these adaptive strategies dramatically improve performance. For instance, on Sudoku puzzles, an MDM's accuracy rose from under 7% with standard (vanilla) inference to approximately 90% with adaptive inference. This result even surpassed a much larger ARM that was explicitly trained with the correct token generation order.
The effectiveness of adaptive inference was also demonstrated on reasoning tasks like coding and math using the 8B LLaDA large language diffusion model. The paper concludes that for tasks without a fixed, natural token order, such as logic puzzles and reasoning, MDMs with adaptive inference are a powerful alternative to traditional ARMs.
#MaskedDiffusionModels #MDM #AutoregressiveModels #ARM #GenerativeAI #MachineLearning #DeepLearning #TokenOrdering #AIReasoning #LogicPuzzles #SudokuAI #AdaptiveInference #DiffusionModels #LLM #NaturalLanguageProcessing #TechPaper #AIResearch #InferenceOptimization #DiscreteDiffusion #GenerativeModeling #ICML2025 #BestPaper