UniVideo: Unified Video Understanding, Gen, Edit
Author: AI Research Roundup
Uploaded: 2025-10-10
Views: 65
Description:
In this AI Research Roundup episode, Alex discusses the paper:
'UniVideo: Unified Understanding, Generation, and Editing for Videos'
UniVideo proposes a single multimodal framework that handles video understanding, generation, and editing from natural-language instructions. It uses a dual-stream design: a frozen MLLM (Qwen2.5-VL-7B) interprets the instruction, and a Multimodal DiT (HunyuanVideo-T2V-13B) synthesizes the video, with a trainable MLP connector bridging the two. A VAE stream preserves fine visual detail, while ID tags and 3D positional embeddings enable multi-ID conditioning. A three-stage training pipeline progresses from large-scale T2I/T2V alignment to high-quality fine-tuning and then multi-task joint training, and the resulting model matches or surpasses task-specific baselines.
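To make the connector idea concrete, here is a minimal sketch of how an MLP could map per-token features from a frozen MLLM into the width expected by a DiT. This is an illustration only, not the paper's implementation: the class name MLPConnector, the two-layer depth, the GELU activation, and the toy dimensions are all assumptions, and plain Python lists stand in for real tensors.

```python
import math
import random


def make_linear(in_dim, out_dim, rng):
    """Return (weights, bias) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def linear(x, layer):
    """Apply a dense layer to a single feature vector."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def gelu(v):
    """Tanh approximation of GELU (the activation choice is an assumption)."""
    return 0.5 * v * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))


class MLPConnector:
    """Hypothetical two-layer MLP: MLLM hidden size -> DiT conditioning size."""

    def __init__(self, mllm_dim, hidden_dim, dit_dim, seed=0):
        rng = random.Random(seed)  # fixed seed keeps the sketch deterministic
        self.fc1 = make_linear(mllm_dim, hidden_dim, rng)
        self.fc2 = make_linear(hidden_dim, dit_dim, rng)

    def __call__(self, tokens):
        # tokens: one feature vector per token, as produced by the frozen MLLM
        return [linear([gelu(h) for h in linear(t, self.fc1)], self.fc2)
                for t in tokens]


# Toy sizes chosen for readability; real models use far larger widths.
MLLM_DIM, HIDDEN_DIM, DIT_DIM = 8, 16, 4

connector = MLPConnector(MLLM_DIM, HIDDEN_DIM, DIT_DIM)

# Stand-in for the frozen MLLM's output: 3 tokens of width MLLM_DIM.
mllm_tokens = [[0.1 * (i + j) for j in range(MLLM_DIM)] for i in range(3)]

# Only the connector's weights would be trained; the MLLM stays frozen.
dit_conditioning = connector(mllm_tokens)
print(len(dit_conditioning), len(dit_conditioning[0]))  # 3 4
```

In this sketch the token count is unchanged and only the feature width is remapped, which is the usual role of a connector between two models with different hidden sizes.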
Paper URL: https://arxiv.org/abs/2510.08377
#AI #MachineLearning #DeepLearning #VideoGeneration #Multimodal #LLM #VideoEditing #DiffusionModels