LLaVA-Scissor: Semantic Video Compression
Author: AI Research Roundup
Uploaded: 2025-06-29
Views: 68
Description:
In this AI Research Roundup episode, Alex discusses the paper:
'LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs'
Video Large Language Models (VLLMs) often incur high computational costs because they process many redundant visual tokens. This paper introduces LLaVA-Scissor, a training-free token compression strategy that addresses this problem. Its core innovation is the Semantic Connected Components (SCC) method, which groups tokens into distinct semantic regions based on their similarity. Each group is then aggregated into a single representative token, substantially reducing the token count. Compression runs in two steps: first spatially within each frame, then temporally across the entire video, making VLLMs more efficient without extra training.
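The grouping-and-aggregation idea described above can be sketched in a few lines of Python. This is only an illustration, not the paper's implementation: the use of cosine similarity, the threshold `tau`, mean aggregation, and the function names `semantic_connected_components` and `compress_video` are all assumptions made for the example.

```python
import numpy as np


def semantic_connected_components(tokens, tau=0.8):
    """Group tokens whose cosine similarity exceeds tau, then average each group.

    tokens: (n, d) array. Returns a (k, d) array with k <= n.
    Cosine similarity, tau, and mean aggregation are illustrative assumptions.
    """
    n = tokens.shape[0]
    # Pairwise cosine similarity, thresholded into an adjacency matrix.
    normed = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + 1e-8)
    adjacent = (normed @ normed.T) > tau

    # Union-find over the thresholded similarity graph.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if adjacent[i, j]:
                parent[find(i)] = find(j)

    roots = np.array([find(i) for i in range(n)])
    # One representative token per connected component: the mean of its members.
    return np.stack([tokens[roots == r].mean(axis=0) for r in np.unique(roots)])


def compress_video(frames, tau=0.8):
    """frames: (T, n, d). Spatial step per frame, then a temporal step over the result."""
    per_frame = [semantic_connected_components(f, tau) for f in frames]
    return semantic_connected_components(np.concatenate(per_frame, axis=0), tau)
```

For example, a frame with two pairs of near-duplicate tokens collapses to two representative tokens, and a video made of repeated copies of that frame collapses to the same two tokens after the temporal step. The pairwise loop is quadratic in the number of tokens, which is fine for a sketch but would likely need a faster grouping step at realistic token counts.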
Paper URL: https://huggingface.co/papers/2506.21862
#AI #MachineLearning #DeepLearning #VideoLLM #TokenCompression #LLaVA #ComputerVision