PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Author: LuxaK
Uploaded: 2026-02-04
Views: 2
Description:
The paper introduces PaddleOCR-VL, a state-of-the-art, resource-efficient model for multilingual document parsing. Its core component, PaddleOCR-VL-0.9B, is a compact vision-language model (VLM) that pairs a NaViT-style dynamic-resolution visual encoder with the lightweight ERNIE-4.5-0.3B language model. This design improves dense-text recognition and decoding efficiency, lets the model support 109 languages, and allows it to recognize complex elements such as text, tables, formulas, and charts with minimal resource consumption.
PaddleOCR-VL works in two stages: it first performs layout detection and reading-order prediction, then passes the segmented elements to the VLM for recognition. Extensive evaluations show state-of-the-art performance in both page-level parsing and element-level recognition: the model outperforms existing solutions and competes strongly with top-tier VLMs. Fast inference and low training cost make it well suited to practical deployment, especially in resource-constrained environments. To ensure robust performance, the authors built a high-quality training-data pipeline of over 30 million samples, using prompt engineering and automatic labeling.
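The two-stage flow described above (layout detection with reading-order prediction, then per-element recognition by the VLM) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PaddleOCR API: `Element`, `crop`, `parse_page`, `toy_detect` and `toy_recognize` are hypothetical names, the "page" is a toy grid of characters, and the two stub functions stand in for the real layout model and PaddleOCR-VL-0.9B.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1); hypothetical box format
Page = List[str]                  # toy "image": one string per pixel row


@dataclass
class Element:
    kind: str    # element type, e.g. "text", "table", "formula", "chart"
    bbox: BBox   # region on the page found by the (hypothetical) layout stage
    order: int   # predicted reading-order index from stage 1


def crop(page: Page, bbox: BBox) -> List[str]:
    """Cut the element's region out of the page."""
    x0, y0, x1, y1 = bbox
    return [row[x0:x1] for row in page[y0:y1]]


def parse_page(
    page: Page,
    detect_layout: Callable[[Page], List[Element]],
    recognize: Callable[[List[str], str], str],
) -> str:
    # Stage 1: layout detection plus reading-order prediction.
    elements = sorted(detect_layout(page), key=lambda e: e.order)
    # Stage 2: recognize each cropped element separately, then
    # assemble the results in the predicted reading order.
    parts = [recognize(crop(page, el.bbox), el.kind) for el in elements]
    return "\n\n".join(parts)


# --- Toy stubs standing in for the real models (hypothetical) ---
def toy_detect(page: Page) -> List[Element]:
    # Returned out of order on purpose: reading order must come from `order`.
    return [
        Element("formula", (0, 2, 5, 3), order=1),
        Element("text", (0, 0, 5, 2), order=0),
    ]


def toy_recognize(region: List[str], kind: str) -> str:
    text = " ".join(row.strip() for row in region)
    return f"$${text}$$" if kind == "formula" else text


page = [
    "HELLO     ",
    "WORLD     ",
    "x+y=z     ",
]
markdown = parse_page(page, toy_detect, toy_recognize)
print(markdown)
```

Keeping the two stages behind plain callables means the layout model and the recognizer can be swapped or batched independently; sorting by the predicted order before assembly is what turns a set of isolated crops back into a page in reading order.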
#PaddleOCR #VisionLanguageModel #DocumentParsing #MultilingualAI #ResourceEfficient #SOTA #DeepLearning #OCR #AIResearch #Baidu
paper - https://arxiv.org/pdf/2510.14528
subscribe - https://t.me/arxivpaper
donations:
USDT: 0xAA7B976c6A9A7ccC97A3B55B7fb353b6Cc8D1ef7
BTC: bc1q8972egrt38f5ye5klv3yye0996k2jjsz2zthpr
ETH: 0xAA7B976c6A9A7ccC97A3B55B7fb353b6Cc8D1ef7
SOL: DXnz1nd6oVm7evDJk25Z2wFSstEH8mcA1dzWDCVjUj9e
created with NotebookLM