Data Science Gems
These are recordings of the latest pieces of breakthrough research on deep learning for NLP and vision. The goal is to put up at least one update per week.
HomePage: https://sites.google.com/view/manishg/
LinkedIn: https://www.linkedin.com/in/manishsgupta/
Manish Gupta is a Principal Applied Researcher at Microsoft India R&D Private Limited in Hyderabad, India. He is also an adjunct faculty member at IIIT Hyderabad and a visiting faculty member at ISB Hyderabad. He received his Master's in Computer Science from IIT Bombay (2007) and his Ph.D. from the University of Illinois at Urbana-Champaign (2013). He worked for Yahoo! Bangalore from 2005 to 2007. His research interests are in deep learning, natural language processing, web mining, and data mining. He has published 150+ research papers in refereed journals and conferences. He has also co-authored two books: one on Outlier Detection for Temporal Data and another on Information Retrieval with Verbose Queries.
#292 Agentic Organization
#291 Numerical Representations (Embeddings) for Language Models
#290 Contextual Entrainment and Distraction in LLMs
#289 HALoGEN: A Hallucination Benchmark for LLMs
#288 olmOCR2
#287 Learning to Reason with Optimized Code: Training LLMs...
#286 Attention Sinks for Language modeling with 4M+ tokens
#285 FRAMES: A Benchmark Dataset for RAG Systems
#284 BrowseComp: A Benchmark for Browsing Agents
#283 Benchmark: Humanity's Last Exam (HLE)
#282 DeepSeek OCR
#281 KidLM: LLMs for Kids
#280 Native Sparse Attention from DeepSeek
#279 FastGen: Adaptive KV Cache Compression for LLMs
#278 Response Sampling in LLMs
#277 Scaling Laws for Dense Retrieval
#276 Restormer: Efficient Transformer for High-Resolution Image Restoration
#275 REFRAG: Compress, Sense and Expand for 31x faster Decoding
#274 DinoV3 foundational vision model
#273 Kimi K2
#272 Vision Transformer models with Registers
#271 Genie: Transformer-based Text Diffusion Model
#270 Google Gemma3
#269 OpenAI GPT5
#268 Iterative Retrieval for prompting LLMs
#267 Impact of Input Length on the Reasoning Performance of LLMs
#266 Enabling LLMs to know when to abstain
#265 MultiLegalPile and LegalXLM models
#264 UI-TARS: LLM-based GUI Native Agents
#263 Walking Tours and DoRA. Is ImageNet worth 1 video?