A Configurable Floating-Point Fused Multiply-Add Design with Mixed Precision for AI Accelerators

Author: Nxfee Innovation

Uploaded: 2025-10-30

Views: 118

Description: Hardware accelerators for deep learning in artificial intelligence applications must often meet stringent constraints on accuracy and throughput. In addition to architectural and algorithmic improvements, high-performance computational techniques such as mixed precision are also required. This paper proposes a floating-point (FP) fused multiply-add (FMA) unit that supports mixed and multiple precision. The design supports a wide range of conventional FP formats (such as half and single precision) as well as emerging formats (including E4M3, E5M2, DLFloat, BFloat16 and TF32). Beyond these formats, the exponent and mantissa lengths of 8-, 16- and 32-bit FP numbers can be adjusted to suit the needs of an application. The proposed FMA can be configured to perform either multiple normal FMA operations or mixed-precision operations in an ASIC implementation. It is fully pipelined: in each cycle, the input bit streams are processed according to the provided configuration, independently of previous cycles. For normal FMA operations, the design shares resources to parallelize multiple operations based on the available hardware and the required precision. For mixed precision, the FMA accumulates lower-precision dot products into a higher-precision result to avoid overflow and underflow. It improves computational accuracy by adding all possible dot products at the same time, which reduces the number of rounding operations and therefore the accumulated rounding error. An innovative method to accumulate the dot products and the aligned addend is also proposed. By weighing the tradeoffs between reusing the available hardware and removing unnecessarily complex units, the design is more efficient and flexible than other designs in the technical literature, both in hardware metrics and in the range of precisions supported. Extensive simulation results for comparative analysis are provided.
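To make the accuracy argument concrete, here is a minimal software sketch in Python. It is not the paper's hardware design. It assumes IEEE-style encodings with bias 2^(e-1) - 1, and the names `decode`, `round_to`, `fused_dot` and `stepwise_dot` are hypothetical helpers introduced only for illustration. The sketch decodes 8-bit inputs with a configurable exponent/mantissa split, then compares a dot product accumulated exactly and rounded once to 32 bits against one rounded to 16 bits after every operation. Special values are not handled; note that the published FP8 E4M3 format uses the top exponent code for ordinary numbers, which this simplified decoder rejects.

```python
import struct
from fractions import Fraction


def decode(bits, exp_bits, man_bits):
    """Decode a sign/exponent/mantissa bit pattern into an exact Fraction.

    Assumption: IEEE-style layout with bias 2**(exp_bits - 1) - 1 and
    subnormals. The all-ones exponent is rejected (no inf/NaN handling).
    """
    sign = (bits >> (exp_bits + man_bits)) & 1
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1
    if exp == (1 << exp_bits) - 1:
        raise ValueError("special value not handled in this sketch")
    frac = Fraction(man, 1 << man_bits)
    if exp == 0:
        value = frac * Fraction(2) ** (1 - bias)          # subnormal
    else:
        value = (1 + frac) * Fraction(2) ** (exp - bias)  # normal
    return -value if sign else value


def round_to(value, fmt):
    """Round to binary16 ('e') or binary32 ('f') using struct.

    The value passes through a Python float (binary64) first, which is
    exact for the small values used here but can double-round in general.
    """
    return struct.unpack(fmt, struct.pack(fmt, float(value)))[0]


def fused_dot(a_bits, b_bits, addend, exp_bits, man_bits):
    """sum(a_i * b_i) + addend, accumulated exactly, rounded once to binary32."""
    acc = Fraction(addend)
    for a, b in zip(a_bits, b_bits):
        acc += decode(a, exp_bits, man_bits) * decode(b, exp_bits, man_bits)
    return round_to(acc, "f")


def stepwise_dot(a_bits, b_bits, addend, exp_bits, man_bits):
    """Same computation, but rounded to binary16 after every multiply and add."""
    acc = round_to(addend, "e")
    for a, b in zip(a_bits, b_bits):
        prod = decode(a, exp_bits, man_bits) * decode(b, exp_bits, man_bits)
        prod = round_to(prod, "e")
        acc = round_to(Fraction(acc) + Fraction(prod), "e")
    return acc


# Four products of 1.0 * 1.0 (0x38 encodes 1.0 with 4 exponent, 3 mantissa bits)
ones = [0x38] * 4
print(fused_dot(ones, ones, 2048.0, 4, 3))     # 2052.0
print(stepwise_dot(ones, ones, 2048.0, 4, 3))  # 2048.0
```

In the stepwise version each added 1.0 is lost, because binary16 values near 2048 are spaced 2 apart and the halfway case 2049 rounds back to 2048. Accumulating all products together with the addend and rounding once keeps the full contribution, which is the effect the description attributes to reducing the number of rounding operations.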

