Inference Optimization with NVIDIA TensorRT

Author: NCSAatIllinois

Uploaded: 2022-04-18

Views: 16540

Description: Many deep learning applications benefit from reduced latency (the time taken for inference). This tutorial introduces NVIDIA TensorRT, an SDK for high-performance deep learning inference, and walks through all the steps needed to convert a trained deep learning model into an inference-optimized model on HAL.

Speakers: Nikil Ravi and Pranshu Chaturvedi, UIUC
Webinar Date: April 13, 2022
