How to Self-Host LLMs and Multi-Modal AI Models with NVIDIA NIM in 5 Minutes
Author: NVIDIA Developer
Uploaded: 2024-07-29
Views: 41,362
Description:
NVIDIA NIM is containerized AI inference software that makes it simple to deploy production-ready model endpoints accelerated by NVIDIA GPUs, anywhere you need them. Tap into the latest AI foundation models, like NVIDIA Nemotron, Qwen, DeepSeek R1, Meta Llama, and more, ready for secure, private deployment in 5 minutes or less on NVIDIA-accelerated workstations, data center, or cloud environments.
Join Neal Vaidya, developer advocate at NVIDIA, for a demo of how to privately deploy LLMs and multi-modal AI models with NVIDIA NIM. This tutorial focuses on running Llama 3 locally with NIM, but once you're up and running with NIM, it's easy to tap into NVIDIA Nemotron, Qwen, DeepSeek, Mistral, Meta, and more, all with the same simple workflow.
0:22 - Overview of NIM microservices (https://nvda.ws/4bZLY9E)
0:36 - Test the NVIDIA-hosted NIM endpoint for Llama 3
0:51 - Generate an API key and access sample code for OpenAI API-compatible chat completion endpoints
0:59 - Get instructions for pulling the NIM docker container to run Llama 3 locally
1:22 - How to log in and authenticate with the NVIDIA NGC private registry from your local environment using the command line (CLI)
1:55 - Create and set an environment variable called NGC_API_KEY
2:05 - Input a single 'docker run' command to pull the NIM container, automatically download optimized model weights, and launch a local LLM endpoint
2:19 - Explanation of Docker command options: Expose all GPUs to the running container
2:28 - Explanation of Docker command options: Expose the API key environment variable
2:35 - Explanation of Docker command options: Mount the cache to download and store model weights and avoid re-downloading them on future deployments
2:48 - Explanation of Docker command options: Specify that the NIM should run as the local user
2:53 - Explanation of Docker command options: Expose the HTTP requests port to interact with the locally running NIM
3:03 - Explanation of Docker command syntax: Specifying the model name in the container image path
3:30 - Check that the Llama 3 inference service is running by sending a curl request to the API readiness health check endpoint in another terminal
3:41 - Use curl to send another inference request to the local Llama 3 NIM API endpoint (a sketch of the full command-line workflow follows below)
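For reference, the registry login and API key steps covered at 0:51-1:55 look roughly like the commands below. This is a sketch, not a transcript of the video: the key value comes from your own NGC account, and the username for nvcr.io is the literal string $oauthtoken.

# Log in to the NVIDIA NGC container registry.
docker login nvcr.io
# Username: $oauthtoken
# Password: <paste your NGC API key>

# Export the key so the container can fetch optimized model weights at startup.
export NGC_API_KEY=<paste your NGC API key>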
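A minimal 'docker run' sketch matching the options explained at 2:05-3:03. The image path assumes the Llama 3 8B Instruct NIM; the tag, the host cache directory, and the in-container cache path follow NVIDIA's published examples and may differ for your model or NIM version.

# Host cache directory so weights persist across runs.
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"

# --gpus all      : expose all GPUs to the running container
# -e NGC_API_KEY  : pass the API key environment variable through
# -v ...          : mount the host cache so weights are not re-downloaded
# -u $(id -u)     : run the NIM as the local user
# -p 8000:8000    : expose the HTTP port for inference requests
# The model name (meta/llama3-8b-instruct) is part of the container image path.
docker run -it --rm \
  --gpus all \
  -e NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama3-8b-instruct:latest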
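Once the container is up, the two curl checks at 3:30 and 3:41 look roughly like this. The /v1/health/ready path and the OpenAI API-compatible /v1/chat/completions route follow NIM's documented API; the prompt is just an illustration.

# In another terminal: readiness health check (succeeds once the model is loaded).
curl http://localhost:8000/v1/health/ready

# OpenAI API-compatible chat completion request against the local endpoint.
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta/llama3-8b-instruct",
        "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
        "max_tokens": 64
      }'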
Developer resources
▶️ Learn more about NIM: https://nvda.ws/472hzbF
▶️ Join the NVIDIA Developer Program: https://nvda.ws/3OhiXfl
▶️ Try and download NIM and NVIDIA Blueprints (reference workflows for AI agent sample apps) on the NVIDIA API catalog: https://nvda.ws/4bZLY9E
▶️ Read the Mastering LLM Techniques series to learn about inference optimization, including continuous batching, KV caching, model quantization, tensor parallelism, and more: https://resources.nvidia.com/en-us-la...
#selfhosting #LLM #nvidianim #aimodel #docker #generativeai #modeldeployment #aiinference #containerizedinference #llmapi #developer #inferenceoptimization #productiongenai #devops #artificialintelligence