Scale AI Agent Evaluation with NVIDIA NeMo Evaluator LLM-as-a-Judge

Author: NVIDIA Developer

Uploaded: 2025-08-20

Views: 1248

Description: In this step-by-step tutorial, you’ll discover how to scale your AI agent evaluation workflows with NVIDIA NeMo Evaluator LLM-as-a-Judge.

This video walks you through how to:
✅ Install NVIDIA NIM Operator and set up Prometheus
✅ Set up LLM-as-a-Judge with NeMo Evaluator using Docker Compose
✅ Configure the evaluation with the Llama Nemotron Nano 1.1 4B model
✅ Scale the evaluation to multiple GPUs

The NVIDIA NeMo Evaluator microservice simplifies the end-to-end evaluation of generative AI applications, including LLM evaluation, retrieval-augmented generation (RAG) evaluation, and AI agent evaluation, through an easy-to-use API. It provides LLM-as-a-judge capabilities, along with a comprehensive suite of LLM benchmarks and LLM metrics for a wide range of custom tasks and domains, including reasoning, coding, and instruction-following.

Get started:
✅ Follow the official documentation to scale using NIM Operator: https://docs.nvidia.com/nim-operator/...
✅ Download included Jupyter notebooks to replicate the evaluation workflows in your own environment: https://github.com/NVIDIA/GenerativeA...
✅ Learn more about NeMo Evaluator: https://nvda.ws/414aHXJ

00:00 - Introduction
01:00 - Building a Manifest for a NIM
01:52 - Run Command Get Pods
03:00 - Standard NeMo Evaluator Notebook
04:03 - Creating a Job
04:22 - Changing Configurations
05:00 - Apply YAML File to NIM Operator Instance
