Introducing Lemonade Server: Local LLM Serving with GPU and NPU Acceleration

Автор: AMD Developer Central

Загружено: 2025-07-14

Просмотров: 10369

Описание: In this video, we introduce Lemonade Server—a powerful tool that lets you deploy local large language models (LLMs) directly on your PC. With support for industry-standard APIs, Lemonade Server easily connects to a wide range of applications, enabling you to replace cloud-based LLMs with fast, private, local alternatives.

🔧 What You’ll See

How to install and set up Lemonade Server
Downloading, managing, and prompting LLMs
Exploring key resources: GitHub repo, documentation, model details, and featured apps

🖥️ Test Setup
We demonstrate everything using an AMD Ryzen™ AI 395+ Mini PC with 128GB of RAM, showcasing the performance and flexibility of local inference.

Whether you're a developer, researcher, or enthusiast, this walkthrough will help you get started with local LLMs in minutes.

Links Referenced in the Video:
Lemonade Server: https://lemonade-server.ai
Local LLM Servers: https://lemonade-server.ai/docs/serve...

Find the resources you need to develop using AMD products: https://www.amd.com/en/developer.html

Find Ryzen AI Software 1.5 documentation:
https://ryzenai.docs.amd.com/en/lates...

Have questions or ideas? Collaborate directly with developers and experts on the AMD Developer Community Discord:
/ discord

***

© 2025 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, EPYC, ROCm, and AMD Instinct and combinations thereof are trademarks of Advanced Micro Devices, Inc.

Не удается загрузить Youtube-плеер. Проверьте блокировку Youtube в вашей сети.
Повторяем попытку...

Introducing Lemonade Server: Local LLM Serving with GPU and NPU Acceleration

Доступные форматы для скачивания:

Скачать видео

Информация по загрузке:

Скачать аудио

Похожие видео

Запуск нейросетей локально. Генерируем - ВСЁ

Запуск нейросетей локально. Генерируем - ВСЁ

Chris Lattner on High Performance AMD GPU Programming with Mojo

Chris Lattner on High Performance AMD GPU Programming with Mojo

TRP1 Week4-Day3 Technical Tutorial W4D3

TRP1 Week4-Day3 Technical Tutorial W4D3

4 levels of LLMs (on the go)

4 levels of LLMs (on the go)

Контейнерные LLM делают тестирование простым и надежным — Strix Halo Toolboxes

Контейнерные LLM делают тестирование простым и надежным — Strix Halo Toolboxes

Add and Run an FLM NPU Model (Qwen3-VL) to Lemonade Server

Add and Run an FLM NPU Model (Qwen3-VL) to Lemonade Server

Dev Workloads and LLMs… under $1000

Dev Workloads and LLMs… under $1000

Купил МОНСТРА на 32 ГБ VRAM за 45к. Что может серверная Tesla V100 в ИГРАХ?

Купил МОНСТРА на 32 ГБ VRAM за 45к. Что может серверная Tesla V100 в ИГРАХ?

THIS is the REAL DEAL 🤯 for local LLMs

THIS is the REAL DEAL 🤯 for local LLMs

У этого AI-агента уже 235 000 звёзд на GitHub. Показываю, как запустить за 10 минут

У этого AI-агента уже 235 000 звёзд на GitHub. Показываю, как запустить за 10 минут

Полный гайд по Claude: как выжать максимум из этой нейросети

Полный гайд по Claude: как выжать максимум из этой нейросети

Qwen 3.5 Plus УНИЧТОЖАЕТ платные AI! Бесплатно + уровень Claude Opus

Qwen 3.5 Plus УНИЧТОЖАЕТ платные AI! Бесплатно + уровень Claude Opus

Вайб-Кодинг — Гайд Для Тупых (Приложение За 1 Минуту Без Кода)

Вайб-Кодинг — Гайд Для Тупых (Приложение За 1 Минуту Без Кода)

Плачу $100 за Claude. Он автоматизировал весь мой YouTube

Плачу $100 за Claude. Он автоматизировал весь мой YouTube

Как Создавать ИИ-Агентов: Полное Руководство для Начинающих

Как Создавать ИИ-Агентов: Полное Руководство для Начинающих

Запуск vLLM на Strix Halo (AMD Ryzen AI MAX) + обновления производительности ROCm.

Запуск vLLM на Strix Halo (AMD Ryzen AI MAX) + обновления производительности ROCm.

Local LLM Challenge | Speed vs Efficiency

Local LLM Challenge | Speed vs Efficiency

Кто пишет код лучше всех? Сравнил GPT‑5.2, Opus 4.5, Sonnet 4.5, Gemini 3, Qwen 3 Max, Kimi, GLM

Кто пишет код лучше всех? Сравнил GPT‑5.2, Opus 4.5, Sonnet 4.5, Gemini 3, Qwen 3 Max, Kimi, GLM

Goodbye AI Cloud Bills... Exo Runs AI on Your Own Devices

Goodbye AI Cloud Bills... Exo Runs AI on Your Own Devices

Lemonade Server & Open WebUI - Local LLM Serving with GPU and NPU Acceleration

Lemonade Server & Open WebUI - Local LLM Serving with GPU and NPU Acceleration