Everyone's Switching to Qwen3.5 Locally — Here's Why | OpenCode + llama.cpp + Docker
Author: Lukasz Gawenda
Uploaded: 2026-03-03
Views: 110
Description:
RTX 6000 PRO local AI setup 2026 — deploy 122B models with llama.cpp, Docker & OpenCode. Stanford research shows local AI is closing the gap with the cloud. Full agentic coding workflow inside.
Can local AI finally compete with the cloud? According to Stanford & Together AI research, intelligence per watt has improved 5.3x from 2023 to 2025 — and I put that claim to the test. In this video I deploy a 122B parameter model locally using Docker + llama.cpp, hook it into OpenCode, and build a full agentic coding workflow — all on my own hardware.
I'm Łukasz, Lead AI Engineer, and today I'll show you exactly how to run production-grade local AI without paying cloud inference bills. 🔥
⚡ What You'll Learn:
✅ Why local AI is finally catching up to cloud efficiency (the research behind it)
✅ How to containerize llama.cpp server with Docker for any hardware
✅ GGUF format explained — quantization, accuracy tradeoffs & why it matters
✅ How to pick the RIGHT quantization level for YOUR VRAM
✅ Full OpenCode setup — terminal, desktop app & VS Code extension
✅ Building multi-agent workflows with sub-agents & custom skills
✅ Hardware compatibility tricks using Hugging Face model pages
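The Docker Compose walkthrough (6:42) isn't reproduced in this description, but a minimal `docker-compose.yml` for a llama.cpp server container could look roughly like this — the image tag, model filename, context size, and port are assumptions on my part, so check the repo linked under Resources for the exact file used in the video:

```yaml
services:
  llama-server:
    # Official llama.cpp server image with CUDA support (tag is an example)
    image: ghcr.io/ggml-org/llama.cpp:server-cuda
    ports:
      - "8080:8080"
    volumes:
      # Host folder holding your downloaded GGUF files
      - ./models:/models
    command: >
      -m /models/model.gguf
      --host 0.0.0.0
      --port 8080
      -ngl 99
    deploy:
      resources:
        reservations:
          devices:
            # Requires the NVIDIA Container Toolkit (linked below)
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

`-ngl 99` offloads all layers to the GPU; lower it (or drop it) to fall back to CPU offloading on GPU-poor setups, as the takeaways below note.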
💡 Key Takeaways:
GGUF + llama.cpp is the go-to stack for GPU-poor setups — CPU offloading works, just slower
Quantization sweet spot: Q6_K = near-perfect quality; Q2 = surprisingly usable on huge models
OpenCode gives you agents, sub-agents, custom skills, and tool use out of the box
Local models fall asleep when idle to save resources — normal behavior, not a bug
Always set folder permissions before hf download or you'll hit blob creation errors
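The last takeaway in shell form — create and open up the download folder before calling the Hugging Face CLI. The `~/models` path is an example, and the commented-out repo/file pattern is a placeholder, not the exact model from the video:

```shell
# Create the model folder and make it writable BEFORE downloading,
# otherwise the HF CLI can fail with blob-creation permission errors.
mkdir -p "$HOME/models"
chmod -R u+rwX "$HOME/models"

# Speed up large GGUF downloads (requires: pip install hf_transfer)
export HF_HUB_ENABLE_HF_TRANSFER=1

# Placeholder repo/quant pattern — substitute the model you actually want:
# hf download <org>/<model>-GGUF --include "*Q6_K*.gguf" --local-dir "$HOME/models"
```

With `HF_HUB_ENABLE_HF_TRANSFER=1` set, downloads go through the Rust-based `hf_transfer` backend, which is noticeably faster on fast connections (covered at 5:45).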
⏱️ Timestamps:
0:00 Can Local AI Stand Against the Cloud?
0:32 Stanford Research: 5.3x Intelligence Per Watt Improvement
1:24 Hardware Overview & Requirements
1:55 Docker Setup — llama.cpp Server Container
3:05 Finding the Right Model for Your Hardware (HF Compatibility Tool)
3:30 GGUF Format & Quantization Explained
5:45 Downloading Models Fast with HF Transfer
6:42 Docker Compose Walkthrough
9:47 Configuring OpenCode (JSON Schema + API Key)
10:47 Terminal vs Desktop App — Which to Use?
13:35 Adding Cloud Providers to OpenCode
13:55 Agents, Sub-Agents & Skills Explained
16:00 Live Demo: Creator Sub-Agent Writing Documentation
17:50 File Attachments & Folder Context in OpenCode
18:10 Live Demo: Training Visualization App Built by Agent
19:09 Final Verdict — Is Local AI Worth It?
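For the OpenCode configuration step (9:47), here is a hedged sketch of an `opencode.json` pointing at a local OpenAI-compatible llama.cpp endpoint. The provider key, model id, and dummy API key are my placeholders, and the exact schema may differ from what's shown in the video — the JSON schema URL in the file gives you editor validation:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "llama-local": {
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://localhost:8080/v1",
        "apiKey": "sk-local-placeholder"
      },
      "models": {
        "local-gguf-model": {}
      }
    }
  }
}
```

llama.cpp's server exposes an OpenAI-compatible API under `/v1`, so any dummy API key works locally; the key only matters once you add real cloud providers (13:35).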
📦 Resources:
GitHub repo with the files from the video: https://github.com/lukaLLM/AI_Inferen...
OpenCode: https://opencode.ai
llama.cpp: https://github.com/ggerganov/llama.cpp
Nvidia Container Toolkit: https://docs.nvidia.com/datacenter/cl...
HF Hub CLI: https://huggingface.co/docs/huggingfa...
Stanford/Together AI Research (The Batch): https://www.deeplearning.ai/the-batch...
Docker Desktop (Windows): https://www.docker.com/products/docke...