My Preferred DIY Language Model Stack
Author: Cruz Macias
Uploaded: 2025-11-26
Views: 12
Description:
Using Sora 2 to read the Recent Projects and Notable Accomplishments section of my CV verbatim, in the style of an AI YouTuber / grad student.
Is this a valid interviewing format?
OpenWebUI & Local LLM Deployment/Integration
Maintaining and expanding the OpenWebUI platform for generative AI web integration, unifying locally hosted small language models (SLMs) and cloud-based large language models (LLMs) under a single architecture. See my preferred DIY Language Model-Stack(s) below, which include RAG/embedding models, web search tools, OCR, and reranking models; a minimal retrieval sketch follows those lists. Key components include:
Models/Tools: Integrated diverse models (e.g., GLM, Mistral, Perplexity, GGUF) and tools (llama.cpp, Ollama, LMStudio, LangChain) with custom middleware.
Stack: Python (FastAPI, REST), cURL, WebSocket streaming, async/await, Docker, and Python venv/uv; a minimal routing-and-streaming sketch appears after this section.
Features: Retrieval-augmented generation (RAG), embedding and reranking pipelines, Model Context Protocol (MCP) servers, multi-agent systems, and privacy-preserving protocols such as Role-Pseudonymous Prompting (RPP).
Hardware Preferences:
GPU: GGUF models hosted on llama.cpp servers (CUDA release) via HuggingFace repos or Ollama, optimized for an NVIDIA RTX 5070 Ti GPU (12GB VRAM).
CPU: GGUF models hosted on llama.cpp servers (CPU release) via HuggingFace repos, optimized for an AMD Ryzen 9 8940HX; a minimal serving sketch follows this section.
Benchmarks: Achieved parity with commercial models at significantly lower costs (up to 94.4% reduction) in domain-specific benchmarks, validating scalable and enterprise-grade performance.
Applications: Optimized for specialized contexts (e.g., research, analysis), ensuring secure, compliant, and cost-efficient generative AI architectures.
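To make the GPU/CPU hosting preference above concrete, here is a minimal sketch of pulling a quantized GGUF from a HuggingFace repo and serving it with llama.cpp. The repo, quant filename, port, and context size are placeholders rather than my exact production configuration, and -ngl 99 assumes a CUDA build; use -ngl 0 or the CPU build for CPU-only hosting.

```python
# Minimal sketch: fetch a GGUF from a HuggingFace repo and serve it with llama.cpp.
# Repo, filename, port, and context size are placeholders; adjust -ngl for available VRAM.
import subprocess
from huggingface_hub import hf_hub_download

# Download a quantized GGUF (placeholder repo and hypothetical quant filename).
model_path = hf_hub_download(
    repo_id="LiquidAI/LFM2-8B-A1B-GGUF",    # any GGUF repo from the lists below
    filename="LFM2-8B-A1B-Q4_K_M.gguf",     # hypothetical quant filename
)

# Launch the llama.cpp server (CUDA build); -ngl 99 offloads all layers to the GPU.
# Use -ngl 0 (or the CPU build) for CPU-only hosting.
subprocess.run([
    "llama-server",
    "-m", model_path,
    "-ngl", "99",
    "--port", "8081",
    "-c", "8192",    # context size
])
```

Once running, OpenWebUI or any OpenAI-compatible client can point at the server's /v1 endpoint (e.g., http://localhost:8081/v1).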
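And here is a minimal sketch of the unified-architecture idea from the Stack and Features items: a single FastAPI route that forwards an OpenAI-style chat request to either the local llama.cpp server above or a cloud endpoint and streams the response back. The routing rule, cloud URL, and API_KEY environment variable are placeholders, not the production middleware.

```python
# Minimal sketch of a unified routing layer: one FastAPI endpoint that forwards
# chat requests to a local llama.cpp server or a cloud OpenAI-compatible API
# and streams the response back. URLs, model names, and the routing rule are
# placeholders for illustration only.
import os
import httpx
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

BACKENDS = {
    "local": "http://localhost:8081/v1/chat/completions",    # llama.cpp server from the sketch above
    "cloud": "https://api.example.com/v1/chat/completions",   # hypothetical cloud endpoint
}

@app.post("/chat")
async def chat(payload: dict):
    # Naive routing rule, for illustration only: requests tagged "route": "local"
    # stay on the llama.cpp server; everything else goes to the cloud endpoint.
    backend = BACKENDS["local"] if payload.pop("route", "cloud") == "local" else BACKENDS["cloud"]
    headers = {"Authorization": f"Bearer {os.environ.get('API_KEY', '')}"}  # hypothetical env var

    async def stream():
        # Forward the request with streaming enabled and relay the
        # server-sent-event chunks back to the caller unchanged.
        async with httpx.AsyncClient(timeout=None) as client:
            async with client.stream(
                "POST", backend, json={**payload, "stream": True}, headers=headers
            ) as upstream:
                async for chunk in upstream.aiter_bytes():
                    yield chunk

    return StreamingResponse(stream(), media_type="text/event-stream")
```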
Preferred DIY Language Model-Stack(s):
Platform Integration
OpenWebUI
Continue
Ollama
llama.cpp
My Profiles
HuggingFace
Continue
Ollama
OpenWebUI
Cloud
Instruct Model(s) Chat:
GLM-4.5-Flash
Magistral-Small
Mistral-Nemo-12B
Cohere-Command-R-7B
Groq-GPT-OSS-20B
Gemini-2.5-Flash
Base Model(s) Code:
Devstral-Small
Web Search Model:
Perplexity Search API
Groq-Compound-Mini
OCR Model:
Mistral OCR
Embedding Model:
Cohere-Embed
Codestral-Embed
Mistral-Embed
Reranking Model:
Cohere-Rerank
Local
Instruct Model(s) Chat:
LiquidAI/LFM2-8B-A1B-GGUF
google/gemma-7b-GGUF
command-r7b:7b
gemma3:4b
deepseek-r1:8b
qwen3:8b
llama3.1:8b
mistral:7b
dolphin3:8b
dolphin-llama3:8b
Base Model(s) Code:
google/codegemma-7b-GGUF
LiquidAI/LFM2-1.2B-Tool-GGUF
LiquidAI/LFM2-350M-Math-GGUF
codegemma:7b
deepcoder:1.5b
deepseek-coder:6.7b
llama3-groq-tool-use:8b
qwen2.5-coder:7b
RAG Local Model:
LiquidAI/LFM2-1.2B-RAG-GGUF
Embedding Model:
leliuga/all-MiniLM-L12-v2-GGUF
LiquidAI/LFM2-1.2B-Extract
embeddinggemma:300m
all-MiniLM-L6-v2:22m
all-minilm:33m
Reranking Model:
gpustack/bge-reranker-v2-m3-GGUF
bge-reranker-v2-m3:600m
bge-m3:567m
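As referenced above, a minimal sketch of the local retrieval step behind the RAG, embedding, and reranking lists: embed a query and a handful of documents with a locally served embedding model via Ollama's /api/embeddings route, rank them by cosine similarity, and hand the top hits to a reranker such as bge-reranker-v2-m3 in a second pass (not shown). The corpus, model choice, and top-k here are stand-ins.

```python
# Minimal sketch of local RAG retrieval: embed query and documents with a local
# embedding model served by Ollama, then rank documents by cosine similarity.
# The corpus and model name are placeholders; in the full stack the top hits
# would be passed through a reranker before being added to the chat prompt.
import math
import requests

OLLAMA_EMBED_URL = "http://localhost:11434/api/embeddings"
EMBED_MODEL = "all-minilm:33m"    # any embedding model from the local list above

def embed(text: str) -> list[float]:
    resp = requests.post(OLLAMA_EMBED_URL, json={"model": EMBED_MODEL, "prompt": text})
    resp.raise_for_status()
    return resp.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Stand-in corpus for illustration.
documents = [
    "OpenWebUI can talk to llama.cpp and Ollama backends.",
    "GGUF is a quantized model file format used by llama.cpp.",
    "FastAPI supports async streaming responses.",
]
doc_vectors = [embed(d) for d in documents]

query = "What file format does llama.cpp use for quantized models?"
query_vector = embed(query)

# Rank documents by similarity to the query; the top hits would go to the reranker.
ranked = sorted(zip(documents, doc_vectors),
                key=lambda dv: cosine(query_vector, dv[1]), reverse=True)
for doc, _ in ranked[:2]:
    print(doc)
```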
This content is not intended as SEO spam, spamdexing, or AI slop; it exists solely to spread the Gospel of Jesus Christ through musicianship, art, technology, and media.
Prompts Powered by GPT-5.1
For a complete Statement of Copyright Protection and Limitation of Liability, please visit:
https://cmathgit.github.io/cruzgmacia...