local LLM inferencing on a smartphone with NO Internet.
Author: Yitz
Uploaded: 2025-02-18
Views: 71
Description:
Local LLM inferencing on a smartphone with NO Internet, using ONNX Runtime and transformers.js with the Qwen 2.5 0.5B model from Hugging Face, via the https://ailocalhost.com chat UI.
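For reference, here's a minimal sketch of the kind of transformers.js setup described above, assuming the v3 package name "@huggingface/transformers" and the "onnx-community/Qwen2.5-0.5B-Instruct" ONNX export on Hugging Face (the actual app's model ID and options may differ):

```js
// Sketch: in-browser text generation with transformers.js on the WASM (CPU) backend.
import { pipeline } from "@huggingface/transformers";

// device: "wasm" forces the ONNX Runtime WebAssembly backend; dtype: "q4"
// picks a quantized export so the download stays small on a phone.
const generator = await pipeline(
  "text-generation",
  "onnx-community/Qwen2.5-0.5B-Instruct",   // assumed model ID
  { device: "wasm", dtype: "q4" }
);

// Chat-style prompt; the pipeline applies the model's chat template.
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Say hello from a phone with no Internet." },
];

const output = await generator(messages, {
  max_new_tokens: 128,
  temperature: 0.7,
  top_p: 0.9,
  do_sample: true,
});

// With chat-style input, generated_text is the full message list,
// so the last entry is the assistant reply.
console.log(output[0].generated_text.at(-1).content);
```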
I could go on about how it's not working properly... I had it working earlier with WebGPU and streaming responses, and more models would load back then, but since I'm hardly a software developer I didn't keep versions saved, so all of that was lost and I'm back here. So right now only WASM (CPU) works, responses don't stream, many models that loaded earlier won't load anymore (I could paste a pile of errors too), and none of the transformers parameters are being passed properly to the pipeline calls. lol 😂 Well, well... it works well enough for a demo!
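For comparison, a hedged sketch of what the earlier WebGPU + streaming setup might have looked like. It assumes the TextStreamer helper from transformers.js v3; appendToChat is a hypothetical UI hook, not a library function:

```js
// Sketch: WebGPU backend with token streaming (assumptions noted above).
import { pipeline, TextStreamer } from "@huggingface/transformers";

const generator = await pipeline(
  "text-generation",
  "onnx-community/Qwen2.5-0.5B-Instruct",   // assumed model ID
  { device: "webgpu", dtype: "q4f16" }       // GPU backend instead of wasm
);

// Push tokens to the UI as they are produced instead of waiting for the full reply.
const streamer = new TextStreamer(generator.tokenizer, {
  skip_prompt: true,
  callback_function: (text) => appendToChat(text),  // hypothetical UI hook
});

await generator(
  [{ role: "user", content: "Stream a short greeting." }],
  { max_new_tokens: 64, temperature: 0.7, top_p: 0.9, streamer }
);
```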
The server routing, model selection, max tokens, temperature, top-p, and presence penalty are all working... so it's good for testing, and for showing off!
Plus all the other features work fine, so you can still use it with Ollama and OpenAI API servers with everything else working 💯 💯 💯 💯 (a rough sketch of that kind of request is below).
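Here's a rough sketch of pointing the same chat UI at an OpenAI-compatible server instead of in-browser inference, with the sampling parameters mentioned above. The LAN address and model tag are placeholders; Ollama exposes an OpenAI-compatible endpoint on port 11434 by default:

```js
// Sketch: OpenAI-style chat completion request to a local Ollama server.
const response = await fetch("http://192.168.1.50:11434/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "qwen2.5:0.5b",                  // placeholder model tag
    messages: [{ role: "user", content: "Hello from the phone!" }],
    max_tokens: 256,
    temperature: 0.7,
    top_p: 0.9,
    presence_penalty: 0,
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);
```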
ITQIX Technology Group