So I googled it, and if you have a Pi 5 with 8 GB or 16 GB of RAM it is technically possible to run Ollama, but the speeds will be excruciatingly slow. My Nvidia 3060 with 12 GB of VRAM typically runs 14b (billion parameter) models at around 11 tokens per second, while this website shows a Pi 5 only running an 8b model at 2 tokens per second. At 2 tokens per second, a typical 600-1,200 token response literally takes 5-10 minutes:
Pi 5 Deepseek
It also shows you can get a reasonable pace out of the 1.5b model, but those are whittled down so much that I don’t believe they’re really useful.
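If you want to check real throughput on your own hardware, Ollama can report it directly. A minimal sketch using the CLI (the model tag deepseek-r1:8b is just an example, swap in whatever you’re testing):

```
# Pull an 8b model and run a prompt with timing stats.
# --verbose prints throughput after the response, including
# "eval rate: N tokens/s" (the generation speed).
ollama pull deepseek-r1:8b
ollama run deepseek-r1:8b --verbose "Explain what a reverse proxy does."
```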

There are lots of lighter-weight services you can host on a Pi though. I highly recommend an app called Cosmos Cloud, an all-in-one solution for building your own self-hosted services:

- It has its own reverse proxy, like Nginx or Traefik, including Let’s Encrypt security certificates, URL management, and incoming traffic security features.
- It has an excellent UI for managing Docker containers and a large catalog of prepared docker compose files to spin up services with the click of a button.
- It has more advanced features you can grow into, like an OpenID SSO manager, your own VPN, and disk management/backups.
It’s still very important to read the documentation thoroughly and to expect occasional troubleshooting, but I found it far, far easier to get working than the Nginx/Docker/Portainer setup I used previously.
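For reference, installation is basically a single container. A hedged sketch from memory (the image name azukaar/cosmos-server and the config path are what I recall from the docs, so verify against the official install instructions before running):

```
# From-memory sketch of the Cosmos Cloud install; check the
# official docs for the current command.
docker run -d \
  --name cosmos-server \
  --restart=unless-stopped \
  --network host \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/lib/cosmos:/config \
  azukaar/cosmos-server:latest
# --network host lets its built-in reverse proxy bind ports 80/443
# directly; mounting docker.sock is what lets it manage your other
# containers.
```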


Using Ollama depends a lot on the hardware you run it on - you should aim for at least 12 GB of VRAM/unified memory to run models of a useful size. I have one copy running in a Docker container on CPU under Linux and another running on the GPU of my Windows desktop, so I can give install advice for either OS if you’d like.
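For the Docker route, the CPU and GPU setups differ by basically one flag. A minimal sketch based on Ollama’s published Docker instructions (the GPU variant assumes you’ve already installed the NVIDIA Container Toolkit):

```
# CPU-only (what I run on my Linux box):
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Nvidia GPU (requires the NVIDIA Container Toolkit):
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Then pull and chat with a model inside the container:
docker exec -it ollama ollama run llama3.1:8b
```

The named volume keeps your downloaded models across container restarts, and 11434 is Ollama’s default API port if you want other services to talk to it.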