Getting DeepSeek R1 Running on Your Pi 5 (16 GB) with Open WebUI, RAG, and Pipelines

🚀 Introduction

Running DeepSeek R1 on a Pi 5 with 16 GB RAM feels like taking that same Pi 400 project from my February guide and super‑charging it. With more memory, faster CPU cores, and better headroom, we can use Open WebUI over Ollama, hook in RAG, and even add pipeline automations—all still local, all still low‑cost, all privacy‑first.

PiAI


💡 Why Pi 5 (16 GB)?

Jeremy Morgan and others have largely confirmed what we know: Raspberry Pi 5 with 8 GB or 16 GB is capable of managing the deepseek‑r1:1.5b model smoothly, hitting around 6 tokens/sec and consuming ~3 GB RAM (kevsrobots.comdev.to).

The extra memory gives breathing room for RAGpipelines, and more.


🛠️ Prerequisites & Setup

  • OS: Raspberry Pi OS (64‑bit, Bookworm)

  • Hardware: Pi 5, 16 GB RAM, 32 GB+ microSD or SSD, wired or stable Wi‑Fi

  • Tools: Docker, Docker Compose, access to terminal

🧰 System prep

bash
CopyEdit
sudo apt update && sudo apt upgrade -y
sudo apt install curl git

Install Docker & Compose:

bash
CopyEdit
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker

Install Ollama (ARM64):

bash
CopyEdit
curl -fsSL https://ollama.com/install.sh | sh
ollama --version

⚙️ Docker Compose: Ollama + Open WebUI

Create the stack folder:

bash
CopyEdit
sudo mkdir -p /opt/stacks/openwebui
cd /opt/stacks/openwebui

Then create docker-compose.yaml:

yaml
CopyEdit
services:
ollama:
image: ghcr.io/ollama/ollama:latest
volumes:
- ollama:/root/.ollama
ports:
- "11434:11434"
open-webui:
image: ghcr.io/open-webui/open-webui:ollama
container_name: open-webui
ports:
- "3000:8080"
volumes:
- openwebui_data:/app/backend/data
restart: unless-stopped

volumes:
ollama:
openwebui_data:

Bring it online:

bash
CopyEdit
docker compose up -d

✅ Ollama runs on port 11434Open WebUI on 3000.


📥 Installing DeepSeek R1 Model

In terminal:

bash
CopyEdit
ollama pull deepseek-r1:1.5b

In Open WebUI (visit http://<pi-ip>:3000):

  1. 🧑‍💻 Create your admin user

  2. ⚙️ Go to Settings → Models

  3. ➕ Pull deepseek-r1:1.5b via UI

Once added, it’s selectable from the top model dropdown.


💬 Basic Usage & Performance

Select deepseek-r1:1.5b, type your prompt:

→ Expect ~6 tokens/sec
→ ~3 GB RAM usage
→ CPU fully engaged

Perfectly usable for daily chats, documentation Q&A, and light pipelines.


📚 Adding RAG with Open WebUI

Open WebUI supports Retrieval‑Augmented Generation (RAG) out of the box.

Steps:

  1. 📄 Collect .md or .txt files (policies, notes, docs).

  2. ➕ In UI: Workspace → Knowledge → + Create Knowledge Base, upload your docs.

  3. 🧠 Then: Workspace → Models → + Add New Model

    • Model name: DeepSeek‑KB

    • Base model: deepseek-r1:1.5b

    • Knowledge: select the knowledge base

The result? 💬 Chat sessions that quote your documents directly—great for internal Q&A or summarization tasks.


🧪 Pipeline Automations

This is where things get real fun. With Pipelines, Open WebUI becomes programmable.

🧱 Start the pipelines container:

bash
CopyEdit
docker run -d -p 9099:9099 \
--add-host=host.docker.internal:host-gateway \
-v pipelines:/app/pipelines \
--name pipelines ghcr.io/open-webui/pipelines:main

Link it via WebUI Settings (URL: http://host.docker.internal:9099)

Now build workflows:

  • 🔗 Chain prompts (e.g. translate → summarize → translate back)

  • 🧹 Clean/filter input/output

  • ⚙️ Trigger external actions (webhooks, APIs, home automation)

Write custom Python logic and integrate it as a processing step.


🧭 Example Use Cases

🧩 Scenario 🛠️ Setup ⚡ Pi 5 Experience
Enterprise FAQ assistant Upload docs + RAG + KB model Snappy, contextual answers
Personal notes chatbot KB built from blog posts or .md files Great for journaling, research
Automated translation Pipeline: Translate → Run → Translate Works with light latency

📝 Tips & Gotchas

  • 🧠 Stick with 1.5B models for usability.

  • 📉 Monitor RAM and CPU; disable swap where possible.

  • 🔒 Be cautious with pipeline code—no sandboxing.

  • 🗂️ Use volume backups to persist state between upgrades.


🎯 Conclusion

Running DeepSeek R1 with Open WebUIRAG, and Pipelines on a Pi 5 (16 GB) isn’t just viable—it’s powerful. You can create focused, contextual AI tools completely offline. You control the data. You own the results.

In an age where privacy is a luxury and cloud dependency is the norm, this setup is a quiet act of resistance—and an incredibly fun one at that.

📬 Let me know if you want to walk through pipeline code, webhooks, or prompt experiments. The Pi is small—but what it teaches us is huge.

 

 

* AI tools were used as a research assistant for this content, but human moderation and writing are also included. The included images are AI-generated.

Getting DeepSeek R1 Running on Your Pi 400: A No-Nonsense Guide

After spending decades in cybersecurity, I’ve learned that sometimes the most interesting solutions come in small packages. Today, I want to talk about running DeepSeek R1 on the Pi 400 – it’s not going to replace ChatGPT, but it’s a fascinating experiment in edge AI computing.

PiAI

The Setup

First, let’s be clear – you’re not going to run the full 671B parameter model that’s making headlines. That beast needs serious hardware. Instead, we’ll focus on the distilled versions that actually work on our humble Pi 400.

Prerequisites:

            sudo apt update && sudo apt upgrade
            sudo apt install curl
            sudo ufw allow 11434/tcp
        

Installation Steps:

            # Install Ollama
            curl -fsSL https://ollama.com/install.sh | sh

            # Verify installation
            ollama --version

            # Start Ollama server
            ollama serve
        

What to Expect

Here’s the unvarnished truth about performance:

Model Options:

  • deepseek-r1:1.5b (Best performer, ~1.1GB storage)
  • deepseek-r1:7b (Slower but more capable, ~4.7GB storage)
  • deepseek-r1:8b (Even slower, ~4.8GB storage)

The 1.5B model is your best bet for actual usability. You’ll get around 1-2 tokens per second, which means you’ll need some patience, but it’s functional enough for experimentation and learning.

Real Talk

Look, I’ve spent my career telling hard truths about security, and I’ll be straight with you about this: running AI models on a Pi 400 isn’t going to revolutionize your workflow. But that’s not the point. This is about understanding edge AI deployment, learning about model quantization, and getting hands-on experience with local language models.

Think of it like the early days of computer networking – sometimes you need to start small to understand the big picture. Just don’t expect this to replace your ChatGPT subscription, and you won’t be disappointed.

Remember: security is about understanding both capabilities and limitations. This project teaches you both.

Sources