---
sidebar_position: 10
title: "Reduce RAM Usage"
---
## Reduce RAM Usage

If you are deploying Open WebUI in a RAM-constrained environment (such as a Raspberry Pi, a small VPS, or shared hosting), several strategies can significantly reduce memory consumption.

On a Raspberry Pi 4 (arm64) running v0.3.10, these optimizations reduced idle memory consumption from >1GB to ~200MB (as observed with `docker container stats`).

---
## Quick Start

Set the following environment variables for immediate RAM savings:

```bash
# Use external embedding instead of local SentenceTransformers
RAG_EMBEDDING_ENGINE=ollama

# Use external Speech-to-Text instead of local Whisper
AUDIO_STT_ENGINE=openai
```
:::tip
These settings can also be configured in the **Admin Panel > Settings** interface: set RAG embedding to Ollama or OpenAI, and Speech-to-Text to OpenAI or WebAPI.
:::
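If you deploy with Docker, these variables can be supplied when the container starts. A minimal sketch, assuming the official image and the common quick-start port mapping, volume, and container name (adjust all of these to your own setup):

```shell
# Sketch: start Open WebUI with the two RAM-saving variables set.
# Port, volume, and container name below are quick-start defaults, not requirements.
docker run -d \
  -p 3000:8080 \
  -e RAG_EMBEDDING_ENGINE=ollama \
  -e AUDIO_STT_ENGINE=openai \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```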
---
## Why Does Open WebUI Use So Much RAM?

Much of the memory consumption comes from locally loaded ML models. Even when using an external LLM (OpenAI or a separate Ollama instance), Open WebUI may load additional models for:

| Feature | Default | RAM Impact | Solution |
|---------|---------|------------|----------|
| **RAG Embedding** | Local SentenceTransformers | ~500-800MB | Use Ollama or OpenAI embeddings |
| **Speech-to-Text** | Local Whisper | ~300-500MB | Use OpenAI or WebAPI |
| **Reranking** | Disabled | ~200-400MB when enabled | Keep disabled or use external |
| **Image Generation** | Disabled | Variable | Keep disabled if not needed |
---
## ⚙️ Environment Variables for RAM Reduction

### Offload Embedding to External Service

The biggest RAM saver is using an external embedding engine:

```bash
# Option 1: Use Ollama for embeddings (if you have Ollama running separately)
RAG_EMBEDDING_ENGINE=ollama

# Option 2: Use OpenAI for embeddings
RAG_EMBEDDING_ENGINE=openai
OPENAI_API_KEY=your-api-key
```
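Before switching the engine over, it can help to confirm that the external Ollama instance actually serves embeddings. A hedged example, assuming Ollama is running on its default port and an embedding model has already been pulled (the model name here is illustrative):

```shell
# Assumption: Ollama is reachable at localhost:11434 and an embedding
# model (here, hypothetically, nomic-embed-text) has been pulled.
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "hello world"}'
# A JSON response containing an "embedding" array indicates success.
```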
### Offload Speech-to-Text

Local Whisper models consume significant RAM:

```bash
# Use OpenAI's Whisper API
AUDIO_STT_ENGINE=openai

# Or use browser-based WebAPI (no external service needed)
AUDIO_STT_ENGINE=webapi
```
### Disable Unused Features

Disable features you don't need to prevent model loading:

```bash
# Disable image generation (prevents loading image models)
ENABLE_IMAGE_GENERATION=False

# Disable code execution (reduces overhead)
ENABLE_CODE_EXECUTION=False

# Disable code interpreter
ENABLE_CODE_INTERPRETER=False
```
### Reduce Background Task Overhead

These settings reduce memory usage from background operations:

```bash
# Disable autocomplete (high resource usage)
ENABLE_AUTOCOMPLETE_GENERATION=False

# Disable automatic title generation
ENABLE_TITLE_GENERATION=False

# Disable tag generation
ENABLE_TAGS_GENERATION=False

# Disable follow-up suggestions
ENABLE_FOLLOW_UP_GENERATION=False
```
### Database and Cache Optimization

```bash
# Disable real-time chat saving (reduces database overhead)
ENABLE_REALTIME_CHAT_SAVE=False

# Reduce thread pool size for low-resource systems
THREAD_POOL_SIZE=10
```
### Vector Database Multitenancy

If using Milvus or Qdrant, enable multitenancy mode to reduce RAM:

```bash
# For Milvus
ENABLE_MILVUS_MULTITENANCY_MODE=True

# For Qdrant
ENABLE_QDRANT_MULTITENANCY_MODE=True
```
---
## 🚀 Recommended Minimal Configuration

For extremely RAM-constrained environments, use this combined configuration:

```bash
# Offload ML models to external services
RAG_EMBEDDING_ENGINE=ollama
AUDIO_STT_ENGINE=openai

# Disable all non-essential features
ENABLE_IMAGE_GENERATION=False
ENABLE_CODE_EXECUTION=False
ENABLE_CODE_INTERPRETER=False
ENABLE_AUTOCOMPLETE_GENERATION=False
ENABLE_TITLE_GENERATION=False
ENABLE_TAGS_GENERATION=False
ENABLE_FOLLOW_UP_GENERATION=False

# Reduce worker overhead
THREAD_POOL_SIZE=10
```
---
## 💡 Additional Tips

- **Monitor Memory Usage**: Use `docker container stats` or `htop` to monitor RAM consumption.
- **Restart After Changes**: Environment variable changes require a container restart to take effect.
- **Fresh Deployments**: Some environment variables only take effect on fresh deployments without an existing `config.json`.
- **Consider Alternatives**: For very constrained systems, consider running Open WebUI on a more capable machine and accessing it remotely.
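To check whether the changes helped, the `docker container stats` measurement mentioned at the top of this page can be taken as a one-shot snapshot (assuming your container is named `open-webui`; substitute your own container name):

```shell
# One-shot memory snapshot; --no-stream prints once instead of refreshing live.
docker container stats open-webui --no-stream --format "{{.Name}}: {{.MemUsage}}"
```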
---
## Related Guides

- [Improve Local LLM Performance](/tutorials/tips/improve-performance-local) - For optimizing performance without reducing features
- [Environment Variable Configuration](/getting-started/env-configuration) - Complete list of all configuration options