---
sidebar_position: 3
title: "πŸ¦™ Starting with Llama.cpp"
---

## Overview

Open WebUI makes it simple and flexible to connect and manage a local Llama.cpp server to run efficient, quantized language models. Whether you've compiled Llama.cpp yourself or you're using precompiled binaries, this guide will walk you through how to:

- Set up your Llama.cpp server
- Load large models locally
- Integrate with Open WebUI for a seamless interface

Let's get you started!

---

## Step 1: Install Llama.cpp

To run models with Llama.cpp, you first need the Llama.cpp server installed locally. You can either:

- πŸ“¦ [Download prebuilt binaries](https://github.com/ggerganov/llama.cpp/releases)
- πŸ› οΈ Or build it from source by following the [official build instructions](https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md)

After installing, make sure `llama-server` is available on your system path, or take note of its location.

---

## Step 2: Download a Supported Model

You can load and run various GGUF-format quantized LLMs using Llama.cpp. One impressive example is the DeepSeek-R1 1.58-bit model optimized by UnslothAI. To download this version:

1. Visit the [Unsloth DeepSeek-R1 repository on Hugging Face](https://huggingface.co/unsloth/DeepSeek-R1-GGUF)
2. Download the 1.58-bit quantized version (around 131 GB)

Alternatively, use Python to download it programmatically:

```python
# pip install huggingface_hub hf_transfer
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="unsloth/DeepSeek-R1-GGUF",
    local_dir="DeepSeek-R1-GGUF",
    allow_patterns=["*UD-IQ1_S*"],  # download only the 1.58-bit variant
)
```

This will download the model files into a directory like:

```
DeepSeek-R1-GGUF/
└── DeepSeek-R1-UD-IQ1_S/
    β”œβ”€β”€ DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf
    β”œβ”€β”€ DeepSeek-R1-UD-IQ1_S-00002-of-00003.gguf
    └── DeepSeek-R1-UD-IQ1_S-00003-of-00003.gguf
```

πŸ“ Keep track of the full path to the first GGUF file; you'll need it in Step 3.

---

## Step 3: Serve the Model with Llama.cpp

Start the model server using the `llama-server` binary. Navigate to your llama.cpp folder (e.g., `build/bin`) and run:

```bash
./llama-server \
  --model /your/full/path/to/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
  --port 10000 \
  --ctx-size 1024 \
  --n-gpu-layers 40
```

πŸ› οΈ Tweak the parameters to suit your machine:

- `--model`: Path to your `.gguf` model file. For multi-part models like this one, point to the first file; the remaining parts are loaded automatically.
- `--port`: `10000` (or choose another open port)
- `--ctx-size`: Token context length (increase it if your RAM allows)
- `--n-gpu-layers`: Number of layers offloaded to the GPU for faster inference

Once the server is running, it exposes a local OpenAI-compatible API at:

```
http://127.0.0.1:10000
```

---

## Step 4: Connect Llama.cpp to Open WebUI

To control and query your locally running model directly from Open WebUI:

1. Open Open WebUI in your browser
2. Go to βš™οΈ Admin Settings β†’ Connections β†’ OpenAI Connections
3. Click βž• Add Connection and enter:
   - URL: `http://127.0.0.1:10000/v1` (or `http://host.docker.internal:10000/v1` if Open WebUI is running inside Docker)
   - API Key: `none` (leave blank)

πŸ’‘ Once saved, Open WebUI will begin using your local Llama.cpp server as a backend!

![Llama.cpp Connection in Open WebUI](/images/tutorials/deepseek/connection.png)

---

## Quick Tip: Try Out the Model via Chat Interface

Once connected, select the model from the Open WebUI chat menu and start interacting! If you'd rather verify the server from a script first, see the optional checks below.

![Model Chat Preview](/images/tutorials/deepseek/response.png)

---
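## Optional: Verify the Server from Python

Before (or instead of) wiring up Open WebUI, you can check that `llama-server` is up and has finished loading the model. The sketch below assumes the server from Step 3 is running on port 10000; it uses the server's `/health` endpoint and the OpenAI-compatible `/v1/models` route, though the exact response fields can vary between llama.cpp versions.

```python
# pip install requests
import requests

BASE_URL = "http://127.0.0.1:10000"  # the port chosen in Step 3

# /health reports whether the model has finished loading
health = requests.get(f"{BASE_URL}/health", timeout=5)
print(health.status_code, health.text)

# /v1/models lists the model the server was started with
models = requests.get(f"{BASE_URL}/v1/models", timeout=5)
print(models.json())
```

If `/health` returns an OK status, the server is ready to accept chat requests.

---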
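## Optional: Query the Model Programmatically

Because the server speaks the OpenAI API, any OpenAI-compatible client can talk to it, not just Open WebUI. Here is a minimal sketch using the official `openai` Python package; the `model` string is illustrative (llama-server serves whichever model it was started with), and the API key is a placeholder since no key is required by default.

```python
# pip install openai
from openai import OpenAI

# Point the client at the local llama-server endpoint from Step 4.
# llama.cpp does not require an API key by default, so any string works.
client = OpenAI(base_url="http://127.0.0.1:10000/v1", api_key="none")

response = client.chat.completions.create(
    model="DeepSeek-R1-UD-IQ1_S",  # illustrative; a single-model server accepts any name
    messages=[{"role": "user", "content": "Explain 1.58-bit quantization in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

---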
## You're Ready to Go!

Once configured, Open WebUI makes it easy to:

- Manage and switch between local models served by Llama.cpp
- Use the OpenAI-compatible API with no key needed
- Experiment with massive models like DeepSeek-R1, right from your machine!

---

πŸš€ Have fun experimenting and building!