title: DMR REST API
description: Reference documentation for the Docker Model Runner REST API endpoints, including OpenAI, Anthropic, and Ollama compatibility.
weight: 30
keywords: Docker, ai, model runner, rest api, openai, anthropic, ollama, endpoints, documentation, cline, continue, cursor

Once Model Runner is enabled, new API endpoints are available. You can use these endpoints to interact with a model programmatically. Docker Model Runner provides compatibility with OpenAI, Anthropic, and Ollama API formats.

Determine the base URL

The base URL to interact with the endpoints depends on how you run Docker and which API format you're using.

{{< tabs >}} {{< tab name="Docker Desktop">}}

| Access from | Base URL |
| --- | --- |
| Containers | http://model-runner.docker.internal |
| Host processes (TCP) | http://localhost:12434 |

Note

TCP host access must be enabled. See Enable Docker Model Runner.

{{< /tab >}} {{< tab name="Docker Engine">}}

| Access from | Base URL |
| --- | --- |
| Containers | http://172.17.0.1:12434 |
| Host processes | http://localhost:12434 |

Note

The 172.17.0.1 interface may not be available by default to containers within a Compose project. In this case, add an extra_hosts directive to your Compose service YAML:

extra_hosts:
  - "model-runner.docker.internal:host-gateway"

Then you can access the Docker Model Runner APIs at http://model-runner.docker.internal:12434/

{{< /tab >}} {{< /tabs >}}
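The lookup described in the tabs above can be sketched in Python. This is a minimal illustration, not part of Docker Model Runner: the function names and the `runtime`/`caller` labels are hypothetical, and port 12434 is the one shown on this page, so adjust it if you enabled TCP access on a different port.

```python
# Base URLs copied from the Docker Desktop and Docker Engine tables above.
# The keys ("desktop"/"engine", "container"/"host") are illustrative labels.
BASE_URLS = {
    ("desktop", "container"): "http://model-runner.docker.internal",
    ("desktop", "host"): "http://localhost:12434",
    ("engine", "container"): "http://172.17.0.1:12434",
    ("engine", "host"): "http://localhost:12434",
}


def dmr_base_url(runtime: str, caller: str) -> str:
    """Return the Model Runner base URL for a runtime/caller pair."""
    try:
        return BASE_URLS[(runtime, caller)]
    except KeyError:
        raise ValueError(f"unknown combination: {runtime!r}, {caller!r}") from None


def openai_base_url(runtime: str, caller: str) -> str:
    """Append the OpenAI-compatible prefix used by the endpoints on this page."""
    return dmr_base_url(runtime, caller) + "/engines/v1"
```

For example, `openai_base_url("desktop", "host")` yields the value passed as `base_url` in the OpenAI SDK examples later on this page.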

Base URLs for third-party tools

When configuring third-party tools, use the base URL that matches the API format the tool expects:

| Tool type | Base URL format |
| --- | --- |
| OpenAI SDK / clients | http://localhost:12434/engines/v1 |
| Anthropic SDK / clients | http://localhost:12434 |
| Ollama-compatible clients | http://localhost:12434 |

See IDE and tool integrations for specific configuration examples.

Supported APIs

Docker Model Runner supports multiple API formats:

| API | Description | Use case |
| --- | --- | --- |
| OpenAI API | OpenAI-compatible chat completions, embeddings | Most AI frameworks and tools |
| Anthropic API | Anthropic-compatible messages endpoint | Tools built for Claude |
| Ollama API | Ollama-compatible endpoints | Tools built for Ollama |
| Image Generation API | Diffusers-based image generation | Generating images from text prompts |
| DMR API | Native Docker Model Runner endpoints | Model management |

OpenAI-compatible API

DMR implements the OpenAI API specification for maximum compatibility with existing tools and frameworks.

Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /engines/v1/models | GET | List models |
| /engines/v1/models/{namespace}/{name} | GET | Retrieve model |
| /engines/v1/chat/completions | POST | Create chat completion |
| /engines/v1/completions | POST | Create completion |
| /engines/v1/embeddings | POST | Create embeddings |

Note

You can optionally include the engine name in the path: /engines/llama.cpp/v1/chat/completions. This is useful when running multiple inference engines.
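The optional engine segment can be handled with a small path builder. This is a sketch only; `openai_path` is a hypothetical helper name, and the two path shapes are the ones given in the note above.

```python
from typing import Optional


def openai_path(endpoint: str, engine: Optional[str] = None) -> str:
    """Build an OpenAI-compatible path, optionally pinned to one engine."""
    endpoint = endpoint.strip("/")
    if engine:
        # Engine-specific form, e.g. /engines/llama.cpp/v1/chat/completions
        return f"/engines/{engine}/v1/{endpoint}"
    # Default form without an engine name
    return f"/engines/v1/{endpoint}"
```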

Model name format

When specifying a model in API requests, use the full model identifier including the namespace:

{
  "model": "ai/smollm2",
  "messages": [...]
}

Common model name formats:

  • Docker Hub models: ai/smollm2, ai/llama3.2, ai/qwen2.5-coder
  • Tagged versions: ai/smollm2:360M-Q4_K_M
  • Custom models: myorg/mymodel
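To make the identifier structure concrete, the following sketch splits a model name into its parts. It assumes only the three forms listed above (`namespace/name` with an optional `:tag`); `ModelRef` and `parse_model_ref` are hypothetical names, and other forms, such as identifiers that include a registry host, are not handled.

```python
from typing import NamedTuple, Optional


class ModelRef(NamedTuple):
    namespace: str
    name: str
    tag: Optional[str]


def parse_model_ref(ref: str) -> ModelRef:
    """Split 'namespace/name[:tag]' into its components."""
    namespace, sep, rest = ref.partition("/")
    if not sep or not namespace or not rest:
        raise ValueError(f"expected 'namespace/name[:tag]', got {ref!r}")
    name, _, tag = rest.partition(":")
    if not name:
        raise ValueError(f"missing model name in {ref!r}")
    # An absent or empty tag is reported as None.
    return ModelRef(namespace, name, tag or None)
```

The `namespace` and `name` fields map directly onto the `{namespace}/{name}` placeholders in the endpoint paths.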

Supported parameters

The following OpenAI API parameters are supported:

| Parameter | Type | Description |
| --- | --- | --- |
| model | string | Required. The model identifier. |
| messages | array | Required for chat completions. The conversation history. |
| prompt | string | Required for completions. The prompt text. |
| max_tokens | integer | Maximum tokens to generate. |
| temperature | float | Sampling temperature (0.0-2.0). |
| top_p | float | Nucleus sampling parameter (0.0-1.0). |
| stream | boolean | Enable streaming responses. |
| stop | string/array | Stop sequences. |
| presence_penalty | float | Presence penalty (-2.0 to 2.0). |
| frequency_penalty | float | Frequency penalty (-2.0 to 2.0). |
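As an illustration of how these parameters fit together, the sketch below assembles a chat completion request body and checks the documented ranges on the client side. The helper name `build_chat_request` is hypothetical, and the range check is a convenience of this example rather than documented server behavior.

```python
from typing import Any, Dict, List

# Ranges copied from the parameter table above.
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "presence_penalty": (-2.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
}


def build_chat_request(
    model: str, messages: List[Dict[str, Any]], **options: Any
) -> Dict[str, Any]:
    """Return a JSON-serializable body for /engines/v1/chat/completions."""
    for key, (low, high) in RANGES.items():
        if key in options and not low <= options[key] <= high:
            raise ValueError(f"{key} must be between {low} and {high}")
    # Options without a documented range (max_tokens, stream, stop) pass through.
    return {"model": model, "messages": messages, **options}
```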

Limitations and differences from OpenAI

Be aware of these differences when using DMR's OpenAI-compatible API:

| Feature | DMR behavior |
| --- | --- |
| API key | Not required. DMR ignores the Authorization header. |
| Function calling | Supported with llama.cpp for compatible models. |
| Vision | Supported for multi-modal models (e.g., LLaVA). |
| JSON mode | Supported via response_format: {"type": "json_object"}. |
| Logprobs | Supported. |
| Token counting | Uses the model's native token encoder, which may differ from OpenAI's. |
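The JSON mode row can be illustrated with a request body and a matching reply parser. This is a sketch under assumptions: both helper names are hypothetical, and the reply shape (`choices[0].message.content`) is the standard OpenAI chat completion object, assumed here to be what DMR returns.

```python
import json
from typing import Any, Dict


def json_mode_request(model: str, prompt: str) -> Dict[str, Any]:
    """Build a chat completion body that asks for a JSON object reply."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {"type": "json_object"},
    }


def parse_json_reply(response: Dict[str, Any]) -> Any:
    """Decode the assistant message content, which is a JSON string in JSON mode."""
    return json.loads(response["choices"][0]["message"]["content"])
```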

Anthropic-compatible API

DMR provides Anthropic Messages API compatibility for tools and frameworks built for Claude.

Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /anthropic/v1/messages | POST | Create a message |
| /anthropic/v1/messages/count_tokens | POST | Count tokens |

Supported parameters

The following Anthropic API parameters are supported:

| Parameter | Type | Description |
| --- | --- | --- |
| model | string | Required. The model identifier. |
| messages | array | Required. The conversation messages. |
| max_tokens | integer | Maximum tokens to generate. |
| temperature | float | Sampling temperature (0.0-1.0). |
| top_p | float | Nucleus sampling parameter. |
| top_k | integer | Top-k sampling parameter. |
| stream | boolean | Enable streaming responses. |
| stop_sequences | array | Custom stop sequences. |
| system | string | System prompt. |

Example: Chat with Anthropic API

curl http://localhost:12434/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Example: Streaming response

curl http://localhost:12434/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "max_tokens": 1024,
    "stream": true,
    "messages": [
      {"role": "user", "content": "Count from 1 to 10"}
    ]
  }'
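A streamed reply arrives as server-sent events. The sketch below extracts the generated text from such a stream, assuming DMR emits the same event shapes as the Anthropic Messages API (`content_block_delta` events carrying a `text_delta`). The function name is hypothetical, and the input is any iterable of already-decoded text lines.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from Anthropic-style server-sent event lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        event = json.loads(line[len("data:"):])
        if event.get("type") != "content_block_delta":
            continue  # ignore message_start, message_stop, ping, and similar
        delta = event.get("delta", {})
        if delta.get("type") == "text_delta":
            yield delta.get("text", "")
```

Joining the yielded fragments with `"".join(...)` reconstructs the full reply text.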

Ollama-compatible API

DMR also provides Ollama-compatible endpoints for tools and frameworks built for Ollama.

Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/tags | GET | List available models |
| /api/show | POST | Show model information |
| /api/chat | POST | Generate chat completion |
| /api/generate | POST | Generate completion |
| /api/embeddings | POST | Generate embeddings |

Example: Chat with Ollama API

curl http://localhost:12434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Example: List models

curl http://localhost:12434/api/tags
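The list response can be reduced to model names with a few lines of Python. This sketch assumes the reply follows Ollama's documented `/api/tags` shape, a top-level `models` array whose entries carry a `name` field; the helper name is hypothetical, and DMR's exact fields may differ.

```python
from typing import Any, Dict, List


def model_names(tags_response: Dict[str, Any]) -> List[str]:
    """Return the model names from an Ollama-style /api/tags reply."""
    # Entries without a "name" field are skipped rather than raising.
    return [m["name"] for m in tags_response.get("models", []) if "name" in m]
```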

Image generation API (Diffusers)

DMR supports image generation through the Diffusers backend, enabling you to generate images from text prompts using models like Stable Diffusion.

Note

The Diffusers backend requires an NVIDIA GPU with CUDA support and is only available on Linux (x86_64 and ARM64). See Inference engines for setup instructions.

Endpoint

| Endpoint | Method | Description |
| --- | --- | --- |
| /engines/diffusers/v1/images/generations | POST | Generate an image from a text prompt |

Supported parameters

| Parameter | Type | Description |
| --- | --- | --- |
| model | string | Required. The model identifier (e.g., stable-diffusion:Q4). |
| prompt | string | Required. The text description of the image to generate. |
| size | string | Image dimensions in WIDTHxHEIGHT format (e.g., 512x512). |

Response format

The API returns a JSON response with the generated image encoded in base64:

{
  "data": [
    {
      "b64_json": "<base64-encoded-image-data>"
    }
  ]
}

Example: Generate an image

curl -s -X POST http://localhost:12434/engines/diffusers/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "stable-diffusion:Q4",
    "prompt": "A picture of a nice cat",
    "size": "512x512"
  }' | jq -r '.data[0].b64_json' | base64 -d > image.png

This command:

  1. Sends a POST request to the Diffusers image generation endpoint
  2. Specifies the model, prompt, and output image size
  3. Extracts the base64-encoded image from the response using jq
  4. Decodes the base64 data and saves it as image.png
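Steps 3 and 4 can also be done in Python instead of `jq` and `base64`. A minimal sketch, assuming the response has already been parsed into a dictionary with the shape shown under Response format; both function names are hypothetical.

```python
import base64
from pathlib import Path
from typing import Any, Dict


def decode_first_image(response: Dict[str, Any]) -> bytes:
    """Return the raw bytes of the first image in the response."""
    return base64.b64decode(response["data"][0]["b64_json"])


def save_first_image(response: Dict[str, Any], path: str) -> int:
    """Write the first image to `path` and return the number of bytes written."""
    return Path(path).write_bytes(decode_first_image(response))
```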

DMR native endpoints

These endpoints are specific to Docker Model Runner for model management:

| Endpoint | Method | Description |
| --- | --- | --- |
| /models/create | POST | Pull/create a model |
| /models | GET | List local models |
| /models/{namespace}/{name} | GET | Get model details |
| /models/{namespace}/{name} | DELETE | Delete a local model |
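The read and delete endpoints above can be addressed from Python with the standard library. This is a sketch only: the helper names are hypothetical, the base URL assumes host TCP access on port 12434, and the functions build requests without sending them. Pass the result to `urllib.request.urlopen` to execute it.

```python
import urllib.request

BASE_URL = "http://localhost:12434"  # assumes host TCP access on the port used on this page


def list_models_request() -> urllib.request.Request:
    """Build a GET request for /models."""
    return urllib.request.Request(f"{BASE_URL}/models", method="GET")


def model_request(method: str, namespace: str, name: str) -> urllib.request.Request:
    """Build a GET (details) or DELETE (remove) request for a single model."""
    if method not in ("GET", "DELETE"):
        raise ValueError("method must be GET or DELETE")
    return urllib.request.Request(
        f"{BASE_URL}/models/{namespace}/{name}", method=method
    )
```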

REST API examples

Request from within a container

To call the chat/completions OpenAI endpoint from within another container using curl:

#!/bin/sh

curl http://model-runner.docker.internal/engines/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'

Request from the host using TCP

To call the chat/completions OpenAI endpoint from the host via TCP:

  1. Enable the host-side TCP support from the Docker Desktop GUI, or via the Docker Desktop CLI. For example: docker desktop enable model-runner --tcp <port>.

    If you are running on Windows, also enable GPU-backed inference. See Enable Docker Model Runner.

  2. Interact with it as documented in the previous section using localhost and the correct port.

#!/bin/sh

curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
      "model": "ai/smollm2",
      "messages": [
          {
              "role": "system",
              "content": "You are a helpful assistant."
          },
          {
              "role": "user",
              "content": "Please write 500 words about the fall of Rome."
          }
      ]
  }'

Request from the host using a Unix socket

To call the chat/completions OpenAI endpoint through the Docker socket from the host using curl:

#!/bin/sh

curl --unix-socket "$HOME/.docker/run/docker.sock" \
    localhost/exp/vDD4.40/engines/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'
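The same Unix socket request can be made from Python without curl. The sketch below uses only the standard library; the class and function names are hypothetical, the request path is copied from the curl command above, and `socket.AF_UNIX` limits it to platforms with Unix domain socket support.

```python
import http.client
import json
import socket
from typing import Any, Dict

# Path copied from the curl example above.
CHAT_PATH = "/exp/vDD4.40/engines/v1/chat/completions"


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 60.0) -> None:
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def post_chat(socket_path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """POST a chat completion body through the socket and return the parsed reply."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request(
            "POST",
            CHAT_PATH,
            body=json.dumps(payload),
            headers={"Content-Type": "application/json"},
        )
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()
```

Call `post_chat` with the expanded socket path, for example `os.path.expanduser("~/.docker/run/docker.sock")`, and the same JSON body used in the curl command.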

Streaming responses

To receive streaming responses, set stream to true in the request body:

curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
      "model": "ai/smollm2",
      "stream": true,
      "messages": [
          {"role": "user", "content": "Count from 1 to 10"}
      ]
  }'
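The streamed output is a sequence of server-sent events. The following sketch collects the text from them, assuming DMR follows the OpenAI streaming format: `data:` lines that each hold a JSON chunk with `choices[].delta.content`, terminated by `data: [DONE]`. The function name is hypothetical, and the input is any iterable of decoded text lines.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from OpenAI-style streaming lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```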

Using with OpenAI SDKs

Python

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:12434/engines/v1",
    api_key="not-needed"  # DMR doesn't require an API key
)

response = client.chat.completions.create(
    model="ai/smollm2",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)

Node.js

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:12434/engines/v1',
  apiKey: 'not-needed',
});

const response = await client.chat.completions.create({
  model: 'ai/smollm2',
  messages: [{ role: 'user', content: 'Hello!' }],
});

console.log(response.choices[0].message.content);

What's next