
---
title: Configuration file reference
linkTitle: Configuration file
description: Complete reference for the Docker Agent YAML configuration file format
keywords: ai, agent, cagent, configuration, yaml
weight: 10
---

This reference documents the YAML configuration file format for agents using Docker Agent. It covers file structure, agent parameters, model configuration, toolset setup, and RAG sources.

For detailed documentation of each toolset's capabilities and specific options, see the Toolsets reference.

File structure

A configuration file has four top-level sections:

agents: # Required - agent definitions
  root:
    model: anthropic/claude-sonnet-4-5
    description: What this agent does
    instruction: How it should behave

models: # Optional - model configurations
  custom_model:
    provider: openai
    model: gpt-5

rag: # Optional - RAG sources
  docs:
    docs: [./documents]
    strategies: [...]

metadata: # Optional - author, license, readme
  author: Your Name

Agents

| Property | Type | Description | Required |
|----------|------|-------------|----------|
| model | string | Model reference or name | Yes |
| description | string | Brief description of the agent's purpose | No |
| instruction | string | Detailed behavior instructions | Yes |
| sub_agents | array | Agent names for task delegation | No |
| handoffs | array | Agent names for conversation handoff | No |
| toolsets | array | Available tools | No |
| welcome_message | string | Message displayed on start | No |
| add_date | boolean | Include current date in context | No |
| add_environment_info | boolean | Include working directory, OS, and Git info | No |
| add_prompt_files | array | Prompt file paths to include | No |
| max_iterations | integer | Maximum tool call loops (unlimited if not set) | No |
| num_history_items | integer | Conversation history limit | No |
| code_mode_tools | boolean | Enable Code Mode for tools | No |
| commands | object | Named prompts accessible via /command_name | No |
| structured_output | object | JSON schema for structured responses | No |
| rag | array | RAG source names | No |
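To illustrate how these optional properties fit together, here is a sketch of a single agent definition. The agent name, file path, and values are illustrative, not defaults:

```yaml
agents:
  root:
    model: anthropic/claude-sonnet-4-5
    description: Drafts release notes from recent changes
    instruction: Summarize changes clearly and link related issues.
    welcome_message: Hi! Ask me to draft release notes.
    add_date: true                            # include today's date in context
    add_prompt_files: [./prompts/style.md]    # hypothetical prompt file
    max_iterations: 20                        # cap tool call loops
    num_history_items: 50                     # limit conversation history
```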

Task delegation versus conversation handoff

Agents support two different delegation mechanisms. Choose based on whether you need task results or conversation control.

Sub_agents: Hierarchical task delegation

Use sub_agents for hierarchical task delegation. The parent agent assigns a specific task to a child agent using the transfer_task tool. The child executes in its own context and returns results. The parent maintains control and can delegate to multiple agents in sequence.

This works well for structured workflows where you need to combine results from specialists, or when tasks have clear boundaries. Each delegated task runs independently and reports back to the parent.

Example:

agents:
  root:
    sub_agents: [researcher, analyst]
    instruction: |
      Delegate research to researcher.
      Delegate analysis to analyst.
      Combine results and present findings.

Root calls: transfer_task(agent="researcher", task="Find pricing data"). The researcher completes the task and returns results to root.

Handoffs: Conversation transfer

Use handoffs to transfer conversation control to a different agent. When an agent uses the handoff tool, the new agent takes over completely. The original agent steps back until someone hands back to it.

This works well when different agents should own different parts of an ongoing conversation, or when specialists need to collaborate as peers without a coordinator managing every step.

Example:

agents:
  generalist:
    handoffs: [database_expert, security_expert]
    instruction: |
      Help with general development questions.
      If the conversation moves to database optimization,
      hand off to database_expert.
      If security concerns arise, hand off to security_expert.

  database_expert:
    handoffs: [generalist, security_expert]
    instruction: Handle database design and optimization.

  security_expert:
    handoffs: [generalist, database_expert]
    instruction: Review code for security vulnerabilities.

When the user asks about query performance, generalist executes: handoff(agent="database_expert"). The database expert now owns the conversation and can continue working with the user directly, or hand off to security_expert if the discussion shifts to SQL injection concerns.

Commands

Commands are named prompts that users invoke with /command_name. They support JavaScript template literal syntax, including ${env.VARIABLE} for environment variables:

commands:
  greet: "Say hello to ${env.USER}"
  analyze: "Analyze ${env.PROJECT_NAME || 'demo'}"

Run with: docker agent run config.yaml /greet

Structured output

Constrain responses to a JSON schema (OpenAI and Gemini only):

structured_output:
  name: code_analysis
  strict: true
  schema:
    type: object
    properties:
      issues:
        type: array
        items: { ... }
    required: [issues]

Models

| Property | Type | Description | Required |
|----------|------|-------------|----------|
| provider | string | openai, anthropic, google, or dmr | Yes |
| model | string | Model name | Yes |
| temperature | float | Randomness (0.0-2.0) | No |
| max_tokens | integer | Maximum response length | No |
| top_p | float | Nucleus sampling (0.0-1.0) | No |
| frequency_penalty | float | Repetition penalty (-2.0 to 2.0, OpenAI only) | No |
| presence_penalty | float | Topic penalty (-2.0 to 2.0, OpenAI only) | No |
| base_url | string | Custom API endpoint | No |
| parallel_tool_calls | boolean | Enable parallel tool execution (default: true) | No |
| token_key | string | Authentication token key | No |
| track_usage | boolean | Track token usage | No |
| thinking_budget | mixed | Reasoning effort (provider-specific) | No |
| provider_opts | object | Provider-specific options | No |
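As an example, a named model entry that tunes sampling behavior might look like the following. The entry name and parameter values are illustrative:

```yaml
models:
  precise_gpt:
    provider: openai
    model: gpt-5
    temperature: 0.2            # low randomness for deterministic output
    top_p: 0.9                  # nucleus sampling cutoff
    max_tokens: 4096            # cap response length
    parallel_tool_calls: false  # force sequential tool execution
```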

Alloy models

Use multiple models in rotation by separating names with commas:

model: anthropic/claude-sonnet-4-5,openai/gpt-5

Thinking budget

Controls reasoning depth. Configuration varies by provider:

  • OpenAI: String values - minimal, low, medium, high
  • Anthropic: Integer token budget (1024-32768, must be less than max_tokens)
    • Set provider_opts.interleaved_thinking: true for tool use during reasoning
  • Gemini: Integer token budget (0 to disable, -1 for dynamic, max 24576)
    • Gemini 2.5 Pro: 128-32768, cannot disable (minimum 128)

# OpenAI
thinking_budget: low

# Anthropic
thinking_budget: 8192
provider_opts:
  interleaved_thinking: true

# Gemini
thinking_budget: 8192    # Fixed
thinking_budget: -1      # Dynamic
thinking_budget: 0       # Disabled

Docker Model Runner (DMR)

Run local models. If base_url is omitted, Docker Agent auto-discovers the endpoint via the Docker Model plugin.

provider: dmr
model: ai/qwen3
max_tokens: 8192
base_url: http://localhost:12434/engines/llama.cpp/v1 # Optional

Pass llama.cpp options via provider_opts.runtime_flags (array, string, or multiline):

provider_opts:
  runtime_flags: ["--ngl=33", "--threads=8"]
  # or: runtime_flags: "--ngl=33 --threads=8"

Model config fields auto-map to runtime flags:

  • temperature → --temp
  • top_p → --top-p
  • max_tokens → --context-size

Explicit runtime_flags override auto-mapped flags.
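For instance, in the following sketch the temperature field auto-maps to --temp, while the explicit flag in runtime_flags takes precedence over the context size that max_tokens would otherwise set (values are illustrative):

```yaml
provider: dmr
model: ai/qwen3
temperature: 0.2     # auto-maps to --temp 0.2
max_tokens: 8192     # would auto-map to --context-size 8192...
provider_opts:
  runtime_flags: ["--context-size=16384"]  # ...but this explicit flag wins
```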

Speculative decoding for faster inference:

provider_opts:
  speculative_draft_model: ai/qwen3:0.6B-F16
  speculative_num_tokens: 16
  speculative_acceptance_rate: 0.8

Tools

Configure tools in the toolsets array. Three types: built-in, MCP (local/remote), and Docker Gateway.

> [!NOTE]
> This section covers toolset configuration syntax. For detailed documentation of each toolset's capabilities, available tools, and specific configuration options, see the Toolsets reference.

All toolsets support common properties like tools (whitelist), defer (deferred loading), toon (output compression), env (environment variables), and instruction (usage guidance). See the Toolsets reference for details on these properties and what each toolset does.
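As a sketch of these common properties applied to a single toolset (the tool names and instruction text are illustrative, and the exact value shapes are assumptions based on the descriptions above):

```yaml
toolsets:
  - type: filesystem
    tools: [read_file, write_file]  # whitelist specific tools
    defer: true                     # defer loading until first use
    toon: true                      # compress tool output
    instruction: Only modify files under ./src.
```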

Built-in tools

toolsets:
  - type: filesystem
  - type: shell
  - type: think
  - type: todo
    shared: true
  - type: memory
    path: ./memory.db

MCP tools

Local process:

- type: mcp
  command: npx
  args:
    ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/files"]
  tools: ["read_file", "write_file"] # Optional: limit to specific tools
  env:
    NODE_OPTIONS: "--max-old-space-size=8192"

Remote server:

- type: mcp
  remote:
    url: https://mcp-server.example.com
    transport_type: sse
    headers:
      Authorization: Bearer token

Docker MCP Gateway

Containerized tools from Docker MCP Catalog:

- type: mcp
  ref: docker:duckduckgo

RAG

Retrieval-augmented generation for document knowledge bases. Define sources at the top level and reference them in agents.

rag:
  docs:
    docs: [./documents, ./README.md]
    strategies:
      - type: chunked-embeddings
        embedding_model: openai/text-embedding-3-small
        vector_dimensions: 1536
        database: ./embeddings.db

agents:
  root:
    rag: [docs]

Retrieval strategies

All strategies support chunking configuration. Chunk size and overlap are measured in characters (Unicode code points), not tokens.

Chunked-embeddings

Direct semantic search using vector embeddings. Best for understanding intent, synonyms, and paraphrasing.

| Field | Type | Default |
|-------|------|---------|
| embedding_model | string | - |
| database | string | - |
| vector_dimensions | integer | - |
| similarity_metric | string | cosine |
| threshold | float | 0.5 |
| limit | integer | 5 |
| chunking.size | integer | 1000 |
| chunking.overlap | integer | 75 |
| chunking.respect_word_boundaries | boolean | true |
| chunking.code_aware | boolean | false |

- type: chunked-embeddings
  embedding_model: openai/text-embedding-3-small
  vector_dimensions: 1536
  database: ./vector.db
  similarity_metric: cosine_similarity
  threshold: 0.5
  limit: 10
  chunking:
    size: 1000
    overlap: 100

Semantic-embeddings

LLM-enhanced semantic search. Uses a language model to generate rich semantic summaries of each chunk before embedding, capturing deeper meaning.

| Field | Type | Default |
|-------|------|---------|
| embedding_model | string | - |
| chat_model | string | - |
| database | string | - |
| vector_dimensions | integer | - |
| similarity_metric | string | cosine |
| threshold | float | 0.5 |
| limit | integer | 5 |
| ast_context | boolean | false |
| semantic_prompt | string | - |
| chunking.size | integer | 1000 |
| chunking.overlap | integer | 75 |
| chunking.respect_word_boundaries | boolean | true |
| chunking.code_aware | boolean | false |

- type: semantic-embeddings
  embedding_model: openai/text-embedding-3-small
  vector_dimensions: 1536
  chat_model: openai/gpt-5-mini
  database: ./semantic.db
  threshold: 0.3
  limit: 10
  chunking:
    size: 1000
    overlap: 100

BM25

Keyword-based search using BM25 algorithm. Best for exact terms, technical jargon, and code identifiers.

| Field | Type | Default |
|-------|------|---------|
| database | string | - |
| k1 | float | 1.5 |
| b | float | 0.75 |
| threshold | float | 0.0 |
| limit | integer | 5 |
| chunking.size | integer | 1000 |
| chunking.overlap | integer | 75 |
| chunking.respect_word_boundaries | boolean | true |
| chunking.code_aware | boolean | false |

- type: bm25
  database: ./bm25.db
  k1: 1.5
  b: 0.75
  threshold: 0.3
  limit: 10
  chunking:
    size: 1000
    overlap: 100

Hybrid retrieval

Combine multiple strategies with fusion:

strategies:
  - type: chunked-embeddings
    embedding_model: openai/text-embedding-3-small
    vector_dimensions: 1536
    database: ./vector.db
    limit: 20
  - type: bm25
    database: ./bm25.db
    limit: 15

results:
  fusion:
    strategy: rrf # Options: rrf, weighted, max
    k: 60 # RRF smoothing parameter
  deduplicate: true
  limit: 5

Fusion strategies:

  • rrf: Reciprocal Rank Fusion (recommended, rank-based, no normalization needed)
  • weighted: Weighted combination (fusion.weights: {chunked-embeddings: 0.7, bm25: 0.3})
  • max: Maximum score across strategies
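Following the weighted syntax shown above, a full weighted-fusion block might look like this (the weight values are illustrative):

```yaml
results:
  fusion:
    strategy: weighted
    weights:
      chunked-embeddings: 0.7  # favor semantic matches
      bm25: 0.3                # still credit keyword hits
  deduplicate: true
  limit: 5
```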

Reranking

Re-score results with a specialized model for improved relevance:

results:
  reranking:
    model: openai/gpt-5-mini
    top_k: 10 # Only rerank top K (0 = all)
    threshold: 0.3 # Minimum score after reranking
    criteria: | # Optional domain-specific guidance
      Prioritize official docs over blog posts
  limit: 5

DMR native reranking:

models:
  reranker:
    provider: dmr
    model: hf.co/ggml-org/qwen3-reranker-0.6b-q8_0-gguf

results:
  reranking:
    model: reranker

Code-aware chunking

For source code, use AST-based chunking. With semantic-embeddings, you can include AST metadata in the LLM prompts:

- type: semantic-embeddings
  embedding_model: openai/text-embedding-3-small
  vector_dimensions: 1536
  chat_model: openai/gpt-5-mini
  database: ./code.db
  ast_context: true # Include AST metadata in semantic prompts
  chunking:
    size: 2000
    code_aware: true # Enable AST-based chunking

RAG properties

Top-level RAG source:

| Field | Type | Description |
|-------|------|-------------|
| docs | []string | Document paths (supports glob patterns, respects .gitignore) |
| tool | object | Customize RAG tool name/description/instruction |
| strategies | []object | Retrieval strategies (see above for strategy-specific fields) |
| results | object | Post-processing (fusion, reranking, limits) |

Results:

| Field | Type | Default |
|-------|------|---------|
| limit | integer | 15 |
| deduplicate | boolean | true |
| include_score | boolean | false |
| fusion.strategy | string | - |
| fusion.k | integer | 60 |
| fusion.weights | object | - |
| reranking.model | string | - |
| reranking.top_k | integer | 0 |
| reranking.threshold | float | 0.5 |
| reranking.criteria | string | "" |
| return_full_content | boolean | false |

Metadata

Documentation and sharing information:

| Property | Type | Description |
|----------|------|-------------|
| author | string | Author name |
| license | string | License (e.g., MIT, Apache-2.0) |
| readme | string | Usage documentation |

metadata:
  author: Your Name
  license: MIT
  readme: |
    Description and usage instructions

Example configuration

Complete configuration demonstrating key features:

agents:
  root:
    model: claude
    description: Technical lead
    instruction: Coordinate development tasks and delegate to specialists
    sub_agents: [developer, reviewer]
    toolsets:
      - type: filesystem
      - type: mcp
        ref: docker:duckduckgo
    rag: [readmes]
    commands:
      status: "Check project status"

  developer:
    model: gpt
    description: Software developer
    instruction: Write clean, maintainable code
    toolsets:
      - type: filesystem
      - type: shell

  reviewer:
    model: claude
    description: Code reviewer
    instruction: Review for quality and security
    toolsets:
      - type: filesystem

models:
  gpt:
    provider: openai
    model: gpt-5

  claude:
    provider: anthropic
    model: claude-sonnet-4-5
    max_tokens: 64000

rag:
  readmes:
    docs: ["**/README.md"]
    strategies:
      - type: chunked-embeddings
        embedding_model: openai/text-embedding-3-small
        vector_dimensions: 1536
        database: ./embeddings.db
        limit: 10
      - type: bm25
        database: ./bm25.db
        limit: 10
    results:
      fusion:
        strategy: rrf
        k: 60
      limit: 5

What's next