diff --git a/docs/features/plugin/tools/index.mdx b/docs/features/plugin/tools/index.mdx index 50d421b7..2dae8dc5 100644 --- a/docs/features/plugin/tools/index.mdx +++ b/docs/features/plugin/tools/index.mdx @@ -84,25 +84,30 @@ You can also let your LLM auto-select the right Tools using the [**AutoTool Filt --- -## Tool Calling Modes: Default vs. Native +## Tool Calling Modes: Default vs. Native (Agentic Mode) Open WebUI offers two distinct ways for models to interact with tools. Choosing the right mode depends on your model's capabilities and your performance requirements. ### 🟡 Default Mode (Prompt-based) -In Default Mode, Open WebUI manages tool selection by injecting a specific prompt template that guides the model to output a tool request. +In Default Mode, Open WebUI manages tool selection by injecting a specific prompt template that guides the model to output a tool request. - **Compatibility**: Works with **practically any model**, including older or smaller local models that lack native function-calling support. - **Flexibility**: Highly customizable via prompt templates. - **Caveat**: Can be slower (requires extra tokens) and less reliable for complex, multi-step tool chaining. -### 🟢 Native Mode (System Function Calling) -Native Mode leverages the model's built-in capability to handle tool definitions and return structured tool calls (JSON). This is the **recommended mode** for high-performance agentic workflows. +### 🟢 Native Mode (Agentic Mode / System Function Calling) +Native Mode (also called **Agentic Mode**) leverages the model's built-in capability to handle tool definitions and return structured tool calls (JSON). This is the **recommended mode** for high-performance agentic workflows. -#### Why use Native Mode? 
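To make the difference concrete, here is an illustrative sketch of the two response shapes. The exact prompt template and payloads are not shown here; the field names follow the common OpenAI-style function-calling convention, and the `search_web` call is a hypothetical example.

```python
import json

# Default Mode (illustrative): the tool request arrives as plain text that the
# injected prompt template asked the model to emit, so the frontend must parse
# it back out of the message body. Any extra prose around the JSON breaks this.
default_mode_output = '{"name": "search_web", "parameters": {"query": "open webui tools"}}'
parsed_default = json.loads(default_mode_output)

# Native Mode (illustrative): the API returns a structured `tool_calls` field
# alongside (or instead of) text content, so no text parsing is needed.
native_mode_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "search_web",
                "arguments": '{"query": "open webui tools"}',
            },
        }
    ],
}
call = native_mode_message["tool_calls"][0]["function"]
native_args = json.loads(call["arguments"])

# Both paths recover the same tool request, but only Native Mode guarantees
# the structure arrives machine-readable.
assert parsed_default["name"] == call["name"] == "search_web"
```

The fragility of the first path is exactly why Default Mode trades reliability for compatibility: it works with any model that can follow a prompt, but depends on the model emitting clean JSON in free text.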
+:::warning Model Quality Matters +**Agentic tool calling requires high-quality models to work reliably.** While small local models may technically support function calling, they often struggle with the complex reasoning required for multi-step tool usage. For best results, use frontier models like **GPT-5**, **Claude 4.5 Sonnet**, **Gemini 3 Flash**, or **MiniMax M2.1**. Small local models may produce malformed JSON or fail to follow the strict state management required for agentic behavior. +::: + +#### Why use Native Mode (Agentic Mode)? - **Speed & Efficiency**: Lower latency as it avoids bulky prompt-based tool selection. -- **Reliability**: Higher accuracy in following tool schemas. +- **Reliability**: Higher accuracy in following tool schemas (with quality models). - **Multi-step Chaining**: Essential for **Agentic Research** and **Interleaved Thinking** where a model needs to call multiple tools in succession. +- **Autonomous Decision-Making**: Models can decide when to search, which tools to use, and how to combine results. -#### How to Enable Native Mode +#### How to Enable Native Mode (Agentic Mode) Native Mode can be enabled at two levels: 1. **Global/Administrator Level (Recommended)**: @@ -118,8 +123,19 @@ Native Mode can be enabled at two levels: #### Model Requirements & Caveats -- **Recommended Models**: High-tier models like **GPT-5**, **Claude 4.5 Sonnet**, **Gemini 3 Flash**, and **MiniMax M2.1** excel in Native Mode. -- **Local Model Warning**: While large local models (e.g., Qwen 3 32B) support native tool calling, **small local models** often struggle with Native Mode. They may produce malformed JSON or fail to follow the strict state management required for sequential calls. For these models, **Default Mode** is usually more reliable. 
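The failure modes mentioned above (malformed JSON, broken state management) are worth sanity-checking when experimenting with smaller models. The sketch below shows the kind of cheap validation a frontend must perform on a model-emitted tool call; `validate_tool_call` and its checks are hypothetical helpers for illustration, not Open WebUI code.

```python
import json

def validate_tool_call(raw: str, known_tools: set[str]) -> tuple[bool, str]:
    """Cheap sanity checks on a model-emitted tool call (illustrative only).
    Small local models frequently fail one of these steps."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"malformed JSON: {exc.msg}"
    if not isinstance(call, dict) or "name" not in call:
        return False, "missing tool name"
    if call["name"] not in known_tools:
        return False, f"unknown tool: {call['name']}"
    if not isinstance(call.get("parameters", {}), dict):
        return False, "parameters must be an object"
    return True, "ok"

tools = {"search_web", "fetch_url", "query_knowledge_bases"}

# A well-formed call passes; a truncated one (a typical small-model failure,
# e.g. stopping mid-object) is caught at the JSON-parsing step.
ok, _ = validate_tool_call('{"name": "search_web", "parameters": {"query": "x"}}', tools)
bad, reason = validate_tool_call('{"name": "search_web", "parameters": ', tools)
```

When a model fails checks like these frequently, that is a strong signal to fall back to Default Mode for that model.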
+ +:::tip Recommended Models for Agentic Mode +For reliable agentic tool calling, use high-tier frontier models: +- **GPT-5** (OpenAI) +- **Claude 4.5 Sonnet** (Anthropic) +- **Gemini 3 Flash** (Google) +- **MiniMax M2.1** + +These models excel at multi-step reasoning, proper JSON formatting, and autonomous tool selection. +::: + +- **Large Local Models**: Some large local models (e.g., Qwen 3 32B, Llama 3.3 70B) can work with Native Mode, but results vary significantly by model quality. +- **Small Local Models Warning**: **Small local models** (under 30B parameters) often struggle with Native Mode. They may produce malformed JSON, fail to follow strict state management, or make poor tool selection decisions. For these models, **Default Mode** is usually more reliable. | Feature | Default Mode | Native Mode | |:---|:---|:---| @@ -128,15 +144,21 @@ Native Mode can be enabled at two levels: | **Logic** | Prompt-based (Open WebUI) | Model-native (API/Ollama) | | **Complex Chaining** | ⚠️ Limited | ✅ Excellent | -### Built-in System Tools (Native Mode) +### Built-in System Tools (Native/Agentic Mode) -🛠️ When **Native Mode** is enabled, Open WebUI automatically injects powerful system tools based on the features toggled for the chat. This unlocks "Agentic" behaviors where models (like GPT-5, Claude 4.5, or MiniMax M2.1) can perform multi-step research or manage user memory dynamically. +🛠️ When **Native Mode (Agentic Mode)** is enabled, Open WebUI automatically injects powerful system tools based on the features toggled for the chat. This unlocks truly agentic behaviors where capable models (like GPT-5, Claude 4.5 Sonnet, Gemini 3 Flash, or MiniMax M2.1) can perform multi-step research, explore knowledge bases, or manage user memory autonomously. | Tool | Purpose | Requirements | |------|---------|--------------| | **Search & Web** | | | -| `web_search` | Performs a search using the configured Search Engine. | `ENABLE_WEB_SEARCH` enabled. 
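For context on what Native Mode actually sends upstream, here is a sketch of an OpenAI-style tool definition and request body. The `type`/`function`/`parameters` shape follows the common function-calling convention; the tool name mirrors the built-in `search_web` tool, but this is an assumption-laden illustration, not the exact payload Open WebUI constructs.

```python
import json

# Illustrative OpenAI-style tool definition: the model receives a JSON Schema
# describing each tool's name, purpose, and argument types.
search_web_tool = {
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search the public web for information.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query."},
            },
            "required": ["query"],
        },
    },
}

# Hypothetical request body; "some-model" is a placeholder model id.
request_body = {
    "model": "some-model",
    "messages": [{"role": "user", "content": "What changed in the latest release?"}],
    "tools": [search_web_tool],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
payload = json.dumps(request_body)
```

Because the schema travels in a dedicated `tools` field rather than inside the prompt, it consumes no instruction-template tokens and the model's reply can reference it with structured tool calls.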
| +| `search_web` | Search the public web for information. Best for current events, external references, or topics not covered in internal documents. | `ENABLE_WEB_SEARCH` enabled. | | `fetch_url` | Visits a URL and extracts text content via the Web Loader. | Part of Web Search feature. | +| **Knowledge Base** | | | +| `list_knowledge_bases` | List the user's accessible knowledge bases with file counts. | Always available. | +| `search_knowledge_bases` | Search knowledge bases by name and description. | Always available. | +| `search_knowledge_files` | Search files across accessible knowledge bases by filename. | Always available. | +| `view_knowledge_file` | Get the full content of a file from a knowledge base. | Always available. | +| `query_knowledge_bases` | Search internal knowledge bases using semantic/vector search. Should be your first choice for finding information before searching the web. | Always available. | | **Image Gen** | | | | `generate_image` | Generates a new image based on a prompt (supports `steps`). | `ENABLE_IMAGE_GENERATION` enabled. | | `edit_image` | Edits an existing image based on a prompt and URL. | `ENABLE_IMAGE_EDIT` enabled. | @@ -161,17 +183,21 @@ Native Mode can be enabled at two levels: | `get_current_timestamp` | Get the current UTC Unix timestamp and ISO date. | Always available. | | `calculate_timestamp` | Calculate relative timestamps (e.g., "3 days ago"). | Always available. | -**Why use these?** It allows for **Deep Research** (searching multiple times), **Contextual Awareness** (looking up previous chats or notes), **Dynamic Personalization** (saving facts), and **Precise Automation** (generating content based on existing notes). 
+**Why use these?** These tools enable **Deep Research** (searching the web multiple times, or querying knowledge bases), **Contextual Awareness** (looking up previous chats or notes), **Dynamic Personalization** (saving facts), and **Precise Automation** (generating content based on existing notes or documents). ### Interleaved Thinking {#interleaved-thinking} -🧠 When using **Native Mode**, high-tier models can engage in **Interleaved Thinking**. This is a powerful "Thought → Action → Thought → Action → Thought → ..." loop where the model can reason about a task, execute one or more tools, evaluate the results, and then decide on its next move. +🧠 When using **Native Mode (Agentic Mode)**, high-tier models can engage in **Interleaved Thinking**. This is a powerful "Thought → Action → Thought → Action → Thought → ..." loop where the model can reason about a task, execute one or more tools, evaluate the results, and then decide on its next move. + +:::info Quality Models Required +Interleaved thinking requires models with strong reasoning capabilities. This feature works best with frontier models (GPT-5, Claude 4.5+, Gemini 3+) that can maintain context across multiple tool calls and make intelligent decisions about which tools to use when. +::: This is fundamentally different from a single-shot tool call. In an interleaved workflow, the model follows a cycle: 1. **Reason**: Analyze the user's intent and identify information gaps. -2. **Act**: Call a tool (e.g., `web_search` and `fetch_url`). +2. **Act**: Call a tool (e.g., `query_knowledge_bases` for internal docs or `search_web` and `fetch_url` for web research). 3. **Think**: Read the tool's output and update its internal understanding. -4. **Iterate**: If the answer isn't clear, call another tool (e.g., `fetch_url` to read a specific page) or refine the search. +4. 
**Iterate**: If the answer isn't clear, call another tool (e.g., `view_knowledge_file` to read a specific document or `fetch_url` to read a specific page) or refine the search. 5. **Finalize**: Only after completing this "Deep Research" cycle does the model provide a final, grounded answer. This behavior is what transforms a standard chatbot into an **Agentic AI** capable of solving complex, multi-step problems autonomously.
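The cycle above can be sketched as a bounded loop. Everything here is a hypothetical stand-in: `model_step` simulates the model's decisions (a real model makes them via native tool calls), and the tool functions return stub results; only the tool names mirror the built-in system tools described earlier.

```python
# Stub tools standing in for the real search backends.
def query_knowledge_bases(query: str) -> str:
    return f"internal notes about {query}"

def search_web(query: str) -> str:
    return f"web results for {query}"

TOOLS = {"query_knowledge_bases": query_knowledge_bases, "search_web": search_web}

def model_step(history: list[dict]) -> dict:
    """Stand-in for the model: check internal docs first, then the web,
    then finalize. A real model reasons this out per turn."""
    tool_turns = [m for m in history if m["role"] == "tool"]
    if len(tool_turns) == 0:
        return {"tool": "query_knowledge_bases", "args": "release process"}
    if len(tool_turns) == 1:
        return {"tool": "search_web", "args": "release process"}
    return {"answer": "Grounded answer based on: "
                      + "; ".join(m["content"] for m in tool_turns)}

def agent_loop(user_msg: str, max_steps: int = 5) -> str:
    history = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):            # Iterate: bounded, not unbounded autonomy
        step = model_step(history)        # Reason / Think happen inside the model
        if "answer" in step:
            return step["answer"]         # Finalize
        result = TOOLS[step["tool"]](step["args"])  # Act
        history.append({"role": "tool", "content": result})
    return "step budget exhausted"

final = agent_loop("How does our release process work?")
```

Note the `max_steps` bound: real agentic runtimes cap the Thought → Action loop the same way, so a model that never converges cannot spin forever.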