diff --git a/docs/features/ai-knowledge/skills.md b/docs/features/ai-knowledge/skills.md index f8a3d566..19c0f457 100644 --- a/docs/features/ai-knowledge/skills.md +++ b/docs/features/ai-knowledge/skills.md @@ -111,7 +111,7 @@ Follow the [Open Terminal setup guide](/features/extensibility/open-terminal#get Add your Open Terminal instance as a Tool Server by following the [OpenAPI Tool Server Integration Guide](/features/extensibility/plugin/tools/openapi-servers/open-webui). You can connect it as: - A **User Tool Server** (in **Settings → Tools**) — connects from your browser, ideal for personal or local instances -- A **Global Tool Server** (in **Admin Settings → Tools**) — connects from the backend, available to all users +- A **Global Tool Server** (in **Admin Settings → Integrations**) — connects from the backend, available to all users Once connected, the Open Terminal tools (execute, file upload, file download) appear automatically in the chat interface. diff --git a/docs/features/chat-conversations/chat-features/code-execution/index.md b/docs/features/chat-conversations/chat-features/code-execution/index.md index 9a566f44..ce504682 100644 --- a/docs/features/chat-conversations/chat-features/code-execution/index.md +++ b/docs/features/chat-conversations/chat-features/code-execution/index.md @@ -54,6 +54,7 @@ If you are running a multi-user or organizational deployment, **Jupyter is not r - **Full shell access** — models can install packages, run scripts in any language, use system tools like ffmpeg, git, curl, etc. - **Container isolation** — runs in its own Docker container, separate from Open WebUI and other services. - **Rich pre-installed toolset** — the Docker image comes with Python 3.12, data science libraries, build tools, networking utilities, and more. +- **Built-in file browser** — browse, preview, create, delete, upload, and download files directly from the chat controls panel. 
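The "full shell access" point above is easiest to see with a concrete session. These are ordinary illustrative commands — not an Open Terminal API call — of the kind a model can chain together once it has a shell:

```shell
# Generate a file with one tool, then process it in another language —
# the sort of multi-step work shell access enables:
printf 'name,score\nada,97\n' > /tmp/demo.csv
python3 -c "import csv; print(next(iter(csv.DictReader(open('/tmp/demo.csv'))))['score'])"
# → 97
```

In browser-based Pyodide execution, by contrast, neither the filesystem write nor the subprocess-style tool chaining is available.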
### Comparison @@ -66,6 +67,6 @@ If you are running a multi-user or organizational deployment, **Jupyter is not r | **Isolation** | ✅ Browser sandbox | ❌ Shared environment | ✅ Container-level (when using Docker) | | **Multi-user safety** | ✅ Per-user by design | ⚠️ Not isolated | ⚠️ Single instance (per-user containers planned) | | **File generation** | ❌ Very limited | ✅ Full support | ✅ Full support with upload/download | -| **Setup** | None (built-in) | Admin configures globally | Each user adds as a Tool Server | +| **Setup** | None (built-in) | Admin configures globally | Native integration via Settings → Integrations, or as a Tool Server | | **Recommended for orgs** | ✅ Safe default | ❌ Not without isolation | ✅ Per-user by design | | **Enterprise scalability** | ✅ Client-side, no server load | ❌ Single shared instance | ⚠️ Manual per-user instances | diff --git a/docs/features/extensibility/mcp.mdx b/docs/features/extensibility/mcp.mdx index f1db3349..b85fcd0a 100644 --- a/docs/features/extensibility/mcp.mdx +++ b/docs/features/extensibility/mcp.mdx @@ -64,6 +64,14 @@ Browser-based, multi-user deployments increase the surface area (CORS/CSRF, per- * **Bearer**: Use this **only** if your MCP server requires a specific API token. You **must** populate the "Key" field. * **OAuth 2.1**: For secured, enterprise deployments requiring Identity Provider flows. +:::warning OAuth 2.1 Tools Cannot Be Set as Default Tools +**Do not set OAuth 2.1 MCP tools as default/pre-enabled tools on a model.** The OAuth 2.1 authorization flow requires an interactive browser redirect (user consent, callback) that cannot happen transparently during a chat completion request. + +If an OAuth 2.1 tool is set as a default and the user hasn't previously authenticated (or their refresh token has expired), the tool call will fail with **"Failed to connect to MCP server"** because the backend cannot initiate the browser-based auth flow mid-request. 
+ +**Workaround:** Users should manually enable OAuth 2.1 tools per-chat via the **➕** button in the chat input area. This triggers the auth flow before the tool is ever invoked. Token refresh works automatically once the initial authentication is complete. +::: + ### Connection URLs If you are running Open WebUI in **Docker** and your MCP server is on the **host machine**: @@ -85,6 +93,7 @@ The chat shows "Failed to connect to MCP server" when using a tool, even if the **Solutions**: 1. **Check Authentication**: Ensure you haven't selected `Bearer` without a key. Switch to `None` if no token is needed. 2. **Filter List Bug**: If the "Function Name Filter List" is empty, try adding a comma (`,`) to it. +3. **OAuth 2.1 Default Tool**: If the failing tool uses OAuth 2.1 and is set as a default tool on the model, this is a [known limitation](#oauth-21-tools-cannot-be-set-as-default-tools). Remove it from the model's default tools and have users enable it manually per-chat. ### Infinite loading screen after adding External Tool diff --git a/docs/features/extensibility/open-terminal/index.md b/docs/features/extensibility/open-terminal/index.md index 1ec477c7..7618b57a 100644 --- a/docs/features/extensibility/open-terminal/index.md +++ b/docs/features/extensibility/open-terminal/index.md @@ -7,7 +7,7 @@ title: "Open Terminal" :::info -This page is up-to-date with Open Terminal release version [v0.2.4](https://github.com/open-webui/open-terminal). +This page is up-to-date with Open Terminal release version [v0.4.0](https://github.com/open-webui/open-terminal). 
::: @@ -70,6 +70,9 @@ uvx open-terminal run --host 0.0.0.0 --port 8000 --api-key your-secret-key # Or install globally with pip pip install open-terminal open-terminal run --host 0.0.0.0 --port 8000 --api-key your-secret-key + +# Set a custom working directory +open-terminal run --cwd /path/to/project ``` :::warning @@ -99,6 +102,7 @@ open-terminal mcp --transport streamable-http --host 0.0.0.0 --port 8000 | `--transport` | `stdio` | Transport mode: `stdio` or `streamable-http` | | `--host` | `0.0.0.0` | Bind address (streamable-http only) | | `--port` | `8000` | Bind port (streamable-http only) | +| `--cwd` | Server's current directory | Working directory for the server process | Under the hood, this uses [FastMCP](https://github.com/jlowin/fastmcp) to automatically convert every FastAPI endpoint into an MCP tool — no manual tool definitions needed. @@ -140,23 +144,172 @@ volumes: | :--- | :--- | :--- | :--- | | `--host` | `0.0.0.0` | — | Bind address | | `--port` | `8000` | — | Bind port | +| `--cwd` | Current directory | — | Working directory for the server process | | `--api-key` | Auto-generated | `OPEN_TERMINAL_API_KEY` | Bearer API key for authentication | | — | `~/.open-terminal/logs` | `OPEN_TERMINAL_LOG_DIR` | Directory for process JSONL log files | +| — | `image` | `OPEN_TERMINAL_BINARY_MIME_PREFIXES` | Comma-separated MIME type prefixes for binary files that `read_file` returns as raw binary responses (e.g. `image,audio` or `image/png,image/jpeg`) | When no API key is provided, Open Terminal generates a random key using a cryptographically secure token and prints it to the console on startup. Process output is persisted to **JSONL log files** under `OPEN_TERMINAL_LOG_DIR/processes/`. These files provide a full audit trail that survives process cleanup and server restarts. :::note Performance -As of v0.2.4, all file and upload endpoints use **fully async I/O** via `aiofiles`. 
Directory listing and file search operations run in a background thread via `asyncio.to_thread`. This means the server's event loop is never blocked by filesystem operations, even on large directories or slow storage. +All file and upload endpoints use **fully async I/O** via `aiofiles`. Directory listing and file search operations run in a background thread via `asyncio.to_thread`. This means the server's event loop is never blocked by filesystem operations, even on large directories or slow storage. As of v0.2.5, all file endpoints gracefully handle permission errors and OS-level exceptions instead of crashing with HTTP 500. ::: ## Connecting to Open WebUI -Open Terminal is a FastAPI application and automatically exposes an OpenAPI specification at `/openapi.json`. This means it works out of the box as an [OpenAPI Tool Server](/features/extensibility/plugin/tools/openapi-servers/open-webui) — no manual tool creation required. +There are three ways to connect Open Terminal to Open WebUI: **admin-configured** (recommended), **user-configured**, and the **generic OpenAPI Tool Server** method. -- **As a User Tool Server**: Add it in **Settings → Tools** to connect directly from your browser. Ideal for personal or local instances. -- **As a Global Tool Server**: Add it in **Admin Settings → Tools** to make it available to all users across the deployment. +### Admin-Configured (Recommended) + +:::tip Experimental +The native Open Terminal integration is currently marked as **experimental**. It provides a tighter experience than the generic OpenAPI approach, with features like the built-in file browser. +::: + +Administrators can add Open Terminal connections that are available to all users (or specific groups) without exposing the terminal URL or API key to the browser. All requests are **proxied through the Open WebUI backend**, which means: + +- The terminal's API key is never sent to the client. 
+- Access control is enforced server-side using group-based permissions. +- Multiple authentication types are supported: **Bearer** (default), **Session**, **OAuth**, or **None**. + +This gives every user with access: + +- **Always-on tools** — When a terminal is selected, all Open Terminal endpoints are automatically injected as tools into every chat. No need to manually select them. +- **Built-in file browser** — A **Files** tab appears in the chat controls panel when a terminal is selected, letting you browse, preview, download, upload, and attach files from the terminal directly in the chat UI. +- **Terminal selector** — A terminal dropdown in the message input area lets users pick which terminal to use. Admin-configured terminals appear under the **System** category. + +#### Setup + +1. Go to **Admin Settings → Integrations** +2. Scroll to the **Open Terminal** section +3. Click **+** to add a new connection +4. Enter the **URL** (e.g. `http://open-terminal:8000`) and **API key** +5. Choose an **authentication type** (Bearer is recommended for most setups) +6. Optionally restrict access to specific groups via **access grants** + +Each connection has an **enable/disable toggle**. Only enabled terminals appear in the terminal selector for users. You can add multiple terminal connections and enable or disable them independently. + +:::info +The terminal connection can also be pre-configured via the [`TERMINAL_SERVER_CONNECTIONS`](/reference/env-configuration#terminal_server_connections) environment variable. +::: + +#### Authentication Types + +| Type | Description | +| :--- | :--- | +| **Bearer** | Open WebUI sends the configured API key as a `Bearer` token to the terminal server. This is the default and recommended method. | +| **Session** | Forwards the user's Open WebUI session credentials to the terminal server. Useful when the terminal server validates against the same auth backend. | +| **OAuth** | Forwards the user's OAuth access token. 
Requires OAuth to be configured in Open WebUI. | +| **None** | No authentication header is sent. Only use this for terminals on a trusted internal network. | + +#### Access Control + +By default, all users can access admin-configured terminals. To restrict access, add **access grants** in the terminal connection configuration. Access grants work the same way as [group-based permissions](/features/access-security/rbac/groups) — you can limit access to specific user groups. + +### User-Configured + +Individual users can also add their own Open Terminal connections under **Settings → Integrations**. This is useful for personal development terminals or when administrators haven't configured a shared instance. User-configured terminals appear under the **Direct** category in the terminal selector. + +#### Setup + +1. Go to **Settings → Integrations** +2. Scroll to the **Open Terminal** section +3. Click **+** to add a new connection +4. Enter the **URL** (e.g. `http://open-terminal:8000`) and **API key** +5. Select the terminal from the **terminal selector dropdown** in the chat input area + +:::note +User-configured terminals connect **directly from the browser** to the terminal server. The terminal URL must be reachable from the user's machine, and the API key is stored in the browser. For production deployments, prefer the admin-configured approach. +::: + +### Terminal Selector + +The message input area includes a **terminal selector dropdown** that shows all available terminals organized into two categories: + +- **System** — Admin-configured terminals (proxied through Open WebUI) +- **Direct** — User-configured terminals (direct browser connection) + +Click a terminal to select it. Selecting a terminal: + +- Activates its tools for the current chat +- Opens the **Files** tab in the chat controls panel +- Loads the terminal's current working directory in the file browser + +Click the same terminal again to deselect it. 
Only one terminal can be active at a time — selecting a system terminal automatically deactivates any direct terminal, and vice versa. + +#### File Browser + +When a terminal is selected, the chat controls panel gains a **Files** tab: + +- **Browse** directories on the remote terminal filesystem +- **Preview** text files, images, PDFs, CSV/TSV files (rendered as formatted tables), and Markdown inline — with a **Source/Preview toggle** for Markdown and CSV files +- **Edit** text files inline — click the edit (pencil) icon in the toolbar to modify file contents, then save or cancel. Changes are written directly to the terminal filesystem. +- **Create files** using the new file button in the toolbar (creates an empty file with the name you provide) +- **Create folders** using the new folder button in the toolbar +- **Delete** files and folders via the context menu (⋯) on each entry +- **Download** any file to your local machine via the toolbar download button +- **Upload** files by dragging and dropping them onto the directory listing +- **Attach** files to the current chat by downloading them through the file browser +- **Reset view** for images (resets zoom/pan back to default) + +The file browser remembers your last-visited directory between panel open/close cycles and automatically reloads when you switch terminals. + +### Networking & Connectivity + +Understanding where requests originate is essential for configuring Open Terminal correctly. **Admin-configured and user-configured connections work fundamentally differently at the network level**, and using the wrong URL is the most common cause of connection failures. + +#### Where Do Requests Come From? 
+ +| Connection Type | Request Origin | What `localhost` Means | +| :--- | :--- | :--- | +| **Admin-Configured (System)** | Open WebUI **backend server** | The machine/container running Open WebUI | +| **User-Configured (Direct)** | User's **browser** | The machine running the browser | +| **Generic OpenAPI (User)** | User's **browser** | The machine running the browser | +| **Generic OpenAPI (Global)** | Open WebUI **backend server** | The machine/container running Open WebUI | + +This means: + +- **A URL that works for a user-configured terminal may not work for an admin-configured terminal** (and vice versa), even though the URL is identical. +- If Open WebUI runs in Docker, `localhost` inside the container refers to the container itself — not the host machine. Use the container/service name (e.g. `http://open-terminal:8000`) or `host.docker.internal` instead. +- If you use a reverse proxy (e.g. Nginx) to expose Open Terminal under a path like `https://yourdomain.com/terminal`, the backend must be able to resolve and reach that hostname. If the hostname resolves to `127.0.0.1` on the backend, the request will fail with a 502 error. 
+ +#### Common Symptoms + +| Symptom | Likely Cause | Fix | +| :--- | :--- | :--- | +| **502 Bad Gateway** on `/api/v1/terminals/...` endpoints | The Open WebUI backend cannot reach the terminal URL | Use a URL the backend can resolve — container name, internal IP, or `host.docker.internal` | +| **User connection works, admin connection doesn't** | The URL resolves correctly from the browser but not from the backend container | Use a different URL for admin config that the backend can reach | +| **`Connect call failed ('127.0.0.1', ...)`** in backend logs | Hostname resolves to localhost inside the container | Use the actual IP, container name, or Docker network hostname | +| **Connection timeout** | Firewall blocking traffic between containers/hosts | Ensure both containers are on the same Docker network, or open the necessary ports | + +:::warning Same URL, Different Results +**The same URL can work as a user-configured terminal but fail as an admin-configured terminal.** This is not a bug — it's how networking works. + +**Example:** You have Open Terminal at `https://myserver.com/terminal` with an Nginx reverse proxy. When a user adds this URL, their browser connects directly to `myserver.com` → Nginx → Open Terminal. When an admin adds the same URL, the Open WebUI backend tries to connect to `myserver.com`, which may resolve to `127.0.0.1` inside the Docker container — bypassing Nginx entirely and failing with a 502. + +**Fix:** For admin-configured terminals, use the **internal URL** that the backend can reach directly (e.g. `http://open-terminal:8000` if both containers are on the same Docker network). +::: + +:::tip Quick Test +To verify the backend can reach your terminal URL, exec into the Open WebUI container and test: + +```bash +# From inside the Open WebUI container +curl -s http://open-terminal:8000/openapi.json | head -c 200 +``` + +If this returns JSON, the backend can reach it. If it fails, your admin-configured terminal will also fail. 
+::: + +For the same networking concepts applied to generic OpenAPI tool servers, see the [Tool Server Networking Guide](/features/extensibility/plugin/tools/openapi-servers/open-webui#main-difference-where-are-requests-made-from). + +### Generic OpenAPI Tool Server + +Open Terminal is also a standard FastAPI application that exposes an OpenAPI specification at `/openapi.json`. This means it works as a generic [OpenAPI Tool Server](/features/extensibility/plugin/tools/openapi-servers/open-webui) — useful when you want more control over which tools are enabled per-chat. + +- **As a User Tool Server**: Add it in **Settings → Tools** to connect directly from your browser. +- **As a Global Tool Server**: Add it in **Admin Settings → Integrations** to make it available to all users. For step-by-step instructions with screenshots, see the [OpenAPI Tool Server Integration Guide](/features/extensibility/plugin/tools/openapi-servers/open-webui). @@ -178,6 +331,10 @@ Runs a shell command as a **background process** and returns a process ID. All o The `/execute` endpoint description in the OpenAPI spec automatically includes live system metadata — OS, hostname, current user, default shell, Python version, and working directory. When Open WebUI discovers this tool via the OpenAPI spec, models see this context in the tool description and can adapt their commands accordingly. ::: +:::info PTY Execution (v0.3.0+) +Commands now run under a **pseudo-terminal (PTY)** by default on Linux/macOS. This means programs see a real terminal and produce colored output, interactive TUI applications work correctly, and `isatty()` returns true. On Windows, execution falls back to pipe-based subprocess handling. +::: + **Request body:** | Field | Type | Default | Description | @@ -245,12 +402,16 @@ curl -X POST "http://localhost:8000/execute?wait=5" \ All background process output (stdout/stderr) is persisted to JSONL log files under `~/.open-terminal/logs/processes/`. 
This means output is never lost, even if the server restarts. The response includes `next_offset` for stateless incremental polling — pass it as the `offset` query parameter on subsequent status requests to get only new output. The `log_path` field shows the path to the raw JSONL log file. ::: -### Search File Contents +### Grep Search (File Contents) -**`GET /files/search`** +**`GET /files/grep`** Search for a text pattern across files in a directory. Returns structured matches with file paths, line numbers, and matching lines. Skips binary files automatically. +:::note Renamed in v0.2.6 +This endpoint was renamed from `/files/search` to `/files/grep` to clearly distinguish content-level search from filename-level search (`/files/glob`). +::: + **Query parameters:** | Parameter | Type | Default | Description | @@ -264,7 +425,7 @@ Search for a text pattern across files in a directory. Returns structured matche | `max_results` | integer | `50` | Maximum number of matches to return (1–500) | ```bash -curl "http://localhost:8000/files/search?query=TODO&include=*.py&case_insensitive=true" \ +curl "http://localhost:8000/files/grep?query=TODO&include=*.py&case_insensitive=true" \ #### Get Command Status @@ -327,13 +488,15 @@ Finished processes are automatically cleaned up after 5 minutes. Their JSONL log **`POST /execute/{process_id}/input`** -Sends text to a running process's stdin. Useful for interacting with REPLs, interactive commands, or any process waiting for input. +Sends text to a running process's stdin (or PTY). Useful for interacting with REPLs, interactive commands, or any process waiting for input. + +Literal escape strings from LLMs are automatically converted to real characters: `\n` → newline, `\t` → tab, `\x03` → Ctrl-C, `\x04` → Ctrl-D, etc. **Request body:** | Field | Type | Description | | :--- | :--- | :--- | -| `input` | string | Text to send to stdin. Include newline characters as needed. | +| `input` | string | Text to send to stdin. 
Escape sequences like `\n` and `\x03` are automatically converted. | ```bash curl -X POST http://localhost:8000/execute/a1b2c3d4e5f6/input \ @@ -388,7 +551,7 @@ curl "http://localhost:8000/files/list?directory=/home/user" \ **`GET /files/read`** -Returns the contents of a file. Text files return a content string; binary files return base64-encoded content. Optionally specify a line range for large text files. +Returns the contents of a file. Text files return a JSON object with a content string. **PDF files** are automatically converted to text using pypdf and returned in the same JSON format. Supported binary types (configurable, default: `image/*`) return the raw binary with the appropriate `Content-Type` header. Unsupported binary types are rejected with HTTP 415. | Parameter | Type | Default | Description | | :--- | :--- | :--- | :--- | @@ -409,6 +572,38 @@ curl "http://localhost:8000/files/read?path=/home/user/script.py&start_line=1&en } ``` +For binary files like images, the response is the raw file content with the detected MIME type. Control which binary types are allowed via the `OPEN_TERMINAL_BINARY_MIME_PREFIXES` environment variable (default: `image`). + +#### View a File (Raw Binary) + +**`GET /files/view`** + +Serves the raw binary content of any file with the correct `Content-Type` header. Unlike `read_file`, this endpoint has no MIME type restrictions — it serves PDFs, images, videos, or any other file type. Designed for UI file previewing. + +| Parameter | Type | Description | +| :--- | :--- | :--- | +| `path` | string | Path to the file to serve. | + +```bash +curl "http://localhost:8000/files/view?path=/home/user/report.pdf" \ + -H "Authorization: Bearer " --output report.pdf +``` + +#### Display a File (Agent Signaling) + +**`GET /files/display`** + +A signaling endpoint that lets AI agents request a file be shown to the user. Returns the file content with the detected `Content-Type`. 
When used with the native Open WebUI integration, calling this tool automatically emits a `display_file` event that opens the file in the chat's **Files** tab — no extra configuration needed. + +| Parameter | Type | Description | +| :--- | :--- | :--- | +| `path` | string | Path to the file to display. | + +```bash +curl "http://localhost:8000/files/display?path=/home/user/chart.png" \ + -H "Authorization: Bearer " +``` + #### Write a File **`POST /files/write`** @@ -464,9 +659,9 @@ curl -X POST http://localhost:8000/files/replace \ }' ``` -#### Search File Contents +#### Grep Search (File Contents) -**`GET /files/search`** +**`GET /files/grep`** Search for a text pattern across files in a directory. Returns structured matches with file paths, line numbers, and matching lines. Skips binary files. @@ -481,7 +676,7 @@ Search for a text pattern across files in a directory. Returns structured matche | `max_results` | integer | `50` | Maximum number of matches to return (1–500). | ```bash -curl "http://localhost:8000/files/search?query=TODO&path=/home/user/project&include=*.py" \ +curl "http://localhost:8000/files/grep?query=TODO&path=/home/user/project&include=*.py" \ -H "Authorization: Bearer " ``` @@ -497,6 +692,101 @@ curl "http://localhost:8000/files/search?query=TODO&path=/home/user/project&incl } ``` +#### Glob Search (Filenames) + +**`GET /files/glob`** + +Search for files and subdirectories by name within a directory using glob patterns. Returns relative paths, type, size, and modification time. + +| Parameter | Type | Default | Description | +| :--- | :--- | :--- | :--- | +| `pattern` | string | (required) | Glob pattern to search for (e.g. `*.py`). | +| `path` | string | `.` | Directory to search within. | +| `exclude` | string[] | `null` | Glob patterns to exclude from results. | +| `type` | string | `any` | Type filter: `file`, `directory`, or `any`. | +| `max_results` | integer | `50` | Maximum number of matches to return (1–500). 
| + +```bash +curl "http://localhost:8000/files/glob?pattern=*.py&path=/home/user/project&type=file" \ + -H "Authorization: Bearer " +``` + +```json +{ + "pattern": "*.py", + "path": "/home/user/project", + "matches": [ + {"path": "app.py", "type": "file", "size": 2048, "modified": 1707955200.0}, + {"path": "utils/helpers.py", "type": "file", "size": 512, "modified": 1707955200.0} + ], + "truncated": false +} +``` + +#### Create a Directory + +**`POST /files/mkdir`** + +Creates a directory at the specified path. Parent directories are created automatically if they don't exist. + +**Request body:** + +| Field | Type | Description | +| :--- | :--- | :--- | +| `path` | string | Path of the directory to create. | + +```bash +curl -X POST http://localhost:8000/files/mkdir \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + -d '{"path": "/home/user/project/src"}' +``` + +```json +{"path": "/home/user/project/src"} +``` + +#### Delete a File or Directory + +**`DELETE /files/delete`** + +Deletes a file or directory. Directories are removed recursively. + +| Parameter | Type | Description | +| :--- | :--- | :--- | +| `path` | string | Path to the file or directory to delete. | + +```bash +curl -X DELETE "http://localhost:8000/files/delete?path=/home/user/old-file.txt" \ + -H "Authorization: Bearer " +``` + +```json +{"path": "/home/user/old-file.txt", "type": "file"} +``` + +#### Get or Set Working Directory + +**`GET /files/cwd`** — Returns the server's current working directory. + +**`POST /files/cwd`** — Changes the server's working directory. 
+ +```bash +# Get current working directory +curl "http://localhost:8000/files/cwd" \ + -H "Authorization: Bearer " + +# Set working directory +curl -X POST http://localhost:8000/files/cwd \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + -d '{"path": "/home/user/project"}' +``` + +```json +{"cwd": "/home/user/project"} +``` + ### File Transfer #### Upload a File @@ -523,37 +813,8 @@ curl -X POST "http://localhost:8000/files/upload?directory=/home/user" \ -F "file=@local_file.csv" ``` -**Via temporary upload link (no auth needed to upload):** -```bash -# 1. Generate an upload link -curl -X POST "http://localhost:8000/files/upload/link?directory=/home/user" \ - -H "Authorization: Bearer " -# → {"url": "http://localhost:8000/files/upload/a1b2c3d4..."} - -# 2. Upload to the link (no auth required) -curl -X POST "http://localhost:8000/files/upload/a1b2c3d4..." \ - -F "file=@local_file.csv" -``` - -Opening a temporary upload link in a browser shows a simple file picker form — useful for manual uploads without curl. - The filename is automatically derived from the uploaded file or the URL. -#### Download a File - -**`GET /files/download/link`** - -Returns a temporary download URL for a file. The link expires after 5 minutes and requires no authentication to use. 
- -```bash -curl "http://localhost:8000/files/download/link?path=/home/user/output.csv" \ - -H "Authorization: Bearer " -``` - -```json -{"url": "http://localhost:8000/files/download/a1b2c3d4..."} -``` - ### Process Status (Background) **`GET /processes/{process_id}/status`** diff --git a/docs/features/extensibility/plugin/functions/action.mdx b/docs/features/extensibility/plugin/functions/action.mdx index 0f370dd3..1121c308 100644 --- a/docs/features/extensibility/plugin/functions/action.mdx +++ b/docs/features/extensibility/plugin/functions/action.mdx @@ -161,7 +161,7 @@ class Valves(BaseModel): priority: int = 0 # Lower = appears first in the button row ``` -This uses the same priority mechanism as [filter functions](/features/extensibility/plugin/functions/filter), so the behavior is consistent across the plugin system. Without a `priority` valve, actions default to `0` and their order among equal priorities is non-deterministic. +This uses the same priority mechanism as [filter functions](/features/extensibility/plugin/functions/filter), so the behavior is consistent across the plugin system. Without a `priority` valve, actions default to `0` and their order among equal priorities is determined **alphabetically by function ID**. ## Advanced Capabilities diff --git a/docs/features/extensibility/plugin/functions/filter.mdx b/docs/features/extensibility/plugin/functions/filter.mdx index b762bb08..6ad4540e 100644 --- a/docs/features/extensibility/plugin/functions/filter.mdx +++ b/docs/features/extensibility/plugin/functions/filter.mdx @@ -527,7 +527,7 @@ class Filter: | `2` | Runs after priority 1 | :::tip Lower Priority = Earlier Execution -Filters are sorted in **ascending** order by priority. A filter with `priority=0` runs **before** a filter with `priority=1`, which runs before `priority=2`, and so forth. +Filters are sorted in **ascending** order by priority. 
A filter with `priority=0` runs **before** a filter with `priority=1`, which runs before `priority=2`, and so forth. When multiple filters share the same priority value, they are sorted **alphabetically by function ID** for deterministic ordering. ::: --- diff --git a/docs/features/extensibility/plugin/tools/index.mdx b/docs/features/extensibility/plugin/tools/index.mdx index b9e1c1dd..36d94bce 100644 --- a/docs/features/extensibility/plugin/tools/index.mdx +++ b/docs/features/extensibility/plugin/tools/index.mdx @@ -22,6 +22,7 @@ Because there are several ways to integrate "Tools" in Open WebUI, it's importan | **Native MCP (HTTP)** | `Settings > Connections` | Standard MCP servers reachable via HTTP/SSE | External MCP Servers | | **MCP via Proxy (MCPO)** | `Settings > Connections` | Local stdio-based MCP servers (e.g., Claude Desktop tools) | [MCPO Adapter](https://github.com/open-webui/mcpo) | | **OpenAPI Servers** | `Settings > Connections` | Standard REST/OpenAPI web services | External Web APIs | +| **Open Terminal** | `Settings > Integrations` | Full shell access in an isolated Docker container (always-on) | [Open Terminal](https://github.com/open-webui/open-terminal) | ### 1. Native Features (Built-in) These are deeply integrated into Open WebUI and generally don't require external scripts. @@ -88,14 +89,22 @@ You can also let your LLM auto-select the right Tools using the [**AutoTool Filt Open WebUI offers two distinct ways for models to interact with tools: a standard **Default Mode** and a high-performance **Native Mode (Agentic Mode)**. Choosing the right mode depends on your model's capabilities and your performance requirements. -### 🟡 Default Mode (Prompt-based) +### 🟡 Default Mode (Prompt-based) — Legacy + +:::warning Legacy Mode +Default Mode is maintained purely for **backwards compatibility** with older or smaller models that lack native function-calling support. 
+It is considered **legacy** and should not be used when your model supports native tool calling. New deployments should use **Native Mode** exclusively.
+:::
+
 In Default Mode, Open WebUI manages tool selection by injecting a specific prompt template that guides the model to output a tool request.
 
 - **Compatibility**: Works with **practically any model**, including older or smaller local models that lack native function-calling support.
 - **Flexibility**: Highly customizable via prompt templates.
-- **Caveat**: Can be slower (requires extra tokens) and less reliable for complex, multi-step tool chaining.
+- **Caveats**:
+  - Can be slower (requires extra tokens) and less reliable for complex, multi-step tool chaining.
+  - **Breaks KV cache**: The injected prompt changes every turn, preventing LLM engines from reusing cached key-value pairs. This increases latency and cost for every message in the conversation.
+  - Does not support built-in system tools (memory, notes, channels, etc.).
 
-### 🟢 Native Mode (Agentic Mode / System Function Calling)
-Native Mode (also called **Agentic Mode**) leverages the model's built-in capability to handle tool definitions and return structured tool calls (JSON). This is the **recommended mode** for high-performance agentic workflows.
+### 🟢 Native Mode (Agentic Mode / System Function Calling) — Recommended
+Native Mode (also called **Agentic Mode**) leverages the model's built-in capability to handle tool definitions and return structured tool calls (JSON). This is the **recommended mode** for all models that support it — which includes the vast majority of modern models (2024+).
 
 :::warning Model Quality Matters
 **Agentic tool calling requires high-quality models to work reliably.** While small local models may technically support function calling, they often struggle with the complex reasoning required for multi-step tool usage.
 For best results, use frontier models like **GPT-5**, **Claude 4.5 Sonnet**, **Gemini 3 Flash**, or **MiniMax M2.5**. Small local models may produce malformed JSON or fail to follow the strict state management required for agentic behavior.
@@ -103,9 +112,11 @@ Native Mode (also called **Agentic Mode**) leverages the model's built-in capabi
 
 #### Why use Native Mode (Agentic Mode)?
 - **Speed & Efficiency**: Lower latency as it avoids bulky prompt-based tool selection.
+- **KV Cache Friendly**: Tool definitions are sent as structured parameters (not injected into the prompt), so they don't invalidate the KV cache between turns. This can significantly reduce latency and token costs.
 - **Reliability**: Higher accuracy in following tool schemas (with quality models).
 - **Multi-step Chaining**: Essential for **Agentic Research** and **Interleaved Thinking** where a model needs to call multiple tools in succession.
 - **Autonomous Decision-Making**: Models can decide when to search, which tools to use, and how to combine results.
+- **System Tools**: Only Native Mode unlocks the [built-in system tools](#built-in-system-tools-nativeagentic-mode) (memory, notes, knowledge, channels, etc.).
 
 #### How to Enable Native Mode (Agentic Mode)
 Native Mode can be enabled at two levels:
@@ -163,11 +174,14 @@ These models excel at multi-step reasoning, proper JSON formatting, and autonomo
 
 **This is a DeepSeek model/API issue**, not an Open WebUI issue. Open WebUI correctly sends tools in standard OpenAI format — the malformed output originates from DeepSeek's non-standard internal format.
 :::
 
-| Feature | Default Mode | Native Mode |
+| Feature | Default Mode (Legacy) | Native Mode (Recommended) |
 |:---|:---|:---|
+| **Status** | Legacy / backwards compat | ✅ Recommended |
 | **Latency** | Medium/High | Low |
+| **KV Cache** | ❌ Can break cache | ✅ Cache-friendly |
 | **Model Compatibility** | Universal | Requires Tool-Calling Support |
 | **Logic** | Prompt-based (Open WebUI) | Model-native (API/Ollama) |
+| **System Tools** | ❌ Not available | ✅ Full access |
 | **Complex Chaining** | ⚠️ Limited | ✅ Excellent |
 
 ### Built-in System Tools (Native/Agentic Mode)
@@ -179,12 +193,13 @@ These models excel at multi-step reasoning, proper JSON formatting, and autonomo
 | **Search & Web** | *Requires `ENABLE_WEB_SEARCH` enabled AND per-chat "Web Search" toggle enabled.* |
 | `search_web` | Search the public web for information. Best for current events, external references, or topics not covered in internal documents. |
 | `fetch_url` | Visits a URL and extracts text content via the Web Loader. |
-| **Knowledge Base** | *Requires per-model "Knowledge Base" category enabled (default: on).* |
+| **Knowledge Base** | *Requires per-model "Knowledge Base" category enabled (default: on). Which tools are injected depends on whether the model has attached knowledge — see note below.* |
 | `list_knowledge_bases` | List the user's accessible knowledge bases with file counts. **Use this first** to discover what knowledge is available. |
 | `query_knowledge_bases` | Search KB *names and descriptions* by semantic similarity. Use to find which KB is relevant when you don't know which one to query. |
 | `search_knowledge_bases` | Search knowledge bases by name/description (text match). |
 | `query_knowledge_files` | Search *file contents* inside KBs using vector search. **This is your main tool for finding information.** When a KB is attached to the model, searches are automatically scoped to that KB. |
 | `search_knowledge_files` | Search files across accessible knowledge bases by filename (not content). |
+| `view_file` | Get the full content of any user-accessible file by its ID. Only injected when the model has attached knowledge files. |
 | `view_knowledge_file` | Get the full content of a file from a knowledge base. |
 | **Image Gen** | *Requires image generation enabled (per-tool) AND per-chat "Image Generation" toggle enabled.* |
 | `generate_image` | Generates a new image based on a prompt. Requires `ENABLE_IMAGE_GENERATION`. |
@@ -229,6 +244,7 @@ These models excel at multi-step reasoning, proper JSON formatting, and autonomo
 | `search_knowledge_bases` | `query` (required), `count` (default: 5), `skip` (default: 0) | Array of `{id, name, description, file_count}` |
 | `query_knowledge_files` | `query` (required), `knowledge_ids` (optional), `count` (default: 5) | Array of `{id, filename, content_snippet, knowledge_id}` |
 | `search_knowledge_files` | `query` (required), `knowledge_id` (optional), `count` (default: 5), `skip` (default: 0) | Array of `{id, filename, knowledge_id, knowledge_name}` |
+| `view_file` | `file_id` (required) | `{id, filename, content, updated_at, created_at}` |
 | `view_knowledge_file` | `file_id` (required) | `{id, filename, content}` |
 | **Image Gen** | | |
 | `generate_image` | `prompt` (required) | `{status, message, images}` — auto-displayed |
@@ -264,10 +280,52 @@ These models excel at multi-step reasoning, proper JSON formatting, and autonomo
 Open WebUI automatically detects and stores your timezone when you log in. This allows time-related tools and features to provide accurate local times without any manual configuration. Your timezone is determined from your browser settings.
 :::
 
-:::info Recommended KB Tool Workflow
-**When an attached KB is returning empty results:**
-1. First call `list_knowledge_bases` to confirm the model can see the attached KB
-2. Then use `query_knowledge_files` (without specifying `knowledge_ids` — it auto-scopes to attached KBs)
+:::warning Knowledge Tools Change Based on Attached Knowledge
+The set of knowledge tools injected into a model **changes depending on whether the model has knowledge attached** (via the Model Editor). These are **two mutually exclusive sets** — the model gets one or the other, never both.
+
+**Model with attached knowledge** (files, collections, or notes):
+
+| Tool | When Available |
+|------|---------------|
+| `query_knowledge_files` | Always (auto-scoped to attached KBs) |
+| `view_file` | When attached knowledge includes files or collections |
+
+The model **does not** get the browsing tools (`list_knowledge_bases`, `search_knowledge_bases`, etc.) because it doesn't need to discover KBs — the search is automatically scoped to its attachments.
+
+**Model without attached knowledge** (general-purpose):
+
+| Tool | Purpose |
+|------|---------|
+| `list_knowledge_bases` | Discover available KBs |
+| `search_knowledge_bases` | Search KBs by name/description (text match) |
+| `query_knowledge_bases` | Search KBs by name/description (semantic similarity) |
+| `search_knowledge_files` | Search files by filename |
+| `query_knowledge_files` | Search file contents (vector search) |
+| `view_knowledge_file` | Read a full file from a KB |
+
+This model has the full browsing set to autonomously discover and explore any KB the user has access to.
+:::
+
+:::caution Attached Knowledge Still Requires User Access
+Attaching a knowledge base to a custom model does **not** bypass access control. When a user chats with the model, `query_knowledge_files` checks whether **that specific user** has permission to access each attached knowledge item. Items the user cannot access are silently excluded from search results.
+
+**Access requirements by knowledge type:**
+
+| Attached Type | User Needs |
+|---------------|-----------|
+| **Knowledge Base** (collection) | Owner, admin, or explicit read access grant |
+| **Individual File** | Owner or admin only (no access grants) |
+| **Note** | Owner, admin, or explicit read access grant |
+
+**Example scenario**: An admin creates a private knowledge base and attaches it to a custom model shared with all users. Regular users chatting with this model will get **empty results** from `query_knowledge_files` because they don't have read access to the KB itself — even though they can use the model.
+
+**Solution**: Make sure users who need access to the model's knowledge also have read access to the underlying knowledge base (via access grants or group permissions in the Knowledge settings).
+:::
+
+:::info Recommended KB Tool Workflow (No Attached Knowledge)
+When using a model **without** attached knowledge:
+1. First call `list_knowledge_bases` to discover what knowledge is available
+2. Then use `query_knowledge_files` to search file contents within relevant KBs
 3. If still empty, the files may not be embedded yet, or you may have **Full Context mode enabled** which bypasses the vector store
 
 **Do NOT use Full Context mode with knowledge tools** — Full Context injects file content directly and doesn't store embeddings, so `query_knowledge_files` will return empty. Use Focused Retrieval (default) for tool-based access.
diff --git a/docs/getting-started/advanced-topics/scaling.md b/docs/getting-started/advanced-topics/scaling.md
index ce2fba4d..d1c44338 100644
--- a/docs/getting-started/advanced-topics/scaling.md
+++ b/docs/getting-started/advanced-topics/scaling.md
@@ -75,6 +75,7 @@ ENABLE_WEBSOCKET_SUPPORT=true
 
 - If you're using Redis Sentinel for high availability, also set `REDIS_SENTINEL_HOSTS` and consider setting `REDIS_SOCKET_CONNECT_TIMEOUT=5` to prevent hangs during failover.
 - For AWS Elasticache or other managed Redis Cluster services, set `REDIS_CLUSTER=true`.
 - Make sure your Redis server has `timeout 1800` and a high enough `maxclients` (10000+) to prevent connection exhaustion over time.
+- A **single Redis instance** is sufficient for the vast majority of deployments, even with thousands of users. You almost certainly do not need Redis Cluster unless you have specific HA/bandwidth requirements. If you think you need Redis Cluster, first check whether your connection count and memory usage are caused by fixable configuration issues (see [Common Anti-Patterns](/troubleshooting/performance#%EF%B8%8F-common-anti-patterns)).
 - Without Redis in a multi-instance setup, you will experience [WebSocket 403 errors](/troubleshooting/multi-replica#2-websocket-403-errors--connection-failures), [configuration sync issues](/troubleshooting/multi-replica#3-model-not-found-or-configuration-mismatch), and intermittent authentication failures.
 
 For a complete step-by-step Redis setup (Docker Compose, Sentinel, Cluster mode, verification), see the [Redis WebSocket Support](/tutorials/integrations/redis) tutorial. For WebSocket and CORS issues behind reverse proxies, see [Connection Errors](/troubleshooting/connection-error#-https-tls-cors--websocket-issues).
@@ -232,7 +233,40 @@ Each provider has its own set of environment variables for credentials and bucke
 
 ---
 
-## Step 6 — Add Observability
+## Step 6 — Fix Content Extraction & Embeddings
+
+**When:** You process documents regularly (RAG, knowledge bases) and are running in production.
+
+:::danger These Defaults Cause Memory Leaks at Scale
+The default content extraction engine (pypdf) and default embedding engine (SentenceTransformers) are the **two most common causes of memory leaks** in production Open WebUI deployments. Fixing these is just as important as switching to PostgreSQL or adding Redis.
+:::
+
+**What to do:**
+
+1. **Switch the content extraction engine** to an external service:
+
+```
+CONTENT_EXTRACTION_ENGINE=tika
+TIKA_SERVER_URL=http://tika:9998
+```
+
+2. **Switch the embedding engine** to an external provider:
+
+```
+RAG_EMBEDDING_ENGINE=openai
+# or for self-hosted:
+RAG_EMBEDDING_ENGINE=ollama
+```
+
+**Key things to know:**
+
+- The default content extractor (pypdf) has unavoidable **known memory leaks** that cause your Open WebUI process to grow in memory continuously. An external extractor (Tika, Docling) runs in its own process/container, isolating these leaks.
+- The default SentenceTransformers embedding model loads ~500MB per worker process. With 8 workers, that's 4GB of RAM just for embeddings. External embedding eliminates this.
+- For detailed guidance and configuration options, see [Content Extraction Engine](/troubleshooting/performance#content-extraction-engine) and [Embedding Engine](/troubleshooting/performance#embedding-engine) in the Performance guide.
+
+---
+
+## Step 7 — Add Observability
 
 **When:** You want to monitor performance, troubleshoot issues, and understand how your deployment is behaving at scale.
@@ -300,6 +334,14 @@ ENABLE_WEBSOCKET_SUPPORT=true
 # S3_BUCKET_NAME=my-openwebui-bucket
 # S3_REGION_NAME=us-east-1
 
+# Content Extraction (do NOT use default pypdf in production)
+CONTENT_EXTRACTION_ENGINE=tika
+TIKA_SERVER_URL=http://tika:9998
+
+# Embeddings (do NOT use default SentenceTransformers at scale)
+RAG_EMBEDDING_ENGINE=openai
+# or: RAG_EMBEDDING_ENGINE=ollama
+
 # Workers (let orchestrator scale, keep workers at 1)
 UVICORN_WORKERS=1
@@ -311,13 +353,13 @@ ENABLE_DB_MIGRATIONS=false
 
 ## Quick Reference: When Do I Need What?
-| Scenario | PostgreSQL | Redis | External Vector DB | Shared Storage |
-|---|:---:|:---:|:---:|:---:|
-| Single user / evaluation | ✗ | ✗ | ✗ | ✗ |
-| Small team (< 50 users, single instance) | Recommended | ✗ | ✗ | ✗ |
-| Multiple Uvicorn workers | **Required** | **Required** | **Required** | ✗ (same filesystem) |
-| Multiple instances / HA | **Required** | **Required** | **Required** | **Optional** (NFS or S3) |
-| Large scale (1000+ users) | **Required** | **Required** | **Required** | **Optional** (NFS or S3) |
+| Scenario | PostgreSQL | Redis | External Vector DB | Ext. Content Extraction | Ext. Embeddings | Shared Storage |
+|---|:---:|:---:|:---:|:---:|:---:|:---:|
+| Single user / evaluation | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
+| Small team (< 50 users, single instance) | Recommended | ✗ | ✗ | Recommended | ✗ | ✗ |
+| Multiple Uvicorn workers | **Required** | **Required** | **Required** | **Strongly Recommended** | **Strongly Recommended** | ✗ (same filesystem) |
+| Multiple instances / HA | **Required** | **Required** | **Required** | **Strongly Recommended** | **Strongly Recommended** | **Optional** (NFS or S3) |
+| Large scale (1000+ users) | **Required** | **Required** | **Required** | **Strongly Recommended** | **Strongly Recommended** | **Optional** (NFS or S3) |
 
 :::note About "External Vector DB"
 The default ChromaDB uses a local SQLite backend that crashes under multi-process access. "External Vector DB" means either a client-server database (PGVector, Milvus, Qdrant, Pinecone) or ChromaDB running as a separate HTTP server. See [Step 4](#step-4--switch-to-an-external-vector-database) for details.
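To make the content-extraction step in the scaling guide above concrete: the external extractor is typically run as a sidecar container next to Open WebUI. The following is a minimal Docker Compose sketch — the service names and the `latest` tag are illustrative assumptions, not part of the guide:

```yaml
services:
  tika:
    # Apache Tika listens on port 9998 by default; pin a version in production
    image: apache/tika:latest
    restart: unless-stopped

  open-webui:
    image: ghcr.io/open-webui/open-webui:v0.8.6
    environment:
      - CONTENT_EXTRACTION_ENGINE=tika
      # Resolvable because both services share the Compose network
      - TIKA_SERVER_URL=http://tika:9998
    depends_on:
      - tika
```

With this layout, document parsing (and any memory it leaks) happens in the `tika` container, which can be restarted independently of Open WebUI.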
diff --git a/docs/getting-started/quick-start/tab-docker/ManualDocker.md b/docs/getting-started/quick-start/tab-docker/ManualDocker.md
index e0d469d1..5aeb811c 100644
--- a/docs/getting-started/quick-start/tab-docker/ManualDocker.md
+++ b/docs/getting-started/quick-start/tab-docker/ManualDocker.md
@@ -23,7 +23,7 @@ docker pull ghcr.io/open-webui/open-webui:main-slim
 You can also pull a specific Open WebUI release version directly by using a versioned image tag. This is recommended for production environments to ensure stable and reproducible deployments.
 
 ```bash
-docker pull ghcr.io/open-webui/open-webui:v0.8.5
+docker pull ghcr.io/open-webui/open-webui:v0.8.6
 ```
 
 ## Step 2: Run the Container
diff --git a/docs/getting-started/updating.mdx b/docs/getting-started/updating.mdx
index 6c4958f7..338bf32a 100644
--- a/docs/getting-started/updating.mdx
+++ b/docs/getting-started/updating.mdx
@@ -76,9 +76,9 @@ Without a persistent `WEBUI_SECRET_KEY`, a new key is generated each time the co
 
 By default the `:main` tag always points to the latest build.
 For production, pin a specific release:
 
 ```
-ghcr.io/open-webui/open-webui:v0.8.5
-ghcr.io/open-webui/open-webui:v0.8.5-cuda
-ghcr.io/open-webui/open-webui:v0.8.5-ollama
+ghcr.io/open-webui/open-webui:v0.8.6
+ghcr.io/open-webui/open-webui:v0.8.6-cuda
+ghcr.io/open-webui/open-webui:v0.8.6-ollama
 ```
 
 ### Rolling Back
diff --git a/docs/intro.mdx b/docs/intro.mdx
index d158f597..8b440c3b 100644
--- a/docs/intro.mdx
+++ b/docs/intro.mdx
@@ -108,9 +108,9 @@ openwebui/open-webui:-
 Examples (pinned versions for illustration purposes only):
 
 ```
-ghcr.io/open-webui/open-webui:v0.8.5
-ghcr.io/open-webui/open-webui:v0.8.5-ollama
-ghcr.io/open-webui/open-webui:v0.8.5-cuda
+ghcr.io/open-webui/open-webui:v0.8.6
+ghcr.io/open-webui/open-webui:v0.8.6-ollama
+ghcr.io/open-webui/open-webui:v0.8.6-cuda
 ```
 
 ### Using the Dev Branch 🌙
diff --git a/docs/reference/env-configuration.mdx b/docs/reference/env-configuration.mdx
index dbdb39bf..95c70858 100644
--- a/docs/reference/env-configuration.mdx
+++ b/docs/reference/env-configuration.mdx
@@ -217,6 +217,13 @@ is also being used and set to `True`. **Never disable this if OAUTH/SSO is not b
 - Description: Sets the default group ID to assign to new users upon registration.
 - Persistence: This environment variable is a `PersistentConfig` variable.
 
+#### `DEFAULT_GROUP_SHARE_PERMISSION`
+
+- Type: `str`
+- Options: `members`, `true`, `false`
+- Default: `members`
+- Description: Controls the default "Who can share to this group" setting for newly created groups. `members` means only group members can share to the group, `true` means anyone can share, and `false` means no one can share to the group. This applies both to groups created manually and groups created automatically (e.g. via SCIM or OAuth group sync). Existing groups are not affected — this only sets the initial default for new groups.
+
 #### `PENDING_USER_OVERLAY_TITLE`
 
 - Type: `str`
@@ -1083,6 +1090,36 @@ The JSON data structure of `TOOL_SERVER_CONNECTIONS` might evolve over time as n
 
 :::
 
+### Terminal Server
+
+#### `TERMINAL_SERVER_CONNECTIONS`
+
+- Type: `str` (JSON array)
+- Default: `[]`
+- Description: Specifies a JSON array of Open Terminal server connection configurations. Each connection defines the parameters needed to connect to an [Open Terminal](/features/extensibility/open-terminal) instance. Unlike user-level tool server connections, these are admin-configured and proxied through Open WebUI, which means the terminal URL and API key are never exposed to the browser. Supports group-based access control via `access_grants`.
+- Example:
+```json
+[
+  {
+    "id": "unique-id",
+    "url": "http://open-terminal:8000",
+    "key": "your-api-key",
+    "name": "Dev Terminal",
+    "auth_type": "bearer",
+    "config": {
+      "access_grants": []
+    }
+  }
+]
+```
+- Persistence: This environment variable is a `PersistentConfig` variable.
+
+:::warning
+
+The JSON data structure of `TERMINAL_SERVER_CONNECTIONS` might evolve over time as new features are added.
+
+:::
+
 ### Autocomplete
 
 #### `ENABLE_AUTOCOMPLETE_GENERATION`
@@ -5215,6 +5252,12 @@ This is useful when you need a JWT access token for downstream validation or whe
 - Description: Enables or disables user permission to upload files to chats.
 - Persistence: This environment variable is a `PersistentConfig` variable.
 
+#### `USER_PERMISSIONS_CHAT_WEB_UPLOAD`
+
+- Type: `bool`
+- Default: `True`
+- Description: Enables or disables user permission to attach web pages (URLs) in chats via the "Attach Webpage" option. When set to `False`, the "Attach Webpage" button is hidden for non-admin users. Also configurable per-group in **Admin Panel → Users → Groups → Permissions → Chat Permissions → Allow Web Upload**.
+
 #### `USER_PERMISSIONS_CHAT_DELETE`
 
 - Type: `bool`
@@ -5499,6 +5542,14 @@ These settings control whether users can share workspace items **publicly**.
 - Description: Enables or disables **public sharing** of notes.
 - Persistence: This environment variable is a `PersistentConfig` variable.
 
+### Access Grants
+
+#### `USER_PERMISSIONS_ACCESS_GRANTS_ALLOW_USERS`
+
+- Type: `bool`
+- Default: `True`
+- Description: Controls whether non-admin users can share resources (knowledge bases, models, prompts, notes, skills, and tools) with **specific individual users**. When set to `False`, individual user grants are silently stripped from access grant lists — group sharing and public sharing remain unaffected. Admins always retain the ability to share with individual users regardless of this setting. Also configurable per-group in **Admin Panel → Users → Groups → Permissions → Access Grants → Allow Sharing With Users**.
+
 ### Import / Export
 
 #### `USER_PERMISSIONS_WORKSPACE_MODELS_IMPORT`
diff --git a/docs/troubleshooting/connection-error.mdx b/docs/troubleshooting/connection-error.mdx
index 83d73b15..0b352d2f 100644
--- a/docs/troubleshooting/connection-error.mdx
+++ b/docs/troubleshooting/connection-error.mdx
@@ -163,6 +163,35 @@ Disabling proxy buffering also **significantly improves streaming speed**, as re
 - **Traefik**: Check compression/buffering middleware settings
 - **Caddy**: Generally handles SSE correctly by default, but check for any buffering plugins
 
+## 🌐 Frontend vs. Backend Connections (localhost Confusion)
+
+Several Open WebUI features offer **two ways** to configure connections: a **user/direct** method (from the browser) and an **admin/global** method (from the backend). These work at completely different network levels, and the same URL can succeed in one and fail in the other.
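The distinction drawn in this section can be demonstrated outside Open WebUI entirely: name resolution always happens on the machine executing the request. A standalone Python sketch (illustrative only — not Open WebUI code):

```python
import socket

# "localhost" resolves relative to wherever this code runs. Executed in your
# browser's context (user/direct connections), that is your workstation;
# executed by the Open WebUI backend (admin/global connections), it is the
# backend container itself — a different machine with a different network view.
print(socket.gethostbyname("localhost"))  # typically 127.0.0.1

# This is why an admin/global connection must use a name the *backend* can
# actually reach — a Docker service name, host.docker.internal, or an
# internal IP — rather than an address meaningful only to your browser.
```

Running the same lookup inside the Open WebUI container (e.g. via `docker exec`) and on your workstation makes the two network perspectives visible directly.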
+
+### The Core Rule
+
+| Request Origin | What `localhost` Means | Who Uses This |
+| :--- | :--- | :--- |
+| **Browser (client-side)** | The machine running the browser | User Tool Servers, User-Configured Terminals, Direct Connections |
+| **Backend (server-side)** | The machine/container running Open WebUI | Global Tool Servers, Admin-Configured Terminals, Ollama connections |
+
+### Why the Same URL Can Work and Fail
+
+When you add a URL like `https://myserver.com/api` as a **user/direct** connection, your browser resolves `myserver.com` and connects directly. When you add the same URL as an **admin/global** connection, the Open WebUI backend resolves that hostname — and inside a Docker container, it may resolve to `127.0.0.1`, bypassing your reverse proxy entirely.
+
+**Common symptoms:**
+- 502 Bad Gateway on admin-configured connections while user connections work fine
+- `Connect call failed ('127.0.0.1', ...)` in backend logs
+- Connection timeout on global tool servers
+
+**Fix:** For backend/admin connections, use the **internal URL** that the backend can actually reach:
+- Docker service names (e.g. `http://open-terminal:8000`)
+- `host.docker.internal` (to reach the host machine from inside Docker)
+- Internal IPs (e.g. `http://192.168.1.50:8000`)
+
+:::tip
+This applies to **all** backend-proxied connections in Open WebUI — not just Open Terminal. The same pattern affects [Tool Server connections](/features/extensibility/plugin/tools/openapi-servers/open-webui#main-difference-where-are-requests-made-from), [Open Terminal admin connections](/features/extensibility/open-terminal#networking--connectivity), and Ollama/OpenAI API endpoints.
+:::
+
 ## 🌟 Connection to Ollama Server
 
 ### 🚀 Accessing Ollama from Open WebUI
diff --git a/docs/troubleshooting/performance.md b/docs/troubleshooting/performance.md
index 62eecd00..5b7cc77b 100644
--- a/docs/troubleshooting/performance.md
+++ b/docs/troubleshooting/performance.md
@@ -134,6 +134,40 @@ For multi-user setups, the choice of Vector DB matters.
   * `ENABLE_MILVUS_MULTITENANCY_MODE=True`
   * `ENABLE_QDRANT_MULTITENANCY_MODE=True`
 
+### Content Extraction Engine
+
+:::danger Default Content Extractor Causes Memory Leaks
+The **default content extraction engine** uses Python libraries including **pypdf**, which are known to have **persistent memory leaks** during document ingestion. In production deployments with regular document uploads, this will cause Open WebUI's memory usage to grow continuously until the process is killed or the container is restarted.
+
+This is the **#1 cause of unexplained memory growth** in production deployments.
+:::
+
+**Recommendation**: Switch to an external content extraction engine for any deployment that processes documents regularly:
+
+| Engine | Best For | Configuration |
+|---|---|---|
+| **Apache Tika** | General-purpose, widely used, handles most document types | `CONTENT_EXTRACTION_ENGINE=tika` + `TIKA_SERVER_URL=http://tika:9998` |
+| **Docling** | High-quality extraction with layout-aware parsing | `CONTENT_EXTRACTION_ENGINE=docling` |
+| **External Loader** | Recommended for production and custom extraction pipelines | `CONTENT_EXTRACTION_ENGINE=external` + `EXTERNAL_DOCUMENT_LOADER_URL=...` |
+
+Using an external extractor moves the memory-intensive parsing out of the Open WebUI process entirely, eliminating this class of memory leaks.
+
+### Embedding Engine
+
+:::warning SentenceTransformers at Scale
+The **default SentenceTransformers** embedding engine (all-MiniLM-L6-v2) loads a machine learning model into the Open WebUI process memory.
+While lightweight enough for personal use, at scale this model:
+
+- **Consumes significant RAM** (~500MB+ per worker process)
+- **Blocks the event loop** during embedding operations on older versions
+- **Multiplies with workers** — each Uvicorn worker loads its own copy of the model
+
+For multi-user or production deployments, **offload embeddings to an external service**.
+:::
+
+- **Recommended**: Use `RAG_EMBEDDING_ENGINE=openai` (for cloud embeddings via OpenAI, Azure, or compatible APIs) or `RAG_EMBEDDING_ENGINE=ollama` (for self-hosted embedding via Ollama with models like `nomic-embed-text`).
+- **Env Var**: `RAG_EMBEDDING_ENGINE=openai`
+- **Effect**: The embedding model is no longer loaded into the Open WebUI process, freeing hundreds of MB of RAM per worker.
+
 ### Optimizing Document Chunking
 
 The way your documents are chunked directly impacts both storage efficiency and retrieval quality.
@@ -359,12 +393,43 @@ If resource usage is critical, disable automated features that constantly trigge
 
 *Target: Many concurrent users, Stability > Persistence.*
 
 1. **Database**: **PostgreSQL** (Mandatory).
-2. **Workers**: `THREAD_POOL_SIZE=2000` (Prevent timeouts).
-3. **Streaming**: `CHAT_RESPONSE_STREAM_DELTA_CHUNK_SIZE=7` (Reduce CPU/Net/DB writes).
-4. **Chat Saving**: `ENABLE_REALTIME_CHAT_SAVE=False`.
-5. **Vector DB**: **Milvus**, **Qdrant**, or **PGVector**. **Do not use ChromaDB's default local mode** — its SQLite backend will crash under multi-worker/multi-replica access.
-6. **Task Model**: External/Hosted (Offload compute).
-7. **Caching**: `ENABLE_BASE_MODELS_CACHE=True`, `MODELS_CACHE_TTL=300`, `ENABLE_QUERIES_CACHE=True`.
+2. **Content Extraction**: **Tika** or **Docling** (Mandatory — default pypdf leaks memory). See [Content Extraction Engine](#content-extraction-engine).
+3. **Embeddings**: **External** — `RAG_EMBEDDING_ENGINE=openai` or `ollama` (Mandatory — default SentenceTransformers consumes too much RAM at scale). See [Embedding Engine](#embedding-engine).
+4. **Tool Calling**: **Native Mode** (strongly recommended — Default Mode is legacy and breaks KV cache). See [Tool Calling Modes](/features/extensibility/plugin/tools#tool-calling-modes-default-vs-native).
+5. **Workers**: `THREAD_POOL_SIZE=2000` (Prevent timeouts).
+6. **Streaming**: `CHAT_RESPONSE_STREAM_DELTA_CHUNK_SIZE=7` (Reduce CPU/Net/DB writes).
+7. **Chat Saving**: `ENABLE_REALTIME_CHAT_SAVE=False`.
+8. **Vector DB**: **Milvus**, **Qdrant**, or **PGVector**. **Do not use ChromaDB's default local mode** — its SQLite backend will crash under multi-worker/multi-replica access.
+9. **Task Model**: External/Hosted (Offload compute).
+10. **Caching**: `ENABLE_BASE_MODELS_CACHE=True`, `MODELS_CACHE_TTL=300`, `ENABLE_QUERIES_CACHE=True`.
+11. **Redis**: Single instance with `timeout 1800` and high `maxclients` (10000+). See [Redis Tuning](#redis-tuning) below.
+
+#### Redis Tuning
+
+A single Redis instance is sufficient for the vast majority of deployments, including those with thousands of users. **You almost certainly do not need Redis Cluster or Redis Sentinel** unless you have specific HA requirements.
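The Redis settings referenced throughout this section correspond to a small `redis.conf` fragment along these lines (the values are the recommendations given here, not Redis defaults):

```
# Close idle client connections after 30 minutes (value is in seconds)
timeout 1800

# Enough headroom for many Open WebUI replicas/workers
maxclients 10000
```

Both directives can also be applied at runtime with `CONFIG SET`, but putting them in `redis.conf` makes them survive restarts.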
+
+Common Redis configuration issues that cause unnecessary scaling:
+
+| Issue | Symptom | Fix |
+|---|---|---|
+| **Stale connections** | Redis runs out of connections or memory grows indefinitely | Set `timeout 1800` in redis.conf (kills idle connections after 30 minutes) |
+| **Low maxclients** | `max number of clients reached` errors | Set `maxclients 10000` or higher |
+| **No connection limits** | Open WebUI pods may accumulate connections that never close | Combine `timeout` with connection pool limits in your Redis client config |
+
+---
+
+## ⚠️ Common Anti-Patterns
+
+These are real-world mistakes that cause organizations to massively over-provision infrastructure:
+
+| Anti-Pattern | What Happens | Fix |
+|---|---|---|
+| **Using default content extractor in production** | pypdf leaks memory → containers restart constantly → you add more replicas to compensate | Switch to Tika or Docling (`CONTENT_EXTRACTION_ENGINE=tika`) |
+| **Running SentenceTransformers at scale** | Each worker loads ~500MB embedding model → RAM usage explodes → you add more machines | Use external embeddings (`RAG_EMBEDDING_ENGINE=openai` or `ollama`) |
+| **Redis Cluster when single Redis suffices** | Too many replicas → too many connections → Redis can't handle them → you deploy Redis Cluster to compensate | Fix the root cause (fewer replicas, `timeout 1800`, `maxclients 10000`) |
+| **Scaling replicas to mask memory leaks** | Leaky processes → OOM kills → auto-scaler adds more pods → more Redis connections → Redis overwhelmed | Fix the leaks first (content extraction, embedding engine), then right-size |
+| **Using Default (prompt-based) tool calling** | Injected prompts may break KV cache → higher latency → more resources needed per request | Switch to Native Mode for all capable models |
+| **Not configuring Redis stale connection timeout** | Connections accumulate forever → Redis OOM → you deploy Redis Cluster | Add `timeout 1800` to redis.conf |
 
 ---
@@ -384,6 +449,7 @@ For detailed information on all available variables, see the [Environment Config
 | `CHAT_RESPONSE_STREAM_DELTA_CHUNK_SIZE` | [Streaming Chunk Size](/reference/env-configuration#chat_response_stream_delta_chunk_size) |
 | `THREAD_POOL_SIZE` | [Thread Pool Size](/reference/env-configuration#thread_pool_size) |
 | `RAG_EMBEDDING_ENGINE` | [Embedding Engine](/reference/env-configuration#rag_embedding_engine) |
+| `CONTENT_EXTRACTION_ENGINE` | [Content Extraction Engine](/reference/env-configuration#content_extraction_engine) |
 | `AUDIO_STT_ENGINE` | [STT Engine](/reference/env-configuration#audio_stt_engine) |
 | `ENABLE_IMAGE_GENERATION` | [Image Generation](/reference/env-configuration#enable_image_generation) |
 | `ENABLE_AUTOCOMPLETE_GENERATION` | [Autocomplete](/reference/env-configuration#enable_autocomplete_generation) |
diff --git a/docs/tutorials/tips/sqlite-database.md b/docs/tutorials/tips/sqlite-database.md
index 4f221394..bdf1a29c 100644
--- a/docs/tutorials/tips/sqlite-database.md
+++ b/docs/tutorials/tips/sqlite-database.md
@@ -10,7 +10,7 @@ This tutorial is a community contribution and is not supported by the Open WebUI
 :::
 
 > [!WARNING]
-> This documentation was created/updated based on version 0.8.5 and updated for recent migrations.
+> This documentation was created/updated based on version 0.8.6 and updated for recent migrations.
 
 ## Open-WebUI Internal SQLite Database