From a82cdffc83d1397142a3c695d8033538cb5a364a Mon Sep 17 00:00:00 2001
From: Riskey
Date: Fri, 26 Dec 2025 11:48:58 +0800
Subject: [PATCH] Add more notes about the extracted image url

---
 .../create-knowledge/import-text-data/readme.mdx | 16 ++++++++--------
 .../knowledge-pipeline-orchestration.mdx | 6 ++++--
 .../maintain-knowledge-documents.mdx | 6 +++---
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/en/use-dify/knowledge/create-knowledge/import-text-data/readme.mdx b/en/use-dify/knowledge/create-knowledge/import-text-data/readme.mdx
index fc0136ec..061ef34a 100644
--- a/en/use-dify/knowledge/create-knowledge/import-text-data/readme.mdx
+++ b/en/use-dify/knowledge/create-knowledge/import-text-data/readme.mdx
@@ -30,6 +30,10 @@ When quick-creating a knowledge base, you can upload local files as its data sou
 JPG, JPEG, PNG, and GIF images under 2 MB are automatically extracted as attachments to their corresponding chunks. These images can be managed independently and are returned alongside their chunks during retrieval.
+ URLs of extracted images remain in the chunk text, but you can safely remove these URLs to keep the text clean—this won't affect the extracted images.
+
+ If you select a multimodal embedding model (marked with a **Vision** icon) in index settings, the extracted images will also be embedded and indexed for retrieval.
+
 Each chunk supports up to 10 image attachments; images beyond this limit will not be extracted.
@@ -44,15 +48,11 @@ When quick-creating a knowledge base, you can upload local files as its data sou
 - Images embedded in DOCX files
-
- Images embedded in other file types (e.g., PDF) can only be extracted by using appropriate document extraction plugins in [knowledge pipelines](/en/use-dify/knowledge/knowledge-pipeline/readme).
-
+
+ Images embedded in other file types (e.g., PDF) can be extracted by using appropriate document extraction plugins in [knowledge pipelines](/en/use-dify/knowledge/knowledge-pipeline/readme).
+
 - Images referenced via accessible URLs using the following Markdown syntax in any file type:
 - `![alt text](image_url)`
- - `![alt text](image_url "optional title")`
-
-
- If you select a multimodal embedding model (marked with a **Vision** icon) in subsequent index settings, the extracted images will be embedded and indexed for retrieval.
-
\ No newline at end of file
+ - `![alt text](image_url "optional title")`
\ No newline at end of file
diff --git a/en/use-dify/knowledge/knowledge-pipeline/knowledge-pipeline-orchestration.mdx b/en/use-dify/knowledge/knowledge-pipeline/knowledge-pipeline-orchestration.mdx
index 56973713..6ca5619d 100644
--- a/en/use-dify/knowledge/knowledge-pipeline/knowledge-pipeline-orchestration.mdx
+++ b/en/use-dify/knowledge/knowledge-pipeline/knowledge-pipeline-orchestration.mdx
@@ -196,7 +196,9 @@ You can choose Dify's Doc Extractor to process files, or select tools based on y
-Images in documents can be extracted using appropriate doc processors. Extracted images are attached to their corresponding chunks, can be managed independently, and are returned alongside those chunks during retrieval.
+Images in documents can be extracted using appropriate document processors. Extracted images are attached to their corresponding chunks, can be managed independently, and are returned alongside those chunks during retrieval.
+
+URLs of extracted images remain in the chunk text, but you can safely remove these URLs to keep the text clean—this won't affect the extracted images.
 Each chunk supports up to 10 image attachments; images beyond this limit will not be extracted.
@@ -213,7 +215,7 @@ If no images are extracted by the selected processor, Dify will automatically ex
 - Maximum number of attachments per chunk: `SINGLE_CHUNK_ATTACHMENT_LIMIT`
-If you select a multimodal embedding model (marked with a **Vision** icon) in subsequent index settings, the extracted images will be embedded and indexed for retrieval.
+If you select a multimodal embedding model (marked with a **Vision** icon) in index settings, the extracted images will also be embedded and indexed for retrieval.
diff --git a/en/use-dify/knowledge/manage-knowledge/maintain-knowledge-documents.mdx b/en/use-dify/knowledge/manage-knowledge/maintain-knowledge-documents.mdx
index 470f9cd1..c959dc4b 100644
--- a/en/use-dify/knowledge/manage-knowledge/maintain-knowledge-documents.mdx
+++ b/en/use-dify/knowledge/manage-knowledge/maintain-knowledge-documents.mdx
@@ -32,7 +32,7 @@ According to its chunk settings, every document is split into content chunks—t
 From the chunk list within a document, you can view and manage all its chunks to improve the retrieval efficiency and accuracy.
- Click the document name in the upper—left corner to quickly switch between documents.
+ Click the document name in the upper-left corner to quickly switch between documents.
 ![Manage Knowledge Chunks](/images/manage_document_chunks.png)
@@ -42,9 +42,9 @@ From the chunk list within a document, you can view and manage all its chunks to
 | Add | Add one or batch add multiple new chunks.

For documents chunked with Parent-child mode, both new parent and child chunks can be added. *Add chunks* is a paid feature on Dify Cloud. [Upgrade to Professional or Team](https://dify.ai/pricing) to use it.|
 | Delete | Permanently remove a chunk. **Deletion cannot be undone**.|
 | Enable / Disable | Temporarily include or exclude a chunk from retrieval. Disabled chunks cannot be edited.|
-| Edit | Modify the content of a chunk. Edited chunks are marked **Edited**.

For documents chunked with Parent-child mode: When images in documents are extracted as chunk attachments, their URLs remain in the chunk text. Deleting these URLs won't affect the extracted image attachments.|
+| Edit | Modify the content of a chunk. Edited chunks are marked **Edited**.

For documents chunked with Parent-child mode: |
 | Add / Edit / Delete Keywords | In knowledge bases using the Economical index method, you can add or modify keywords for each chunk to improve its retrievability.

Each chunk can have up to 10 keywords.|
-| Add / Delete Image Attachments | Delete images extracted from documents or upload new ones within their corresponding chunk.

Image attachments and their chunks can be edited independently without affecting each other. Each chunk can have up to 10 image attachments, which are returned alongside it during retrieval; images beyond this limit will not be extracted.

For self-hosted deployments, you can adjust this limit via the environment variable `SINGLE_CHUNK_ATTACHMENT_LIMIT`.
To enable cross-modal retrieval—retrieving both text and images based on semantic relevance, choose a multimodal embedding model (marked with a **Vision** icon) for the knowledge base.

Image attachments will then be embedded and indexed for retrieval.
|
+| Add / Delete Image Attachments | Delete images extracted from documents or upload new ones within their corresponding chunk.

URLs of extracted images remain in the chunk text, but you can safely remove these URLs to keep the text clean—this won't affect the extracted images. Each chunk can have up to 10 image attachments, which are returned alongside it during retrieval; images beyond this limit will not be extracted.

For self-hosted deployments, you can adjust this limit via the environment variable `SINGLE_CHUNK_ATTACHMENT_LIMIT`.
If you select a multimodal embedding model (marked with a **Vision** icon), the extracted images will also be embedded and indexed for retrieval.|
 ## Best Practices
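The notes added in this patch say that URLs of extracted images stay in the chunk text and can be removed without affecting the image attachments. The following is a minimal Python sketch of that kind of cleanup. It assumes the leftover URLs appear as Markdown image references in the two forms the patch lists, `![alt text](image_url)` and `![alt text](image_url "optional title")`. The helper names `find_image_urls` and `strip_image_refs` are hypothetical and are not part of Dify. The pattern also assumes URLs contain no whitespace or `)` character.

```python
import re

# Hypothetical pattern (not a Dify API) matching a Markdown image reference
# in either form listed in the patch:
#   ![alt text](image_url)
#   ![alt text](image_url "optional title")
# Assumption: the URL contains no whitespace and no ")" character.
IMAGE_REF = re.compile(
    r'!\[(?P<alt>[^\]]*)\]'        # ![alt text]
    r'\((?P<url>[^\s)]+)'          # (image_url
    r'(?:\s+"(?P<title>[^"]*)")?'  # optional "title"
    r'\)'                          # closing parenthesis
)


def find_image_urls(text: str) -> list[str]:
    """Return the image URLs referenced in a chunk's text, in order."""
    return [m.group("url") for m in IMAGE_REF.finditer(text)]


def strip_image_refs(text: str) -> str:
    """Remove Markdown image references from a chunk's text.

    Leading indentation is kept. Runs of two or more spaces or tabs that
    follow a non-space character become a single space, and trailing
    whitespace is trimmed from each line.
    """
    cleaned = IMAGE_REF.sub("", text)
    cleaned = re.sub(r"(?<=\S)[ \t]{2,}", " ", cleaned)
    return "\n".join(line.rstrip() for line in cleaned.splitlines())


if __name__ == "__main__":
    chunk = (
        'Click Save. ![save button](https://example.com/save.png "Save")\n'
        "The dialog closes. ![dialog](https://example.com/dialog.gif)"
    )
    print(find_image_urls(chunk))
    print(strip_image_refs(chunk))
```

The pattern ends each URL at the first whitespace or `)`. URLs that contain spaces or unescaped parentheses would need a full Markdown parser instead of a regular expression.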