mirror of
https://github.com/langgenius/dify-docs.git
synced 2026-03-27 13:28:32 +07:00
* draft * migrate from old docs repo * adjust content based on experiencing the feature * Improvements * changes upon feedback * refinements * zh draft * add plugin dev docs * update old links * add jp docs * change the position of variables related to multimodal embedding in the environment variable doc --------- Co-authored-by: Riskey <riskey47@dify.ai>
276 lines
9.5 KiB
Plaintext
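---
title: Build a Tool Plugin for Processing Multimodal Data in Knowledge Pipelines
---

<Note> ⚠️ This document was translated automatically by AI. If you find any inaccuracies, refer to the [original English version](/en/develop-plugin/dev-guides-and-walkthroughs/develop-multimodal-data-processing-tool).</Note>

In a knowledge pipeline, the Knowledge Base node accepts input in two multimodal data formats: `multimodal-Parent-Child` and `multimodal-General`.

When you develop a tool plugin that processes multimodal data, complete the following configuration so that the multimodal data your plugin outputs (such as text, images, audio, and video) is correctly recognized and vectorized by the Knowledge Base node:

- **In the tool code**, call the upload API to upload each file and build the `files` objects from the result.
- **In the tool YAML file**, declare `output_schema` as `multimodal-Parent-Child` or `multimodal-General`.

## Upload Files and Build File Objects

When processing multimodal data such as images, first upload the file through Dify's tool session API to obtain its metadata.

The following example, taken from Dify Extractor (an official Dify plugin), shows how to upload a file and build a file object.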
```python
# Upload the file using the tool session
file_res = self._tool.session.file.upload(
    file_name,  # filename
    file_blob,  # file binary data
    mime_type,  # MIME type, e.g., "image/png"
)

# Generate a Markdown image reference using the file preview URL
image_url = f"![image]({file_res.preview_url})"
```
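The `upload` call needs a MIME type for every file. If your extractor only has a file name to go on, Python's standard `mimetypes` module is one way to supply it. This is a minimal sketch: `guess_mime_type` is a hypothetical helper, and the `application/octet-stream` fallback for unknown extensions is an assumption rather than a Dify requirement.

```python
import mimetypes


def guess_mime_type(file_name: str) -> str:
    """Guess the MIME type to pass to the upload call from a file name.

    Hypothetical helper: falls back to a generic binary type when the
    extension is unknown (an assumption, not a Dify requirement).
    """
    mime_type, _encoding = mimetypes.guess_type(file_name)
    return mime_type or "application/octet-stream"


print(guess_mime_type("page_1_image_0.png"))  # image/png
print(guess_mime_type("payload.unknownext"))  # application/octet-stream
```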
The upload API returns an `UploadFileResponse` object containing the file's basic information:
```python
from enum import Enum

from pydantic import BaseModel


class UploadFileResponse(BaseModel):
    class Type(str, Enum):
        DOCUMENT = "document"
        IMAGE = "image"
        VIDEO = "video"
        AUDIO = "audio"

        @classmethod
        def from_mime_type(cls, mime_type: str):
            if mime_type.startswith("image/"):
                return cls.IMAGE
            if mime_type.startswith("video/"):
                return cls.VIDEO
            if mime_type.startswith("audio/"):
                return cls.AUDIO
            return cls.DOCUMENT

    id: str
    name: str
    size: int
    extension: str
    mime_type: str
    type: Type | None = None
    preview_url: str | None = None
```
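`Type.from_mime_type` classifies only by MIME prefix, so anything that is not `image/*`, `video/*`, or `audio/*` (a PDF or a plain-text file, for example) ends up as a document. The standalone sketch below mirrors that logic using only the standard library so it runs without pydantic; `FileType` is a hypothetical copy of the nested enum, not part of the Dify SDK.

```python
from enum import Enum


class FileType(str, Enum):
    """Hypothetical stand-in mirroring UploadFileResponse.Type."""

    DOCUMENT = "document"
    IMAGE = "image"
    VIDEO = "video"
    AUDIO = "audio"

    @classmethod
    def from_mime_type(cls, mime_type: str) -> "FileType":
        # Same prefix checks as the model above; everything else is a document.
        if mime_type.startswith("image/"):
            return cls.IMAGE
        if mime_type.startswith("video/"):
            return cls.VIDEO
        if mime_type.startswith("audio/"):
            return cls.AUDIO
        return cls.DOCUMENT


for mime in ("image/png", "audio/mpeg", "application/pdf", "text/plain"):
    print(mime, "->", FileType.from_mime_type(mime).value)
```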
Based on the structure of `UploadFileResponse`, you can map the file information (such as `name`, `size`, `extension`, and `mime_type`) to the `files` field of the multimodal output structure.

<CodeGroup>

```json multimodal_parent_child_structure highlight={22-62} expandable
{
  "$id": "https://dify.ai/schemas/v1/multimodal_parent_child_structure.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "version": "1.0.0",
  "type": "object",
  "title": "Multimodal Parent-Child Structure",
  "description": "Schema for multimodal parent-child structure (v1)",
  "properties": {
    "parent_mode": {
      "type": "string",
      "description": "The mode of parent-child relationship"
    },
    "parent_child_chunks": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "parent_content": {
            "type": "string",
            "description": "The parent content"
          },
          "files": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "name": {
                  "type": "string",
                  "description": "file name"
                },
                "size": {
                  "type": "number",
                  "description": "file size"
                },
                "extension": {
                  "type": "string",
                  "description": "file extension"
                },
                "type": {
                  "type": "string",
                  "description": "file type"
                },
                "mime_type": {
                  "type": "string",
                  "description": "file mime type"
                },
                "transfer_method": {
                  "type": "string",
                  "description": "file transfer method"
                },
                "url": {
                  "type": "string",
                  "description": "file url"
                },
                "related_id": {
                  "type": "string",
                  "description": "file related id"
                }
              },
              "required": ["name", "size", "extension", "type", "mime_type", "transfer_method", "url", "related_id"]
            },
            "description": "List of files"
          },
          "child_contents": {
            "type": "array",
            "items": {
              "type": "string"
            },
            "description": "List of child contents"
          }
        },
        "required": ["parent_content", "child_contents"]
      },
      "description": "List of parent-child chunk pairs"
    }
  },
  "required": ["parent_mode", "parent_child_chunks"]
}
```
```json multimodal_general_structure highlight={18-56} expandable
{
  "$id": "https://dify.ai/schemas/v1/multimodal_general_structure.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "version": "1.0.0",
  "type": "array",
  "title": "Multimodal General Structure",
  "description": "Schema for multimodal general structure (v1) - array of objects",
  "properties": {
    "general_chunks": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "content": {
            "type": "string",
            "description": "The content"
          },
          "files": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "name": {
                  "type": "string",
                  "description": "file name"
                },
                "size": {
                  "type": "number",
                  "description": "file size"
                },
                "extension": {
                  "type": "string",
                  "description": "file extension"
                },
                "type": {
                  "type": "string",
                  "description": "file type"
                },
                "mime_type": {
                  "type": "string",
                  "description": "file mime type"
                },
                "transfer_method": {
                  "type": "string",
                  "description": "file transfer method"
                },
                "url": {
                  "type": "string",
                  "description": "file url"
                },
                "related_id": {
                  "type": "string",
                  "description": "file related id"
                }
              },
              "description": "List of files"
            }
          }
        },
        "required": ["content"]
      },
      "description": "List of content and files"
    }
  }
}
```

</CodeGroup>
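Both schemas describe the same properties for each `files` item, but `UploadFileResponse` has no `transfer_method`, `url`, or `related_id` field, so those values must be derived. The sketch below shows one plausible mapping under stated assumptions: `transfer_method` is set to `"tool_file"`, `url` reuses `preview_url`, `related_id` reuses the upload `id`, and a missing `type` falls back to `"document"`. Verify these against the Dify Extractor source before relying on them. `UploadedFile` is a stdlib stand-in for `UploadFileResponse`, and `to_file_entry` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional

# Copied from the `required` list of the files items in the parent-child schema.
REQUIRED_FILE_KEYS = [
    "name", "size", "extension", "type",
    "mime_type", "transfer_method", "url", "related_id",
]


@dataclass
class UploadedFile:
    """Stdlib stand-in for UploadFileResponse, using the same field names."""

    id: str
    name: str
    size: int
    extension: str
    mime_type: str
    type: Optional[str] = None
    preview_url: Optional[str] = None


def to_file_entry(res: UploadedFile) -> dict:
    """Map an upload response to one item of a schema `files` array.

    Assumptions (not confirmed by the Dify docs): transfer_method is
    "tool_file", url reuses preview_url, related_id reuses the upload id,
    and a missing type falls back to "document".
    """
    return {
        "name": res.name,
        "size": res.size,
        "extension": res.extension,
        "type": res.type or "document",
        "mime_type": res.mime_type,
        "transfer_method": "tool_file",
        "url": res.preview_url or "",
        "related_id": res.id,
    }


# Placeholder values for illustration only.
uploaded = UploadedFile(
    id="file-123",
    name="chart.png",
    size=2048,
    extension=".png",
    mime_type="image/png",
    type="image",
    preview_url="https://example.com/files/file-123/preview",
)
entry = to_file_entry(uploaded)
missing = [key for key in REQUIRED_FILE_KEYS if key not in entry]
print(missing)  # []
```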
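## Declare the Multimodal Output Structure

The structure of multimodal data is defined by official JSON Schemas provided by Dify.

For the Knowledge Base node to recognize your plugin's multimodal output type, point the `result` field of `output_schema` in the plugin's tool YAML file to the corresponding official schema URL with `$ref`: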
```yaml
output_schema:
  type: object
  properties:
    result:
      # For multimodal-Parent-Child:
      $ref: "https://dify.ai/schemas/v1/multimodal_parent_child_structure.json"

      # For multimodal-General, use this $ref instead:
      # $ref: "https://dify.ai/schemas/v1/multimodal_general_structure.json"
```
Using `multimodal-Parent-Child` as an example, a complete YAML file looks like this:
```yaml expandable
identity:
  name: multimodal_tool
  author: langgenius
  label:
    en_US: multimodal tool
    zh_Hans: 多模态提取器
    pt_BR: multimodal tool
description:
  human:
    en_US: Process documents into multimodal-Parent-Child chunk structures
    zh_Hans: 将文档处理为多模态父子分块结构
    pt_BR: Processar documentos em estruturas de divisão pai-filho
  llm: Processes documents into hierarchical multimodal-Parent-Child chunk structures

parameters:
  - name: input_text
    human_description:
      en_US: The text you want to chunk.
      zh_Hans: 输入文本
      pt_BR: Conteúdo de Entrada
    label:
      en_US: Input Content
      zh_Hans: 输入文本
      pt_BR: Conteúdo de Entrada
    llm_description: The text you want to chunk.
    required: true
    type: string
    form: llm

output_schema:
  type: object
  properties:
    result:
      $ref: "https://dify.ai/schemas/v1/multimodal_parent_child_structure.json"
extra:
  python:
    source: tools/parent_child_chunk.py
```
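The YAML above points `extra.python.source` at `tools/parent_child_chunk.py`; whatever that tool does internally, its `result` output has to match the `multimodal_parent_child_structure` schema. Below is a minimal sketch of assembling such a payload. Assumptions: the `"paragraph"` value for `parent_mode` (check which modes your Dify version supports), the hypothetical helpers `build_parent_child_result` and `check_required`, and the placeholder file entry. Returning the payload from the tool goes through the `dify_plugin` SDK and is omitted here.

```python
import json


def build_parent_child_result(
    parents: list[tuple[str, list[str], list[dict]]],
    parent_mode: str = "paragraph",
) -> dict:
    """Assemble a payload shaped like multimodal_parent_child_structure.

    Each item in `parents` is (parent_content, child_contents, files).
    Hypothetical helper; the "paragraph" default mode is an assumption.
    """
    return {
        "parent_mode": parent_mode,
        "parent_child_chunks": [
            {
                "parent_content": parent_content,
                "child_contents": child_contents,
                "files": files,
            }
            for parent_content, child_contents, files in parents
        ],
    }


def check_required(payload: dict) -> list[str]:
    """List keys missing per the schema's `required` arrays (not a full validator)."""
    problems = [k for k in ("parent_mode", "parent_child_chunks") if k not in payload]
    for i, chunk in enumerate(payload.get("parent_child_chunks", [])):
        for key in ("parent_content", "child_contents"):
            if key not in chunk:
                problems.append(f"parent_child_chunks[{i}].{key}")
    return problems


# Placeholder file entry; real values come from the upload response.
image_entry = {
    "name": "chart.png",
    "size": 2048,
    "extension": ".png",
    "type": "image",
    "mime_type": "image/png",
    "transfer_method": "tool_file",
    "url": "https://example.com/files/file-123/preview",
    "related_id": "file-123",
}

result = build_parent_child_result(
    [("Parent chunk text (placeholder)", ["Child chunk 1", "Child chunk 2"], [image_entry])]
)
print(check_required(result))  # []
print(json.dumps(result, indent=2))
```

The `required` check is deliberately shallow; for full validation against the published schema, a JSON Schema library such as `jsonschema` is a better fit.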