---
title: Build Tool Plugins for Multimodal Data Processing in Knowledge Pipelines
---

In a Knowledge Pipeline, the Knowledge Base node accepts input in two multimodal data formats: `multimodal-Parent-Child` and `multimodal-General`.

When you develop a tool plugin for multimodal data processing, complete the following configuration so that the Knowledge Base node can correctly recognize and vectorize the plugin's multimodal output (text, images, audio, video, and so on):

- **In the tool code**, call the upload API and construct the file objects in `files`.
- **In the tool provider YAML file**, declare `output_schema` as `multimodal-Parent-Child` or `multimodal-General`.

## Upload and Construct File Objects

When processing multimodal data such as images, first upload the file through Dify's tool session API to obtain its metadata.

The following example, taken from the official Dify plugin Dify Extractor, shows how to upload a file and construct a file object.

```python
# Upload the file using the tool session
file_res = self._tool.session.file.upload(
    file_name,  # filename
    file_blob,  # file binary data
    mime_type,  # MIME type, e.g., "image/png"
)

# Generate a Markdown image reference using the file preview URL
image_url = f"![image]({file_res.preview_url})"
```

The upload API returns an `UploadFileResponse` object that contains basic information about the file:

```python
from enum import Enum
from pydantic import BaseModel

class UploadFileResponse(BaseModel):
    class Type(str, Enum):
        DOCUMENT = "document"
        IMAGE = "image"
        VIDEO = "video"
        AUDIO = "audio"

        @classmethod
        def from_mime_type(cls, mime_type: str):
            if mime_type.startswith("image/"):
                return cls.IMAGE
            if mime_type.startswith("video/"):
                return cls.VIDEO
            if mime_type.startswith("audio/"):
                return cls.AUDIO
            return cls.DOCUMENT

    id: str
    name: str
    size: int
    extension: str
    mime_type: str
    type: Type | None = None
    preview_url: str | None = None
```

Based on this structure, you can map the file information (such as `name`, `size`, `extension`, and `mime_type`) to the `files` field of the multimodal output structure.

```json multimodal_parent_child_structure highlight={22-62} expandable
{
  "$id": "https://dify.ai/schemas/v1/multimodal_parent_child_structure.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "version": "1.0.0",
  "type": "object",
  "title": "Multimodal Parent-Child Structure",
  "description": "Schema for multimodal parent-child structure (v1)",
  "properties": {
    "parent_mode": {
      "type": "string",
      "description": "The mode of parent-child relationship"
    },
    "parent_child_chunks": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "parent_content": {
            "type": "string",
            "description": "The parent content"
          },
          "files": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "name": {
                  "type": "string",
                  "description": "file name"
                },
                "size": {
                  "type": "number",
                  "description": "file size"
                },
                "extension": {
                  "type": "string",
                  "description": "file extension"
                },
                "type": {
                  "type": "string",
                  "description": "file type"
                },
                "mime_type": {
                  "type": "string",
                  "description": "file mime type"
                },
                "transfer_method": {
                  "type": "string",
                  "description": "file transfer method"
                },
                "url": {
                  "type": "string",
                  "description": "file url"
                },
                "related_id": {
                  "type": "string",
                  "description": "file related id"
                }
              },
              "required": ["name", "size", "extension", "type", "mime_type", "transfer_method", "url", "related_id"]
            },
            "description": "List of files"
          },
          "child_contents": {
            "type": "array",
            "items": {
              "type": "string"
            },
            "description": "List of child contents"
          }
        },
        "required": ["parent_content", "child_contents"]
      },
      "description": "List of parent-child chunk pairs"
    }
  },
  "required": ["parent_mode", "parent_child_chunks"]
}
```

```json multimodal_general_structure highlight={18-56} expandable
{
  "$id": "https://dify.ai/schemas/v1/multimodal_general_structure.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "version": "1.0.0",
  "type": "array",
  "title": "Multimodal General Structure",
  "description": "Schema for multimodal general structure (v1) - array of objects",
  "properties": {
    "general_chunks": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "content": {
            "type": "string",
            "description": "The content"
          },
          "files": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "name": {
                  "type": "string",
                  "description": "file name"
                },
                "size": {
                  "type": "number",
                  "description": "file size"
                },
                "extension": {
                  "type": "string",
                  "description": "file extension"
                },
                "type": {
                  "type": "string",
                  "description": "file type"
                },
                "mime_type": {
                  "type": "string",
                  "description": "file mime type"
                },
                "transfer_method": {
                  "type": "string",
                  "description": "file transfer method"
                },
                "url": {
                  "type": "string",
                  "description": "file url"
                },
                "related_id": {
                  "type": "string",
                  "description": "file related id"
                }
              },
              "description": "List of files"
            }
          }
        },
        "required": ["content"]
      },
      "description": "List of content and files"
    }
  }
}
```

## Declare the Multimodal Output Structure

The structure of multimodal data is defined by JSON Schemas that Dify provides officially.

For the Knowledge Base node to recognize the plugin's multimodal output type, point the `result` field under `output_schema` in the plugin's provider YAML file to the corresponding official schema URL.

```yaml
output_schema:
  type: object
  properties:
    result:
      # multimodal-Parent-Child
      $ref: "https://dify.ai/schemas/v1/multimodal_parent_child_structure.json"

      # multimodal-General
      # $ref: "https://dify.ai/schemas/v1/multimodal_general_structure.json"
```

Taking `multimodal-Parent-Child` as an example, a complete YAML configuration looks like this:

```yaml expandable
identity:
  name: multimodal_tool
  author: langgenius
  label:
    en_US: multimodal tool
    zh_Hans: 多模态提取器
    pt_BR: multimodal tool
description:
  human:
    en_US: Process documents into multimodal-Parent-Child chunk structures
    zh_Hans: 将文档处理为多模态父子分块结构
    pt_BR: Processar documentos em estruturas de divisão pai-filho
  llm: Processes documents into hierarchical multimodal-Parent-Child chunk structures
parameters:
  - name: input_text
    human_description:
      en_US: The text you want to chunk.
      zh_Hans: 输入文本
      pt_BR: Conteúdo de Entrada
    label:
      en_US: Input Content
      zh_Hans: 输入文本
      pt_BR: Conteúdo de Entrada
    llm_description: The text you want to chunk.
    required: true
    type: string
    form: llm
output_schema:
  type: object
  properties:
    result:
      $ref: "https://dify.ai/schemas/v1/multimodal_parent_child_structure.json"
extra:
  python:
    source: tools/parent_child_chunk.py
```
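
To tie the two steps together, the mapping from an upload response to the `files` array of the `multimodal-Parent-Child` schema might look roughly like the sketch below. It is an illustration under stated assumptions, not the Dify Extractor implementation: `UploadedFile` is a hypothetical stand-in that mirrors the `UploadFileResponse` fields so the example runs without the plugin SDK, and `to_file_entry` and `build_parent_child_result` are hypothetical helper names. The values chosen for `transfer_method`, `url`, `related_id`, and `parent_mode` are assumptions; the schema only requires them to be strings.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class UploadedFile:
    """Hypothetical stand-in mirroring the UploadFileResponse fields."""

    id: str
    name: str
    size: int
    extension: str
    mime_type: str
    type: Optional[str] = None
    preview_url: Optional[str] = None


def to_file_entry(file_res: UploadedFile) -> dict[str, Any]:
    """Map one upload response onto one item of the schema's `files` array."""
    return {
        "name": file_res.name,
        "size": file_res.size,
        "extension": file_res.extension,
        "type": file_res.type,
        "mime_type": file_res.mime_type,
        # Assumptions: these three values are illustrative guesses,
        # not confirmed by the schema or the original example.
        "transfer_method": "local_file",
        "url": file_res.preview_url,
        "related_id": file_res.id,
    }


def build_parent_child_result(
    parent_content: str,
    child_contents: list[str],
    files: list[UploadedFile],
    parent_mode: str = "paragraph",  # assumption: any string passes the schema
) -> dict[str, Any]:
    """Assemble a single-chunk result shaped like the parent-child schema."""
    return {
        "parent_mode": parent_mode,
        "parent_child_chunks": [
            {
                "parent_content": parent_content,
                "files": [to_file_entry(f) for f in files],
                "child_contents": list(child_contents),
            }
        ],
    }


# Sample data only; a real plugin would use the object returned by the upload call.
sample_file = UploadedFile(
    id="file-123",
    name="diagram.png",
    size=2048,
    extension=".png",
    mime_type="image/png",
    type="image",
    preview_url="https://example.com/files/file-123/preview",
)
sample_result = build_parent_child_result(
    "Section about the system diagram.",
    ["The diagram shows three services.", "Each service has its own queue."],
    [sample_file],
)
print(json.dumps(sample_result, indent=2))
```

Keeping the mapping in one small function means every entry carries all eight keys that the parent-child schema lists as required, which makes a missing field easier to spot before the Knowledge Base node receives the output.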