---
title: Knowledge
---

Dify's Knowledge feature visualizes each stage of the RAG pipeline and gives application builders a friendly UI for managing personal or team knowledge. The resulting knowledge bases integrate seamlessly into AI applications.

Developers can upload internal company documents, FAQs, and standard working guides, then process them into structured data that large language models (LLMs) can query.

Compared with the static pre-trained datasets built into AI models, the content in a knowledge base can be updated in real time, ensuring LLMs always have access to the latest information and helping avoid problems caused by outdated or missing data.

When an LLM receives a user query, it first uses keywords to search within the knowledge base. Based on those keywords, the knowledge base returns content chunks with high relevance rankings, giving the LLM crucial context to generate more precise answers.

This approach ensures LLMs don't rely solely on pre-trained knowledge. Instead, they can also draw from real-time documents and databases, enhancing both the accuracy and relevance of responses.
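The retrieval flow described above can be illustrated with a minimal Python sketch. It is not Dify's implementation: the functions `tokenize`, `retrieve`, and `build_prompt`, the sample chunks, and the simple keyword-overlap scoring are all assumptions made for illustration only.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Rank chunks by keyword overlap with the query and return the top_k matches."""
    keywords = tokenize(query)
    scored = [(len(keywords & tokenize(chunk)), chunk) for chunk in chunks]
    # Keep only chunks that share at least one keyword, highest overlap first.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved chunks ahead of the question so the LLM can use them."""
    context_block = "\n".join(f"- {chunk}" for chunk in context)
    return f"Context:\n{context_block}\n\nQuestion: {query}"


# Hypothetical knowledge base content, already split into chunks.
chunks = [
    "A refund is processed within 5 business days.",
    "Our office is closed on public holidays.",
    "To request a refund, open a support ticket.",
]
query = "How do I request a refund?"

results = retrieve(query, chunks)
prompt = build_prompt(query, results)
print(prompt)
```

The unrelated chunk about office hours shares no keywords with the query, so it never reaches the prompt. A production system would typically use more robust ranking than raw word overlap.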

**Key Advantages**

• **Real-Time Updates**: The knowledge base can be updated anytime, ensuring the model always has the latest information.

• **Precision**: By retrieving relevant documents, the LLM can ground its answers in actual information, minimizing hallucinations.

• **Flexibility**: Developers can customize the knowledge base content to match specific needs, defining the scope of knowledge as required.

***

You only need to prepare text content, such as:

* Long text content (TXT, Markdown, DOCX, HTML, JSONL, or even PDF files)
* Structured data (CSV, Excel, etc.)
* Online data sources (web pages, Notion, etc.)

Simply upload files to the **Knowledge Base**, and data processing is handled automatically.

> If your team already has an independent knowledge base, you can use the [“Connect to an External Knowledge Base”](./connect-external-knowledge-base) feature to establish its connection with Dify.

![](https://assets-docs.dify.ai/2024/12/effc826d2584d5f2983cdcd746099bb6.png)

### Use Case

If you want to create an AI customer support assistant based on your existing knowledge base and product documentation, you can simply upload those files to the Knowledge Base in Dify and then set up a conversational application.

Traditionally, going from raw text training to a fully developed AI customer support chatbot could take weeks, and the result is challenging to maintain and iterate on effectively.

In Dify, the entire process takes just three minutes, after which you can immediately begin gathering user feedback.

### Knowledge Base and Documents

In Dify, a Knowledge Base is a collection of Documents, each of which can include multiple Chunks of content. You can integrate an entire knowledge base into an application to serve as a retrieval context, drawing from uploaded files or data synchronized from other sources.
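The hierarchy described above can be sketched as a small Python data model. The class names, fields, and the `all_chunks` helper are hypothetical and chosen only to mirror the terms used here. They do not reflect Dify's internal schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A single piece of content cut from a document."""
    text: str


@dataclass
class Document:
    """An uploaded or synchronized file, split into chunks."""
    name: str
    chunks: list[Chunk] = field(default_factory=list)


@dataclass
class KnowledgeBase:
    """A collection of documents that an application can use as retrieval context."""
    name: str
    documents: list[Document] = field(default_factory=list)

    def all_chunks(self) -> list[Chunk]:
        """Flatten every document's chunks into one list, the pool a retriever would search."""
        return [chunk for doc in self.documents for chunk in doc.chunks]


# Hypothetical example data.
kb = KnowledgeBase(
    name="Product Support",
    documents=[
        Document("faq.md", [Chunk("How to reset a password."), Chunk("How to request a refund.")]),
        Document("handbook.txt", [Chunk("Support hours are 9:00 to 17:00.")]),
    ],
)
print(len(kb.all_chunks()))  # 3
```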

If your team already has an independent, external knowledge base that is separate from the Dify platform, you can link it using the [External Knowledge Base](./connect-external-knowledge-base) feature. This way, you don't need to re-upload all your content to Dify. Your AI app can directly access and process information in real time from your team's existing knowledge base.
