dify-docs

---
title: Creating Knowledge Base
---

To create a knowledge base and upload documents, follow these steps:

1. Create a knowledge base by uploading local files, importing online data, or creating an empty knowledge base.

<Card title="Import Text Data" icon="link" href="/en/guides/knowledge-base/create-knowledge-and-upload-documents/import-content-data">
Choose to create a knowledge base by uploading local files or importing online data
</Card>

2. Choose a chunking mode. This stage involves content preprocessing and data structuring, where long texts are divided into multiple content chunks. You can preview the text chunking results at this stage.

<Card title="Chunking and Cleaning Text" icon="link" href="/en/guides/knowledge-base/create-knowledge-and-upload-documents/chunking-and-cleaning-text">
Learn how to set up text chunking and cleaning rules
</Card>

3. Set up the indexing method and retrieval settings. After receiving a user query, the knowledge base searches for relevant content in existing documents according to preset retrieval methods, extracting highly relevant information chunks for the language model to generate high-quality answers.

<Card title="Setting up Indexing Methods" icon="link" href="/en/guides/knowledge-base/create-knowledge-and-upload-documents/setting-indexing-methods">
Learn how to configure retrieval methods and related settings
</Card>

5. Wait for chunk embedding
6. Complete the upload, integrate the knowledge base in the application and use it. You can refer to [Integrating Knowledge Base in Applications](/en/guides/knowledge-base/integrate-knowledge-within-application) to build an LLM application capable of answering questions based on the knowledge base. For knowledge base modification or management, please refer to [Managing Knowledge Base and Documents](/en/guides/knowledge-base/knowledge-and-documents-maintenance).

![Complete the creation of knowledge base](https://assets-docs.dify.ai/2024/12/a3362a1cd384cb2b539c9858de555518.png)

### Reference Reading

#### ETL

In production-level RAG applications, data preprocessing and cleaning of multi-source data, or ETL (_extract, transform, load_), is necessary for better data recall results. To enhance the preprocessing capabilities of unstructured/semi-structured data, Dify supports optional ETL solutions: **Dify ETL** and [**Unstructured ETL**](https://unstructured.io/). Unstructured efficiently extracts and transforms your data into clean data for subsequent steps. ETL solution choices for different versions of Dify:

* SaaS version: Non-optional, uses Unstructured ETL by default;
* Community version: Optional, uses Dify ETL by default, can enable Unstructured ETL through [environment variables](/en/getting-started/install-self-hosted/environments#knowledge-base-configuration);

Differences in supported file parsing formats:

| DIFY ETL | Unstructured ETL |
| --- | --- |
| txt, markdown, md, pdf, html, htm, xlsx, xls, docx, csv | txt, markdown, md, pdf, html, htm, xlsx, xls, docx, csv, eml, msg, pptx, ppt, xml, epub |

Different ETL solutions also have differences in file extraction effects. To learn more about Unstructured ETL's data processing methods, please refer to the [official documentation](https://docs.unstructured.io/open-source/core-functionality/partitioning).

#### **Embedding**

**Embedding** is a technique for converting discrete variables (such as words, sentences, or entire documents) into continuous vector representations. It can map high-dimensional data (such as words, phrases, or images) to low-dimensional spaces, providing a compact and efficient representation. This representation not only reduces data dimensionality but also preserves important semantic information, making subsequent content retrieval more efficient.

**Embedding models** are large language models specifically designed for text vectorization. They excel at converting text into dense numerical vectors, effectively capturing semantic information.

> To learn more, please refer to: ["Dify: Embedding Technology and Dify Knowledge Base Design/Planning"](https://mp.weixin.qq.com/s/vmY_CUmETo2IpEBf1nEGLQ).

#### **Metadata**

To use metadata functionality for managing knowledge bases, please refer to [Metadata](/en/guides/knowledge-base/metadata).

{/*
Contributing Section
DO NOT edit this section!
It will be automatically generated by the script.
*/}

---

[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/guides/knowledge-base/create-knowledge-and-upload-documents/readme.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)