refactor: update default backend to hybrid-auto-engine and enhance documentation for parsing options

This commit is contained in:
myhloli
2025-12-25 19:17:08 +08:00
parent b2c126ef8a
commit 984b303dfa
9 changed files with 362 additions and 354 deletions

README.md
View File

@@ -641,74 +641,76 @@ A WebUI developed based on Gradio, with a simple interface and only core parsing
> In non-mainline environments, due to the diversity of hardware and software configurations, as well as third-party dependency compatibility issues, we cannot guarantee 100% project availability. Therefore, for users who wish to use this project in non-recommended environments, we suggest carefully reading the documentation and FAQ first. Most issues already have corresponding solutions in the FAQ. We also encourage community feedback to help us gradually expand support.
<table>
<thead>
<tr>
<th rowspan="2">Parsing Backend</th>
<th rowspan="2">pipeline</th>
<th colspan="2">*-auto-engine</th>
<th colspan="2">*-http-client</th>
</tr>
<tr>
<th>hybrid</th>
<th>vlm</th>
<th>hybrid</th>
<th>vlm</th>
</tr>
</thead>
<tbody>
<tr>
<th>Backend Features</th>
<td>Good Compatibility</td>
<td colspan="2">High Config Requirements</td>
<td colspan="2">For OpenAI-Compatible Servers<sup>2</sup></td>
</tr>
<tr>
<th>Accuracy<sup>1</sup></th>
<td style="text-align:center;">82+</td>
<td colspan="4" style="text-align:center;">90+</td>
</tr>
<tr>
<th>Operating System</th>
<td colspan="5" style="text-align:center;">Linux<sup>3</sup> / Windows<sup>4</sup> / macOS<sup>5</sup></td>
</tr>
<tr>
<th>Pure CPU Support</th>
<td style="text-align:center;">✅</td>
<td colspan="2" style="text-align:center;">❌</td>
<td colspan="2" style="text-align:center;">✅</td>
</tr>
<tr>
<th>GPU Acceleration</th>
<td colspan="4" style="text-align:center;">Volta and later architecture GPUs, or Apple Silicon</td>
<td rowspan="2">Not Required</td>
</tr>
<tr>
<th>Min VRAM</th>
<td style="text-align:center;">6GB</td>
<td style="text-align:center;">10GB</td>
<td style="text-align:center;">8GB</td>
<td style="text-align:center;">3GB</td>
</tr>
<tr>
<th>RAM</th>
<td colspan="3" style="text-align:center;">Min 16GB+, Recommended 32GB+</td>
<td colspan="2" style="text-align:center;">8GB</td>
</tr>
<tr>
<th>Disk Space</th>
<td colspan="3" style="text-align:center;">20GB+, SSD Recommended</td>
<td colspan="2" style="text-align:center;">2GB</td>
</tr>
<tr>
<th>Python Version</th>
<td colspan="5" style="text-align:center;">3.10-3.13</td>
</tr>
</tbody>
</table>
<sup>1</sup> Accuracy metrics are the End-to-End Evaluation Overall scores from OmniDocBench (v1.5), based on the latest version of `MinerU`.
<sup>2</sup> Servers compatible with OpenAI API, such as local model servers or remote model services deployed via inference frameworks like `vLLM`/`SGLang`/`LMDeploy`.
<sup>3</sup> Linux only supports distributions from 2019 and later.
<sup>4</sup> Since the key dependency `ray` does not support Python 3.13 on Windows, only versions 3.10~3.12 are supported.
<sup>5</sup> macOS requires version 14.0 or later.
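As a rough illustration of the trade-offs in the table above, backend selection can be sketched as a simple decision helper (`choose_backend` is a hypothetical function for illustration, not part of MinerU):

```python
def choose_backend(cpu_only: bool, remote_server: bool) -> str:
    """Pick a MinerU backend name from the compatibility table above.

    Illustrative only: http-client backends offload inference to an
    OpenAI-compatible server, pure-CPU hosts are limited to the pipeline
    backend, and capable local GPUs can run the default hybrid engine.
    """
    if remote_server:
        return "hybrid-http-client"  # inference runs on the remote server
    if cpu_only:
        return "pipeline"  # the only local backend with pure-CPU support
    return "hybrid-auto-engine"  # the default, highest-accuracy local option
```

For example, `choose_backend(cpu_only=True, remote_server=False)` picks the pipeline backend, matching the "Pure CPU Support" row.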
### Install MinerU
@@ -717,19 +719,19 @@ A WebUI developed based on Gradio, with a simple interface and only core parsing
```bash
pip install --upgrade pip
pip install uv
uv pip install -U "mineru[all]"
```
#### Install MinerU from source code
```bash
git clone https://github.com/opendatalab/MinerU.git
cd MinerU
uv pip install -e .[all]
```
> [!TIP]
> `mineru[all]` includes all core features, is compatible with Windows / Linux / macOS, and is suitable for most users.
> If you need to specify the inference framework for the VLM model, or only intend to install a lightweight client on an edge device, please refer to the [Extension Modules Installation Guide](https://opendatalab.github.io/MinerU/quick_start/extension_modules/).
---

View File

@@ -631,76 +631,77 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
>
> In non-mainline environments, due to the diversity of hardware and software configurations, as well as third-party dependency compatibility issues, we cannot guarantee 100% project availability. Therefore, for users who wish to use this project in non-recommended environments, we suggest carefully reading the documentation and FAQ first; most issues already have corresponding solutions in the FAQ. We also encourage community feedback to help us gradually expand support.
<table>
<thead>
<tr>
<th rowspan="2">Parsing Backend</th>
<th rowspan="2">pipeline</th>
<th colspan="2">*-auto-engine</th>
<th colspan="2">*-http-client</th>
</tr>
<tr>
<th>hybrid</th>
<th>vlm</th>
<th>hybrid</th>
<th>vlm</th>
</tr>
</thead>
<tbody>
<tr>
<th>Backend Features</th>
<td>Good Compatibility</td>
<td colspan="2">High Config Requirements</td>
<td colspan="2">For OpenAI-Compatible Servers<sup>2</sup></td>
</tr>
<tr>
<th>Accuracy<sup>1</sup></th>
<td style="text-align:center;">82+</td>
<td colspan="4" style="text-align:center;">90+</td>
</tr>
<tr>
<th>Operating System</th>
<td colspan="5" style="text-align:center;">Linux<sup>3</sup> / Windows<sup>4</sup> / macOS<sup>5</sup></td>
</tr>
<tr>
<th>Pure CPU Support</th>
<td style="text-align:center;">✅</td>
<td colspan="2" style="text-align:center;">❌</td>
<td colspan="2" style="text-align:center;">✅</td>
</tr>
<tr>
<th>GPU Acceleration</th>
<td colspan="4" style="text-align:center;">Volta and later architecture GPUs, or Apple Silicon</td>
<td rowspan="2">Not Required</td>
</tr>
<tr>
<th>Min VRAM</th>
<td style="text-align:center;">6GB</td>
<td style="text-align:center;">10GB</td>
<td style="text-align:center;">8GB</td>
<td style="text-align:center;">3GB</td>
</tr>
<tr>
<th>RAM</th>
<td colspan="3" style="text-align:center;">Min 16GB+, Recommended 32GB+</td>
<td colspan="2" style="text-align:center;">8GB</td>
</tr>
<tr>
<th>Disk Space</th>
<td colspan="3" style="text-align:center;">20GB+, SSD Recommended</td>
<td colspan="2" style="text-align:center;">2GB</td>
</tr>
<tr>
<th>Python Version</th>
<td colspan="5" style="text-align:center;">3.10-3.13</td>
</tr>
</tbody>
</table>
<sup>1</sup> Accuracy metrics are the End-to-End Evaluation Overall scores from OmniDocBench (v1.5), based on the latest version of `MinerU`.
<sup>2</sup> Servers compatible with the OpenAI API, such as local model servers or remote model services deployed via inference frameworks like `vLLM`/`SGLang`/`LMDeploy`.
<sup>3</sup> Linux only supports distributions from 2019 and later.
<sup>4</sup> Since the key dependency `ray` does not support Python 3.13 on Windows, only versions 3.10~3.12 are supported.
<sup>5</sup> macOS requires version 14.0 or later.
> [!TIP]
> In addition to the mainstream environments and platforms above, we have also collected reports from community users on other platforms; see [Other Accelerator Adaptations](https://opendatalab.github.io/MinerU/zh/usage/) for details.
@@ -712,19 +713,19 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
```bash
pip install --upgrade pip -i https://mirrors.aliyun.com/pypi/simple
pip install uv -i https://mirrors.aliyun.com/pypi/simple
uv pip install -U "mineru[all]" -i https://mirrors.aliyun.com/pypi/simple
```
#### Install MinerU from source code
```bash
git clone https://github.com/opendatalab/MinerU.git
cd MinerU
uv pip install -e .[all] -i https://mirrors.aliyun.com/pypi/simple
```
> [!TIP]
> `mineru[all]` includes all core features, is compatible with Windows / Linux / macOS, and is suitable for most users.
> If you need to specify the inference framework for the VLM model, or only intend to install a lightweight client on an edge device, please refer to the [Extension Modules Installation Guide](https://opendatalab.github.io/MinerU/zh/quick_start/extension_modules/).
---

View File

@@ -9,12 +9,14 @@ from loguru import logger
from mineru.cli.common import convert_pdf_bytes_to_bytes_by_pypdfium2, prepare_env, read_fn
from mineru.data.data_reader_writer import FileBasedDataWriter
from mineru.utils.draw_bbox import draw_layout_bbox, draw_span_bbox
from mineru.utils.engine_utils import get_vlm_engine
from mineru.utils.enum_class import MakeMode
from mineru.backend.vlm.vlm_analyze import doc_analyze as vlm_doc_analyze
from mineru.backend.pipeline.pipeline_analyze import doc_analyze as pipeline_doc_analyze
from mineru.backend.pipeline.pipeline_middle_json_mkcontent import union_make as pipeline_union_make
from mineru.backend.pipeline.model_json_to_middle_json import result_to_middle_json as pipeline_result_to_middle_json
from mineru.backend.vlm.vlm_middle_json_mkcontent import union_make as vlm_union_make
from mineru.backend.hybrid.hybrid_analyze import doc_analyze as hybrid_doc_analyze
from mineru.utils.guess_suffix_or_lang import guess_suffix_by_path
@@ -23,7 +25,7 @@ def do_parse(
pdf_file_names: list[str], # List of PDF file names to be parsed
pdf_bytes_list: list[bytes], # List of PDF bytes to be parsed
p_lang_list: list[str], # List of languages for each PDF, default is 'ch' (Chinese)
backend="hybrid-auto-engine", # The backend for parsing PDF, default is 'hybrid-auto-engine'
parse_method="auto", # The method for parsing PDF, default is 'auto'
formula_enable=True, # Enable formula parsing
table_enable=True, # Enable table parsing
@@ -69,27 +71,60 @@ def do_parse(
f_make_md_mode, middle_json, model_json, is_pipeline=True
)
    else:
        f_draw_span_bbox = False
        if backend.startswith("vlm-"):
            backend = backend[4:]
            if backend == "auto-engine":
                backend = get_vlm_engine(inference_engine='auto', is_async=False)
            parse_method = "vlm"
            for idx, pdf_bytes in enumerate(pdf_bytes_list):
                pdf_file_name = pdf_file_names[idx]
                pdf_bytes = convert_pdf_bytes_to_bytes_by_pypdfium2(pdf_bytes, start_page_id, end_page_id)
                local_image_dir, local_md_dir = prepare_env(output_dir, pdf_file_name, parse_method)
                image_writer, md_writer = FileBasedDataWriter(local_image_dir), FileBasedDataWriter(local_md_dir)
                middle_json, infer_result = vlm_doc_analyze(pdf_bytes, image_writer=image_writer, backend=backend, server_url=server_url)
                pdf_info = middle_json["pdf_info"]
                _process_output(
                    pdf_info, pdf_bytes, pdf_file_name, local_md_dir, local_image_dir,
                    md_writer, f_draw_layout_bbox, f_draw_span_bbox, f_dump_orig_pdf,
                    f_dump_md, f_dump_content_list, f_dump_middle_json, f_dump_model_output,
                    f_make_md_mode, middle_json, infer_result, is_pipeline=False
                )
        elif backend.startswith("hybrid-"):
            backend = backend[7:]
            if backend == "auto-engine":
                backend = get_vlm_engine(inference_engine='auto', is_async=False)
            parse_method = f"hybrid_{parse_method}"
            for idx, pdf_bytes in enumerate(pdf_bytes_list):
                pdf_file_name = pdf_file_names[idx]
                pdf_bytes = convert_pdf_bytes_to_bytes_by_pypdfium2(pdf_bytes, start_page_id, end_page_id)
                local_image_dir, local_md_dir = prepare_env(output_dir, pdf_file_name, parse_method)
                image_writer, md_writer = FileBasedDataWriter(local_image_dir), FileBasedDataWriter(local_md_dir)
                middle_json, infer_result = hybrid_doc_analyze(
                    pdf_bytes,
                    image_writer=image_writer,
                    backend=backend,
                    parse_method=parse_method,
                    language=p_lang_list[idx],
                    inline_formula_enable=formula_enable,
                    server_url=server_url,
                )
                pdf_info = middle_json["pdf_info"]
                _process_output(
                    pdf_info, pdf_bytes, pdf_file_name, local_md_dir, local_image_dir,
                    md_writer, f_draw_layout_bbox, f_draw_span_bbox, f_dump_orig_pdf,
                    f_dump_md, f_dump_content_list, f_dump_middle_json, f_dump_model_output,
                    f_make_md_mode, middle_json, infer_result, is_pipeline=False
                )
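The dispatch logic above strips the `vlm-` or `hybrid-` prefix from the backend name and treats `auto-engine` as a placeholder that is resolved at runtime. In isolation, the prefix handling can be sketched as follows (`split_backend` is a hypothetical helper for illustration, not part of MinerU):

```python
def split_backend(backend: str) -> tuple:
    """Split a backend string like 'hybrid-auto-engine' into (family, engine).

    Mirrors the dispatch in do_parse: a 'vlm-' or 'hybrid-' prefix selects
    the code path, and the remainder names the inference engine; anything
    else is treated as the pipeline backend.
    """
    for prefix in ("vlm-", "hybrid-"):
        if backend.startswith(prefix):
            return prefix.rstrip("-"), backend[len(prefix):]
    return "pipeline", backend
```

So the new default `hybrid-auto-engine` routes to the hybrid branch with the engine still to be resolved by `get_vlm_engine`.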
def _process_output(
pdf_info,
@@ -160,7 +195,7 @@ def parse_doc(
path_list: list[Path],
output_dir,
lang="ch",
backend="hybrid-auto-engine",
method="auto",
server_url=None,
start_page_id=0,
@@ -170,21 +205,23 @@ def parse_doc(
Parameter description:
path_list: List of document paths to be parsed, can be PDF or image files.
output_dir: Output directory for storing parsing results.
lang: Language option, default is 'ch'. Optional values include ['ch', 'ch_server', 'ch_lite', 'en', 'korean', 'japan', 'chinese_cht', 'ta', 'te', 'ka', 'th', 'el', 'latin', 'arabic', 'east_slavic', 'cyrillic', 'devanagari'].
Input the languages in the pdf (if known) to improve OCR accuracy. Optional.
Applies only when the backend is set to 'pipeline' or 'hybrid-*'.
backend: the backend for parsing pdf:
pipeline: More general.
vlm-auto-engine: High accuracy via local computing power.
vlm-http-client: High accuracy via remote computing power (client suitable for OpenAI-compatible servers).
hybrid-auto-engine: Next-generation high-accuracy solution via local computing power.
hybrid-http-client: High accuracy, requiring only a little local computing power (client suitable for OpenAI-compatible servers).
Without a backend specified, 'hybrid-auto-engine' will be used by default.
method: the method for parsing pdf:
auto: Automatically determine the method based on the file type.
txt: Use text extraction method.
ocr: Use OCR method for image-based PDFs.
Without method specified, 'auto' will be used by default.
Applies only when the backend is set to 'pipeline' or 'hybrid-*'.
server_url: When the backend is `*-http-client`, you need to specify the server_url, for example: `http://127.0.0.1:30000`
start_page_id: Start page ID for parsing, default is 0
end_page_id: End page ID for parsing, default is None (parse all pages until the end of the document)
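The page-range convention described above (0-based `start_page_id`, and `end_page_id=None` meaning "parse to the end") can be illustrated with a small sketch; `select_pages` is a hypothetical helper, not MinerU's implementation:

```python
def select_pages(num_pages, start_page_id=0, end_page_id=None):
    """Return the 0-based page indices a parse would cover.

    end_page_id=None means "through the last page"; an end past the
    document is clamped to the final page.
    """
    last = num_pages - 1 if end_page_id is None else min(end_page_id, num_pages - 1)
    return list(range(start_page_id, last + 1))
```

For a 5-page document, the defaults cover pages 0 through 4, while `start_page_id=1, end_page_id=2` covers only pages 1 and 2.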
@@ -230,12 +267,11 @@ if __name__ == '__main__':
"""If you cannot download models due to network issues, set the environment variable MINERU_MODEL_SOURCE to modelscope to download models from a proxy-free repository"""
# os.environ['MINERU_MODEL_SOURCE'] = "modelscope"
"""Use hybrid mode and local computing power to parse documents"""
parse_doc(doc_path_list, output_dir, backend="hybrid-auto-engine")
"""Other backends for parsing documents; you can uncomment and try them"""
# parse_doc(doc_path_list, output_dir, backend="pipeline")  # more general
# parse_doc(doc_path_list, output_dir, backend="vlm-auto-engine")  # high accuracy via local computing power
# parse_doc(doc_path_list, output_dir, backend="vlm-http-client", server_url="http://127.0.0.1:30000")  # high accuracy via remote computing power (client suitable for OpenAI-compatible servers)
# parse_doc(doc_path_list, output_dir, backend="hybrid-http-client", server_url="http://127.0.0.1:30000")  # high accuracy with a little local computing power (client suitable for OpenAI-compatible servers)

View File

@@ -27,74 +27,76 @@ A WebUI developed based on Gradio, with a simple interface and only core parsing
> In non-mainstream environments, due to the diversity of hardware and software configurations, as well as compatibility issues with third-party dependencies, we cannot guarantee 100% usability of the project. Therefore, for users who wish to use this project in non-recommended environments, we suggest carefully reading the documentation and FAQ first, as most issues have corresponding solutions in the FAQ. Additionally, we encourage community feedback on issues so that we can gradually expand our support range.
<table border="1">
<thead>
<tr>
<th rowspan="2">Parsing Backend</th>
<th rowspan="2">pipeline</th>
<th colspan="2">*-auto-engine</th>
<th colspan="2">*-http-client</th>
</tr>
<tr>
<th>hybrid</th>
<th>vlm</th>
<th>hybrid</th>
<th>vlm</th>
</tr>
</thead>
<tbody>
<tr>
<th>Backend Features</th>
<td>Good Compatibility</td>
<td colspan="2">High Config Requirements</td>
<td colspan="2">For OpenAI-Compatible Servers<sup>2</sup></td>
</tr>
<tr>
<th>Accuracy<sup>1</sup></th>
<td style="text-align:center;">82+</td>
<td colspan="4" style="text-align:center;">90+</td>
</tr>
<tr>
<th>Operating System</th>
<td colspan="5" style="text-align:center;">Linux<sup>3</sup> / Windows<sup>4</sup> / macOS<sup>5</sup></td>
</tr>
<tr>
<th>Pure CPU Support</th>
<td style="text-align:center;">✅</td>
<td colspan="2" style="text-align:center;">❌</td>
<td colspan="2" style="text-align:center;">✅</td>
</tr>
<tr>
<th>GPU Acceleration</th>
<td colspan="4" style="text-align:center;">Volta and later architecture GPUs, or Apple Silicon</td>
<td rowspan="2">Not Required</td>
</tr>
<tr>
<th>Min VRAM</th>
<td style="text-align:center;">6GB</td>
<td style="text-align:center;">10GB</td>
<td style="text-align:center;">8GB</td>
<td style="text-align:center;">3GB</td>
</tr>
<tr>
<th>RAM</th>
<td colspan="3" style="text-align:center;">Min 16GB+, Recommended 32GB+</td>
<td colspan="2" style="text-align:center;">8GB</td>
</tr>
<tr>
<th>Disk Space</th>
<td colspan="3" style="text-align:center;">20GB+, SSD Recommended</td>
<td colspan="2" style="text-align:center;">2GB</td>
</tr>
<tr>
<th>Python Version</th>
<td colspan="5" style="text-align:center;">3.10-3.13</td>
</tr>
</tbody>
</table>
<sup>1</sup> Accuracy metrics are the End-to-End Evaluation Overall scores from OmniDocBench (v1.5), based on the latest version of `MinerU`.
<sup>2</sup> Servers compatible with OpenAI API, such as local model servers or remote model services deployed via inference frameworks like `vLLM`/`SGLang`/`LMDeploy`.
<sup>3</sup> Linux only supports distributions from 2019 and later.
<sup>4</sup> Since the key dependency `ray` does not support Python 3.13 on Windows, only versions 3.10~3.12 are supported.
<sup>5</sup> macOS requires version 14.0 or later.
### Install MinerU
@@ -103,19 +105,19 @@ A WebUI developed based on Gradio, with a simple interface and only core parsing
```bash
pip install --upgrade pip
pip install uv
uv pip install -U "mineru[all]"
```
#### Install MinerU from source code
```bash
git clone https://github.com/opendatalab/MinerU.git
cd MinerU
uv pip install -e .[all]
```
> [!TIP]
> `mineru[all]` includes all core features, is compatible with Windows / Linux / macOS, and is suitable for most users.
> If you need to specify the inference framework for the VLM model, or only intend to install a lightweight client on an edge device, please refer to the [Extension Modules Installation Guide](https://opendatalab.github.io/MinerU/quick_start/extension_modules/).
---

View File

@@ -10,7 +10,6 @@ For more information about model source configuration and custom local model pat
## Quick Usage via Command Line
MinerU has built-in command line tools that allow users to quickly use MinerU for PDF parsing through the command line:
```bash
mineru -p <input_path> -o <output_path>
```
> [!TIP]
@@ -23,14 +22,6 @@ mineru -p <input_path> -o <output_path>
> The command line tool will automatically attempt cuda/mps acceleration on Linux and macOS systems.
> Windows users who need cuda acceleration should visit the [PyTorch official website](https://pytorch.org/get-started/locally/) to select the appropriate command for their cuda version to install acceleration-enabled `torch` and `torchvision`.
If you need to adjust parsing options through custom parameters, you can also check the more detailed [Command Line Tools Usage Instructions](./cli_tools.md) in the documentation.
## Advanced Usage via API, WebUI, http-client/server
@@ -44,12 +35,7 @@ If you need to adjust parsing options through custom parameters, you can also ch
>Access `http://127.0.0.1:8000/docs` in your browser to view the API documentation.
- Start Gradio WebUI visual frontend:
```bash
mineru-gradio --server-name 0.0.0.0 --server-port 7860
```
>[!TIP]
>
@@ -58,16 +44,12 @@ If you need to adjust parsing options through custom parameters, you can also ch
- Using `http-client/server` method:
```bash
# Start openai compatible server (requires vllm or lmdeploy environment)
mineru-openai-server --port 30000
```
>[!TIP]
>In another terminal, connect to the openai-compatible server via http client
> ```bash
> mineru -p <input_path> -o <output_path> -b hybrid-http-client -u http://127.0.0.1:30000
> ```
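Putting the client invocation together programmatically can be handy in scripts; the sketch below assumes only the CLI flags shown in this guide (`-p`, `-o`, `-b`, `-u`):

```python
import shlex

def build_mineru_cmd(input_path, output_path,
                     backend="hybrid-auto-engine", server_url=None):
    """Build the mineru CLI invocation used throughout this guide."""
    cmd = ["mineru", "-p", input_path, "-o", output_path, "-b", backend]
    if server_url is not None:
        cmd += ["-u", server_url]  # required for *-http-client backends
    return shlex.join(cmd)
```

`shlex.join` quotes any path containing spaces, so the resulting string is safe to paste into a shell.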
> [!NOTE]

View File

@@ -27,74 +27,76 @@
> In non-mainline environments, due to the diversity of hardware and software configurations, as well as third-party dependency compatibility issues, we cannot guarantee 100% project availability. Therefore, for users who wish to use this project in non-recommended environments, we suggest carefully reading the documentation and FAQ first; most issues already have corresponding solutions in the FAQ. We also encourage community feedback to help us gradually expand support.
<table border="1">
<thead>
<thead>
<tr>
<th rowspan="2">解析后端</th>
<th rowspan="2">pipeline</th>
<th colspan="2">*-auto-engine</th>
<th colspan="2">*-http-client</th>
</tr>
<tr>
<th>hybrid</th>
<th>vlm</th>
<th>hybrid</th>
<th>vlm</th>
</tr>
</thead>
<tbody>
<tr>
<th>后端特性</th>
<td >兼容性好</td>
<td colspan="2">配置要求较高</td>
<td colspan="2">适用于OpenAI兼容服务器<sup>2</sup></td>
</tr>
<tr>
<th>精度指标<sup>1</sup></th>
<td style="text-align:center;">82+</td>
<td colspan="4" style="text-align:center;">90+</td>
</tr>
<tr>
<th>操作系统</th>
<td colspan="5" style="text-align:center;">Linux<sup>3</sup> / Windows<sup>4</sup> / macOS<sup>5</sup></td>
</tr>
<tr>
<th>纯CPU平台支持</th>
<td style="text-align:center;">✅</td>
<td colspan="2" style="text-align:center;">❌</td>
<td colspan="2" style="text-align:center;">✅</td>
</tr>
<tr>
<th rowspan="2">解析后端</th>
<th rowspan="2">pipeline <br> (精度<sup>1</sup> 82+)</th>
<th colspan="5">vlm (精度<sup>1</sup> 90+)</th>
</tr>
<tr>
<th>transformers</th>
<th>mlx-engine</th>
<th>vllm-engine / <br>vllm-async-engine</th>
<th>lmdeploy-engine</th>
<th>http-client</th>
</tr>
</thead>
<tbody>
<tr>
<th>后端特性</th>
<td>速度快, 无幻觉</td>
<td>兼容性好, 速度较慢</td>
<td>比transformers快</td>
<td>速度快, 兼容vllm生态</td>
<td>速度快, 兼容lmdeploy生态</td>
<td>适用于OpenAI兼容服务器<sup>6</sup></td>
</tr>
<tr>
<th>操作系统</th>
<td colspan="2" style="text-align:center;">Linux<sup>2</sup> / Windows / macOS</td>
<td style="text-align:center;">macOS<sup>3</sup></td>
<td style="text-align:center;">Linux<sup>2</sup> / Windows<sup>4</sup> </td>
<td style="text-align:center;">Linux<sup>2</sup> / Windows<sup>5</sup> </td>
<td>Any</td>
</tr>
<tr>
<th>CPU Inference Support</th>
<td colspan="2" style="text-align:center;">✅</td>
<td colspan="3" style="text-align:center;">❌</td>
<td>Not required</td>
</tr>
<tr>
<th>GPU Requirements</th><td colspan="2" style="text-align:center;">Volta or later architecture with 6 GB+ VRAM, or Apple Silicon</td>
<td>Apple Silicon</td>
<td colspan="2" style="text-align:center;">Volta or later architecture with 8 GB+ VRAM</td>
<td>Not required</td>
</tr>
<tr>
<th>Memory Requirements</th>
<td colspan="5" style="text-align:center;">16 GB minimum, 32 GB or more recommended</td>
<td>8 GB</td>
</tr>
<tr>
<th>Disk Space Requirements</th>
<td colspan="5" style="text-align:center;">20 GB or more, SSD recommended</td>
<td>2 GB</td>
</tr>
<tr>
<th>Python Version</th>
<td colspan="6" style="text-align:center;">3.10-3.13<sup>7</sup></td>
</tr>
</tbody>
</table>
<th>GPU Acceleration Support</th>
<td colspan="4" style="text-align:center;">Volta or later architecture GPU, or Apple Silicon</td>
<td rowspan="2">Not required</td>
</tr>
<tr>
<th>Minimum VRAM</th>
<td style="text-align:center;">6 GB</td>
<td style="text-align:center;">10 GB</td>
<td style="text-align:center;">8 GB</td>
<td style="text-align:center;">3 GB</td>
</tr>
<tr>
<th>Memory Requirements</th>
<td colspan="3" style="text-align:center;">16 GB minimum, 32 GB or more recommended</td>
<td colspan="2" style="text-align:center;">8 GB</td>
</tr>
<tr>
<th>Disk Space Requirements</th>
<td colspan="3" style="text-align:center;">20 GB or more, SSD recommended</td>
<td colspan="2" style="text-align:center;">2 GB</td>
</tr>
<tr>
<th>Python Version</th>
<td colspan="5" style="text-align:center;">3.10-3.13</td>
</tr>
</tbody>
</table>
<sup>1</sup> Accuracy is the End-to-End Evaluation Overall score on OmniDocBench (v1.5), measured with the latest version of `MinerU`
<sup>2</sup> Linux supports only distributions released in 2019 or later
<sup>3</sup> MLX requires macOS 13.5 or later; 14.0 or later is recommended
<sup>4</sup> Windows vLLM is supported via WSL2 (Windows Subsystem for Linux)
<sup>5</sup> Windows LMDeploy can only use the `turbomind` backend, which is slightly slower than the `pytorch` backend; if speed matters, run it via WSL2
<sup>6</sup> OpenAI-API-compatible servers, such as local model servers deployed with inference frameworks like `vLLM`/`SGLang`/`LMDeploy`, or remote model services
<sup>7</sup> Windows + LMDeploy supports only Python 3.10~3.12, because the key dependency `ray` does not support Python 3.13 on Windows
<sup>2</sup> OpenAI-API-compatible servers, such as local model servers deployed with inference frameworks like `vLLM`/`SGLang`/`LMDeploy`, or remote model services
<sup>3</sup> Linux supports only distributions released in 2019 or later
<sup>4</sup> Supports only Python 3.10~3.12, because the key dependency `ray` does not support Python 3.13 on Windows
<sup>5</sup> macOS requires version 14.0 or later
> [!TIP]
@@ -108,19 +110,19 @@
```bash
pip install --upgrade pip -i https://mirrors.aliyun.com/pypi/simple
pip install uv -i https://mirrors.aliyun.com/pypi/simple
uv pip install -U "mineru[core]" -i https://mirrors.aliyun.com/pypi/simple
uv pip install -U "mineru[all]" -i https://mirrors.aliyun.com/pypi/simple
```
#### Install MinerU from source
```bash
git clone https://github.com/opendatalab/MinerU.git
cd MinerU
uv pip install -e .[core] -i https://mirrors.aliyun.com/pypi/simple
uv pip install -e .[all] -i https://mirrors.aliyun.com/pypi/simple
```
> [!TIP]
> `mineru[core]` includes all core features except `vLLM`/`LMDeploy` acceleration, is compatible with Windows / Linux / macOS, and suits the vast majority of users.
> If you need `vLLM`/`LMDeploy`-accelerated VLM model inference, or want to install a lightweight client on edge devices, see the [extension modules installation guide](./extension_modules.md).
> `mineru[all]` includes all core features, is compatible with Windows / Linux / macOS, and suits the vast majority of users.
> If you need to specify the vlm model inference framework, or only plan to install a lightweight client on edge devices, see the [extension modules installation guide](https://opendatalab.github.io/MinerU/zh/quick_start/extension_modules/).
---


@@ -10,7 +10,6 @@ export MINERU_MODEL_SOURCE=modelscope
## Quick start via the command line
MinerU ships with a built-in command-line tool that lets you parse PDFs quickly:
```bash
# Parse with the pipeline backend by default
mineru -p <input_path> -o <output_path>
```
> [!TIP]
@@ -23,13 +22,6 @@ mineru -p <input_path> -o <output_path>
> The command-line tool automatically tries cuda/mps acceleration on Linux and macOS. Windows users who want cuda acceleration
> should visit the [Pytorch website](https://pytorch.org/get-started/locally/) and install acceleration-enabled `torch` and `torchvision` using the command matching their cuda version.
```bash
# Or specify the vlm backend
mineru -p <input_path> -o <output_path> -b vlm-transformers
```
> [!TIP]
> The vlm backend also supports `vllm`/`lmdeploy` acceleration, which can greatly speed up inference compared with the `transformers` backend. See the [extension modules installation guide](../quick_start/extension_modules.md) for how to install the extension packages that enable `vllm`/`lmdeploy` acceleration.
If you need to adjust parsing options with custom parameters, see the more detailed [command-line tool documentation](./cli_tools.md).
## Advanced usage via API, WebUI, and http-client/server
@@ -43,12 +35,7 @@ mineru -p <input_path> -o <output_path> -b vlm-transformers
>Visit `http://127.0.0.1:8000/docs` in your browser to view the API documentation.
- Start the gradio webui visual frontend:
```bash
# Use the pipeline/vlm-transformers/vlm-http-client backends
mineru-gradio --server-name 0.0.0.0 --server-port 7860
# Or use the vlm-vllm-engine/pipeline backends (requires a vllm environment)
mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-vllm-engine true
# Or use the vlm-lmdeploy-engine/pipeline backends (requires an lmdeploy environment)
mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-lmdeploy-engine true
```
>[!TIP]
>
@@ -57,16 +44,12 @@ mineru -p <input_path> -o <output_path> -b vlm-transformers
- Call via `http-client/server`:
```bash
# Start an OpenAI-compatible server (requires a vllm or lmdeploy environment)
mineru-openai-server
# Or specify vllm as the inference engine (requires a vllm environment)
mineru-openai-server --engine vllm --port 30000
# Or specify lmdeploy as the inference engine (requires an lmdeploy environment)
mineru-openai-server --engine lmdeploy --server-port 30000
mineru-openai-server --port 30000
```
>[!TIP]
>In another terminal, connect to the vllm server via the HTTP client (requires only CPU and network access; no vllm environment needed)
>In another terminal, connect to the OpenAI-compatible server via the HTTP client
> ```bash
> mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://127.0.0.1:30000
> mineru -p <input_path> -o <output_path> -b hybrid-http-client -u http://127.0.0.1:30000
> ```
> [!NOTE]


@@ -61,7 +61,7 @@ from .common import do_parse, read_fn, pdf_suffixes, image_suffixes
vlm-http-client: High accuracy via remote computing power (client suitable for OpenAI-compatible servers).
hybrid-auto-engine: Next-generation high-accuracy solution via local computing power.
hybrid-http-client: High accuracy with only a little local computing power (client suitable for OpenAI-compatible servers).
Without method specified, pipeline will be used by default.""",
Without method specified, hybrid-auto-engine will be used by default.""",
default='hybrid-auto-engine',
)
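
The default-backend change this hunk makes can be sketched as a small resolution function. This is an illustrative sketch, not the real CLI (which uses click); the backend names are only those mentioned in this diff.

```python
# Backends mentioned in this diff; the real project may support more.
KNOWN_BACKENDS = {
    "pipeline",
    "vlm-transformers",
    "vlm-http-client",
    "hybrid-auto-engine",
    "hybrid-http-client",
}

DEFAULT_BACKEND = "hybrid-auto-engine"  # was "pipeline" before this commit


def resolve_backend(name=None):
    """Validate a user-supplied backend name, falling back to the default."""
    if name is None:
        return DEFAULT_BACKEND
    if name not in KNOWN_BACKENDS:
        raise ValueError(f"unknown backend: {name!r}")
    return name
```

With no `-b` flag, parsing now goes through `hybrid-auto-engine` instead of `pipeline`, which is the behavior the updated help text describes.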
@click.option(


@@ -39,7 +39,7 @@ dependencies = [
"openai>=1.70.0,<3",
"beautifulsoup4>=4.13.5,<5",
"magika>=0.6.2,<1.1.0",
"mineru-vl-utils>=0.1.18,<1",
"mineru-vl-utils>=0.1.19.1,<1",
"qwen-vl-utils>=0.0.14,<1",
]