# Compare commits

`release-2.` ... `release-2.` (branch names truncated in this export), 172 commits.

Only the commit SHA1 column survived extraction (the author, date, and message columns were empty); the SHAs, in the order listed:

a0da3029fd, 30fe325428, 6131013ce9, f1c145054a, 078aaaf150, c3a55fffab, 4eddf28c8f, dd92c5b723,
b5922086cb, df12e4fc79, 90ed311198, c922c63fbc, 28b278508f, 6b54f321b4, e47ec7cd10, 701f6018f2,
5ade203e31, 6e83f37754, 972161a991, 700e11d342, fd79885b23, a0810b5b6e, 39271b45de, db68aaf4ac,
a6cc8fa90d, 47f34f4ce8, b7a8347f45, c6d241f4f4, 06b2fda1c1, 5c1ca9271e, e7485c5d79, 80436a89f9,
b36793cef0, 43b51e78fc, 9688f73046, c02edd9cba, b4d08e994c, a220b8a208, ab480a7a86, f57a6d8d9e,
915ba87f7d, 42a95e8e20, a513357607, c8ccf4cf20, 33d43a5afc, 3b057c7996, 34547262a2, cd0ed982c0,
52dcbcbfa5, 0758de6d24, ae7892a6f9, 73567ccedc, bb552282f3, 14c38101f7, cb3a30e9ad, f4db41d0cb,
dad59f7d52, 499e877165, 2d249666ba, cedc62a728, 1e40bac24f, 23701d0db4, e7d8bf097a, 08a89aeca1,
1b724f3336, ea4271ab37, d83b83a5ad, 0853b84e87, 36225160a3, a36118f8ba, a38384e7fb, 4b7c2bbcc0,
504fe6ada3, 39be54023b, 484ff5a6f9, 59a7a577b3, 0e73ef9615, d580d6c7f8, 4c8bb038ce, a89715b9a2,
f05ea7c2e6, b68db3ab90, 3539cfba36, 3bf50d5267, 2108019698, 17a9921ba9, 3baee1d077, e1ee728e31,
1b45e6e1bc, 966aadd1d3, ecb8e3f0ac, 1bef6e3526, 4c4d1d0f95, c36aa54370, 4b480cfcf7, 7e18e1bb76,
44fdeb663f, cf59949ba9, c8c2f28afc, aa4bc6259b, b7e4ea0b49, 998197a47f, 3c8b6e6b6b, be42b46ff9,
7c689e33b8, af66bc02c2, 752f75ad8e, 1cfde98585, 54676295d5, 61c7c65d8b, 6f05f735d0, befb16e531,
abc433d6f2, e7c1385068, 342c5aa34a, f25ddfa024, e31de3a453, 2f01754410, 8a9921fb22, 652e11a253,
61cc6886fe, 80dc57e7ce, d84a006f6d, 2c5361bf8e, eb01b7acf9, 5656f1363b, c9315b8e10, 907099762f,
2c356cccee, 0f62f166e6, c7a64e72dc, 3cb3a94830, 8301fa4c20, 4400f4b75f, 92efb8f96e, 9a88cbfb09,
e96e4a0ce4, c7bde0ab39, 8754c24e42, 4f8c00cc34, 89681f98ad, 66d328dbc5, f0c1318545, 6e97f3cf70,
aede62167e, 5f2740f743, a888d2b625, 4275876331, ec9f7f54ab, 7861e5e369, 159f3a89a3, d9452bbeb9,
d808a32c0b, 12ce3bd024, e3d7aece50, 7c55a0ea65, f1659eb7a7, c6bffd9382, 857dcb2ef5, ef69f98cd6,
6d5d1cf26b, 7c481796f8, 7d62b7b7cc, 5a0cf9af7f, f5e0e67545, a4cac624df, e1eb318b9b, 31834b1e68,
100ace2e99, c343afd20c, 6586c7c01e, 8bb8b715c1
**.github/ISSUE_TEMPLATE/bug_report.yml** (16 lines changed, vendored)

`@@ -122,7 +122,21 @@ body:` The version dropdown's options are updated, and a `backend_name` dropdown is added (indentation reconstructed):

```yaml
      #multiple: false
      options:
        -
        - "2.0.x"
        - "<2.2.0"
        - "2.2.x"
        - ">=2.5"
    validations:
      required: true

  - type: dropdown
    id: backend_name
    attributes:
      label: Backend name | 解析后端
      #multiple: false
      options:
        -
        - "vlm"
        - "pipeline"
    validations:
      required: true
```
**README.md** (39 lines changed)

`@@ -1,7 +1,7 @@`: the logo `<img>` line is replaced (removed/added pair; the diff's +/- markers were lost in extraction):

<img src="docs/images/MinerU-logo.png" width="300px" style="vertical-align:middle;">
<img src="https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docs/images/MinerU-logo.png" width="300px" style="vertical-align:middle;">

`@@ -18,7 +18,8 @@`: the badge block gains a link to the MinerU2.5 technical report (badge image markup was lost in extraction; only the link targets remain):

[](https://huggingface.co/spaces/opendatalab/MinerU)
[](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
[](https://colab.research.google.com/gist/myhloli/a3cb16570ab3cfeadf9d8f0ac91b4fca/mineru_demo.ipynb)
[](https://arxiv.org/abs/2409.18839)
[](https://arxiv.org/abs/2409.18839)
[](https://arxiv.org/abs/2509.22186)
[](https://deepwiki.com/opendatalab/MinerU)

`@@ -43,6 +44,28 @@`

</div>
# Changelog
- 2025/10/24 2.6.0 Released
  - `pipeline` backend optimizations
    - Added experimental support for Chinese formulas, enabled by setting the environment variable `export MINERU_FORMULA_CH_SUPPORT=1`. This feature may slightly reduce MFR speed and can fail on some long formulas, so enable it only when Chinese formulas need to be parsed. To disable it, set the variable to `0`.
    - `OCR` speed significantly improved, by 200%–300%, thanks to the optimization contributed by @cjsdurj
    - `OCR` models for Cyrillic, Arabic, Devanagari, Telugu (te), and Tamil (ta) updated to the `ppocr-v5` versions, with accuracy improved by over 40% compared to the previous models
  - `vlm` backend optimizations
    - `table_caption` and `table_footnote` matching logic optimized, improving caption/footnote matching accuracy and reading-order quality on pages with multiple consecutive tables
    - Reduced CPU usage under high concurrency with the `vllm` backend, lowering server load
    - Adapted to `vllm` 0.11.0
  - General optimizations
    - Improved cross-page table merging, including new support for merging cross-page continuation tables and better results in multi-column merge scenarios
    - Added the `MINERU_TABLE_MERGE_ENABLE` environment variable for the table merging feature. Merging is enabled by default and can be disabled by setting the variable to `0`
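The two switches introduced in 2.6.0 are plain environment variables; a minimal shell sketch of toggling them (values as given in the notes above; the `mineru` invocation is illustrative and commented out):

```shell
# Opt in to experimental Chinese formula support (pipeline backend only).
export MINERU_FORMULA_CH_SUPPORT=1
# Opt out of cross-page table merging.
export MINERU_TABLE_MERGE_ENABLE=0
# mineru -p <input_path> -o <output_path>
echo "formula_ch=$MINERU_FORMULA_CH_SUPPORT merge=$MINERU_TABLE_MERGE_ENABLE"
```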
- 2025/09/26 2.5.4 Released
  - 🎉🎉 The MinerU2.5 [Technical Report](https://arxiv.org/abs/2509.22186) is now available! We welcome you to read it for a comprehensive overview of its model architecture, training strategy, data engineering, and evaluation results.
  - Fixed an issue where some `PDF` files were mistakenly identified as `AI` files, causing parsing failures

- 2025/09/20 2.5.3 Released
  - Dependency version ranges adjusted so that Turing and earlier architecture GPUs can use vLLM acceleration for MinerU2.5 model inference.
  - `pipeline` backend compatibility fixes for torch 2.8.0.
  - Reduced the default concurrency of the vLLM async backend to lower server pressure and avoid connection closures caused by high load.
  - More compatibility-related details can be found in the [announcement](https://github.com/opendatalab/MinerU/discussions/3548)

- 2025/09/19 2.5.2 Released
`@@ -733,6 +756,16 @@` (context: # Citation): the MinerU2.5 BibTeX entry is added before the existing MinerU entry:

```bibtex
@misc{niu2025mineru25decoupledvisionlanguagemodel,
      title={MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing},
      author={Junbo Niu and Zheng Liu and Zhuangcheng Gu and Bin Wang and Linke Ouyang and Zhiyuan Zhao and Tao Chu and Tianyao He and Fan Wu and Qintong Zhang and Zhenjiang Jin and Guang Liang and Rui Zhang and Wenzheng Zhang and Yuan Qu and Zhifei Ren and Yuefeng Sun and Yuanhong Zheng and Dongsheng Ma and Zirui Tang and Boyu Niu and Ziyang Miao and Hejun Dong and Siyi Qian and Junyuan Zhang and Jingzhou Chen and Fangdong Wang and Xiaomeng Zhao and Liqun Wei and Wei Li and Shasha Wang and Ruiliang Xu and Yuanyuan Cao and Lu Chen and Qianqian Wu and Huaiyu Gu and Lindong Lu and Keming Wang and Dechen Lin and Guanlin Shen and Xuanhe Zhou and Linfeng Zhang and Yuhang Zang and Xiaoyi Dong and Jiaqi Wang and Bo Zhang and Lei Bai and Pei Chu and Weijia Li and Jiang Wu and Lijun Wu and Zhenxiang Li and Guangyu Wang and Zhongying Tu and Chao Xu and Kai Chen and Yu Qiao and Bowen Zhou and Dahua Lin and Wentao Zhang and Conghui He},
      year={2025},
      eprint={2509.22186},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.22186},
}

@misc{wang2024mineruopensourcesolutionprecise,
      title={MinerU: An Open-Source Solution for Precise Document Content Extraction},
      author={Bin Wang and Chao Xu and Xiaomeng Zhao and Linke Ouyang and Fan Wu and Zhiyuan Zhao and Rui Xu and Kaiwen Liu and Yuan Qu and Fukai Shang and Bo Zhang and Liqun Wei and Zhihao Sui and Wei Li and Botian Shi and Yu Qiao and Dahua Lin and Conghui He},
```

`@@ -771,4 +804,4 @@` (acknowledgments list; only the trailing Dingo line is touched):

- [OmniDocBench (A Comprehensive Benchmark for Document Parsing and Evaluation)](https://github.com/opendatalab/OmniDocBench)
- [Magic-HTML (Mixed web page extraction tool)](https://github.com/opendatalab/magic-html)
- [Magic-Doc (Fast speed ppt/pptx/doc/docx/pdf extraction tool)](https://github.com/InternLM/magic-doc)
- [Dingo: A Comprehensive AI Data Quality Evaluation Tool](https://github.com/MigoXLab/dingo)
The corresponding changes in the Chinese README (its filename line was not captured in this export; Chinese content translated below).

`@@ -1,7 +1,7 @@`: the logo `<img>` line is replaced, as in README.md (local path vs. jsDelivr CDN URL).

`@@ -18,7 +18,8 @@`: the badge block gains the MinerU2.5 technical-report link (badge image markup lost in extraction):

[](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
[](https://huggingface.co/spaces/opendatalab/MinerU)
[](https://colab.research.google.com/gist/myhloli/a3cb16570ab3cfeadf9d8f0ac91b4fca/mineru_demo.ipynb)
[](https://arxiv.org/abs/2409.18839)
[](https://arxiv.org/abs/2409.18839)
[](https://arxiv.org/abs/2509.22186)
[](https://deepwiki.com/opendatalab/MinerU)

`@@ -43,6 +44,28 @@`

</div>

# Changelog (更新记录)
- 2025/10/24 2.6.0 Released
  - `pipeline` backend optimizations
    - Added experimental support for Chinese formulas, enabled via `export MINERU_FORMULA_CH_SUPPORT=1`. It may slightly slow MFR and fail on some long formulas; enable it only when Chinese formulas need to be parsed, and set the variable to `0` to turn it off.
    - `OCR` speed greatly improved, by 200%–300%, thanks to the optimization contributed by @cjsdurj
    - `OCR` models for Cyrillic, Arabic, Devanagari, Telugu (te), and Tamil (ta) updated to `ppocr-v5`, with accuracy improved by over 40% compared to the previous generation
  - `vlm` backend optimizations
    - `table_caption` and `table_footnote` matching logic optimized, improving caption/footnote matching accuracy and reading-order quality on pages with multiple consecutive tables
    - Reduced CPU usage under high concurrency with the `vllm` backend, lowering server load
    - Adapted to `vllm` 0.11.0
  - General optimizations
    - Improved cross-page table merging, with new support for cross-page continuation tables and better results in multi-column merge scenarios
    - Added the `MINERU_TABLE_MERGE_ENABLE` environment variable; merging is on by default and can be disabled by setting it to `0`

- 2025/09/26 2.5.4 Released
  - 🎉🎉 The MinerU2.5 [Technical Report](https://arxiv.org/abs/2509.22186) is now available; read it for a full view of the model architecture, training strategy, data engineering, and evaluation results.
  - Fixed an issue where some `pdf` files were misidentified as `ai` files and failed to parse

- 2025/09/20 2.5.3 Released
  - Dependency version ranges adjusted so Turing and earlier architecture GPUs can use vLLM-accelerated inference for the MinerU2.5 model.
  - `pipeline` backend compatibility fixes for torch 2.8.0.
  - Lowered the vLLM async backend's default concurrency to reduce server pressure and avoid connection closures under heavy load.
  - More compatibility details in the [announcement](https://github.com/opendatalab/MinerU/discussions/3547)

- 2025/09/19 2.5.2 Released
  We officially release MinerU2.5, currently the strongest multimodal large model for document parsing. With only 1.2B parameters, MinerU2.5 surpasses top multimodal models such as Gemini2.5-Pro, GPT-4o, and Qwen2.5-VL-72B on the OmniDocBench document-parsing benchmark, and clearly leads dedicated document-parsing models such as dots.ocr, MonkeyOCR, and PP-StructureV3.

`@@ -719,6 +742,16 @@ mineru -p <input_path> -o <output_path>` (context: # Citation): the same BibTeX additions as in README.md (the MinerU2.5 entry added before the existing MinerU entry).

`@@ -757,4 +790,4 @@ mineru -p <input_path> -o <output_path>` (acknowledgments list; only the trailing Dingo line is touched):

- [OmniDocBench (A Comprehensive Benchmark for Document Parsing and Evaluation)](https://github.com/opendatalab/OmniDocBench)
- [Magic-HTML (Mixed web page extraction tool)](https://github.com/opendatalab/magic-html)
- [Magic-Doc (Fast speed ppt/pptx/doc/docx/pdf extraction tool)](https://github.com/InternLM/magic-doc)
- [Dingo: A Comprehensive AI Data Quality Evaluation Tool](https://github.com/MigoXLab/dingo)
**Dockerfile (China-region variant)**

`@@ -1,9 +1,16 @@`: the base-image comments are expanded with Compute Capability guidance, and a Turing-and-below alternative is added:

```dockerfile
# Use DaoCloud mirrored vllm image for China region for gpu with Ampere architecture and above (Compute Capability>=8.0)
# Compute Capability version query (https://developer.nvidia.com/cuda-gpus)
FROM docker.m.daocloud.io/vllm/vllm-openai:v0.10.1.1

# Use the official vllm image
# FROM vllm/vllm-openai:v0.10.1.1

# Use DaoCloud mirrored vllm image for China region for gpu with Turing architecture and below (Compute Capability<8.0)
# FROM docker.m.daocloud.io/vllm/vllm-openai:v0.10.2

# Use the official vllm image
# FROM vllm/vllm-openai:v0.10.2

# Install libgl for opencv support & Noto fonts for Chinese characters
RUN apt-get update && \
    apt-get install -y \
```
**Dockerfile (global variant)**

`@@ -1,6 +1,10 @@`: the base-image comments gain Compute Capability guidance and a Turing-and-below alternative:

```dockerfile
# Use the official vllm image for gpu with Ampere architecture and above (Compute Capability>=8.0)
# Compute Capability version query (https://developer.nvidia.com/cuda-gpus)
FROM vllm/vllm-openai:v0.10.1.1

# Use the official vllm image for gpu with Turing architecture and below (Compute Capability<8.0)
# FROM vllm/vllm-openai:v0.10.2

# Install libgl for opencv support & Noto fonts for Chinese characters
RUN apt-get update && \
    apt-get install -y \
```
**New binary files** (all PNGs under `docs/assets/images/`; sizes in numeric order per group):

- BISHENG_01 (96 KiB)
- Cherry_Studio_1–8 (34, 51, 72, 55, 64, 75, 56, 28 KiB)
- Coze_1–21 (64, 53, 95, 110, 102, 101, 214, 151, 83, 88, 76, 110, 79, 104, 72, 87, 201, 261, 261, 145, 130 KiB) and coze_0 (92 KiB)
- DataFLow_01 (89 KiB), DataFlow_02 (147 KiB)
- Dify_1–26 (108, 236, 263, 264, 261, 286, 50, 136, 110, 81, 85, 129, 35, 249, 255, 107, 125, 180, 105, 177, 77, 118, 94, 133, 161, 190 KiB)
- DingTalk_01 (133 KiB)
- FastGPT_01 (185 KiB), FastGPT_02 (92 KiB)
- ModelWhale_01 (246 KiB), ModelWhale_02 (71 KiB), ModelWhale_1 (72 KiB)
- RagFlow_01 (500 KiB)
- Sider_1 (62 KiB)
- n8n_0–10 (276, 67, 74, 71, 72, 70, 63, 23, 33, 89, 14 KiB)
`@@ -19,7 +19,8 @@`: the same badge-block update (MinerU2.5 technical-report link added) in another file (filename not captured; badge image markup lost in extraction):

[](https://huggingface.co/spaces/opendatalab/MinerU)
[](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
[](https://colab.research.google.com/gist/myhloli/a3cb16570ab3cfeadf9d8f0ac91b4fca/mineru_demo.ipynb)
[](https://arxiv.org/abs/2409.18839)
[](https://arxiv.org/abs/2409.18839)
[](https://arxiv.org/abs/2509.22186)
[](https://deepwiki.com/opendatalab/MinerU)

<div align="center">
`@@ -10,7 +10,8 @@ docker build -t mineru-vllm:latest -f Dockerfile .` The tip below the build command is updated. Before:

> [!TIP]
> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/global/Dockerfile) uses `vllm/vllm-openai:v0.10.1.1` as the base image by default, supporting Turing/Ampere/Ada Lovelace/Hopper/Blackwell platforms.

After:

> [!TIP]
> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/global/Dockerfile) uses `vllm/vllm-openai:v0.10.1.1` as the base image by default. This version of the vLLM v1 engine has limited support for GPU models.
> If you cannot use vLLM accelerated inference on Turing and earlier architecture GPUs, you can resolve this issue by changing the base image to `vllm/vllm-openai:v0.10.2`.

## Docker Description
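A hedged shell sketch of the workaround in the tip above: swap the base-image tag before building. The `Dockerfile.turing` name, the one-line stand-in file, and the commented `docker build` are illustrative, not part of the repo; `sed -i` is GNU sed syntax.

```shell
# Create a minimal stand-in Dockerfile (illustrative); in the real repo you
# would copy docker/global/Dockerfile instead.
printf 'FROM vllm/vllm-openai:v0.10.1.1\n' > Dockerfile.turing
# Point the base image at v0.10.2 for Turing-and-earlier GPUs, per the tip.
sed -i 's|vllm/vllm-openai:v0.10.1.1|vllm/vllm-openai:v0.10.2|' Dockerfile.turing
cat Dockerfile.turing   # FROM vllm/vllm-openai:v0.10.2
# docker build -t mineru-vllm:latest -f Dockerfile.turing .
```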
`@@ -397,10 +397,10 @@` (context: Text levels are distinguished through the `text_level` field). In the sample image block, the caption and footnote keys are renamed, `img_caption` to `image_caption` and `img_footnote` to `image_footnote` (per the diff's removed/added line order); `img_path` is unchanged:

```json
{
    "type": "image",
    "img_path": "images/a8ecda1c69b27e4f79fce1589175a9d721cbdc1cf78b4cc06a015f3746f6b9d8.jpg",
    "image_caption": [
        "Fig. 1. Annual flow duration curves of daily flows from Pine Creek, Australia, 1989–2000. "
    ],
    "image_footnote": [],
    "bbox": [
        62,
        480,
```
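For consumers of previously generated content lists, a hedged migration sketch for the key rename above (not an official MinerU tool; the sample block is illustrative):

```python
# Rename the old img_caption/img_footnote keys to the new
# image_caption/image_footnote names; img_path is intentionally left alone.
RENAMES = {"img_caption": "image_caption", "img_footnote": "image_footnote"}

def migrate_block(block: dict) -> dict:
    # Rebuild the dict, substituting only the affected keys.
    return {RENAMES.get(k, k): v for k, v in block.items()}

block = {"type": "image", "img_path": "images/fig1.jpg",
         "img_caption": ["Fig. 1"], "img_footnote": []}
migrated = migrate_block(block)
print(sorted(migrated))  # ['image_caption', 'image_footnote', 'img_path', 'type']
```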
`@@ -87,6 +87,16 @@ Here are the environment variables and their descriptions:` The `MINERU_FORMULA_CH_SUPPORT` entry is added and the wording of the existing entries is normalized:

  * Used to enable formula parsing
  * Default is `true`; can be set to `false` via environment variable to disable formula parsing.

- `MINERU_FORMULA_CH_SUPPORT`:
  * Used to enable Chinese formula parsing optimization (experimental feature)
  * Default is `false`; can be set to `true` via environment variable to enable Chinese formula parsing optimization.
  * Only effective for the `pipeline` backend.

- `MINERU_TABLE_ENABLE`:
  * Used to enable table parsing
  * Default is `true`; can be set to `false` via environment variable to disable table parsing.

- `MINERU_TABLE_MERGE_ENABLE`:
  * Used to enable table merging functionality
  * Default is `true`; can be set to `false` via environment variable to disable table merging functionality.
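A minimal shell sketch combining the flags documented above (the commented `mineru` call is illustrative; flag names and `true`/`false` values are from the list above):

```shell
export MINERU_FORMULA_CH_SUPPORT=true   # experimental, pipeline backend only
export MINERU_TABLE_ENABLE=true         # the default, shown for completeness
export MINERU_TABLE_MERGE_ENABLE=false  # turn off table merging
# mineru -p <input_path> -o <output_path>
env | grep '^MINERU_'
```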
`@@ -52,7 +52,7 @@ If you need to adjust parsing options through custom parameters, you can also ch`

> [!TIP]
>
> - Access `http://127.0.0.1:7860` in your browser to use the Gradio WebUI.
> - Access `http://127.0.0.1:7860/?view=api` to use the Gradio API.

- Using the `http-client/server` method:

  ```bash
  # Start vllm server (requires vllm environment)
  ```
`@@ -19,7 +19,8 @@`: the same badge-block update (MinerU2.5 technical-report link added) in a Chinese docs page (filename not captured; badge image markup lost in extraction):

[](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
[](https://huggingface.co/spaces/opendatalab/MinerU)
[](https://colab.research.google.com/gist/myhloli/a3cb16570ab3cfeadf9d8f0ac91b4fca/mineru_demo.ipynb)
[](https://arxiv.org/abs/2409.18839)
[](https://arxiv.org/abs/2409.18839)
[](https://arxiv.org/abs/2509.22186)
[](https://deepwiki.com/opendatalab/MinerU)

<div align="center">
`@@ -10,7 +10,8 @@ docker build -t mineru-vllm:latest -f Dockerfile .` The same tip update in the Chinese Docker guide (translated). Before:

> [!TIP]
> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/china/Dockerfile) uses `vllm/vllm-openai:v0.10.1.1` as the base image by default, supporting the Turing/Ampere/Ada Lovelace/Hopper/Blackwell platforms.

After:

> [!TIP]
> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/china/Dockerfile) uses `vllm/vllm-openai:v0.10.1.1` as the base image by default.
> This version's vLLM v1 engine has limited support for GPU models; if you cannot use vLLM accelerated inference on Turing and earlier architecture GPUs, change the base image to `vllm/vllm-openai:v0.10.2` to resolve the issue.

## Docker Notes (Docker说明)
`@@ -397,10 +397,10 @@ inference_result: list[PageInferenceResults] = []` The same `img_caption`/`img_footnote` to `image_caption`/`image_footnote` rename in the Chinese output-format docs:

```json
{
    "type": "image",
    "img_path": "images/a8ecda1c69b27e4f79fce1589175a9d721cbdc1cf78b4cc06a015f3746f6b9d8.jpg",
    "image_caption": [
        "Fig. 1. Annual flow duration curves of daily flows from Pine Creek, Australia, 1989–2000. "
    ],
    "image_footnote": [],
    "bbox": [
        62,
        480,
```
**docs/zh/usage/acceleration_cards/AMD.md** (new file, `@@ -0,0 +1,365 @@`; translated from Chinese)

## Triton-based reimplementation of ROCm backends: working vllm-backend inference, plus the DocLayout-YOLO layout stage (step one of the pipeline backend)

**If you already have a full Python vllm + MinerU environment, jump straight to step 5!**
**For problems on other GPUs the same approach applies: profile first to locate the offending operator, then implement it with a Triton backend.**
Briefly tested, the results are basically on par with the official MinerU. Not many people use AMD, so this is shared here (originally posted in the comments) for everyone.

### 1. Results

**Additional speed test on a 200-page Python programming book PDF, reaching 1.99 it/s:**

```
Two Step Extraction: 100%|████| 200/200 [01:40<00:00, 1.99it/s]
```

**Earlier results on 14 academic papers:**

On a 7900 XTX, `mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-vllm-engine true` runs at roughly **1.6–1.8 s/it** (only two documents, not rigorously benchmarked). The second approach, replacing the original dot products with matrix multiplication, speeds this up further to about 1.3 s/it. After optimization the main cost splits roughly 25%/25% between hipBLAS (no further gains available there) and vLLM's own Triton backend, which will have to wait for upstream optimization.

DocLayout-YOLO's layout stage goes from 1.6 it/s to 15 it/s. Note that the input PDF page sizes must be cached; Triton requires cached shapes, and caching was chosen to keep the model's input/output interface intact with minimal code changes.

Example below uses the `-b vlm-vllm-engine` mode.

---

**Results with the 5-D matrix multiply replacing the original dot products:**

```
2025-10-05 15:45:12.985 | INFO | mineru.backend.vlm.vlm_analyze:get_model:128 - get vllm-engine predictor cost: 18.45s
Adding requests: 100%|████| 14/14 [00:01<00:00, 12.20it/s]
Processed prompts: 100%|████| 14/14 [00:08<00:00, 1.56it/s, est. speed input: 2174.18 toks/s, output: 791.87 toks/s]
Adding requests: 100%|████| 278/278 [00:00<00:00, 323.03it/s]
Processed prompts: 100%|████| 278/278 [00:07<00:00, 37.63it/s, est. speed input: 5264.66 toks/s, output: 2733.31 toks/s]
```

`mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-vllm-engine true` test:

```
2025-10-05 15:46:55.953 | WARNING | mineru.cli.common:convert_pdf_bytes_to_bytes_by_pypdfium2:54 - end_page_id is out of range, use pdf_docs length
Two Step Extraction: 100%|████| 14/14 [00:18<00:00, 1.30s/it]
```

---
### 2. Why

AMD RDNA GPUs have a severe performance problem with the vllm backend: one operator in vllm's **qwen2_vl.py** has no corresponding ROCm kernel, causing a severe fallback in the convolution computation; a single execution took 12 s. In short, **the MIOpen library lacks an optimized kernel for this model's particular Conv3d (bfloat16)**.

The dilated convolution in DocLayout-YOLO's **g2l_crm.py** hits the same issue, and even the professional CDNA MI210 does not escape it.

Both are conveniently fixed in one go here.

---
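To see why the missing Conv3d kernel can be worked around at all: when a convolution's stride equals its kernel size, as in the ViT patch embedding patched below, each output is an independent weighted sum over one patch, so the whole operation collapses into a matrix multiplication. A hedged NumPy sketch (shapes are illustrative, not the model's real dimensions):

```python
import numpy as np

# Patch-embedding Conv3d with kernel_size == stride: each of the L patches is
# reduced independently, so the conv equals a matmul of flattened patches
# against flattened filters.
L, C, T, P = 4, 3, 2, 14   # patches, channels, temporal patch size, spatial patch size
E = 8                      # embedding dim
x = np.random.rand(L, C, T, P, P).astype(np.float32)   # pre-cut input patches
w = np.random.rand(E, C, T, P, P).astype(np.float32)   # Conv3d weight [E, C, T, P, P]

conv_view = np.einsum("lctpq,ectpq->le", x, w)       # per-patch weighted sums
matmul_view = x.reshape(L, -1) @ w.reshape(E, -1).T  # the same result as one matmul

assert np.allclose(conv_view, matmul_view, atol=1e-3)
```

This equivalence is what lets a Triton kernel (or plain GEMM) stand in for the Conv3d that MIOpen handles poorly.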
### 3. Environment

System: Ubuntu 24.04.3, kernel Linux 6.14.0-33-generic, ROCm 7.0.1

Python environment:

```
python                3.12
pytorch-triton-rocm   3.5.0+gitbbb06c03
torch                 2.10.0.dev20251001+rocm7.0
torchvision           0.25.0.dev20251003+rocm7.0
vllm                  0.11.0rc2.dev198+g736fbf4c8.rocm701
```

Exact versions do not matter much; the procedure is the same.

---
### 4.前置环境安装
|
||||
```
|
||||
uv venv --python python3.12
|
||||
source .venv/bin/activate
|
||||
uv pip install --pre torch torchvision -i https://pypi.tuna.tsinghua.edu.cn/simple/ --extra-index-url https://download.pytorch.org/whl/nightly/rocm7.0
|
||||
uv pip install pip
|
||||
# 避免覆盖我们本地的pytorch,改用pip而没有继续使用uv pip
|
||||
pip install -U "mineru[core]" -i https://pypi.mirrors.ustc.edu.cn/simple/
|
||||
```
|
||||
vllm 安装参考官方手册[Vllm](https://docs.vllm.com.cn/en/latest/getting_started/installation/gpu.html#amd-rocm)
|
||||
```
# Manually install aiter, vllm, amd-smi, etc.; clone into a directory of your choice and cd into it
git clone --recursive https://github.com/ROCm/aiter.git
cd aiter
git submodule sync; git submodule update --init --recursive
python setup.py develop
cd ..
git clone https://github.com/vllm-project/vllm.git
cd vllm/
cp -r /opt/rocm/share/amd_smi ~/Pytorch/vllm/
pip install amd_smi/
pip install --upgrade numba \
            scipy \
            huggingface-hub[cli,hf_transfer] \
            setuptools_scm
pip install -r requirements/rocm.txt
export PYTORCH_ROCM_ARCH="gfx1100"  # match your GPU architecture: rocminfo | grep gfx
python setup.py develop
```
---
### 5. Adding the key Triton operator to vLLM

#### Two fixes are given here. The first is the optimization mentioned above, which brings the time down to roughly 1.5–1.8 s/it. The second hand-optimizes the operator into a matrix multiply: it definitely suits the 7900 XTX (about 1.3 s/it), and other AMD GPUs should also see a speedup over the first fix, though not necessarily the optimal one — the hand-tuned parts may need adjustment.

**Note: uninstall the Triton-backend flash_attn with pip. After much trial and error it still kept failing; it is too problematic, so simply do without it.**

```
# Locate your vllm installation (the XXX below)
pip show vllm
```

**Key changes**

In the file XXX/vllm/model_executor/models/qwen2_vl.py:

**1. Below line 33 of qwen2_vl.py, add `from .qwen2_vl_vision_kernels import triton_conv3d_patchify`**
```
from collections.abc import Iterable, Mapping, Sequence
from functools import partial
from typing import Annotated, Any, Callable, Literal, Optional, Union

import torch
import torch.nn as nn
import torch.nn.functional as F
from .qwen2_vl_vision_kernels import triton_conv3d_patchify
```

**What follows is split into Plan 1 (steps 2.1 and 3.1) and Plan 2 (steps 2.2 and 3.2); pick one of the two.**

---
**Plan 1**

**2.1 In qwen2_vl.py, around line 498, class Qwen2VisionPatchEmbed(nn.Module) — this is the class for which AMD has no ready-made kernel, causing the fallback**
```
class Qwen2VisionPatchEmbed(nn.Module):

    def __init__(
        self,
        patch_size: int = 14,
        temporal_patch_size: int = 2,
        in_channels: int = 3,
        embed_dim: int = 1152,
    ) -> None:
        super().__init__()
        self.patch_size = patch_size
        self.temporal_patch_size = temporal_patch_size
        self.embed_dim = embed_dim

        kernel_size = (temporal_patch_size, patch_size, patch_size)
        self.proj = nn.Conv3d(in_channels,
                              embed_dim,
                              kernel_size=kernel_size,
                              stride=kernel_size,
                              bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        L, C = x.shape
        x_reshaped = x.view(L, -1, self.temporal_patch_size, self.patch_size,
                            self.patch_size)

        # Call the custom Triton kernel instead of self.proj
        x_out = triton_conv3d_patchify(x_reshaped, self.proj.weight)

        # The kernel's output is already the correct shape [L, embed_dim]
        return x_out
```
**3.1 Create qwen2_vl_vision_kernels.py in the XXX/vllm/model_executor/models/ directory, implemented in Triton**
```
import torch
from vllm.triton_utils import tl, triton


@triton.jit
def _conv3d_patchify_kernel(
    # Pointers to tensors
    X, W, Y,
    # Tensor dimensions
    N, C_in, D_in, H_in, W_in,
    C_out, KD, KH, KW,
    # Strides for memory access
    stride_xn, stride_xc, stride_xd, stride_xh, stride_xw,
    stride_wn, stride_wc, stride_wd, stride_wh, stride_ww,
    stride_yn, stride_yc,
    # Triton-specific metaparameters
    BLOCK_SIZE: tl.constexpr,
):
    """
    Triton kernel for a non-overlapping 3D patching convolution.
    Each kernel instance computes one output value for one patch.
    """
    # Program IDs for the N (patch) and C_out (output channel) dimensions
    pid_n = tl.program_id(0)     # index of the patch being processed
    pid_cout = tl.program_id(1)  # index of the output channel being computed

    # --- Calculate memory pointers ---
    # Pointer to the start of the current input patch
    x_ptr = X + (pid_n * stride_xn)
    # Pointer to the start of the current filter (weight)
    w_ptr = W + (pid_cout * stride_wn)
    # Pointer to where the output will be stored
    y_ptr = Y + (pid_n * stride_yn + pid_cout * stride_yc)

    # --- Perform the convolution (element-wise product and sum) ---
    # This is a dot product between the flattened patch and the flattened filter.
    accumulator = tl.zeros((BLOCK_SIZE,), dtype=tl.float32)

    # Iterate over the elements of the patch/filter
    for c_offset in range(0, C_in):
        for d_offset in range(0, KD):
            for h_offset in range(0, KH):
                # Blocked loop over the innermost (width) dimension
                for w_offset in range(0, KW, BLOCK_SIZE):
                    # Mask for the case where KW is not a multiple of BLOCK_SIZE
                    w_range = w_offset + tl.arange(0, BLOCK_SIZE)
                    w_mask = w_range < KW

                    # Offsets for loading data
                    patch_offset = (c_offset * stride_xc + d_offset * stride_xd +
                                    h_offset * stride_xh + w_range * stride_xw)
                    filter_offset = (c_offset * stride_wc + d_offset * stride_wd +
                                     h_offset * stride_wh + w_range * stride_ww)

                    # Load patch and filter data, applying masks
                    patch_vals = tl.load(x_ptr + patch_offset, mask=w_mask, other=0.0)
                    filter_vals = tl.load(w_ptr + filter_offset, mask=w_mask, other=0.0)

                    # Multiply and accumulate
                    accumulator += patch_vals.to(tl.float32) * filter_vals.to(tl.float32)

    # Sum the accumulator block and store the single output value
    output_val = tl.sum(accumulator, axis=0)
    tl.store(y_ptr, output_val)


def triton_conv3d_patchify(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """
    Python wrapper for the 3D patching convolution Triton kernel.
    """
    # Get tensor dimensions
    N, C_in, D_in, H_in, W_in = x.shape
    C_out, _, KD, KH, KW = weight.shape

    # Create the output tensor.
    # The output of this specific conv is (N, C_out, 1, 1, 1), which we squeeze.
    Y = torch.empty((N, C_out), dtype=x.dtype, device=x.device)

    # Grid for launching the Triton kernel: each instance handles one
    # patch (N) for one output channel (C_out)
    grid = (N, C_out)

    # Launch the kernel; all strides are passed to keep the kernel flexible
    _conv3d_patchify_kernel[grid](
        x, weight, Y,
        N, C_in, D_in, H_in, W_in,
        C_out, KD, KH, KW,
        x.stride(0), x.stride(1), x.stride(2), x.stride(3), x.stride(4),
        weight.stride(0), weight.stride(1), weight.stride(2), weight.stride(3), weight.stride(4),
        Y.stride(0), Y.stride(1),
        BLOCK_SIZE=16,  # a reasonable default; can be tuned
    )

    return Y
```
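The heart of that kernel is the masked, blocked accumulation over the width axis. It can be sanity-checked on the host without a GPU by emulating one (patch, output-channel) program in pure Python — a hypothetical toy setup, deliberately choosing KW not divisible by BLOCK_SIZE to exercise the mask:

```python
# Host-side emulation of one program of the patchify kernel: accumulate
# patch . filter, walking the width axis in BLOCK_SIZE steps with a mask,
# the same way the Triton loop with tl.load(..., mask=w_mask) does.
BLOCK_SIZE = 4
C_in, KD, KH, KW = 2, 2, 3, 5          # KW deliberately not a multiple of 4

def run_program(patch, filt):
    acc = [0.0] * BLOCK_SIZE           # mirrors tl.zeros((BLOCK_SIZE,))
    for c in range(C_in):
        for d in range(KD):
            for h in range(KH):
                for w0 in range(0, KW, BLOCK_SIZE):
                    for i in range(BLOCK_SIZE):
                        w = w0 + i
                        if w < KW:     # the tl.load mask; masked lanes add 0
                            acc[i] += patch[c][d][h][w] * filt[c][d][h][w]
    return sum(acc)                    # mirrors tl.sum(accumulator, axis=0)

patch = [[[[float(c + d + h + w) for w in range(KW)]
           for h in range(KH)] for d in range(KD)] for c in range(C_in)]
filt = [[[[1.0 for _ in range(KW)] for _ in range(KH)]
         for _ in range(KD)] for _ in range(C_in)]

# With an all-ones filter the dot product is just the sum of the patch.
expected = sum(patch[c][d][h][w] for c in range(C_in) for d in range(KD)
               for h in range(KH) for w in range(KW))
assert run_program(patch, filt) == expected
print("masked block accumulation matches the plain sum")
```

If the mask were dropped, the lanes past KW would read out of bounds; `other=0.0` in the real kernel makes them contribute nothing, exactly like the `if w < KW` guard here.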
---

**Plan 2**

**2.2 In qwen2_vl.py, around line 498, class Qwen2VisionPatchEmbed(nn.Module) — again the class AMD has no ready-made kernel for. Here we go straight from the 5D tensor to a matrix multiply in one step**
```
class Qwen2VisionPatchEmbed(nn.Module):

    def __init__(
        self,
        patch_size: int = 14,
        temporal_patch_size: int = 2,
        in_channels: int = 3,
        embed_dim: int = 1152,
    ) -> None:
        super().__init__()
        self.patch_size = patch_size
        self.temporal_patch_size = temporal_patch_size
        self.embed_dim = embed_dim

        kernel_size = (temporal_patch_size, patch_size, patch_size)

        self.proj = nn.Conv3d(in_channels,
                              embed_dim,
                              kernel_size=kernel_size,
                              stride=kernel_size,
                              bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        L, C = x.shape
        x_reshaped_5d = x.view(L, -1, self.temporal_patch_size, self.patch_size,
                               self.patch_size)

        return triton_conv3d_patchify(x_reshaped_5d, self.proj.weight)
```
**3.2 Create qwen2_vl_vision_kernels.py in the XXX/vllm/model_executor/models/ directory, implemented in Triton**
```
import torch
from vllm.triton_utils import tl, triton


@triton.jit
def _conv_gemm_kernel(
    A, B, C, M, N, K,
    stride_am, stride_ak,
    stride_bk, stride_bn,
    stride_cm, stride_cn,
    BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr,
):
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)
    a_ptrs = A + (offs_m[:, None] * stride_am + offs_k[None, :] * stride_ak)
    b_ptrs = B + (offs_k[:, None] * stride_bk + offs_n[None, :] * stride_bn)
    accumulator = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for k in range(0, K, BLOCK_K):
        a = tl.load(a_ptrs, mask=(offs_m[:, None] < M) & (offs_k[None, :] < K), other=0.0)
        b = tl.load(b_ptrs, mask=(offs_k[:, None] < K) & (offs_n[None, :] < N), other=0.0)
        accumulator += tl.dot(a, b)
        a_ptrs += BLOCK_K * stride_ak
        b_ptrs += BLOCK_K * stride_bk
        offs_k += BLOCK_K
    c = accumulator.to(C.dtype.element_ty)
    offs_cm = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_cn = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    c_ptrs = C + stride_cm * offs_cm[:, None] + stride_cn * offs_cn[None, :]
    c_mask = (offs_cm[:, None] < M) & (offs_cn[None, :] < N)
    tl.store(c_ptrs, c, mask=c_mask)


def triton_conv3d_patchify(x_5d: torch.Tensor, weight_5d: torch.Tensor) -> torch.Tensor:
    N_patches, _, _, _, _ = x_5d.shape
    C_out, _, _, _, _ = weight_5d.shape
    A = x_5d.view(N_patches, -1)
    B = weight_5d.view(C_out, -1).transpose(0, 1).contiguous()
    M, K = A.shape
    _K, N = B.shape
    assert K == _K
    C = torch.empty((M, N), device=A.device, dtype=A.dtype)

    # --- Hand-tuned configuration for the 7900 XTX. Other GPUs may need a
    # --- different combination; in practice, AMD autotune achieved nothing.
    best_config = {
        'BLOCK_M': 128,
        'BLOCK_N': 128,
        'BLOCK_K': 32,
    }
    num_stages = 4
    num_warps = 8

    grid = (triton.cdiv(M, best_config['BLOCK_M']),
            triton.cdiv(N, best_config['BLOCK_N']))

    _conv_gemm_kernel[grid](
        A, B, C,
        M, N, K,
        A.stride(0), A.stride(1),
        B.stride(0), B.stride(1),
        C.stride(0), C.stride(1),
        **best_config,
        num_stages=num_stages,
        num_warps=num_warps
    )

    return C
```
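The kernel above is a standard blocked GEMM; the masking is what lets arbitrary M, N, K work with fixed block sizes. The same tiling can be rendered in pure Python (toy sizes, no triton) to reason about the mask logic before experimenting with BLOCK_M/N/K:

```python
# Blocked GEMM with the same masking discipline as _conv_gemm_kernel:
# each (pid_m, pid_n) tile accumulates over K in BLOCK_K steps, and
# out-of-range rows/cols/ks contribute 0.0 (the `other=0.0` of tl.load).
BLOCK_M, BLOCK_N, BLOCK_K = 2, 2, 3
M, N, K = 3, 5, 4                      # deliberately not multiples of the blocks

A = [[float(i * K + k) for k in range(K)] for i in range(M)]
B = [[float(k * N + j) for j in range(N)] for k in range(K)]
C = [[0.0] * N for _ in range(M)]

def load(mat, r, c, rows, cols):
    return mat[r][c] if r < rows and c < cols else 0.0   # masked load

for pid_m in range((M + BLOCK_M - 1) // BLOCK_M):        # triton.cdiv(M, BLOCK_M)
    for pid_n in range((N + BLOCK_N - 1) // BLOCK_N):
        for m in range(BLOCK_M):
            for n in range(BLOCK_N):
                i, j = pid_m * BLOCK_M + m, pid_n * BLOCK_N + n
                acc = 0.0
                for k0 in range(0, K, BLOCK_K):
                    for kk in range(BLOCK_K):
                        acc += (load(A, i, k0 + kk, M, K) *
                                load(B, k0 + kk, j, K, N))
                if i < M and j < N:                      # masked store
                    C[i][j] = acc

ref = [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(N)]
       for i in range(M)]
assert C == ref
print("tiled GEMM matches naive GEMM")
```

Shrinking or growing the three block sizes changes only how the work is partitioned, never the result — which is why tuning them per GPU, as the comment in the wrapper suggests, is safe.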
---

**4. After closing the terminal, running mineru-gradio again raises a LoRA error; patch the code to skip it**

```
pip show mineru_vl_utils
```

Open XXX/mineru_vl_utils/vlm_client/vllm_async_engine_client.py and change line 58, `self.tokenizer = vllm_async_llm.tokenizer.get_lora_tokenizer()`, to:
```
try:
    self.tokenizer = vllm_async_llm.tokenizer.get_lora_tokenizer()
except AttributeError:
    # If there is no get_lora_tokenizer method, use the original tokenizer directly
    self.tokenizer = vllm_async_llm.tokenizer
```
**Finally, set these two environment variables and enjoy**

```
export MINERU_MODEL_SOURCE=modelscope
export TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1
```
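Exports like these only live in the shell that ran them, so it can help to fail fast if they did not reach the process. A small optional preflight sketch (the `setdefault` lines stand in for the real exports and are for demonstration only):

```python
# Quick preflight: refuse to start if the expected environment variables
# are missing. The variable names are the two exports from this guide.
import os

REQUIRED = ["MINERU_MODEL_SOURCE", "TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL"]

# Demo stand-ins for the shell exports; drop these in real use.
os.environ.setdefault("MINERU_MODEL_SOURCE", "modelscope")
os.environ.setdefault("TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL", "1")

missing = [name for name in REQUIRED if name not in os.environ]
assert not missing, f"set these before running mineru: {missing}"
print("environment looks good:", {k: os.environ[k] for k in REQUIRED})
```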
---

### 6. The vLLM backend is now fine; what remains is the dilated-convolution problem in the doclayout-yolo model used by the pipeline's layout stage

### I posted an answer under [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO/issues/120#issuecomment-3368144275), so the pipeline's dilated-convolution issue is not repeated here — follow the link for details.

Find your doclayout-yolo installation as follows, then edit the file described in the linked reply:

```
pip show doclayout-yolo
```
---

**docs/zh/usage/acceleration_cards/Ascend.md** (new file)
#### 1 System

NAME="Ubuntu"

VERSION="20.04.6 LTS (Focal Fossa)"

Ascend 910B2

Driver 23.0.6.2

CANN 7.5.X

MinerU 2.1.9

#### 2 Pitfalls
Pitfall 1: **graphics-library problems — in short, a dynamic library makes TLS memory allocation fail (an OpenCV compatibility problem on ARM64)**

⭐ The error `ImportError: /lib/aarch64-linux-gnu/libGLdispatch.so.0: cannot allocate memory in static TLS block` is caused by OpenCV's compatibility problem on the ARM64 architecture. The stack trace shows it occurs while importing the cv2 module, during initialization of MinerU's VLM backend.

Fixes:

1 Install an OpenCV build that avoids the memory problem

```
pip install --upgrade albumentations albucore simsimd

# Uninstall current opencv
pip uninstall opencv-python opencv-contrib-python

# Install headless version (no GUI dependencies)
pip install opencv-python-headless

python -c "import cv2; print(cv2.__version__)"
```

2 apt-get install some packages
Switch to the Tsinghua mirror: save the list below as sources.list.tuna and move it under the root directory

```
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ focal main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ focal-updates main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ focal-backports main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ focal-security main restricted universe multiverse

sudo apt-get update -o Dir::Etc::sourcelist="sources.list.tuna" -o Dir::Etc::sourceparts="-" -o APT::Get::List-Cleanup="0"
sudo apt-get install libgl1-mesa-glx -o Dir::Etc::sourcelist="sources.list.tuna" -o Dir::Etc::sourceparts="-" -o APT::Get::List-Cleanup="0"
sudo apt-get install libglib2.0-0 libsm6 libxext6 libxrender-dev libgomp1 -o Dir::Etc::sourcelist="sources.list.tuna" -o Dir::Etc::sourceparts="-" -o APT::Get::List-Cleanup="0"
sudo apt-get install libgl1-mesa-dev libgles2-mesa-dev -o Dir::Etc::sourcelist="sources.list.tuna" -o Dir::Etc::sourceparts="-" -o APT::Get::List-Cleanup="0"
sudo apt-get install libgomp1 -o Dir::Etc::sourcelist="sources.list.tuna" -o Dir::Etc::sourceparts="-" -o APT::Get::List-Cleanup="0"

export OPENCV_IO_ENABLE_OPENEXR=0
export QT_QPA_PLATFORM=offscreen
```

↑ No idea which of these actually helped, or whether any of them did.
3 Force-override the dynamic libraries bundled with the conda environment (the conda copies conflict with the system ones)

```
# Locate them: find /usr/lib /lib /root/.local/conda -name "libgomp.so*" 2>/dev/null
export LD_PRELOAD="/usr/lib/aarch64-linux-gnu/libstdc++.so.6:/usr/lib/aarch64-linux-gnu/libgomp.so.1"
export LD_PRELOAD=/lib/aarch64-linux-gnu/libGLdispatch.so.0:$LD_PRELOAD
```

Alternatively, force-move the copies bundled with the conda environment out of the way:

```
mv $CONDA_PREFIX/lib/libstdc++.so.6 $CONDA_PREFIX/lib/libstdc++.so.6.bak
mv $CONDA_PREFIX/lib/libgomp.so.1 $CONDA_PREFIX/lib/libgomp.so.1.bak
mv $CONDA_PREFIX/lib/libGLdispatch.so.0 $CONDA_PREFIX/lib/libGLdispatch.so.0.bak  # if present

# For the simsimd package:
mv /root/.local/conda/envs/pdfparser/lib/python3.10/site-packages/simsimd./libgomp-947d5fa1.so.1.0.0 /root/.local/conda/envs/pdfparser/lib/python3.10/site-packages/simsimd./libgomp-947d5fa1.so.1.0.0.bak
```

Or:

Downgrade simsimd to 3.7.2

Downgrade albumentations to 1.3.1

For the scikit-learn package:

```
# Find the libgomp bundled inside scikit-learn
SKLEARN_LIBGOMP="/root/.local/conda/envs/pdfparser/lib/python3.10/site-packages/scikit_learn.libs/libgomp-947d5fa1.so.1.0.0"

# Preload this specific libgomp version
export LD_PRELOAD="$SKLEARN_LIBGOMP:$LD_PRELOAD"
```

4 Other

torch / torch_npu 2.5.1

pip install "numpy<2.0" — numpy 2.0 is incompatible with Ascend

export MINERU_MODEL_SOURCE=modelscope
---

**docs/zh/usage/acceleration_cards/METAX.md** (new file)
## Deploying and using MinerU on C500 + MACA

### Get the MACA image, which includes torch-maca, maca, and sglang-maca

Image download: https://developer.metax-tech.com/softnova/docker — select maca-c500-pytorch:2.33.0.6-ubuntu22.04-amd64

When deploying the image with Docker, enable GPU device access:
```bash
docker run --device=/dev/dri --device=/dev/mxcd....
```

#### Notes

This image enables TORCH_ALLOW_TF32_CUBLAS_OVERRIDE by default, which makes the vlm-transformers backend produce wrong inference results:

```bash
unset TORCH_ALLOW_TF32_CUBLAS_OVERRIDE
```
### Install MinerU

Use --no-deps to drop the dependencies on CUDA-versioned packages; the remaining dependencies are installed afterwards with pip install -r requirements.txt
```bash
pip install -U "mineru[core]" --no-deps
```
```tex
boto3>=1.28.43
click>=8.1.7
loguru>=0.7.2
numpy==1.26.4
pdfminer.six==20250506
tqdm>=4.67.1
requests
httpx
pillow>=11.0.0
pypdfium2>=4.30.0
pypdf>=5.6.0
reportlab
pdftext>=0.6.2
modelscope>=1.26.0
huggingface-hub>=0.32.4
json-repair>=0.46.2
opencv-python>=4.11.0.86
fast-langdetect>=0.2.3,<0.3.0
transformers>=4.51.1
accelerate>=1.5.1
pydantic
matplotlib>=3.10,<4
ultralytics>=8.3.48,<9
dill>=0.3.8,<1
rapid_table>=1.0.5,<2.0.0
PyYAML>=6.0.2,<7
ftfy>=6.3.1,<7
openai>=1.70.0,<2
shapely>=2.0.7,<3
pyclipper>=1.3.0,<2
omegaconf>=2.3.0,<3
transformers>=4.49.0,!=4.51.0,<5.0.0
fastapi
python-multipart
uvicorn
gradio>=5.34,<6
gradio-pdf>=0.0.22
albumentations
beautifulsoup4
scikit-image==0.25.0
outlines==0.1.11
magika>=0.6.2,<0.7.0
mineru-vl-utils>=0.1.6,<1
```

Save the above as requirements.txt and install it:
```bash
pip install -r requirements.txt
```
Install doclayout_yolo; it would pull in torch-cuda, so use --no-deps:
```bash
pip install doclayout-yolo --no-deps
```
### Online use

**Basic command: mineru -p <input_path> -o <output_path> -b vlm-transformers**

- `<input_path>`: Local PDF/image file or directory
- `<output_path>`: Output directory
- -b --backend [pipeline|vlm-transformers|vlm-vllm-engine|vlm-http-client] (default: pipeline)<br/>

For other commands, see the official documentation: [Quick Usage - MinerU](https://opendatalab.github.io/MinerU/usage/quick_usage/#quick-model-source-configuration)
### 离线使用
|
||||
|
||||
**所用模型为本地模型,需要设置环境变量和config配置文件**<br/>
|
||||
#### 下载模型到本地
|
||||
通过mineru交互式命令行工具进行下载,下载完后会自动更新mineru.json配置文件
|
||||
```bash
|
||||
mineru-models-download
|
||||
```
|
||||
也可以在[HuggingFace](http://www.huggingface.co.)或[ModelScope](https://www.modelscope.cn/home)找到所需模型源(PDF-Extract-Kit-1.0和MinerU2.5-2509-1.2B)进行下载,
|
||||
下载完成后,创建mineru.json文件,按如下进行修改
|
||||
```json
{
    "models-dir": {
        "pipeline": "/path/pdf-extract-kit-1.0/",
        "vlm": "/path/MinerU2.5-2509-1.2B"
    },
    "config_version": "1.3.0"
}
```
Here, path is the storage location of the local models: models-dir holds the local model paths, where pipeline is the model path used when the backend is pipeline, and vlm is the model path used when the backend name starts with vlm-.
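A config like the one above can be generated and round-trip-checked with a few lines of Python; the paths are the placeholders from the example and must be replaced with your real model directories:

```python
# Write a mineru.json like the one shown above, then re-read and validate it.
import json
import os
import tempfile

cfg = {
    "models-dir": {
        "pipeline": "/path/pdf-extract-kit-1.0/",   # placeholder path
        "vlm": "/path/MinerU2.5-2509-1.2B",         # placeholder path
    },
    "config_version": "1.3.0",
}

path = os.path.join(tempfile.mkdtemp(), "mineru.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(cfg, f, indent=4)

with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

# Both backend keys must be present: "pipeline" for -b pipeline,
# "vlm" for the vlm-* backends.
assert set(loaded["models-dir"]) == {"pipeline", "vlm"}
print("mineru.json written to", path)
```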
#### Set the environment variables

```bash
export MINERU_MODEL_SOURCE=local
export MINERU_TOOLS_CONFIG_JSON=/path/mineru.json  # path to the config file
```
Once these are set, everything works normally<br/>
---

**docs/zh/usage/acceleration_cards/Tecorigin.md** (new file)
# TECO adaptation

## Quick start

The main inference workflow with this tool:

1. Base environment setup: the environment checks and installation required before inference.

2. Build the Docker environment: how to create the Docker environment needed for model inference from the image.

3. Run inference: how to start inference.

### 1 Base environment setup

Follow the [installation-preparation chapter of the Teco user manual](http://docs.tecorigin.com/release/torch_2.4/v2.2.0/#fc980a30f1125aa88bad4246ff0cedcc) to complete the base environment checks and installation.
### 2 Build the Docker environment
#### 2.1 Run the following to download the Docker image locally (image package: pytorch-3.0.0-torch_sdaa3.0.0.tar)

wget <image download link> (contact Tecorigin staff for the link)

#### 2.2 Verify the image package: run the following and check that the generated MD5 matches the official b2a7f60508c0d199a99b8b6b35da3954:

md5sum pytorch-3.0.0-torch_sdaa3.0.0.tar

#### 2.3 Run the following to import the Docker image

docker load < pytorch-3.0.0-torch_sdaa3.0.0.tar

#### 2.4 Run the following to create a Docker container named MinerU

docker run -itd --name="MinerU" --net=host --device=/dev/tcaicard0 --device=/dev/tcaicard1 --device=/dev/tcaicard2 --device=/dev/tcaicard3 --cap-add SYS_PTRACE --cap-add SYS_ADMIN --shm-size 64g jfrog.tecorigin.net/tecotp-docker/release/ubuntu22.04/x86_64/pytorch:3.0.0-torch_sdaa3.0.0 /bin/bash

#### 2.5 Run the following to enter the MinerU container.

docker exec -it MinerU bash
### 3 Install MinerU with the following commands
- Preparation
```
cd <MinerU>
pip install --upgrade pip
pip install uv
```
- Because torch is already installed in the image, and packages such as nvidia-nccl-cu12 and nvidia-cudnn-cu12 are not needed, part of the dependency list must be commented out.
- In the <MinerU>/pyproject.toml file, comment out every "doclayout_yolo==0.0.4" dependency as well as the packages starting with torch.
- Install MinerU:
```
uv pip install -e .[core]
```
- Download and install doclayout_yolo==0.0.4
```
pip install doclayout_yolo==0.0.4 --no-deps
```
- Download and install the remaining packages (dependencies of doclayout_yolo==0.0.4)
```
pip install albumentations py-cpuinfo seaborn thop numpy==1.24.4
```
- Because some tensors are not memory-contiguous internally, modify the following two files:

<ultralytics install path>/ultralytics/utils/tal.py (around line 330, change view --> reshape)

<doclayout_yolo install path>/doclayout_yolo/utils/tal.py (around line 375, change view --> reshape)
### 4 Run inference
- Enable the sdaa environment
```
export TORCH_SDAA_AUTOLOAD=cuda_migrate
```
- Before the first inference run, add the following environment variable to download the model weights
```
export HF_ENDPOINT=https://hf-mirror.com
```
- Run inference with:
```
mineru -p 'input path' -o 'output_path' --lang 'model_name'
```
where model_name is one of 'ch', 'ch_server', 'ch_lite', 'en', 'korean', 'japan', 'chinese_cht', 'ta', 'te', 'ka', 'latin', 'arabic', 'east_slavic', 'cyrillic', 'devanagari'
### 5 Software-stack versions used for the adaptation
The adaptation used software-stack version v3.0.0; contact Tecorigin staff to obtain it
---

Some parameters of the MinerU command-line tool have environment-variable equivalents:
- `MINERU_FORMULA_ENABLE`:
    * Enables formula parsing
    * Defaults to `true`; set the environment variable to `false` to disable formula parsing.

- `MINERU_FORMULA_CH_SUPPORT`:
    * Enables the Chinese formula-parsing optimization (experimental)
    * Defaults to `false`; set the environment variable to `true` to enable it.
    * Only takes effect with the `pipeline` backend.

- `MINERU_TABLE_ENABLE`:
    * Enables table parsing
    * Defaults to `true`; set the environment variable to `false` to disable table parsing.

- `MINERU_TABLE_MERGE_ENABLE`:
    * Enables table merging
    * Defaults to `true`; set the environment variable to `false` to disable table merging.
---

This chapter provides complete usage instructions for the project. The following sections take you from the basics to advanced use:
## Contents

- Local deployment
    * [Quick Usage](./quick_usage.md) - Getting started and basic use
    * [Model Source](./model_source.md) - Detailed model-source configuration
    * [CLI Tools](./cli_tools.md) - Detailed command-line parameters
    * [Advanced CLI Parameters](./advanced_cli_parameters.md) - Advanced parameters for the CLI tools
- Plugins and ecosystem
    * [Cherry Studio](plugin/Cherry_Studio.md)
    * [Sider](plugin/Sider.md)
    * [Dify](plugin/Dify.md)
    * [n8n](plugin/n8n.md)
    * [Coze](plugin/Coze.md)
    * [FastGPT](plugin/FastGPT.md)
    * [ModelWhale](plugin/ModelWhale.md)
    * [DingTalk](plugin/DingTalk.md)
    * [DataFlow](plugin/DataFlow.md)
    * [BISHENG](plugin/BISHENG.md)
    * [RagFlow](plugin/RagFlow.md)
- Other accelerator-card adaptations (community-contributed)
    * [Ascend](acceleration_cards/Ascend.md) [#3233](https://github.com/opendatalab/MinerU/discussions/3233)
    * [METAX](acceleration_cards/METAX.md) [#3477](https://github.com/opendatalab/MinerU/pull/3477)
    * [AMD](acceleration_cards/AMD.md) [#3662](https://github.com/opendatalab/MinerU/discussions/3662)
    * [Tecorigin](acceleration_cards/Tecorigin.md) [#3767](https://github.com/opendatalab/MinerU/pull/3767)

## Getting started
---

**docs/zh/usage/plugin/BISHENG.md** (new file)
# About BISHENG

BISHENG is an open-source LLM application-development platform focused on enterprise scenarios, already used by many leading industry organizations and Fortune 500 companies. "BISHENG" (Bi Sheng) was the inventor of movable-type printing, which greatly advanced the transmission of human knowledge; the BISHENG team hopes the platform can likewise provide strong support for the broad adoption of intelligent applications.



- Official site: https://bisheng.dataelem.com/
- MinerU plugin work in the BISHENG project: https://github.com/dataelement/bisheng/pulls

Special thanks to [@pzc163](https://github.com/pzc163)
---

**docs/zh/usage/plugin/Cherry_Studio.md** (new file)
# About Cherry Studio

Cherry Studio is a powerful multi-model AI client that runs on Windows, macOS, and Linux. It integrates mainstream AI cloud services such as OpenAI, DeepSeek, Gemini, and Anthropic, also supports running local models, and lets users switch flexibly between AI models.

MinerU's powerful document parsing is now deeply integrated into Cherry Studio's knowledge base and chat interactions, making document processing and information retrieval more convenient.



- Cherry Studio official site: https://www.cherry-ai.com/


# Using MinerU in Cherry Studio


## Open the Cherry Studio settings

a. Open the Cherry Studio application

b. Click the "Settings" button in the lower-left corner to open the settings page

c. In the left menu, select "MCP Servers"

In the MCP server configuration panel on the right you can see the list of existing MCP servers. Click "Add Server" in the upper-right corner to create a new MCP service, or click an existing one to edit its configuration.
## Add the MinerU-MCP configuration

After clicking "Add Server" you will see a configuration form. Fill it in as follows:

**a. Name**: enter "MinerU-MCP" or any name you like

**b. Description**: optional, e.g. "Document-to-Markdown conversion tool"

**c. Type**: select "Standard input/output (stdio)"

**d. Command**: enter uvx

**e. Arguments**: enter mineru-mcp

**f. Environment variables**: add the following

```Plain
MINERU_API_BASE=https://mineru.net
MINERU_API_KEY=your API key
OUTPUT_DIR=./downloads
USE_LOCAL_API=false
LOCAL_MINERU_API_BASE=http://localhost:8888
```

The *`uvx`* command handles installing and running mineru-mcp automatically, so there is **no need to pre-install the mineru-mcp package**. This is the simplest configuration.
## Save the configuration

After confirming everything, click "Save" in the upper-right corner to finish. Once saved, the newly added MinerU-MCP service appears in the MCP server list.





## Using MinerU MCP in Cherry Studio

Once configured, you can use the MinerU MCP tools in Cherry Studio conversations. Prompts like the ones below will make the model recognize the task and call the matching tool automatically.
## Example 1: Convert a document from a URL

**User input:**

```Plain
Please use MinerU MCP to convert the PDF at the following URL to Markdown: https://example.com/sample.pdf
```

**Steps the model performs:**

The model recognizes this as a document-conversion task and calls the *`parse_documents`* tool with:

```Plain
{"file_sources": "https://example.com/sample.pdf"}
```

When the tool finishes, the model reports the conversion result.


## Example 2: Convert a local document

**User input:**

```Plain
Please use MinerU-MCP to convert the local file D://sample.pdf to Markdown
```

**Steps the model performs:**

The model recognizes this as a local-document conversion task and calls the `parse_documents` tool with:

```Plain
{"file_sources": "D://sample.pdf"}
```


## Example 3: Enable OCR for scanned documents

**User input:**

```Plain
Please use MinerU-MCP to convert the scanned PDF at the following URL to Markdown, with OCR enabled:
https://example.com/scanned.pdf
```

**Steps the model performs:**

The model recognizes that this conversion needs OCR and calls the `parse_documents` tool with the OCR parameter enabled:

```Plain
{"file_sources": "https://example.com/scanned.pdf","enable_ocr": true}
```


## Example 4: A complete conversation

A full example conversation:

**User:**

```Plain
I have an academic paper PDF at https://arxiv.org/pdf/2303.08774.pdf — can you convert it to Markdown for me?
```

**Model:**

```Plain
I can convert this paper to Markdown for you. I will use the MinerU-MCP tool for this task.
[model calls the parse_documents tool, processing...]
The paper has been converted to Markdown. The result:
# The Capacity of Diffusion Models to Memorize and Generate Training Data
## Abstract
Recent diffusion models can generate high-quality images that are nearly indistinguishable from real ones...
[paper content continues...]
```


## Tool parameter reference

During use, the model automatically picks the right tool and parameters from your instruction. The main tools' parameters:

### ● *parse_documents* tool parameters



### ● *get_ocr_languages* tool parameters

Takes no parameters; returns the list of languages supported by OCR.
## Advanced usage

### Specify language and page range

**User input:**

```Plain
Please use MinerU MCP to convert the document at the following URL to Markdown, processing only pages 5-10 with the language set to Chinese: https://example.com/document.pdf
```

The model uses the *`parse_documents`* tool with *`language`* set to "ch" and *`page_ranges`* set to "5-10".

### Batch-process multiple documents

**User input:**

```Plain
Please use MinerU-MCP to convert the documents at these URLs to Markdown:
https://example.com/doc1.pdf
https://example.com/doc2.pdf
https://example.com/doc3.pdf
```

The model calls the *`parse_documents`* tool with the URLs passed comma-separated in the *`file_sources`* parameter.
## Notes

● With *`USE_LOCAL_API=true`*, the locally configured API is used for parsing

● With *`USE_LOCAL_API=false`*, the MinerU official-site API is used for parsing

● Large documents can take a long time to process; please be patient

● If you hit timeouts, consider processing documents in batches or using local-API mode
## FAQ and solutions

### The MCP service will not start

**Problem**: running *`uv run -m mineru.cli`* fails.

**Solutions**:

● Make sure the virtual environment is activated

● Check that all dependencies are installed

● Try the *`python -m mineru.cli`* command instead

### File conversion fails

**Problem**: the file uploads successfully but conversion fails.

**Solutions**:

● Check that the file format is supported

● Confirm the API key is correct

● Check the MCP service logs for detailed error information

### File-path problems

**Problem**: the `parse_documents` tool reports file-not-found for local files.

**Solution**: use absolute paths, or paths correctly relative to the directory the server runs in.

### MCP call timeouts

**Problem**: calling the *`parse_documents`* tool fails with *`Error calling tool 'parse_documents': MCP error -32001: Request timed out`*.

**Solution**: this is common with large documents or unstable networks. In some MCP clients (such as Cursor), after a timeout the MCP service may become uncallable until the client is restarted; recent Cursor versions may show an MCP call in progress that never actually succeeds. Suggestions:

**● Wait for an official fix**: this is a known Cursor client issue

**● Keep files small**: process only a few small files at a time to avoid timeouts on large documents

**● Batch the work**: split many files across several requests, one or two files each

● Increase the timeout setting (if the client supports it)

● If the service becomes uncallable after a timeout, restart the MCP client

● If timeouts keep recurring, check the network connection or consider local-API mode
---

**docs/zh/usage/plugin/Coze.md** (new file)
# About Coze

Coze (Chinese name: 扣子) is a zero-code AI application-development platform launched by ByteDance. With or without programming experience, users can quickly create chatbots, agents, AI applications, and plugins on the platform, and deploy them to social platforms and instant-messaging apps.

The MinerU plugin is now live in the Coze plugin store. Through its powerful document parsing, it gives users building agents and workflows a document-parsing capability and speeds up the development of AI applications.



- Coze official site: https://www.coze.cn/
- MinerU Coze plugin: https://www.coze.cn/store/plugin/7527957359730360354

# Using MinerU in Coze

## **Coze: integrating into an application**

- Go to the Coze development platform: https://www.coze.cn/home

## Agent

### Workspace -> Project development -> Create -> Create agent -> Create -> enter the project name





### Plugin configuration -> add a `plugin` -> search for `MinerU`



### Add the `parse_file` tool (online version)



### Select the `MinerU` plugin -> edit parameters -> fill in the api key





> Remember to turn off the url and token display

### Debug the `agent`



## Workflow

> Using MinerU by way of a workflow

### Workflow -> Create workflow





### Workflow plugin configuration -> add a `plugin` -> search for `MinerU` -> add





### Select the `MinerU` plugin -> edit parameters -> fill in the api key



### Select the start node -> set the `input` type to file -> connect it to the `mineru` node





### Select the end node -> connect it to the `mineru` node -> set the `output` to the `mineru` node's `parse_file.text`





### Upload a file -> trial run





### Publish -> add to the current agent





### Remove the `mineru` plugin -> debug
