Compare commits


192 Commits

Author SHA1 Message Date
Xiaomeng Zhao
8825235088 Merge pull request #3835 from myhloli/dev
chore: update changelog for 2.6.2 release with OCR model optimizations and backend improvements
2025-10-24 18:35:17 +08:00
myhloli
44a60785c6 chore: update changelog for 2.6.2 release with OCR model optimizations and backend improvements 2025-10-24 18:33:15 +08:00
Xiaomeng Zhao
473e235397 Merge pull request #3834 from myhloli/dev
refactor: remove deprecated model configurations from arch_config.yaml and models_config.yml
2025-10-24 18:29:59 +08:00
myhloli
16814e1e1d refactor: remove deprecated model configurations from arch_config.yaml and models_config.yml 2025-10-24 18:11:50 +08:00
myhloli
3546766e72 fix: update CTCLabelDecode output channels and clean up Latin dictionary 2025-10-24 18:04:28 +08:00
Xiaomeng Zhao
b57d9caef3 Merge pull request #3833 from opendatalab/master
master->dev
2025-10-24 17:39:27 +08:00
myhloli
0603edc202 Update version.py with new version 2025-10-24 09:28:52 +00:00
Xiaomeng Zhao
2a0cb7963a Merge pull request #3829 from opendatalab/release-2.6.1
Release 2.6.1
2025-10-24 17:27:18 +08:00
Xiaomeng Zhao
a56bd6c334 Merge pull request #3831 from opendatalab/dev
Dev
2025-10-24 17:25:03 +08:00
Xiaomeng Zhao
f5400f0c94 Merge pull request #3830 from myhloli/dev
fix: correct spelling of set_default_gpu_memory_utilization and set_default_batch_size functions
2025-10-24 17:24:31 +08:00
myhloli
6a6c650062 fix: correct spelling of set_default_gpu_memory_utilization and set_default_batch_size functions 2025-10-24 17:23:13 +08:00
Xiaomeng Zhao
ae084eb317 Merge pull request #3828 from myhloli/dev
Dev
2025-10-24 17:17:23 +08:00
myhloli
7c77db7135 fix: import enable_custom_logits_processors in server.py 2025-10-24 17:16:07 +08:00
myhloli
7b14a87b9d fix: update version number to 2.6.1 in README and README_zh-CN 2025-10-24 17:13:08 +08:00
myhloli
0d0ebfd7bc fix: improve GPU memory utilization handling and ensure OMP_NUM_THREADS is set only if not defined 2025-10-24 17:11:19 +08:00
myhloli
dc438fa620 Update version.py with new version 2025-10-24 08:12:26 +00:00
Xiaomeng Zhao
f5a5644d12 Merge pull request #3825 from opendatalab/dev
Dev
2025-10-24 16:01:37 +08:00
Xiaomeng Zhao
91cc2524d5 Merge pull request #3824 from myhloli/dev
fix: update README and Chinese README to include GitHub link for optimization contributor
2025-10-24 16:00:54 +08:00
myhloli
e504e5e012 fix: update README and Chinese README to include GitHub link for optimization contributor 2025-10-24 15:58:23 +08:00
Xiaomeng Zhao
6b2f414438 Merge pull request #3823 from opendatalab/release-2.6.0
Release 2.6.0
2025-10-24 15:54:23 +08:00
Xiaomeng Zhao
a0da3029fd Update mineru/model/utils/pytorchocr/modeling/backbones/rec_lcnetv3.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-24 15:54:12 +08:00
Xiaomeng Zhao
30fe325428 Update mineru/model/utils/tools/infer/predict_rec.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-24 15:53:55 +08:00
Xiaomeng Zhao
6131013ce9 Merge pull request #3822 from opendatalab/dev
Dev
2025-10-24 15:46:40 +08:00
Xiaomeng Zhao
f1c145054a Merge pull request #3821 from myhloli/dev
Dev
2025-10-24 15:46:09 +08:00
myhloli
078aaaf150 fix: remove unnecessary parameters from kwargs in vlm_analyze.py initialization 2025-10-24 15:39:44 +08:00
myhloli
c3a55fffab fix: add utility functions for GPU memory utilization and batch size configuration 2025-10-24 15:29:23 +08:00
Xiaomeng Zhao
4eddf28c8f Merge pull request #3820 from opendatalab/dev
Dev
2025-10-24 14:59:35 +08:00
Xiaomeng Zhao
dd92c5b723 Merge pull request #3819 from myhloli/dev
update docs
2025-10-24 14:59:03 +08:00
myhloli
b5922086cb fix: add environment variable configurations for Chinese formula parsing and table merging features 2025-10-24 14:53:00 +08:00
myhloli
df12e4fc79 fix: update README and utils for table merge feature and environment variable configuration 2025-10-24 11:37:14 +08:00
myhloli
90ed311198 fix: refactor table merging logic and add cross-page table merge utility 2025-10-24 10:52:05 +08:00
myhloli
c922c63fbc fix: correct formatting in kernel initialization in rec_lcnetv3.py 2025-10-24 10:22:10 +08:00
myhloli
28b278508f fix: add error handling for PDF conversion in common.py 2025-10-24 10:19:50 +08:00
Xiaomeng Zhao
6b54f321b4 Merge pull request #3814 from myhloli/dev
Dev
2025-10-23 18:00:51 +08:00
myhloli
e47ec7cd10 fix: refactor language lists for improved readability and maintainability in gradio_app.py and pytorch_paddle.py 2025-10-23 17:51:26 +08:00
myhloli
701f6018f2 fix: add logging for improved traceability in prediction logic of predict_formula.py 2025-10-23 17:26:16 +08:00
myhloli
5ade203e31 fix: remove commented-out code for autocasting in prediction logic of predict_formula.py 2025-10-23 17:12:00 +08:00
Xiaomeng Zhao
6e83f37754 Merge branch 'opendatalab:dev' into dev 2025-10-23 17:09:20 +08:00
Xiaomeng Zhao
972161a991 Merge pull request #3812 from Sidney233/dev
feat: add PPv5 arabic cyrillic devanagari ta te
2025-10-23 17:08:52 +08:00
Sidney233
700e11d342 feat: add PPv5 arabic cyrillic devanagari ta te 2025-10-23 16:49:01 +08:00
myhloli
fd79885b23 fix: remove commented-out code for autocasting in prediction logic of predict_formula.py 2025-10-23 16:03:34 +08:00
myhloli
a0810b5b6e fix: add debug logging for LaTeX text processing in processors.py 2025-10-23 02:30:47 +08:00
myhloli
39271b45de fix: adjust batch size calculation in prediction logic of predict_formula.py 2025-10-23 02:15:14 +08:00
Xiaomeng Zhao
db68aaf4ac Merge pull request #3806 from myhloli/dev
fix: update Gradio API access instructions in quick_usage.md
2025-10-22 22:51:37 +08:00
myhloli
a6cc8fa90d fix: update Gradio API access instructions in quick_usage.md 2025-10-22 22:50:36 +08:00
Xiaomeng Zhao
47f34f4ce8 Merge pull request #3805 from myhloli/dev
fix: handle empty input in prediction logic of predict_formula.py
2025-10-22 22:21:38 +08:00
myhloli
b7a8347f45 fix: handle empty input in prediction logic of predict_formula.py 2025-10-22 22:20:06 +08:00
Xiaomeng Zhao
c6d241f4f4 Merge pull request #3804 from myhloli/dev
fix: update model paths in models_download.py to include pp_formulanet_plus_m
2025-10-22 20:47:26 +08:00
myhloli
06b2fda1c1 fix: update model paths in models_download.py to include pp_formulanet_plus_m 2025-10-22 20:46:15 +08:00
Xiaomeng Zhao
5c1ca9271e Merge pull request #3803 from myhloli/dev
Dev
2025-10-22 20:33:42 +08:00
Xiaomeng Zhao
e7485c5d79 Update mineru/model/mfr/pp_formulanet_plus_m/predict_formula.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-22 20:32:36 +08:00
Xiaomeng Zhao
80436a89f9 Update mineru/model/utils/pytorchocr/modeling/heads/rec_ppformulanet_head.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-22 20:32:06 +08:00
Xiaomeng Zhao
b36793cef0 Update mineru/model/mfr/utils.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-22 20:31:50 +08:00
myhloli
43b51e78fc fix: add environment variable handling for table merging in JSON processing 2025-10-22 20:19:59 +08:00
myhloli
9688f73046 fix: update package path for PaddleOCR utilities in pyproject.toml 2025-10-22 20:08:52 +08:00
myhloli
c02edd9cba fix: correct docstring for remove_up_commands function in utils.py 2025-10-22 20:07:11 +08:00
myhloli
b4d08e994c feat: implement LaTeX formatting utilities and refactor processing logic 2025-10-22 20:02:59 +08:00
myhloli
a220b8a208 refactor: enhance title hierarchy logic and update model configuration 2025-10-22 15:57:07 +08:00
myhloli
ab480a7a86 fix: update progress bar description in formula prediction 2025-10-22 15:51:56 +08:00
myhloli
f57a6d8d9e refactor: remove commented-out device assignment in predict_formula.py 2025-10-21 18:45:21 +08:00
myhloli
915ba87f7d feat: adjust batch size calculation and enhance device management in model heads 2025-10-21 18:21:25 +08:00
myhloli
42a95e8e20 refactor: improve variable naming and streamline input processing in predict_formula.py 2025-10-21 14:57:57 +08:00
Xiaomeng Zhao
a513357607 Merge pull request #3779 from myhloli/dev
mfr add paddle
2025-10-20 19:14:46 +08:00
Xiaomeng Zhao
c8ccf4cf20 Update mineru/model/utils/pytorchocr/modeling/heads/rec_ppformulanet_head.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-20 19:14:16 +08:00
Xiaomeng Zhao
33d43a5afc Update mineru/model/utils/pytorchocr/modeling/heads/rec_ppformulanet_head.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-20 19:14:05 +08:00
Xiaomeng Zhao
3b057c7996 Merge pull request #19 from myhloli/mfr-add-paddle
Mfr add paddle
2025-10-20 18:59:48 +08:00
myhloli
34547262a2 refactor: remove unused Formula constant from model_list.py 2025-10-20 18:57:35 +08:00
myhloli
cd0ed982c0 fix: revert MFR_MODEL to unimernet_small in model initialization 2025-10-20 18:55:30 +08:00
myhloli
52dcbcbfa5 Bump mineru-vl-utils version to 0.1.14 2025-10-20 15:03:39 +08:00
myhloli
0758de6d24 Update vllm version and increase default GPU memory utilization 2025-10-20 11:45:58 +08:00
Xiaomeng Zhao
ae7892a6f9 Merge pull request #3770 from myhloli/dev
Update acceleration card links to include discussion and pull request references
2025-10-17 19:01:33 +08:00
myhloli
73567ccedc Update acceleration card links to include discussion and pull request references 2025-10-17 19:00:15 +08:00
Xiaomeng Zhao
bb552282f3 Merge pull request #3769 from myhloli/dev
Add support for domestic acceleration cards in documentation
2025-10-17 18:54:34 +08:00
myhloli
14c38101f7 Add support for domestic acceleration cards in documentation 2025-10-17 18:53:31 +08:00
Xiaomeng Zhao
cb3a30e9ad Merge pull request #3768 from myhloli/dev
Add support for domestic acceleration cards in documentation
2025-10-17 18:41:31 +08:00
myhloli
f4db41d0cb Add support for domestic acceleration cards in documentation 2025-10-17 18:40:40 +08:00
Xiaomeng Zhao
dad59f7d52 Merge pull request #3760 from magicyuan876/master
feat(tianshu): v2.0 architecture upgrade - active worker-pull mode
2025-10-17 18:31:38 +08:00
myhloli
499e877165 refactor: rename files and update import paths for consistency 2025-10-17 18:09:19 +08:00
myhloli
2d249666ba feat: integrate PP-FormulaNet_plus-M architecture and update model initialization 2025-10-17 17:00:22 +08:00
Magic_yuan
cedc62a728 Improve the markitdown dependency 2025-10-17 16:17:03 +08:00
Xiaomeng Zhao
1e40bac24f Merge pull request #3761 from Sidney233/dev
feat: add PPFormula
2025-10-17 14:40:10 +08:00
Sidney233
23701d0db4 feat: add PPFormula 2025-10-17 14:02:26 +08:00
Magic_yuan
e7d8bf097a Address code review suggestions 2025-10-17 13:04:49 +08:00
Magic_yuan
08a89aeca1 feat(tianshu): v2.0 architecture upgrade - active worker-pull mode
Key improvements:
- Workers now pull tasks actively, improving response latency 10-20x (5-10s → 0.5s)
- Hardened database concurrency: atomic operations prevent duplicate task pickup
- The scheduler is now an optional monitoring component, disabled by default
- Fixed multi-GPU memory usage by fully isolating worker processes

New features:
- The API automatically returns parsed content
- Result files are cleaned up automatically (configurable)
- Images can be uploaded to MinIO
2025-10-17 11:46:42 +08:00
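The atomic claim described in the worker-pull commit above can be sketched with the stdlib `sqlite3` module. This is an illustrative sketch only: the `tasks` table and its columns are hypothetical names, not taken from the project's `task_db.py`.

```python
import sqlite3

def claim_next_task(conn: sqlite3.Connection, worker_id: str):
    """Atomically claim the highest-priority pending task.

    The conditional UPDATE only succeeds while the row is still 'pending';
    since SQLite serializes writers, two workers cannot claim the same row.
    """
    with conn:  # one implicit transaction for select + update
        row = conn.execute(
            "SELECT id FROM tasks WHERE status = 'pending' "
            "ORDER BY priority DESC, id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        cur = conn.execute(
            "UPDATE tasks SET status = 'running', worker = ? "
            "WHERE id = ? AND status = 'pending'",
            (worker_id, row[0]),
        )
        return row[0] if cur.rowcount == 1 else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT, priority INTEGER, worker TEXT)")
conn.execute("INSERT INTO tasks (status, priority) VALUES ('pending', 5), ('pending', 1)")
first = claim_next_task(conn, "worker-0")   # claims the priority-5 task (id 1)
second = claim_next_task(conn, "worker-1")  # claims the remaining task (id 2)
third = claim_next_task(conn, "worker-2")   # queue empty -> None
```

The `rowcount == 1` check is what makes the claim safe under concurrency: if another worker updated the row between the SELECT and the UPDATE, the UPDATE matches zero rows and the caller simply retries.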
Xiaomeng Zhao
1b724f3336 Merge pull request #3756 from myhloli/dev
Set OMP_NUM_THREADS environment variable to 1 for vllm backend initialization
2025-10-16 19:06:45 +08:00
myhloli
ea4271ab37 Set OMP_NUM_THREADS environment variable to 1 for vllm backend initialization 2025-10-16 18:26:06 +08:00
Xiaomeng Zhao
d83b83a5ad Merge pull request #3755 from myhloli/dev
Dev
2025-10-16 17:46:44 +08:00
myhloli
0853b84e87 Update README files to use external image link for MinerU logo 2025-10-16 17:45:42 +08:00
myhloli
36225160a3 Update arXiv badge to reflect MinerU technical report and add badge for MinerU2.5 2025-10-16 17:41:41 +08:00
myhloli
a36118f8ba Add mineru_tianshu project to README files for version 2.0 compatibility 2025-10-16 17:38:57 +08:00
myhloli
a38384e7fb Update mineru-vl-utils dependency version to allow upgrades to 0.1.13 2025-10-16 17:36:45 +08:00
Xiaomeng Zhao
4b7c2bbcc0 Merge pull request #3754 from myhloli/dev
Refactor table merging logic to enhance colspan adjustments and improve caption handling
2025-10-16 17:35:28 +08:00
Xiaomeng Zhao
504fe6ada3 Merge pull request #3742 from magicyuan876/master
feat: MinerU Tianshu project - an out-of-the-box multi-GPU document parsing service
2025-10-16 17:33:54 +08:00
myhloli
39be54023b Refactor table merging logic to enhance colspan adjustments and improve caption handling 2025-10-16 17:31:57 +08:00
Magic_yuan
484ff5a6f9 Fix code review issues 2025-10-16 16:04:42 +08:00
myhloli
59a7a577b3 Add backend name dropdown and update version constraints in bug report template 2025-10-16 14:55:48 +08:00
Xiaomeng Zhao
0e73ef9615 Merge pull request #3750 from myhloli/dev
Update openai dependency version to allow upgrades to version 3
2025-10-16 14:43:57 +08:00
myhloli
d580d6c7f8 Update openai dependency version to allow upgrades to version 3 2025-10-16 14:43:05 +08:00
Xiaomeng Zhao
4c8bb038ce Merge pull request #3748 from myhloli/dev
Enhance table merging logic to adjust colspan attributes based on row structures
2025-10-16 14:24:14 +08:00
myhloli
a89715b9a2 Refactor table merging logic to improve caption handling and prevent merging with non-continuation captions 2025-10-16 14:11:15 +08:00
myhloli
f05ea7c2e6 Simplify model output path handling by removing conditional checks for backend type 2025-10-16 14:09:30 +08:00
Xiaomeng Zhao
b68db3ab90 Merge pull request #3740 from yongtenglei/master
docs: Fix outdated sample data for output reference
2025-10-16 10:43:22 +08:00
yongtenglei
3539cfba36 docs: Fix sample data for output reference 2025-10-16 10:33:13 +08:00
Magic_yuan
3bf50d5267 feat: MinerU Tianshu project - an out-of-the-box multi-GPU document parsing service
Project overview:
Tianshu is a document parsing service built on MinerU. It combines a SQLite task queue with
LitServe GPU load balancing, and supports asynchronous processing, task persistence, and intelligent parsing of multiple document formats.

Core features:
- Asynchronous task processing: clients get an immediate response while tasks run in the background
- Smart parser routing: PDFs/images go through MinerU (GPU-accelerated); Office/text files go through MarkItDown
- GPU load balancing: automatic multi-GPU scheduling built on LitServe
- Task persistence: tasks are stored in SQLite and survive service restarts
- Priority queue: tasks can be assigned priorities
- RESTful API: complete task-management interface
- MinIO integration: images can be uploaded to object storage

Project layout:
- api_server.py: FastAPI web server exposing the RESTful API
- task_db.py: SQLite task database manager
- litserve_worker.py: LitServe worker pool with GPU load balancing
- task_scheduler.py: asynchronous task scheduler
- start_all.py: unified startup script
- client_example.py: Python client example

Tech stack:
FastAPI, LitServe, SQLite, MinerU, MarkItDown, MinIO, Loguru
2025-10-16 08:41:51 +08:00
myhloli
2108019698 Enhance table merging logic to adjust colspan attributes based on row structures 2025-10-15 19:05:28 +08:00
Xiaomeng Zhao
17a9921ba9 Merge pull request #3737 from myhloli/dev
Refactor block processing to handle non-contiguous indices in captions and footnotes
2025-10-15 17:06:22 +08:00
myhloli
3baee1d077 Refactor block processing to handle non-contiguous indices in captions and footnotes 2025-10-15 17:04:29 +08:00
myhloli
e1ee728e31 Sort blocks by index and clean up unprocessed blocks handling 2025-10-15 16:06:03 +08:00
Xiaomeng Zhao
1b45e6e1bc Merge pull request #3723 from myhloli/dev
Rename plugin documentation files for consistency and update index links
2025-10-14 19:00:38 +08:00
myhloli
966aadd1d3 Rename plugin documentation files for consistency and update index links 2025-10-14 18:58:24 +08:00
Xiaomeng Zhao
ecb8e3f0ac Merge pull request #3722 from myhloli/dev
Add documentation for Cherry Studio, Sider, Dify, n8n, Coze, FastGPT, ModelWhale, DingTalk, DataFlow, BISHENG, and RagFlow plugins
2025-10-14 18:55:19 +08:00
myhloli
1bef6e3526 Add documentation for Cherry Studio, Sider, Dify, n8n, Coze, FastGPT, ModelWhale, DingTalk, DataFlow, BISHENG, and RagFlow plugins 2025-10-14 18:54:15 +08:00
myhloli
4c4d1d0f95 Update supported version range in bug_report.yml to include 2.2.x and 2.5.x 2025-10-14 16:09:30 +08:00
Xiaomeng Zhao
c36aa54370 Merge pull request #3709 from myhloli/dev
Add max_concurrency parameter to improve backend processing
2025-10-13 15:57:34 +08:00
myhloli
4b480cfcf7 Add max_concurrency parameter to improve backend processing 2025-10-13 15:56:49 +08:00
Xiaomeng Zhao
7e18e1bb76 Merge pull request #3707 from myhloli/dev
Refactor async function and improve output directory handling in prediction
2025-10-13 11:59:33 +08:00
myhloli
44fdeb663f Refactor async function and improve output directory handling in prediction 2025-10-13 11:32:28 +08:00
myhloli
cf59949ba9 add tiff 2025-10-12 11:45:49 +08:00
Xiaomeng Zhao
c8c2f28afc Merge pull request #3701 from opendatalab/ocr_enhance
Ocr enhance
2025-10-11 19:33:32 +08:00
Xiaomeng Zhao
aa4bc6259b Merge pull request #3700 from myhloli/ocr_enhance
Reduce recognition batch size from 8 to 6
2025-10-11 19:29:09 +08:00
myhloli
b7e4ea0b49 Reduce recognition batch size from 8 to 6 for improved OCR performance 2025-10-11 19:28:16 +08:00
Xiaomeng Zhao
998197a47f Merge pull request #3672 from cjsdurj/optimize_ocr
Optimize pytorch_paddle OCR inference performance, ~400% overall improvement
2025-10-11 18:44:02 +08:00
Xiaomeng Zhao
3c8b6e6b6b Merge pull request #3499 from jinghuan-Chen/fix/fill_blank_rec_crop_empty_image
Avoid cropping empty images.
2025-10-11 11:14:05 +08:00
Xiaomeng Zhao
be42b46ff9 Merge pull request #3688 from myhloli/dev 2025-10-10 19:43:03 +08:00
myhloli
7c689e33b8 Refactor fix_two_layer_blocks function to improve handling of captions and footnotes in table blocks 2025-10-10 19:12:18 +08:00
cjsdurj
af66bc02c2 Optimize OCR inference performance by 400% 2025-10-09 13:03:22 +00:00
Xiaomeng Zhao
752f75ad8e Merge pull request #3651 from opendatalab/dev
Dev
2025-09-30 06:31:24 +08:00
Xiaomeng Zhao
1cfde98585 Merge pull request #3650 from myhloli/dev
Dev
2025-09-30 06:30:12 +08:00
Xiaomeng Zhao
54676295d5 Update README_zh-CN.md 2025-09-30 06:29:05 +08:00
Xiaomeng Zhao
61c7c65d8b Update README.md 2025-09-30 06:18:00 +08:00
Xiaomeng Zhao
6f05f735d0 Update header.html 2025-09-30 06:11:43 +08:00
Xiaomeng Zhao
befb16e531 Merge pull request #3649 from opendatalab/master
master->dev
2025-09-30 06:08:54 +08:00
Bin Wang
abc433d6f2 Merge pull request #3635 from wangbinDL/master
docs: Update arXiv link for technical report
2025-09-29 09:36:45 +08:00
wangbinDL
e7c1385068 docs: Update arXiv link for technical report 2025-09-29 09:32:30 +08:00
Bin Wang
342c5aa34a Merge pull request #3619 from wangbinDL/master
docs: Update MinerU2.5 Technical Report
2025-09-26 18:35:31 +08:00
wangbinDL
f25ddfa024 docs: Update MinerU2.5 Technical Report 2025-09-26 18:27:22 +08:00
Bin Wang
e31de3a453 Merge pull request #3615 from wangbinDL/master
docs: Add MinerU2.5 technical report and BibTeX
2025-09-26 11:51:45 +08:00
wangbinDL
2f01754410 docs: Add MinerU2.5 technical report and BibTeX 2025-09-26 11:42:59 +08:00
Xiaomeng Zhao
8a9921fb22 Merge pull request #3610 from opendatalab/master
master->dev
2025-09-26 06:17:20 +08:00
myhloli
652e11a253 Update version.py with new version 2025-09-25 21:57:26 +00:00
Xiaomeng Zhao
61cc6886fe Merge pull request #3608 from opendatalab/release-2.5.4
Release 2.5.4
2025-09-26 05:53:36 +08:00
Xiaomeng Zhao
80dc57e7ce Merge pull request #3609 from myhloli/dev
Bump mineru-vl-utils dependency to version 0.1.11
2025-09-26 05:48:32 +08:00
myhloli
d84a006f6d Bump mineru-vl-utils dependency to version 0.1.11 2025-09-26 05:47:27 +08:00
Xiaomeng Zhao
2c5361bf8e Merge pull request #3607 from myhloli/dev
Update changelog for version 2.5.4 to document PDF identification fix
2025-09-26 05:43:50 +08:00
myhloli
eb01b7acf9 Update changelog for version 2.5.4 to document PDF identification fix 2025-09-26 05:42:43 +08:00
Xiaomeng Zhao
5656f1363b Merge pull request #3606 from myhloli/dev
Dev
2025-09-26 05:35:29 +08:00
myhloli
c9315b8e10 Refactor suffix guessing to handle PDF extensions for AI files 2025-09-26 05:31:46 +08:00
myhloli
907099762f Normalize PDF suffix handling for AI files to be case-insensitive 2025-09-26 05:09:19 +08:00
myhloli
2c356cccee Fix suffix identification for AI files to correctly handle PDF extensions 2025-09-26 05:02:56 +08:00
myhloli
0f62f166e6 Enhance image link replacement to handle only .jpg files while preserving other formats 2025-09-26 04:52:05 +08:00
Xiaomeng Zhao
c7a64e72dc Merge pull request #3563 from myhloli/dev
Update model output handling in test_e2e.py to write JSON format instead of text
2025-09-21 02:49:31 +08:00
myhloli
3cb3a94830 Merge remote-tracking branch 'origin/dev' into dev 2025-09-21 02:48:45 +08:00
myhloli
8301fa4c20 Update model output handling in test_e2e.py to write JSON format instead of text 2025-09-21 02:47:56 +08:00
Xiaomeng Zhao
4400f4b75f Merge pull request #3558 from opendatalab/master
master->dev
2025-09-20 15:37:45 +08:00
myhloli
92efb8f96e Update version.py with new version 2025-09-20 07:36:01 +00:00
Xiaomeng Zhao
9a88cbfb09 Merge pull request #3545 from opendatalab/release-2.5.3
Release 2.5.3
2025-09-20 15:33:58 +08:00
Xiaomeng Zhao
e96e4a0ce4 Merge pull request #3557 from opendatalab/dev
Dev
2025-09-20 15:30:40 +08:00
Xiaomeng Zhao
c7bde0ab39 Merge pull request #3556 from myhloli/dev
Refactor batch image orientation classification logic for improved cl…
2025-09-20 15:30:08 +08:00
myhloli
8754c24e42 Refactor batch image orientation classification logic for improved clarity and performance 2025-09-20 15:24:28 +08:00
Xiaomeng Zhao
4f8c00cc34 Merge pull request #3555 from opendatalab/dev
Dev
2025-09-20 15:18:19 +08:00
Xiaomeng Zhao
89681f98ad Merge pull request #3554 from myhloli/dev
Fix formatting in changelog sections of README.md and README_zh-CN.md…
2025-09-20 15:14:16 +08:00
myhloli
66d328dbc5 Fix formatting in changelog sections of README.md and README_zh-CN.md for improved readability 2025-09-20 15:13:29 +08:00
Xiaomeng Zhao
f0c1318545 Merge pull request #3553 from myhloli/dev
Fix formatting in changelog sections of README.md and README_zh-CN.md…
2025-09-20 15:11:43 +08:00
myhloli
6e97f3cf70 Fix formatting in changelog sections of README.md and README_zh-CN.md for improved readability 2025-09-20 15:10:25 +08:00
Xiaomeng Zhao
aede62167e Merge pull request #3552 from opendatalab/dev
Dev
2025-09-20 15:08:40 +08:00
Xiaomeng Zhao
5f2740f743 Merge pull request #3551 from myhloli/dev
Fix compute capability comparison in custom_logits_processors.py for …
2025-09-20 15:08:14 +08:00
myhloli
a888d2b625 Fix compute capability comparison in custom_logits_processors.py for correct version handling 2025-09-20 15:06:49 +08:00
Xiaomeng Zhao
4275876331 Merge pull request #3550 from opendatalab/dev
Dev
2025-09-20 15:01:39 +08:00
Xiaomeng Zhao
ec9f7f54ab Merge pull request #3549 from myhloli/dev
Update README.md and README_zh-CN.md to include changelog for v2.5.3 …
2025-09-20 15:00:50 +08:00
myhloli
7861e5e369 Remove redundant newline in README.md for improved formatting 2025-09-20 15:00:12 +08:00
myhloli
159f3a89a3 Update README.md and README_zh-CN.md to include changelog for v2.5.3 release with compatibility fixes and performance adjustments 2025-09-20 14:57:54 +08:00
Xiaomeng Zhao
d9452bbeb9 Merge pull request #3546 from myhloli/dev
Update docker_deployment.md for improved clarity on base image usage …
2025-09-20 14:48:50 +08:00
myhloli
d808a32c0b Update docker_deployment.md for improved clarity on base image usage and GPU support 2025-09-20 13:52:16 +08:00
Xiaomeng Zhao
12ce3bd024 Merge pull request #3544 from myhloli/dev
Dev
2025-09-20 13:26:18 +08:00
myhloli
e3d7aece50 Remove warning log for default VLLM_USE_V1 value in custom_logits_processors.py 2025-09-20 13:25:11 +08:00
Xiaomeng Zhao
7c55a0ea65 Update mineru/backend/vlm/custom_logits_processors.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-20 13:22:40 +08:00
myhloli
f1659eb7a7 Refactor logits processor handling in server.py and vlm_analyze.py for improved clarity and consistency 2025-09-20 13:21:05 +08:00
myhloli
c6bffd9382 Restrict vllm version to <0.11 for compatibility 2025-09-20 11:49:06 +08:00
myhloli
857dcb2ef5 Update docker_deployment.md to clarify GPU model support and base image options for vLLM 2025-09-20 11:45:33 +08:00
myhloli
ef69f98cd6 Update Dockerfile to include comments for GPU architecture compatibility based on Compute Capability 2025-09-20 03:15:58 +08:00
myhloli
6d5d1cf26b Refactor image rotation handling in batch_analyze.py and paddle_ori_cls.py for improved compatibility with torch versions 2025-09-20 03:07:47 +08:00
myhloli
7c481796f8 Refactor custom logits processors to include vllm version checks and improve logging 2025-09-20 01:22:06 +08:00
myhloli
7d62b7b7cc Update mineru-vl-utils dependency version to 0.1.8 2025-09-20 00:31:14 +08:00
myhloli
5a0cf9af7f Enhance custom logits processors with improved compute capability checks and environment variable handling 2025-09-20 00:21:43 +08:00
myhloli
f5e0e67545 Add custom logits processors functionality with compute capability check 2025-09-19 19:21:56 +08:00
myhloli
a4cac624df Add compute capability check for custom logits processors in server.py and vlm_analyze.py 2025-09-19 19:00:41 +08:00
Xiaomeng Zhao
e1eb318b9b Merge pull request #3535 from opendatalab/master
master->dev
2025-09-19 16:51:13 +08:00
myhloli
31834b1e68 Update version.py with new version 2025-09-19 08:48:17 +00:00
Xiaomeng Zhao
100ace2e99 Merge pull request #3534 from opendatalab/release-2.5.2
Release 2.5.2
2025-09-19 16:45:57 +08:00
myhloli
c343afd20c Update version.py with new version 2025-09-19 03:45:08 +00:00
Xiaomeng Zhao
6586c7c01e Merge pull request #3529 from opendatalab/release-2.5.1
Release 2.5.1
2025-09-19 11:43:51 +08:00
jinghuan-Chen
8bb8b715c1 Avoid cropping empty images. 2025-09-18 17:08:40 +08:00
223 changed files with 14345 additions and 25498 deletions

View File

@@ -122,7 +122,21 @@ body:
#multiple: false
options:
-
- "2.0.x"
- "<2.2.0"
- "2.2.x"
- ">=2.5"
validations:
required: true
- type: dropdown
id: backend_name
attributes:
label: Backend name | 解析后端
#multiple: false
options:
-
- "vlm"
- "pipeline"
validations:
required: true

View File

@@ -1,7 +1,7 @@
<div align="center" xmlns="http://www.w3.org/1999/html">
<!-- logo -->
<p align="center">
<img src="docs/images/MinerU-logo.png" width="300px" style="vertical-align:middle;">
<img src="https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docs/images/MinerU-logo.png" width="300px" style="vertical-align:middle;">
</p>
<!-- icon -->
@@ -18,7 +18,8 @@
[![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAF8AAABYCAMAAACkl9t/AAAAk1BMVEVHcEz/nQv/nQv/nQr/nQv/nQr/nQv/nQv/nQr/wRf/txT/pg7/yRr/rBD/zRz/ngv/oAz/zhz/nwv/txT/ngv/0B3+zBz/nQv/0h7/wxn/vRb/thXkuiT/rxH/pxD/ogzcqyf/nQvTlSz/czCxky7/SjifdjT/Mj3+Mj3wMj15aTnDNz+DSD9RTUBsP0FRO0Q6O0WyIxEIAAAAGHRSTlMADB8zSWF3krDDw8TJ1NbX5efv8ff9/fxKDJ9uAAAGKklEQVR42u2Z63qjOAyGC4RwCOfB2JAGqrSb2WnTw/1f3UaWcSGYNKTdf/P+mOkTrE+yJBulvfvLT2A5ruenaVHyIks33npl/6C4s/ZLAM45SOi/1FtZPyFur1OYofBX3w7d54Bxm+E8db+nDr12ttmESZ4zludJEG5S7TO72YPlKZFyE+YCYUJTBZsMiNS5Sd7NlDmKM2Eg2JQg8awbglfqgbhArjxkS7dgp2RH6hc9AMLdZYUtZN5DJr4molC8BfKrEkPKEnEVjLbgW1fLy77ZVOJagoIcLIl+IxaQZGjiX597HopF5CkaXVMDO9Pyix3AFV3kw4lQLCbHuMovz8FallbcQIJ5Ta0vks9RnolbCK84BtjKRS5uA43hYoZcOBGIG2Epbv6CvFVQ8m8loh66WNySsnN7htL58LNp+NXT8/PhXiBXPMjLSxtwp8W9f/1AngRierBkA+kk/IpUSOeKByzn8y3kAAAfh//0oXgV4roHm/kz4E2z//zRc3/lgwBzbM2mJxQEa5pqgX7d1L0htrhx7LKxOZlKbwcAWyEOWqYSI8YPtgDQVjpB5nvaHaSnBaQSD6hweDi8PosxD6/PT09YY3xQA7LTCTKfYX+QHpA0GCcqmEHvr/cyfKQTEuwgbs2kPxJEB0iNjfJcCTPyocx+A0griHSmADiC91oNGVwJ69RudYe65vJmoqfpul0lrqXadW0jFKH5BKwAeCq+Den7s+3zfRJzA61/Uj/9H/VzLKTx9jFPPdXeeP+L7WEvDLAKAIoF8bPTKT0+TM7W8ePj3Rz/Yn3kOAp2f1Kf0Weony7pn/cPydvhQYV+eFOfmOu7VB/ViPe34/EN3RFHY/yRuT8ddCtMPH/McBAT5s+vRde/gf2c/sPsjLK+m5IBQF5tO+h2tTlBGnP6693JdsvofjOPnnEHkh2TnV/X1fBl9S5zrwuwF8NFrAVJVwCAPTe8gaJlomqlp0pv4Pjn98tJ/t/fL++6unpR1YGC2n/KCoa0tTLoKiEeUPDl94nj+5/Tv3/eT5vBQ60X1S0oZr+IWRR8Ldhu7AlLjPISlJcO9vrFotky9SpzDequlwEir5beYAc0R7D9KS1DXva0jhYRDXoExPdc6yw5GShkZXe9QdO/uOvHofxjrV/TNS6iMJS+4TcSTgk9n5agJdBQbB//IfF/HpvPt3Tbi7b6I6K0R72p6ajryEJrENW2bbeVUGjfgoals4L443c7BEE4mJO2SpbRngxQrAKRudRzGQ8jVOL2qDVjjI8K1gc3TIJ5KiFZ1q+gdsARPB4NQS4AjwVSt72DSoXNyOWUrU5mQ9nRYyjp89Xo7oRI6Bga9QNT1mQ/ptaJq5T/7WcgAZywR/XlPGAUDdet3LE+qS0TI+g+aJU8MIqjo0Kx8Ly+maxLjJmjQ18rA0YCkxLQbUZP1WqdmyQGJLUm7VnQFqodmXSqmRrdVpqdzk5LvmvgtEcW8PMGdaS23EOWyDVbACZzUJPaqMbjDxpA3Qrgl0AikimGDbqmyT8P8NOYiqrldF8rX+YN7TopX4UoHuSCYY7cgX4gHwclQKl1zhx0THf+tCAUValzj
I7Wg9EhptrkIcfIJjA94evOn8B2eHaVzvBrnl2ig0So6hvPaz0IGcOvTHvUIlE2+prqAxLSQxZlU2stql1NqCCLdIiIN/i1DBEHUoElM9dBravbiAnKqgpi4IBkw+utSPIoBijDXJipSVV7MpOEJUAc5Qmm3BnUN+w3hteEieYKfRZSIUcXKMVf0u5wD4EwsUNVvZOtUT7A2GkffHjByWpHqvRBYrTV72a6j8zZ6W0DTE86Hn04bmyWX3Ri9WH7ZU6Q7h+ZHo0nHUAcsQvVhXRDZHChwiyi/hnPuOsSEF6Exk3o6Y9DT1eZ+6cASXk2Y9k+6EOQMDGm6WBK10wOQJCBwren86cPPWUcRAnTVjGcU1LBgs9FURiX/e6479yZcLwCBmTxiawEwrOcleuu12t3tbLv/N4RLYIBhYexm7Fcn4OJcn0+zc+s8/VfPeddZHAGN6TT8eGczHdR/Gts1/MzDkThr23zqrVfAMFT33Nx1RJsx1k5zuWILLnG/vsH+Fv5D4NTVcp1Gzo8AAAAAElFTkSuQmCC&labelColor=white)](https://huggingface.co/spaces/opendatalab/MinerU)
[![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=data:image/svg+xml;base64,PHN2ZyB3aWR0aD0iMjIzIiBoZWlnaHQ9IjIwMCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KCiA8Zz4KICA8dGl0bGU+TGF5ZXIgMTwvdGl0bGU+CiAgPHBhdGggaWQ9InN2Z18xNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTAsODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5eiIvPgogIDxwYXRoIGlkPSJzdmdfMTUiIGZpbGw9IiM2MjRhZmYiIGQ9Im05OS4xNCwxMTUuNDlsMjUuNjUsMGwwLDI1LjY1bC0yNS42NSwwbDAsLTI1LjY1eiIvPgogIDxwYXRoIGlkPSJzdmdfMTYiIGZpbGw9IiM2MjRhZmYiIGQ9Im0xNzYuMDksMTQxLjE0bC0yNS42NDk5OSwwbDAsMjIuMTlsNDcuODQsMGwwLC00Ny44NGwtMjIuMTksMGwwLDI1LjY1eiIvPgogIDxwYXRoIGlkPSJzdmdfMTciIGZpbGw9IiMzNmNmZDEiIGQ9Im0xMjQuNzksODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5eiIvPgogIDxwYXRoIGlkPSJzdmdfMTgiIGZpbGw9IiMzNmNmZDEiIGQ9Im0wLDY0LjE5bDI1LjY1LDBsMCwyNS42NWwtMjUuNjUsMGwwLC0yNS42NXoiLz4KICA8cGF0aCBpZD0ic3ZnXzE5IiBmaWxsPSIjNjI0YWZmIiBkPSJtMTk4LjI4LDg5Ljg0bDI1LjY0OTk5LDBsMCwyNS42NDk5OWwtMjUuNjQ5OTksMGwwLC0yNS42NDk5OXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIwIiBmaWxsPSIjMzZjZmQxIiBkPSJtMTk4LjI4LDY0LjE5bDI1LjY0OTk5LDBsMCwyNS42NWwtMjUuNjQ5OTksMGwwLC0yNS42NXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIxIiBmaWxsPSIjNjI0YWZmIiBkPSJtMTUwLjQ0LDQybDAsMjIuMTlsMjUuNjQ5OTksMGwwLDI1LjY1bDIyLjE5LDBsMCwtNDcuODRsLTQ3Ljg0LDB6Ii8+CiAgPHBhdGggaWQ9InN2Z18yMiIgZmlsbD0iIzM2Y2ZkMSIgZD0ibTczLjQ5LDg5Ljg0bDI1LjY1LDBsMCwyNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIzIiBmaWxsPSIjNjI0YWZmIiBkPSJtNDcuODQsNjQuMTlsMjUuNjUsMGwwLC0yMi4xOWwtNDcuODQsMGwwLDQ3Ljg0bDIyLjE5LDBsMCwtMjUuNjV6Ii8+CiAgPHBhdGggaWQ9InN2Z18yNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTQ3Ljg0LDExNS40OWwtMjIuMTksMGwwLDQ3Ljg0bDQ3Ljg0LDBsMCwtMjIuMTlsLTI1LjY1LDBsMCwtMjUuNjV6Ii8+CiA8L2c+Cjwvc3ZnPg==&labelColor=white)](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/myhloli/a3cb16570ab3cfeadf9d8f0ac91b4fca/mineru_demo.ipynb)
[![arXiv](https://img.shields.io/badge/arXiv-2409.18839-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2409.18839)
[![arXiv](https://img.shields.io/badge/MinerU-Technical%20Report-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2409.18839)
[![arXiv](https://img.shields.io/badge/MinerU2.5-Technical%20Report-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2509.22186)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/opendatalab/MinerU)
@@ -43,6 +44,28 @@
</div>
# Changelog
- 2025/10/24 2.6.2 Released
- `pipeline` backend optimizations
- Added experimental support for Chinese formulas, which can be enabled by setting the environment variable `export MINERU_FORMULA_CH_SUPPORT=1`. This feature may cause a slight decrease in MFR speed and failures in recognizing some long formulas. It is recommended to enable it only when parsing Chinese formulas is needed. To disable this feature, set the environment variable to `0`.
- `OCR` speed significantly improved by 200%~300%, thanks to the optimization solution provided by [@cjsdurj](https://github.com/cjsdurj)
- `OCR` models optimized for improved accuracy and coverage of Latin script recognition, and updated Cyrillic, Arabic, Devanagari, Telugu (te), and Tamil (ta) language systems to `ppocr-v5` version, with accuracy improved by over 40% compared to previous models
- `vlm` backend optimizations
- `table_caption` and `table_footnote` matching logic optimized to improve the accuracy of table caption and footnote matching and reading order rationality in scenarios with multiple consecutive tables on a page
- Optimized CPU resource usage during high concurrency when using `vllm` backend, reducing server pressure
- Adapted to `vllm` version 0.11.0
- General optimizations
- Improved cross-page table merging, adding support for merging continuation tables across pages and improving results in multi-column merge scenarios
- Added the environment variable `MINERU_TABLE_MERGE_ENABLE` for the table merging feature; merging is enabled by default and can be disabled by setting the variable to `0`
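The two new toggles above can be combined in a shell session; a minimal sketch using the variable names given in this changelog:

```shell
# Enable experimental Chinese formula support (pipeline backend only; may slow MFR)
export MINERU_FORMULA_CH_SUPPORT=1
# Disable cross-page table merging (enabled by default)
export MINERU_TABLE_MERGE_ENABLE=0
```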
- 2025/09/26 2.5.4 released
- 🎉🎉 The MinerU2.5 [Technical Report](https://arxiv.org/abs/2509.22186) is now available! We welcome you to read it for a comprehensive overview of its model architecture, training strategy, data engineering and evaluation results.
- Fixed an issue where some `PDF` files were mistakenly identified as `AI` files, causing parsing failures
- 2025/09/20 2.5.3 Released
- Dependency version range adjustment to enable Turing and earlier architecture GPUs to use vLLM acceleration for MinerU2.5 model inference.
- `pipeline` backend compatibility fixes for torch 2.8.0.
- Reduced default concurrency for vLLM async backend to lower server pressure and avoid connection closure issues caused by high load.
- More compatibility-related details can be found in the [announcement](https://github.com/opendatalab/MinerU/discussions/3548)
- 2025/09/19 2.5.2 Released
@@ -733,6 +756,16 @@ Currently, some models in this project are trained based on YOLO. However, since
# Citation
```bibtex
@misc{niu2025mineru25decoupledvisionlanguagemodel,
title={MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing},
author={Junbo Niu and Zheng Liu and Zhuangcheng Gu and Bin Wang and Linke Ouyang and Zhiyuan Zhao and Tao Chu and Tianyao He and Fan Wu and Qintong Zhang and Zhenjiang Jin and Guang Liang and Rui Zhang and Wenzheng Zhang and Yuan Qu and Zhifei Ren and Yuefeng Sun and Yuanhong Zheng and Dongsheng Ma and Zirui Tang and Boyu Niu and Ziyang Miao and Hejun Dong and Siyi Qian and Junyuan Zhang and Jingzhou Chen and Fangdong Wang and Xiaomeng Zhao and Liqun Wei and Wei Li and Shasha Wang and Ruiliang Xu and Yuanyuan Cao and Lu Chen and Qianqian Wu and Huaiyu Gu and Lindong Lu and Keming Wang and Dechen Lin and Guanlin Shen and Xuanhe Zhou and Linfeng Zhang and Yuhang Zang and Xiaoyi Dong and Jiaqi Wang and Bo Zhang and Lei Bai and Pei Chu and Weijia Li and Jiang Wu and Lijun Wu and Zhenxiang Li and Guangyu Wang and Zhongying Tu and Chao Xu and Kai Chen and Yu Qiao and Bowen Zhou and Dahua Lin and Wentao Zhang and Conghui He},
year={2025},
eprint={2509.22186},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2509.22186},
}
@misc{wang2024mineruopensourcesolutionprecise,
title={MinerU: An Open-Source Solution for Precise Document Content Extraction},
author={Bin Wang and Chao Xu and Xiaomeng Zhao and Linke Ouyang and Fan Wu and Zhiyuan Zhao and Rui Xu and Kaiwen Liu and Yuan Qu and Fukai Shang and Bo Zhang and Liqun Wei and Zhihao Sui and Wei Li and Botian Shi and Yu Qiao and Dahua Lin and Conghui He},
@@ -771,4 +804,4 @@ Currently, some models in this project are trained based on YOLO. However, since
- [OmniDocBench (A Comprehensive Benchmark for Document Parsing and Evaluation)](https://github.com/opendatalab/OmniDocBench)
- [Magic-HTML (Mixed web page extraction tool)](https://github.com/opendatalab/magic-html)
- [Magic-Doc (Fast speed ppt/pptx/doc/docx/pdf extraction tool)](https://github.com/InternLM/magic-doc)
- [Dingo: A Comprehensive AI Data Quality Evaluation Tool](https://github.com/MigoXLab/dingo)
- [Dingo: A Comprehensive AI Data Quality Evaluation Tool](https://github.com/MigoXLab/dingo)


@@ -1,7 +1,7 @@
<div align="center" xmlns="http://www.w3.org/1999/html">
<!-- logo -->
<p align="center">
<img src="docs/images/MinerU-logo.png" width="300px" style="vertical-align:middle;">
<img src="https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docs/images/MinerU-logo.png" width="300px" style="vertical-align:middle;">
</p>
<!-- icon -->
@@ -18,7 +18,8 @@
[![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=data:image/svg+xml;base64,PHN2ZyB3aWR0aD0iMjIzIiBoZWlnaHQ9IjIwMCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KCiA8Zz4KICA8dGl0bGU+TGF5ZXIgMTwvdGl0bGU+CiAgPHBhdGggaWQ9InN2Z18xNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTAsODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5eiIvPgogIDxwYXRoIGlkPSJzdmdfMTUiIGZpbGw9IiM2MjRhZmYiIGQ9Im05OS4xNCwxMTUuNDlsMjUuNjUsMGwwLDI1LjY1bC0yNS42NSwwbDAsLTI1LjY1eiIvPgogIDxwYXRoIGlkPSJzdmdfMTYiIGZpbGw9IiM2MjRhZmYiIGQ9Im0xNzYuMDksMTQxLjE0bC0yNS42NDk5OSwwbDAsMjIuMTlsNDcuODQsMGwwLC00Ny44NGwtMjIuMTksMGwwLDI1LjY1eiIvPgogIDxwYXRoIGlkPSJzdmdfMTciIGZpbGw9IiMzNmNmZDEiIGQ9Im0xMjQuNzksODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5eiIvPgogIDxwYXRoIGlkPSJzdmdfMTgiIGZpbGw9IiMzNmNmZDEiIGQ9Im0wLDY0LjE5bDI1LjY1LDBsMCwyNS42NWwtMjUuNjUsMGwwLC0yNS42NXoiLz4KICA8cGF0aCBpZD0ic3ZnXzE5IiBmaWxsPSIjNjI0YWZmIiBkPSJtMTk4LjI4LDg5Ljg0bDI1LjY0OTk5LDBsMCwyNS42NDk5OWwtMjUuNjQ5OTksMGwwLC0yNS42NDk5OXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIwIiBmaWxsPSIjMzZjZmQxIiBkPSJtMTk4LjI4LDY0LjE5bDI1LjY0OTk5LDBsMCwyNS42NWwtMjUuNjQ5OTksMGwwLC0yNS42NXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIxIiBmaWxsPSIjNjI0YWZmIiBkPSJtMTUwLjQ0LDQybDAsMjIuMTlsMjUuNjQ5OTksMGwwLDI1LjY1bDIyLjE5LDBsMCwtNDcuODRsLTQ3Ljg0LDB6Ii8+CiAgPHBhdGggaWQ9InN2Z18yMiIgZmlsbD0iIzM2Y2ZkMSIgZD0ibTczLjQ5LDg5Ljg0bDI1LjY1LDBsMCwyNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIzIiBmaWxsPSIjNjI0YWZmIiBkPSJtNDcuODQsNjQuMTlsMjUuNjUsMGwwLC0yMi4xOWwtNDcuODQsMGwwLDQ3Ljg0bDIyLjE5LDBsMCwtMjUuNjV6Ii8+CiAgPHBhdGggaWQ9InN2Z18yNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTQ3Ljg0LDExNS40OWwtMjIuMTksMGwwLDQ3Ljg0bDQ3Ljg0LDBsMCwtMjIuMTlsLTI1LjY1LDBsMCwtMjUuNjV6Ii8+CiA8L2c+Cjwvc3ZnPg==&labelColor=white)](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
[![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAF8AAABYCAMAAACkl9t/AAAAk1BMVEVHcEz/nQv/nQv/nQr/nQv/nQr/nQv/nQv/nQr/wRf/txT/pg7/yRr/rBD/zRz/ngv/oAz/zhz/nwv/txT/ngv/0B3+zBz/nQv/0h7/wxn/vRb/thXkuiT/rxH/pxD/ogzcqyf/nQvTlSz/czCxky7/SjifdjT/Mj3+Mj3wMj15aTnDNz+DSD9RTUBsP0FRO0Q6O0WyIxEIAAAAGHRSTlMADB8zSWF3krDDw8TJ1NbX5efv8ff9/fxKDJ9uAAAGKklEQVR42u2Z63qjOAyGC4RwCOfB2JAGqrSb2WnTw/1f3UaWcSGYNKTdf/P+mOkTrE+yJBulvfvLT2A5ruenaVHyIks33npl/6C4s/ZLAM45SOi/1FtZPyFur1OYofBX3w7d54Bxm+E8db+nDr12ttmESZ4zludJEG5S7TO72YPlKZFyE+YCYUJTBZsMiNS5Sd7NlDmKM2Eg2JQg8awbglfqgbhArjxkS7dgp2RH6hc9AMLdZYUtZN5DJr4molC8BfKrEkPKEnEVjLbgW1fLy77ZVOJagoIcLIl+IxaQZGjiX597HopF5CkaXVMDO9Pyix3AFV3kw4lQLCbHuMovz8FallbcQIJ5Ta0vks9RnolbCK84BtjKRS5uA43hYoZcOBGIG2Epbv6CvFVQ8m8loh66WNySsnN7htL58LNp+NXT8/PhXiBXPMjLSxtwp8W9f/1AngRierBkA+kk/IpUSOeKByzn8y3kAAAfh//0oXgV4roHm/kz4E2z//zRc3/lgwBzbM2mJxQEa5pqgX7d1L0htrhx7LKxOZlKbwcAWyEOWqYSI8YPtgDQVjpB5nvaHaSnBaQSD6hweDi8PosxD6/PT09YY3xQA7LTCTKfYX+QHpA0GCcqmEHvr/cyfKQTEuwgbs2kPxJEB0iNjfJcCTPyocx+A0griHSmADiC91oNGVwJ69RudYe65vJmoqfpul0lrqXadW0jFKH5BKwAeCq+Den7s+3zfRJzA61/Uj/9H/VzLKTx9jFPPdXeeP+L7WEvDLAKAIoF8bPTKT0+TM7W8ePj3Rz/Yn3kOAp2f1Kf0Weony7pn/cPydvhQYV+eFOfmOu7VB/ViPe34/EN3RFHY/yRuT8ddCtMPH/McBAT5s+vRde/gf2c/sPsjLK+m5IBQF5tO+h2tTlBGnP6693JdsvofjOPnnEHkh2TnV/X1fBl9S5zrwuwF8NFrAVJVwCAPTe8gaJlomqlp0pv4Pjn98tJ/t/fL++6unpR1YGC2n/KCoa0tTLoKiEeUPDl94nj+5/Tv3/eT5vBQ60X1S0oZr+IWRR8Ldhu7AlLjPISlJcO9vrFotky9SpzDequlwEir5beYAc0R7D9KS1DXva0jhYRDXoExPdc6yw5GShkZXe9QdO/uOvHofxjrV/TNS6iMJS+4TcSTgk9n5agJdBQbB//IfF/HpvPt3Tbi7b6I6K0R72p6ajryEJrENW2bbeVUGjfgoals4L443c7BEE4mJO2SpbRngxQrAKRudRzGQ8jVOL2qDVjjI8K1gc3TIJ5KiFZ1q+gdsARPB4NQS4AjwVSt72DSoXNyOWUrU5mQ9nRYyjp89Xo7oRI6Bga9QNT1mQ/ptaJq5T/7WcgAZywR/XlPGAUDdet3LE+qS0TI+g+aJU8MIqjo0Kx8Ly+maxLjJmjQ18rA0YCkxLQbUZP1WqdmyQGJLUm7VnQFqodmXSqmRrdVpqdzk5LvmvgtEcW8PMGdaS23EOWyDVbACZzUJPaqMbjDxpA3Qrgl0AikimGDbqmyT8P8NOYiqrldF8rX+YN7TopX4UoHuSCYY7cgX4gHwclQKl1zhx0THf+tCAUValzj
I7Wg9EhptrkIcfIJjA94evOn8B2eHaVzvBrnl2ig0So6hvPaz0IGcOvTHvUIlE2+prqAxLSQxZlU2stql1NqCCLdIiIN/i1DBEHUoElM9dBravbiAnKqgpi4IBkw+utSPIoBijDXJipSVV7MpOEJUAc5Qmm3BnUN+w3hteEieYKfRZSIUcXKMVf0u5wD4EwsUNVvZOtUT7A2GkffHjByWpHqvRBYrTV72a6j8zZ6W0DTE86Hn04bmyWX3Ri9WH7ZU6Q7h+ZHo0nHUAcsQvVhXRDZHChwiyi/hnPuOsSEF6Exk3o6Y9DT1eZ+6cASXk2Y9k+6EOQMDGm6WBK10wOQJCBwren86cPPWUcRAnTVjGcU1LBgs9FURiX/e6479yZcLwCBmTxiawEwrOcleuu12t3tbLv/N4RLYIBhYexm7Fcn4OJcn0+zc+s8/VfPeddZHAGN6TT8eGczHdR/Gts1/MzDkThr23zqrVfAMFT33Nx1RJsx1k5zuWILLnG/vsH+Fv5D4NTVcp1Gzo8AAAAAElFTkSuQmCC&labelColor=white)](https://huggingface.co/spaces/opendatalab/MinerU)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/myhloli/a3cb16570ab3cfeadf9d8f0ac91b4fca/mineru_demo.ipynb)
[![arXiv](https://img.shields.io/badge/arXiv-2409.18839-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2409.18839)
[![arXiv](https://img.shields.io/badge/MinerU-Technical%20Report-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2409.18839)
[![arXiv](https://img.shields.io/badge/MinerU2.5-Technical%20Report-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2509.22186)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/opendatalab/MinerU)
@@ -43,6 +44,28 @@
</div>
# Changelog
- 2025/10/24 2.6.2 Release
- `pipeline` backend optimizations
- Added experimental support for Chinese formulas, which can be enabled via the environment variable `export MINERU_FORMULA_CH_SUPPORT=1`. This feature may slightly reduce MFR speed and cause some long formulas to fail to parse, so it is recommended only when Chinese formulas need to be parsed. To disable it, set the variable to `0`.
- `OCR` speed significantly improved by 200%~300%, thanks to the optimization solution provided by [@cjsdurj](https://github.com/cjsdurj)
- `OCR` models optimized for better accuracy and coverage of Latin-script recognition, and the Cyrillic, Arabic, Devanagari, Telugu (te), and Tamil (ta) models updated to the `ppocr-v5` version, with accuracy improved by over 40% compared to the previous generation
- `vlm` backend optimizations
- Optimized `table_caption` and `table_footnote` matching logic, improving matching accuracy and reading-order quality on pages with multiple consecutive tables
- Reduced CPU usage under high concurrency with the `vllm` backend, lowering server load
- Adapted to `vllm` version 0.11.0
- General optimizations
- Improved cross-page table merging, adding support for merging continuation tables across pages and improving results in multi-column merge scenarios
- Added the environment variable `MINERU_TABLE_MERGE_ENABLE` for the table merging feature; merging is enabled by default and can be disabled by setting the variable to `0`
- 2025/09/26 2.5.4 Released
- 🎉🎉 The MinerU2.5 [Technical Report](https://arxiv.org/abs/2509.22186) is now available; we welcome you to read it for a full overview of the model architecture, training strategy, data engineering, and evaluation results.
- Fixed an issue where some `pdf` files were misidentified as `ai` files and could not be parsed
- 2025/09/20 2.5.3 Released
- Adjusted dependency version ranges so that Turing and earlier architecture GPUs can use vLLM-accelerated inference for the MinerU2.5 model.
- Compatibility fixes in the `pipeline` backend for torch 2.8.0.
- Reduced the default concurrency of the vLLM async backend to lower server pressure and avoid connection closures caused by high load.
- More compatibility details can be found in the [announcement](https://github.com/opendatalab/MinerU/discussions/3547)
- 2025/09/19 2.5.2 Released
We officially release MinerU2.5, currently the strongest multimodal large model for document parsing. With only 1.2B parameters, MinerU2.5 surpasses top multimodal models such as Gemini2.5-Pro, GPT-4o, and Qwen2.5-VL-72B across the board on the OmniDocBench document parsing benchmark, and leads mainstream document-parsing-specific models (such as dots.ocr, MonkeyOCR, and PP-StructureV3) by a clear margin.
@@ -719,6 +742,16 @@ mineru -p <input_path> -o <output_path>
# Citation
```bibtex
@misc{niu2025mineru25decoupledvisionlanguagemodel,
title={MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing},
author={Junbo Niu and Zheng Liu and Zhuangcheng Gu and Bin Wang and Linke Ouyang and Zhiyuan Zhao and Tao Chu and Tianyao He and Fan Wu and Qintong Zhang and Zhenjiang Jin and Guang Liang and Rui Zhang and Wenzheng Zhang and Yuan Qu and Zhifei Ren and Yuefeng Sun and Yuanhong Zheng and Dongsheng Ma and Zirui Tang and Boyu Niu and Ziyang Miao and Hejun Dong and Siyi Qian and Junyuan Zhang and Jingzhou Chen and Fangdong Wang and Xiaomeng Zhao and Liqun Wei and Wei Li and Shasha Wang and Ruiliang Xu and Yuanyuan Cao and Lu Chen and Qianqian Wu and Huaiyu Gu and Lindong Lu and Keming Wang and Dechen Lin and Guanlin Shen and Xuanhe Zhou and Linfeng Zhang and Yuhang Zang and Xiaoyi Dong and Jiaqi Wang and Bo Zhang and Lei Bai and Pei Chu and Weijia Li and Jiang Wu and Lijun Wu and Zhenxiang Li and Guangyu Wang and Zhongying Tu and Chao Xu and Kai Chen and Yu Qiao and Bowen Zhou and Dahua Lin and Wentao Zhang and Conghui He},
year={2025},
eprint={2509.22186},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2509.22186},
}
@misc{wang2024mineruopensourcesolutionprecise,
title={MinerU: An Open-Source Solution for Precise Document Content Extraction},
author={Bin Wang and Chao Xu and Xiaomeng Zhao and Linke Ouyang and Fan Wu and Zhiyuan Zhao and Rui Xu and Kaiwen Liu and Yuan Qu and Fukai Shang and Bo Zhang and Liqun Wei and Zhihao Sui and Wei Li and Botian Shi and Yu Qiao and Dahua Lin and Conghui He},
@@ -757,4 +790,4 @@ mineru -p <input_path> -o <output_path>
- [OmniDocBench (A Comprehensive Benchmark for Document Parsing and Evaluation)](https://github.com/opendatalab/OmniDocBench)
- [Magic-HTML (Mixed web page extraction tool)](https://github.com/opendatalab/magic-html)
- [Magic-Doc (Fast speed ppt/pptx/doc/docx/pdf extraction tool)](https://github.com/InternLM/magic-doc)
- [Dingo: A Comprehensive AI Data Quality Evaluation Tool](https://github.com/MigoXLab/dingo)
- [Dingo: A Comprehensive AI Data Quality Evaluation Tool](https://github.com/MigoXLab/dingo)


@@ -1,9 +1,16 @@
# Use DaoCloud mirrored vllm image for China region
# Use DaoCloud mirrored vllm image for China region for gpu with Ampere architecture and above (Compute Capability>=8.0)
# Compute Capability version query (https://developer.nvidia.com/cuda-gpus)
FROM docker.m.daocloud.io/vllm/vllm-openai:v0.10.1.1
# Use the official vllm image
# FROM vllm/vllm-openai:v0.10.1.1
# Use DaoCloud mirrored vllm image for China region for gpu with Turing architecture and below (Compute Capability<8.0)
# FROM docker.m.daocloud.io/vllm/vllm-openai:v0.10.2
# Use the official vllm image
# FROM vllm/vllm-openai:v0.10.2
# Install libgl for opencv support & Noto fonts for Chinese characters
RUN apt-get update && \
apt-get install -y \


@@ -1,6 +1,10 @@
# Use the official vllm image
# Use the official vllm image for gpu with Ampere architecture and above (Compute Capability>=8.0)
# Compute Capability version query (https://developer.nvidia.com/cuda-gpus)
FROM vllm/vllm-openai:v0.10.1.1
# Use the official vllm image for gpu with Turing architecture and below (Compute Capability<8.0)
# FROM vllm/vllm-openai:v0.10.2
# Install libgl for opencv support & Noto fonts for Chinese characters
RUN apt-get update && \
apt-get install -y \


@@ -19,7 +19,8 @@
[![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAF8AAABYCAMAAACkl9t/AAAAk1BMVEVHcEz/nQv/nQv/nQr/nQv/nQr/nQv/nQv/nQr/wRf/txT/pg7/yRr/rBD/zRz/ngv/oAz/zhz/nwv/txT/ngv/0B3+zBz/nQv/0h7/wxn/vRb/thXkuiT/rxH/pxD/ogzcqyf/nQvTlSz/czCxky7/SjifdjT/Mj3+Mj3wMj15aTnDNz+DSD9RTUBsP0FRO0Q6O0WyIxEIAAAAGHRSTlMADB8zSWF3krDDw8TJ1NbX5efv8ff9/fxKDJ9uAAAGKklEQVR42u2Z63qjOAyGC4RwCOfB2JAGqrSb2WnTw/1f3UaWcSGYNKTdf/P+mOkTrE+yJBulvfvLT2A5ruenaVHyIks33npl/6C4s/ZLAM45SOi/1FtZPyFur1OYofBX3w7d54Bxm+E8db+nDr12ttmESZ4zludJEG5S7TO72YPlKZFyE+YCYUJTBZsMiNS5Sd7NlDmKM2Eg2JQg8awbglfqgbhArjxkS7dgp2RH6hc9AMLdZYUtZN5DJr4molC8BfKrEkPKEnEVjLbgW1fLy77ZVOJagoIcLIl+IxaQZGjiX597HopF5CkaXVMDO9Pyix3AFV3kw4lQLCbHuMovz8FallbcQIJ5Ta0vks9RnolbCK84BtjKRS5uA43hYoZcOBGIG2Epbv6CvFVQ8m8loh66WNySsnN7htL58LNp+NXT8/PhXiBXPMjLSxtwp8W9f/1AngRierBkA+kk/IpUSOeKByzn8y3kAAAfh//0oXgV4roHm/kz4E2z//zRc3/lgwBzbM2mJxQEa5pqgX7d1L0htrhx7LKxOZlKbwcAWyEOWqYSI8YPtgDQVjpB5nvaHaSnBaQSD6hweDi8PosxD6/PT09YY3xQA7LTCTKfYX+QHpA0GCcqmEHvr/cyfKQTEuwgbs2kPxJEB0iNjfJcCTPyocx+A0griHSmADiC91oNGVwJ69RudYe65vJmoqfpul0lrqXadW0jFKH5BKwAeCq+Den7s+3zfRJzA61/Uj/9H/VzLKTx9jFPPdXeeP+L7WEvDLAKAIoF8bPTKT0+TM7W8ePj3Rz/Yn3kOAp2f1Kf0Weony7pn/cPydvhQYV+eFOfmOu7VB/ViPe34/EN3RFHY/yRuT8ddCtMPH/McBAT5s+vRde/gf2c/sPsjLK+m5IBQF5tO+h2tTlBGnP6693JdsvofjOPnnEHkh2TnV/X1fBl9S5zrwuwF8NFrAVJVwCAPTe8gaJlomqlp0pv4Pjn98tJ/t/fL++6unpR1YGC2n/KCoa0tTLoKiEeUPDl94nj+5/Tv3/eT5vBQ60X1S0oZr+IWRR8Ldhu7AlLjPISlJcO9vrFotky9SpzDequlwEir5beYAc0R7D9KS1DXva0jhYRDXoExPdc6yw5GShkZXe9QdO/uOvHofxjrV/TNS6iMJS+4TcSTgk9n5agJdBQbB//IfF/HpvPt3Tbi7b6I6K0R72p6ajryEJrENW2bbeVUGjfgoals4L443c7BEE4mJO2SpbRngxQrAKRudRzGQ8jVOL2qDVjjI8K1gc3TIJ5KiFZ1q+gdsARPB4NQS4AjwVSt72DSoXNyOWUrU5mQ9nRYyjp89Xo7oRI6Bga9QNT1mQ/ptaJq5T/7WcgAZywR/XlPGAUDdet3LE+qS0TI+g+aJU8MIqjo0Kx8Ly+maxLjJmjQ18rA0YCkxLQbUZP1WqdmyQGJLUm7VnQFqodmXSqmRrdVpqdzk5LvmvgtEcW8PMGdaS23EOWyDVbACZzUJPaqMbjDxpA3Qrgl0AikimGDbqmyT8P8NOYiqrldF8rX+YN7TopX4UoHuSCYY7cgX4gHwclQKl1zhx0THf+tCAUValzj
I7Wg9EhptrkIcfIJjA94evOn8B2eHaVzvBrnl2ig0So6hvPaz0IGcOvTHvUIlE2+prqAxLSQxZlU2stql1NqCCLdIiIN/i1DBEHUoElM9dBravbiAnKqgpi4IBkw+utSPIoBijDXJipSVV7MpOEJUAc5Qmm3BnUN+w3hteEieYKfRZSIUcXKMVf0u5wD4EwsUNVvZOtUT7A2GkffHjByWpHqvRBYrTV72a6j8zZ6W0DTE86Hn04bmyWX3Ri9WH7ZU6Q7h+ZHo0nHUAcsQvVhXRDZHChwiyi/hnPuOsSEF6Exk3o6Y9DT1eZ+6cASXk2Y9k+6EOQMDGm6WBK10wOQJCBwren86cPPWUcRAnTVjGcU1LBgs9FURiX/e6479yZcLwCBmTxiawEwrOcleuu12t3tbLv/N4RLYIBhYexm7Fcn4OJcn0+zc+s8/VfPeddZHAGN6TT8eGczHdR/Gts1/MzDkThr23zqrVfAMFT33Nx1RJsx1k5zuWILLnG/vsH+Fv5D4NTVcp1Gzo8AAAAAElFTkSuQmCC&labelColor=white)](https://huggingface.co/spaces/opendatalab/MinerU)
[![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=data:image/svg+xml;base64,PHN2ZyB3aWR0aD0iMjIzIiBoZWlnaHQ9IjIwMCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KCiA8Zz4KICA8dGl0bGU+TGF5ZXIgMTwvdGl0bGU+CiAgPHBhdGggaWQ9InN2Z18xNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTAsODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5eiIvPgogIDxwYXRoIGlkPSJzdmdfMTUiIGZpbGw9IiM2MjRhZmYiIGQ9Im05OS4xNCwxMTUuNDlsMjUuNjUsMGwwLDI1LjY1bC0yNS42NSwwbDAsLTI1LjY1eiIvPgogIDxwYXRoIGlkPSJzdmdfMTYiIGZpbGw9IiM2MjRhZmYiIGQ9Im0xNzYuMDksMTQxLjE0bC0yNS42NDk5OSwwbDAsMjIuMTlsNDcuODQsMGwwLC00Ny44NGwtMjIuMTksMGwwLDI1LjY1eiIvPgogIDxwYXRoIGlkPSJzdmdfMTciIGZpbGw9IiMzNmNmZDEiIGQ9Im0xMjQuNzksODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5eiIvPgogIDxwYXRoIGlkPSJzdmdfMTgiIGZpbGw9IiMzNmNmZDEiIGQ9Im0wLDY0LjE5bDI1LjY1LDBsMCwyNS42NWwtMjUuNjUsMGwwLC0yNS42NXoiLz4KICA8cGF0aCBpZD0ic3ZnXzE5IiBmaWxsPSIjNjI0YWZmIiBkPSJtMTk4LjI4LDg5Ljg0bDI1LjY0OTk5LDBsMCwyNS42NDk5OWwtMjUuNjQ5OTksMGwwLC0yNS42NDk5OXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIwIiBmaWxsPSIjMzZjZmQxIiBkPSJtMTk4LjI4LDY0LjE5bDI1LjY0OTk5LDBsMCwyNS42NWwtMjUuNjQ5OTksMGwwLC0yNS42NXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIxIiBmaWxsPSIjNjI0YWZmIiBkPSJtMTUwLjQ0LDQybDAsMjIuMTlsMjUuNjQ5OTksMGwwLDI1LjY1bDIyLjE5LDBsMCwtNDcuODRsLTQ3Ljg0LDB6Ii8+CiAgPHBhdGggaWQ9InN2Z18yMiIgZmlsbD0iIzM2Y2ZkMSIgZD0ibTczLjQ5LDg5Ljg0bDI1LjY1LDBsMCwyNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIzIiBmaWxsPSIjNjI0YWZmIiBkPSJtNDcuODQsNjQuMTlsMjUuNjUsMGwwLC0yMi4xOWwtNDcuODQsMGwwLDQ3Ljg0bDIyLjE5LDBsMCwtMjUuNjV6Ii8+CiAgPHBhdGggaWQ9InN2Z18yNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTQ3Ljg0LDExNS40OWwtMjIuMTksMGwwLDQ3Ljg0bDQ3Ljg0LDBsMCwtMjIuMTlsLTI1LjY1LDBsMCwtMjUuNjV6Ii8+CiA8L2c+Cjwvc3ZnPg==&labelColor=white)](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/myhloli/a3cb16570ab3cfeadf9d8f0ac91b4fca/mineru_demo.ipynb)
[![arXiv](https://img.shields.io/badge/arXiv-2409.18839-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2409.18839)
[![arXiv](https://img.shields.io/badge/MinerU-Technical%20Report-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2409.18839)
[![arXiv](https://img.shields.io/badge/MinerU2.5-Technical%20Report-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2509.22186)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/opendatalab/MinerU)
<div align="center">


@@ -10,7 +10,8 @@ docker build -t mineru-vllm:latest -f Dockerfile .
```
> [!TIP]
> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/global/Dockerfile) uses `vllm/vllm-openai:v0.10.1.1` as the base image by default, supporting Turing/Ampere/Ada Lovelace/Hopper/Blackwell platforms.
> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/global/Dockerfile) uses `vllm/vllm-openai:v0.10.1.1` as the base image by default. This version's vLLM v1 engine has limited GPU model support.
> If you cannot use vLLM accelerated inference on Turing and earlier architecture GPUs, you can resolve this issue by changing the base image to `vllm/vllm-openai:v0.10.2`.
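The base-image swap described in the tip amounts to a one-line edit before building; a hedged sketch, shown on a local copy named `Dockerfile.demo` (a hypothetical file name, not part of the repository):

```shell
# Create a local copy with the default base image, then swap it for
# Turing-and-earlier (pre-Ampere) GPUs as the tip above suggests.
printf 'FROM vllm/vllm-openai:v0.10.1.1\n' > Dockerfile.demo
sed -i 's|vllm/vllm-openai:v0.10.1.1|vllm/vllm-openai:v0.10.2|' Dockerfile.demo
grep FROM Dockerfile.demo   # FROM vllm/vllm-openai:v0.10.2
# docker build -t mineru-vllm:latest -f Dockerfile.demo .
```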
## Docker Description


@@ -397,10 +397,10 @@ Text levels are distinguished through the `text_level` field:
{
"type": "image",
"img_path": "images/a8ecda1c69b27e4f79fce1589175a9d721cbdc1cf78b4cc06a015f3746f6b9d8.jpg",
"img_caption": [
"image_caption": [
"Fig. 1. Annual flow duration curves of daily flows from Pine Creek, Australia, 1989–2000. "
],
"img_footnote": [],
"image_footnote": [],
"bbox": [
62,
480,


@@ -87,6 +87,16 @@ Here are the environment variables and their descriptions:
* Used to enable formula parsing
* defaults to `true`, can be set to `false` through environment variables to disable formula parsing.
- `MINERU_TABLE_ENABLE`:
- `MINERU_FORMULA_CH_SUPPORT`:
* Used to enable Chinese formula parsing optimization (experimental feature)
* Default is `false`, can be set to `true` via environment variable to enable Chinese formula parsing optimization.
* Only effective for `pipeline` backend.
- `MINERU_TABLE_ENABLE`:
* Used to enable table parsing
* defaults to `true`, can be set to `false` through environment variables to disable table parsing.
* Default is `true`, can be set to `false` via environment variable to disable table parsing.
- `MINERU_TABLE_MERGE_ENABLE`:
* Used to enable table merging functionality
* Default is `true`, can be set to `false` via environment variable to disable table merging functionality.
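The variables above follow a common true/false env-flag pattern; a minimal sketch of how such a flag can be read (the `env_flag` helper is illustrative only, not MinerU's actual code):

```python
import os

def env_flag(name: str, default: bool) -> bool:
    # Treat "1"/"true"/"yes"/"on" (case-insensitive) as enabled, anything else as disabled.
    return os.getenv(name, str(default)).strip().lower() in ("1", "true", "yes", "on")

os.environ["MINERU_TABLE_MERGE_ENABLE"] = "0"
print(env_flag("MINERU_TABLE_MERGE_ENABLE", True))   # False: explicitly disabled
print(env_flag("MINERU_FORMULA_CH_SUPPORT", False))  # False: experimental, off unless enabled
```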


@@ -52,7 +52,7 @@ If you need to adjust parsing options through custom parameters, you can also ch
>[!TIP]
>
>- Access `http://127.0.0.1:7860` in your browser to use the Gradio WebUI.
>- Access `http://127.0.0.1:7860/?view=api` to use the Gradio API.
- Using `http-client/server` method:
```bash
# Start vllm server (requires vllm environment)


@@ -19,7 +19,8 @@
[![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=data:image/svg+xml;base64,PHN2ZyB3aWR0aD0iMjIzIiBoZWlnaHQ9IjIwMCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KCiA8Zz4KICA8dGl0bGU+TGF5ZXIgMTwvdGl0bGU+CiAgPHBhdGggaWQ9InN2Z18xNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTAsODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5eiIvPgogIDxwYXRoIGlkPSJzdmdfMTUiIGZpbGw9IiM2MjRhZmYiIGQ9Im05OS4xNCwxMTUuNDlsMjUuNjUsMGwwLDI1LjY1bC0yNS42NSwwbDAsLTI1LjY1eiIvPgogIDxwYXRoIGlkPSJzdmdfMTYiIGZpbGw9IiM2MjRhZmYiIGQ9Im0xNzYuMDksMTQxLjE0bC0yNS42NDk5OSwwbDAsMjIuMTlsNDcuODQsMGwwLC00Ny44NGwtMjIuMTksMGwwLDI1LjY1eiIvPgogIDxwYXRoIGlkPSJzdmdfMTciIGZpbGw9IiMzNmNmZDEiIGQ9Im0xMjQuNzksODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5eiIvPgogIDxwYXRoIGlkPSJzdmdfMTgiIGZpbGw9IiMzNmNmZDEiIGQ9Im0wLDY0LjE5bDI1LjY1LDBsMCwyNS42NWwtMjUuNjUsMGwwLC0yNS42NXoiLz4KICA8cGF0aCBpZD0ic3ZnXzE5IiBmaWxsPSIjNjI0YWZmIiBkPSJtMTk4LjI4LDg5Ljg0bDI1LjY0OTk5LDBsMCwyNS42NDk5OWwtMjUuNjQ5OTksMGwwLC0yNS42NDk5OXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIwIiBmaWxsPSIjMzZjZmQxIiBkPSJtMTk4LjI4LDY0LjE5bDI1LjY0OTk5LDBsMCwyNS42NWwtMjUuNjQ5OTksMGwwLC0yNS42NXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIxIiBmaWxsPSIjNjI0YWZmIiBkPSJtMTUwLjQ0LDQybDAsMjIuMTlsMjUuNjQ5OTksMGwwLDI1LjY1bDIyLjE5LDBsMCwtNDcuODRsLTQ3Ljg0LDB6Ii8+CiAgPHBhdGggaWQ9InN2Z18yMiIgZmlsbD0iIzM2Y2ZkMSIgZD0ibTczLjQ5LDg5Ljg0bDI1LjY1LDBsMCwyNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIzIiBmaWxsPSIjNjI0YWZmIiBkPSJtNDcuODQsNjQuMTlsMjUuNjUsMGwwLC0yMi4xOWwtNDcuODQsMGwwLDQ3Ljg0bDIyLjE5LDBsMCwtMjUuNjV6Ii8+CiAgPHBhdGggaWQ9InN2Z18yNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTQ3Ljg0LDExNS40OWwtMjIuMTksMGwwLDQ3Ljg0bDQ3Ljg0LDBsMCwtMjIuMTlsLTI1LjY1LDBsMCwtMjUuNjV6Ii8+CiA8L2c+Cjwvc3ZnPg==&labelColor=white)](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
[![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAF8AAABYCAMAAACkl9t/AAAAk1BMVEVHcEz/nQv/nQv/nQr/nQv/nQr/nQv/nQv/nQr/wRf/txT/pg7/yRr/rBD/zRz/ngv/oAz/zhz/nwv/txT/ngv/0B3+zBz/nQv/0h7/wxn/vRb/thXkuiT/rxH/pxD/ogzcqyf/nQvTlSz/czCxky7/SjifdjT/Mj3+Mj3wMj15aTnDNz+DSD9RTUBsP0FRO0Q6O0WyIxEIAAAAGHRSTlMADB8zSWF3krDDw8TJ1NbX5efv8ff9/fxKDJ9uAAAGKklEQVR42u2Z63qjOAyGC4RwCOfB2JAGqrSb2WnTw/1f3UaWcSGYNKTdf/P+mOkTrE+yJBulvfvLT2A5ruenaVHyIks33npl/6C4s/ZLAM45SOi/1FtZPyFur1OYofBX3w7d54Bxm+E8db+nDr12ttmESZ4zludJEG5S7TO72YPlKZFyE+YCYUJTBZsMiNS5Sd7NlDmKM2Eg2JQg8awbglfqgbhArjxkS7dgp2RH6hc9AMLdZYUtZN5DJr4molC8BfKrEkPKEnEVjLbgW1fLy77ZVOJagoIcLIl+IxaQZGjiX597HopF5CkaXVMDO9Pyix3AFV3kw4lQLCbHuMovz8FallbcQIJ5Ta0vks9RnolbCK84BtjKRS5uA43hYoZcOBGIG2Epbv6CvFVQ8m8loh66WNySsnN7htL58LNp+NXT8/PhXiBXPMjLSxtwp8W9f/1AngRierBkA+kk/IpUSOeKByzn8y3kAAAfh//0oXgV4roHm/kz4E2z//zRc3/lgwBzbM2mJxQEa5pqgX7d1L0htrhx7LKxOZlKbwcAWyEOWqYSI8YPtgDQVjpB5nvaHaSnBaQSD6hweDi8PosxD6/PT09YY3xQA7LTCTKfYX+QHpA0GCcqmEHvr/cyfKQTEuwgbs2kPxJEB0iNjfJcCTPyocx+A0griHSmADiC91oNGVwJ69RudYe65vJmoqfpul0lrqXadW0jFKH5BKwAeCq+Den7s+3zfRJzA61/Uj/9H/VzLKTx9jFPPdXeeP+L7WEvDLAKAIoF8bPTKT0+TM7W8ePj3Rz/Yn3kOAp2f1Kf0Weony7pn/cPydvhQYV+eFOfmOu7VB/ViPe34/EN3RFHY/yRuT8ddCtMPH/McBAT5s+vRde/gf2c/sPsjLK+m5IBQF5tO+h2tTlBGnP6693JdsvofjOPnnEHkh2TnV/X1fBl9S5zrwuwF8NFrAVJVwCAPTe8gaJlomqlp0pv4Pjn98tJ/t/fL++6unpR1YGC2n/KCoa0tTLoKiEeUPDl94nj+5/Tv3/eT5vBQ60X1S0oZr+IWRR8Ldhu7AlLjPISlJcO9vrFotky9SpzDequlwEir5beYAc0R7D9KS1DXva0jhYRDXoExPdc6yw5GShkZXe9QdO/uOvHofxjrV/TNS6iMJS+4TcSTgk9n5agJdBQbB//IfF/HpvPt3Tbi7b6I6K0R72p6ajryEJrENW2bbeVUGjfgoals4L443c7BEE4mJO2SpbRngxQrAKRudRzGQ8jVOL2qDVjjI8K1gc3TIJ5KiFZ1q+gdsARPB4NQS4AjwVSt72DSoXNyOWUrU5mQ9nRYyjp89Xo7oRI6Bga9QNT1mQ/ptaJq5T/7WcgAZywR/XlPGAUDdet3LE+qS0TI+g+aJU8MIqjo0Kx8Ly+maxLjJmjQ18rA0YCkxLQbUZP1WqdmyQGJLUm7VnQFqodmXSqmRrdVpqdzk5LvmvgtEcW8PMGdaS23EOWyDVbACZzUJPaqMbjDxpA3Qrgl0AikimGDbqmyT8P8NOYiqrldF8rX+YN7TopX4UoHuSCYY7cgX4gHwclQKl1zhx0THf+tCAUValzj
I7Wg9EhptrkIcfIJjA94evOn8B2eHaVzvBrnl2ig0So6hvPaz0IGcOvTHvUIlE2+prqAxLSQxZlU2stql1NqCCLdIiIN/i1DBEHUoElM9dBravbiAnKqgpi4IBkw+utSPIoBijDXJipSVV7MpOEJUAc5Qmm3BnUN+w3hteEieYKfRZSIUcXKMVf0u5wD4EwsUNVvZOtUT7A2GkffHjByWpHqvRBYrTV72a6j8zZ6W0DTE86Hn04bmyWX3Ri9WH7ZU6Q7h+ZHo0nHUAcsQvVhXRDZHChwiyi/hnPuOsSEF6Exk3o6Y9DT1eZ+6cASXk2Y9k+6EOQMDGm6WBK10wOQJCBwren86cPPWUcRAnTVjGcU1LBgs9FURiX/e6479yZcLwCBmTxiawEwrOcleuu12t3tbLv/N4RLYIBhYexm7Fcn4OJcn0+zc+s8/VfPeddZHAGN6TT8eGczHdR/Gts1/MzDkThr23zqrVfAMFT33Nx1RJsx1k5zuWILLnG/vsH+Fv5D4NTVcp1Gzo8AAAAAElFTkSuQmCC&labelColor=white)](https://huggingface.co/spaces/opendatalab/MinerU)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/myhloli/a3cb16570ab3cfeadf9d8f0ac91b4fca/mineru_demo.ipynb)
[![arXiv](https://img.shields.io/badge/arXiv-2409.18839-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2409.18839)
[![arXiv](https://img.shields.io/badge/MinerU-Technical%20Report-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2409.18839)
[![arXiv](https://img.shields.io/badge/MinerU2.5-Technical%20Report-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2509.22186)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/opendatalab/MinerU)
<div align="center">


@@ -10,7 +10,8 @@ docker build -t mineru-vllm:latest -f Dockerfile .
```
> [!TIP]
> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/china/Dockerfile) uses `vllm/vllm-openai:v0.10.1.1` as the base image by default, supporting Turing/Ampere/Ada Lovelace/Hopper/Blackwell platforms
> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/china/Dockerfile) uses `vllm/vllm-openai:v0.10.1.1` as the base image by default.
> This version's vLLM v1 engine has limited GPU model support; if you cannot use vLLM-accelerated inference on Turing and earlier architecture GPUs, you can resolve this by changing the base image to `vllm/vllm-openai:v0.10.2`.
## Docker Description


@@ -397,10 +397,10 @@ inference_result: list[PageInferenceResults] = []
{
"type": "image",
"img_path": "images/a8ecda1c69b27e4f79fce1589175a9d721cbdc1cf78b4cc06a015f3746f6b9d8.jpg",
"img_caption": [
"image_caption": [
"Fig. 1. Annual flow duration curves of daily flows from Pine Creek, Australia, 1989–2000. "
],
"img_footnote": [],
"image_footnote": [],
"bbox": [
62,
480,


@@ -0,0 +1,365 @@
## Triton-based optimization of the various backends on ROCm: the vLLM backend infers normally, and so does DocLayout-YOLO, used for the layout step of the pipeline backend
**If you already have a complete Python vLLM + MinerU environment, jump straight to step 5**
**For problems on other GPUs, profile first to locate the offending operator, then implement it in a Triton backend**
I tested this and the results are basically on par with the official MinerU site. Since not many people run AMD, I am simply sharing it here in the discussions.
### 1. Results
**Update: a 200-page Python programming PDF runs at about 1.99 it/s:**
Two Step Extraction: 100%|████████████████████████████████████████| 200/200 [01:40<00:00, 1.99it/s]
**Below are the earlier results on 14 academic papers**
On a 7900 XTX, `mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-vllm-engine true` runs at roughly **1.6-1.8 s/it** (not carefully benchmarked; I only tried two documents). The second approach, replacing the original dot products with matrix multiplication, brings this down further to about 1.3 s/it. After optimization the main cost is split roughly 25% each between hipBLAS (nothing more to gain there) and vLLM's own Triton backend, which only upstream can optimize.
DocLayout-YOLO's layout step goes from 1.6 it/s to 15 it/s. Note that the input PDF page sizes must be cached beforehand; with Triton, caching the sizes is unavoidable. This is mainly to keep the model's input/output interface intact with minimal code changes.
An example using the `-b vlm-vllm-engine` mode:
---
**The results below use the 5-D matrix multiplication in place of the original dot product**
2025-10-05 15:45:12.985 | INFO | mineru.backend.vlm.vlm_analyze:get_model:128 - get vllm-engine predictor cost: 18.45s
Adding requests: 100%|████████████████████████████████████████████████████████████████████████████████| 14/14 [00:01<00:00, 12.20it/s]
Processed prompts: 100%|█████████████████████| 14/14 [00:08<00:00, 1.56it/s, est. speed input: 2174.18 toks/s, output: 791.87 toks/s]
Adding requests: 100%|█████████████████████████████████████████████████████████████████████████████| 278/278 [00:00<00:00, 323.03it/s]
Processed prompts: 100%|██████████████████| 278/278 [00:07<00:00, 37.63it/s, est. speed input: 5264.66 toks/s, output: 2733.31 toks/s]
Testing `mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-vllm-engine true`:
2025-10-05 15:46:55.953 | WARNING | mineru.cli.common:convert_pdf_bytes_to_bytes_by_pypdfium2:54 - end_page_id is out of range, use pdf_docs length
Two Step Extraction: 100%|████████████████████████████████████████████████████████████████████████████| 14/14 [00:18<00:00, 1.30s/it]
---
### 2. Root cause
With the vLLM backend, AMD RDNA has a severe performance problem: one operator in vLLM's **qwen2_vl.py** has no corresponding ROCm kernel, so execution falls back to a terribly slow convolution path (a single pass took 12 s). In short, **the MIOpen library lacks an optimized kernel for the specific Conv3d (bfloat16) used by the model**.
The dilated convolution in DocLayout-YOLO's **g2l_crm.py** suffers from the same problem (even the professional CDNA MI210 does not escape it),
so both are fixed here together.
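The fallback is easier to reason about once you notice that a Conv3d whose stride equals its kernel size has no overlap between patches: each output value is just a dot product between one flattened input patch and one flattened filter. A minimal pure-Python illustration of that equivalence (no torch, purely to show the arithmetic):

```python
# A stride == kernel_size convolution reduces to per-patch dot products:
# flatten each patch and each filter, multiply elementwise, and sum.
def patchify_conv(patches, filters):
    """patches: list of flat patches; filters: list of flat filters.
    Returns out[p][f] = dot(patches[p], filters[f])."""
    return [
        [sum(a * b for a, b in zip(patch, filt)) for filt in filters]
        for patch in patches
    ]


patches = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
filters = [[0.5, 0.5, 0.5, 0.5]]
print(patchify_conv(patches, filters))  # [[5.0], [1.0]]
```

Stacking the flat patches as rows and the flat filters as columns turns this into a single matrix multiplication, which is the idea behind both fixes in step 5.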
---
### 3. Environment
System: Ubuntu 24.04.3 Kernel: Linux 6.14.0-33-generic ROCm version: 7.0.1
Python environment:
python 3.12
pytorch-triton-rocm 3.5.0+gitbbb06c03
torch 2.10.0.dev20251001+rocm7.0
torchvision 0.25.0.dev20251003+rocm7.0
vllm 0.11.0rc2.dev198+g736fbf4c8.rocm701
The exact versions don't matter; the procedure is the same.
---
### 4. Prerequisite installation
```
uv venv --python python3.12
source .venv/bin/activate
uv pip install --pre torch torchvision -i https://pypi.tuna.tsinghua.edu.cn/simple/ --extra-index-url https://download.pytorch.org/whl/nightly/rocm7.0
uv pip install pip
# use pip here instead of uv pip to avoid overwriting the locally installed PyTorch
pip install -U "mineru[core]" -i https://pypi.mirrors.ustc.edu.cn/simple/
```
For vLLM installation, see the official manual: [vLLM](https://docs.vllm.com.cn/en/latest/getting_started/installation/gpu.html#amd-rocm)
```
# manually install aiter, vllm, amd-smi, etc.; pick a location, clone, and cd into it
git clone --recursive https://github.com/ROCm/aiter.git
cd aiter
git submodule sync; git submodule update --init --recursive
python setup.py develop
cd ..
git clone https://github.com/vllm-project/vllm.git
cd vllm/
cp -r /opt/rocm/share/amd_smi ~/Pytorch/vllm/
pip install amd_smi/
pip install --upgrade numba \
scipy \
huggingface-hub[cli,hf_transfer] \
setuptools_scm
pip install -r requirements/rocm.txt
export PYTORCH_ROCM_ARCH="gfx1100"  # match your GPU architecture: rocminfo | grep gfx
python setup.py develop
```
---
### 5. Adding the key Triton operator to vLLM
#### Two fixes are given here. The first is the one described above, reaching about 1.5-1.8 s/it. The second hand-optimizes the operator into a matrix multiplication; it is definitely right for the 7900 XTX (about 1.3 s/it), and on other AMD GPUs it is still faster than option 1, though not necessarily optimal there: the hand-tuned parts may need adjustment.
**Note: uninstall the Triton-backend flash_attn with pip. After much trial and error it still errors out; it is problematic enough that simply not using it is the fix.**
```
# locate your vllm installation (the XXX path below)
pip show vllm
```
**Key changes**
In the file XXX/vllm/model_executor/models/qwen2_vl.py:
**1. Below line 33 of qwen2_vl.py, add `from .qwen2_vl_vision_kernels import triton_conv3d_patchify`:**
```
from collections.abc import Iterable, Mapping, Sequence
from functools import partial
from typing import Annotated, Any, Callable, Literal, Optional, Union
import torch
import torch.nn as nn
import torch.nn.functional as F
from .qwen2_vl_vision_kernels import triton_conv3d_patchify
```
**The rest is split into option 1 (2.1 and 3.1) and option 2 (2.2 and 3.2); pick one.**
---
**Option 1**
**2.1 At line 498 of qwen2_vl.py, `class Qwen2VisionPatchEmbed(nn.Module)`. (PS: this is the op AMD has no ready-made kernel for, causing the fallback.)**
```
class Qwen2VisionPatchEmbed(nn.Module):
    def __init__(
        self,
        patch_size: int = 14,
        temporal_patch_size: int = 2,
        in_channels: int = 3,
        embed_dim: int = 1152,
    ) -> None:
        super().__init__()
        self.patch_size = patch_size
        self.temporal_patch_size = temporal_patch_size
        self.embed_dim = embed_dim
        kernel_size = (temporal_patch_size, patch_size, patch_size)
        self.proj = nn.Conv3d(in_channels,
                              embed_dim,
                              kernel_size=kernel_size,
                              stride=kernel_size,
                              bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        L, C = x.shape
        x_reshaped = x.view(L, -1, self.temporal_patch_size, self.patch_size,
                            self.patch_size)
        # Call the custom Triton kernel instead of self.proj
        x_out = triton_conv3d_patchify(x_reshaped, self.proj.weight)
        # The kernel's output is already the correct shape [L, embed_dim]
        return x_out
```
**3.1 Under XXX/vllm/model_executor/models/, create qwen2_vl_vision_kernels.py with a Triton implementation:**
```
import torch
from vllm.triton_utils import tl, triton


@triton.jit
def _conv3d_patchify_kernel(
    # Pointers to tensors
    X, W, Y,
    # Tensor dimensions
    N, C_in, D_in, H_in, W_in,
    C_out, KD, KH, KW,
    # Strides for memory access
    stride_xn, stride_xc, stride_xd, stride_xh, stride_xw,
    stride_wn, stride_wc, stride_wd, stride_wh, stride_ww,
    stride_yn, stride_yc,
    # Triton-specific metaparameters
    BLOCK_SIZE: tl.constexpr,
):
    """
    Triton kernel for a non-overlapping 3D patching convolution.
    Each kernel instance computes one output value for one patch.
    """
    # Program IDs over the N (patch) and C_out (output channel) dimensions
    pid_n = tl.program_id(0)     # index of the patch being processed
    pid_cout = tl.program_id(1)  # index of the output channel being computed
    # --- Calculate memory pointers ---
    # Pointer to the start of the current input patch
    x_ptr = X + (pid_n * stride_xn)
    # Pointer to the start of the current filter (weight)
    w_ptr = W + (pid_cout * stride_wn)
    # Pointer to where the output will be stored
    y_ptr = Y + (pid_n * stride_yn + pid_cout * stride_yc)
    # --- Perform the convolution (element-wise product and sum) ---
    # This is a dot product between the flattened patch and the flattened filter.
    accumulator = tl.zeros((BLOCK_SIZE,), dtype=tl.float32)
    # Iterate over the elements of the patch/filter
    for c_offset in range(0, C_in):
        for d_offset in range(0, KD):
            for h_offset in range(0, KH):
                # Blocked loop over the innermost (width) dimension
                for w_offset in range(0, KW, BLOCK_SIZE):
                    # Masks handle KW not being a multiple of BLOCK_SIZE
                    w_range = w_offset + tl.arange(0, BLOCK_SIZE)
                    w_mask = w_range < KW
                    # Offsets for loading data
                    patch_offset = (c_offset * stride_xc + d_offset * stride_xd +
                                    h_offset * stride_xh + w_range * stride_xw)
                    filter_offset = (c_offset * stride_wc + d_offset * stride_wd +
                                     h_offset * stride_wh + w_range * stride_ww)
                    # Load patch and filter data, applying masks
                    patch_vals = tl.load(x_ptr + patch_offset, mask=w_mask, other=0.0)
                    filter_vals = tl.load(w_ptr + filter_offset, mask=w_mask, other=0.0)
                    # Multiply and accumulate
                    accumulator += patch_vals.to(tl.float32) * filter_vals.to(tl.float32)
    # Sum the accumulator block and store the single output value
    output_val = tl.sum(accumulator, axis=0)
    tl.store(y_ptr, output_val)


def triton_conv3d_patchify(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """
    Python wrapper for the 3D patching convolution Triton kernel.
    """
    # Tensor dimensions
    N, C_in, D_in, H_in, W_in = x.shape
    C_out, _, KD, KH, KW = weight.shape
    # Output tensor: this specific conv produces (N, C_out, 1, 1, 1), stored squeezed
    Y = torch.empty((N, C_out), dtype=x.dtype, device=x.device)
    # Launch grid: one kernel instance per (patch, output channel) pair
    grid = (N, C_out)
    # All strides are passed to keep the kernel layout-agnostic
    _conv3d_patchify_kernel[grid](
        x, weight, Y,
        N, C_in, D_in, H_in, W_in,
        C_out, KD, KH, KW,
        x.stride(0), x.stride(1), x.stride(2), x.stride(3), x.stride(4),
        weight.stride(0), weight.stride(1), weight.stride(2), weight.stride(3), weight.stride(4),
        Y.stride(0), Y.stride(1),
        BLOCK_SIZE=16,  # a reasonable default; can be tuned
    )
    return Y
```
---
**Option 2**
**2.2 At line 498 of qwen2_vl.py, the `class Qwen2VisionPatchEmbed(nn.Module)` class. (PS: the same missing-kernel fallback; here we go straight from the 5-D tensor to a matrix multiplication.)**
```
class Qwen2VisionPatchEmbed(nn.Module):
    def __init__(
        self,
        patch_size: int = 14,
        temporal_patch_size: int = 2,
        in_channels: int = 3,
        embed_dim: int = 1152,
    ) -> None:
        super().__init__()
        self.patch_size = patch_size
        self.temporal_patch_size = temporal_patch_size
        self.embed_dim = embed_dim
        kernel_size = (temporal_patch_size, patch_size, patch_size)
        self.proj = nn.Conv3d(in_channels,
                              embed_dim,
                              kernel_size=kernel_size,
                              stride=kernel_size,
                              bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        L, C = x.shape
        x_reshaped_5d = x.view(L, -1, self.temporal_patch_size, self.patch_size,
                               self.patch_size)
        return triton_conv3d_patchify(x_reshaped_5d, self.proj.weight)
```
**3.2 Under XXX/vllm/model_executor/models/, create qwen2_vl_vision_kernels.py with a Triton implementation:**
```
import torch
from vllm.triton_utils import tl, triton


@triton.jit
def _conv_gemm_kernel(
    A, B, C, M, N, K,
    stride_am, stride_ak,
    stride_bk, stride_bn,
    stride_cm, stride_cn,
    BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr,
):
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)
    a_ptrs = A + (offs_m[:, None] * stride_am + offs_k[None, :] * stride_ak)
    b_ptrs = B + (offs_k[:, None] * stride_bk + offs_n[None, :] * stride_bn)
    accumulator = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for k in range(0, K, BLOCK_K):
        a = tl.load(a_ptrs, mask=(offs_m[:, None] < M) & (offs_k[None, :] < K), other=0.0)
        b = tl.load(b_ptrs, mask=(offs_k[:, None] < K) & (offs_n[None, :] < N), other=0.0)
        accumulator += tl.dot(a, b)
        a_ptrs += BLOCK_K * stride_ak
        b_ptrs += BLOCK_K * stride_bk
        offs_k += BLOCK_K
    c = accumulator.to(C.dtype.element_ty)
    offs_cm = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_cn = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    c_ptrs = C + stride_cm * offs_cm[:, None] + stride_cn * offs_cn[None, :]
    c_mask = (offs_cm[:, None] < M) & (offs_cn[None, :] < N)
    tl.store(c_ptrs, c, mask=c_mask)


def triton_conv3d_patchify(x_5d: torch.Tensor, weight_5d: torch.Tensor) -> torch.Tensor:
    N_patches, _, _, _, _ = x_5d.shape
    C_out, _, _, _, _ = weight_5d.shape
    # Flatten: each patch becomes a row of A, each filter a column of B
    A = x_5d.view(N_patches, -1)
    B = weight_5d.view(C_out, -1).transpose(0, 1).contiguous()
    M, K = A.shape
    _K, N = B.shape
    assert K == _K
    C = torch.empty((M, N), device=A.device, dtype=A.dtype)
    # --- hand-tuned config for the 7900 XTX; other GPUs may need a different best combination (AMD's autotune has, in effect, no effect) ---
    best_config = {
        'BLOCK_M': 128,
        'BLOCK_N': 128,
        'BLOCK_K': 32,
    }
    num_stages = 4
    num_warps = 8
    grid = (triton.cdiv(M, best_config['BLOCK_M']),
            triton.cdiv(N, best_config['BLOCK_N']))
    _conv_gemm_kernel[grid](
        A, B, C,
        M, N, K,
        A.stride(0), A.stride(1),
        B.stride(0), B.stride(1),
        C.stride(0), C.stride(1),
        **best_config,
        num_stages=num_stages,
        num_warps=num_warps
    )
    return C
```
---
**4. After closing the terminal, the next mineru-gradio run raises a LoRA error; patch the code to skip it:**
```
pip show mineru_vl_utils
```
Open XXX/mineru_vl_utils/vlm_client/vllm_async_engine_client.py and change line 58, `self.tokenizer = vllm_async_llm.tokenizer.get_lora_tokenizer()`, to:
```
try:
    self.tokenizer = vllm_async_llm.tokenizer.get_lora_tokenizer()
except AttributeError:
    # if there is no get_lora_tokenizer method, fall back to the original tokenizer
    self.tokenizer = vllm_async_llm.tokenizer
```
**Finally, set two environment variables and enjoy:**
```
export MINERU_MODEL_SOURCE=modelscope
export TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1
```
---
### 6. With the vLLM backend fixed, what remains is the dilated-convolution issue in DocLayout-YOLO, used for the layout step of the pipeline backend
### I posted an answer under [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO/issues/120#issuecomment-3368144275), so the pipeline's dilated-convolution fix is not repeated here; follow the link for details.
Find your doclayout-yolo install location as follows, then edit the files described in the linked reply:
```
pip show doclayout-yolo
```


@@ -0,0 +1,64 @@
#### 1 System
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
Ascend 910B2
Driver 23.0.6.2
CANN 7.5.X
MinerU 2.1.9
#### 2 Pitfalls
Pitfall 1: **graphics-library problems. In short, a dynamic library makes TLS memory allocation fail: an OpenCV compatibility problem on the ARM64 architecture**
⭐ The error `ImportError: /lib/aarch64-linux-gnu/libGLdispatch.so.0: cannot allocate memory in static TLS block` is caused by OpenCV's compatibility problems on ARM64. The stack trace shows it occurs when importing the cv2 module, during initialization of MinerU's VLM backend.
Fixes:
1 Install an OpenCV build that avoids the memory problem
```
pip install --upgrade albumentations albucore simsimd
# Uninstall current opencv
pip uninstall opencv-python opencv-contrib-python
# Install headless version (no GUI dependencies)
pip install opencv-python-headless
python -c "import cv2; print(cv2.__version__)"
```
2 Install some packages with apt-get: switch to the Tsinghua mirror, rename the file to sources.list.tuna, and move it to the root directory:
```
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ focal main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ focal-updates main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ focal-backports main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ focal-security main restricted universe multiverse
sudo apt-get update -o Dir::Etc::sourcelist="sources.list.tuna" -o Dir::Etc::sourceparts="-" -o APT::Get::List-Cleanup="0"
sudo apt-get install libgl1-mesa-glx -o Dir::Etc::sourcelist="sources.list.tuna" -o Dir::Etc::sourceparts="-" -o APT::Get::List-Cleanup="0"
sudo apt-get install libglib2.0-0 libsm6 libxext6 libxrender-dev libgomp1 -o Dir::Etc::sourcelist="sources.list.tuna" -o Dir::Etc::sourceparts="-" -o APT::Get::List-Cleanup="0"
sudo apt-get install libgl1-mesa-dev libgles2-mesa-dev -o Dir::Etc::sourcelist="sources.list.tuna" -o Dir::Etc::sourceparts="-" -o APT::Get::List-Cleanup="0"
sudo apt-get install libgomp1 -o Dir::Etc::sourcelist="sources.list.tuna" -o Dir::Etc::sourceparts="-" -o APT::Get::List-Cleanup="0"
export OPENCV_IO_ENABLE_OPENEXR=0
export QT_QPA_PLATFORM=offscreen
```
↑ Not sure which of these actually help, or whether any do.
3 Force-override the dynamic libraries bundled in the conda environment (they conflict with the system ones)
```
# locate the libraries
find /usr/lib /lib /root/.local/conda -name "libgomp.so*" 2>/dev/null
export LD_PRELOAD="/usr/lib/aarch64-linux-gnu/libstdc++.so.6:/usr/lib/aarch64-linux-gnu/libgomp.so.1"
export LD_PRELOAD=/lib/aarch64-linux-gnu/libGLdispatch.so.0:$LD_PRELOAD
```
Additionally, you can move the conda environment's own copies out of the way:
```
mv $CONDA_PREFIX/lib/libstdc++.so.6 $CONDA_PREFIX/lib/libstdc++.so.6.bak
mv $CONDA_PREFIX/lib/libgomp.so.1 $CONDA_PREFIX/lib/libgomp.so.1.bak
mv $CONDA_PREFIX/lib/libGLdispatch.so.0 $CONDA_PREFIX/lib/libGLdispatch.so.0.bak  # if present
# simsimd-related
mv /root/.local/conda/envs/pdfparser/lib/python3.10/site-packages/simsimd./libgomp-947d5fa1.so.1.0.0 /root/.local/conda/envs/pdfparser/lib/python3.10/site-packages/simsimd./libgomp-947d5fa1.so.1.0.0.bak
```
Or:
downgrade simsimd to 3.7.2
downgrade albumentations to 1.3.1
For the scikit-learn package:
```
# locate scikit-learn's bundled libgomp
SKLEARN_LIBGOMP="/root/.local/conda/envs/pdfparser/lib/python3.10/site-packages/scikit_learn.libs/libgomp-947d5fa1.so.1.0.0"
# preload that specific libgomp build
export LD_PRELOAD="$SKLEARN_LIBGOMP:$LD_PRELOAD"
```
4 Miscellaneous
torch / torch_npu 2.5.1
pip install "numpy<2.0"  # numpy 2.0 is incompatible with Ascend
export MINERU_MODEL_SOURCE=modelscope


@@ -0,0 +1,117 @@
## Deploying and using MinerU on C500 + MACA
### Get the MACA image (includes torch-maca, maca, sglang-maca)
Image download address: https://developer.metax-tech.com/softnova/docker ,
choose maca-c500-pytorch:2.33.0.6-ubuntu22.04-amd64
When deploying the image with Docker, GPU device access must be enabled:
```bash
docker run --device=/dev/dri --device=/dev/mxcd....
```
#### Caveat
This image enables TORCH_ALLOW_TF32_CUBLAS_OVERRIDE by default, which makes the vlm-transformers backend produce wrong inference results:
```bash
unset TORCH_ALLOW_TF32_CUBLAS_OVERRIDE
```
### Install MinerU
Use --no-deps to drop the dependencies on CUDA-specific packages, then install the remaining dependencies with pip install -r requirements.txt:
```bash
pip install -U "mineru[core]" --no-deps
```
```tex
boto3>=1.28.43
click>=8.1.7
loguru>=0.7.2
numpy==1.26.4
pdfminer.six==20250506
tqdm>=4.67.1
requests
httpx
pillow>=11.0.0
pypdfium2>=4.30.0
pypdf>=5.6.0
reportlab
pdftext>=0.6.2
modelscope>=1.26.0
huggingface-hub>=0.32.4
json-repair>=0.46.2
opencv-python>=4.11.0.86
fast-langdetect>=0.2.3,<0.3.0
transformers>=4.51.1
accelerate>=1.5.1
pydantic
matplotlib>=3.10,<4
ultralytics>=8.3.48,<9
dill>=0.3.8,<1
rapid_table>=1.0.5,<2.0.0
PyYAML>=6.0.2,<7
ftfy>=6.3.1,<7
openai>=1.70.0,<2
shapely>=2.0.7,<3
pyclipper>=1.3.0,<2
omegaconf>=2.3.0,<3
transformers>=4.49.0,!=4.51.0,<5.0.0
fastapi
python-multipart
uvicorn
gradio>=5.34,<6
gradio-pdf>=0.0.22
albumentations
beautifulsoup4
scikit-image==0.25.0
outlines==0.1.11
magika>=0.6.2,<0.7.0
mineru-vl-utils>=0.1.6,<1
```
Save the above as requirements.txt and install it:
```bash
pip install -r requirements.txt
```
Install doclayout_yolo; since it would pull in torch-cuda, use --no-deps:
```bash
pip install doclayout-yolo --no-deps
```
### Online use
**Basic usage: mineru -p <input_path> -o <output_path> -b vlm-transformers**
- `<input_path>`: Local PDF/image file or directory
- `<output_path>`: Output directory
- `-b, --backend [pipeline|vlm-transformers|vlm-vllm-engine|vlm-http-client]` (default: pipeline)<br/>
For other commands and details, see the official documentation: [Quick Usage - MinerU](https://opendatalab.github.io/MinerU/usage/quick_usage/#quick-model-source-configuration)
### Offline use
**Using local models requires an environment variable and a config file.**<br/>
#### Download the models locally
Download via the interactive mineru CLI tool; after downloading, the mineru.json config file is updated automatically:
```bash
mineru-models-download
```
You can also find the required models (PDF-Extract-Kit-1.0 and MinerU2.5-2509-1.2B) on [HuggingFace](https://huggingface.co) or [ModelScope](https://www.modelscope.cn/home) and download them from there.
After downloading, create a mineru.json file and edit it as follows:
```json
{
"models-dir": {
"pipeline": "/path/pdf-extract-kit-1.0/",
"vlm": "/path/MinerU2.5-2509-1.2B"
},
"config_version": "1.3.0"
}
```
The paths are where the models are stored locally: `models-dir` holds the local model paths, `pipeline` is the model path used when the backend is pipeline, and `vlm` is the model path used when the backend name starts with vlm-.
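Before pointing MinerU at the file, a quick standard-library check of its structure can save a debugging round. This is only an illustrative sketch; the paths below are placeholders, as above:

```python
import json

# Load and sanity-check the mineru.json layout described above.
cfg = json.loads("""
{
    "models-dir": {
        "pipeline": "/path/pdf-extract-kit-1.0/",
        "vlm": "/path/MinerU2.5-2509-1.2B"
    },
    "config_version": "1.3.0"
}
""")
# Both backend families need their model directory configured.
assert {"pipeline", "vlm"} <= set(cfg["models-dir"])
print(cfg["config_version"])  # 1.3.0
```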
#### Set the environment variables
```bash
export MINERU_MODEL_SOURCE=local
export MINERU_TOOLS_CONFIG_JSON=/path/mineru.json  # path to the config file
```
Once this is done, MinerU can be used normally.<br/>


@@ -0,0 +1,73 @@
# Tecorigin TECO adaptation
## Quick start
The main inference workflow is:
1. Base environment installation: the basic environment checks and installation needed before inference.
2. Build the Docker environment: how to create the Docker environment needed for model inference from the Dockerfile.
3. Launch inference: how to start inference.
### 1 Base environment installation
Follow the [installation chapter of the Teco user manual](http://docs.tecorigin.com/release/torch_2.4/v2.2.0/#fc980a30f1125aa88bad4246ff0cedcc) to complete the basic environment checks and installation.
### 2 Build the Docker environment
#### 2.1 Download the Docker image package (pytorch-3.0.0-torch_sdaa3.0.0.tar) locally:
wget <image download link> (contact Tecorigin staff to obtain the link)
#### 2.2 Verify the package: check that the generated MD5 matches the official b2a7f60508c0d199a99b8b6b35da3954:
md5sum pytorch-3.0.0-torch_sdaa3.0.0.tar
#### 2.3 Import the Docker image:
docker load < pytorch-3.0.0-torch_sdaa3.0.0.tar
#### 2.4 Build a Docker container named MinerU:
docker run -itd --name="MinerU" --net=host --device=/dev/tcaicard0 --device=/dev/tcaicard1 --device=/dev/tcaicard2 --device=/dev/tcaicard3 --cap-add SYS_PTRACE --cap-add SYS_ADMIN --shm-size 64g jfrog.tecorigin.net/tecotp-docker/release/ubuntu22.04/x86_64/pytorch:3.0.0-torch_sdaa3.0.0 /bin/bash
#### 2.5 Enter the MinerU container:
docker exec -it MinerU bash
### 3 Install MinerU
- Preparation:
```
cd <MinerU>
pip install --upgrade pip
pip install uv
```
- The image already ships torch, and packages such as nvidia-nccl-cu12 and nvidia-cudnn-cu12 are not needed, so part of the dependency list must be commented out.
- Comment out every "doclayout_yolo==0.0.4" dependency in <MinerU>/pyproject.toml, and also comment out the packages starting with torch.
- Install MinerU:
```
uv pip install -e .[core]
```
- Download and install doclayout_yolo==0.0.4:
```
pip install doclayout_yolo==0.0.4 --no-deps
```
- Download and install the remaining packages (dependencies of doclayout_yolo==0.0.4):
```
pip install albumentations py-cpuinfo seaborn thop numpy==1.24.4
```
- Because some tensors are not laid out contiguously in memory, modify the following two files:
<ultralytics install path>/ultralytics/utils/tal.py (around line 330, change view --> reshape)
<doclayout_yolo install path>/doclayout_yolo/utils/tal.py (around line 375, change view --> reshape)
### 4 Run inference
- Enable the sdaa environment:
```
export TORCH_SDAA_AUTOLOAD=cuda_migrate
```
- Before the first inference run, set the following environment variable to download the model weights:
```
export HF_ENDPOINT=https://hf-mirror.com
```
- Run inference with:
```
mineru -p 'input path' -o 'output_path' --lang 'model_name'
```
where model_name is one of 'ch', 'ch_server', 'ch_lite', 'en', 'korean', 'japan', 'chinese_cht', 'ta', 'te', 'ka', 'latin', 'arabic', 'east_slavic', 'cyrillic', 'devanagari'
### 5 Software stack versions used for the adaptation
Adapted against software stack v3.0.0; contact Tecorigin staff to obtain it.


@@ -81,7 +81,16 @@ Some parameters of the MinerU command-line tools have environment-variable equivalents
- `MINERU_FORMULA_ENABLE`
* Enables formula parsing.
* Defaults to `true`; set the environment variable to `false` to disable formula parsing.
- `MINERU_FORMULA_CH_SUPPORT`
* Enables optimized parsing of Chinese formulas (experimental).
* Defaults to `false`; set the environment variable to `true` to enable the optimization.
* Only takes effect with the `pipeline` backend.
- `MINERU_TABLE_ENABLE`
* Enables table parsing.
* Defaults to `true`; set the environment variable to `false` to disable table parsing.
- `MINERU_TABLE_MERGE_ENABLE`
* Enables table merging.
* Defaults to `true`; set the environment variable to `false` to disable table merging.


@@ -3,11 +3,28 @@
This chapter provides the project's complete usage instructions. The following sections take you step by step from basic to advanced usage:
## Contents
- [Quick Usage](./quick_usage.md) - Getting started and basic usage
- [Model Source Configuration](./model_source.md) - Detailed configuration of model sources
- [CLI Tools](./cli_tools.md) - Detailed parameter reference for the command-line tools
- [Advanced CLI Parameters](./advanced_cli_parameters.md) - Advanced tuning parameters for the command-line tools
- Local deployment
* [Quick Usage](./quick_usage.md) - Getting started and basic usage
* [Model Source Configuration](./model_source.md) - Detailed configuration of model sources
* [CLI Tools](./cli_tools.md) - Detailed parameter reference for the command-line tools
* [Advanced CLI Parameters](./advanced_cli_parameters.md) - Advanced tuning parameters for the command-line tools
- Plugins and ecosystem
* [Cherry Studio](plugin/Cherry_Studio.md)
* [Sider](plugin/Sider.md)
* [Dify](plugin/Dify.md)
* [n8n](plugin/n8n.md)
* [Coze](plugin/Coze.md)
* [FastGPT](plugin/FastGPT.md)
* [ModelWhale](plugin/ModelWhale.md)
* [DingTalk](plugin/DingTalk.md)
* [DataFlow](plugin/DataFlow.md)
* [BISHENG](plugin/BISHENG.md)
* [RagFlow](plugin/RagFlow.md)
- Other accelerator adaptations (community-contributed)
* [Ascend](acceleration_cards/Ascend.md) [#3233](https://github.com/opendatalab/MinerU/discussions/3233)
* [METAX](acceleration_cards/METAX.md) [#3477](https://github.com/opendatalab/MinerU/pull/3477)
* [AMD](acceleration_cards/AMD.md) [#3662](https://github.com/opendatalab/MinerU/discussions/3662)
* [Tecorigin](acceleration_cards/Tecorigin.md) [#3767](https://github.com/opendatalab/MinerU/pull/3767)
## Getting started


@@ -0,0 +1,11 @@
# About BISHENG
BISHENG is an open-source LLM application development platform focused on enterprise scenarios; a large number of industry-leading organizations and Fortune 500 companies already use it. The name honors Bi Sheng, the inventor of movable-type printing, a technology that greatly advanced the transmission of human knowledge; the BISHENG team hopes the platform can likewise provide solid support for the broad adoption of intelligent applications.
![](../../../assets/Images/BISHENG_01.png)
- Official site: https://bisheng.dataelem.com/
- MinerU plugin project within BISHENG: https://github.com/dataelement/bisheng/pulls
Special thanks to [@pzc163](https://github.com/pzc163)


@@ -0,0 +1,238 @@
# About Cherry Studio
Cherry Studio is a powerful multi-model AI client that runs on Windows, macOS, and Linux. It integrates mainstream AI cloud services such as OpenAI, DeepSeek, Gemini, and Anthropic, also supports running local models, and lets users switch flexibly between AI models.
MinerU's powerful document-parsing capabilities are now deeply integrated into Cherry Studio's knowledge base and chat interactions, giving users a more convenient document-processing and information-retrieval experience.
![img](../../../assets/images/Cherry_Studio_1.png)
- Cherry Studio official site: https://www.cherry-ai.com/
# Using MinerU in Cherry Studio
## Open the Cherry Studio settings
a. Open the Cherry Studio application
b. Click the "Settings" button in the bottom-left corner to open the settings page
c. In the left-hand menu, select "MCP Servers"
The MCP server panel on the right lists the existing MCP servers. Click "Add Server" in the top-right corner to create a new MCP service, or click an existing entry to edit its configuration.
## Add the MinerU-MCP configuration
After clicking "Add Server" you will see a configuration form. Fill it in as follows:
**a. Name**: enter "MinerU-MCP" or any name you prefer
**b. Description**: optional, e.g. "Document-to-Markdown conversion tool"
**c. Type**: choose "Standard input/output (stdio)"
**d. Command**: enter uvx
**e. Arguments**: enter mineru-mcp
**f. Environment variables**: add the following
```Plain
MINERU_API_BASE=https://mineru.net
MINERU_API_KEY=您的API密钥
OUTPUT_DIR=./downloads
USE_LOCAL_API=false
LOCAL_MINERU_API_BASE=http://localhost:8888
```
The *`uvx`* command automatically handles installing and running mineru-mcp, so **there is no need to install the mineru-mcp package manually beforehand**. This is the simplest way to configure it.
## Save the configuration
Once everything looks right, click "Save" in the top-right corner. After saving, the newly added MinerU-MCP service appears in the MCP server list.
![img](../../../assets/images/Cherry_Studio_2.png)
![img](../../../assets/images/Cherry_Studio_3.png)
## Using MinerU MCP in Cherry Studio
Once configured, you can use the MinerU MCP tools in Cherry Studio conversations. Prompts like the following make the model call the MinerU MCP tools; the model recognizes the task and invokes the appropriate tool automatically.
## Example 1: Convert a document from a URL
**User input:**
```Plain
Please use MinerU MCP to convert the PDF at the following URL to Markdown: https://example.com/sample.pdf
```
**What the model does:**
The model recognizes a document-conversion task and calls the *`parse_documents`* tool with:
```Plain
{"file_sources": "https://example.com/sample.pdf"}
```
When the tool finishes, the model reports the conversion result.
![img](../../../assets/images/Cherry_Studio_4.png)
## Example 2: Convert a local document
**User input:**
```Plain
Please use MinerU-MCP to convert the local file D://sample.pdf to Markdown
```
**What the model does:**
The model recognizes a local document-conversion task and calls the `parse_documents` tool with:
```Plain
{"file_sources": "D://sample.pdf"}
```
![img](../../../assets/images/Cherry_Studio_5.png)
## Example 3: Enable OCR for scanned documents
**User input:**
```Plain
Please use MinerU-MCP to convert the scanned PDF at the following URL to Markdown, with OCR enabled:
https://example.com/scanned.pdf
```
**What the model does:**
The model recognizes a conversion task that needs OCR and calls the `parse_documents` tool with the OCR parameter enabled:
```Plain
{"file_sources": "https://example.com/scanned.pdf","enable_ocr": true}
```
![img](../../../assets/images/Cherry_Studio_6.png)
## Example 4: Full conversation flow
Here is an example of a complete conversation:
**User:**
```Plain
I have an academic paper PDF at https://arxiv.org/pdf/2303.08774.pdf. Can you convert it to Markdown for me?
```
**Model:**
```Plain
I can convert this academic paper to Markdown for you. I will use the MinerU-MCP tool for this task.
[model calls the parse_documents tool, processing...]
The paper has been converted to Markdown. The result is:
# The Capacity of Diffusion Models to Memorize and Generate Training Data
## Abstract
Recent diffusion models can generate high-quality images that are nearly indistinguishable from real ones...
[paper content follows...]
```
![img](../../../assets/images/Cherry_Studio_7.png)
## Tool parameter reference
During use, the model chooses the right tool and parameters based on your instructions. The main tools' parameters are described below:
### ● *parse_documents* parameters
![img](../../../assets/images/Cherry_Studio_8.png)
### ● *get_ocr_languages* parameters
Takes no parameters; returns the list of languages supported by OCR.
## Advanced usage
### Specifying language and page ranges
**User input:**
```Plain
Please use MinerU MCP to convert the document at the following URL to Markdown, processing only pages 5-10 and setting the language to Chinese: https://example.com/document.pdf
```
The model calls the *`parse_documents`* tool with the *`language`* parameter set to "ch" and the *`page_ranges`* parameter set to "5-10".
### Batch-processing multiple documents
**User input:**
```Plain
Please use MinerU-MCP to convert the documents at the following URLs to Markdown:
https://example.com/doc1.pdf
https://example.com/doc2.pdf
https://example.com/doc3.pdf
```
The model calls the *`parse_documents`* tool, passing the URLs to the *`file_sources`* parameter as a comma-separated list.
## Notes
● With *`USE_LOCAL_API=true`*, parsing uses the locally configured API
● With *`USE_LOCAL_API=false`*, parsing uses the MinerU official API
● Large documents can take a long time to process; please be patient
● If you hit timeouts, consider batching the documents or using local-API mode
## Common problems and solutions
### The MCP service won't start
**Problem**: running *`uv run -m mineru.cli`* fails with an error.
**Solutions**:
● Make sure the virtual environment is activated
● Check that all dependencies are installed
● Try the *`python -m mineru.cli`* command instead
### File conversion fails
**Problem**: the file uploads successfully but conversion fails.
**Solutions**:
● Check that the file format is supported
● Confirm the API key is correct
● Check the MCP service logs for detailed error messages
### File path problems
**Problem**: the `parse_documents` tool reports a file-not-found error for local files.
**Solution**: use an absolute path, or a relative path that is correct relative to the server's working directory.
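The distinction matters because paths are resolved by the MCP server process, not by the chat client. A small illustrative helper (the server directory here is a made-up placeholder, not part of mineru-mcp):

```python
import os


def resolve_source(path, server_cwd="/srv/mineru-mcp"):
    """Absolute paths are used as-is; relative ones are resolved
    against the MCP server's working directory."""
    if os.path.isabs(path):
        return path
    return os.path.normpath(os.path.join(server_cwd, path))


print(resolve_source("/data/sample.pdf"))  # /data/sample.pdf
print(resolve_source("docs/sample.pdf"))   # /srv/mineru-mcp/docs/sample.pdf
```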
### MCP request timeouts
**Problem**: calling the *`parse_documents`* tool fails with *`Error calling tool 'parse_documents': MCP error -32001: Request timed out`*.
**Solution**: this is common when processing large documents or on an unstable network. In some MCP clients (such as Cursor), a timeout can leave the MCP service uncallable until the client is restarted; recent Cursor versions may show an MCP call in progress that never actually succeeds. Suggestions:
**● Wait for an official fix**: this is a known issue in the Cursor client; wait for Cursor to fix it
**● Process small files**: stick to a few small files and avoid large documents that cause timeouts
**● Batch the work**: split multiple files across several requests, one or two files each
● Increase the timeout setting (if the client supports it)
● If the service can no longer be called after a timeout, restart the MCP client
● If timeouts keep recurring, check your network connection or consider local-API mode


@@ -0,0 +1,92 @@
# About Coze
Coze (Chinese name: 扣子) is ByteDance's zero-code AI application development platform. Regardless of programming experience, users can quickly build all kinds of chatbots, agents, AI applications, and plugins on the platform and deploy them to social platforms and instant-messaging apps.
The MinerU plugin is now available in the Coze plugin store. With its powerful document parsing, it equips users' agents and workflows with document-parsing capabilities and speeds up the development of AI applications.
![img](../../../assets/images/coze_0.png)
- Coze official site: https://www.coze.cn/
- MinerU Coze plugin: https://www.coze.cn/store/plugin/7527957359730360354
# Using MinerU in Coze
## **Integrating with Coze**
- Open the Coze developer platform: https://www.coze.cn/home
## Agents
### Workspace -> Project development -> Create -> Create agent -> Create -> enter a project name
![img](../../../assets/images/Coze_1.png)
![img](../../../assets/images/Coze_2.png)
### Plugin configuration -> Add `Plugin` -> search for `MinerU`
![img](../../../assets/images/Coze_3.png)
### Add the `parse_file` tool (online version)
![img](../../../assets/images/Coze_4.png)
### Select the `MinerU` plugin -> Edit parameters -> fill in the API key
![img](../../../assets/images/Coze_5.png)
![img](../../../assets/images/Coze_6.png)
> Remember to turn off display of the url and token
### Debug the `agent`
![img](../../../assets/images/Coze_7.png)
## Workflows
> Using MinerU through a workflow
### Workflows -> Create workflow
![img](../../../assets/images/Coze_8.png)
![img](../../../assets/images/Coze_9.png)
### Workflow plugin configuration -> Add `Plugin` -> search for `MinerU` -> Add
![img](../../../assets/images/Coze_10.png)
![img](../../../assets/images/Coze_11.png)
### Select the `MinerU` plugin -> Edit parameters -> fill in the API key
![img](../../../assets/images/Coze_12.png)
### Select the start node -> set the `input` type to file -> connect it to the `mineru` node
![img](../../../assets/images/Coze_13.png)
![img](../../../assets/images/Coze_14.png)
### Select the end node -> connect it to the `mineru` node -> set `output` to the `mineru` node's `parse_file.text`
![img](../../../assets/images/Coze_15.png)
![img](../../../assets/images/Coze_16.png)
### Upload a file -> trial run
![img](../../../assets/images/Coze_17.png)
![img](../../../assets/images/Coze_18.png)
### Publish -> add to the current agent
![img](../../../assets/images/Coze_19.png)
![img](../../../assets/images/Coze_20.png)
### Remove the `mineru` plugin -> debug
![img](../../../assets/images/Coze_21.png)

Some files were not shown because too many files have changed in this diff.