Compare commits


293 Commits

Author SHA1 Message Date
Xiaomeng Zhao
a0da3029fd Update mineru/model/utils/pytorchocr/modeling/backbones/rec_lcnetv3.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-24 15:54:12 +08:00
Xiaomeng Zhao
30fe325428 Update mineru/model/utils/tools/infer/predict_rec.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-24 15:53:55 +08:00
Xiaomeng Zhao
6131013ce9 Merge pull request #3822 from opendatalab/dev
Dev
2025-10-24 15:46:40 +08:00
Xiaomeng Zhao
f1c145054a Merge pull request #3821 from myhloli/dev
Dev
2025-10-24 15:46:09 +08:00
myhloli
078aaaf150 fix: remove unnecessary parameters from kwargs in vlm_analyze.py initialization 2025-10-24 15:39:44 +08:00
myhloli
c3a55fffab fix: add utility functions for GPU memory utilization and batch size configuration 2025-10-24 15:29:23 +08:00
Xiaomeng Zhao
4eddf28c8f Merge pull request #3820 from opendatalab/dev
Dev
2025-10-24 14:59:35 +08:00
Xiaomeng Zhao
dd92c5b723 Merge pull request #3819 from myhloli/dev
update docs
2025-10-24 14:59:03 +08:00
myhloli
b5922086cb fix: add environment variable configurations for Chinese formula parsing and table merging features 2025-10-24 14:53:00 +08:00
myhloli
df12e4fc79 fix: update README and utils for table merge feature and environment variable configuration 2025-10-24 11:37:14 +08:00
myhloli
90ed311198 fix: refactor table merging logic and add cross-page table merge utility 2025-10-24 10:52:05 +08:00
myhloli
c922c63fbc fix: correct formatting in kernel initialization in rec_lcnetv3.py 2025-10-24 10:22:10 +08:00
myhloli
28b278508f fix: add error handling for PDF conversion in common.py 2025-10-24 10:19:50 +08:00
Xiaomeng Zhao
6b54f321b4 Merge pull request #3814 from myhloli/dev
Dev
2025-10-23 18:00:51 +08:00
myhloli
e47ec7cd10 fix: refactor language lists for improved readability and maintainability in gradio_app.py and pytorch_paddle.py 2025-10-23 17:51:26 +08:00
myhloli
701f6018f2 fix: add logging for improved traceability in prediction logic of predict_formula.py 2025-10-23 17:26:16 +08:00
myhloli
5ade203e31 fix: remove commented-out code for autocasting in prediction logic of predict_formula.py 2025-10-23 17:12:00 +08:00
Xiaomeng Zhao
6e83f37754 Merge branch 'opendatalab:dev' into dev 2025-10-23 17:09:20 +08:00
Xiaomeng Zhao
972161a991 Merge pull request #3812 from Sidney233/dev
feat: add PPv5 arabic cyrillic devanagari ta te
2025-10-23 17:08:52 +08:00
Sidney233
700e11d342 feat: add PPv5 arabic cyrillic devanagari ta te 2025-10-23 16:49:01 +08:00
myhloli
fd79885b23 fix: remove commented-out code for autocasting in prediction logic of predict_formula.py 2025-10-23 16:03:34 +08:00
myhloli
a0810b5b6e fix: add debug logging for LaTeX text processing in processors.py 2025-10-23 02:30:47 +08:00
myhloli
39271b45de fix: adjust batch size calculation in prediction logic of predict_formula.py 2025-10-23 02:15:14 +08:00
Xiaomeng Zhao
db68aaf4ac Merge pull request #3806 from myhloli/dev
fix: update Gradio API access instructions in quick_usage.md
2025-10-22 22:51:37 +08:00
myhloli
a6cc8fa90d fix: update Gradio API access instructions in quick_usage.md 2025-10-22 22:50:36 +08:00
Xiaomeng Zhao
47f34f4ce8 Merge pull request #3805 from myhloli/dev
fix: handle empty input in prediction logic of predict_formula.py
2025-10-22 22:21:38 +08:00
myhloli
b7a8347f45 fix: handle empty input in prediction logic of predict_formula.py 2025-10-22 22:20:06 +08:00
Xiaomeng Zhao
c6d241f4f4 Merge pull request #3804 from myhloli/dev
fix: update model paths in models_download.py to include pp_formulanet_plus_m
2025-10-22 20:47:26 +08:00
myhloli
06b2fda1c1 fix: update model paths in models_download.py to include pp_formulanet_plus_m 2025-10-22 20:46:15 +08:00
Xiaomeng Zhao
5c1ca9271e Merge pull request #3803 from myhloli/dev
Dev
2025-10-22 20:33:42 +08:00
Xiaomeng Zhao
e7485c5d79 Update mineru/model/mfr/pp_formulanet_plus_m/predict_formula.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-22 20:32:36 +08:00
Xiaomeng Zhao
80436a89f9 Update mineru/model/utils/pytorchocr/modeling/heads/rec_ppformulanet_head.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-22 20:32:06 +08:00
Xiaomeng Zhao
b36793cef0 Update mineru/model/mfr/utils.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-22 20:31:50 +08:00
myhloli
43b51e78fc fix: add environment variable handling for table merging in JSON processing 2025-10-22 20:19:59 +08:00
myhloli
9688f73046 fix: update package path for PaddleOCR utilities in pyproject.toml 2025-10-22 20:08:52 +08:00
myhloli
c02edd9cba fix: correct docstring for remove_up_commands function in utils.py 2025-10-22 20:07:11 +08:00
myhloli
b4d08e994c feat: implement LaTeX formatting utilities and refactor processing logic 2025-10-22 20:02:59 +08:00
myhloli
a220b8a208 refactor: enhance title hierarchy logic and update model configuration 2025-10-22 15:57:07 +08:00
myhloli
ab480a7a86 fix: update progress bar description in formula prediction 2025-10-22 15:51:56 +08:00
myhloli
f57a6d8d9e refactor: remove commented-out device assignment in predict_formula.py 2025-10-21 18:45:21 +08:00
myhloli
915ba87f7d feat: adjust batch size calculation and enhance device management in model heads 2025-10-21 18:21:25 +08:00
myhloli
42a95e8e20 refactor: improve variable naming and streamline input processing in predict_formula.py 2025-10-21 14:57:57 +08:00
Xiaomeng Zhao
a513357607 Merge pull request #3779 from myhloli/dev
mfr add paddle
2025-10-20 19:14:46 +08:00
Xiaomeng Zhao
c8ccf4cf20 Update mineru/model/utils/pytorchocr/modeling/heads/rec_ppformulanet_head.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-20 19:14:16 +08:00
Xiaomeng Zhao
33d43a5afc Update mineru/model/utils/pytorchocr/modeling/heads/rec_ppformulanet_head.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-20 19:14:05 +08:00
Xiaomeng Zhao
3b057c7996 Merge pull request #19 from myhloli/mfr-add-paddle
Mfr add paddle
2025-10-20 18:59:48 +08:00
myhloli
34547262a2 refactor: remove unused Formula constant from model_list.py 2025-10-20 18:57:35 +08:00
myhloli
cd0ed982c0 fix: revert MFR_MODEL to unimernet_small in model initialization 2025-10-20 18:55:30 +08:00
myhloli
52dcbcbfa5 Bump mineru-vl-utils version to 0.1.14 2025-10-20 15:03:39 +08:00
myhloli
0758de6d24 Update vllm version and increase default GPU memory utilization 2025-10-20 11:45:58 +08:00
Xiaomeng Zhao
ae7892a6f9 Merge pull request #3770 from myhloli/dev
Update acceleration card links to include discussion and pull request references
2025-10-17 19:01:33 +08:00
myhloli
73567ccedc Update acceleration card links to include discussion and pull request references 2025-10-17 19:00:15 +08:00
Xiaomeng Zhao
bb552282f3 Merge pull request #3769 from myhloli/dev
Add support for domestic acceleration cards in documentation
2025-10-17 18:54:34 +08:00
myhloli
14c38101f7 Add support for domestic acceleration cards in documentation 2025-10-17 18:53:31 +08:00
Xiaomeng Zhao
cb3a30e9ad Merge pull request #3768 from myhloli/dev
Add support for domestic acceleration cards in documentation
2025-10-17 18:41:31 +08:00
myhloli
f4db41d0cb Add support for domestic acceleration cards in documentation 2025-10-17 18:40:40 +08:00
Xiaomeng Zhao
dad59f7d52 Merge pull request #3760 from magicyuan876/master
feat(tianshu): v2.0 architecture upgrade - worker active-pull mode
2025-10-17 18:31:38 +08:00
myhloli
499e877165 refactor: rename files and update import paths for consistency 2025-10-17 18:09:19 +08:00
myhloli
2d249666ba feat: integrate PP-FormulaNet_plus-M architecture and update model initialization 2025-10-17 17:00:22 +08:00
Magic_yuan
cedc62a728 Improve markitdown dependencies 2025-10-17 16:17:03 +08:00
Xiaomeng Zhao
1e40bac24f Merge pull request #3761 from Sidney233/dev
feat: add PPFormula
2025-10-17 14:40:10 +08:00
Sidney233
23701d0db4 feat: add PPFormula 2025-10-17 14:02:26 +08:00
Magic_yuan
e7d8bf097a Address code review suggestions 2025-10-17 13:04:49 +08:00
Magic_yuan
08a89aeca1 feat(tianshu): v2.0 architecture upgrade - worker active-pull mode
Key improvements:
- Workers actively pull tasks; response latency improved 10-20x (5-10s → 0.5s)
- Stronger database concurrency safety, using atomic operations to prevent duplicate task pickup
- Scheduler is now an optional monitoring component, disabled by default
- Fixed multi-GPU memory usage issue by fully isolating worker processes

New features:
- API automatically returns parsed content
- Automatic cleanup of result files (configurable)
- Support for uploading images to MinIO
2025-10-17 11:46:42 +08:00
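The worker active-pull mode with atomic, duplicate-free task claiming described in this commit can be sketched roughly as follows. This is a minimal sketch assuming a simple SQLite queue; the table and column names here are hypothetical, not MinerU Tianshu's actual task_db.py schema:

```python
import sqlite3
import time

def init_db(path=":memory:"):
    # Hypothetical task-queue schema for illustration only.
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tasks (
               id INTEGER PRIMARY KEY,
               status TEXT NOT NULL DEFAULT 'pending',
               priority INTEGER NOT NULL DEFAULT 0,
               payload TEXT
           )"""
    )
    conn.commit()
    return conn

def claim_next_task(conn):
    """Atomically claim one pending task via compare-and-swap, so several
    workers polling the same queue never pick up the same task twice."""
    while True:
        row = conn.execute(
            "SELECT id, payload FROM tasks WHERE status = 'pending' "
            "ORDER BY priority DESC, id LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # queue is empty
        task_id, payload = row
        with conn:  # one transaction for the status flip
            cur = conn.execute(
                "UPDATE tasks SET status = 'processing' "
                "WHERE id = ? AND status = 'pending'",
                (task_id,),
            )
        if cur.rowcount == 1:  # we won the race; otherwise another worker did
            return task_id, payload

def worker_poll_once(conn, poll_interval=0.5):
    """One iteration of a worker's pull loop: claim a task or sleep briefly."""
    task = claim_next_task(conn)
    if task is None:
        time.sleep(poll_interval)  # a ~0.5s poll matches the latency quoted above
    return task
```

The atomicity comes from the guarded `UPDATE ... WHERE status = 'pending'`: if two workers race for the same row, exactly one sees `rowcount == 1` and the other retries on the next candidate.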
Xiaomeng Zhao
1b724f3336 Merge pull request #3756 from myhloli/dev
Set OMP_NUM_THREADS environment variable to 1 for vllm backend initialization
2025-10-16 19:06:45 +08:00
myhloli
ea4271ab37 Set OMP_NUM_THREADS environment variable to 1 for vllm backend initialization 2025-10-16 18:26:06 +08:00
Xiaomeng Zhao
d83b83a5ad Merge pull request #3755 from myhloli/dev
Dev
2025-10-16 17:46:44 +08:00
myhloli
0853b84e87 Update README files to use external image link for MinerU logo 2025-10-16 17:45:42 +08:00
myhloli
36225160a3 Update arXiv badge to reflect MinerU technical report and add badge for MinerU2.5 2025-10-16 17:41:41 +08:00
myhloli
a36118f8ba Add mineru_tianshu project to README files for version 2.0 compatibility 2025-10-16 17:38:57 +08:00
myhloli
a38384e7fb Update mineru-vl-utils dependency version to allow upgrades to 0.1.13 2025-10-16 17:36:45 +08:00
Xiaomeng Zhao
4b7c2bbcc0 Merge pull request #3754 from myhloli/dev
Refactor table merging logic to enhance colspan adjustments and improve caption handling
2025-10-16 17:35:28 +08:00
Xiaomeng Zhao
504fe6ada3 Merge pull request #3742 from magicyuan876/master
feat: MinerU Tianshu project - an out-of-the-box multi-GPU document parsing service
2025-10-16 17:33:54 +08:00
myhloli
39be54023b Refactor table merging logic to enhance colspan adjustments and improve caption handling 2025-10-16 17:31:57 +08:00
Magic_yuan
484ff5a6f9 Fix code review issues 2025-10-16 16:04:42 +08:00
myhloli
59a7a577b3 Add backend name dropdown and update version constraints in bug report template 2025-10-16 14:55:48 +08:00
Xiaomeng Zhao
0e73ef9615 Merge pull request #3750 from myhloli/dev
Update openai dependency version to allow upgrades to version 3
2025-10-16 14:43:57 +08:00
myhloli
d580d6c7f8 Update openai dependency version to allow upgrades to version 3 2025-10-16 14:43:05 +08:00
Xiaomeng Zhao
4c8bb038ce Merge pull request #3748 from myhloli/dev
Enhance table merging logic to adjust colspan attributes based on row structures
2025-10-16 14:24:14 +08:00
myhloli
a89715b9a2 Refactor table merging logic to improve caption handling and prevent merging with non-continuation captions 2025-10-16 14:11:15 +08:00
myhloli
f05ea7c2e6 Simplify model output path handling by removing conditional checks for backend type 2025-10-16 14:09:30 +08:00
Xiaomeng Zhao
b68db3ab90 Merge pull request #3740 from yongtenglei/master
docs: Fix outdated sample data for output reference
2025-10-16 10:43:22 +08:00
yongtenglei
3539cfba36 docs: Fix sample data for output reference 2025-10-16 10:33:13 +08:00
Magic_yuan
3bf50d5267 feat: MinerU Tianshu project - an out-of-the-box multi-GPU document parsing service
Project overview:
Tianshu is a document parsing service built on MinerU, using a SQLite task queue +
LitServe GPU load-balancing architecture. It supports asynchronous processing, task persistence, and intelligent parsing of multiple document formats.

Core features:
- Asynchronous task processing: clients get an immediate response while tasks run in the background
- Intelligent parser routing: PDFs/images use MinerU (GPU-accelerated), Office/text files use MarkItDown
- GPU load balancing: automatic multi-GPU scheduling built on LitServe
- Task persistence: tasks stored in SQLite survive service restarts
- Priority queue: supports per-task priority settings
- RESTful API: complete task management interface
- MinIO integration: supports uploading images to object storage

Project architecture:
- api_server.py: FastAPI web server providing the RESTful API
- task_db.py: SQLite task database manager
- litserve_worker.py: LitServe worker pool with GPU load balancing
- task_scheduler.py: asynchronous task scheduler
- start_all.py: unified startup script
- client_example.py: Python client example

Tech stack:
FastAPI, LitServe, SQLite, MinerU, MarkItDown, MinIO, Loguru
2025-10-16 08:41:51 +08:00
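The asynchronous submit-then-poll flow at the heart of this design (the client gets a task id immediately, a background worker processes the document, and the client polls for the result) can be sketched with the standard library alone. This is a minimal in-memory sketch, not the actual api_server.py or litserve_worker.py code; the function names and task fields are hypothetical:

```python
import asyncio
import uuid

# In-memory stand-in for the SQLite-backed task store; names are illustrative.
TASKS = {}

async def submit(payload):
    """Client-facing submit: register the task and return its id immediately."""
    task_id = uuid.uuid4().hex
    TASKS[task_id] = {"status": "pending", "payload": payload, "result": None}
    return task_id

async def run_pending_tasks():
    """Background worker pass: process every pending task."""
    for task in TASKS.values():
        if task["status"] == "pending":
            task["status"] = "processing"
            await asyncio.sleep(0)  # stand-in for GPU-bound document parsing
            task["result"] = f"parsed:{task['payload']}"
            task["status"] = "done"

async def poll(task_id):
    """Client-facing poll: report current status and result (if any)."""
    task = TASKS[task_id]
    return task["status"], task["result"]

async def main():
    task_id = await submit("doc.pdf")
    status_right_after_submit, _ = await poll(task_id)  # still pending
    await run_pending_tasks()
    return status_right_after_submit, await poll(task_id)

before, after = asyncio.run(main())
```

In the real service the submit and poll endpoints would be HTTP routes (FastAPI, per the architecture above) and the store would be SQLite, so tasks survive restarts; the control flow is the same.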
myhloli
2108019698 Enhance table merging logic to adjust colspan attributes based on row structures 2025-10-15 19:05:28 +08:00
Xiaomeng Zhao
17a9921ba9 Merge pull request #3737 from myhloli/dev
Refactor block processing to handle non-contiguous indices in captions and footnotes
2025-10-15 17:06:22 +08:00
myhloli
3baee1d077 Refactor block processing to handle non-contiguous indices in captions and footnotes 2025-10-15 17:04:29 +08:00
myhloli
e1ee728e31 Sort blocks by index and clean up unprocessed blocks handling 2025-10-15 16:06:03 +08:00
Xiaomeng Zhao
1b45e6e1bc Merge pull request #3723 from myhloli/dev
Rename plugin documentation files for consistency and update index links
2025-10-14 19:00:38 +08:00
myhloli
966aadd1d3 Rename plugin documentation files for consistency and update index links 2025-10-14 18:58:24 +08:00
Xiaomeng Zhao
ecb8e3f0ac Merge pull request #3722 from myhloli/dev
Add documentation for Cherry Studio, Sider, Dify, n8n, Coze, FastGPT, ModelWhale, DingTalk, DataFlow, BISHENG, and RagFlow plugins
2025-10-14 18:55:19 +08:00
myhloli
1bef6e3526 Add documentation for Cherry Studio, Sider, Dify, n8n, Coze, FastGPT, ModelWhale, DingTalk, DataFlow, BISHENG, and RagFlow plugins 2025-10-14 18:54:15 +08:00
myhloli
4c4d1d0f95 Update supported version range in bug_report.yml to include 2.2.x and 2.5.x 2025-10-14 16:09:30 +08:00
Xiaomeng Zhao
c36aa54370 Merge pull request #3709 from myhloli/dev
Add max_concurrency parameter to improve backend processing
2025-10-13 15:57:34 +08:00
myhloli
4b480cfcf7 Add max_concurrency parameter to improve backend processing 2025-10-13 15:56:49 +08:00
Xiaomeng Zhao
7e18e1bb76 Merge pull request #3707 from myhloli/dev
Refactor async function and improve output directory handling in prediction
2025-10-13 11:59:33 +08:00
myhloli
44fdeb663f Refactor async function and improve output directory handling in prediction 2025-10-13 11:32:28 +08:00
myhloli
cf59949ba9 add tiff 2025-10-12 11:45:49 +08:00
Xiaomeng Zhao
c8c2f28afc Merge pull request #3701 from opendatalab/ocr_enhance
Ocr enhance
2025-10-11 19:33:32 +08:00
Xiaomeng Zhao
aa4bc6259b Merge pull request #3700 from myhloli/ocr_enhance
Reduce recognition batch size from 8 to 6
2025-10-11 19:29:09 +08:00
myhloli
b7e4ea0b49 Reduce recognition batch size from 8 to 6 for improved OCR performance 2025-10-11 19:28:16 +08:00
Xiaomeng Zhao
998197a47f Merge pull request #3672 from cjsdurj/optimize_ocr
Optimize pytorch_paddle OCR inference performance, an overall improvement of roughly 400%
2025-10-11 18:44:02 +08:00
Xiaomeng Zhao
3c8b6e6b6b Merge pull request #3499 from jinghuan-Chen/fix/fill_blank_rec_crop_empty_image
Avoid cropping empty images.
2025-10-11 11:14:05 +08:00
Xiaomeng Zhao
be42b46ff9 Merge pull request #3688 from myhloli/dev 2025-10-10 19:43:03 +08:00
myhloli
7c689e33b8 Refactor fix_two_layer_blocks function to improve handling of captions and footnotes in table blocks 2025-10-10 19:12:18 +08:00
cjsdurj
af66bc02c2 Optimize OCR inference performance by 400% 2025-10-09 13:03:22 +00:00
Xiaomeng Zhao
752f75ad8e Merge pull request #3651 from opendatalab/dev
Dev
2025-09-30 06:31:24 +08:00
Xiaomeng Zhao
1cfde98585 Merge pull request #3650 from myhloli/dev
Dev
2025-09-30 06:30:12 +08:00
Xiaomeng Zhao
54676295d5 Update README_zh-CN.md 2025-09-30 06:29:05 +08:00
Xiaomeng Zhao
61c7c65d8b Update README.md 2025-09-30 06:18:00 +08:00
Xiaomeng Zhao
6f05f735d0 Update header.html 2025-09-30 06:11:43 +08:00
Xiaomeng Zhao
befb16e531 Merge pull request #3649 from opendatalab/master
master->dev
2025-09-30 06:08:54 +08:00
Bin Wang
abc433d6f2 Merge pull request #3635 from wangbinDL/master
docs: Update arXiv link for technical report
2025-09-29 09:36:45 +08:00
wangbinDL
e7c1385068 docs: Update arXiv link for technical report 2025-09-29 09:32:30 +08:00
Bin Wang
342c5aa34a Merge pull request #3619 from wangbinDL/master
docs: Update MinerU2.5 Technical Report
2025-09-26 18:35:31 +08:00
wangbinDL
f25ddfa024 docs: Update MinerU2.5 Technical Report 2025-09-26 18:27:22 +08:00
Bin Wang
e31de3a453 Merge pull request #3615 from wangbinDL/master
docs: Add MinerU2.5 technical report and BibTeX
2025-09-26 11:51:45 +08:00
wangbinDL
2f01754410 docs: Add MinerU2.5 technical report and BibTeX 2025-09-26 11:42:59 +08:00
Xiaomeng Zhao
8a9921fb22 Merge pull request #3610 from opendatalab/master
master->dev
2025-09-26 06:17:20 +08:00
myhloli
652e11a253 Update version.py with new version 2025-09-25 21:57:26 +00:00
Xiaomeng Zhao
61cc6886fe Merge pull request #3608 from opendatalab/release-2.5.4
Release 2.5.4
2025-09-26 05:53:36 +08:00
Xiaomeng Zhao
80dc57e7ce Merge pull request #3609 from myhloli/dev
Bump mineru-vl-utils dependency to version 0.1.11
2025-09-26 05:48:32 +08:00
myhloli
d84a006f6d Bump mineru-vl-utils dependency to version 0.1.11 2025-09-26 05:47:27 +08:00
Xiaomeng Zhao
2c5361bf8e Merge pull request #3607 from myhloli/dev
Update changelog for version 2.5.4 to document PDF identification fix
2025-09-26 05:43:50 +08:00
myhloli
eb01b7acf9 Update changelog for version 2.5.4 to document PDF identification fix 2025-09-26 05:42:43 +08:00
Xiaomeng Zhao
5656f1363b Merge pull request #3606 from myhloli/dev
Dev
2025-09-26 05:35:29 +08:00
myhloli
c9315b8e10 Refactor suffix guessing to handle PDF extensions for AI files 2025-09-26 05:31:46 +08:00
myhloli
907099762f Normalize PDF suffix handling for AI files to be case-insensitive 2025-09-26 05:09:19 +08:00
myhloli
2c356cccee Fix suffix identification for AI files to correctly handle PDF extensions 2025-09-26 05:02:56 +08:00
myhloli
0f62f166e6 Enhance image link replacement to handle only .jpg files while preserving other formats 2025-09-26 04:52:05 +08:00
Xiaomeng Zhao
c7a64e72dc Merge pull request #3563 from myhloli/dev
Update model output handling in test_e2e.py to write JSON format instead of text
2025-09-21 02:49:31 +08:00
myhloli
3cb3a94830 Merge remote-tracking branch 'origin/dev' into dev 2025-09-21 02:48:45 +08:00
myhloli
8301fa4c20 Update model output handling in test_e2e.py to write JSON format instead of text 2025-09-21 02:47:56 +08:00
Xiaomeng Zhao
4400f4b75f Merge pull request #3558 from opendatalab/master
master->dev
2025-09-20 15:37:45 +08:00
myhloli
92efb8f96e Update version.py with new version 2025-09-20 07:36:01 +00:00
Xiaomeng Zhao
9a88cbfb09 Merge pull request #3545 from opendatalab/release-2.5.3
Release 2.5.3
2025-09-20 15:33:58 +08:00
Xiaomeng Zhao
e96e4a0ce4 Merge pull request #3557 from opendatalab/dev
Dev
2025-09-20 15:30:40 +08:00
Xiaomeng Zhao
c7bde0ab39 Merge pull request #3556 from myhloli/dev
Refactor batch image orientation classification logic for improved cl…
2025-09-20 15:30:08 +08:00
myhloli
8754c24e42 Refactor batch image orientation classification logic for improved clarity and performance 2025-09-20 15:24:28 +08:00
Xiaomeng Zhao
4f8c00cc34 Merge pull request #3555 from opendatalab/dev
Dev
2025-09-20 15:18:19 +08:00
Xiaomeng Zhao
89681f98ad Merge pull request #3554 from myhloli/dev
Fix formatting in changelog sections of README.md and README_zh-CN.md…
2025-09-20 15:14:16 +08:00
myhloli
66d328dbc5 Fix formatting in changelog sections of README.md and README_zh-CN.md for improved readability 2025-09-20 15:13:29 +08:00
Xiaomeng Zhao
f0c1318545 Merge pull request #3553 from myhloli/dev
Fix formatting in changelog sections of README.md and README_zh-CN.md…
2025-09-20 15:11:43 +08:00
myhloli
6e97f3cf70 Fix formatting in changelog sections of README.md and README_zh-CN.md for improved readability 2025-09-20 15:10:25 +08:00
Xiaomeng Zhao
aede62167e Merge pull request #3552 from opendatalab/dev
Dev
2025-09-20 15:08:40 +08:00
Xiaomeng Zhao
5f2740f743 Merge pull request #3551 from myhloli/dev
Fix compute capability comparison in custom_logits_processors.py for …
2025-09-20 15:08:14 +08:00
myhloli
a888d2b625 Fix compute capability comparison in custom_logits_processors.py for correct version handling 2025-09-20 15:06:49 +08:00
Xiaomeng Zhao
4275876331 Merge pull request #3550 from opendatalab/dev
Dev
2025-09-20 15:01:39 +08:00
Xiaomeng Zhao
ec9f7f54ab Merge pull request #3549 from myhloli/dev
Update README.md and README_zh-CN.md to include changelog for v2.5.3 …
2025-09-20 15:00:50 +08:00
myhloli
7861e5e369 Remove redundant newline in README.md for improved formatting 2025-09-20 15:00:12 +08:00
myhloli
159f3a89a3 Update README.md and README_zh-CN.md to include changelog for v2.5.3 release with compatibility fixes and performance adjustments 2025-09-20 14:57:54 +08:00
Xiaomeng Zhao
d9452bbeb9 Merge pull request #3546 from myhloli/dev
Update docker_deployment.md for improved clarity on base image usage …
2025-09-20 14:48:50 +08:00
myhloli
d808a32c0b Update docker_deployment.md for improved clarity on base image usage and GPU support 2025-09-20 13:52:16 +08:00
Xiaomeng Zhao
12ce3bd024 Merge pull request #3544 from myhloli/dev
Dev
2025-09-20 13:26:18 +08:00
myhloli
e3d7aece50 Remove warning log for default VLLM_USE_V1 value in custom_logits_processors.py 2025-09-20 13:25:11 +08:00
Xiaomeng Zhao
7c55a0ea65 Update mineru/backend/vlm/custom_logits_processors.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-20 13:22:40 +08:00
myhloli
f1659eb7a7 Refactor logits processor handling in server.py and vlm_analyze.py for improved clarity and consistency 2025-09-20 13:21:05 +08:00
myhloli
c6bffd9382 Restrict vllm version to <0.11 for compatibility 2025-09-20 11:49:06 +08:00
myhloli
857dcb2ef5 Update docker_deployment.md to clarify GPU model support and base image options for vLLM 2025-09-20 11:45:33 +08:00
myhloli
ef69f98cd6 Update Dockerfile to include comments for GPU architecture compatibility based on Compute Capability 2025-09-20 03:15:58 +08:00
myhloli
6d5d1cf26b Refactor image rotation handling in batch_analyze.py and paddle_ori_cls.py for improved compatibility with torch versions 2025-09-20 03:07:47 +08:00
myhloli
7c481796f8 Refactor custom logits processors to include vllm version checks and improve logging 2025-09-20 01:22:06 +08:00
myhloli
7d62b7b7cc Update mineru-vl-utils dependency version to 0.1.8 2025-09-20 00:31:14 +08:00
myhloli
5a0cf9af7f Enhance custom logits processors with improved compute capability checks and environment variable handling 2025-09-20 00:21:43 +08:00
myhloli
f5e0e67545 Add custom logits processors functionality with compute capability check 2025-09-19 19:21:56 +08:00
myhloli
a4cac624df Add compute capability check for custom logits processors in server.py and vlm_analyze.py 2025-09-19 19:00:41 +08:00
Xiaomeng Zhao
e1eb318b9b Merge pull request #3535 from opendatalab/master
master->dev
2025-09-19 16:51:13 +08:00
myhloli
31834b1e68 Update version.py with new version 2025-09-19 08:48:17 +00:00
Xiaomeng Zhao
100ace2e99 Merge pull request #3534 from opendatalab/release-2.5.2
Release 2.5.2
2025-09-19 16:45:57 +08:00
Xiaomeng Zhao
6aac639686 Merge pull request #3533 from myhloli/dev
Update ModelScope link in README_zh-CN.md for MinerU2.5 release
2025-09-19 16:39:40 +08:00
myhloli
82f94a9a84 Update ModelScope link in README_zh-CN.md for MinerU2.5 release 2025-09-19 16:36:42 +08:00
Xiaomeng Zhao
d928334c61 Merge pull request #3532 from myhloli/dev
Fix formatting in vlm_middle_json_mkcontent.py to ensure proper line breaks in list items
2025-09-19 16:34:29 +08:00
myhloli
ebad82bd8c Update version in README to 2.5.2 for MinerU2.5 release 2025-09-19 16:31:30 +08:00
myhloli
b03c5fb449 Fix formatting in vlm_middle_json_mkcontent.py to ensure proper line breaks in list items 2025-09-19 16:30:43 +08:00
myhloli
c343afd20c Update version.py with new version 2025-09-19 03:45:08 +00:00
Xiaomeng Zhao
6586c7c01e Merge pull request #3529 from opendatalab/release-2.5.1
Release 2.5.1
2025-09-19 11:43:51 +08:00
Xiaomeng Zhao
304a6d9d8c Merge pull request #3527 from myhloli/dev
fix: Update mineru-vl-utils version and add logits processors support
2025-09-19 11:42:43 +08:00
myhloli
bce9bb6d1d Add support for --logits-processors argument in server.py 2025-09-19 11:42:05 +08:00
myhloli
920220e48e Update version in README for MinerU2.5 release to 2.5.1 2025-09-19 11:40:44 +08:00
myhloli
9fc3d6c742 Remove direct import of MinerULogitsProcessor and add it conditionally in vllm backend 2025-09-19 11:36:20 +08:00
myhloli
8fd544273e Update mineru-vl-utils version and add logits processors support 2025-09-19 11:20:34 +08:00
myhloli
72f1f5f935 Update mineru-vl-utils version and add logits processors support 2025-09-19 11:16:55 +08:00
Xiaomeng Zhao
5559a4701a Merge pull request #3523 from opendatalab/master
master->dev
2025-09-19 10:44:51 +08:00
myhloli
437022abfa Specify version constraints for mineru-vl-utils in pyproject.toml 2025-09-19 03:39:57 +08:00
myhloli
4653ed1502 Remove version constraints for mineru-vl-utils in pyproject.toml 2025-09-19 03:31:13 +08:00
Xiaomeng Zhao
b58c7f8d6e Merge pull request #3517 from opendatalab/dev
Dev
2025-09-19 03:27:30 +08:00
Xiaomeng Zhao
f6133b1731 Merge pull request #3516 from myhloli/dev
Update dependency name for mineru-vl-utils in pyproject.toml
2025-09-19 03:26:31 +08:00
myhloli
12d72c7c17 Update dependency name for mineru-vl-utils in pyproject.toml 2025-09-19 03:25:18 +08:00
Xiaomeng Zhao
5f3f35c009 Merge pull request #3515 from opendatalab/master
master->dev
2025-09-19 03:14:48 +08:00
myhloli
16ad71446b Update version.py with new version 2025-09-18 19:12:56 +00:00
Xiaomeng Zhao
d4b364eb9f Merge pull request #3513 from opendatalab/release-2.5.0
Release 2.5.0
2025-09-19 03:10:02 +08:00
Xiaomeng Zhao
446188adf4 Merge pull request #3514 from myhloli/dev
update dependencies
2025-09-19 03:09:50 +08:00
myhloli
ff90c600aa update dependencies 2025-09-19 03:07:23 +08:00
Xiaomeng Zhao
3f2c7e5e7c Merge pull request #3512 from myhloli/dev
update docs
2025-09-19 03:04:12 +08:00
myhloli
2ba1c35fbd update docs 2025-09-19 03:03:06 +08:00
Xiaomeng Zhao
d3f92a0b20 Merge pull request #3511 from opendatalab/dev
Dev
2025-09-19 03:00:56 +08:00
Xiaomeng Zhao
4b6f151351 Merge pull request #3510 from myhloli/dev
update docs
2025-09-19 03:00:14 +08:00
myhloli
5fcd428cb5 update docs 2025-09-19 02:56:44 +08:00
Xiaomeng Zhao
5db08afef6 Merge pull request #3509 from opendatalab/release-2.5.0
Release 2.5.0
2025-09-19 02:51:50 +08:00
Xiaomeng Zhao
6b182f8378 Update mineru/cli/gradio_app.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-19 02:51:09 +08:00
Xiaomeng Zhao
ae9526127f Merge pull request #3508 from myhloli/dev
update docs
2025-09-19 02:27:40 +08:00
myhloli
39790095bf update docs 2025-09-19 02:26:36 +08:00
Xiaomeng Zhao
fef3081bdf Merge pull request #3507 from myhloli/dev
update docs
2025-09-19 02:24:30 +08:00
myhloli
5425da9571 update docs 2025-09-19 02:23:42 +08:00
Xiaomeng Zhao
9af1824328 Merge pull request #3506 from myhloli/dev
Dev
2025-09-19 02:17:16 +08:00
myhloli
e47b19c416 update docs 2025-09-19 02:16:17 +08:00
myhloli
5646f46606 update docs 2025-09-19 02:04:04 +08:00
Xiaomeng Zhao
9d5568a9cb Merge pull request #3505 from myhloli/dev
update docs
2025-09-19 01:58:12 +08:00
myhloli
ec3549702f update docs 2025-09-19 01:55:35 +08:00
Xiaomeng Zhao
d185d1822b Merge pull request #3504 from myhloli/dev
update docs
2025-09-19 01:49:57 +08:00
myhloli
4864a086ce update docs 2025-09-19 01:48:50 +08:00
Xiaomeng Zhao
f736e29cc0 Merge pull request #3503 from myhloli/dev
update docs
2025-09-19 01:23:09 +08:00
myhloli
34fab4f5b8 update docs 2025-09-19 01:22:25 +08:00
Xiaomeng Zhao
2496875c33 Merge pull request #3502 from myhloli/dev
Dev
2025-09-19 01:13:20 +08:00
myhloli
ec4cc37861 Merge remote-tracking branch 'origin/dev' into dev 2025-09-19 00:00:29 +08:00
myhloli
c2208d84cb feat: update output_files.md to include new block types and fields for code and list structures 2025-09-18 23:52:01 +08:00
Xiaomeng Zhao
cdc025a9ec Merge pull request #3490 from myhloli/dev
Add vlm 2.5 support
2025-09-18 23:04:43 +08:00
Xiaomeng Zhao
cdbe6ba9b6 Update mineru/utils/enum_class.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-18 23:04:29 +08:00
myhloli
75f576ad0c fix: correct capitalization of "HuggingFace" in README files 2025-09-18 22:55:56 +08:00
myhloli
52844f0794 feat: update README files to reflect the release of MinerU2.5 and its enhancements 2025-09-18 22:55:01 +08:00
myhloli
8d178b2b7e feat: enhance file type detection by using guess_suffix_by_path for document parsing 2025-09-18 22:41:58 +08:00
myhloli
1083476a02 fix: typo 2025-09-18 21:45:02 +08:00
myhloli
da29782a26 feat: add contrast calculation for span images to improve OCR accuracy 2025-09-18 19:55:40 +08:00
myhloli
75797a3b7c feat: update header title to MinerU 2.5 and add model link in header.html; add Dingo tool link in README_zh-CN.md 2025-09-18 18:46:50 +08:00
myhloli
5b73b89ceb fix: add handling for reference text blocks in draw_bbox.py 2025-09-18 17:24:15 +08:00
myhloli
c5b2926c7b fix: extend text block handling to include reference text in draw_bbox.py 2025-09-18 17:23:11 +08:00
jinghuan-Chen
8bb8b715c1 Avoid cropping empty images. 2025-09-18 17:08:40 +08:00
myhloli
3ca520a3fe feat: implement dynamic batch size calculation based on GPU memory in vlm_analyze.py 2025-09-18 14:55:34 +08:00
myhloli
ba36a94aa0 fix: streamline model argument handling in server.py 2025-09-18 01:14:16 +08:00
myhloli
11ebb47891 fix: remove redundant model_path checks for vllm backends in vlm_analyze.py 2025-09-18 00:17:09 +08:00
myhloli
dd8dd5197b fix: correct variable usage for language guessing in code block formatting 2025-09-17 23:56:16 +08:00
myhloli
7a71cfe288 feat: add support for vllm-async-engine backend in vlm_analyze.py 2025-09-17 22:58:47 +08:00
myhloli
bba31191a4 fix: update backend handling to enforce correct usage of vlm engines in sync and async modes 2025-09-17 22:43:44 +08:00
Xiaomeng Zhao
9041f04588 Merge pull request #18 from myhloli/vlm_2.5
Vlm 2.5
2025-09-17 21:53:26 +08:00
Xiaomeng Zhao
69a9d11b0b Merge pull request #3489 from e06084/dev
docs: README add dingo link
2025-09-17 21:51:06 +08:00
chupei
36e7267ce1 docs: README add dingo link 2025-09-17 20:31:23 +08:00
myhloli
14f347d613 feat: add code_content_clean function to sanitize Markdown code blocks 2025-09-17 19:20:34 +08:00
myhloli
6ea2cfeb21 fix: update MinerU version references in enum_class.py and header.html 2025-09-17 16:48:08 +08:00
myhloli
078099f19d feat: enhance language guessing for code blocks by integrating guess_lang into line structure 2025-09-17 16:03:27 +08:00
myhloli
25d4a4588a fix: specify version range for Magika dependency in pyproject.toml 2025-09-17 00:46:35 +08:00
myhloli
679dad3aac fix: streamline temporary file handling for image and PDF processing in fast_api.py 2025-09-17 00:41:37 +08:00
myhloli
e60da65cca feat: enhance file type detection using Magika for improved suffix guessing 2025-09-17 00:19:44 +08:00
myhloli
f081d36a3a feat: implement language guessing for code blocks using Magika 2025-09-16 23:40:51 +08:00
myhloli
c74e712918 fix: correct language guessing in code block formatting in vlm_middle_json_mkcontent.py 2025-09-16 22:19:44 +08:00
myhloli
f2b944ab06 fix: enhance language guessing for code blocks in VLM processing 2025-09-16 21:43:18 +08:00
myhloli
2e945adcc0 docs: update output_files.md to reflect significant changes in VLM backend output for version 2.5 2025-09-16 19:38:57 +08:00
myhloli
39eaf31fb9 docs: update output_files.md to reflect significant changes in VLM backend output for version 2.5 2025-09-16 19:02:50 +08:00
myhloli
7717534ea7 fix: remove unused import of list_iterator from draw_bbox.py 2025-09-16 01:30:09 +08:00
Xiaomeng Zhao
6166b98cd4 Merge pull request #17 from myhloli/dev
fix: adjust overlap area ratio for image and table spans in span_block_fix
2025-09-15 20:48:43 +08:00
Xiaomeng Zhao
a02ab97ea0 Merge pull request #3473 from myhloli/dev
fix: adjust overlap area ratio for image and table spans in span_block_fix
2025-09-15 20:46:36 +08:00
myhloli
beadb7a689 fix: adjust overlap area ratio for image and table spans in span_block_fix 2025-09-15 19:22:57 +08:00
myhloli
de5449fd40 refactor: consolidate output processing into a single _process_output function 2025-09-15 11:24:21 +08:00
myhloli
76f74e7c70 fix: enhance draw_bbox functionality to include list items in bounding box drawing 2025-09-15 02:32:09 +08:00
myhloli
efbf1422c6 fix: update header title to reflect MinerU version 2.5 2025-09-15 02:04:21 +08:00
myhloli
3ec6479462 fix: update backend comment to reflect renaming from sglang-engine to vlm-vllm-engine 2025-09-15 02:00:58 +08:00
myhloli
80e6f4ded4 fix: update coverage omit list to reflect renaming from sglang to vllm 2025-09-15 01:54:49 +08:00
myhloli
376b5d924a Merge remote-tracking branch 'origin/vlm_2.5' into vlm_2.5 2025-09-15 01:52:36 +08:00
myhloli
6608615012 docs: update demo.py to reflect changes in backend naming from sglang to vllm 2025-09-15 01:52:14 +08:00
myhloli
12dea70793 Merge remote-tracking branch 'origin/vlm_2.5' into vlm_2.5 2025-09-15 01:50:37 +08:00
myhloli
96a0a45c9a fix: update sys.argv to include 'serve' for vllm server startup 2025-09-15 01:50:14 +08:00
myhloli
745954ca08 docs: update references from sglang to vllm in documentation and configuration files 2025-09-15 01:45:35 +08:00
myhloli
e120a90d11 docs: update documentation for vllm integration and parameter optimization 2025-09-15 01:25:23 +08:00
myhloli
8c75e0fce2 docs: update changelog for version 2.5.0 release 2025-09-14 23:10:51 +08:00
Xiaomeng Zhao
978c94f680 Merge pull request #16 from myhloli/dev
Dev
2025-09-14 23:00:57 +08:00
myhloli
c4eae4e0ef fix: add timing log for model predictor retrieval in vlm_analyze.py 2025-09-14 22:28:52 +08:00
myhloli
411f3b7855 fix: comment out debug logging in vllm_analyze.py 2025-09-12 15:10:04 +08:00
myhloli
60e257e5f1 fix: set default values for gpu_memory_utilization and model in vllm_analyze.py 2025-09-12 15:07:51 +08:00
myhloli
20e1dfe984 fix: enhance model initialization for transformers and vllm-engine backends in vlm_analyze.py 2025-09-12 11:39:28 +08:00
myhloli
f2553dd89a fix: add default arguments for port and GPU memory utilization in server.py 2025-09-12 11:07:51 +08:00
myhloli
b35c3345c0 fix: add default arguments for port and GPU memory utilization in server.py 2025-09-12 11:07:23 +08:00
myhloli
af3ee06aa3 fix: update import path for vllm entrypoint in server.py 2025-09-12 10:23:16 +08:00
myhloli
4f6ac22ce6 fix: update import path for vllm entrypoint in server.py 2025-09-12 10:19:03 +08:00
myhloli
0f47a22bb3 refactor: update option names and server script for vLLM engine integration 2025-09-12 10:08:57 +08:00
myhloli
2ca6ee1708 refactor: rename server files and update model path handling for vllm integration 2025-09-12 10:01:23 +08:00
myhloli
55eaad224d feat: add support for vlm 2.5 2025-09-11 19:42:51 +08:00
Xiaomeng Zhao
bb94e73fc9 Merge pull request #3451 from opendatalab/master
master->dev
2025-09-10 14:46:22 +08:00
myhloli
70f62046e7 Update version.py with new version 2025-09-10 06:44:57 +00:00
Xiaomeng Zhao
fd38cdff80 Merge pull request #3450 from opendatalab/release-2.2.2
Release 2.2.2
2025-09-10 14:44:05 +08:00
Xiaomeng Zhao
d30f762ac8 Merge pull request #3449 from myhloli/dev
docs: update changelog for version 2.2.2 release
2025-09-10 14:43:09 +08:00
myhloli
f65ff12eea docs: update changelog for version 2.2.2 release 2025-09-10 14:42:28 +08:00
myhloli
8b8ac3e62e docs: update changelog for version 2.2.2 release 2025-09-10 14:33:30 +08:00
Xiaomeng Zhao
473154c2b3 Merge pull request #3448 from myhloli/dev
fix: improve HTML code handling and logging in batch_analyze and main…
2025-09-10 14:31:19 +08:00
myhloli
e2fd491760 fix: improve HTML code handling and logging in batch_analyze and main modules 2025-09-10 14:27:50 +08:00
Xiaomeng Zhao
c29e2d0ca2 Merge pull request #3438 from opendatalab/master
master->dev
2025-09-08 10:59:47 +08:00
myhloli
a5687394d5 Update version.py with new version 2025-09-08 02:54:47 +00:00
Xiaomeng Zhao
13819c0596 Merge pull request #3437 from opendatalab/release-2.2.1
Release 2.2.1
2025-09-08 10:53:01 +08:00
Xiaomeng Zhao
d775f76eec Merge pull request #3435 from myhloli/dev
feat: add new models to download list
2025-09-08 10:51:52 +08:00
myhloli
5dd73dbcca Merge remote-tracking branch 'origin/dev' into dev 2025-09-08 10:46:08 +08:00
myhloli
3eda0d10a0 feat: add new models to download list and update changelog for version 2.2.1 2025-09-08 10:45:21 +08:00
Xiaomeng Zhao
e0c3cbb34a Merge pull request #3429 from opendatalab/master
master->dev
2025-09-05 19:23:07 +08:00
myhloli
d2fcdd0fa4 Update version.py with new version 2025-09-05 11:21:16 +00:00
Xiaomeng Zhao
af887d63c0 Merge pull request #3428 from opendatalab/release-2.2.0
Release 2.2.0
2025-09-05 19:19:42 +08:00
Xiaomeng Zhao
a9f28b4436 Merge pull request #3425 from opendatalab/release-2.2.0
Release 2.2.0
2025-09-05 19:10:08 +08:00
258 changed files with 15341 additions and 4490 deletions


@@ -122,7 +122,21 @@ body:
#multiple: false
options:
-
- "2.0.x"
- "<2.2.0"
- "2.2.x"
- ">=2.5"
validations:
required: true
- type: dropdown
id: backend_name
attributes:
label: Backend name | 解析后端
#multiple: false
options:
-
- "vlm"
- "pipeline"
validations:
required: true

README.md

@@ -1,7 +1,7 @@
<div align="center" xmlns="http://www.w3.org/1999/html">
<!-- logo -->
<p align="center">
<img src="docs/images/MinerU-logo.png" width="300px" style="vertical-align:middle;">
<img src="https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docs/images/MinerU-logo.png" width="300px" style="vertical-align:middle;">
</p>
<!-- icon -->
@@ -18,7 +18,8 @@
[![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAF8AAABYCAMAAACkl9t/AAAAk1BMVEVHcEz/nQv/nQv/nQr/nQv/nQr/nQv/nQv/nQr/wRf/txT/pg7/yRr/rBD/zRz/ngv/oAz/zhz/nwv/txT/ngv/0B3+zBz/nQv/0h7/wxn/vRb/thXkuiT/rxH/pxD/ogzcqyf/nQvTlSz/czCxky7/SjifdjT/Mj3+Mj3wMj15aTnDNz+DSD9RTUBsP0FRO0Q6O0WyIxEIAAAAGHRSTlMADB8zSWF3krDDw8TJ1NbX5efv8ff9/fxKDJ9uAAAGKklEQVR42u2Z63qjOAyGC4RwCOfB2JAGqrSb2WnTw/1f3UaWcSGYNKTdf/P+mOkTrE+yJBulvfvLT2A5ruenaVHyIks33npl/6C4s/ZLAM45SOi/1FtZPyFur1OYofBX3w7d54Bxm+E8db+nDr12ttmESZ4zludJEG5S7TO72YPlKZFyE+YCYUJTBZsMiNS5Sd7NlDmKM2Eg2JQg8awbglfqgbhArjxkS7dgp2RH6hc9AMLdZYUtZN5DJr4molC8BfKrEkPKEnEVjLbgW1fLy77ZVOJagoIcLIl+IxaQZGjiX597HopF5CkaXVMDO9Pyix3AFV3kw4lQLCbHuMovz8FallbcQIJ5Ta0vks9RnolbCK84BtjKRS5uA43hYoZcOBGIG2Epbv6CvFVQ8m8loh66WNySsnN7htL58LNp+NXT8/PhXiBXPMjLSxtwp8W9f/1AngRierBkA+kk/IpUSOeKByzn8y3kAAAfh//0oXgV4roHm/kz4E2z//zRc3/lgwBzbM2mJxQEa5pqgX7d1L0htrhx7LKxOZlKbwcAWyEOWqYSI8YPtgDQVjpB5nvaHaSnBaQSD6hweDi8PosxD6/PT09YY3xQA7LTCTKfYX+QHpA0GCcqmEHvr/cyfKQTEuwgbs2kPxJEB0iNjfJcCTPyocx+A0griHSmADiC91oNGVwJ69RudYe65vJmoqfpul0lrqXadW0jFKH5BKwAeCq+Den7s+3zfRJzA61/Uj/9H/VzLKTx9jFPPdXeeP+L7WEvDLAKAIoF8bPTKT0+TM7W8ePj3Rz/Yn3kOAp2f1Kf0Weony7pn/cPydvhQYV+eFOfmOu7VB/ViPe34/EN3RFHY/yRuT8ddCtMPH/McBAT5s+vRde/gf2c/sPsjLK+m5IBQF5tO+h2tTlBGnP6693JdsvofjOPnnEHkh2TnV/X1fBl9S5zrwuwF8NFrAVJVwCAPTe8gaJlomqlp0pv4Pjn98tJ/t/fL++6unpR1YGC2n/KCoa0tTLoKiEeUPDl94nj+5/Tv3/eT5vBQ60X1S0oZr+IWRR8Ldhu7AlLjPISlJcO9vrFotky9SpzDequlwEir5beYAc0R7D9KS1DXva0jhYRDXoExPdc6yw5GShkZXe9QdO/uOvHofxjrV/TNS6iMJS+4TcSTgk9n5agJdBQbB//IfF/HpvPt3Tbi7b6I6K0R72p6ajryEJrENW2bbeVUGjfgoals4L443c7BEE4mJO2SpbRngxQrAKRudRzGQ8jVOL2qDVjjI8K1gc3TIJ5KiFZ1q+gdsARPB4NQS4AjwVSt72DSoXNyOWUrU5mQ9nRYyjp89Xo7oRI6Bga9QNT1mQ/ptaJq5T/7WcgAZywR/XlPGAUDdet3LE+qS0TI+g+aJU8MIqjo0Kx8Ly+maxLjJmjQ18rA0YCkxLQbUZP1WqdmyQGJLUm7VnQFqodmXSqmRrdVpqdzk5LvmvgtEcW8PMGdaS23EOWyDVbACZzUJPaqMbjDxpA3Qrgl0AikimGDbqmyT8P8NOYiqrldF8rX+YN7TopX4UoHuSCYY7cgX4gHwclQKl1zhx0THf+tCAUValzj
I7Wg9EhptrkIcfIJjA94evOn8B2eHaVzvBrnl2ig0So6hvPaz0IGcOvTHvUIlE2+prqAxLSQxZlU2stql1NqCCLdIiIN/i1DBEHUoElM9dBravbiAnKqgpi4IBkw+utSPIoBijDXJipSVV7MpOEJUAc5Qmm3BnUN+w3hteEieYKfRZSIUcXKMVf0u5wD4EwsUNVvZOtUT7A2GkffHjByWpHqvRBYrTV72a6j8zZ6W0DTE86Hn04bmyWX3Ri9WH7ZU6Q7h+ZHo0nHUAcsQvVhXRDZHChwiyi/hnPuOsSEF6Exk3o6Y9DT1eZ+6cASXk2Y9k+6EOQMDGm6WBK10wOQJCBwren86cPPWUcRAnTVjGcU1LBgs9FURiX/e6479yZcLwCBmTxiawEwrOcleuu12t3tbLv/N4RLYIBhYexm7Fcn4OJcn0+zc+s8/VfPeddZHAGN6TT8eGczHdR/Gts1/MzDkThr23zqrVfAMFT33Nx1RJsx1k5zuWILLnG/vsH+Fv5D4NTVcp1Gzo8AAAAAElFTkSuQmCC&labelColor=white)](https://huggingface.co/spaces/opendatalab/MinerU)
[![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=data:image/svg+xml;base64,PHN2ZyB3aWR0aD0iMjIzIiBoZWlnaHQ9IjIwMCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KCiA8Zz4KICA8dGl0bGU+TGF5ZXIgMTwvdGl0bGU+CiAgPHBhdGggaWQ9InN2Z18xNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTAsODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5eiIvPgogIDxwYXRoIGlkPSJzdmdfMTUiIGZpbGw9IiM2MjRhZmYiIGQ9Im05OS4xNCwxMTUuNDlsMjUuNjUsMGwwLDI1LjY1bC0yNS42NSwwbDAsLTI1LjY1eiIvPgogIDxwYXRoIGlkPSJzdmdfMTYiIGZpbGw9IiM2MjRhZmYiIGQ9Im0xNzYuMDksMTQxLjE0bC0yNS42NDk5OSwwbDAsMjIuMTlsNDcuODQsMGwwLC00Ny44NGwtMjIuMTksMGwwLDI1LjY1eiIvPgogIDxwYXRoIGlkPSJzdmdfMTciIGZpbGw9IiMzNmNmZDEiIGQ9Im0xMjQuNzksODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5eiIvPgogIDxwYXRoIGlkPSJzdmdfMTgiIGZpbGw9IiMzNmNmZDEiIGQ9Im0wLDY0LjE5bDI1LjY1LDBsMCwyNS42NWwtMjUuNjUsMGwwLC0yNS42NXoiLz4KICA8cGF0aCBpZD0ic3ZnXzE5IiBmaWxsPSIjNjI0YWZmIiBkPSJtMTk4LjI4LDg5Ljg0bDI1LjY0OTk5LDBsMCwyNS42NDk5OWwtMjUuNjQ5OTksMGwwLC0yNS42NDk5OXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIwIiBmaWxsPSIjMzZjZmQxIiBkPSJtMTk4LjI4LDY0LjE5bDI1LjY0OTk5LDBsMCwyNS42NWwtMjUuNjQ5OTksMGwwLC0yNS42NXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIxIiBmaWxsPSIjNjI0YWZmIiBkPSJtMTUwLjQ0LDQybDAsMjIuMTlsMjUuNjQ5OTksMGwwLDI1LjY1bDIyLjE5LDBsMCwtNDcuODRsLTQ3Ljg0LDB6Ii8+CiAgPHBhdGggaWQ9InN2Z18yMiIgZmlsbD0iIzM2Y2ZkMSIgZD0ibTczLjQ5LDg5Ljg0bDI1LjY1LDBsMCwyNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIzIiBmaWxsPSIjNjI0YWZmIiBkPSJtNDcuODQsNjQuMTlsMjUuNjUsMGwwLC0yMi4xOWwtNDcuODQsMGwwLDQ3Ljg0bDIyLjE5LDBsMCwtMjUuNjV6Ii8+CiAgPHBhdGggaWQ9InN2Z18yNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTQ3Ljg0LDExNS40OWwtMjIuMTksMGwwLDQ3Ljg0bDQ3Ljg0LDBsMCwtMjIuMTlsLTI1LjY1LDBsMCwtMjUuNjV6Ii8+CiA8L2c+Cjwvc3ZnPg==&labelColor=white)](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/myhloli/a3cb16570ab3cfeadf9d8f0ac91b4fca/mineru_demo.ipynb)
[![arXiv](https://img.shields.io/badge/arXiv-2409.18839-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2409.18839)
[![arXiv](https://img.shields.io/badge/MinerU-Technical%20Report-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2409.18839)
[![arXiv](https://img.shields.io/badge/MinerU2.5-Technical%20Report-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2509.22186)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/opendatalab/MinerU)
@@ -43,20 +44,90 @@
</div>
# Changelog
- 2025/10/24 2.6.0 Released
- `pipeline` backend optimizations
- Added experimental support for Chinese formulas, which can be enabled by setting the environment variable `export MINERU_FORMULA_CH_SUPPORT=1`. This feature may slightly reduce MFR speed and cause some long formulas to fail recognition, so it is recommended only when Chinese formula parsing is needed. To disable it, set the variable to `0`
- `OCR` speed significantly improved by 200%~300%, thanks to the optimization solution provided by @cjsdurj
- `OCR` models updated to `ppocr-v5` version for Cyrillic, Arabic, Devanagari, Telugu (te), and Tamil (ta) languages, with accuracy improved by over 40% compared to previous models
- `vlm` backend optimizations
- `table_caption` and `table_footnote` matching logic optimized, improving the accuracy of caption and footnote matching and the reading order on pages with multiple consecutive tables
- Reduced CPU usage under high concurrency when using the `vllm` backend, lowering server pressure
- Adapted to `vllm` version 0.11.0
- General optimizations
- Cross-page table merging effect optimized, added support for cross-page continuation table merging, improving table merging effectiveness in multi-column merge scenarios
- Added environment variable configuration option `MINERU_TABLE_MERGE_ENABLE` for table merging feature. Table merging is enabled by default and can be disabled by setting this variable to `0`
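The two feature flags above are plain environment variables read at startup. As a minimal sketch (the variable names come from this changelog; reading them through `os.environ` before MinerU initializes is an assumption about how you invoke it from Python):

```python
import os

# Set these before MinerU loads its models; names are taken from the changelog above.
os.environ["MINERU_FORMULA_CH_SUPPORT"] = "1"  # enable experimental Chinese formula parsing
os.environ["MINERU_TABLE_MERGE_ENABLE"] = "0"  # table merging is on by default; "0" disables it

print(os.environ["MINERU_FORMULA_CH_SUPPORT"], os.environ["MINERU_TABLE_MERGE_ENABLE"])
```

When launching from a shell instead, the equivalent is `export MINERU_FORMULA_CH_SUPPORT=1` before running `mineru`.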
- 2025/09/05 2.2.0 Released
- Major Updates
- In this version, we focused on improving table parsing accuracy by introducing a new [wired table recognition model](https://github.com/RapidAI/TableStructureRec) and a brand-new hybrid table structure parsing algorithm, significantly enhancing the table recognition capabilities of the `pipeline` backend.
- We also added support for cross-page table merging, which is supported by both `pipeline` and `vlm` backends, further improving the completeness and accuracy of table parsing.
- Other Updates
- The `pipeline` backend now supports 270-degree rotated table parsing, bringing support for table parsing in 0/90/270-degree orientations
- `pipeline` added OCR capability support for Thai and Greek, and updated the English OCR model to the latest version. English recognition accuracy improved by 11%, Thai recognition model accuracy is 82.68%, and Greek recognition model accuracy is 89.28% (by PPOCRv5)
- Added `bbox` field (mapped to 0-1000 range) in the output `content_list.json`, making it convenient for users to directly obtain position information for each content block
- Removed the `pipeline_old_linux` installation option, no longer supporting legacy Linux systems such as `CentOS 7`, to provide better support for `uv`'s `sync`/`run` commands
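Because the `bbox` field in `content_list.json` is normalized to the 0-1000 range, mapping it back to page pixels is a simple rescale. A minimal sketch, assuming the field layout is `[x0, y0, x1, y1]` (the helper name and a US-Letter page size are illustrative, not part of MinerU's API):

```python
def bbox_to_pixels(bbox, page_width, page_height):
    """Rescale a 0-1000 normalized bbox [x0, y0, x1, y1] to pixel coordinates."""
    x0, y0, x1, y1 = bbox
    return [
        x0 * page_width / 1000,
        y0 * page_height / 1000,
        x1 * page_width / 1000,
        y1 * page_height / 1000,
    ]

# Example: a normalized box on a 612x792 (US Letter at 72 dpi) page.
print(bbox_to_pixels([100, 200, 500, 400], 612, 792))  # → [61.2, 158.4, 306.0, 316.8]
```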
- 2025/09/26 2.5.4 Released
- 🎉🎉 The MinerU2.5 [Technical Report](https://arxiv.org/abs/2509.22186) is now available! We welcome you to read it for a comprehensive overview of its model architecture, training strategy, data engineering and evaluation results.
- Fixed an issue where some `PDF` files were mistakenly identified as `AI` files, causing parsing failures
- 2025/09/20 2.5.3 Released
- Dependency version range adjustment to enable Turing and earlier architecture GPUs to use vLLM acceleration for MinerU2.5 model inference.
- `pipeline` backend compatibility fixes for torch 2.8.0.
- Reduced default concurrency for vLLM async backend to lower server pressure and avoid connection closure issues caused by high load.
- More compatibility-related details can be found in the [announcement](https://github.com/opendatalab/MinerU/discussions/3548)
- 2025/09/19 2.5.2 Released
We are officially releasing MinerU2.5, currently the most powerful multimodal large model for document parsing.
With only 1.2B parameters, MinerU2.5's accuracy on the OmniDocBench benchmark comprehensively surpasses top-tier multimodal models like Gemini 2.5 Pro, GPT-4o, and Qwen2.5-VL-72B. It also significantly outperforms leading specialized models such as dots.ocr, MonkeyOCR, and PP-StructureV3.
The model has been released on [HuggingFace](https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B) and [ModelScope](https://modelscope.cn/models/opendatalab/MinerU2.5-2509-1.2B) platforms. Welcome to download and use!
- Core Highlights:
- SOTA Performance with Extreme Efficiency: As a 1.2B model, it achieves State-of-the-Art (SOTA) results that exceed models in the 10B and 100B+ classes, redefining the performance-per-parameter standard in document AI.
- Advanced Architecture for Across-the-Board Leadership: By combining a two-stage inference pipeline (decoupling layout analysis from content recognition) with a native high-resolution architecture, it achieves SOTA performance across five key areas: layout analysis, text recognition, formula recognition, table recognition, and reading order.
- Key Capability Enhancements:
- Layout Detection: Delivers more complete results by accurately covering non-body content like headers, footers, and page numbers. It also provides more precise element localization and natural format reconstruction for lists and references.
- Table Parsing: Drastically improves parsing for challenging cases, including rotated tables, borderless/semi-structured tables, and long/complex tables.
- Formula Recognition: Significantly boosts accuracy for complex, long-form, and hybrid Chinese-English formulas, greatly enhancing the parsing capability for mathematical documents.
Additionally, with the release of vlm 2.5, we have made some adjustments to the repository:
- The vlm backend has been upgraded to version 2.5, supporting the MinerU2.5 model and no longer compatible with the MinerU2.0-2505-0.9B model. The last version supporting the 2.0 model is mineru-2.2.2.
- VLM inference-related code has been moved to [mineru_vl_utils](https://github.com/opendatalab/mineru-vl-utils), reducing coupling with the main mineru repository and facilitating independent iteration in the future.
- The vlm accelerated inference framework has been switched from `sglang` to `vllm`, achieving full compatibility with the vllm ecosystem, allowing users to use the MinerU2.5 model and accelerated inference on any platform that supports the vllm framework.
- Due to major upgrades in the vlm model supporting more layout types, we have made some adjustments to the structure of the parsing intermediate file `middle.json` and result file `content_list.json`. Please refer to the [documentation](https://opendatalab.github.io/MinerU/reference/output_files/) for details.
Other repository optimizations:
- Removed file extension whitelist validation for input files. When input files are PDF documents or images, there are no longer requirements for file extensions, improving usability.
<details>
<summary>History Log</summary>
<details>
<summary>2025/09/10 2.2.2 Released</summary>
<ul>
<li>Fixed the issue where the new table recognition model would affect the overall parsing task when some table parsing failed</li>
</ul>
</details>
<details>
<summary>2025/09/08 2.2.1 Released</summary>
<ul>
<li>Fixed the issue where some newly added models were not downloaded when using the model download command.</li>
</ul>
</details>
<details>
<summary>2025/09/05 2.2.0 Released</summary>
<ul>
<li>
Major Updates
<ul>
<li>In this version, we focused on improving table parsing accuracy by introducing a new <a href="https://github.com/RapidAI/TableStructureRec">wired table recognition model</a> and a brand-new hybrid table structure parsing algorithm, significantly enhancing the table recognition capabilities of the <code>pipeline</code> backend.</li>
<li>We also added support for cross-page table merging, which is supported by both <code>pipeline</code> and <code>vlm</code> backends, further improving the completeness and accuracy of table parsing.</li>
</ul>
</li>
<li>
Other Updates
<ul>
<li>The <code>pipeline</code> backend now supports 270-degree rotated table parsing, bringing support for table parsing in 0/90/270-degree orientations</li>
<li><code>pipeline</code> added OCR capability support for Thai and Greek, and updated the English OCR model to the latest version. English recognition accuracy improved by 11%, Thai recognition model accuracy is 82.68%, and Greek recognition model accuracy is 89.28% (by PPOCRv5)</li>
<li>Added <code>bbox</code> field (mapped to 0-1000 range) in the output <code>content_list.json</code>, making it convenient for users to directly obtain position information for each content block</li>
<li>Removed the <code>pipeline_old_linux</code> installation option, no longer supporting legacy Linux systems such as <code>CentOS 7</code>, to provide better support for <code>uv</code>'s <code>sync</code>/<code>run</code> commands</li>
</ul>
</li>
</ul>
</details>
<details>
<summary>2025/08/01 2.1.10 Released</summary>
<ul>
@@ -553,7 +624,7 @@ A WebUI developed based on Gradio, with a simple interface and only core parsing
<td>Parsing Backend</td>
<td>pipeline</td>
<td>vlm-transformers</td>
<td>vlm-sglang</td>
<td>vlm-vllm</td>
</tr>
<tr>
<td>Operating System</td>
@@ -602,8 +673,8 @@ uv pip install -e .[core]
```
> [!TIP]
> `mineru[core]` includes all core features except `sglang` acceleration, compatible with Windows / Linux / macOS systems, suitable for most users.
> If you need to use `sglang` acceleration for VLM model inference or install a lightweight client on edge devices, please refer to the documentation [Extension Modules Installation Guide](https://opendatalab.github.io/MinerU/quick_start/extension_modules/).
> `mineru[core]` includes all core features except `vLLM` acceleration, compatible with Windows / Linux / macOS systems, suitable for most users.
> If you need to use `vLLM` acceleration for VLM model inference or install a lightweight client on edge devices, please refer to the documentation [Extension Modules Installation Guide](https://opendatalab.github.io/MinerU/quick_start/extension_modules/).
---
@@ -631,8 +702,8 @@ You can use MinerU for PDF parsing through various methods such as command line,
- [x] Handwritten Text Recognition
- [x] Vertical Text Recognition
- [x] Latin Accent Mark Recognition
- [ ] Code block recognition in the main text
- [ ] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf)
- [x] Code block recognition in the main text
- [x] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf)(mineru.net)
- [ ] Geometric shape recognition
# Known Issues
@@ -680,10 +751,21 @@ Currently, some models in this project are trained based on YOLO. However, since
- [pdftext](https://github.com/datalab-to/pdftext)
- [pdfminer.six](https://github.com/pdfminer/pdfminer.six)
- [pypdf](https://github.com/py-pdf/pypdf)
- [magika](https://github.com/google/magika)
# Citation
```bibtex
@misc{niu2025mineru25decoupledvisionlanguagemodel,
title={MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing},
author={Junbo Niu and Zheng Liu and Zhuangcheng Gu and Bin Wang and Linke Ouyang and Zhiyuan Zhao and Tao Chu and Tianyao He and Fan Wu and Qintong Zhang and Zhenjiang Jin and Guang Liang and Rui Zhang and Wenzheng Zhang and Yuan Qu and Zhifei Ren and Yuefeng Sun and Yuanhong Zheng and Dongsheng Ma and Zirui Tang and Boyu Niu and Ziyang Miao and Hejun Dong and Siyi Qian and Junyuan Zhang and Jingzhou Chen and Fangdong Wang and Xiaomeng Zhao and Liqun Wei and Wei Li and Shasha Wang and Ruiliang Xu and Yuanyuan Cao and Lu Chen and Qianqian Wu and Huaiyu Gu and Lindong Lu and Keming Wang and Dechen Lin and Guanlin Shen and Xuanhe Zhou and Linfeng Zhang and Yuhang Zang and Xiaoyi Dong and Jiaqi Wang and Bo Zhang and Lei Bai and Pei Chu and Weijia Li and Jiang Wu and Lijun Wu and Zhenxiang Li and Guangyu Wang and Zhongying Tu and Chao Xu and Kai Chen and Yu Qiao and Bowen Zhou and Dahua Lin and Wentao Zhang and Conghui He},
year={2025},
eprint={2509.22186},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2509.22186},
}
@misc{wang2024mineruopensourcesolutionprecise,
title={MinerU: An Open-Source Solution for Precise Document Content Extraction},
author={Bin Wang and Chao Xu and Xiaomeng Zhao and Linke Ouyang and Fan Wu and Zhiyuan Zhao and Rui Xu and Kaiwen Liu and Yuan Qu and Fukai Shang and Bo Zhang and Liqun Wei and Zhihao Sui and Wei Li and Botian Shi and Yu Qiao and Dahua Lin and Conghui He},
@@ -722,3 +804,4 @@ Currently, some models in this project are trained based on YOLO. However, since
- [OmniDocBench (A Comprehensive Benchmark for Document Parsing and Evaluation)](https://github.com/opendatalab/OmniDocBench)
- [Magic-HTML (Mixed web page extraction tool)](https://github.com/opendatalab/magic-html)
- [Magic-Doc (Fast speed ppt/pptx/doc/docx/pdf extraction tool)](https://github.com/InternLM/magic-doc)
- [Dingo: A Comprehensive AI Data Quality Evaluation Tool](https://github.com/MigoXLab/dingo)


@@ -1,7 +1,7 @@
<div align="center" xmlns="http://www.w3.org/1999/html">
<!-- logo -->
<p align="center">
<img src="docs/images/MinerU-logo.png" width="300px" style="vertical-align:middle;">
<img src="https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docs/images/MinerU-logo.png" width="300px" style="vertical-align:middle;">
</p>
<!-- icon -->
@@ -18,7 +18,8 @@
[![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=data:image/svg+xml;base64,PHN2ZyB3aWR0aD0iMjIzIiBoZWlnaHQ9IjIwMCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KCiA8Zz4KICA8dGl0bGU+TGF5ZXIgMTwvdGl0bGU+CiAgPHBhdGggaWQ9InN2Z18xNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTAsODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5eiIvPgogIDxwYXRoIGlkPSJzdmdfMTUiIGZpbGw9IiM2MjRhZmYiIGQ9Im05OS4xNCwxMTUuNDlsMjUuNjUsMGwwLDI1LjY1bC0yNS42NSwwbDAsLTI1LjY1eiIvPgogIDxwYXRoIGlkPSJzdmdfMTYiIGZpbGw9IiM2MjRhZmYiIGQ9Im0xNzYuMDksMTQxLjE0bC0yNS42NDk5OSwwbDAsMjIuMTlsNDcuODQsMGwwLC00Ny44NGwtMjIuMTksMGwwLDI1LjY1eiIvPgogIDxwYXRoIGlkPSJzdmdfMTciIGZpbGw9IiMzNmNmZDEiIGQ9Im0xMjQuNzksODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5eiIvPgogIDxwYXRoIGlkPSJzdmdfMTgiIGZpbGw9IiMzNmNmZDEiIGQ9Im0wLDY0LjE5bDI1LjY1LDBsMCwyNS42NWwtMjUuNjUsMGwwLC0yNS42NXoiLz4KICA8cGF0aCBpZD0ic3ZnXzE5IiBmaWxsPSIjNjI0YWZmIiBkPSJtMTk4LjI4LDg5Ljg0bDI1LjY0OTk5LDBsMCwyNS42NDk5OWwtMjUuNjQ5OTksMGwwLC0yNS42NDk5OXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIwIiBmaWxsPSIjMzZjZmQxIiBkPSJtMTk4LjI4LDY0LjE5bDI1LjY0OTk5LDBsMCwyNS42NWwtMjUuNjQ5OTksMGwwLC0yNS42NXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIxIiBmaWxsPSIjNjI0YWZmIiBkPSJtMTUwLjQ0LDQybDAsMjIuMTlsMjUuNjQ5OTksMGwwLDI1LjY1bDIyLjE5LDBsMCwtNDcuODRsLTQ3Ljg0LDB6Ii8+CiAgPHBhdGggaWQ9InN2Z18yMiIgZmlsbD0iIzM2Y2ZkMSIgZD0ibTczLjQ5LDg5Ljg0bDI1LjY1LDBsMCwyNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIzIiBmaWxsPSIjNjI0YWZmIiBkPSJtNDcuODQsNjQuMTlsMjUuNjUsMGwwLC0yMi4xOWwtNDcuODQsMGwwLDQ3Ljg0bDIyLjE5LDBsMCwtMjUuNjV6Ii8+CiAgPHBhdGggaWQ9InN2Z18yNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTQ3Ljg0LDExNS40OWwtMjIuMTksMGwwLDQ3Ljg0bDQ3Ljg0LDBsMCwtMjIuMTlsLTI1LjY1LDBsMCwtMjUuNjV6Ii8+CiA8L2c+Cjwvc3ZnPg==&labelColor=white)](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
[![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAF8AAABYCAMAAACkl9t/AAAAk1BMVEVHcEz/nQv/nQv/nQr/nQv/nQr/nQv/nQv/nQr/wRf/txT/pg7/yRr/rBD/zRz/ngv/oAz/zhz/nwv/txT/ngv/0B3+zBz/nQv/0h7/wxn/vRb/thXkuiT/rxH/pxD/ogzcqyf/nQvTlSz/czCxky7/SjifdjT/Mj3+Mj3wMj15aTnDNz+DSD9RTUBsP0FRO0Q6O0WyIxEIAAAAGHRSTlMADB8zSWF3krDDw8TJ1NbX5efv8ff9/fxKDJ9uAAAGKklEQVR42u2Z63qjOAyGC4RwCOfB2JAGqrSb2WnTw/1f3UaWcSGYNKTdf/P+mOkTrE+yJBulvfvLT2A5ruenaVHyIks33npl/6C4s/ZLAM45SOi/1FtZPyFur1OYofBX3w7d54Bxm+E8db+nDr12ttmESZ4zludJEG5S7TO72YPlKZFyE+YCYUJTBZsMiNS5Sd7NlDmKM2Eg2JQg8awbglfqgbhArjxkS7dgp2RH6hc9AMLdZYUtZN5DJr4molC8BfKrEkPKEnEVjLbgW1fLy77ZVOJagoIcLIl+IxaQZGjiX597HopF5CkaXVMDO9Pyix3AFV3kw4lQLCbHuMovz8FallbcQIJ5Ta0vks9RnolbCK84BtjKRS5uA43hYoZcOBGIG2Epbv6CvFVQ8m8loh66WNySsnN7htL58LNp+NXT8/PhXiBXPMjLSxtwp8W9f/1AngRierBkA+kk/IpUSOeKByzn8y3kAAAfh//0oXgV4roHm/kz4E2z//zRc3/lgwBzbM2mJxQEa5pqgX7d1L0htrhx7LKxOZlKbwcAWyEOWqYSI8YPtgDQVjpB5nvaHaSnBaQSD6hweDi8PosxD6/PT09YY3xQA7LTCTKfYX+QHpA0GCcqmEHvr/cyfKQTEuwgbs2kPxJEB0iNjfJcCTPyocx+A0griHSmADiC91oNGVwJ69RudYe65vJmoqfpul0lrqXadW0jFKH5BKwAeCq+Den7s+3zfRJzA61/Uj/9H/VzLKTx9jFPPdXeeP+L7WEvDLAKAIoF8bPTKT0+TM7W8ePj3Rz/Yn3kOAp2f1Kf0Weony7pn/cPydvhQYV+eFOfmOu7VB/ViPe34/EN3RFHY/yRuT8ddCtMPH/McBAT5s+vRde/gf2c/sPsjLK+m5IBQF5tO+h2tTlBGnP6693JdsvofjOPnnEHkh2TnV/X1fBl9S5zrwuwF8NFrAVJVwCAPTe8gaJlomqlp0pv4Pjn98tJ/t/fL++6unpR1YGC2n/KCoa0tTLoKiEeUPDl94nj+5/Tv3/eT5vBQ60X1S0oZr+IWRR8Ldhu7AlLjPISlJcO9vrFotky9SpzDequlwEir5beYAc0R7D9KS1DXva0jhYRDXoExPdc6yw5GShkZXe9QdO/uOvHofxjrV/TNS6iMJS+4TcSTgk9n5agJdBQbB//IfF/HpvPt3Tbi7b6I6K0R72p6ajryEJrENW2bbeVUGjfgoals4L443c7BEE4mJO2SpbRngxQrAKRudRzGQ8jVOL2qDVjjI8K1gc3TIJ5KiFZ1q+gdsARPB4NQS4AjwVSt72DSoXNyOWUrU5mQ9nRYyjp89Xo7oRI6Bga9QNT1mQ/ptaJq5T/7WcgAZywR/XlPGAUDdet3LE+qS0TI+g+aJU8MIqjo0Kx8Ly+maxLjJmjQ18rA0YCkxLQbUZP1WqdmyQGJLUm7VnQFqodmXSqmRrdVpqdzk5LvmvgtEcW8PMGdaS23EOWyDVbACZzUJPaqMbjDxpA3Qrgl0AikimGDbqmyT8P8NOYiqrldF8rX+YN7TopX4UoHuSCYY7cgX4gHwclQKl1zhx0THf+tCAUValzj
I7Wg9EhptrkIcfIJjA94evOn8B2eHaVzvBrnl2ig0So6hvPaz0IGcOvTHvUIlE2+prqAxLSQxZlU2stql1NqCCLdIiIN/i1DBEHUoElM9dBravbiAnKqgpi4IBkw+utSPIoBijDXJipSVV7MpOEJUAc5Qmm3BnUN+w3hteEieYKfRZSIUcXKMVf0u5wD4EwsUNVvZOtUT7A2GkffHjByWpHqvRBYrTV72a6j8zZ6W0DTE86Hn04bmyWX3Ri9WH7ZU6Q7h+ZHo0nHUAcsQvVhXRDZHChwiyi/hnPuOsSEF6Exk3o6Y9DT1eZ+6cASXk2Y9k+6EOQMDGm6WBK10wOQJCBwren86cPPWUcRAnTVjGcU1LBgs9FURiX/e6479yZcLwCBmTxiawEwrOcleuu12t3tbLv/N4RLYIBhYexm7Fcn4OJcn0+zc+s8/VfPeddZHAGN6TT8eGczHdR/Gts1/MzDkThr23zqrVfAMFT33Nx1RJsx1k5zuWILLnG/vsH+Fv5D4NTVcp1Gzo8AAAAAElFTkSuQmCC&labelColor=white)](https://huggingface.co/spaces/opendatalab/MinerU)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/myhloli/a3cb16570ab3cfeadf9d8f0ac91b4fca/mineru_demo.ipynb)
[![arXiv](https://img.shields.io/badge/arXiv-2409.18839-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2409.18839)
[![arXiv](https://img.shields.io/badge/MinerU-Technical%20Report-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2409.18839)
[![arXiv](https://img.shields.io/badge/MinerU2.5-Technical%20Report-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2509.22186)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/opendatalab/MinerU)
@@ -43,21 +44,88 @@
</div>
# Changelog
- 2025/10/24 2.6.0 Released
- `pipeline` backend optimizations
- Added experimental support for Chinese formulas, which can be enabled by setting the environment variable `export MINERU_FORMULA_CH_SUPPORT=1`. This feature may slightly reduce MFR speed and cause some long formulas to fail recognition, so it is recommended only when Chinese formula parsing is needed. To disable it, set the variable to `0`
- `OCR` speed improved significantly by 200%~300%, thanks to the optimization solution provided by @cjsdurj
- `OCR` models for Cyrillic, Arabic, Devanagari, Telugu (te), and Tamil (ta) updated to the `ppocr-v5` version, with accuracy improved by over 40% compared to the previous generation
- `vlm` backend optimizations
- `table_caption` and `table_footnote` matching logic optimized, improving the accuracy of caption and footnote matching and the reading order on pages with multiple consecutive tables
- Reduced CPU usage under high concurrency when using the `vllm` backend, lowering server pressure
- Adapted to `vllm` version 0.11.0
- General optimizations
- Cross-page table merging improved, with new support for merging cross-page continuation tables, improving results in multi-column merge scenarios
- Added the environment variable `MINERU_TABLE_MERGE_ENABLE` for the table merging feature; merging is enabled by default and can be disabled by setting the variable to `0`
- 2025/09/26 2.5.4 Released
- 🎉🎉 The MinerU2.5 [Technical Report](https://arxiv.org/abs/2509.22186) is now available! We welcome you to read it for a comprehensive overview of its model architecture, training strategy, data engineering, and evaluation results.
- Fixed an issue where some `pdf` files were misidentified as `ai` files and could not be parsed
- 2025/09/05 2.2.0 Released
- Major Updates
- In this version, we focused on improving table parsing accuracy by introducing a new [wired table recognition model](https://github.com/RapidAI/TableStructureRec) and a brand-new hybrid table structure parsing algorithm, significantly enhancing the table recognition capabilities of the `pipeline` backend
- We also added support for cross-page table merging, supported by both the `pipeline` and `vlm` backends, further improving the completeness and accuracy of table parsing
- Other Updates
- The `pipeline` backend now supports 270-degree rotated table parsing, covering 0/90/270-degree orientations
- `pipeline` added OCR support for Thai and Greek and updated the English OCR model to the latest version; English recognition accuracy improved by 11%, Thai model accuracy is 82.68%, and Greek model accuracy is 89.28% (by PPOCRv5)
- Added a `bbox` field (mapped to the 0-1000 range) in the output `content_list.json`, making it convenient for users to directly obtain the position of each content block
- Removed the `pipeline_old_linux` installation option, no longer supporting legacy Linux systems such as `CentOS 7`, to provide better support for `uv`'s `sync`/`run` commands
- 2025/09/20 2.5.3 Released
- Dependency version ranges adjusted so that Turing and earlier architecture GPUs can use vLLM-accelerated inference for the MinerU2.5 model.
- Compatibility fixes for torch 2.8.0 in the `pipeline` backend
- Reduced the default concurrency of the vLLM async backend to lower server pressure and avoid connection closures caused by high load
- More compatibility details can be found in the [announcement](https://github.com/opendatalab/MinerU/discussions/3547)
- 2025/09/19 2.5.2 Released
We are officially releasing MinerU2.5, currently the most powerful multimodal large model for document parsing. With only 1.2B parameters, MinerU2.5's accuracy on the OmniDocBench benchmark comprehensively surpasses top multimodal models such as Gemini 2.5 Pro, GPT-4o, and Qwen2.5-VL-72B, and significantly outperforms mainstream specialized document parsing models such as dots.ocr, MonkeyOCR, and PP-StructureV3.
The model has been released on the [HuggingFace](https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B) and [ModelScope](https://modelscope.cn/models/opendatalab/MinerU2.5-2509-1.2B) platforms. Welcome to download and use!
- Core Highlights:
- SOTA performance with extreme efficiency: at a lightweight 1.2B scale, it achieves SOTA results surpassing models in the 10B and even 100B+ classes, redefining the performance-per-parameter standard for document parsing.
- Advanced architecture, across-the-board leadership: combining a two-stage inference pipeline (decoupling layout analysis from content recognition) with a native high-resolution architecture, it reaches SOTA in five key areas: layout analysis, text recognition, formula recognition, table recognition, and reading order.
- Key Capability Enhancements:
- Layout detection: more complete results, accurately covering non-body content such as headers, footers, and page numbers, with more precise element localization and more natural format reconstruction (e.g., lists and references).
- Table parsing: greatly improved handling of rotated tables, borderless/sparse-line tables, and long or complex tables.
- Formula recognition: significantly higher accuracy on mixed Chinese-English and complex long formulas, greatly improving the parsing of mathematical documents.
Additionally, with the release of vlm 2.5, we have made some adjustments to the repository:
- The vlm backend has been upgraded to version 2.5, supporting the MinerU2.5 model and no longer compatible with the MinerU2.0-2505-0.9B model; the last version supporting the 2.0 model is mineru-2.2.2.
- VLM inference-related code has been moved to [mineru_vl_utils](https://github.com/opendatalab/mineru-vl-utils), reducing coupling with the main mineru repository and facilitating independent iteration in the future.
- The vlm accelerated inference framework has been switched from `sglang` to `vllm`, achieving full compatibility with the vllm ecosystem, so users can run the MinerU2.5 model with accelerated inference on any platform that supports vllm.
- Due to the major vlm model upgrade supporting more layout types, we have adjusted the structure of the intermediate file `middle.json` and the result file `content_list.json`; please refer to the [documentation](https://opendatalab.github.io/MinerU/zh/reference/output_files/) for details.
Other repository optimizations:
- Removed the file extension whitelist validation for input files; when the input is a PDF document or image, the file extension no longer matters, improving usability.
<details>
<summary>History Log</summary>
<details>
<summary>2025/09/10 2.2.2 Released</summary>
<ul>
<li>Fixed the issue where the new table recognition model affected the overall parsing task when some table parsing failed</li>
</ul>
</details>
<details>
<summary>2025/09/08 2.2.1 Released</summary>
<ul>
<li>Fixed the issue where some newly added models were not downloaded when using the model download command</li>
</ul>
</details>
<details>
<summary>2025/09/05 2.2.0 Released</summary>
<ul>
<li>
Major Updates
<ul>
<li>In this version, we focused on improving table parsing accuracy by introducing a new <a href="https://github.com/RapidAI/TableStructureRec">wired table recognition model</a> and a brand-new hybrid table structure parsing algorithm, significantly enhancing the table recognition capabilities of the <code>pipeline</code> backend.</li>
<li>We also added support for cross-page table merging, supported by both the <code>pipeline</code> and <code>vlm</code> backends, further improving the completeness and accuracy of table parsing.</li>
</ul>
</li>
<li>
Other Updates
<ul>
<li>The <code>pipeline</code> backend now supports 270-degree rotated table parsing, covering 0/90/270-degree orientations</li>
<li><code>pipeline</code> added OCR support for Thai and Greek and updated the English OCR model to the latest version; English recognition accuracy improved by 11%, Thai model accuracy is 82.68%, and Greek model accuracy is 89.28% (by PPOCRv5)</li>
<li>Added a <code>bbox</code> field (mapped to the 0-1000 range) in the output <code>content_list.json</code>, making it convenient for users to directly obtain the position of each content block</li>
<li>Removed the <code>pipeline_old_linux</code> installation option, no longer supporting legacy Linux systems such as <code>CentOS 7</code>, to provide better support for <code>uv</code>'s <code>sync</code>/<code>run</code> commands</li>
</ul>
</li>
</ul>
</details>
<details>
<summary>2025/08/01 2.1.10 发布</summary>
<ul>
@@ -542,7 +610,7 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
<td>解析后端</td>
<td>pipeline</td>
<td>vlm-transformers</td>
<td>vlm-sglang</td>
<td>vlm-vllm</td>
</tr>
<tr>
<td>操作系统</td>
@@ -591,8 +659,8 @@ uv pip install -e .[core] -i https://mirrors.aliyun.com/pypi/simple
```
> [!TIP]
> `mineru[core]` includes all core features except `sglang` acceleration, is compatible with Windows / Linux / macOS, and suits the vast majority of users.
> If you need `sglang` to accelerate VLM model inference, or need to install a lightweight client on edge devices, see the [extension modules installation guide](https://opendatalab.github.io/MinerU/zh/quick_start/extension_modules/).
> `mineru[core]` includes all core features except `vLLM` acceleration, is compatible with Windows / Linux / macOS, and suits the vast majority of users.
> If you need `vLLM` to accelerate VLM model inference, or need to install a lightweight client on edge devices, see the [extension modules installation guide](https://opendatalab.github.io/MinerU/zh/quick_start/extension_modules/).
---
@@ -620,8 +688,8 @@ mineru -p <input_path> -o <output_path>
- [x] Handwritten text recognition
- [x] Vertical text recognition
- [x] Latin accented-character recognition
- [ ] Code block recognition in body text
- [ ] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf)
- [x] Code block recognition in body text
- [x] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf) (https://mineru.net)
- [ ] Chart content recognition
# Known Issues
@@ -669,10 +737,21 @@ mineru -p <input_path> -o <output_path>
- [pdftext](https://github.com/datalab-to/pdftext)
- [pdfminer.six](https://github.com/pdfminer/pdfminer.six)
- [pypdf](https://github.com/py-pdf/pypdf)
- [magika](https://github.com/google/magika)
# Citation
```bibtex
@misc{niu2025mineru25decoupledvisionlanguagemodel,
title={MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing},
author={Junbo Niu and Zheng Liu and Zhuangcheng Gu and Bin Wang and Linke Ouyang and Zhiyuan Zhao and Tao Chu and Tianyao He and Fan Wu and Qintong Zhang and Zhenjiang Jin and Guang Liang and Rui Zhang and Wenzheng Zhang and Yuan Qu and Zhifei Ren and Yuefeng Sun and Yuanhong Zheng and Dongsheng Ma and Zirui Tang and Boyu Niu and Ziyang Miao and Hejun Dong and Siyi Qian and Junyuan Zhang and Jingzhou Chen and Fangdong Wang and Xiaomeng Zhao and Liqun Wei and Wei Li and Shasha Wang and Ruiliang Xu and Yuanyuan Cao and Lu Chen and Qianqian Wu and Huaiyu Gu and Lindong Lu and Keming Wang and Dechen Lin and Guanlin Shen and Xuanhe Zhou and Linfeng Zhang and Yuhang Zang and Xiaoyi Dong and Jiaqi Wang and Bo Zhang and Lei Bai and Pei Chu and Weijia Li and Jiang Wu and Lijun Wu and Zhenxiang Li and Guangyu Wang and Zhongying Tu and Chao Xu and Kai Chen and Yu Qiao and Bowen Zhou and Dahua Lin and Wentao Zhang and Conghui He},
year={2025},
eprint={2509.22186},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2509.22186},
}
@misc{wang2024mineruopensourcesolutionprecise,
title={MinerU: An Open-Source Solution for Precise Document Content Extraction},
author={Bin Wang and Chao Xu and Xiaomeng Zhao and Linke Ouyang and Fan Wu and Zhiyuan Zhao and Rui Xu and Kaiwen Liu and Yuan Qu and Fukai Shang and Bo Zhang and Liqun Wei and Zhihao Sui and Wei Li and Botian Shi and Yu Qiao and Dahua Lin and Conghui He},
@@ -710,4 +789,5 @@ mineru -p <input_path> -o <output_path>
- [PDF-Extract-Kit (A Comprehensive Toolkit for High-Quality PDF Content Extraction)](https://github.com/opendatalab/PDF-Extract-Kit)
- [OmniDocBench (A Comprehensive Benchmark for Document Parsing and Evaluation)](https://github.com/opendatalab/OmniDocBench)
- [Magic-HTML (Mixed web page extraction tool)](https://github.com/opendatalab/magic-html)
- [Magic-Doc (Fast speed ppt/pptx/doc/docx/pdf extraction tool)](https://github.com/InternLM/magic-doc)
- [Magic-Doc (Fast speed ppt/pptx/doc/docx/pdf extraction tool)](https://github.com/InternLM/magic-doc)
- [Dingo: A Comprehensive AI Data Quality Evaluation Tool](https://github.com/MigoXLab/dingo)

@@ -15,7 +15,7 @@ from mineru.backend.pipeline.pipeline_analyze import doc_analyze as pipeline_doc
from mineru.backend.pipeline.pipeline_middle_json_mkcontent import union_make as pipeline_union_make
from mineru.backend.pipeline.model_json_to_middle_json import result_to_middle_json as pipeline_result_to_middle_json
from mineru.backend.vlm.vlm_middle_json_mkcontent import union_make as vlm_union_make
from mineru.utils.models_download_utils import auto_download_and_get_model_root_path
from mineru.utils.guess_suffix_or_lang import guess_suffix_by_path
def do_parse(
@@ -27,7 +27,7 @@ def do_parse(
parse_method="auto", # The method for parsing PDF, default is 'auto'
formula_enable=True, # Enable formula parsing
table_enable=True, # Enable table parsing
server_url=None, # Server URL for vlm-sglang-client backend
server_url=None, # Server URL for vlm-http-client backend
f_draw_layout_bbox=True, # Whether to draw layout bounding boxes
f_draw_span_bbox=True, # Whether to draw span bounding boxes
f_dump_md=True, # Whether to dump markdown files
@@ -62,47 +62,12 @@ def do_parse(
pdf_info = middle_json["pdf_info"]
pdf_bytes = pdf_bytes_list[idx]
if f_draw_layout_bbox:
draw_layout_bbox(pdf_info, pdf_bytes, local_md_dir, f"{pdf_file_name}_layout.pdf")
if f_draw_span_bbox:
draw_span_bbox(pdf_info, pdf_bytes, local_md_dir, f"{pdf_file_name}_span.pdf")
if f_dump_orig_pdf:
md_writer.write(
f"{pdf_file_name}_origin.pdf",
pdf_bytes,
)
if f_dump_md:
image_dir = str(os.path.basename(local_image_dir))
md_content_str = pipeline_union_make(pdf_info, f_make_md_mode, image_dir)
md_writer.write_string(
f"{pdf_file_name}.md",
md_content_str,
)
if f_dump_content_list:
image_dir = str(os.path.basename(local_image_dir))
content_list = pipeline_union_make(pdf_info, MakeMode.CONTENT_LIST, image_dir)
md_writer.write_string(
f"{pdf_file_name}_content_list.json",
json.dumps(content_list, ensure_ascii=False, indent=4),
)
if f_dump_middle_json:
md_writer.write_string(
f"{pdf_file_name}_middle.json",
json.dumps(middle_json, ensure_ascii=False, indent=4),
)
if f_dump_model_output:
md_writer.write_string(
f"{pdf_file_name}_model.json",
json.dumps(model_json, ensure_ascii=False, indent=4),
)
logger.info(f"local output dir is {local_md_dir}")
_process_output(
pdf_info, pdf_bytes, pdf_file_name, local_md_dir, local_image_dir,
md_writer, f_draw_layout_bbox, f_draw_span_bbox, f_dump_orig_pdf,
f_dump_md, f_dump_content_list, f_dump_middle_json, f_dump_model_output,
f_make_md_mode, middle_json, model_json, is_pipeline=True
)
else:
if backend.startswith("vlm-"):
backend = backend[4:]
@@ -118,48 +83,77 @@ def do_parse(
pdf_info = middle_json["pdf_info"]
if f_draw_layout_bbox:
draw_layout_bbox(pdf_info, pdf_bytes, local_md_dir, f"{pdf_file_name}_layout.pdf")
_process_output(
pdf_info, pdf_bytes, pdf_file_name, local_md_dir, local_image_dir,
md_writer, f_draw_layout_bbox, f_draw_span_bbox, f_dump_orig_pdf,
f_dump_md, f_dump_content_list, f_dump_middle_json, f_dump_model_output,
f_make_md_mode, middle_json, infer_result, is_pipeline=False
)
if f_draw_span_bbox:
draw_span_bbox(pdf_info, pdf_bytes, local_md_dir, f"{pdf_file_name}_span.pdf")
if f_dump_orig_pdf:
md_writer.write(
f"{pdf_file_name}_origin.pdf",
pdf_bytes,
)
def _process_output(
pdf_info,
pdf_bytes,
pdf_file_name,
local_md_dir,
local_image_dir,
md_writer,
f_draw_layout_bbox,
f_draw_span_bbox,
f_dump_orig_pdf,
f_dump_md,
f_dump_content_list,
f_dump_middle_json,
f_dump_model_output,
f_make_md_mode,
middle_json,
model_output=None,
is_pipeline=True
):
"""Process output files."""
if f_draw_layout_bbox:
draw_layout_bbox(pdf_info, pdf_bytes, local_md_dir, f"{pdf_file_name}_layout.pdf")
if f_dump_md:
image_dir = str(os.path.basename(local_image_dir))
md_content_str = vlm_union_make(pdf_info, f_make_md_mode, image_dir)
md_writer.write_string(
f"{pdf_file_name}.md",
md_content_str,
)
if f_draw_span_bbox:
draw_span_bbox(pdf_info, pdf_bytes, local_md_dir, f"{pdf_file_name}_span.pdf")
if f_dump_content_list:
image_dir = str(os.path.basename(local_image_dir))
content_list = vlm_union_make(pdf_info, MakeMode.CONTENT_LIST, image_dir)
md_writer.write_string(
f"{pdf_file_name}_content_list.json",
json.dumps(content_list, ensure_ascii=False, indent=4),
)
if f_dump_orig_pdf:
md_writer.write(
f"{pdf_file_name}_origin.pdf",
pdf_bytes,
)
if f_dump_middle_json:
md_writer.write_string(
f"{pdf_file_name}_middle.json",
json.dumps(middle_json, ensure_ascii=False, indent=4),
)
image_dir = str(os.path.basename(local_image_dir))
if f_dump_model_output:
model_output = ("\n" + "-" * 50 + "\n").join(infer_result)
md_writer.write_string(
f"{pdf_file_name}_model_output.txt",
model_output,
)
if f_dump_md:
make_func = pipeline_union_make if is_pipeline else vlm_union_make
md_content_str = make_func(pdf_info, f_make_md_mode, image_dir)
md_writer.write_string(
f"{pdf_file_name}.md",
md_content_str,
)
logger.info(f"local output dir is {local_md_dir}")
if f_dump_content_list:
make_func = pipeline_union_make if is_pipeline else vlm_union_make
content_list = make_func(pdf_info, MakeMode.CONTENT_LIST, image_dir)
md_writer.write_string(
f"{pdf_file_name}_content_list.json",
json.dumps(content_list, ensure_ascii=False, indent=4),
)
if f_dump_middle_json:
md_writer.write_string(
f"{pdf_file_name}_middle.json",
json.dumps(middle_json, ensure_ascii=False, indent=4),
)
if f_dump_model_output:
md_writer.write_string(
f"{pdf_file_name}_model.json",
json.dumps(model_output, ensure_ascii=False, indent=4),
)
logger.info(f"local output dir is {local_md_dir}")
def parse_doc(
@@ -182,8 +176,8 @@ def parse_doc(
backend: the backend for parsing pdf:
pipeline: More general.
vlm-transformers: More general.
vlm-sglang-engine: Faster(engine).
vlm-sglang-client: Faster(client).
vlm-vllm-engine: Faster(engine).
vlm-http-client: Faster(client).
without method specified, pipeline will be used by default.
method: the method for parsing pdf:
auto: Automatically determine the method based on the file type.
@@ -191,7 +185,7 @@ def parse_doc(
ocr: Use OCR method for image-based PDFs.
Without method specified, 'auto' will be used by default.
Adapted only for the case where the backend is set to "pipeline".
server_url: When the backend is `sglang-client`, you need to specify the server_url, for example:`http://127.0.0.1:30000`
server_url: When the backend is `http-client`, you need to specify the server_url, for example:`http://127.0.0.1:30000`
start_page_id: Start page ID for parsing, default is 0
end_page_id: End page ID for parsing, default is None (parse all pages until the end of the document)
"""
@@ -225,12 +219,12 @@ if __name__ == '__main__':
__dir__ = os.path.dirname(os.path.abspath(__file__))
pdf_files_dir = os.path.join(__dir__, "pdfs")
output_dir = os.path.join(__dir__, "output")
pdf_suffixes = [".pdf"]
image_suffixes = [".png", ".jpeg", ".jpg"]
pdf_suffixes = ["pdf"]
image_suffixes = ["png", "jpeg", "jp2", "webp", "gif", "bmp", "jpg"]
doc_path_list = []
for doc_path in Path(pdf_files_dir).glob('*'):
if doc_path.suffix in pdf_suffixes + image_suffixes:
if guess_suffix_by_path(doc_path) in pdf_suffixes + image_suffixes:
doc_path_list.append(doc_path)
"""If you cannot download models due to network problems, set the environment variable MINERU_MODEL_SOURCE to modelscope to download them from a proxy-free mirror."""
@@ -241,5 +235,5 @@ if __name__ == '__main__':
"""To enable VLM mode, change the backend to 'vlm-xxx'"""
# parse_doc(doc_path_list, output_dir, backend="vlm-transformers") # more general.
# parse_doc(doc_path_list, output_dir, backend="vlm-sglang-engine") # faster(engine).
# parse_doc(doc_path_list, output_dir, backend="vlm-sglang-client", server_url="http://127.0.0.1:30000") # faster(client).
# parse_doc(doc_path_list, output_dir, backend="vlm-vllm-engine") # faster(engine).
# parse_doc(doc_path_list, output_dir, backend="vlm-http-client", server_url="http://127.0.0.1:30000") # faster(client).

@@ -1,12 +1,15 @@
# Use DaoCloud mirrored sglang image for China region
FROM docker.m.daocloud.io/lmsysorg/sglang:v0.4.10.post2-cu126
# For blackwell GPU, use the following line instead:
# FROM docker.m.daocloud.io/lmsysorg/sglang:v0.4.10.post2-cu128-b200
# Use DaoCloud mirrored vllm image for China region for gpu with Ampere architecture and above (Compute Capability>=8.0)
# Compute Capability version query (https://developer.nvidia.com/cuda-gpus)
FROM docker.m.daocloud.io/vllm/vllm-openai:v0.10.1.1
# Use the official sglang image
# FROM lmsysorg/sglang:v0.4.10.post2-cu126
# For blackwell GPU, use the following line instead:
# FROM lmsysorg/sglang:v0.4.10.post2-cu128-b200
# Use the official vllm image
# FROM vllm/vllm-openai:v0.10.1.1
# Use DaoCloud mirrored vllm image for China region for gpu with Turing architecture and below (Compute Capability<8.0)
# FROM docker.m.daocloud.io/vllm/vllm-openai:v0.10.2
# Use the official vllm image
# FROM vllm/vllm-openai:v0.10.2
# Install libgl for opencv support & Noto fonts for Chinese characters
RUN apt-get update && \

@@ -1,21 +1,19 @@
services:
mineru-sglang-server:
image: mineru-sglang:latest
container_name: mineru-sglang-server
mineru-vllm-server:
image: mineru-vllm:latest
container_name: mineru-vllm-server
restart: always
profiles: ["sglang-server"]
profiles: ["vllm-server"]
ports:
- 30000:30000
environment:
MINERU_MODEL_SOURCE: local
entrypoint: mineru-sglang-server
entrypoint: mineru-vllm-server
command:
--host 0.0.0.0
--port 30000
# --enable-torch-compile # You can also enable torch.compile to accelerate inference speed by approximately 15%
# --dp-size 2 # If using multiple GPUs, increase throughput using sglang's multi-GPU parallel mode
# --tp-size 2 # If you have more than one GPU, you can expand available VRAM using tensor parallelism (TP) mode.
# --mem-fraction-static 0.5 # If running on a single GPU and encountering VRAM shortage, reduce the KV cache size by this parameter, if VRAM issues persist, try lowering it further to `0.4` or below.
# --data-parallel-size 2 # If using multiple GPUs, increase throughput using vllm's multi-GPU parallel mode
# --gpu-memory-utilization 0.5 # If running on a single GPU and encountering VRAM shortage, reduce the KV cache size by this parameter, if VRAM issues persist, try lowering it further to `0.4` or below.
ulimits:
memlock: -1
stack: 67108864
@@ -31,7 +29,7 @@ services:
capabilities: [gpu]
mineru-api:
image: mineru-sglang:latest
image: mineru-vllm:latest
container_name: mineru-api
restart: always
profiles: ["api"]
@@ -43,11 +41,9 @@ services:
command:
--host 0.0.0.0
--port 8000
# parameters for sglang-engine
# --enable-torch-compile # You can also enable torch.compile to accelerate inference speed by approximately 15%
# --dp-size 2 # If using multiple GPUs, increase throughput using sglang's multi-GPU parallel mode
# --tp-size 2 # If you have more than one GPU, you can expand available VRAM using tensor parallelism (TP) mode.
# --mem-fraction-static 0.5 # If running on a single GPU and encountering VRAM shortage, reduce the KV cache size by this parameter, if VRAM issues persist, try lowering it further to `0.4` or below.
# parameters for vllm-engine
# --data-parallel-size 2 # If using multiple GPUs, increase throughput using vllm's multi-GPU parallel mode
# --gpu-memory-utilization 0.5 # If running on a single GPU and encountering VRAM shortage, reduce the KV cache size by this parameter, if VRAM issues persist, try lowering it further to `0.4` or below.
ulimits:
memlock: -1
stack: 67108864
@@ -61,7 +57,7 @@ services:
capabilities: [ gpu ]
mineru-gradio:
image: mineru-sglang:latest
image: mineru-vllm:latest
container_name: mineru-gradio
restart: always
profiles: ["gradio"]
@@ -73,14 +69,12 @@ services:
command:
--server-name 0.0.0.0
--server-port 7860
--enable-sglang-engine true # Enable the sglang engine for Gradio
--enable-vllm-engine true # Enable the vllm engine for Gradio
# --enable-api false # If you want to disable the API, set this to false
# --max-convert-pages 20 # If you want to limit the number of pages for conversion, set this to a specific number
# parameters for sglang-engine
# --enable-torch-compile # You can also enable torch.compile to accelerate inference speed by approximately 15%
# --dp-size 2 # If using multiple GPUs, increase throughput using sglang's multi-GPU parallel mode
# --tp-size 2 # If you have more than one GPU, you can expand available VRAM using tensor parallelism (TP) mode.
# --mem-fraction-static 0.5 # If running on a single GPU and encountering VRAM shortage, reduce the KV cache size by this parameter, if VRAM issues persist, try lowering it further to `0.4` or below.
# parameters for vllm-engine
# --data-parallel-size 2 # If using multiple GPUs, increase throughput using vllm's multi-GPU parallel mode
# --gpu-memory-utilization 0.5 # If running on a single GPU and encountering VRAM shortage, reduce the KV cache size by this parameter, if VRAM issues persist, try lowering it further to `0.4` or below.
ulimits:
memlock: -1
stack: 67108864

@@ -1,7 +1,9 @@
# Use the official sglang image
FROM lmsysorg/sglang:v0.4.10.post2-cu126
# For blackwell GPU, use the following line instead:
# FROM lmsysorg/sglang:v0.4.10.post2-cu128-b200
# Use the official vllm image for gpu with Ampere architecture and above (Compute Capability>=8.0)
# Compute Capability version query (https://developer.nvidia.com/cuda-gpus)
FROM vllm/vllm-openai:v0.10.1.1
# Use the official vllm image for gpu with Turing architecture and below (Compute Capability<8.0)
# FROM vllm/vllm-openai:v0.10.2
# Install libgl for opencv support & Noto fonts for Chinese characters
RUN apt-get update && \

@@ -15,18 +15,6 @@ For unresolved problems, join our [Discord](https://discord.gg/Tdedn9GTXq) or [W
Reference: [#388](https://github.com/opendatalab/MinerU/issues/388)
??? question "Error when installing MinerU on CentOS 7 or Ubuntu 18: `ERROR: Failed building wheel for simsimd`"
The new version of albumentations (1.4.21) introduces a dependency on simsimd. Since the pre-built package of simsimd for Linux requires a glibc version greater than or equal to 2.28, this causes installation issues on some Linux distributions released before 2019. You can resolve this issue by using the following command:
```
conda create -n mineru python=3.11 -y
conda activate mineru
pip install -U "mineru[pipeline_old_linux]"
```
Reference: [#1004](https://github.com/opendatalab/MinerU/issues/1004)
??? question "Missing text information in parsing results when installing and using on Linux systems."
MinerU uses `pypdfium2` instead of `pymupdf` as the PDF page rendering engine in versions >=2.0 to resolve AGPLv3 license issues. On some Linux distributions, due to missing CJK fonts, some text may be lost during the process of rendering PDFs to images.
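On Debian/Ubuntu systems this can usually be resolved by installing CJK fonts — a sketch of the setup commands (package names are for Debian-family distributions; other distros use different package managers and names):

```shell
# Install Noto CJK fonts so pypdfium2 can render Chinese/Japanese/Korean glyphs
sudo apt-get update
sudo apt-get install -y fonts-noto-core fonts-noto-cjk
```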

@@ -19,7 +19,8 @@
[![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAF8AAABYCAMAAACkl9t/AAAAk1BMVEVHcEz/nQv/nQv/nQr/nQv/nQr/nQv/nQv/nQr/wRf/txT/pg7/yRr/rBD/zRz/ngv/oAz/zhz/nwv/txT/ngv/0B3+zBz/nQv/0h7/wxn/vRb/thXkuiT/rxH/pxD/ogzcqyf/nQvTlSz/czCxky7/SjifdjT/Mj3+Mj3wMj15aTnDNz+DSD9RTUBsP0FRO0Q6O0WyIxEIAAAAGHRSTlMADB8zSWF3krDDw8TJ1NbX5efv8ff9/fxKDJ9uAAAGKklEQVR42u2Z63qjOAyGC4RwCOfB2JAGqrSb2WnTw/1f3UaWcSGYNKTdf/P+mOkTrE+yJBulvfvLT2A5ruenaVHyIks33npl/6C4s/ZLAM45SOi/1FtZPyFur1OYofBX3w7d54Bxm+E8db+nDr12ttmESZ4zludJEG5S7TO72YPlKZFyE+YCYUJTBZsMiNS5Sd7NlDmKM2Eg2JQg8awbglfqgbhArjxkS7dgp2RH6hc9AMLdZYUtZN5DJr4molC8BfKrEkPKEnEVjLbgW1fLy77ZVOJagoIcLIl+IxaQZGjiX597HopF5CkaXVMDO9Pyix3AFV3kw4lQLCbHuMovz8FallbcQIJ5Ta0vks9RnolbCK84BtjKRS5uA43hYoZcOBGIG2Epbv6CvFVQ8m8loh66WNySsnN7htL58LNp+NXT8/PhXiBXPMjLSxtwp8W9f/1AngRierBkA+kk/IpUSOeKByzn8y3kAAAfh//0oXgV4roHm/kz4E2z//zRc3/lgwBzbM2mJxQEa5pqgX7d1L0htrhx7LKxOZlKbwcAWyEOWqYSI8YPtgDQVjpB5nvaHaSnBaQSD6hweDi8PosxD6/PT09YY3xQA7LTCTKfYX+QHpA0GCcqmEHvr/cyfKQTEuwgbs2kPxJEB0iNjfJcCTPyocx+A0griHSmADiC91oNGVwJ69RudYe65vJmoqfpul0lrqXadW0jFKH5BKwAeCq+Den7s+3zfRJzA61/Uj/9H/VzLKTx9jFPPdXeeP+L7WEvDLAKAIoF8bPTKT0+TM7W8ePj3Rz/Yn3kOAp2f1Kf0Weony7pn/cPydvhQYV+eFOfmOu7VB/ViPe34/EN3RFHY/yRuT8ddCtMPH/McBAT5s+vRde/gf2c/sPsjLK+m5IBQF5tO+h2tTlBGnP6693JdsvofjOPnnEHkh2TnV/X1fBl9S5zrwuwF8NFrAVJVwCAPTe8gaJlomqlp0pv4Pjn98tJ/t/fL++6unpR1YGC2n/KCoa0tTLoKiEeUPDl94nj+5/Tv3/eT5vBQ60X1S0oZr+IWRR8Ldhu7AlLjPISlJcO9vrFotky9SpzDequlwEir5beYAc0R7D9KS1DXva0jhYRDXoExPdc6yw5GShkZXe9QdO/uOvHofxjrV/TNS6iMJS+4TcSTgk9n5agJdBQbB//IfF/HpvPt3Tbi7b6I6K0R72p6ajryEJrENW2bbeVUGjfgoals4L443c7BEE4mJO2SpbRngxQrAKRudRzGQ8jVOL2qDVjjI8K1gc3TIJ5KiFZ1q+gdsARPB4NQS4AjwVSt72DSoXNyOWUrU5mQ9nRYyjp89Xo7oRI6Bga9QNT1mQ/ptaJq5T/7WcgAZywR/XlPGAUDdet3LE+qS0TI+g+aJU8MIqjo0Kx8Ly+maxLjJmjQ18rA0YCkxLQbUZP1WqdmyQGJLUm7VnQFqodmXSqmRrdVpqdzk5LvmvgtEcW8PMGdaS23EOWyDVbACZzUJPaqMbjDxpA3Qrgl0AikimGDbqmyT8P8NOYiqrldF8rX+YN7TopX4UoHuSCYY7cgX4gHwclQKl1zhx0THf+tCAUValzj
I7Wg9EhptrkIcfIJjA94evOn8B2eHaVzvBrnl2ig0So6hvPaz0IGcOvTHvUIlE2+prqAxLSQxZlU2stql1NqCCLdIiIN/i1DBEHUoElM9dBravbiAnKqgpi4IBkw+utSPIoBijDXJipSVV7MpOEJUAc5Qmm3BnUN+w3hteEieYKfRZSIUcXKMVf0u5wD4EwsUNVvZOtUT7A2GkffHjByWpHqvRBYrTV72a6j8zZ6W0DTE86Hn04bmyWX3Ri9WH7ZU6Q7h+ZHo0nHUAcsQvVhXRDZHChwiyi/hnPuOsSEF6Exk3o6Y9DT1eZ+6cASXk2Y9k+6EOQMDGm6WBK10wOQJCBwren86cPPWUcRAnTVjGcU1LBgs9FURiX/e6479yZcLwCBmTxiawEwrOcleuu12t3tbLv/N4RLYIBhYexm7Fcn4OJcn0+zc+s8/VfPeddZHAGN6TT8eGczHdR/Gts1/MzDkThr23zqrVfAMFT33Nx1RJsx1k5zuWILLnG/vsH+Fv5D4NTVcp1Gzo8AAAAAElFTkSuQmCC&labelColor=white)](https://huggingface.co/spaces/opendatalab/MinerU)
[![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=data:image/svg+xml;base64,PHN2ZyB3aWR0aD0iMjIzIiBoZWlnaHQ9IjIwMCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KCiA8Zz4KICA8dGl0bGU+TGF5ZXIgMTwvdGl0bGU+CiAgPHBhdGggaWQ9InN2Z18xNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTAsODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5eiIvPgogIDxwYXRoIGlkPSJzdmdfMTUiIGZpbGw9IiM2MjRhZmYiIGQ9Im05OS4xNCwxMTUuNDlsMjUuNjUsMGwwLDI1LjY1bC0yNS42NSwwbDAsLTI1LjY1eiIvPgogIDxwYXRoIGlkPSJzdmdfMTYiIGZpbGw9IiM2MjRhZmYiIGQ9Im0xNzYuMDksMTQxLjE0bC0yNS42NDk5OSwwbDAsMjIuMTlsNDcuODQsMGwwLC00Ny44NGwtMjIuMTksMGwwLDI1LjY1eiIvPgogIDxwYXRoIGlkPSJzdmdfMTciIGZpbGw9IiMzNmNmZDEiIGQ9Im0xMjQuNzksODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5eiIvPgogIDxwYXRoIGlkPSJzdmdfMTgiIGZpbGw9IiMzNmNmZDEiIGQ9Im0wLDY0LjE5bDI1LjY1LDBsMCwyNS42NWwtMjUuNjUsMGwwLC0yNS42NXoiLz4KICA8cGF0aCBpZD0ic3ZnXzE5IiBmaWxsPSIjNjI0YWZmIiBkPSJtMTk4LjI4LDg5Ljg0bDI1LjY0OTk5LDBsMCwyNS42NDk5OWwtMjUuNjQ5OTksMGwwLC0yNS42NDk5OXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIwIiBmaWxsPSIjMzZjZmQxIiBkPSJtMTk4LjI4LDY0LjE5bDI1LjY0OTk5LDBsMCwyNS42NWwtMjUuNjQ5OTksMGwwLC0yNS42NXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIxIiBmaWxsPSIjNjI0YWZmIiBkPSJtMTUwLjQ0LDQybDAsMjIuMTlsMjUuNjQ5OTksMGwwLDI1LjY1bDIyLjE5LDBsMCwtNDcuODRsLTQ3Ljg0LDB6Ii8+CiAgPHBhdGggaWQ9InN2Z18yMiIgZmlsbD0iIzM2Y2ZkMSIgZD0ibTczLjQ5LDg5Ljg0bDI1LjY1LDBsMCwyNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIzIiBmaWxsPSIjNjI0YWZmIiBkPSJtNDcuODQsNjQuMTlsMjUuNjUsMGwwLC0yMi4xOWwtNDcuODQsMGwwLDQ3Ljg0bDIyLjE5LDBsMCwtMjUuNjV6Ii8+CiAgPHBhdGggaWQ9InN2Z18yNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTQ3Ljg0LDExNS40OWwtMjIuMTksMGwwLDQ3Ljg0bDQ3Ljg0LDBsMCwtMjIuMTlsLTI1LjY1LDBsMCwtMjUuNjV6Ii8+CiA8L2c+Cjwvc3ZnPg==&labelColor=white)](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/myhloli/a3cb16570ab3cfeadf9d8f0ac91b4fca/mineru_demo.ipynb)
[![arXiv](https://img.shields.io/badge/arXiv-2409.18839-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2409.18839)
[![arXiv](https://img.shields.io/badge/MinerU-Technical%20Report-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2409.18839)
[![arXiv](https://img.shields.io/badge/MinerU2.5-Technical%20Report-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2509.22186)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/opendatalab/MinerU)
<div align="center">

@@ -6,25 +6,23 @@ MinerU provides a convenient Docker deployment method, which helps quickly set u
```bash
wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/global/Dockerfile
docker build -t mineru-sglang:latest -f Dockerfile .
docker build -t mineru-vllm:latest -f Dockerfile .
```
> [!TIP]
> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/global/Dockerfile) uses `lmsysorg/sglang:v0.4.10.post2-cu126` as the base image by default, supporting Turing/Ampere/Ada Lovelace/Hopper platforms.
> If you are using the newer `Blackwell` platform, please modify the base image to `lmsysorg/sglang:v0.4.10.post2-cu128-b200` before executing the build operation.
> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/global/Dockerfile) uses `vllm/vllm-openai:v0.10.1.1` as the base image by default. This version of vLLM v1 engine has limited support for GPU models.
> If you cannot use vLLM accelerated inference on Turing and earlier architecture GPUs, you can resolve this issue by changing the base image to `vllm/vllm-openai:v0.10.2`.
## Docker Description
MinerU's Docker uses `lmsysorg/sglang` as the base image, so it includes the `sglang` inference acceleration framework and necessary dependencies by default. Therefore, on compatible devices, you can directly use `sglang` to accelerate VLM model inference.
MinerU's Docker uses `vllm/vllm-openai` as the base image, so it includes the `vllm` inference acceleration framework and necessary dependencies by default. Therefore, on compatible devices, you can directly use `vllm` to accelerate VLM model inference.
> [!NOTE]
> Requirements for using `sglang` to accelerate VLM model inference:
> Requirements for using `vllm` to accelerate VLM model inference:
>
> - Device must have Turing architecture or later graphics cards with 8GB+ available VRAM.
> - The host machine's graphics driver should support CUDA 12.6 or higher; `Blackwell` platform should support CUDA 12.8 or higher. You can check the driver version using the `nvidia-smi` command.
> - The host machine's graphics driver should support CUDA 12.8 or higher; You can check the driver version using the `nvidia-smi` command.
> - Docker container must have access to the host machine's graphics devices.
>
> If your device doesn't meet the above requirements, you can still use other features of MinerU, but cannot use `sglang` to accelerate VLM model inference, meaning you cannot use the `vlm-sglang-engine` backend or start the `vlm-sglang-server` service.
## Start Docker Container
@@ -33,12 +31,12 @@ docker run --gpus all \
--shm-size 32g \
-p 30000:30000 -p 7860:7860 -p 8000:8000 \
--ipc=host \
-it mineru-sglang:latest \
-it mineru-vllm:latest \
/bin/bash
```
After executing this command, you will enter the Docker container's interactive terminal with some ports mapped for potential services. You can directly run MinerU-related commands within the container to use MinerU's features.
You can also directly start MinerU services by replacing `/bin/bash` with service startup commands. For detailed instructions, please refer to the [Start the service via command](https://opendatalab.github.io/MinerU/usage/quick_usage/#advanced-usage-via-api-webui-sglang-clientserver).
You can also directly start MinerU services by replacing `/bin/bash` with service startup commands. For detailed instructions, please refer to the [Start the service via command](https://opendatalab.github.io/MinerU/usage/quick_usage/#advanced-usage-via-api-webui-http-clientserver).
## Start Services Directly with Docker Compose
```bash
wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml
```
> [!NOTE]
>
>- The `compose.yaml` file contains configurations for multiple services of MinerU, you can choose to start specific services as needed.
>- Different services might have additional parameter configurations, which you can view and edit in the `compose.yaml` file.
>- Due to the pre-allocation of GPU memory by the `vllm` inference acceleration framework, you may not be able to run multiple `vllm` services simultaneously on the same machine. Therefore, ensure that other services that might use GPU memory have been stopped before starting the `vlm-vllm-server` service or using the `vlm-vllm-engine` backend.
---
### Start vllm-server service
connect to `vllm-server` via `vlm-http-client` backend
```bash
docker compose -f compose.yaml --profile vllm-server up -d
```
>[!TIP]
>In another terminal, connect to vllm server via http client (only requires CPU and network, no vllm environment needed)
> ```bash
> mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://<server_ip>:30000
> ```
---


MinerU supports installing extension modules on demand based on different needs.
## Common Scenarios
### Core Functionality Installation
The `core` module is the core dependency of MinerU, containing all functional modules except `vllm`. Installing this module ensures the basic functionality of MinerU works properly.
```bash
uv pip install mineru[core]
```
---
### Using `vllm` to Accelerate VLM Model Inference
The `vllm` module provides acceleration support for VLM model inference, suitable for graphics cards with Turing architecture and later (8GB+ VRAM). Installing this module can significantly improve model inference speed.
In the configuration, `all` includes both `core` and `vllm` modules, so `mineru[all]` and `mineru[core,vllm]` are equivalent.
```bash
uv pip install mineru[all]
```
> [!TIP]
> If exceptions occur during installation of the complete package including vllm, please refer to the [vllm official documentation](https://docs.vllm.ai/en/latest/getting_started/installation/index.html) to try to resolve the issue, or directly use the [Docker](./docker_deployment.md) deployment method.
---
### Installing Lightweight Client to Connect to vllm-server
If you need to install a lightweight client on edge devices to connect to `vllm-server`, you can install the basic mineru package, which is very lightweight and suitable for devices with only CPU and network connectivity.
```bash
uv pip install mineru
```
---
### Using Pipeline Backend on Outdated Linux Systems
If your system is too outdated to meet the dependency requirements of `mineru[core]`, this option can minimally meet MinerU's runtime requirements, suitable for old systems that cannot be upgraded and only need to use the pipeline backend.
```bash
uv pip install mineru[pipeline_old_linux]
```


A WebUI developed based on Gradio, with a simple interface that retains only core parsing functionality.
<td>Parsing Backend</td>
<td>pipeline</td>
<td>vlm-transformers</td>
<td>vlm-vllm</td>
</tr>
<tr>
<td>Operating System</td>
```bash
uv pip install -e .[core]
```
> [!TIP]
> `mineru[core]` includes all core features except `vllm` acceleration, compatible with Windows / Linux / macOS systems, suitable for most users.
> If you need to use `vllm` acceleration for VLM model inference or install a lightweight client on edge devices, please refer to the documentation [Extension Modules Installation Guide](./extension_modules.md).
---


The following sections provide detailed descriptions of each file's purpose and structure.
## Structured Data Files
> [!IMPORTANT]
> The VLM backend output has significant changes in version 2.5 and is not backward-compatible with the pipeline backend. If you plan to build secondary development on structured outputs, please read this document carefully.
### Pipeline Backend Output Results
#### Model Inference Results (model.json)
**File naming format**: `{original_filename}_model.json`
##### Data Structure Definition
```python
from pydantic import BaseModel, Field
class PageInferenceResults(BaseModel):
    ...  # remaining fields elided in this excerpt
inference_result: list[PageInferenceResults] = []
```
##### Coordinate System Description
`poly` coordinate format: `[x0, y0, x1, y1, x2, y2, x3, y3]`
![poly coordinate diagram](../images/poly.png)
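For downstream processing it is often convenient to collapse a `poly` quadrilateral into an axis-aligned rectangle. A minimal sketch (the helper name `poly_to_bbox` is ours, not part of MinerU):

```python
def poly_to_bbox(poly):
    """Collapse an 8-value poly [x0, y0, ..., x3, y3] into an
    axis-aligned [x_min, y_min, x_max, y_max] rectangle."""
    xs = poly[0::2]  # x coordinates of the four corners
    ys = poly[1::2]  # y coordinates of the four corners
    return [min(xs), min(ys), max(xs), max(ys)]
```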
##### Sample Data
```json
[
]
```
#### Intermediate Processing Results (middle.json)
**File naming format**: `{original_filename}_middle.json`
##### Top-level Structure
| Field Name | Type | Description |
|------------|------|-------------|
| `pdf_info` | `list` | Core parsing results, one entry per page |
| `_backend` | `string` | Parsing mode: `pipeline` or `vlm` |
| `_version_name` | `string` | MinerU version number |
##### Page Information Structure (pdf_info)
| Field Name | Description |
|------------|-------------|
| `preproc_blocks` | Unsegmented intermediate results after PDF preprocessing |
| `layout_bboxes` | Layout segmentation results, including layout direction and bounding boxes, sorted by reading order |
| `page_idx` | Page number, starting from 0 |
| `page_size` | Page width and height `[width, height]` |
| `_layout_tree` | Layout tree structure |
| `images` | Image block information list |
| `tables` | Table block information list |
| `interline_equations` | Interline formula block information list |
| `discarded_blocks` | Block information to be discarded |
| `para_blocks` | Content block results after segmentation |
##### Block Structure Hierarchy
```
Level 1 blocks (table | image)
└── Level 2 blocks
    └── Lines
        └── Spans
```
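The hierarchy above can be walked with a few nested loops. A rough sketch for pulling every span out of a loaded `middle.json` dict (the function name and the fallback for blocks that carry `lines` directly instead of nested `blocks` are our assumptions, not a MinerU API):

```python
def iter_spans(middle):
    """Yield (page_idx, block_type, span) for every span in a middle.json
    dict, walking level-1 blocks -> level-2 blocks -> lines -> spans."""
    for page in middle["pdf_info"]:
        for block in page["para_blocks"]:
            # Level-1 table/image blocks nest level-2 blocks under "blocks";
            # plain text/title blocks already carry "lines" directly.
            for sub in block.get("blocks", [block]):
                for line in sub.get("lines", []):
                    for span in line.get("spans", []):
                        yield page["page_idx"], sub["type"], span
```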
##### Level 1 Block Fields
| Field Name | Description |
|------------|-------------|
| `type` | Block type: `table` or `image` |
| `bbox` | Rectangular box coordinates of the block `[x0, y0, x1, y1]` |
| `blocks` | List of contained level 2 blocks |
##### Level 2 Block Fields
| Field Name | Description |
|------------|-------------|
| `type` | Block type |
| `bbox` | Rectangular box coordinates of the block |
| `lines` | List of contained line information |
##### Level 2 Block Types
| Type | Description |
|------|-------------|
| `list` | List block |
| `interline_equation` | Interline formula block |
##### Line and Span Structure
**Line fields**:
- `bbox`: Rectangular box coordinates of the line
- `spans`: List of contained span information

**Span fields**:
- `type`: Span type (`image`, `table`, `text`, `inline_equation`, `interline_equation`)
- `content` | `img_path`: Text content or image path
##### Sample Data
```json
{
}
```
#### Content List (content_list.json)
**File naming format**: `{original_filename}_content_list.json`
##### Functionality
This is a simplified version of `middle.json` that stores all readable content blocks in reading order as a flat structure, removing complex layout information for easier subsequent processing.
##### Content Types
| Type | Description |
|------|-------------|
| `image` | Image |
| `table` | Table |
| `text` | Text/Title |
| `equation` | Interline formula |
##### Text Level Identification
Text levels are distinguished through the `text_level` field:
- No `text_level` field: Body text
- `text_level: 1`: Level 1 heading
- `text_level: 2`: Level 2 heading
- And so on...
##### Common Fields
- All content blocks include a `page_idx` field indicating the page number (starting from 0).
- All content blocks include a `bbox` field representing the bounding box coordinates of the content block `[x0, y0, x1, y1]`, mapped to a range of 0-1000.
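To relate these 0-1000 coordinates back to a real page, scale by the actual page size. A small sketch (the helper name is ours):

```python
def bbox_to_page_coords(bbox, page_width, page_height):
    """Map a content_list.json bbox from the 0-1000 normalized space
    back to absolute page coordinates."""
    x0, y0, x1, y1 = bbox
    return [
        x0 / 1000 * page_width,
        y0 / 1000 * page_height,
        x1 / 1000 * page_width,
        y1 / 1000 * page_height,
    ]
```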
##### Sample Data
```json
[
{
"type": "image",
"img_path": "images/a8ecda1c69b27e4f79fce1589175a9d721cbdc1cf78b4cc06a015f3746f6b9d8.jpg",
"image_caption": [
"Fig. 1. Annual flow duration curves of daily flows from Pine Creek, Australia, 1989-2000. "
],
"image_footnote": [],
"bbox": [
62,
480,
]
```
### VLM Backend Output Results
#### Model Inference Results (model.json)
**File naming format**: `{original_filename}_model.json`
##### File format description
- Two-level nested list: outer list = pages; inner list = content blocks of that page
- Each block is a dict with at least: `type`, `bbox`, `angle`, `content` (some types add extra fields like `score`, `block_tags`, `content_tags`, `format`)
- Designed for direct, raw model inspection
##### Supported content types (type field values)
```json
{
"text": "Plain text",
"title": "Title",
"equation": "Display (interline) formula",
"image": "Image",
"image_caption": "Image caption",
"image_footnote": "Image footnote",
"table": "Table",
"table_caption": "Table caption",
"table_footnote": "Table footnote",
"phonetic": "Phonetic annotation",
"code": "Code block",
"code_caption": "Code caption",
"ref_text": "Reference / citation entry",
"algorithm": "Algorithm block (treated as code subtype)",
"list": "List container",
"header": "Page header",
"footer": "Page footer",
"page_number": "Page number",
"aside_text": "Side / margin note",
"page_footnote": "Page footnote"
}
```
##### Coordinate system
- `bbox` = `[x0, y0, x1, y1]` (top-left, bottom-right)
- Origin at top-left of the page
- All coordinates are normalized percentages in `[0,1]`
##### Sample data
```json
[
[
{
"type": "header",
"bbox": [0.077, 0.095, 0.18, 0.181],
"angle": 0,
"score": null,
"block_tags": null,
"content": "ELSEVIER",
"format": null,
"content_tags": null
},
{
"type": "title",
"bbox": [0.157, 0.228, 0.833, 0.253],
"angle": 0,
"score": null,
"block_tags": null,
"content": "The response of flow duration curves to afforestation",
"format": null,
"content_tags": null
}
]
]
```
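Because this file is plain nested lists, quick sanity checks need no special tooling. For example, tallying block types across all pages (a sketch; load the file with `json.load` first):

```python
from collections import Counter

def block_type_counts(pages):
    """Count content-block types in a loaded VLM model.json:
    the outer list holds pages, each page is a list of block dicts."""
    return Counter(block["type"] for page in pages for block in page)
```

Usage would look like `block_type_counts(json.load(open("doc_model.json", encoding="utf-8")))`.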
#### Intermediate Processing Results (middle.json)
**File naming format**: `{original_filename}_middle.json`
Structure is broadly similar to the pipeline backend, but with these differences:
- `list` becomes a second-level block; a new field `sub_type` distinguishes list categories:
* `text`: ordinary list
* `ref_text`: reference / bibliography style list
- New `code` block type with `sub_type` (a code block always has at least a `code_body` and may optionally have a `code_caption`):
* `code`
* `algorithm`
- `discarded_blocks` may contain additional types:
* `header`
* `footer`
* `page_number`
* `aside_text`
* `page_footnote`
- All blocks include an `angle` field indicating rotation (one of `0, 90, 180, 270`).
##### Examples
- Example: list block
```json
{
"bbox": [174,155,818,333],
"type": "list",
"angle": 0,
"index": 11,
"blocks": [
{
"bbox": [174,157,311,175],
"type": "text",
"angle": 0,
"lines": [
{
"bbox": [174,157,311,175],
"spans": [
{
"bbox": [174,157,311,175],
"type": "text",
"content": "H.1 Introduction"
}
]
}
],
"index": 3
},
{
"bbox": [175,182,464,229],
"type": "text",
"angle": 0,
"lines": [
{
"bbox": [175,182,464,229],
"spans": [
{
"bbox": [175,182,464,229],
"type": "text",
"content": "H.2 Example: Divide by Zero without Exception Handling"
}
]
}
],
"index": 4
}
],
"sub_type": "text"
}
```
- Example: code block with optional caption:
```json
{
"type": "code",
"bbox": [114,780,885,1231],
"blocks": [
{
"bbox": [114,780,885,1231],
"lines": [
{
"bbox": [114,780,885,1231],
"spans": [
{
"bbox": [114,780,885,1231],
"type": "text",
"content": "1 // Fig. H.1: DivideByZeroNoExceptionHandling.java \n2 // Integer division without exception handling. \n3 import java.util.Scanner; \n4 \n5 public class DivideByZeroNoExceptionHandling \n6 { \n7 // demonstrates throwing an exception when a divide-by-zero occurs \n8 public static int quotient( int numerator, int denominator ) \n9 { \n10 return numerator / denominator; // possible division by zero \n11 } // end method quotient \n12 \n13 public static void main(String[] args) \n14 { \n15 Scanner scanner = new Scanner(System.in); // scanner for input \n16 \n17 System.out.print(\"Please enter an integer numerator: \"); \n18 int numerator = scanner.nextInt(); \n19 System.out.print(\"Please enter an integer denominator: \"); \n20 int denominator = scanner.nextInt(); \n21"
}
]
}
],
"index": 17,
"angle": 0,
"type": "code_body"
},
{
"bbox": [867,160,1280,189],
"lines": [
{
"bbox": [867,160,1280,189],
"spans": [
{
"bbox": [867,160,1280,189],
"type": "text",
"content": "Algorithm 1 Modules for MCTSteg"
}
]
}
],
"index": 19,
"angle": 0,
"type": "code_caption"
}
],
"index": 17,
"sub_type": "code"
}
```
#### Content List (content_list.json)
**File naming format**: `{original_filename}_content_list.json`
Based on the pipeline format, with these VLM-specific extensions:
- New `code` type with `sub_type` (`code` | `algorithm`):
* Fields: `code_body` (string), optional `code_caption` (list of strings)
- New `list` type with `sub_type` (`text` | `ref_text`):
* Field: `list_items` (array of strings)
- All `discarded_blocks` entries are also output (e.g., headers, footers, page numbers, margin notes, page footnotes).
- Existing types (`image`, `table`, `text`, `equation`) remain unchanged.
- `bbox` still uses the 0-1000 normalized coordinate mapping.
##### Examples
Example: code (algorithm) entry
```json
{
"type": "code",
"sub_type": "algorithm",
"code_caption": ["Algorithm 1 Modules for MCTSteg"],
"code_body": "1: function GETCOORDINATE(d) \n2: $x \\gets d / l$ , $y \\gets d$ mod $l$ \n3: return $(x, y)$ \n4: end function \n5: function BESTCHILD(v) \n6: $C \\gets$ child set of $v$ \n7: $v' \\gets \\arg \\max_{c \\in C} \\mathrm{UCTScore}(c)$ \n8: $v'.n \\gets v'.n + 1$ \n9: return $v'$ \n10: end function \n11: function BACK PROPAGATE(v) \n12: Calculate $R$ using Equation 11 \n13: while $v$ is not a root node do \n14: $v.r \\gets v.r + R$ , $v \\gets v.p$ \n15: end while \n16: end function \n17: function RANDOMSEARCH(v) \n18: while $v$ is not a leaf node do \n19: Randomly select an untried action $a \\in A(v)$ \n20: Create a new node $v'$ \n21: $(x, y) \\gets \\mathrm{GETCOORDINATE}(v'.d)$ \n22: $v'.p \\gets v$ , $v'.d \\gets v.d + 1$ , $v'.\\Gamma \\gets v.\\Gamma$ \n23: $v'.\\gamma_{x,y} \\gets a$ \n24: if $a = -1$ then \n25: $v.lc \\gets v'$ \n26: else if $a = 0$ then \n27: $v.mc \\gets v'$ \n28: else \n29: $v.rc \\gets v'$ \n30: end if \n31: $v \\gets v'$ \n32: end while \n33: return $v$ \n34: end function \n35: function SEARCH(v) \n36: while $v$ is fully expanded do \n37: $v \\gets$ BESTCHILD(v) \n38: end while \n39: if $v$ is not a leaf node then \n40: $v \\gets$ RANDOMSEARCH(v) \n41: end if \n42: return $v$ \n43: end function",
"bbox": [510,87,881,740],
"page_idx": 0
}
```
Example: list (text) entry
```json
{
"type": "list",
"sub_type": "text",
"list_items": [
"H.1 Introduction",
"H.2 Example: Divide by Zero without Exception Handling",
"H.3 Example: Divide by Zero with Exception Handling",
"H.4 Summary"
],
"bbox": [174,155,818,333],
"page_idx": 0
}
```
Example: discarded blocks output
```json
[
{
"type": "header",
"text": "Journal of Hydrology 310 (2005) 253-265",
"bbox": [363,164,623,177],
"page_idx": 0
},
{
"type": "page_footnote",
"text": "* Corresponding author. Address: Forest Science Centre, Department of Sustainability and Environment, P.O. Box 137, Heidelberg, Vic. 3084, Australia. Tel.: +61 3 9450 8719; fax: +61 3 9450 8644.",
"bbox": [71,815,915,841],
"page_idx": 0
}
]
```
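These extended entries are straightforward to fold back into Markdown. A sketch handling just the two new types, with field access mirroring the samples above (anything else falls back to its `text` field; this is our illustration, not a MinerU helper):

```python
def entry_to_markdown(entry):
    """Render one VLM content_list.json entry as a Markdown snippet.
    Only the new code and list types are handled here."""
    fence = "`" * 3  # build the code fence without embedding literal backticks
    if entry["type"] == "code":
        caption = " ".join(entry.get("code_caption", []))
        block = f"{fence}\n{entry.get('code_body', '')}\n{fence}"
        return f"{caption}\n\n{block}" if caption else block
    if entry["type"] == "list":
        return "\n".join(f"- {item}" for item in entry.get("list_items", []))
    return entry.get("text", "")
```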
## Summary
The above files constitute MinerU's complete output results. Users can choose appropriate files for subsequent processing based on their needs:
- **Model outputs** (use raw outputs):
    * model.json
- **Debugging and verification** (use visualization files):
    * layout.pdf
    * spans.pdf
- **Content extraction** (use simplified files):
    * *.md
    * content_list.json
- **Secondary development** (use structured files):
    * middle.json


# Advanced Command Line Parameters
## vllm Acceleration Parameter Optimization
### Performance Optimization Parameters
> [!TIP]
> If you can already use vllm normally for accelerated VLM model inference but still want to further improve inference speed, you can try the following parameters:
>
> - If you have multiple graphics cards, you can use vllm's multi-card parallel mode to increase throughput: `--data-parallel-size 2`
### Parameter Passing Instructions
> [!TIP]
> - All officially supported vllm parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-vllm-server`, `mineru-gradio`, `mineru-api`
> - If you want to learn more about `vllm` parameter usage, please refer to the [vllm official documentation](https://docs.vllm.ai/en/latest/cli/serve.html)
## GPU Device Selection and Configuration
> [!TIP]
> You can specify visible GPU devices by setting the `CUDA_VISIBLE_DEVICES` environment variable before the command:
> ```bash
> CUDA_VISIBLE_DEVICES=1 mineru -p <input_path> -o <output_path>
> ```
> - This specification method is effective for all command line calls, including `mineru`, `mineru-vllm-server`, `mineru-gradio`, and `mineru-api`, and applies to both `pipeline` and `vlm` backends.
### Common Device Configuration Examples
> [!TIP]
> [!TIP]
> Here are some possible usage scenarios:
>
> - If you have multiple graphics cards and need to specify cards 0 and 1, using multi-card parallelism to start `vllm-server`, you can use the following command:
> ```bash
> CUDA_VISIBLE_DEVICES=0,1 mineru-vllm-server --port 30000 --data-parallel-size 2
> ```
>
> - If you have multiple graphics cards and need to start two `fastapi` services on cards 0 and 1, listening on different ports respectively, you can use the following commands:


Options:
-p, --path PATH Input file path or directory (required)
-o, --output PATH Output directory (required)
-m, --method [auto|txt|ocr] Parsing method: auto (default), txt, ocr (pipeline backend only)
-b, --backend [pipeline|vlm-transformers|vlm-vllm-engine|vlm-http-client]
Parsing backend (default: pipeline)
-l, --lang [ch|ch_server|ch_lite|en|korean|japan|chinese_cht|ta|te|ka|th|el|latin|arabic|east_slavic|cyrillic|devanagari]
Specify document language (improves OCR accuracy, pipeline backend only)
-u, --url TEXT Service address when using http-client
-s, --start INTEGER Starting page number for parsing (0-based)
-e, --end INTEGER Ending page number for parsing (0-based)
-f, --formula BOOLEAN Enable formula parsing (default: enabled)
files to be input need to be placed in the
`example` folder within the directory where
the command is currently executed.
--enable-vllm-engine BOOLEAN Enable vllm engine backend for faster
processing.
--enable-api BOOLEAN Enable gradio API for serving the
application.
Some parameters of MinerU command line tools have equivalent environment variable configurations. Generally, environment variable configurations have higher priority than command line parameters and take effect across all command line tools.
Here are the environment variables and their descriptions:
- `MINERU_DEVICE_MODE`:
    * Specifies the inference device
    * Supports device types such as `cpu/cuda/cuda:0/npu/mps`
    * Only effective for the `pipeline` backend
- `MINERU_VIRTUAL_VRAM_SIZE`:
    * Specifies the maximum GPU VRAM usage per process (GB)
    * Only effective for the `pipeline` backend
- `MINERU_MODEL_SOURCE`:
    * Specifies the model source
    * Supports `huggingface/modelscope/local`
    * Defaults to `huggingface`; can be switched to `modelscope` or local models via this variable
- `MINERU_TOOLS_CONFIG_JSON`:
    * Specifies the configuration file path
    * Defaults to `mineru.json` in the user directory; other paths can be specified via this variable
- `MINERU_FORMULA_ENABLE`:
    * Enables formula parsing
    * Defaults to `true`; can be set to `false` to disable formula parsing
- `MINERU_FORMULA_CH_SUPPORT`:
    * Enables Chinese formula parsing optimization (experimental feature)
    * Defaults to `false`; can be set to `true` to enable the optimization
    * Only effective for the `pipeline` backend
- `MINERU_TABLE_ENABLE`:
    * Enables table parsing
    * Defaults to `true`; can be set to `false` to disable table parsing
- `MINERU_TABLE_MERGE_ENABLE`:
    * Enables table merging functionality
    * Defaults to `true`; can be set to `false` to disable table merging
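As an illustration, the variables above are set like any other environment variables before invoking the CLI. The values below are examples, not recommendations:

```shell
# Pull models from ModelScope and disable table merging for subsequent
# runs in this shell session; variable names are from the list above.
export MINERU_MODEL_SOURCE=modelscope
export MINERU_TABLE_MERGE_ENABLE=false
# any later `mineru` invocation in this shell now picks these up
```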


```bash
mineru -p <input_path> -o <output_path> -b vlm-transformers
```
> [!TIP]
> The vlm backend additionally supports `vllm` acceleration. Compared to the `transformers` backend, `vllm` can achieve 20-30x speedup. You can check the installation method for the complete package supporting `vllm` acceleration in the [Extension Modules Installation Guide](../quick_start/extension_modules.md).
If you need to adjust parsing options through custom parameters, you can also check the more detailed [Command Line Tools Usage Instructions](./cli_tools.md) in the documentation.
## Advanced Usage via API, WebUI, http-client/server
- Direct Python API calls: [Python Usage Example](https://github.com/opendatalab/MinerU/blob/master/demo/demo.py)
- FastAPI calls:
>Access `http://127.0.0.1:8000/docs` in your browser to view the API documentation.
- Start Gradio WebUI visual frontend:
```bash
# Using pipeline/vlm-transformers/vlm-http-client backends
mineru-gradio --server-name 0.0.0.0 --server-port 7860
# Or using vlm-vllm-engine/pipeline backends (requires vllm environment)
mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-vllm-engine true
```
>[!TIP]
>
>- Access `http://127.0.0.1:7860` in your browser to use the Gradio WebUI.
>- Access `http://127.0.0.1:7860/?view=api` to use the Gradio API.
- Using `http-client/server` method:
```bash
# Start vllm server (requires vllm environment)
mineru-vllm-server --port 30000
```
>[!TIP]
>In another terminal, connect to vllm server via http client (only requires CPU and network, no vllm environment needed)
> ```bash
> mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://127.0.0.1:30000
> ```
> [!NOTE]
> All officially supported vllm parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-vllm-server`, `mineru-gradio`, `mineru-api`.
> We have compiled some commonly used parameters and usage methods for `vllm`, which can be found in the documentation [Advanced Command Line Parameters](./advanced_cli_parameters.md).
## Extending MinerU Functionality with Configuration Files
Here are some available configuration options:
- `latex-delimiter-config`:
* Used to configure LaTeX formula delimiters
* Defaults to `$` symbol, can be modified to other symbols or strings as needed.
- `llm-aided-config`:
* Used to configure parameters for LLM-assisted title hierarchy
* Compatible with all LLM models supporting `openai protocol`, defaults to using Alibaba Cloud Bailian's `qwen2.5-32b-instruct` model.
* You need to configure your own API key and set `enable` to `true` to enable this feature.
- `models-dir`:
* Used to specify local model storage directory
* Please specify model directories for `pipeline` and `vlm` backends separately.
* After specifying the directory, you can use local models by configuring the environment variable `export MINERU_MODEL_SOURCE=local`.
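These options live in the user configuration file (`mineru.json`). As a minimal sketch of how they might be combined — the nested keys and values below are illustrative placeholders, not authoritative defaults:

```json
{
    "latex-delimiter-config": {
        "display": {"left": "$$", "right": "$$"},
        "inline": {"left": "$", "right": "$"}
    },
    "llm-aided-config": {
        "title_aided": {
            "api_key": "your_api_key",
            "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
            "model": "qwen2.5-32b-instruct",
            "enable": true
        }
    },
    "models-dir": {
        "pipeline": "/path/to/pipeline/models",
        "vlm": "/path/to/vlm/models"
    }
}
```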


Reference: [#388](https://github.com/opendatalab/MinerU/issues/388)
??? question "Error `ERROR: Failed building wheel for simsimd` when installing MinerU on CentOS 7 or Ubuntu 18"
    Newer albumentations releases (1.4.21) introduced a dependency on simsimd. Because the prebuilt simsimd packages for Linux require glibc >= 2.28, some Linux distributions released before 2019 cannot install it normally. It can be installed with:
    ```
    conda create -n mineru python=3.11 -y
    conda activate mineru
    pip install -U "mineru[pipeline_old_linux]"
    ```
    Reference: [#1004](https://github.com/opendatalab/MinerU/issues/1004)
??? question "When installed and used on Linux, some text is missing from the parsing results."
    In versions >= 2.0, MinerU uses `pypdfium2` instead of `pymupdf` as the PDF page rendering engine to resolve the AGPLv3 licensing issue. On some Linux distributions that lack CJK fonts, some text may be lost while rendering PDF pages into images.


[![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=data:image/svg+xml;base64,PHN2ZyB3aWR0aD0iMjIzIiBoZWlnaHQ9IjIwMCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KCiA8Zz4KICA8dGl0bGU+TGF5ZXIgMTwvdGl0bGU+CiAgPHBhdGggaWQ9InN2Z18xNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTAsODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5eiIvPgogIDxwYXRoIGlkPSJzdmdfMTUiIGZpbGw9IiM2MjRhZmYiIGQ9Im05OS4xNCwxMTUuNDlsMjUuNjUsMGwwLDI1LjY1bC0yNS42NSwwbDAsLTI1LjY1eiIvPgogIDxwYXRoIGlkPSJzdmdfMTYiIGZpbGw9IiM2MjRhZmYiIGQ9Im0xNzYuMDksMTQxLjE0bC0yNS42NDk5OSwwbDAsMjIuMTlsNDcuODQsMGwwLC00Ny44NGwtMjIuMTksMGwwLDI1LjY1eiIvPgogIDxwYXRoIGlkPSJzdmdfMTciIGZpbGw9IiMzNmNmZDEiIGQ9Im0xMjQuNzksODkuODRsMjUuNjUsMGwwLDI1LjY0OTk5bC0yNS42NSwwbDAsLTI1LjY0OTk5eiIvPgogIDxwYXRoIGlkPSJzdmdfMTgiIGZpbGw9IiMzNmNmZDEiIGQ9Im0wLDY0LjE5bDI1LjY1LDBsMCwyNS42NWwtMjUuNjUsMGwwLC0yNS42NXoiLz4KICA8cGF0aCBpZD0ic3ZnXzE5IiBmaWxsPSIjNjI0YWZmIiBkPSJtMTk4LjI4LDg5Ljg0bDI1LjY0OTk5LDBsMCwyNS42NDk5OWwtMjUuNjQ5OTksMGwwLC0yNS42NDk5OXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIwIiBmaWxsPSIjMzZjZmQxIiBkPSJtMTk4LjI4LDY0LjE5bDI1LjY0OTk5LDBsMCwyNS42NWwtMjUuNjQ5OTksMGwwLC0yNS42NXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIxIiBmaWxsPSIjNjI0YWZmIiBkPSJtMTUwLjQ0LDQybDAsMjIuMTlsMjUuNjQ5OTksMGwwLDI1LjY1bDIyLjE5LDBsMCwtNDcuODRsLTQ3Ljg0LDB6Ii8+CiAgPHBhdGggaWQ9InN2Z18yMiIgZmlsbD0iIzM2Y2ZkMSIgZD0ibTczLjQ5LDg5Ljg0bDI1LjY1LDBsMCwyNS42NDk5OWwtMjUuNjUsMGwwLC0yNS42NDk5OXoiLz4KICA8cGF0aCBpZD0ic3ZnXzIzIiBmaWxsPSIjNjI0YWZmIiBkPSJtNDcuODQsNjQuMTlsMjUuNjUsMGwwLC0yMi4xOWwtNDcuODQsMGwwLDQ3Ljg0bDIyLjE5LDBsMCwtMjUuNjV6Ii8+CiAgPHBhdGggaWQ9InN2Z18yNCIgZmlsbD0iIzYyNGFmZiIgZD0ibTQ3Ljg0LDExNS40OWwtMjIuMTksMGwwLDQ3Ljg0bDQ3Ljg0LDBsMCwtMjIuMTlsLTI1LjY1LDBsMCwtMjUuNjV6Ii8+CiA8L2c+Cjwvc3ZnPg==&labelColor=white)](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
[![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAF8AAABYCAMAAACkl9t/AAAAk1BMVEVHcEz/nQv/nQv/nQr/nQv/nQr/nQv/nQv/nQr/wRf/txT/pg7/yRr/rBD/zRz/ngv/oAz/zhz/nwv/txT/ngv/0B3+zBz/nQv/0h7/wxn/vRb/thXkuiT/rxH/pxD/ogzcqyf/nQvTlSz/czCxky7/SjifdjT/Mj3+Mj3wMj15aTnDNz+DSD9RTUBsP0FRO0Q6O0WyIxEIAAAAGHRSTlMADB8zSWF3krDDw8TJ1NbX5efv8ff9/fxKDJ9uAAAGKklEQVR42u2Z63qjOAyGC4RwCOfB2JAGqrSb2WnTw/1f3UaWcSGYNKTdf/P+mOkTrE+yJBulvfvLT2A5ruenaVHyIks33npl/6C4s/ZLAM45SOi/1FtZPyFur1OYofBX3w7d54Bxm+E8db+nDr12ttmESZ4zludJEG5S7TO72YPlKZFyE+YCYUJTBZsMiNS5Sd7NlDmKM2Eg2JQg8awbglfqgbhArjxkS7dgp2RH6hc9AMLdZYUtZN5DJr4molC8BfKrEkPKEnEVjLbgW1fLy77ZVOJagoIcLIl+IxaQZGjiX597HopF5CkaXVMDO9Pyix3AFV3kw4lQLCbHuMovz8FallbcQIJ5Ta0vks9RnolbCK84BtjKRS5uA43hYoZcOBGIG2Epbv6CvFVQ8m8loh66WNySsnN7htL58LNp+NXT8/PhXiBXPMjLSxtwp8W9f/1AngRierBkA+kk/IpUSOeKByzn8y3kAAAfh//0oXgV4roHm/kz4E2z//zRc3/lgwBzbM2mJxQEa5pqgX7d1L0htrhx7LKxOZlKbwcAWyEOWqYSI8YPtgDQVjpB5nvaHaSnBaQSD6hweDi8PosxD6/PT09YY3xQA7LTCTKfYX+QHpA0GCcqmEHvr/cyfKQTEuwgbs2kPxJEB0iNjfJcCTPyocx+A0griHSmADiC91oNGVwJ69RudYe65vJmoqfpul0lrqXadW0jFKH5BKwAeCq+Den7s+3zfRJzA61/Uj/9H/VzLKTx9jFPPdXeeP+L7WEvDLAKAIoF8bPTKT0+TM7W8ePj3Rz/Yn3kOAp2f1Kf0Weony7pn/cPydvhQYV+eFOfmOu7VB/ViPe34/EN3RFHY/yRuT8ddCtMPH/McBAT5s+vRde/gf2c/sPsjLK+m5IBQF5tO+h2tTlBGnP6693JdsvofjOPnnEHkh2TnV/X1fBl9S5zrwuwF8NFrAVJVwCAPTe8gaJlomqlp0pv4Pjn98tJ/t/fL++6unpR1YGC2n/KCoa0tTLoKiEeUPDl94nj+5/Tv3/eT5vBQ60X1S0oZr+IWRR8Ldhu7AlLjPISlJcO9vrFotky9SpzDequlwEir5beYAc0R7D9KS1DXva0jhYRDXoExPdc6yw5GShkZXe9QdO/uOvHofxjrV/TNS6iMJS+4TcSTgk9n5agJdBQbB//IfF/HpvPt3Tbi7b6I6K0R72p6ajryEJrENW2bbeVUGjfgoals4L443c7BEE4mJO2SpbRngxQrAKRudRzGQ8jVOL2qDVjjI8K1gc3TIJ5KiFZ1q+gdsARPB4NQS4AjwVSt72DSoXNyOWUrU5mQ9nRYyjp89Xo7oRI6Bga9QNT1mQ/ptaJq5T/7WcgAZywR/XlPGAUDdet3LE+qS0TI+g+aJU8MIqjo0Kx8Ly+maxLjJmjQ18rA0YCkxLQbUZP1WqdmyQGJLUm7VnQFqodmXSqmRrdVpqdzk5LvmvgtEcW8PMGdaS23EOWyDVbACZzUJPaqMbjDxpA3Qrgl0AikimGDbqmyT8P8NOYiqrldF8rX+YN7TopX4UoHuSCYY7cgX4gHwclQKl1zhx0THf+tCAUValzj
I7Wg9EhptrkIcfIJjA94evOn8B2eHaVzvBrnl2ig0So6hvPaz0IGcOvTHvUIlE2+prqAxLSQxZlU2stql1NqCCLdIiIN/i1DBEHUoElM9dBravbiAnKqgpi4IBkw+utSPIoBijDXJipSVV7MpOEJUAc5Qmm3BnUN+w3hteEieYKfRZSIUcXKMVf0u5wD4EwsUNVvZOtUT7A2GkffHjByWpHqvRBYrTV72a6j8zZ6W0DTE86Hn04bmyWX3Ri9WH7ZU6Q7h+ZHo0nHUAcsQvVhXRDZHChwiyi/hnPuOsSEF6Exk3o6Y9DT1eZ+6cASXk2Y9k+6EOQMDGm6WBK10wOQJCBwren86cPPWUcRAnTVjGcU1LBgs9FURiX/e6479yZcLwCBmTxiawEwrOcleuu12t3tbLv/N4RLYIBhYexm7Fcn4OJcn0+zc+s8/VfPeddZHAGN6TT8eGczHdR/Gts1/MzDkThr23zqrVfAMFT33Nx1RJsx1k5zuWILLnG/vsH+Fv5D4NTVcp1Gzo8AAAAAElFTkSuQmCC&labelColor=white)](https://huggingface.co/spaces/opendatalab/MinerU)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/myhloli/a3cb16570ab3cfeadf9d8f0ac91b4fca/mineru_demo.ipynb)
[![arXiv](https://img.shields.io/badge/MinerU-Technical%20Report-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2409.18839)
[![arXiv](https://img.shields.io/badge/MinerU2.5-Technical%20Report-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2509.22186)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/opendatalab/MinerU)
<div align="center">


```bash
wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/china/Dockerfile
docker build -t mineru-vllm:latest -f Dockerfile .
```
> [!TIP]
> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/china/Dockerfile) uses `vllm/vllm-openai:v0.10.1.1` as the base image by default.
> This version's vLLM v1 engine has limited GPU model support; if you cannot use vLLM-accelerated inference on Turing or earlier GPU architectures, change the base image to `vllm/vllm-openai:v0.10.2` to resolve the issue.
## Docker Notes
MinerU's docker image uses `vllm/vllm-openai` as its base image, so the `vllm` inference acceleration framework and its required dependencies are integrated by default. On devices that meet the requirements, you can therefore use `vllm` to accelerate VLM model inference directly.
> [!NOTE]
> The requirements for using `vllm` to accelerate VLM model inference are:
>
> - The device has a GPU with Turing architecture or later and at least 8 GB of available VRAM.
> - The host machine's GPU driver supports CUDA 12.8 or later; the driver version can be checked with `nvidia-smi`.
> - The host machine's GPU devices are accessible from inside docker.
>
## Starting the Docker Container
```bash
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 -p 7860:7860 -p 8000:8000 \
--ipc=host \
-it mineru-vllm:latest \
/bin/bash
```
After executing this command, you will enter the Docker container's interactive terminal, with several ports mapped for services you may use; you can run MinerU commands directly inside the container to use its features.
You can also replace `/bin/bash` with a service startup command to launch a MinerU service directly; see [Starting services via commands](https://opendatalab.github.io/MinerU/zh/usage/quick_usage/#apiwebuihttp-clientserver) for details.
## Starting Services Directly via Docker Compose
```bash
wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml
```
>[!TIP]
>
>- The `compose.yaml` file contains configurations for multiple MinerU services; you can choose to start specific services as needed.
>- Different services may have additional parameter configurations, which you can view and edit in the `compose.yaml` file.
>- Because the `vllm` inference acceleration framework preallocates VRAM, you may not be able to run multiple `vllm` services on the same machine simultaneously. Make sure any other services that may use VRAM have been stopped before starting the `vlm-vllm-server` service or using the `vlm-vllm-engine` backend.
---
### Starting the vllm-server Service
and connecting to `vllm-server` via the `vlm-http-client` backend:
```bash
docker compose -f compose.yaml --profile vllm-server up -d
```
>[!TIP]
>In another terminal, connect to the vllm server via the http client (requires only CPU and network; no vllm environment needed)
> ```bash
> mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://<server_ip>:30000
> ```
---


## Common Scenarios
### Core Installation
The `core` module is MinerU's core dependency, containing all functional modules except `vllm`. Installing this module ensures MinerU's basic features work properly.
```bash
uv pip install mineru[core]
```
---
### Accelerating VLM Model Inference with `vllm`
The `vllm` module provides acceleration for VLM model inference and is suitable for GPUs with Turing architecture or later (8 GB VRAM or more). Installing this module can significantly speed up model inference.
In the package configuration, `all` includes both the `core` and `vllm` modules, so `mineru[all]` and `mineru[core,vllm]` are equivalent.
```bash
uv pip install mineru[all]
```
> [!TIP]
> If an error occurs while installing the full package including vllm, consult the [vllm official documentation](https://docs.vllm.ai/en/latest/getting_started/installation/index.html) to troubleshoot, or deploy the image directly with [Docker](./docker_deployment.md).
---
### Installing the Lightweight Client (to connect to vllm-server)
If you need a lightweight client on an edge device to connect to `vllm-server`, you can install the MinerU base package. It is very lightweight and suitable for devices with only a CPU and a network connection.
```bash
uv pip install mineru
```
---
### Using the pipeline Backend on Outdated Linux Systems
If your system is too old to meet the dependency requirements of `mineru[core]`, this option minimally satisfies MinerU's runtime needs. It is intended for legacy systems that cannot be upgraded and only need the pipeline backend.
```bash
uv pip install mineru[pipeline_old_linux]
```


<td>Parsing backend</td>
<td>pipeline</td>
<td>vlm-transformers</td>
<td>vlm-vllm</td>
</tr>
<tr>
<td>Operating system</td>
```bash
uv pip install -e .[core] -i https://mirrors.aliyun.com/pypi/simple
```
> [!TIP]
> `mineru[core]` includes all core features except `vllm` acceleration, is compatible with Windows / Linux / macOS, and suits the vast majority of users.
> If you need `vllm`-accelerated VLM model inference, or want to install the lightweight client on an edge device, see the [Extension Modules Installation Guide](./extension_modules.md).
---


## Structured Data Files
> [!IMPORTANT]
> The output of the vlm backend changed substantially in version 2.5 and is not compatible with the pipeline version. If you intend to build on the structured output, please read this document carefully.
### pipeline Backend Output
#### Model Inference Results (model.json)
**File naming format**: `{original_filename}_model.json`
##### Data Structure Definition
```python
from pydantic import BaseModel, Field
class PageInferenceResults(BaseModel):
    ...

inference_result: list[PageInferenceResults] = []
```
##### Coordinate System
`poly` coordinate format: `[x0, y0, x1, y1, x2, y2, x3, y3]`
![poly coordinate diagram](../images/poly.png)
##### Example Data
```json
[
]
```
#### Intermediate Processing Results (middle.json)
**File naming format**: `{original_filename}_middle.json`
##### Top-Level Structure
| Field | Type | Description |
|--------|------|------|
| `pdf_info` | `list` | Per-page parsing results |
| `_backend` | `string` | Parsing mode: `pipeline` or `vlm` |
| `_version_name` | `string` | MinerU version number |
##### Page Info Structure (pdf_info)
| Field | Description |
|--------|------|
| `preproc_blocks` | Unsegmented intermediate results after PDF preprocessing |
| `layout_bboxes` | Layout segmentation results, including layout direction and bounding boxes, sorted in reading order |
| `page_idx` | Page index, starting from 0 |
| `page_size` | Page width and height `[width, height]` |
| `_layout_tree` | Layout tree structure |
| `images` | List of image blocks |
| `tables` | List of table blocks |
| `interline_equations` | List of interline equation blocks |
| `discarded_blocks` | Blocks to be discarded |
| `para_blocks` | Content blocks after segmentation |
##### Block Hierarchy
```
First-level block (table | image)
└── Second-level block
    └── Line (line)
        └── Span (span)
```
##### First-Level Block Fields
| Field | Description |
|--------|------|
| `type` | Block type: `table` or `image` |
| `bbox` | Rectangular bounding box of the block `[x0, y0, x1, y1]` |
| `blocks` | List of contained second-level blocks |
##### Second-Level Block Fields
| Field | Description |
|--------|------|
| `type` | Block type |
| `bbox` | Rectangular bounding box of the block |
| `lines` | List of contained line info |
##### Second-Level Block Types
| Type | Description |
|------|------|
| `list` | List block |
| `interline_equation` | Interline equation block |
##### Line and Span Structure
**Line (line) fields**
- `bbox`: Rectangular bounding box of the line
**Span (span) fields**
- `type`: Span type (`image`, `table`, `text`, `inline_equation`, `interline_equation`)
- `content` | `img_path`: Text content or image path
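The hierarchy above maps directly onto nested loops. A minimal sketch of collecting span text, assuming only the field names documented here and treating a block without nested `blocks` as its own second-level block:

```python
def collect_spans(middle_json):
    """Walk pdf_info -> para_blocks -> (nested blocks) -> lines -> spans
    and collect the text content of every span that carries `content`."""
    texts = []
    for page in middle_json.get("pdf_info", []):
        for block in page.get("para_blocks", []):
            # First-level blocks (table/image) nest second-level blocks;
            # plain text blocks hold their lines directly.
            for sub in block.get("blocks", [block]):
                for line in sub.get("lines", []):
                    for span in line.get("spans", []):
                        if "content" in span:
                            texts.append(span["content"])
    return texts
```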
##### Example Data
```json
{
}
```
#### Content List (content_list.json)
**File naming format**: `{original_filename}_content_list.json`
##### Description
This is a simplified version of `middle.json` that stores all readable content blocks flattened in reading order, stripping the complex layout information to ease downstream processing.
##### Content Types
| Type | Description |
|------|------|
| `text` | Text / heading |
| `equation` | Interline equation |
##### Text Level Indicator
Text hierarchy is distinguished via the `text_level` field:
- No `text_level` or `text_level: 0`: Body text
- `text_level: 1`: First-level heading
- `text_level: 2`: Second-level heading
- And so on...
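Under the assumption that heading entries carry a positive `text_level` while body text does not, a document outline can be sketched from content_list.json like this:

```python
def build_outline(content_list):
    """Collect (level, text) pairs for text entries that declare a text_level."""
    outline = []
    for item in content_list:
        level = item.get("text_level")
        if item.get("type") == "text" and level:
            outline.append((level, item.get("text", "")))
    return outline
```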
##### Common Fields
- All content blocks include a `page_idx` field indicating the page index (starting from 0).
- All content blocks include a `bbox` field giving the block's bounding box `[x0, y0, x1, y1]` mapped into the 0-1000 range.
##### Example Data
```json
[
{
"type": "image",
"img_path": "images/a8ecda1c69b27e4f79fce1589175a9d721cbdc1cf78b4cc06a015f3746f6b9d8.jpg",
"img_caption": [
"image_caption": [
"Fig. 1. Annual flow duration curves of daily flows from Pine Creek, Australia, 19892000. "
],
"img_footnote": [],
"image_footnote": [],
"bbox": [
62,
480,
@@ -484,11 +443,385 @@ inference_result: list[PageInferenceResults] = []
]
```
### VLM Backend Output
#### Model Inference Results (model.json)
**File naming format**: `{original_filename}_model.json`
##### File Format
- This file is the raw output of the VLM model. It contains two nested lists: the outer list represents pages, and the inner list represents the content blocks on that page.
- Each content block is a dict containing the fields `type`, `bbox`, `angle`, and `content`.
##### Supported Content Types
```json
{
    "text": "Text",
    "title": "Title",
    "equation": "Interline equation",
    "image": "Image",
    "image_caption": "Image caption",
    "image_footnote": "Image footnote",
    "table": "Table",
    "table_caption": "Table caption",
    "table_footnote": "Table footnote",
    "phonetic": "Phonetic notation",
    "code": "Code block",
    "code_caption": "Code caption",
    "ref_text": "Reference text",
    "algorithm": "Algorithm block",
    "list": "List",
    "header": "Page header",
    "footer": "Page footer",
    "page_number": "Page number",
    "aside_text": "Gutter text",
    "page_footnote": "Page footnote"
}
```
##### Coordinate System
`bbox` coordinate format: `[x0, y0, x1, y1]`
- The pairs are the top-left and bottom-right corner coordinates, respectively
- The coordinate origin is the top-left corner of the page
- Coordinates are fractions of the original page dimensions, in the range 0-1
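Because these coordinates are fractions of the page size, converting a block's `bbox` to pixel coordinates only requires the dimensions of the rendered page. A minimal sketch (`page_w`/`page_h` are whatever size you rendered the page at):

```python
def bbox_to_pixels(bbox, page_w, page_h):
    """Scale a relative [x0, y0, x1, y1] bbox (0-1 range) to pixel coordinates."""
    x0, y0, x1, y1 = bbox
    return [round(x0 * page_w), round(y0 * page_h),
            round(x1 * page_w), round(y1 * page_h)]
```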
##### Example Data
```json
[
[
{
"type": "header",
"bbox": [
0.077,
0.095,
0.18,
0.181
],
"angle": 0,
"score": null,
"block_tags": null,
"content": "ELSEVIER",
"format": null,
"content_tags": null
},
{
"type": "title",
"bbox": [
0.157,
0.228,
0.833,
0.253
],
"angle": 0,
"score": null,
"block_tags": null,
"content": "The response of flow duration curves to afforestation",
"format": null,
"content_tags": null
}
]
]
```
#### Intermediate Processing Results (middle.json)
**File naming format**: `{original_filename}_middle.json`
##### File Format
The structure of the vlm backend's middle.json is similar to the pipeline backend's, with the following differences:
- Lists become second-level blocks, with a new `sub_type` field distinguishing the list type:
    * `text` (text list)
    * `ref_text` (reference list)
- A new code block type is added; code blocks have two `sub_type`s:
    * `code` and `algorithm`
    * They always contain a `code_body`; `code_caption` is optional
- Elements inside `discarded_blocks` gain the following types:
    * `header` (page header)
    * `footer` (page footer)
    * `page_number` (page number)
    * `aside_text` (gutter text)
    * `page_footnote` (footnote)
- All blocks gain an `angle` field indicating the rotation angle: 0, 90, 180, or 270
##### Example Data
- list block example
```json
{
"bbox": [
174,
155,
818,
333
],
"type": "list",
"angle": 0,
"index": 11,
"blocks": [
{
"bbox": [
174,
157,
311,
175
],
"type": "text",
"angle": 0,
"lines": [
{
"bbox": [
174,
157,
311,
175
],
"spans": [
{
"bbox": [
174,
157,
311,
175
],
"type": "text",
"content": "H.1 Introduction"
}
]
}
],
"index": 3
},
{
"bbox": [
175,
182,
464,
229
],
"type": "text",
"angle": 0,
"lines": [
{
"bbox": [
175,
182,
464,
229
],
"spans": [
{
"bbox": [
175,
182,
464,
229
],
"type": "text",
"content": "H.2 Example: Divide by Zero without Exception Handling"
}
]
}
],
"index": 4
}
],
"sub_type": "text"
}
```
- code block example
```json
{
"type": "code",
"bbox": [
114,
780,
885,
1231
],
"blocks": [
{
"bbox": [
114,
780,
885,
1231
],
"lines": [
{
"bbox": [
114,
780,
885,
1231
],
"spans": [
{
"bbox": [
114,
780,
885,
1231
],
"type": "text",
"content": "1 // Fig. H.1: DivideByZeroNoExceptionHandling.java \n2 // Integer division without exception handling. \n3 import java.util.Scanner; \n4 \n5 public class DivideByZeroNoExceptionHandling \n6 { \n7 // demonstrates throwing an exception when a divide-by-zero occurs \n8 public static int quotient( int numerator, int denominator ) \n9 { \n10 return numerator / denominator; // possible division by zero \n11 } // end method quotient \n12 \n13 public static void main(String[] args) \n14 { \n15 Scanner scanner = new Scanner(System.in); // scanner for input \n16 \n17 System.out.print(\"Please enter an integer numerator: \"); \n18 int numerator = scanner.nextInt(); \n19 System.out.print(\"Please enter an integer denominator: \"); \n20 int denominator = scanner.nextInt(); \n21"
}
]
}
],
"index": 17,
"angle": 0,
"type": "code_body"
},
{
"bbox": [
867,
160,
1280,
189
],
"lines": [
{
"bbox": [
867,
160,
1280,
189
],
"spans": [
{
"bbox": [
867,
160,
1280,
189
],
"type": "text",
"content": "Algorithm 1 Modules for MCTSteg"
}
]
}
],
"index": 19,
"angle": 0,
"type": "code_caption"
}
],
"index": 17,
"sub_type": "code"
}
```
#### Content List (content_list.json)
**File naming format**: `{original_filename}_content_list.json`
##### File Format
The structure of the vlm backend's content_list.json is similar to the pipeline backend's; following the middle.json changes above, the following adjustments were made:
- A new `code` type; code entries have two `sub_type`s:
    * `code` and `algorithm`
    * They always contain a `code_body`; `code_caption` is optional
- A new `list` type; list entries have two `sub_type`s:
    * `text`
    * `ref_text`
- The content of all `discarded_blocks` is now included in the output:
    * `header`
    * `footer`
    * `page_number`
    * `aside_text`
    * `page_footnote`
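As a small sketch of post-processing under these conventions, content_list entries can be tallied by `type` (and `sub_type`, where present) to see what a document contains:

```python
from collections import Counter

def count_block_types(content_list):
    """Count content_list entries by their (type, sub_type) pair.
    Entries without a sub_type (e.g. header, text) count under (type, None)."""
    return Counter(
        (item.get("type"), item.get("sub_type")) for item in content_list
    )
```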
##### Example Data
- code type content
```json
{
"type": "code",
"sub_type": "algorithm",
"code_caption": [
"Algorithm 1 Modules for MCTSteg"
],
"code_body": "1: function GETCOORDINATE(d) \n2: $x \\gets d / l$ , $y \\gets d$ mod $l$ \n3: return $(x, y)$ \n4: end function \n5: function BESTCHILD(v) \n6: $C \\gets$ child set of $v$ \n7: $v' \\gets \\arg \\max_{c \\in C} \\mathrm{UCTScore}(c)$ \n8: $v'.n \\gets v'.n + 1$ \n9: return $v'$ \n10: end function \n11: function BACK PROPAGATE(v) \n12: Calculate $R$ using Equation 11 \n13: while $v$ is not a root node do \n14: $v.r \\gets v.r + R$ , $v \\gets v.p$ \n15: end while \n16: end function \n17: function RANDOMSEARCH(v) \n18: while $v$ is not a leaf node do \n19: Randomly select an untried action $a \\in A(v)$ \n20: Create a new node $v'$ \n21: $(x, y) \\gets \\mathrm{GETCOORDINATE}(v'.d)$ \n22: $v'.p \\gets v$ , $v'.d \\gets v.d + 1$ , $v'.\\Gamma \\gets v.\\Gamma$ \n23: $v'.\\gamma_{x,y} \\gets a$ \n24: if $a = -1$ then \n25: $v.lc \\gets v'$ \n26: else if $a = 0$ then \n27: $v.mc \\gets v'$ \n28: else \n29: $v.rc \\gets v'$ \n30: end if \n31: $v \\gets v'$ \n32: end while \n33: return $v$ \n34: end function \n35: function SEARCH(v) \n36: while $v$ is fully expanded do \n37: $v \\gets$ BESTCHILD(v) \n38: end while \n39: if $v$ is not a leaf node then \n40: $v \\gets$ RANDOMSEARCH(v) \n41: end if \n42: return $v$ \n43: end function",
"bbox": [
510,
87,
881,
740
],
"page_idx": 0
}
```
- list type content
```json
{
"type": "list",
"sub_type": "text",
"list_items": [
"H.1 Introduction",
"H.2 Example: Divide by Zero without Exception Handling",
"H.3 Example: Divide by Zero with Exception Handling",
"H.4 Summary"
],
"bbox": [
174,
155,
818,
333
],
"page_idx": 0
}
```
- discarded type content
```json
[{
"type": "header",
"text": "Journal of Hydrology 310 (2005) 253-265",
"bbox": [
363,
164,
623,
177
],
"page_idx": 0
},
{
"type": "page_footnote",
"text": "* Corresponding author. Address: Forest Science Centre, Department of Sustainability and Environment, P.O. Box 137, Heidelberg, Vic. 3084, Australia. Tel.: +61 3 9450 8719; fax: +61 3 9450 8644.",
"bbox": [
71,
815,
915,
841
],
"page_idx": 0
}]
```
## Summary
The files above constitute MinerU's complete output. Users can pick the appropriate files for downstream processing:
- **Model output** (raw output):
    * model.json
- **Debugging and validation** (visualization files):
    * layout.pdf
    * spans.pdf
- **Content extraction** (simplified files):
    * *.md
    * content_list.json
- **Secondary development** (structured files):
    * middle.json
