Compare commits

462 commits between the two `release-2.…` branches (branch names truncated in the capture).

(The commit table had Author, SHA1, and Date columns, but only the SHA1 column survived the capture; the SHAs run from a0da3029fd through bf5b750565.)
.github/ISSUE_TEMPLATE/bug_report.yml (vendored): 16 changes

```diff
@@ -122,7 +122,21 @@ body:
       #multiple: false
       options:
         -
-        - "2.0.x"
+        - "<2.2.0"
         - "2.2.x"
+        - ">=2.5"
     validations:
       required: true
+
+  - type: dropdown
+    id: backend_name
+    attributes:
+      label: Backend name | 解析后端
+      #multiple: false
+      options:
+        -
+        - "vlm"
+        - "pipeline"
+    validations:
+      required: true
```
README.md: 254 changes

```diff
@@ -1,7 +1,7 @@
 <div align="center" xmlns="http://www.w3.org/1999/html">
 <!-- logo -->
 <p align="center">
-  <img src="docs/images/MinerU-logo.png" width="300px" style="vertical-align:middle;">
+  <img src="https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docs/images/MinerU-logo.png" width="300px" style="vertical-align:middle;">
 </p>

 <!-- icon -->
@@ -18,7 +18,8 @@
 [](https://huggingface.co/spaces/opendatalab/MinerU)
 [](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
 [](https://colab.research.google.com/gist/myhloli/a3cb16570ab3cfeadf9d8f0ac91b4fca/mineru_demo.ipynb)
-[](https://arxiv.org/abs/2409.18839)
+[](https://arxiv.org/abs/2409.18839)
+[](https://arxiv.org/abs/2509.22186)
 [](https://deepwiki.com/opendatalab/MinerU)
```

(The badge image markup was stripped in the capture, leaving only the `[](url)` link shells.)
```diff
@@ -37,54 +38,198 @@
 <!-- join us -->

 <p align="center">
-👋 join us on <a href="https://discord.gg/Tdedn9GTXq" target="_blank">Discord</a> and <a href="http://mineru.space/s/V85Yl" target="_blank">WeChat</a>
+👋 join us on <a href="https://discord.gg/Tdedn9GTXq" target="_blank">Discord</a> and <a href="https://mineru.net/community-portal/?aliasId=3c430f94" target="_blank">WeChat</a>
 </p>

 </div>
```
```diff
 # Changelog
-- 2025/08/01 2.1.10 Released
-  - Fixed an issue in the `pipeline` backend where block overlap caused the parsing results to deviate from expectations #3232
-- 2025/07/30 2.1.9 Released
-  - `transformers` 4.54.1 version adaptation
-- 2025/07/28 2.1.8 Released
-  - `sglang` 0.4.9.post5 version adaptation
-- 2025/07/27 2.1.7 Released
-  - `transformers` 4.54.0 version adaptation
-- 2025/07/26 2.1.6 Released
-  - Fixed table parsing issues in handwritten documents when using `vlm` backend
-  - Fixed visualization box position drift issue when document is rotated #3175
-- 2025/07/24 2.1.5 Released
-  - `sglang` 0.4.9 version adaptation, synchronously upgrading the dockerfile base image to sglang 0.4.9.post3
-- 2025/07/23 2.1.4 Released
-  - Bug Fixes
-    - Fixed the issue of excessive memory consumption during the `MFR` step in the `pipeline` backend under certain scenarios #2771
-    - Fixed the inaccurate matching between `image`/`table` and `caption`/`footnote` under certain conditions #3129
-- 2025/07/16 2.1.1 Released
-  - Bug fixes
-    - Fixed text block content loss issue that could occur in certain `pipeline` scenarios #3005
-    - Fixed issue where `sglang-client` required unnecessary packages like `torch` #2968
-    - Updated `dockerfile` to fix incomplete text content parsing due to missing fonts in Linux #2915
-  - Usability improvements
-    - Updated `compose.yaml` to facilitate direct startup of `sglang-server`, `mineru-api`, and `mineru-gradio` services
-    - Launched brand new [online documentation site](https://opendatalab.github.io/MinerU/), simplified readme, providing better documentation experience
-- 2025/07/05 Version 2.1.0 Released
-  - This is the first major update of MinerU 2, which includes a large number of new features and improvements, covering significant performance optimizations, user experience enhancements, and bug fixes. The detailed update contents are as follows:
-  - **Performance Optimizations:**
-    - Significantly improved preprocessing speed for documents with specific resolutions (around 2000 pixels on the long side).
-    - Greatly enhanced post-processing speed when the `pipeline` backend handles batch processing of documents with fewer pages (<10 pages).
-    - Layout analysis speed of the `pipeline` backend has been increased by approximately 20%.
-  - **Experience Enhancements:**
-    - Built-in ready-to-use `fastapi service` and `gradio webui`. For detailed usage instructions, please refer to [Documentation](https://opendatalab.github.io/MinerU/usage/quick_usage/#advanced-usage-via-api-webui-sglang-clientserver).
-    - Adapted to `sglang` version `0.4.8`, significantly reducing the GPU memory requirements for the `vlm-sglang` backend. It can now run on graphics cards with as little as `8GB GPU memory` (Turing architecture or newer).
-    - Added transparent parameter passing for all commands related to `sglang`, allowing the `sglang-engine` backend to receive all `sglang` parameters consistently with the `sglang-server`.
-    - Supports feature extensions based on configuration files, including `custom formula delimiters`, `enabling heading classification`, and `customizing local model directories`. For detailed usage instructions, please refer to [Documentation](https://opendatalab.github.io/MinerU/usage/quick_usage/#extending-mineru-functionality-with-configuration-files).
-  - **New Features:**
-    - Updated the `pipeline` backend with the PP-OCRv5 multilingual text recognition model, supporting text recognition in 37 languages such as French, Spanish, Portuguese, Russian, and Korean, with an average accuracy improvement of over 30%. [Details](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/algorithm/PP-OCRv5/PP-OCRv5_multi_languages.html)
-    - Introduced limited support for vertical text layout in the `pipeline` backend.
```
```diff
+- 2025/10/24 2.6.0 Released
+  - `pipeline` backend optimizations
+    - Added experimental support for Chinese formulas, which can be enabled by setting the environment variable `export MINERU_FORMULA_CH_SUPPORT=1`. This feature may cause a slight decrease in MFR speed and failures in recognizing some long formulas, so it is recommended to enable it only when parsing Chinese formulas is needed. To disable this feature, set the environment variable to `0`.
+    - `OCR` speed significantly improved by 200%~300%, thanks to the optimization solution provided by @cjsdurj
+    - `OCR` models updated to the `ppocr-v5` version for Cyrillic, Arabic, Devanagari, Telugu (te), and Tamil (ta) languages, with accuracy improved by over 40% compared to the previous models
+  - `vlm` backend optimizations
+    - `table_caption` and `table_footnote` matching logic optimized, improving the accuracy of table caption/footnote matching and the rationality of reading order in scenarios with multiple consecutive tables on a page
+    - Optimized CPU resource usage during high concurrency when using the `vllm` backend, reducing server pressure
+    - Adapted to `vllm` version 0.11.0
+  - General optimizations
+    - Cross-page table merging improved, with added support for cross-page continuation-table merging, improving merge quality in multi-column merge scenarios
+    - Added the environment variable `MINERU_TABLE_MERGE_ENABLE` for the table merging feature. Table merging is enabled by default and can be disabled by setting this variable to `0`
```
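The two 2.6.0 environment toggles above can be exercised from Python as well as the shell. A minimal sketch; the variable names and their documented defaults come from the release notes, while the `flag` helper is purely illustrative and not part of MinerU:

```python
import os

# Toggles documented in the 2.6.0 release notes:
os.environ["MINERU_FORMULA_CH_SUPPORT"] = "1"  # enable experimental Chinese formula support
os.environ["MINERU_TABLE_MERGE_ENABLE"] = "0"  # disable table merging (enabled by default)

def flag(name: str, default: str) -> bool:
    """Read a MinerU-style on/off environment variable ("1" means enabled)."""
    return os.environ.get(name, default) == "1"

print(flag("MINERU_FORMULA_CH_SUPPORT", "0"))  # True
print(flag("MINERU_TABLE_MERGE_ENABLE", "1"))  # False
```

Set the variables before launching the `mineru` process so the backend picks them up at startup.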
```diff
+
+- 2025/09/26 2.5.4 Released
+  - 🎉🎉 The MinerU2.5 [Technical Report](https://arxiv.org/abs/2509.22186) is now available! We welcome you to read it for a comprehensive overview of its model architecture, training strategy, data engineering, and evaluation results.
+  - Fixed an issue where some `PDF` files were mistakenly identified as `AI` files, causing parsing failures
+
+- 2025/09/20 2.5.3 Released
+  - Dependency version range adjustment to enable Turing and earlier architecture GPUs to use vLLM acceleration for MinerU2.5 model inference.
+  - `pipeline` backend compatibility fixes for torch 2.8.0.
+  - Reduced default concurrency for the vLLM async backend to lower server pressure and avoid connection closure issues caused by high load.
+  - More compatibility-related details can be found in the [announcement](https://github.com/opendatalab/MinerU/discussions/3548)
+
+- 2025/09/19 2.5.2 Released
+
+  We are officially releasing MinerU2.5, currently the most powerful multimodal large model for document parsing.
+  With only 1.2B parameters, MinerU2.5's accuracy on the OmniDocBench benchmark comprehensively surpasses top-tier multimodal models like Gemini 2.5 Pro, GPT-4o, and Qwen2.5-VL-72B. It also significantly outperforms leading specialized models such as dots.ocr, MonkeyOCR, and PP-StructureV3.
+  The model has been released on the [HuggingFace](https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B) and [ModelScope](https://modelscope.cn/models/opendatalab/MinerU2.5-2509-1.2B) platforms. Welcome to download and use!
+  - Core Highlights:
+    - SOTA Performance with Extreme Efficiency: As a 1.2B model, it achieves State-of-the-Art (SOTA) results that exceed models in the 10B and 100B+ classes, redefining the performance-per-parameter standard in document AI.
+    - Advanced Architecture for Across-the-Board Leadership: By combining a two-stage inference pipeline (decoupling layout analysis from content recognition) with a native high-resolution architecture, it achieves SOTA performance across five key areas: layout analysis, text recognition, formula recognition, table recognition, and reading order.
+    - Key Capability Enhancements:
+      - Layout Detection: Delivers more complete results by accurately covering non-body content like headers, footers, and page numbers. It also provides more precise element localization and natural format reconstruction for lists and references.
+      - Table Parsing: Drastically improves parsing for challenging cases, including rotated tables, borderless/semi-structured tables, and long/complex tables.
+      - Formula Recognition: Significantly boosts accuracy for complex, long-form, and hybrid Chinese-English formulas, greatly enhancing the parsing capability for mathematical documents.
+
+  Additionally, with the release of vlm 2.5, we have made some adjustments to the repository:
+  - The vlm backend has been upgraded to version 2.5, supporting the MinerU2.5 model; it is no longer compatible with the MinerU2.0-2505-0.9B model. The last version supporting the 2.0 model is mineru-2.2.2.
+  - VLM inference-related code has been moved to [mineru_vl_utils](https://github.com/opendatalab/mineru-vl-utils), reducing coupling with the main mineru repository and facilitating independent iteration in the future.
+  - The vlm accelerated inference framework has been switched from `sglang` to `vllm`, achieving full compatibility with the vllm ecosystem and allowing users to run the MinerU2.5 model with accelerated inference on any platform that supports the vllm framework.
+  - Due to major upgrades in the vlm model supporting more layout types, we have made some adjustments to the structure of the parsing intermediate file `middle.json` and the result file `content_list.json`. Please refer to the [documentation](https://opendatalab.github.io/MinerU/reference/output_files/) for details.
+
+  Other repository optimizations:
+  - Removed file extension whitelist validation for input files. When input files are PDF documents or images, there are no longer requirements on file extensions, improving usability.
```
```diff
 <details>
 <summary>History Log</summary>

+<details>
+<summary>2025/09/10 2.2.2 Released</summary>
+<ul>
+<li>Fixed the issue where the new table recognition model would affect the overall parsing task when some table parsing failed</li>
+</ul>
+</details>
+
+<details>
+<summary>2025/09/08 2.2.1 Released</summary>
+<ul>
+<li>Fixed the issue where some newly added models were not downloaded when using the model download command.</li>
+</ul>
+</details>
+
+<details>
+<summary>2025/09/05 2.2.0 Released</summary>
+<ul>
+<li>
+Major Updates
+<ul>
+<li>In this version, we focused on improving table parsing accuracy by introducing a new <a href="https://github.com/RapidAI/TableStructureRec">wired table recognition model</a> and a brand-new hybrid table structure parsing algorithm, significantly enhancing the table recognition capabilities of the <code>pipeline</code> backend.</li>
+<li>We also added support for cross-page table merging, supported by both the <code>pipeline</code> and <code>vlm</code> backends, further improving the completeness and accuracy of table parsing.</li>
+</ul>
+</li>
+<li>
+Other Updates
+<ul>
+<li>The <code>pipeline</code> backend now supports 270-degree rotated table parsing, bringing support for table parsing in 0/90/270-degree orientations</li>
+<li><code>pipeline</code> added OCR support for Thai and Greek and updated the English OCR model to the latest version. English recognition accuracy improved by 11%; the Thai model's accuracy is 82.68% and the Greek model's is 89.28% (by PP-OCRv5)</li>
+<li>Added a <code>bbox</code> field (mapped to the 0-1000 range) in the output <code>content_list.json</code>, making it convenient for users to directly obtain position information for each content block</li>
+<li>Removed the <code>pipeline_old_linux</code> installation option, no longer supporting legacy Linux systems such as <code>CentOS 7</code>, to provide better support for <code>uv</code>'s <code>sync</code>/<code>run</code> commands</li>
+</ul>
+</li>
+</ul>
+</details>
```
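The 2.2.0 notes above add a `bbox` field to `content_list.json`, normalized to a 0-1000 range. A hedged sketch of mapping such a box back to page coordinates; the `[x0, y0, x1, y1]` ordering and the helper name are illustrative assumptions, not part of the release notes:

```python
def denormalize_bbox(bbox, page_w, page_h):
    """Map a 0-1000-normalized [x0, y0, x1, y1] box (assumed ordering)
    back to absolute page coordinates."""
    x0, y0, x1, y1 = bbox
    return [x0 * page_w / 1000, y0 * page_h / 1000,
            x1 * page_w / 1000, y1 * page_h / 1000]

# A US-Letter-sized page is 612 x 792 points:
print(denormalize_bbox([100, 200, 500, 250], 612, 792))
```

Consult the output-files documentation linked in the notes for the authoritative field layout.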
```diff
+
+<details>
+<summary>2025/08/01 2.1.10 Released</summary>
+<ul>
+<li>Fixed an issue in the <code>pipeline</code> backend where block overlap caused the parsing results to deviate from expectations #3232</li>
+</ul>
+</details>
+
+<details>
+<summary>2025/07/30 2.1.9 Released</summary>
+<ul>
+<li><code>transformers</code> 4.54.1 version adaptation</li>
+</ul>
+</details>
+
+<details>
+<summary>2025/07/28 2.1.8 Released</summary>
+<ul>
+<li><code>sglang</code> 0.4.9.post5 version adaptation</li>
+</ul>
+</details>
+
+<details>
+<summary>2025/07/27 2.1.7 Released</summary>
+<ul>
+<li><code>transformers</code> 4.54.0 version adaptation</li>
+</ul>
+</details>
+
+<details>
+<summary>2025/07/26 2.1.6 Released</summary>
+<ul>
+<li>Fixed table parsing issues in handwritten documents when using the <code>vlm</code> backend</li>
+<li>Fixed visualization box position drift issue when document is rotated #3175</li>
+</ul>
+</details>
+
+<details>
+<summary>2025/07/24 2.1.5 Released</summary>
+<ul>
+<li><code>sglang</code> 0.4.9 version adaptation, synchronously upgrading the dockerfile base image to sglang 0.4.9.post3</li>
+</ul>
+</details>
+
+<details>
+<summary>2025/07/23 2.1.4 Released</summary>
+<ul>
+<li><strong>Bug Fixes</strong>
+<ul>
+<li>Fixed the issue of excessive memory consumption during the <code>MFR</code> step in the <code>pipeline</code> backend under certain scenarios #2771</li>
+<li>Fixed the inaccurate matching between <code>image</code>/<code>table</code> and <code>caption</code>/<code>footnote</code> under certain conditions #3129</li>
+</ul>
+</li>
+</ul>
+</details>
+
+<details>
+<summary>2025/07/16 2.1.1 Released</summary>
+<ul>
+<li><strong>Bug fixes</strong>
+<ul>
+<li>Fixed text block content loss issue that could occur in certain <code>pipeline</code> scenarios #3005</li>
+<li>Fixed issue where <code>sglang-client</code> required unnecessary packages like <code>torch</code> #2968</li>
+<li>Updated <code>dockerfile</code> to fix incomplete text content parsing due to missing fonts in Linux #2915</li>
+</ul>
+</li>
+<li><strong>Usability improvements</strong>
+<ul>
+<li>Updated <code>compose.yaml</code> to facilitate direct startup of the <code>sglang-server</code>, <code>mineru-api</code>, and <code>mineru-gradio</code> services</li>
+<li>Launched a brand new <a href="https://opendatalab.github.io/MinerU/">online documentation site</a> and simplified the readme, providing a better documentation experience</li>
+</ul>
+</li>
+</ul>
+</details>
+
+<details>
+<summary>2025/07/05 2.1.0 Released</summary>
+<ul>
+<li>This is the first major update of MinerU 2, which includes a large number of new features and improvements, covering significant performance optimizations, user experience enhancements, and bug fixes. The detailed update contents are as follows:</li>
+<li><strong>Performance Optimizations:</strong>
+<ul>
+<li>Significantly improved preprocessing speed for documents with specific resolutions (around 2000 pixels on the long side).</li>
+<li>Greatly enhanced post-processing speed when the <code>pipeline</code> backend handles batch processing of documents with fewer pages (&lt;10 pages).</li>
+<li>Layout analysis speed of the <code>pipeline</code> backend has been increased by approximately 20%.</li>
+</ul>
+</li>
+<li><strong>Experience Enhancements:</strong>
+<ul>
+<li>Built-in ready-to-use <code>fastapi service</code> and <code>gradio webui</code>. For detailed usage instructions, please refer to the <a href="https://opendatalab.github.io/MinerU/usage/quick_usage/#advanced-usage-via-api-webui-sglang-clientserver">Documentation</a>.</li>
+<li>Adapted to <code>sglang</code> version <code>0.4.8</code>, significantly reducing the GPU memory requirements for the <code>vlm-sglang</code> backend. It can now run on graphics cards with as little as <code>8GB GPU memory</code> (Turing architecture or newer).</li>
+<li>Added transparent parameter passing for all commands related to <code>sglang</code>, allowing the <code>sglang-engine</code> backend to receive all <code>sglang</code> parameters consistently with the <code>sglang-server</code>.</li>
+<li>Supports feature extensions based on configuration files, including <code>custom formula delimiters</code>, <code>enabling heading classification</code>, and <code>customizing local model directories</code>. For detailed usage instructions, please refer to the <a href="https://opendatalab.github.io/MinerU/usage/quick_usage/#extending-mineru-functionality-with-configuration-files">Documentation</a>.</li>
+</ul>
+</li>
+<li><strong>New Features:</strong>
+<ul>
+<li>Updated the <code>pipeline</code> backend with the PP-OCRv5 multilingual text recognition model, supporting text recognition in 37 languages such as French, Spanish, Portuguese, Russian, and Korean, with an average accuracy improvement of over 30%. <a href="https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/algorithm/PP-OCRv5/PP-OCRv5_multi_languages.html">Details</a></li>
+<li>Introduced limited support for vertical text layout in the <code>pipeline</code> backend.</li>
+</ul>
+</li>
+</ul>
+</details>
+
 <details>
 <summary>2025/06/20 2.0.6 Released</summary>
 <ul>
```
```diff
@@ -479,7 +624,7 @@ A WebUI developed based on Gradio, with a simple interface and only core parsing
   <td>Parsing Backend</td>
   <td>pipeline</td>
   <td>vlm-transformers</td>
-  <td>vlm-sglang</td>
+  <td>vlm-vllm</td>
 </tr>
 <tr>
   <td>Operating System</td>
```
````diff
@@ -528,8 +673,8 @@ uv pip install -e .[core]
 ```

 > [!TIP]
-> `mineru[core]` includes all core features except `sglang` acceleration, compatible with Windows / Linux / macOS systems, suitable for most users.
-> If you need to use `sglang` acceleration for VLM model inference or install a lightweight client on edge devices, please refer to the documentation [Extension Modules Installation Guide](https://opendatalab.github.io/MinerU/quick_start/extension_modules/).
+> `mineru[core]` includes all core features except `vLLM` acceleration, compatible with Windows / Linux / macOS systems, suitable for most users.
+> If you need to use `vLLM` acceleration for VLM model inference or install a lightweight client on edge devices, please refer to the documentation [Extension Modules Installation Guide](https://opendatalab.github.io/MinerU/quick_start/extension_modules/).

 ---
````
```diff
@@ -557,8 +702,8 @@ You can use MinerU for PDF parsing through various methods such as command line,
 - [x] Handwritten Text Recognition
 - [x] Vertical Text Recognition
 - [x] Latin Accent Mark Recognition
-- [ ] Code block recognition in the main text
-- [ ] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf)
+- [x] Code block recognition in the main text
+- [x] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf) (mineru.net)
 - [ ] Geometric shape recognition

 # Known Issues
```
```diff
@@ -576,7 +721,7 @@ You can use MinerU for PDF parsing through various methods such as command line,

 - If you encounter any issues during usage, you can first check the [FAQ](https://opendatalab.github.io/MinerU/faq/) for solutions.
 - If your issue remains unresolved, you may also use [DeepWiki](https://deepwiki.com/opendatalab/MinerU) to interact with an AI assistant, which can address most common problems.
-- If you still cannot resolve the issue, you are welcome to join our community via [Discord](https://discord.gg/Tdedn9GTXq) or [WeChat](http://mineru.space/s/V85Yl) to discuss with other users and developers.
+- If you still cannot resolve the issue, you are welcome to join our community via [Discord](https://discord.gg/Tdedn9GTXq) or [WeChat](https://mineru.net/community-portal/?aliasId=3c430f94) to discuss with other users and developers.

 # All Thanks To Our Contributors
```
@@ -596,6 +741,7 @@ Currently, some models in this project are trained based on YOLO. However, since
|
||||
- [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)
- [UniMERNet](https://github.com/opendatalab/UniMERNet)
- [RapidTable](https://github.com/RapidAI/RapidTable)
- [TableStructureRec](https://github.com/RapidAI/TableStructureRec)
- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
- [PaddleOCR2Pytorch](https://github.com/frotms/PaddleOCR2Pytorch)
- [layoutreader](https://github.com/ppaanngggg/layoutreader)

@@ -605,10 +751,21 @@ Currently, some models in this project are trained based on YOLO. However, since
- [pdftext](https://github.com/datalab-to/pdftext)
- [pdfminer.six](https://github.com/pdfminer/pdfminer.six)
- [pypdf](https://github.com/py-pdf/pypdf)
- [magika](https://github.com/google/magika)

# Citation
```bibtex
@misc{niu2025mineru25decoupledvisionlanguagemodel,
      title={MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing},
      author={Junbo Niu and Zheng Liu and Zhuangcheng Gu and Bin Wang and Linke Ouyang and Zhiyuan Zhao and Tao Chu and Tianyao He and Fan Wu and Qintong Zhang and Zhenjiang Jin and Guang Liang and Rui Zhang and Wenzheng Zhang and Yuan Qu and Zhifei Ren and Yuefeng Sun and Yuanhong Zheng and Dongsheng Ma and Zirui Tang and Boyu Niu and Ziyang Miao and Hejun Dong and Siyi Qian and Junyuan Zhang and Jingzhou Chen and Fangdong Wang and Xiaomeng Zhao and Liqun Wei and Wei Li and Shasha Wang and Ruiliang Xu and Yuanyuan Cao and Lu Chen and Qianqian Wu and Huaiyu Gu and Lindong Lu and Keming Wang and Dechen Lin and Guanlin Shen and Xuanhe Zhou and Linfeng Zhang and Yuhang Zang and Xiaoyi Dong and Jiaqi Wang and Bo Zhang and Lei Bai and Pei Chu and Weijia Li and Jiang Wu and Lijun Wu and Zhenxiang Li and Guangyu Wang and Zhongying Tu and Chao Xu and Kai Chen and Yu Qiao and Bowen Zhou and Dahua Lin and Wentao Zhang and Conghui He},
      year={2025},
      eprint={2509.22186},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.22186},
}

@misc{wang2024mineruopensourcesolutionprecise,
      title={MinerU: An Open-Source Solution for Precise Document Content Extraction},
      author={Bin Wang and Chao Xu and Xiaomeng Zhao and Linke Ouyang and Fan Wu and Zhiyuan Zhao and Rui Xu and Kaiwen Liu and Yuan Qu and Fukai Shang and Bo Zhang and Liqun Wei and Zhihao Sui and Wei Li and Botian Shi and Yu Qiao and Dahua Lin and Conghui He},
      year={2024},
      eprint={2409.18839},
      archivePrefix={arXiv},
      url={https://arxiv.org/abs/2409.18839},
}
```

@@ -647,3 +804,4 @@ Currently, some models in this project are trained based on YOLO. However, since
- [OmniDocBench (A Comprehensive Benchmark for Document Parsing and Evaluation)](https://github.com/opendatalab/OmniDocBench)
- [Magic-HTML (Mixed web page extraction tool)](https://github.com/opendatalab/magic-html)
- [Magic-Doc (Fast speed ppt/pptx/doc/docx/pdf extraction tool)](https://github.com/InternLM/magic-doc)
- [Dingo: A Comprehensive AI Data Quality Evaluation Tool](https://github.com/MigoXLab/dingo)

README_zh-CN.md (254 changes)
@@ -1,7 +1,7 @@
<div align="center" xmlns="http://www.w3.org/1999/html">
  <!-- logo -->
  <p align="center">
    <img src="docs/images/MinerU-logo.png" width="300px" style="vertical-align:middle;">
    <img src="https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docs/images/MinerU-logo.png" width="300px" style="vertical-align:middle;">
  </p>

  <!-- icon -->

@@ -18,7 +18,8 @@
[](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
[](https://huggingface.co/spaces/opendatalab/MinerU)
[](https://colab.research.google.com/gist/myhloli/a3cb16570ab3cfeadf9d8f0ac91b4fca/mineru_demo.ipynb)
[](https://arxiv.org/abs/2409.18839)
[](https://arxiv.org/abs/2409.18839)
[](https://arxiv.org/abs/2509.22186)
[](https://deepwiki.com/opendatalab/MinerU)

@@ -37,54 +38,196 @@
<!-- join us -->

<p align="center">
👋 join us on <a href="https://discord.gg/Tdedn9GTXq" target="_blank">Discord</a> and <a href="http://mineru.space/s/V85Yl" target="_blank">WeChat</a>
👋 join us on <a href="https://discord.gg/Tdedn9GTXq" target="_blank">Discord</a> and <a href="https://mineru.net/community-portal/?aliasId=3c430f94" target="_blank">WeChat</a>
</p>

</div>

# Changelog
- 2025/08/01 2.1.10 released
  - Fixed `pipeline` backend parsing results deviating from expectations due to block overwriting #3232
- 2025/07/30 2.1.9 released
  - Adapted to `transformers` 4.54.1
- 2025/07/28 2.1.8 released
  - Adapted to `sglang` 0.4.9.post5
- 2025/07/27 2.1.7 released
  - Adapted to `transformers` 4.54.0
- 2025/07/26 2.1.6 released
  - Fixed table anomalies when the `vlm` backend parses some handwritten documents
  - Fixed visualization box position drift when documents are rotated #3175
- 2025/07/24 2.1.5 released
  - Adapted to `sglang` 0.4.9 and upgraded the dockerfile base image to sglang 0.4.9.post3
- 2025/07/23 2.1.4 released
  - Bug fixes
    - Fixed excessive VRAM consumption of the `MFR` step in the `pipeline` backend in some cases #2771
    - Fixed inaccurate matching between `image`/`table` and `caption`/`footnote` in some cases #3129
- 2025/07/16 2.1.1 released
  - Bug fixes
    - Fixed text block content loss that could occur with `pipeline` in some cases #3005
    - Fixed `sglang-client` requiring unnecessary packages such as `torch` #2968
    - Updated the `dockerfile` to fix incomplete parsed text caused by missing Linux fonts #2915
  - Usability improvements
    - Updated `compose.yaml` so users can directly launch the `sglang-server`, `mineru-api`, and `mineru-gradio` services
    - Launched the new [online documentation site](https://opendatalab.github.io/MinerU/zh/) and simplified the readme for a better documentation experience
- 2025/07/05 2.1.0 released
  - This is the first major update of MinerU 2, bringing many new features, performance and experience optimizations, and bug fixes:
    - Performance optimizations:
      - Greatly accelerated preprocessing for documents at certain resolutions (long edge around 2000 px)
      - Greatly accelerated post-processing when the `pipeline` backend batch-processes many short documents (<10 pages)
      - Layout analysis in the `pipeline` backend is about 20% faster
    - Experience optimizations:
      - Built-in, out-of-the-box `fastapi service` and `gradio webui`; see the [documentation](https://opendatalab.github.io/MinerU/zh/usage/quick_usage/#apiwebuisglang-clientserver) for details
      - Adapted `sglang` to `0.4.8`, greatly reducing the VRAM requirements of the `vlm-sglang` backend; it can run on GPUs with as little as `8 GB VRAM` (Turing and later architectures)
      - Added `sglang` parameter pass-through to all commands, so the `sglang-engine` backend accepts all `sglang` parameters, consistent with `sglang-server`
      - Added configuration-file-based feature extensions, including `custom formula delimiters`, `heading-level detection`, and `custom local model directories`; see the [documentation](https://opendatalab.github.io/MinerU/zh/usage/quick_usage/#mineru_1) for details
    - New features:
      - The `pipeline` backend updated to the PP-OCRv5 multilingual text recognition model, supporting 37 languages including French, Spanish, Portuguese, Russian, and Korean, with average accuracy improved by over 30%. [Details](https://paddlepaddle.github.io/PaddleOCR/latest/version3.x/algorithm/PP-OCRv5/PP-OCRv5_multi_languages.html)
      - The `pipeline` backend added limited support for vertical text
- 2025/10/24 2.6.0 released
  - `pipeline` backend optimizations
    - Added experimental support for Chinese formulas, enabled via the environment variable `export MINERU_FORMULA_CH_SUPPORT=1`. It may slightly slow down MFR and fail on some long formulas, so enable it only when you need to parse Chinese formulas; set the variable to `0` to turn it off.
    - `OCR` speed greatly improved, by 200%–300%; thanks to @cjsdurj for the optimization
    - `OCR` models for the Cyrillic, Arabic, Devanagari, Telugu (te), and Tamil (ta) scripts updated to `ppocr-v5`, with accuracy improved by more than 40% over the previous generation
  - `vlm` backend optimizations
    - Improved the `table_caption`/`table_footnote` matching logic, raising matching accuracy and reading-order quality for pages with multiple consecutive tables
    - Reduced CPU usage under high concurrency with the `vllm` backend, lowering server-side pressure
    - Adapted to `vllm` 0.11.0
  - General optimizations
    - Improved cross-page table merging, adding support for merging cross-page continuation tables and better handling of multi-column merge scenarios
    - Added the environment variable `MINERU_TABLE_MERGE_ENABLE` for table merging; it is enabled by default and can be disabled by setting the variable to `0`

- 2025/09/26 2.5.4 released
  - 🎉🎉 The MinerU2.5 [technical report](https://arxiv.org/abs/2509.22186) is now available; read it for a complete view of the model architecture, training strategy, data engineering, and evaluation results.
  - Fixed some `pdf` files being detected as `ai` files and failing to parse
- 2025/09/20 2.5.3 released
  - Adjusted dependency version ranges so GPUs of Turing and earlier architectures can use vLLM to accelerate inference with the MinerU2.5 model.
  - Some `pipeline` backend compatibility fixes for torch 2.8.0.
  - Lowered the default concurrency of the vLLM async backend to reduce server-side pressure and avoid connection closures under high load.
  - See the [announcement](https://github.com/opendatalab/MinerU/discussions/3547) for more compatibility details

- 2025/09/19 2.5.2 released

  We officially release MinerU2.5, currently the strongest multimodal large model for document parsing. With only 1.2B parameters, MinerU2.5 surpasses top multimodal models such as Gemini2.5-Pro, GPT-4o, and Qwen2.5-VL-72B on the OmniDocBench document parsing benchmark, and clearly outperforms mainstream document-parsing-specific models (such as dots.ocr, MonkeyOCR, and PP-StructureV3).

  The model is available on [HuggingFace](https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B) and [ModelScope](https://modelscope.cn/models/opendatalab/MinerU2.5-2509-1.2B); you are welcome to download and use it!

  - Core highlights
    - Extreme efficiency, SOTA performance: at a lightweight 1.2B scale, it achieves SOTA performance surpassing models with tens or even hundreds of billions of parameters, redefining the efficiency frontier of document parsing.
    - Advanced architecture, leading across the board: combining "two-stage inference" (decoupled layout analysis and content recognition) with a native high-resolution architecture, it reaches SOTA levels in layout analysis, text recognition, formula recognition, table recognition, and reading order.
  - Key capability improvements
    - Layout detection: more complete results that accurately cover non-body content such as headers, footers, and page numbers, with more precise element localization and more natural format restoration (e.g., lists, references).
    - Table parsing: greatly improved parsing of rotated tables, borderless or sparse-line tables, and long, difficult tables.
    - Formula recognition: significantly higher accuracy on mixed Chinese-English and complex long formulas, greatly improving the parsing of mathematical documents.

  Alongside the vlm 2.5 release, we have made some changes to the repository:
  - The vlm backend is upgraded to 2.5 and supports the MinerU2.5 model; it is no longer compatible with the MinerU2.0-2505-0.9B model, whose last supported release is mineru-2.2.2.
  - The vlm inference code has moved to [mineru_vl_utils](https://github.com/opendatalab/mineru-vl-utils) to reduce coupling with the main mineru repository and allow independent iteration.
  - The vlm acceleration framework has switched from `sglang` to `vllm`, with full compatibility with the vllm ecosystem, so users can run MinerU2.5 with accelerated inference on any platform that supports vllm.
  - Because the major vlm model upgrade supports more layout types, we adjusted the structure of the intermediate file `middle.json` and the result file `content_list.json`; see the [documentation](https://opendatalab.github.io/MinerU/zh/reference/output_files/) for details.

  Other repository improvements:
  - Removed the file-extension whitelist for input files; PDF documents and images are accepted regardless of extension, improving usability.
<details>
<summary>History</summary>

<details>
<summary>2025/09/10 2.2.2 released</summary>
<ul>
<li>Fixed an issue where the new table recognition model failing on some tables affected the overall parsing task</li>
</ul>
</details>

<details>
<summary>2025/09/08 2.2.1 released</summary>
<ul>
<li>Fixed an issue where some newly added models were not downloaded when using the model download command</li>
</ul>
</details>

<details>
<summary>2025/09/05 2.2.0 released</summary>
<ul>
<li>
Major updates
<ul>
<li>This release focuses on table parsing accuracy: a new <a href="https://github.com/RapidAI/TableStructureRec">wired-table recognition model</a> and a new hybrid table-structure parsing algorithm significantly improve the table recognition capability of the <code>pipeline</code> backend.</li>
<li>We also added support for cross-page table merging, available in both the <code>pipeline</code> and <code>vlm</code> backends, further improving the completeness and accuracy of table parsing.</li>
</ul>
</li>
<li>
Other updates
<ul>
<li>The <code>pipeline</code> backend can now parse tables rotated by 270 degrees, supporting the 0/90/270-degree orientations</li>
<li><code>pipeline</code> adds OCR support for Thai and Greek and updates the English OCR model to the latest version: English recognition accuracy up 11%, Thai model accuracy 82.68%, Greek model accuracy 89.28% (by PPOCRv5)</li>
<li>The output <code>content_list.json</code> now includes a <code>bbox</code> field (mapped to the 0-1000 range), making it easy to obtain the position of each content block</li>
<li>Removed the <code>pipeline_old_linux</code> install option; older Linux systems such as <code>Centos 7</code> are no longer supported, enabling better support for <code>uv</code> commands such as <code>sync</code>/<code>run</code></li>
</ul>
</li>
</ul>
</details>

<details>
<summary>2025/08/01 2.1.10 released</summary>
<ul>
<li>Fixed <code>pipeline</code> backend parsing results deviating from expectations due to block overwriting #3232</li>
</ul>
</details>

<details>
<summary>2025/07/30 2.1.9 released</summary>
<ul>
<li>Adapted to <code>transformers</code> 4.54.1</li>
</ul>
</details>

<details>
<summary>2025/07/28 2.1.8 released</summary>
<ul>
<li>Adapted to <code>sglang</code> 0.4.9.post5</li>
</ul>
</details>

<details>
<summary>2025/07/27 2.1.7 released</summary>
<ul>
<li>Adapted to <code>transformers</code> 4.54.0</li>
</ul>
</details>

<details>
<summary>2025/07/26 2.1.6 released</summary>
<ul>
<li>Fixed table anomalies when the <code>vlm</code> backend parses some handwritten documents</li>
<li>Fixed visualization box position drift when documents are rotated #3175</li>
</ul>
</details>

<details>
<summary>2025/07/24 2.1.5 released</summary>
<ul>
<li>Adapted to <code>sglang</code> 0.4.9 and upgraded the dockerfile base image to sglang 0.4.9.post3</li>
</ul>
</details>

<details>
<summary>2025/07/23 2.1.4 released</summary>
<ul>
<li><strong>Bug fixes</strong>
<ul>
<li>Fixed excessive VRAM consumption of the <code>MFR</code> step in the <code>pipeline</code> backend in some cases #2771</li>
<li>Fixed inaccurate matching between <code>image</code>/<code>table</code> and <code>caption</code>/<code>footnote</code> in some cases #3129</li>
</ul>
</li>
</ul>
</details>

<details>
<summary>2025/07/16 2.1.1 released</summary>
<ul>
<li><strong>Bug fixes</strong>
<ul>
<li>Fixed text block content loss that could occur with <code>pipeline</code> in some cases #3005</li>
<li>Fixed <code>sglang-client</code> requiring unnecessary packages such as <code>torch</code> #2968</li>
<li>Updated the <code>dockerfile</code> to fix incomplete parsed text caused by missing Linux fonts #2915</li>
</ul>
</li>
<li><strong>Usability improvements</strong>
<ul>
<li>Updated <code>compose.yaml</code> so users can directly launch the <code>sglang-server</code>, <code>mineru-api</code>, and <code>mineru-gradio</code> services</li>
<li>Launched the new <a href="https://opendatalab.github.io/MinerU/zh/">online documentation site</a> and simplified the readme for a better documentation experience</li>
</ul>
</li>
</ul>
</details>

<details>
<summary>2025/07/05 2.1.0 released</summary>
<p>This is the first major update of MinerU 2, bringing many new features, performance and experience optimizations, and bug fixes:</p>
<ul>
<li><strong>Performance optimizations:</strong>
<ul>
<li>Greatly accelerated preprocessing for documents at certain resolutions (long edge around 2000 px)</li>
<li>Greatly accelerated post-processing when the <code>pipeline</code> backend batch-processes many short documents (&lt;10 pages)</li>
<li>Layout analysis in the <code>pipeline</code> backend is about 20% faster</li>
</ul>
</li>
<li><strong>Experience optimizations:</strong>
<ul>
<li>Built-in, out-of-the-box <code>fastapi service</code> and <code>gradio webui</code>; see the <a href="https://opendatalab.github.io/MinerU/zh/usage/quick_usage/#apiwebuisglang-clientserver">documentation</a> for details</li>
<li>Adapted <code>sglang</code> to <code>0.4.8</code>, greatly reducing the VRAM requirements of the <code>vlm-sglang</code> backend; it can run on GPUs with as little as <code>8 GB VRAM</code> (Turing and later architectures)</li>
<li>Added <code>sglang</code> parameter pass-through to all commands, so the <code>sglang-engine</code> backend accepts all <code>sglang</code> parameters, consistent with <code>sglang-server</code></li>
<li>Added configuration-file-based feature extensions, including <code>custom formula delimiters</code>, <code>heading-level detection</code>, and <code>custom local model directories</code>; see the <a href="https://opendatalab.github.io/MinerU/zh/usage/quick_usage/#mineru_1">documentation</a> for details</li>
</ul>
</li>
<li><strong>New features:</strong>
<ul>
<li>The <code>pipeline</code> backend updated to the PP-OCRv5 multilingual text recognition model, supporting 37 languages including French, Spanish, Portuguese, Russian, and Korean, with average accuracy improved by over 30%. <a href="https://paddlepaddle.github.io/PaddleOCR/latest/version3.x/algorithm/PP-OCRv5/PP-OCRv5_multi_languages.html">Details</a></li>
<li>The <code>pipeline</code> backend added limited support for vertical text</li>
</ul>
</li>
</ul>
</details>

<details>
<summary>2025/06/20 2.0.6 released</summary>
<ul>
@@ -467,7 +610,7 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
<td>Parsing backend</td>
<td>pipeline</td>
<td>vlm-transformers</td>
<td>vlm-sglang</td>
<td>vlm-vllm</td>
</tr>
<tr>
<td>Operating system</td>

@@ -516,8 +659,8 @@ uv pip install -e .[core] -i https://mirrors.aliyun.com/pypi/simple
```

> [!TIP]
> `mineru[core]` includes all core features except `sglang` acceleration, compatible with Windows / Linux / macOS systems, suitable for most users.
> If you need to use `sglang` acceleration for VLM model inference or install a lightweight client on edge devices, please refer to the documentation [Extension Modules Installation Guide](https://opendatalab.github.io/MinerU/zh/quick_start/extension_modules/).
> `mineru[core]` includes all core features except `vLLM` acceleration, compatible with Windows / Linux / macOS systems, suitable for most users.
> If you need to use `vLLM` acceleration for VLM model inference or install a lightweight client on edge devices, please refer to the documentation [Extension Modules Installation Guide](https://opendatalab.github.io/MinerU/zh/quick_start/extension_modules/).

---
@@ -545,8 +688,8 @@ mineru -p <input_path> -o <output_path>
- [x] Handwritten text recognition
- [x] Vertical text recognition
- [x] Latin accent mark recognition
- [ ] Code block recognition in the main text
- [ ] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf)
- [x] Code block recognition in the main text
- [x] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf) (https://mineru.net)
- [ ] Chart content recognition

# Known Issues

@@ -564,7 +707,7 @@ mineru -p <input_path> -o <output_path>
- If you encounter problems during use, first check the [FAQ](https://opendatalab.github.io/MinerU/zh/faq/) for an answer.
- If that does not resolve your problem, you can also talk to the AI assistant on [DeepWiki](https://deepwiki.com/opendatalab/MinerU), which can solve most common problems.
- If you still cannot solve the problem, join the community via [Discord](https://discord.gg/Tdedn9GTXq) or [WeChat](http://mineru.space/s/V85Yl) to discuss with other users and developers.
- If you still cannot solve the problem, join the community via [Discord](https://discord.gg/Tdedn9GTXq) or [WeChat](https://mineru.net/community-portal/?aliasId=3c430f94) to discuss with other users and developers.

# All Thanks To Our Contributors

@@ -584,6 +727,7 @@ mineru -p <input_path> -o <output_path>
- [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)
- [UniMERNet](https://github.com/opendatalab/UniMERNet)
- [RapidTable](https://github.com/RapidAI/RapidTable)
- [TableStructureRec](https://github.com/RapidAI/TableStructureRec)
- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
- [PaddleOCR2Pytorch](https://github.com/frotms/PaddleOCR2Pytorch)
- [layoutreader](https://github.com/ppaanngggg/layoutreader)

@@ -593,10 +737,21 @@ mineru -p <input_path> -o <output_path>
- [pdftext](https://github.com/datalab-to/pdftext)
- [pdfminer.six](https://github.com/pdfminer/pdfminer.six)
- [pypdf](https://github.com/py-pdf/pypdf)
- [magika](https://github.com/google/magika)

# Citation
```bibtex
@misc{niu2025mineru25decoupledvisionlanguagemodel,
      title={MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing},
      author={Junbo Niu and Zheng Liu and Zhuangcheng Gu and Bin Wang and Linke Ouyang and Zhiyuan Zhao and Tao Chu and Tianyao He and Fan Wu and Qintong Zhang and Zhenjiang Jin and Guang Liang and Rui Zhang and Wenzheng Zhang and Yuan Qu and Zhifei Ren and Yuefeng Sun and Yuanhong Zheng and Dongsheng Ma and Zirui Tang and Boyu Niu and Ziyang Miao and Hejun Dong and Siyi Qian and Junyuan Zhang and Jingzhou Chen and Fangdong Wang and Xiaomeng Zhao and Liqun Wei and Wei Li and Shasha Wang and Ruiliang Xu and Yuanyuan Cao and Lu Chen and Qianqian Wu and Huaiyu Gu and Lindong Lu and Keming Wang and Dechen Lin and Guanlin Shen and Xuanhe Zhou and Linfeng Zhang and Yuhang Zang and Xiaoyi Dong and Jiaqi Wang and Bo Zhang and Lei Bai and Pei Chu and Weijia Li and Jiang Wu and Lijun Wu and Zhenxiang Li and Guangyu Wang and Zhongying Tu and Chao Xu and Kai Chen and Yu Qiao and Bowen Zhou and Dahua Lin and Wentao Zhang and Conghui He},
      year={2025},
      eprint={2509.22186},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.22186},
}

@misc{wang2024mineruopensourcesolutionprecise,
      title={MinerU: An Open-Source Solution for Precise Document Content Extraction},
      author={Bin Wang and Chao Xu and Xiaomeng Zhao and Linke Ouyang and Fan Wu and Zhiyuan Zhao and Rui Xu and Kaiwen Liu and Yuan Qu and Fukai Shang and Bo Zhang and Liqun Wei and Zhihao Sui and Wei Li and Botian Shi and Yu Qiao and Dahua Lin and Conghui He},
      year={2024},
      eprint={2409.18839},
      archivePrefix={arXiv},
      url={https://arxiv.org/abs/2409.18839},
}
```

@@ -634,4 +789,5 @@ mineru -p <input_path> -o <output_path>
- [PDF-Extract-Kit (A Comprehensive Toolkit for High-Quality PDF Content Extraction)](https://github.com/opendatalab/PDF-Extract-Kit)
- [OmniDocBench (A Comprehensive Benchmark for Document Parsing and Evaluation)](https://github.com/opendatalab/OmniDocBench)
- [Magic-HTML (Mixed web page extraction tool)](https://github.com/opendatalab/magic-html)
- [Magic-Doc (Fast speed ppt/pptx/doc/docx/pdf extraction tool)](https://github.com/InternLM/magic-doc)
- [Dingo: A Comprehensive AI Data Quality Evaluation Tool](https://github.com/MigoXLab/dingo)

demo/demo.py (166 changes)
@@ -15,7 +15,7 @@ from mineru.backend.pipeline.pipeline_analyze import doc_analyze as pipeline_doc
from mineru.backend.pipeline.pipeline_middle_json_mkcontent import union_make as pipeline_union_make
from mineru.backend.pipeline.model_json_to_middle_json import result_to_middle_json as pipeline_result_to_middle_json
from mineru.backend.vlm.vlm_middle_json_mkcontent import union_make as vlm_union_make
from mineru.utils.models_download_utils import auto_download_and_get_model_root_path
from mineru.utils.guess_suffix_or_lang import guess_suffix_by_path


def do_parse(
@@ -27,7 +27,7 @@ def do_parse(
    parse_method="auto",  # The method for parsing PDF, default is 'auto'
    formula_enable=True,  # Enable formula parsing
    table_enable=True,  # Enable table parsing
    server_url=None,  # Server URL for vlm-sglang-client backend
    server_url=None,  # Server URL for vlm-http-client backend
    f_draw_layout_bbox=True,  # Whether to draw layout bounding boxes
    f_draw_span_bbox=True,  # Whether to draw span bounding boxes
    f_dump_md=True,  # Whether to dump markdown files
@@ -62,47 +62,12 @@ def do_parse(
            pdf_info = middle_json["pdf_info"]

            pdf_bytes = pdf_bytes_list[idx]
            if f_draw_layout_bbox:
                draw_layout_bbox(pdf_info, pdf_bytes, local_md_dir, f"{pdf_file_name}_layout.pdf")

            if f_draw_span_bbox:
                draw_span_bbox(pdf_info, pdf_bytes, local_md_dir, f"{pdf_file_name}_span.pdf")

            if f_dump_orig_pdf:
                md_writer.write(
                    f"{pdf_file_name}_origin.pdf",
                    pdf_bytes,
                )

            if f_dump_md:
                image_dir = str(os.path.basename(local_image_dir))
                md_content_str = pipeline_union_make(pdf_info, f_make_md_mode, image_dir)
                md_writer.write_string(
                    f"{pdf_file_name}.md",
                    md_content_str,
                )

            if f_dump_content_list:
                image_dir = str(os.path.basename(local_image_dir))
                content_list = pipeline_union_make(pdf_info, MakeMode.CONTENT_LIST, image_dir)
                md_writer.write_string(
                    f"{pdf_file_name}_content_list.json",
                    json.dumps(content_list, ensure_ascii=False, indent=4),
                )

            if f_dump_middle_json:
                md_writer.write_string(
                    f"{pdf_file_name}_middle.json",
                    json.dumps(middle_json, ensure_ascii=False, indent=4),
                )

            if f_dump_model_output:
                md_writer.write_string(
                    f"{pdf_file_name}_model.json",
                    json.dumps(model_json, ensure_ascii=False, indent=4),
                )

            logger.info(f"local output dir is {local_md_dir}")
            _process_output(
                pdf_info, pdf_bytes, pdf_file_name, local_md_dir, local_image_dir,
                md_writer, f_draw_layout_bbox, f_draw_span_bbox, f_dump_orig_pdf,
                f_dump_md, f_dump_content_list, f_dump_middle_json, f_dump_model_output,
                f_make_md_mode, middle_json, model_json, is_pipeline=True
            )
    else:
        if backend.startswith("vlm-"):
            backend = backend[4:]
@@ -118,48 +83,77 @@ def do_parse(
        pdf_info = middle_json["pdf_info"]

        if f_draw_layout_bbox:
            draw_layout_bbox(pdf_info, pdf_bytes, local_md_dir, f"{pdf_file_name}_layout.pdf")
        _process_output(
            pdf_info, pdf_bytes, pdf_file_name, local_md_dir, local_image_dir,
            md_writer, f_draw_layout_bbox, f_draw_span_bbox, f_dump_orig_pdf,
            f_dump_md, f_dump_content_list, f_dump_middle_json, f_dump_model_output,
            f_make_md_mode, middle_json, infer_result, is_pipeline=False
        )

        if f_draw_span_bbox:
            draw_span_bbox(pdf_info, pdf_bytes, local_md_dir, f"{pdf_file_name}_span.pdf")

        if f_dump_orig_pdf:
            md_writer.write(
                f"{pdf_file_name}_origin.pdf",
                pdf_bytes,
            )

def _process_output(
    pdf_info,
    pdf_bytes,
    pdf_file_name,
    local_md_dir,
    local_image_dir,
    md_writer,
    f_draw_layout_bbox,
    f_draw_span_bbox,
    f_dump_orig_pdf,
    f_dump_md,
    f_dump_content_list,
    f_dump_middle_json,
    f_dump_model_output,
    f_make_md_mode,
    middle_json,
    model_output=None,
    is_pipeline=True
):
    """Process output files."""
    if f_draw_layout_bbox:
        draw_layout_bbox(pdf_info, pdf_bytes, local_md_dir, f"{pdf_file_name}_layout.pdf")

    if f_dump_md:
        image_dir = str(os.path.basename(local_image_dir))
        md_content_str = vlm_union_make(pdf_info, f_make_md_mode, image_dir)
        md_writer.write_string(
            f"{pdf_file_name}.md",
            md_content_str,
        )
    if f_draw_span_bbox:
        draw_span_bbox(pdf_info, pdf_bytes, local_md_dir, f"{pdf_file_name}_span.pdf")

    if f_dump_content_list:
        image_dir = str(os.path.basename(local_image_dir))
        content_list = vlm_union_make(pdf_info, MakeMode.CONTENT_LIST, image_dir)
        md_writer.write_string(
            f"{pdf_file_name}_content_list.json",
            json.dumps(content_list, ensure_ascii=False, indent=4),
        )
    if f_dump_orig_pdf:
        md_writer.write(
            f"{pdf_file_name}_origin.pdf",
            pdf_bytes,
        )

    if f_dump_middle_json:
        md_writer.write_string(
            f"{pdf_file_name}_middle.json",
            json.dumps(middle_json, ensure_ascii=False, indent=4),
        )
    image_dir = str(os.path.basename(local_image_dir))

    if f_dump_model_output:
        model_output = ("\n" + "-" * 50 + "\n").join(infer_result)
        md_writer.write_string(
            f"{pdf_file_name}_model_output.txt",
            model_output,
        )
    if f_dump_md:
        make_func = pipeline_union_make if is_pipeline else vlm_union_make
        md_content_str = make_func(pdf_info, f_make_md_mode, image_dir)
        md_writer.write_string(
            f"{pdf_file_name}.md",
            md_content_str,
        )

    logger.info(f"local output dir is {local_md_dir}")
    if f_dump_content_list:
        make_func = pipeline_union_make if is_pipeline else vlm_union_make
        content_list = make_func(pdf_info, MakeMode.CONTENT_LIST, image_dir)
        md_writer.write_string(
            f"{pdf_file_name}_content_list.json",
            json.dumps(content_list, ensure_ascii=False, indent=4),
        )

    if f_dump_middle_json:
        md_writer.write_string(
            f"{pdf_file_name}_middle.json",
            json.dumps(middle_json, ensure_ascii=False, indent=4),
        )

    if f_dump_model_output:
        md_writer.write_string(
            f"{pdf_file_name}_model.json",
            json.dumps(model_output, ensure_ascii=False, indent=4),
        )

    logger.info(f"local output dir is {local_md_dir}")

def parse_doc(
@@ -182,8 +176,8 @@ def parse_doc(
        backend: the backend for parsing pdf:
            pipeline: More general.
            vlm-transformers: More general.
            vlm-sglang-engine: Faster(engine).
            vlm-sglang-client: Faster(client).
            vlm-vllm-engine: Faster(engine).
            vlm-http-client: Faster(client).
            without method specified, pipeline will be used by default.
        method: the method for parsing pdf:
            auto: Automatically determine the method based on the file type.
@@ -191,7 +185,7 @@ def parse_doc(
            ocr: Use OCR method for image-based PDFs.
            Without method specified, 'auto' will be used by default.
            Adapted only for the case where the backend is set to "pipeline".
        server_url: When the backend is `sglang-client`, you need to specify the server_url, for example: `http://127.0.0.1:30000`
        server_url: When the backend is `http-client`, you need to specify the server_url, for example: `http://127.0.0.1:30000`
        start_page_id: Start page ID for parsing, default is 0
        end_page_id: End page ID for parsing, default is None (parse all pages until the end of the document)
    """
@@ -225,12 +219,12 @@ if __name__ == '__main__':
    __dir__ = os.path.dirname(os.path.abspath(__file__))
    pdf_files_dir = os.path.join(__dir__, "pdfs")
    output_dir = os.path.join(__dir__, "output")
    pdf_suffixes = [".pdf"]
    image_suffixes = [".png", ".jpeg", ".jpg"]
    pdf_suffixes = ["pdf"]
    image_suffixes = ["png", "jpeg", "jp2", "webp", "gif", "bmp", "jpg"]

    doc_path_list = []
    for doc_path in Path(pdf_files_dir).glob('*'):
        if doc_path.suffix in pdf_suffixes + image_suffixes:
        if guess_suffix_by_path(doc_path) in pdf_suffixes + image_suffixes:
            doc_path_list.append(doc_path)

    """If you cannot download models due to network issues, set the environment variable MINERU_MODEL_SOURCE to modelscope to download models from a proxy-free mirror"""
@@ -241,5 +235,5 @@ if __name__ == '__main__':

    """To enable VLM mode, change the backend to 'vlm-xxx'"""
    # parse_doc(doc_path_list, output_dir, backend="vlm-transformers")  # more general.
    # parse_doc(doc_path_list, output_dir, backend="vlm-sglang-engine")  # faster(engine).
    # parse_doc(doc_path_list, output_dir, backend="vlm-sglang-client", server_url="http://127.0.0.1:30000")  # faster(client).
    # parse_doc(doc_path_list, output_dir, backend="vlm-vllm-engine")  # faster(engine).
    # parse_doc(doc_path_list, output_dir, backend="vlm-http-client", server_url="http://127.0.0.1:30000")  # faster(client).
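The suffix lists above drop the leading dot because the new `guess_suffix_by_path` helper returns a bare extension string rather than `Path.suffix`. A minimal sketch of that contract, using a hypothetical stand-in (the real helper in `mineru.utils.guess_suffix_or_lang` sniffs file content, e.g. via magika):

```python
from pathlib import Path

pdf_suffixes = ["pdf"]
image_suffixes = ["png", "jpeg", "jp2", "webp", "gif", "bmp", "jpg"]

def naive_guess_suffix_by_path(path) -> str:
    # Hypothetical stand-in for guess_suffix_by_path: only normalizes the
    # extension (dot stripped, lowercased) instead of inspecting content.
    return Path(path).suffix.lstrip(".").lower()

print(naive_guess_suffix_by_path("scan.PDF"))  # pdf
print(naive_guess_suffix_by_path("figure.JPG") in image_suffixes)  # True
```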
@@ -1,12 +1,15 @@
# Use DaoCloud mirrored sglang image for China region
FROM docker.m.daocloud.io/lmsysorg/sglang:v0.4.9.post6-cu126
# For blackwell GPU, use the following line instead:
# FROM docker.m.daocloud.io/lmsysorg/sglang:v0.4.9.post6-cu128-b200
# Use DaoCloud mirrored vllm image for China region for GPUs with Ampere architecture and above (Compute Capability >= 8.0)
# Compute Capability version query: https://developer.nvidia.com/cuda-gpus
FROM docker.m.daocloud.io/vllm/vllm-openai:v0.10.1.1

# Use the official sglang image
# FROM lmsysorg/sglang:v0.4.9.post6-cu126
# For blackwell GPU, use the following line instead:
# FROM lmsysorg/sglang:v0.4.9.post6-cu128-b200
# Use the official vllm image
# FROM vllm/vllm-openai:v0.10.1.1

# Use DaoCloud mirrored vllm image for China region for GPUs with Turing architecture and below (Compute Capability < 8.0)
# FROM docker.m.daocloud.io/vllm/vllm-openai:v0.10.2

# Use the official vllm image
# FROM vllm/vllm-openai:v0.10.2

# Install libgl for opencv support & Noto fonts for Chinese characters
RUN apt-get update && \
@@ -1,21 +1,19 @@
|
||||
services:
|
||||
mineru-sglang-server:
|
||||
image: mineru-sglang:latest
|
||||
container_name: mineru-sglang-server
|
||||
mineru-vllm-server:
|
||||
image: mineru-vllm:latest
|
||||
container_name: mineru-vllm-server
|
||||
restart: always
|
||||
profiles: ["sglang-server"]
|
||||
profiles: ["vllm-server"]
|
||||
ports:
|
||||
- 30000:30000
|
||||
environment:
|
||||
MINERU_MODEL_SOURCE: local
|
||||
entrypoint: mineru-sglang-server
|
||||
entrypoint: mineru-vllm-server
|
||||
command:
|
||||
--host 0.0.0.0
|
||||
--port 30000
|
||||
# --enable-torch-compile # You can also enable torch.compile to accelerate inference speed by approximately 15%
|
||||
# --dp-size 2 # If using multiple GPUs, increase throughput using sglang's multi-GPU parallel mode
|
||||
# --tp-size 2 # If you have more than one GPU, you can expand available VRAM using tensor parallelism (TP) mode.
|
||||
# --mem-fraction-static 0.5 # If running on a single GPU and encountering VRAM shortage, reduce the KV cache size by this parameter, if VRAM issues persist, try lowering it further to `0.4` or below.
|
||||
# --data-parallel-size 2 # If using multiple GPUs, increase throughput using vllm's multi-GPU parallel mode
|
||||
# --gpu-memory-utilization 0.5 # If running on a single GPU and encountering VRAM shortage, reduce the KV cache size by this parameter, if VRAM issues persist, try lowering it further to `0.4` or below.
|
||||
ulimits:
|
||||
memlock: -1
|
||||
stack: 67108864
|
||||
@@ -31,7 +29,7 @@ services:
|
||||
capabilities: [gpu]
|
||||
|
||||
mineru-api:
|
||||
image: mineru-sglang:latest
|
||||
image: mineru-vllm:latest
|
||||
container_name: mineru-api
|
||||
restart: always
|
||||
profiles: ["api"]
|
||||
@@ -43,11 +41,9 @@ services:
     command:
       --host 0.0.0.0
       --port 8000
-      # parameters for sglang-engine
-      # --enable-torch-compile  # You can also enable torch.compile to accelerate inference speed by approximately 15%
-      # --dp-size 2  # If using multiple GPUs, increase throughput using sglang's multi-GPU parallel mode
-      # --tp-size 2  # If you have more than one GPU, you can expand available VRAM using tensor parallelism (TP) mode.
-      # --mem-fraction-static 0.5  # If running on a single GPU and encountering VRAM shortage, reduce the KV cache size by this parameter, if VRAM issues persist, try lowering it further to `0.4` or below.
+      # parameters for vllm-engine
+      # --data-parallel-size 2  # If using multiple GPUs, increase throughput using vllm's multi-GPU parallel mode
+      # --gpu-memory-utilization 0.5  # If running on a single GPU and encountering VRAM shortage, reduce the KV cache size by this parameter, if VRAM issues persist, try lowering it further to `0.4` or below.
     ulimits:
       memlock: -1
       stack: 67108864
@@ -61,7 +57,7 @@ services:
           capabilities: [ gpu ]
   mineru-gradio:
-    image: mineru-sglang:latest
+    image: mineru-vllm:latest
     container_name: mineru-gradio
     restart: always
     profiles: ["gradio"]
@@ -73,14 +69,12 @@ services:
     command:
       --server-name 0.0.0.0
       --server-port 7860
-      --enable-sglang-engine true  # Enable the sglang engine for Gradio
+      --enable-vllm-engine true  # Enable the vllm engine for Gradio
       # --enable-api false  # If you want to disable the API, set this to false
      # --max-convert-pages 20  # If you want to limit the number of pages for conversion, set this to a specific number
-      # parameters for sglang-engine
-      # --enable-torch-compile  # You can also enable torch.compile to accelerate inference speed by approximately 15%
-      # --dp-size 2  # If using multiple GPUs, increase throughput using sglang's multi-GPU parallel mode
-      # --tp-size 2  # If you have more than one GPU, you can expand available VRAM using tensor parallelism (TP) mode.
-      # --mem-fraction-static 0.5  # If running on a single GPU and encountering VRAM shortage, reduce the KV cache size by this parameter, if VRAM issues persist, try lowering it further to `0.4` or below.
+      # parameters for vllm-engine
+      # --data-parallel-size 2  # If using multiple GPUs, increase throughput using vllm's multi-GPU parallel mode
+      # --gpu-memory-utilization 0.5  # If running on a single GPU and encountering VRAM shortage, reduce the KV cache size by this parameter, if VRAM issues persist, try lowering it further to `0.4` or below.
     ulimits:
       memlock: -1
       stack: 67108864
@@ -1,7 +1,9 @@
-# Use the official sglang image
-FROM lmsysorg/sglang:v0.4.9.post6-cu126
-# For blackwell GPU, use the following line instead:
-# FROM lmsysorg/sglang:v0.4.9.post6-cu128-b200
+# Use the official vllm image for gpu with Ampere architecture and above (Compute Capability>=8.0)
+# Compute Capability version query (https://developer.nvidia.com/cuda-gpus)
+FROM vllm/vllm-openai:v0.10.1.1
+
+# Use the official vllm image for gpu with Turing architecture and below (Compute Capability<8.0)
+# FROM vllm/vllm-openai:v0.10.2

 # Install libgl for opencv support & Noto fonts for Chinese characters
 RUN apt-get update && \
New binary files added:

BIN docs/assets/images/BISHENG_01.png (new file, 96 KiB)
BIN docs/assets/images/Cherry_Studio_1.png (new file, 34 KiB)
BIN docs/assets/images/Cherry_Studio_2.png (new file, 51 KiB)
BIN docs/assets/images/Cherry_Studio_3.png (new file, 72 KiB)
BIN docs/assets/images/Cherry_Studio_4.png (new file, 55 KiB)
BIN docs/assets/images/Cherry_Studio_5.png (new file, 64 KiB)
BIN docs/assets/images/Cherry_Studio_6.png (new file, 75 KiB)
BIN docs/assets/images/Cherry_Studio_7.png (new file, 56 KiB)
BIN docs/assets/images/Cherry_Studio_8.png (new file, 28 KiB)
BIN docs/assets/images/Coze_1.png (new file, 64 KiB)
BIN docs/assets/images/Coze_10.png (new file, 88 KiB)
BIN docs/assets/images/Coze_11.png (new file, 76 KiB)
BIN docs/assets/images/Coze_12.png (new file, 110 KiB)
BIN docs/assets/images/Coze_13.png (new file, 79 KiB)
BIN docs/assets/images/Coze_14.png (new file, 104 KiB)
BIN docs/assets/images/Coze_15.png (new file, 72 KiB)
BIN docs/assets/images/Coze_16.png (new file, 87 KiB)
BIN docs/assets/images/Coze_17.png (new file, 201 KiB)
BIN docs/assets/images/Coze_18.png (new file, 261 KiB)
BIN docs/assets/images/Coze_19.png (new file, 261 KiB)
BIN docs/assets/images/Coze_2.png (new file, 53 KiB)
BIN docs/assets/images/Coze_20.png (new file, 145 KiB)
BIN docs/assets/images/Coze_21.png (new file, 130 KiB)
BIN docs/assets/images/Coze_3.png (new file, 95 KiB)
BIN docs/assets/images/Coze_4.png (new file, 110 KiB)
BIN docs/assets/images/Coze_5.png (new file, 102 KiB)
BIN docs/assets/images/Coze_6.png (new file, 101 KiB)
BIN docs/assets/images/Coze_7.png (new file, 214 KiB)
BIN docs/assets/images/Coze_8.png (new file, 151 KiB)
BIN docs/assets/images/Coze_9.png (new file, 83 KiB)
BIN docs/assets/images/DataFLow_01.png (new file, 89 KiB)
BIN docs/assets/images/DataFlow_02.png (new file, 147 KiB)
BIN docs/assets/images/Dify_1.png (new file, 108 KiB)
BIN docs/assets/images/Dify_10.png (new file, 81 KiB)
BIN docs/assets/images/Dify_11.png (new file, 85 KiB)
BIN docs/assets/images/Dify_12.png (new file, 129 KiB)
BIN docs/assets/images/Dify_13.png (new file, 35 KiB)
BIN docs/assets/images/Dify_14.png (new file, 249 KiB)
BIN docs/assets/images/Dify_15.png (new file, 255 KiB)
BIN docs/assets/images/Dify_16.png (new file, 107 KiB)
BIN docs/assets/images/Dify_17.png (new file, 125 KiB)
BIN docs/assets/images/Dify_18.png (new file, 180 KiB)
BIN docs/assets/images/Dify_19.png (new file, 105 KiB)
BIN docs/assets/images/Dify_2.png (new file, 236 KiB)
BIN docs/assets/images/Dify_20.png (new file, 177 KiB)
BIN docs/assets/images/Dify_21.png (new file, 77 KiB)
BIN docs/assets/images/Dify_22.png (new file, 118 KiB)
BIN docs/assets/images/Dify_23.png (new file, 94 KiB)
BIN docs/assets/images/Dify_24.png (new file, 133 KiB)
BIN docs/assets/images/Dify_25.png (new file, 161 KiB)
BIN docs/assets/images/Dify_26.png (new file, 190 KiB)
BIN docs/assets/images/Dify_3.png (new file, 263 KiB)
BIN docs/assets/images/Dify_4.png (new file, 264 KiB)
BIN docs/assets/images/Dify_5.png (new file, 261 KiB)
BIN docs/assets/images/Dify_6.png (new file, 286 KiB)
BIN docs/assets/images/Dify_7.png (new file, 50 KiB)
BIN docs/assets/images/Dify_8.png (new file, 136 KiB)
BIN docs/assets/images/Dify_9.png (new file, 110 KiB)
BIN docs/assets/images/DingTalk_01.png (new file, 133 KiB)
BIN docs/assets/images/FastGPT_01.png (new file, 185 KiB)
BIN docs/assets/images/FastGPT_02.png (new file, 92 KiB)
BIN docs/assets/images/ModelWhale_01.png (new file, 246 KiB)
BIN docs/assets/images/ModelWhale_02.png (new file, 71 KiB)
BIN docs/assets/images/ModelWhale_1.png (new file, 72 KiB)
BIN docs/assets/images/RagFlow_01.png (new file, 500 KiB)
BIN docs/assets/images/Sider_1.png (new file, 62 KiB)
BIN docs/assets/images/coze_0.png (new file, 92 KiB)
BIN docs/assets/images/n8n_0.png (new file, 276 KiB)
BIN docs/assets/images/n8n_1.png (new file, 67 KiB)
BIN docs/assets/images/n8n_10.png (new file, 14 KiB)
BIN docs/assets/images/n8n_2.png (new file, 74 KiB)
BIN docs/assets/images/n8n_3.png (new file, 71 KiB)
BIN docs/assets/images/n8n_4.png (new file, 72 KiB)
BIN docs/assets/images/n8n_5.png (new file, 70 KiB)
BIN docs/assets/images/n8n_6.png (new file, 63 KiB)
BIN docs/assets/images/n8n_7.png (new file, 23 KiB)
BIN docs/assets/images/n8n_8.png (new file, 33 KiB)
BIN docs/assets/images/n8n_9.png (new file, 89 KiB)
@@ -2,7 +2,7 @@

 If your question is not listed, try using [DeepWiki](https://deepwiki.com/opendatalab/MinerU)'s AI assistant for common issues.

-For unresolved problems, join our [Discord](https://discord.gg/Tdedn9GTXq) or [WeChat](http://mineru.space/s/V85Yl) community for support.
+For unresolved problems, join our [Discord](https://discord.gg/Tdedn9GTXq) or [WeChat](https://mineru.net/community-portal/?aliasId=3c430f94) community for support.

 ??? question "Encountered the error `ImportError: libGL.so.1: cannot open shared object file: No such file or directory` in Ubuntu 22.04 on WSL2"

@@ -15,18 +15,6 @@ For unresolved problems, join our [Discord](https://discord.gg/Tdedn9GTXq) or [W
     Reference: [#388](https://github.com/opendatalab/MinerU/issues/388)

-??? question "Error when installing MinerU on CentOS 7 or Ubuntu 18: `ERROR: Failed building wheel for simsimd`"
-
-    The new version of albumentations (1.4.21) introduces a dependency on simsimd. Since the pre-built package of simsimd for Linux requires a glibc version greater than or equal to 2.28, this causes installation issues on some Linux distributions released before 2019. You can resolve this issue by using the following command:
-    ```
-    conda create -n mineru python=3.11 -y
-    conda activate mineru
-    pip install -U "mineru[pipeline_old_linux]"
-    ```
-
-    Reference: [#1004](https://github.com/opendatalab/MinerU/issues/1004)
-
 ??? question "Missing text information in parsing results when installing and using on Linux systems."

     MinerU uses `pypdfium2` instead of `pymupdf` as the PDF page rendering engine in versions >=2.0 to resolve AGPLv3 license issues. On some Linux distributions, due to missing CJK fonts, some text may be lost during the process of rendering PDFs to images.
@@ -19,7 +19,8 @@
 [](https://huggingface.co/spaces/opendatalab/MinerU)
 [](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
 [](https://colab.research.google.com/gist/myhloli/a3cb16570ab3cfeadf9d8f0ac91b4fca/mineru_demo.ipynb)
 [](https://arxiv.org/abs/2409.18839)
+[](https://arxiv.org/abs/2509.22186)
 [](https://deepwiki.com/opendatalab/MinerU)

 <div align="center">
@@ -34,7 +35,7 @@
 <!-- join us -->

 <p align="center">
-👋 join us on <a href="https://discord.gg/Tdedn9GTXq" target="_blank">Discord</a> and <a href="http://mineru.space/s/V85Yl" target="_blank">WeChat</a>
+👋 join us on <a href="https://discord.gg/Tdedn9GTXq" target="_blank">Discord</a> and <a href="https://mineru.net/community-portal/?aliasId=3c430f94" target="_blank">WeChat</a>
 </p>
 </div>
@@ -6,25 +6,23 @@ MinerU provides a convenient Docker deployment method, which helps quickly set u

 ```bash
 wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/global/Dockerfile
-docker build -t mineru-sglang:latest -f Dockerfile .
+docker build -t mineru-vllm:latest -f Dockerfile .
 ```

 > [!TIP]
-> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/global/Dockerfile) uses `lmsysorg/sglang:v0.4.9.post6-cu126` as the base image by default, supporting Turing/Ampere/Ada Lovelace/Hopper platforms.
-> If you are using the newer `Blackwell` platform, please modify the base image to `lmsysorg/sglang:v0.4.9.post6-cu128-b200` before executing the build operation.
+> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/global/Dockerfile) uses `vllm/vllm-openai:v0.10.1.1` as the base image by default. This version of vLLM v1 engine has limited support for GPU models.
+> If you cannot use vLLM accelerated inference on Turing and earlier architecture GPUs, you can resolve this issue by changing the base image to `vllm/vllm-openai:v0.10.2`.

 ## Docker Description

-MinerU's Docker uses `lmsysorg/sglang` as the base image, so it includes the `sglang` inference acceleration framework and necessary dependencies by default. Therefore, on compatible devices, you can directly use `sglang` to accelerate VLM model inference.
+MinerU's Docker uses `vllm/vllm-openai` as the base image, so it includes the `vllm` inference acceleration framework and necessary dependencies by default. Therefore, on compatible devices, you can directly use `vllm` to accelerate VLM model inference.

 > [!NOTE]
-> Requirements for using `sglang` to accelerate VLM model inference:
+> Requirements for using `vllm` to accelerate VLM model inference:
 >
 > - Device must have Turing architecture or later graphics cards with 8GB+ available VRAM.
-> - The host machine's graphics driver should support CUDA 12.6 or higher; `Blackwell` platform should support CUDA 12.8 or higher. You can check the driver version using the `nvidia-smi` command.
+> - The host machine's graphics driver should support CUDA 12.8 or higher; You can check the driver version using the `nvidia-smi` command.
 > - Docker container must have access to the host machine's graphics devices.
 >
 > If your device doesn't meet the above requirements, you can still use other features of MinerU, but cannot use `sglang` to accelerate VLM model inference, meaning you cannot use the `vlm-sglang-engine` backend or start the `vlm-sglang-server` service.

 ## Start Docker Container

@@ -33,12 +31,12 @@ docker run --gpus all \
   --shm-size 32g \
   -p 30000:30000 -p 7860:7860 -p 8000:8000 \
   --ipc=host \
-  -it mineru-sglang:latest \
+  -it mineru-vllm:latest \
   /bin/bash
 ```

 After executing this command, you will enter the Docker container's interactive terminal with some ports mapped for potential services. You can directly run MinerU-related commands within the container to use MinerU's features.
-You can also directly start MinerU services by replacing `/bin/bash` with service startup commands. For detailed instructions, please refer to the [Start the service via command](https://opendatalab.github.io/MinerU/usage/quick_usage/#advanced-usage-via-api-webui-sglang-clientserver).
+You can also directly start MinerU services by replacing `/bin/bash` with service startup commands. For detailed instructions, please refer to the [Start the service via command](https://opendatalab.github.io/MinerU/usage/quick_usage/#advanced-usage-via-api-webui-http-clientserver).

 ## Start Services Directly with Docker Compose

@@ -53,19 +51,19 @@ wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml
 >
 >- The `compose.yaml` file contains configurations for multiple services of MinerU, you can choose to start specific services as needed.
 >- Different services might have additional parameter configurations, which you can view and edit in the `compose.yaml` file.
->- Due to the pre-allocation of GPU memory by the `sglang` inference acceleration framework, you may not be able to run multiple `sglang` services simultaneously on the same machine. Therefore, ensure that other services that might use GPU memory have been stopped before starting the `vlm-sglang-server` service or using the `vlm-sglang-engine` backend.
+>- Due to the pre-allocation of GPU memory by the `vllm` inference acceleration framework, you may not be able to run multiple `vllm` services simultaneously on the same machine. Therefore, ensure that other services that might use GPU memory have been stopped before starting the `vlm-vllm-server` service or using the `vlm-vllm-engine` backend.

 ---

-### Start sglang-server service
-connect to `sglang-server` via `vlm-sglang-client` backend
+### Start vllm-server service
+connect to `vllm-server` via `vlm-http-client` backend
 ```bash
-docker compose -f compose.yaml --profile sglang-server up -d
+docker compose -f compose.yaml --profile vllm-server up -d
 ```
 >[!TIP]
->In another terminal, connect to sglang server via sglang client (only requires CPU and network, no sglang environment needed)
+>In another terminal, connect to vllm server via http client (only requires CPU and network, no vllm environment needed)
 > ```bash
-> mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://<server_ip>:30000
+> mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://<server_ip>:30000
 > ```

 ---
@@ -4,34 +4,26 @@ MinerU supports installing extension modules on demand based on different needs

 ## Common Scenarios

 ### Core Functionality Installation
-The `core` module is the core dependency of MinerU, containing all functional modules except `sglang`. Installing this module ensures the basic functionality of MinerU works properly.
+The `core` module is the core dependency of MinerU, containing all functional modules except `vllm`. Installing this module ensures the basic functionality of MinerU works properly.
 ```bash
 uv pip install mineru[core]
 ```

 ---

-### Using `sglang` to Accelerate VLM Model Inference
-The `sglang` module provides acceleration support for VLM model inference, suitable for graphics cards with Turing architecture and later (8GB+ VRAM). Installing this module can significantly improve model inference speed.
-In the configuration, `all` includes both `core` and `sglang` modules, so `mineru[all]` and `mineru[core,sglang]` are equivalent.
+### Using `vllm` to Accelerate VLM Model Inference
+The `vllm` module provides acceleration support for VLM model inference, suitable for graphics cards with Turing architecture and later (8GB+ VRAM). Installing this module can significantly improve model inference speed.
+In the configuration, `all` includes both `core` and `vllm` modules, so `mineru[all]` and `mineru[core,vllm]` are equivalent.
 ```bash
 uv pip install mineru[all]
 ```
 > [!TIP]
-> If exceptions occur during installation of the complete package including sglang, please refer to the [sglang official documentation](https://docs.sglang.ai/start/install.html) to try to resolve the issue, or directly use the [Docker](./docker_deployment.md) deployment method.
+> If exceptions occur during installation of the complete package including vllm, please refer to the [vllm official documentation](https://docs.vllm.ai/en/latest/getting_started/installation/index.html) to try to resolve the issue, or directly use the [Docker](./docker_deployment.md) deployment method.

 ---

-### Installing Lightweight Client to Connect to sglang-server
-If you need to install a lightweight client on edge devices to connect to `sglang-server`, you can install the basic mineru package, which is very lightweight and suitable for devices with only CPU and network connectivity.
+### Installing Lightweight Client to Connect to vllm-server
+If you need to install a lightweight client on edge devices to connect to `vllm-server`, you can install the basic mineru package, which is very lightweight and suitable for devices with only CPU and network connectivity.
 ```bash
 uv pip install mineru
 ```

 ---

 ### Using Pipeline Backend on Outdated Linux Systems
 If your system is too outdated to meet the dependency requirements of `mineru[core]`, this option can minimally meet MinerU's runtime requirements, suitable for old systems that cannot be upgraded and only need to use the pipeline backend.
 ```bash
 uv pip install mineru[pipeline_old_linux]
 ```
@@ -31,7 +31,7 @@ A WebUI developed based on Gradio, with a simple interface and only core parsing
 <td>Parsing Backend</td>
 <td>pipeline</td>
 <td>vlm-transformers</td>
-<td>vlm-sglang</td>
+<td>vlm-vllm</td>
 </tr>
 <tr>
 <td>Operating System</td>
@@ -80,8 +80,8 @@ uv pip install -e .[core]
 ```

 > [!TIP]
-> `mineru[core]` includes all core features except `sglang` acceleration, compatible with Windows / Linux / macOS systems, suitable for most users.
-> If you need to use `sglang` acceleration for VLM model inference or install a lightweight client on edge devices, please refer to the documentation [Extension Modules Installation Guide](./extension_modules.md).
+> `mineru[core]` includes all core features except `vllm` acceleration, compatible with Windows / Linux / macOS systems, suitable for most users.
+> If you need to use `vllm` acceleration for VLM model inference or install a lightweight client on edge devices, please refer to the documentation [Extension Modules Installation Guide](./extension_modules.md).

 ---
@@ -51,14 +51,16 @@ The following sections provide detailed descriptions of each file's purpose and

 ## Structured Data Files

-### Model Inference Results (model.json)
+> [!IMPORTANT]
+> The VLM backend output has significant changes in version 2.5 and is not backward-compatible with the pipeline backend. If you plan to build secondary development on structured outputs, please read this document carefully.

-> [!NOTE]
-> Only applicable to pipeline backend
+### Pipeline Backend Output Results
+
+#### Model Inference Results (model.json)

 **File naming format**: `{original_filename}_model.json`

-#### Data Structure Definition
+##### Data Structure Definition

 ```python
 from pydantic import BaseModel, Field
@@ -103,7 +105,7 @@ class PageInferenceResults(BaseModel):
 inference_result: list[PageInferenceResults] = []
 ```

-#### Coordinate System Description
+##### Coordinate System Description

 `poly` coordinate format: `[x0, y0, x1, y1, x2, y2, x3, y3]`
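The 8-value `poly` quadrilateral can be collapsed into the rectangular `[x0, y0, x1, y1]` box convention used by the other output files. A minimal sketch of such a conversion (the helper name is illustrative, not part of MinerU's API):

```python
def poly_to_bbox(poly):
    """Collapse an 8-value quadrilateral [x0, y0, x1, y1, x2, y2, x3, y3]
    into a rectangular [x0, y0, x1, y1] box by taking coordinate extremes."""
    xs, ys = poly[0::2], poly[1::2]
    return [min(xs), min(ys), max(xs), max(ys)]

# Axis-aligned quadrilateral collapses to its own corners:
print(poly_to_bbox([60, 480, 946, 480, 946, 904, 60, 904]))  # [60, 480, 946, 904]
```

Taking min/max of all four corners also handles slightly skewed quadrilaterals by returning their axis-aligned bounding rectangle.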
@@ -112,7 +114,7 @@ inference_result: list[PageInferenceResults] = []



-#### Sample Data
+##### Sample Data

 ```json
 [
@@ -165,52 +167,11 @@ inference_result: list[PageInferenceResults] = []
 ]
 ```

-### VLM Output Results (model_output.txt)
-
-> [!NOTE]
-> Only applicable to VLM backend
-
-**File naming format**: `{original_filename}_model_output.txt`
-
-#### File Format Description
-
-- Uses `----` to separate output results for each page
-- Each page contains multiple text blocks starting with `<|box_start|>` and ending with `<|md_end|>`
-
-#### Field Meanings
-
-| Tag | Format | Description |
-|-----|--------|-------------|
-| Bounding box | `<\|box_start\|>x0 y0 x1 y1<\|box_end\|>` | Quadrilateral coordinates (top-left, bottom-right points), coordinate values after scaling page to 1000×1000 |
-| Type tag | `<\|ref_start\|>type<\|ref_end\|>` | Content block type identifier |
-| Content | `<\|md_start\|>markdown content<\|md_end\|>` | Markdown content of the block |
-
-#### Supported Content Types
-
-```json
-{
-    "text": "Text",
-    "title": "Title",
-    "image": "Image",
-    "image_caption": "Image caption",
-    "image_footnote": "Image footnote",
-    "table": "Table",
-    "table_caption": "Table caption",
-    "table_footnote": "Table footnote",
-    "equation": "Interline formula"
-}
-```
-
-#### Special Tags
-
-- `<|txt_contd|>`: Appears at the end of text, indicating that this text block can be connected with subsequent text blocks
-- Table content uses `otsl` format and needs to be converted to HTML for rendering in Markdown
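The `<|box_start|>`/`<|ref_start|>`/`<|md_start|>` tag layout described above can be split with a short regular expression. A hedged sketch, assuming the three tag groups appear consecutively for each block (the parser name and sample string are illustrative, not MinerU API):

```python
import re

# One block = bounding box, then type tag, then markdown content.
BLOCK_RE = re.compile(
    r"<\|box_start\|>(?P<box>.*?)<\|box_end\|>\s*"
    r"<\|ref_start\|>(?P<type>.*?)<\|ref_end\|>\s*"
    r"<\|md_start\|>(?P<md>.*?)<\|md_end\|>",
    re.DOTALL,
)

def parse_page(page_text):
    """Split one page of model_output.txt into dicts of bbox/type/markdown."""
    blocks = []
    for m in BLOCK_RE.finditer(page_text):
        bbox = [int(v) for v in m.group("box").split()]
        blocks.append({"bbox": bbox, "type": m.group("type"), "md": m.group("md")})
    return blocks

sample = ("<|box_start|>10 20 500 60<|box_end|>"
          "<|ref_start|>title<|ref_end|>"
          "<|md_start|># Introduction<|md_end|>")
for b in parse_page(sample):
    print(b["type"], b["bbox"])  # title [10, 20, 500, 60]
```

Pages themselves would first be split on the `----` separator; bbox values are in the 1000×1000-scaled page coordinate system.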
-### Intermediate Processing Results (middle.json)
+#### Intermediate Processing Results (middle.json)

 **File naming format**: `{original_filename}_middle.json`

-#### Top-level Structure
+##### Top-level Structure

 | Field Name | Type | Description |
 |------------|------|-------------|
@@ -218,22 +179,20 @@ inference_result: list[PageInferenceResults] = []
 | `_backend` | `string` | Parsing mode: `pipeline` or `vlm` |
 | `_version_name` | `string` | MinerU version number |

-#### Page Information Structure (pdf_info)
+##### Page Information Structure (pdf_info)

 | Field Name | Description |
 |------------|-------------|
 | `preproc_blocks` | Unsegmented intermediate results after PDF preprocessing |
 | `layout_bboxes` | Layout segmentation results, including layout direction and bounding boxes, sorted by reading order |
 | `page_idx` | Page number, starting from 0 |
 | `page_size` | Page width and height `[width, height]` |
 | `_layout_tree` | Layout tree structure |
 | `images` | Image block information list |
 | `tables` | Table block information list |
 | `interline_equations` | Interline formula block information list |
 | `discarded_blocks` | Block information to be discarded |
 | `para_blocks` | Content block results after segmentation |

-#### Block Structure Hierarchy
+##### Block Structure Hierarchy

 ```
 Level 1 blocks (table | image)
@@ -242,7 +201,7 @@ Level 1 blocks (table | image)
 └── Spans
 ```

-#### Level 1 Block Fields
+##### Level 1 Block Fields

 | Field Name | Description |
 |------------|-------------|
@@ -250,7 +209,7 @@ Level 1 blocks (table | image)
 | `bbox` | Rectangular box coordinates of the block `[x0, y0, x1, y1]` |
 | `blocks` | List of contained level 2 blocks |

-#### Level 2 Block Fields
+##### Level 2 Block Fields

 | Field Name | Description |
 |------------|-------------|
@@ -258,7 +217,7 @@ Level 1 blocks (table | image)
 | `bbox` | Rectangular box coordinates of the block |
 | `lines` | List of contained line information |

-#### Level 2 Block Types
+##### Level 2 Block Types

 | Type | Description |
 |------|-------------|
@@ -274,7 +233,7 @@ Level 1 blocks (table | image)
 | `list` | List block |
 | `interline_equation` | Interline formula block |

-#### Line and Span Structure
+##### Line and Span Structure

 **Line fields**:
 - `bbox`: Rectangular box coordinates of the line
@@ -285,7 +244,7 @@ Level 1 blocks (table | image)
 - `type`: Span type (`image`, `table`, `text`, `inline_equation`, `interline_equation`)
 - `content` | `img_path`: Text content or image path
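The block → lines → spans hierarchy described above can be walked with a few nested loops. A minimal sketch under the assumption that level 1 table/image blocks nest their level 2 blocks under a `blocks` key while level 2 blocks carry `lines` directly (the helper and the tiny inline fixture are illustrative, not MinerU API):

```python
def iter_spans(pdf_info):
    """Walk pdf_info -> para_blocks -> (optional nested blocks) -> lines -> spans."""
    for page in pdf_info:
        for block in page.get("para_blocks", []):
            # Level 1 blocks wrap level 2 blocks; plain blocks stand alone.
            for b in block.get("blocks", [block]):
                for line in b.get("lines", []):
                    for span in line.get("spans", []):
                        yield page["page_idx"], b["type"], span

# Tiny hand-written fixture mimicking one page of middle.json:
middle = {
    "pdf_info": [{
        "page_idx": 0,
        "para_blocks": [{
            "type": "text",
            "bbox": [52, 61, 294, 82],
            "lines": [{"bbox": [52, 61, 294, 82],
                       "spans": [{"type": "text", "content": "Abstract"}]}],
        }],
    }]
}

for page_idx, block_type, span in iter_spans(middle["pdf_info"]):
    print(page_idx, block_type, span.get("content"))  # 0 text Abstract
```

A generator like this keeps memory flat even for long documents, since spans are yielded one at a time.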
-#### Sample Data
+##### Sample Data

 ```json
 {
@@ -388,15 +347,15 @@ Level 1 blocks (table | image)
 }
 ```

-### Content List (content_list.json)
+#### Content List (content_list.json)

 **File naming format**: `{original_filename}_content_list.json`

-#### Functionality
+##### Functionality

 This is a simplified version of `middle.json` that stores all readable content blocks in reading order as a flat structure, removing complex layout information for easier subsequent processing.

-#### Content Types
+##### Content Types

 | Type | Description |
 |------|-------------|
@@ -405,7 +364,7 @@ This is a simplified version of `middle.json` that stores all readable content b
 | `text` | Text/Title |
 | `equation` | Interline formula |

-#### Text Level Identification
+##### Text Level Identification

 Text levels are distinguished through the `text_level` field:
@@ -414,49 +373,40 @@ Text levels are distinguished through the `text_level` field:
 - `text_level: 2`: Level 2 heading
 - And so on...

-#### Common Fields
+##### Common Fields

-All content blocks include a `page_idx` field indicating the page number (starting from 0).
+- All content blocks include a `page_idx` field indicating the page number (starting from 0).
+- All content blocks include a `bbox` field representing the bounding box coordinates of the content block `[x0, y0, x1, y1]`, mapped to a range of 0-1000.
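Because `bbox` values are normalized to 0-1000, they must be rescaled by the actual page dimensions before being drawn or cropped. A minimal sketch (function name and the US-letter page size are illustrative assumptions, not MinerU API):

```python
def scale_bbox(bbox, page_w, page_h):
    """Map a content_list bbox from the normalized 0-1000 range
    back to page coordinates for a page of page_w x page_h."""
    x0, y0, x1, y1 = bbox
    return [x0 * page_w / 1000, y0 * page_h / 1000,
            x1 * page_w / 1000, y1 * page_h / 1000]

# e.g. a 612x792 pt (US letter) page:
print(scale_bbox([62, 480, 946, 904], 612, 792))
```

The real page size comes from the `page_size` field of `middle.json` (or from the source PDF itself).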
-#### Sample Data
+##### Sample Data

 ```json
 [
     {
         "type": "text",
         "text": "The response of flow duration curves to afforestation ",
         "text_level": 1,
+        "bbox": [
+            62,
+            480,
+            946,
+            904
+        ],
         "page_idx": 0
     },
     {
         "type": "text",
         "text": "Received 1 October 2003; revised 22 December 2004; accepted 3 January 2005 ",
         "page_idx": 0
     },
     {
         "type": "text",
         "text": "Abstract ",
         "text_level": 2,
         "page_idx": 0
     },
     {
         "type": "text",
         "text": "The hydrologic effect of replacing pasture or other short crops with trees is reasonably well understood on a mean annual basis. The impact on flow regime, as described by the annual flow duration curve (FDC) is less certain. A method to assess the impact of plantation establishment on FDCs was developed. The starting point for the analyses was the assumption that rainfall and vegetation age are the principal drivers of evapotranspiration. A key objective was to remove the variability in the rainfall signal, leaving changes in streamflow solely attributable to the evapotranspiration of the plantation. A method was developed to (1) fit a model to the observed annual time series of FDC percentiles; i.e. 10th percentile for each year of record with annual rainfall and plantation age as parameters, (2) replace the annual rainfall variation with the long term mean to obtain climate adjusted FDCs, and (3) quantify changes in FDC percentiles as plantations age. Data from 10 catchments from Australia, South Africa and New Zealand were used. The model was able to represent flow variation for the majority of percentiles at eight of the 10 catchments, particularly for the 10–50th percentiles. The adjusted FDCs revealed variable patterns in flow reductions with two types of responses (groups) being identified. Group 1 catchments show a substantial increase in the number of zero flow days, with low flows being more affected than high flows. Group 2 catchments show a more uniform reduction in flows across all percentiles. The differences may be partly explained by storage characteristics. The modelled flow reductions were in accord with published results of paired catchment experiments. An additional analysis was performed to characterise the impact of afforestation on the number of zero flow days $( N _ { \\mathrm { z e r o } } )$ for the catchments in group 1. This model performed particularly well, and when adjusted for climate, indicated a significant increase in $N _ { \\mathrm { z e r o } }$ . The zero flow day method could be used to determine change in the occurrence of any given flow in response to afforestation. The methods used in this study proved satisfactory in removing the rainfall variability, and have added useful insight into the hydrologic impacts of plantation establishment. This approach provides a methodology for understanding catchment response to afforestation, where paired catchment data is not available. ",
         "page_idx": 0
     },
     {
         "type": "text",
         "text": "1. Introduction ",
         "text_level": 2,
         "page_idx": 1
     },
     {
         "type": "image",
         "img_path": "images/a8ecda1c69b27e4f79fce1589175a9d721cbdc1cf78b4cc06a015f3746f6b9d8.jpg",
-        "img_caption": [
+        "image_caption": [
             "Fig. 1. Annual flow duration curves of daily flows from Pine Creek, Australia, 1989–2000. "
         ],
-        "img_footnote": [],
+        "image_footnote": [],
+        "bbox": [
+            62,
+            480,
+            946,
+            904
+        ],
         "page_idx": 1
     },
     {
@@ -464,6 +414,12 @@ All content blocks include a `page_idx` field indicating the page number (starti
|
||||
"img_path": "images/181ea56ef185060d04bf4e274685f3e072e922e7b839f093d482c29bf89b71e8.jpg",
|
||||
"text": "$$\nQ _ { \\% } = f ( P ) + g ( T )\n$$",
|
||||
"text_format": "latex",
|
||||
"bbox": [
|
||||
62,
|
||||
480,
|
||||
946,
|
||||
904
|
||||
],
|
||||
"page_idx": 2
|
||||
},
|
||||
{
|
||||
@@ -476,16 +432,281 @@ All content blocks include a `page_idx` field indicating the page number (starti
|
||||
"indicates that the rainfall term was significant at the $5 \\%$ level, $T$ indicates that the time term was significant at the $5 \\%$ level, \\* represents significance at the $10 \\%$ level, and na denotes too few data points for meaningful analysis. "
|
||||
],
|
||||
"table_body": "<html><body><table><tr><td rowspan=\"2\">Site</td><td colspan=\"10\">Percentile</td></tr><tr><td>10</td><td>20</td><td>30</td><td>40</td><td>50</td><td>60</td><td>70</td><td>80</td><td>90</td><td>100</td></tr><tr><td>Traralgon Ck</td><td>P</td><td>P,*</td><td>P</td><td>P</td><td>P,</td><td>P,</td><td>P,</td><td>P,</td><td>P</td><td>P</td></tr><tr><td>Redhill</td><td>P,T</td><td>P,T</td><td>,*</td><td>**</td><td>P.T</td><td>P,*</td><td>P*</td><td>P*</td><td>*</td><td>,*</td></tr><tr><td>Pine Ck</td><td></td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>T</td><td>T</td><td>T</td><td>na</td><td>na</td></tr><tr><td>Stewarts Ck 5</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P.T</td><td>P.T</td><td>P,T</td><td>na</td><td>na</td><td>na</td></tr><tr><td>Glendhu 2</td><td>P</td><td>P,T</td><td>P,*</td><td>P,T</td><td>P.T</td><td>P,ns</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td></tr><tr><td>Cathedral Peak 2</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>*,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>T</td></tr><tr><td>Cathedral Peak 3</td><td>P.T</td><td>P.T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>T</td></tr><tr><td>Lambrechtsbos A</td><td>P,T</td><td>P</td><td>P</td><td>P,T</td><td>*,T</td><td>*,T</td><td>*,T</td><td>*,T</td><td>*,T</td><td>T</td></tr><tr><td>Lambrechtsbos B</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>T</td><td>T</td></tr><tr><td>Biesievlei</td><td>P,T</td><td>P.T</td><td>P,T</td><td>P,T</td><td>*,T</td><td>*,T</td><td>T</td><td>T</td><td>P,T</td><td>P,T</td></tr></table></body></html>",
|
||||
"bbox": [
|
||||
62,
|
||||
480,
|
||||
946,
|
||||
904
|
||||
],
|
||||
"page_idx": 5
|
||||
}
|
||||
]
|
||||
```
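For downstream processing, a flat `content_list.json` like the sample above can be filtered by `type` and `page_idx`. A minimal sketch (the helper name and the trimmed sample data are ours, not part of MinerU):

```python
def collect_text(blocks, wanted=("text",)):
    """Return (page_idx, text) pairs for blocks of the wanted types."""
    out = []
    for block in blocks:
        if block.get("type") in wanted and block.get("text"):
            out.append((block["page_idx"], block["text"]))
    return out

# Abbreviated entries mirroring the sample above
sample = [
    {"type": "text", "text": "Abstract", "text_level": 2, "page_idx": 0},
    {"type": "image", "img_path": "images/fig1.jpg", "page_idx": 1},
    {"type": "text", "text": "1. Introduction", "text_level": 2, "page_idx": 1},
]
print(collect_text(sample))  # [(0, 'Abstract'), (1, '1. Introduction')]
```

The same pattern extends to tables (`table_body`) or equations (`text` with `text_format == "latex"`).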

### VLM Backend Output Results

#### Model Inference Results (model.json)

**File naming format**: `{original_filename}_model.json`

##### File format description

- Two-level nested list: outer list = pages; inner list = content blocks of that page
- Each block is a dict with at least: `type`, `bbox`, `angle`, `content` (some types add extra fields such as `score`, `block_tags`, `content_tags`, `format`)
- Designed for direct, raw model inspection

##### Supported content types (`type` field values)

```json
{
    "text": "Plain text",
    "title": "Title",
    "equation": "Display (interline) formula",
    "image": "Image",
    "image_caption": "Image caption",
    "image_footnote": "Image footnote",
    "table": "Table",
    "table_caption": "Table caption",
    "table_footnote": "Table footnote",
    "phonetic": "Phonetic annotation",
    "code": "Code block",
    "code_caption": "Code caption",
    "ref_text": "Reference / citation entry",
    "algorithm": "Algorithm block (treated as code subtype)",
    "list": "List container",
    "header": "Page header",
    "footer": "Page footer",
    "page_number": "Page number",
    "aside_text": "Side / margin note",
    "page_footnote": "Page footnote"
}
```

##### Coordinate system

- `bbox` = `[x0, y0, x1, y1]` (top-left and bottom-right corners)
- Origin at the top-left of the page
- All coordinates are normalized fractions in `[0, 1]`

##### Sample data

```json
[
    [
        {
            "type": "header",
            "bbox": [0.077, 0.095, 0.18, 0.181],
            "angle": 0,
            "score": null,
            "block_tags": null,
            "content": "ELSEVIER",
            "format": null,
            "content_tags": null
        },
        {
            "type": "title",
            "bbox": [0.157, 0.228, 0.833, 0.253],
            "angle": 0,
            "score": null,
            "block_tags": null,
            "content": "The response of flow duration curves to afforestation",
            "format": null,
            "content_tags": null
        }
    ]
]
```
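Because `model.json` bboxes are fractions of the page size, converting them to pixel coordinates only needs the rendered page dimensions. A small sketch (the helper and the page size are ours, for illustration):

```python
def bbox_to_pixels(bbox, page_width, page_height):
    """Map a normalized [x0, y0, x1, y1] bbox to integer pixel coordinates."""
    x0, y0, x1, y1 = bbox
    return (round(x0 * page_width), round(y0 * page_height),
            round(x1 * page_width), round(y1 * page_height))

# The title block from the sample above, on a hypothetical 1000x1400 px page:
print(bbox_to_pixels([0.157, 0.228, 0.833, 0.253], 1000, 1400))
# (157, 319, 833, 354)
```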

#### Intermediate Processing Results (middle.json)

**File naming format**: `{original_filename}_middle.json`

The structure is broadly similar to the pipeline backend, with these differences:

- `list` becomes a second-level block; a new field `sub_type` distinguishes list categories:
    * `text`: ordinary list
    * `ref_text`: reference / bibliography style list
- New `code` block type with `sub_type` (a code block always has at least a `code_body` and may optionally have a `code_caption`):
    * `code`
    * `algorithm`
- `discarded_blocks` may contain additional types:
    * `header`
    * `footer`
    * `page_number`
    * `aside_text`
    * `page_footnote`
- All blocks include an `angle` field indicating rotation (one of `0`, `90`, `180`, `270`).

##### Examples

- Example: list block
```json
{
    "bbox": [174, 155, 818, 333],
    "type": "list",
    "angle": 0,
    "index": 11,
    "blocks": [
        {
            "bbox": [174, 157, 311, 175],
            "type": "text",
            "angle": 0,
            "lines": [
                {
                    "bbox": [174, 157, 311, 175],
                    "spans": [
                        {
                            "bbox": [174, 157, 311, 175],
                            "type": "text",
                            "content": "H.1 Introduction"
                        }
                    ]
                }
            ],
            "index": 3
        },
        {
            "bbox": [175, 182, 464, 229],
            "type": "text",
            "angle": 0,
            "lines": [
                {
                    "bbox": [175, 182, 464, 229],
                    "spans": [
                        {
                            "bbox": [175, 182, 464, 229],
                            "type": "text",
                            "content": "H.2 Example: Divide by Zero without Exception Handling"
                        }
                    ]
                }
            ],
            "index": 4
        }
    ],
    "sub_type": "text"
}
```
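The `block -> lines -> spans` nesting shown above can be flattened to recover the item strings of a list block. A minimal sketch (the helper name and the trimmed demo data are ours):

```python
def list_block_items(list_block):
    """Join the text spans of each sub-block of a middle.json list block."""
    items = []
    for item in list_block.get("blocks", []):
        parts = [span["content"]
                 for line in item.get("lines", [])
                 for span in line.get("spans", [])
                 if span.get("type") == "text"]
        items.append(" ".join(parts))
    return items

demo = {
    "type": "list", "sub_type": "text",
    "blocks": [
        {"lines": [{"spans": [{"type": "text", "content": "H.1 Introduction"}]}]},
        {"lines": [{"spans": [{"type": "text", "content": "H.4 Summary"}]}]},
    ],
}
print(list_block_items(demo))  # ['H.1 Introduction', 'H.4 Summary']
```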

- Example: code block with optional caption:

```json
{
    "type": "code",
    "bbox": [114, 780, 885, 1231],
    "blocks": [
        {
            "bbox": [114, 780, 885, 1231],
            "lines": [
                {
                    "bbox": [114, 780, 885, 1231],
                    "spans": [
                        {
                            "bbox": [114, 780, 885, 1231],
                            "type": "text",
                            "content": "1 // Fig. H.1: DivideByZeroNoExceptionHandling.java \n2 // Integer division without exception handling. \n3 import java.util.Scanner; \n4 \n5 public class DivideByZeroNoExceptionHandling \n6 { \n7 // demonstrates throwing an exception when a divide-by-zero occurs \n8 public static int quotient( int numerator, int denominator ) \n9 { \n10 return numerator / denominator; // possible division by zero \n11 } // end method quotient \n12 \n13 public static void main(String[] args) \n14 { \n15 Scanner scanner = new Scanner(System.in); // scanner for input \n16 \n17 System.out.print(\"Please enter an integer numerator: \"); \n18 int numerator = scanner.nextInt(); \n19 System.out.print(\"Please enter an integer denominator: \"); \n20 int denominator = scanner.nextInt(); \n21"
                        }
                    ]
                }
            ],
            "index": 17,
            "angle": 0,
            "type": "code_body"
        },
        {
            "bbox": [867, 160, 1280, 189],
            "lines": [
                {
                    "bbox": [867, 160, 1280, 189],
                    "spans": [
                        {
                            "bbox": [867, 160, 1280, 189],
                            "type": "text",
                            "content": "Algorithm 1 Modules for MCTSteg"
                        }
                    ]
                }
            ],
            "index": 19,
            "angle": 0,
            "type": "code_caption"
        }
    ],
    "index": 17,
    "sub_type": "code"
}
```
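A code block in `middle.json` carries its pieces as sub-blocks tagged `code_body` / `code_caption`, as in the example above. A sketch that regroups them (the helper name and trimmed demo data are ours):

```python
def split_code_block(code_block):
    """Return (body_text, caption_texts) from a middle.json code block."""
    body, captions = None, []
    for sub in code_block.get("blocks", []):
        text = "\n".join(span["content"]
                         for line in sub.get("lines", [])
                         for span in line.get("spans", []))
        if sub.get("type") == "code_body":
            body = text
        elif sub.get("type") == "code_caption":
            captions.append(text)
    return body, captions

demo = {
    "type": "code", "sub_type": "algorithm",
    "blocks": [
        {"type": "code_body",
         "lines": [{"spans": [{"type": "text", "content": "print('hi')"}]}]},
        {"type": "code_caption",
         "lines": [{"spans": [{"type": "text", "content": "Listing 1"}]}]},
    ],
}
print(split_code_block(demo))  # ("print('hi')", ['Listing 1'])
```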

#### Content List (content_list.json)

**File naming format**: `{original_filename}_content_list.json`

Based on the pipeline format, with these VLM-specific extensions:

- New `code` type with `sub_type` (`code` | `algorithm`):
    * Fields: `code_body` (string), optional `code_caption` (list of strings)
- New `list` type with `sub_type` (`text` | `ref_text`):
    * Field: `list_items` (array of strings)
- All `discarded_blocks` entries are also output (e.g., headers, footers, page numbers, margin notes, page footnotes).
- Existing types (`image`, `table`, `text`, `equation`) remain unchanged.
- `bbox` still uses the 0–1000 normalized coordinate mapping.

##### Examples

Example: code (algorithm) entry

```json
{
    "type": "code",
    "sub_type": "algorithm",
    "code_caption": ["Algorithm 1 Modules for MCTSteg"],
    "code_body": "1: function GETCOORDINATE(d) \n2: $x \\gets d / l$ , $y \\gets d$ mod $l$ \n3: return $(x, y)$ \n4: end function \n5: function BESTCHILD(v) \n6: $C \\gets$ child set of $v$ \n7: $v' \\gets \\arg \\max_{c \\in C} \\mathrm{UCTScore}(c)$ \n8: $v'.n \\gets v'.n + 1$ \n9: return $v'$ \n10: end function \n11: function BACK PROPAGATE(v) \n12: Calculate $R$ using Equation 11 \n13: while $v$ is not a root node do \n14: $v.r \\gets v.r + R$ , $v \\gets v.p$ \n15: end while \n16: end function \n17: function RANDOMSEARCH(v) \n18: while $v$ is not a leaf node do \n19: Randomly select an untried action $a \\in A(v)$ \n20: Create a new node $v'$ \n21: $(x, y) \\gets \\mathrm{GETCOORDINATE}(v'.d)$ \n22: $v'.p \\gets v$ , $v'.d \\gets v.d + 1$ , $v'.\\Gamma \\gets v.\\Gamma$ \n23: $v'.\\gamma_{x,y} \\gets a$ \n24: if $a = -1$ then \n25: $v.lc \\gets v'$ \n26: else if $a = 0$ then \n27: $v.mc \\gets v'$ \n28: else \n29: $v.rc \\gets v'$ \n30: end if \n31: $v \\gets v'$ \n32: end while \n33: return $v$ \n34: end function \n35: function SEARCH(v) \n36: while $v$ is fully expanded do \n37: $v \\gets$ BESTCHILD(v) \n38: end while \n39: if $v$ is not a leaf node then \n40: $v \\gets$ RANDOMSEARCH(v) \n41: end if \n42: return $v$ \n43: end function",
    "bbox": [510, 87, 881, 740],
    "page_idx": 0
}
```

Example: list (text) entry

```json
{
    "type": "list",
    "sub_type": "text",
    "list_items": [
        "H.1 Introduction",
        "H.2 Example: Divide by Zero without Exception Handling",
        "H.3 Example: Divide by Zero with Exception Handling",
        "H.4 Summary"
    ],
    "bbox": [174, 155, 818, 333],
    "page_idx": 0
}
```

Example: discarded blocks output

```json
[
    {
        "type": "header",
        "text": "Journal of Hydrology 310 (2005) 253-265",
        "bbox": [363, 164, 623, 177],
        "page_idx": 0
    },
    {
        "type": "page_footnote",
        "text": "* Corresponding author. Address: Forest Science Centre, Department of Sustainability and Environment, P.O. Box 137, Heidelberg, Vic. 3084, Australia. Tel.: +61 3 9450 8719; fax: +61 3 9450 8644.",
        "bbox": [71, 815, 915, 841],
        "page_idx": 0
    }
]
```
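Since the VLM content list also emits discarded layout elements such as the header and page footnote above, downstream text processing may want to filter them out first. A sketch (the helper name and the trimmed entries are ours; the type set is the one listed for `discarded_blocks` above):

```python
DISCARDED_TYPES = {"header", "footer", "page_number", "aside_text", "page_footnote"}

def drop_page_furniture(entries):
    """Keep only body-content entries from a VLM content_list."""
    return [e for e in entries if e.get("type") not in DISCARDED_TYPES]

entries = [
    {"type": "header", "text": "Journal of Hydrology 310 (2005) 253-265", "page_idx": 0},
    {"type": "text", "text": "The hydrologic effect ...", "page_idx": 0},
    {"type": "page_footnote", "text": "* Corresponding author.", "page_idx": 0},
]
print([e["type"] for e in drop_page_furniture(entries)])  # ['text']
```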

## Summary

The above files constitute MinerU's complete output results. Users can choose the appropriate files for subsequent processing based on their needs:

- **Model outputs** (use raw outputs):
    * model.json
- **Debugging and verification** (use visualization files):
    * layout.pdf
    * spans.pdf
- **Content extraction** (use simplified files):
    * *.md
    * content_list.json
- **Secondary development** (use structured files):
    * middle.json

# Advanced Command Line Parameters

## vllm Acceleration Parameter Optimization

### Performance Optimization Parameters
> [!TIP]
> If you can already use vllm normally for accelerated VLM model inference but still want to further improve inference speed, you can try the following parameters:
>
> - If you have multiple graphics cards, you can use vllm's multi-card parallel mode to increase throughput: `--data-parallel-size 2`

### Parameter Passing Instructions
> [!TIP]
> - All officially supported vllm parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-vllm-server`, `mineru-gradio`, `mineru-api`
> - If you want to learn more about `vllm` parameter usage, please refer to the [vllm official documentation](https://docs.vllm.ai/en/latest/cli/serve.html)

## GPU Device Selection and Configuration

> ```bash
> CUDA_VISIBLE_DEVICES=1 mineru -p <input_path> -o <output_path>
> ```
> - This specification method is effective for all command line calls, including `mineru`, `mineru-vllm-server`, `mineru-gradio`, and `mineru-api`, and applies to both `pipeline` and `vlm` backends.

### Common Device Configuration Examples
> [!TIP]
> Here are some possible usage scenarios:
>
> - If you have multiple graphics cards and need to specify cards 0 and 1, using multi-card parallelism to start `vllm-server`, you can use the following command:
> ```bash
> CUDA_VISIBLE_DEVICES=0,1 mineru-vllm-server --port 30000 --data-parallel-size 2
> ```
>
> - If you have multiple graphics cards and need to start two `fastapi` services on cards 0 and 1, listening on different ports respectively, you can use the following commands:

  -p, --path PATH                 Input file path or directory (required)
  -o, --output PATH               Output directory (required)
  -m, --method [auto|txt|ocr]     Parsing method: auto (default), txt, ocr (pipeline backend only)
  -b, --backend [pipeline|vlm-transformers|vlm-vllm-engine|vlm-http-client]
                                  Parsing backend (default: pipeline)
  -l, --lang [ch|ch_server|ch_lite|en|korean|japan|chinese_cht|ta|te|ka|th|el|latin|arabic|east_slavic|cyrillic|devanagari]
                                  Specify document language (improves OCR accuracy, pipeline backend only)
  -u, --url TEXT                  Service address when using http-client
  -s, --start INTEGER             Starting page number for parsing (0-based)
  -e, --end INTEGER               Ending page number for parsing (0-based)
  -f, --formula BOOLEAN           Enable formula parsing (default: enabled)

                                  files to be input need to be placed in the
                                  `example` folder within the directory where
                                  the command is currently executed.
  --enable-vllm-engine BOOLEAN    Enable vllm engine backend for faster
                                  processing.
  --enable-api BOOLEAN            Enable gradio API for serving the
                                  application.

Some parameters of MinerU command line tools have equivalent environment variable configurations. Generally, environment variable configurations take precedence over command line parameters and take effect across all command line tools.
Here are the environment variables and their descriptions:

- `MINERU_DEVICE_MODE`:
    * Specifies the inference device
    * Supports device types such as `cpu/cuda/cuda:0/npu/mps`
    * Only effective for the `pipeline` backend.

- `MINERU_VIRTUAL_VRAM_SIZE`:
    * Specifies the maximum GPU VRAM usage per process (GB)
    * Only effective for the `pipeline` backend.

- `MINERU_MODEL_SOURCE`:
    * Specifies the model source
    * Supports `huggingface/modelscope/local`
    * Defaults to `huggingface`; can be switched to `modelscope` or local models through environment variables.

- `MINERU_TOOLS_CONFIG_JSON`:
    * Specifies the configuration file path
    * Defaults to `mineru.json` in the user directory; other configuration file paths can be specified through environment variables.

- `MINERU_FORMULA_ENABLE`:
    * Enables formula parsing
    * Defaults to `true`; can be set to `false` via environment variable to disable formula parsing.

- `MINERU_FORMULA_CH_SUPPORT`:
    * Enables Chinese formula parsing optimization (experimental feature)
    * Defaults to `false`; can be set to `true` via environment variable to enable it.
    * Only effective for the `pipeline` backend.

- `MINERU_TABLE_ENABLE`:
    * Enables table parsing
    * Defaults to `true`; can be set to `false` via environment variable to disable table parsing.

- `MINERU_TABLE_MERGE_ENABLE`:
    * Enables table merging functionality
    * Defaults to `true`; can be set to `false` via environment variable to disable it.
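The boolean switches above follow a common pattern: an environment variable, when set, overrides a default. A sketch of how a caller might read them (the variable names are from the list above; the parsing helper itself is illustrative, not MinerU's actual implementation):

```python
import os

def env_flag(name, default=True):
    """Read a boolean-style environment variable, falling back to a default."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() not in ("false", "0", "no")

os.environ["MINERU_FORMULA_ENABLE"] = "false"
print(env_flag("MINERU_FORMULA_ENABLE"))  # False
print(env_flag("MINERU_TABLE_ENABLE"))    # True when the variable is unset
```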

mineru -p <input_path> -o <output_path> -b vlm-transformers
```
> [!TIP]
> The vlm backend additionally supports `vllm` acceleration. Compared to the `transformers` backend, `vllm` can achieve a 20-30x speedup. You can check the installation method for the complete package supporting `vllm` acceleration in the [Extension Modules Installation Guide](../quick_start/extension_modules.md).

If you need to adjust parsing options through custom parameters, you can also check the more detailed [Command Line Tools Usage Instructions](./cli_tools.md) in the documentation.

## Advanced Usage via API, WebUI, http-client/server

- Direct Python API calls: [Python Usage Example](https://github.com/opendatalab/MinerU/blob/master/demo/demo.py)
- FastAPI calls:

  >Access `http://127.0.0.1:8000/docs` in your browser to view the API documentation.
- Start the Gradio WebUI visual frontend:
```bash
# Using pipeline/vlm-transformers/vlm-http-client backends
mineru-gradio --server-name 0.0.0.0 --server-port 7860
# Or using vlm-vllm-engine/pipeline backends (requires vllm environment)
mineru-gradio --server-name 0.0.0.0 --server-port 7860 --enable-vllm-engine true
```
>[!TIP]
>
>- Access `http://127.0.0.1:7860` in your browser to use the Gradio WebUI.
>- Access `http://127.0.0.1:7860/?view=api` to use the Gradio API.
- Using the `http-client/server` method:
```bash
# Start vllm server (requires vllm environment)
mineru-vllm-server --port 30000
```
>[!TIP]
>In another terminal, connect to the vllm server via the http client (only requires CPU and network, no vllm environment needed)
> ```bash
> mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://127.0.0.1:30000
> ```

> [!NOTE]
> All officially supported vllm parameters can be passed to MinerU through command line arguments, including the following commands: `mineru`, `mineru-vllm-server`, `mineru-gradio`, `mineru-api`.
> We have compiled some commonly used parameters and usage methods for `vllm`, which can be found in the documentation [Advanced Command Line Parameters](./advanced_cli_parameters.md).

## Extending MinerU Functionality with Configuration Files

Here are some available configuration options:

- `latex-delimiter-config`:
    * Configures LaTeX formula delimiters
    * Defaults to the `$` symbol; can be modified to other symbols or strings as needed.

- `llm-aided-config`:
    * Configures parameters for LLM-assisted title hierarchy
    * Compatible with all LLM models supporting the `openai` protocol; defaults to Alibaba Cloud Bailian's `qwen2.5-32b-instruct` model.
    * You need to configure your own API key and set `enable` to `true` to enable this feature.

- `models-dir`:
    * Specifies the local model storage directory
    * Please specify model directories for the `pipeline` and `vlm` backends separately.
    * After specifying the directory, you can use local models by setting the environment variable `export MINERU_MODEL_SOURCE=local`.
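As an illustration, a `mineru.json` carrying these options might look like the sketch below. The exact nesting of sub-keys (for example, the fields inside `llm-aided-config` and the delimiter structure) is an assumption for illustration; check the template generated by your MinerU installation before copying:

```json
{
    "latex-delimiter-config": {
        "display": {"left": "$$", "right": "$$"},
        "inline": {"left": "$", "right": "$"}
    },
    "llm-aided-config": {
        "title_aided": {
            "api_key": "your_api_key",
            "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
            "model": "qwen2.5-32b-instruct",
            "enable": true
        }
    },
    "models-dir": {
        "pipeline": "/path/to/pipeline/models",
        "vlm": "/path/to/vlm/models"
    }
}
```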

If your question is not listed here, you can also use [DeepWiki](https://deepwiki.com/opendatalab/MinerU) to talk to the AI assistant, which can resolve most common problems.

If you still cannot solve the problem, you can join the community via [Discord](https://discord.gg/Tdedn9GTXq) or [WeChat](https://mineru.net/community-portal/?aliasId=3c430f94) to communicate with other users and developers.

??? question "Error `ImportError: libGL.so.1: cannot open shared object file: No such file or directory` in Ubuntu 22.04 on WSL2"

    Reference: [#388](https://github.com/opendatalab/MinerU/issues/388)

??? question "Error `ERROR: Failed building wheel for simsimd` when installing MinerU on CentOS 7 or Ubuntu 18"

    The new version of albumentations (1.4.21) introduces a dependency on simsimd. Since the prebuilt simsimd packages for Linux require glibc >= 2.28, some Linux distributions released before 2019 cannot install it normally. It can be installed with the following commands:
    ```bash
    conda create -n mineru python=3.11 -y
    conda activate mineru
    pip install -U "mineru[pipeline_old_linux]"
    ```

    Reference: [#1004](https://github.com/opendatalab/MinerU/issues/1004)

??? question "Some text is missing from parsing results when installing and using MinerU on Linux"

    In versions >= 2.0, MinerU uses `pypdfium2` instead of `pymupdf` as the PDF page rendering engine, to resolve AGPLv3 licensing issues. On some Linux distributions, missing CJK fonts may cause some text to be lost when rendering PDF pages to images.

[](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
[](https://huggingface.co/spaces/opendatalab/MinerU)
[](https://colab.research.google.com/gist/myhloli/a3cb16570ab3cfeadf9d8f0ac91b4fca/mineru_demo.ipynb)
[](https://arxiv.org/abs/2409.18839)
[](https://arxiv.org/abs/2509.22186)
[](https://deepwiki.com/opendatalab/MinerU)

<div align="center">

<!-- join us -->

<p align="center">
👋 join us on <a href="https://discord.gg/Tdedn9GTXq" target="_blank">Discord</a> and <a href="https://mineru.net/community-portal/?aliasId=3c430f94" target="_blank">WeChat</a>
</p>
</div>

```bash
wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/china/Dockerfile
docker build -t mineru-vllm:latest -f Dockerfile .
```

> [!TIP]
> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/china/Dockerfile) uses `vllm/vllm-openai:v0.10.1.1` as the base image by default.
> This version's vLLM v1 engine has limited support for older GPU models; if you cannot use vLLM-accelerated inference on GPUs of Turing or earlier architectures, you can resolve this by changing the base image to `vllm/vllm-openai:v0.10.2`.

## Docker Notes

MinerU's Docker image uses `vllm/vllm-openai` as the base image, so the `vllm` inference acceleration framework and its required dependencies are integrated into the container by default. On devices that meet the requirements, you can therefore use `vllm` directly to accelerate VLM model inference.
> [!NOTE]
> The requirements for using `vllm` to accelerate VLM model inference are:
>
> - The device has a GPU of Turing architecture or later with at least 8 GB of available VRAM.
> - The host machine's GPU driver supports CUDA 12.8 or later; you can check the driver version with the `nvidia-smi` command.
> - The GPU devices on the host are accessible from inside Docker.
>
> If your device does not meet the above requirements, you can still use MinerU's other features, but you cannot use `vllm` to accelerate VLM model inference, i.e. you cannot use the `vlm-vllm-engine` backend or start the `vlm-vllm-server` service.
|
||||
|
||||
|
||||
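The driver-version requirement in the note above can also be checked programmatically. A minimal sketch in Python; the helper name and the sample `nvidia-smi` header line are illustrative, not part of MinerU:

```python
import re

def cuda_version_from_nvidia_smi(output: str) -> tuple:
    """Parse the 'CUDA Version: X.Y' field from `nvidia-smi` text output."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", output)
    if not m:
        raise ValueError("CUDA version not found in nvidia-smi output")
    return (int(m.group(1)), int(m.group(2)))

# Sample header line in the format printed by `nvidia-smi`:
sample = "| NVIDIA-SMI 550.54.14  Driver Version: 550.54.14  CUDA Version: 12.8 |"
print(cuda_version_from_nvidia_smi(sample) >= (12, 8))  # True → driver new enough for the vllm image
```

In practice you would feed this the captured stdout of `nvidia-smi` (e.g. via `subprocess.run`).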
## Start the Docker Container

@@ -32,12 +31,12 @@ docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 -p 7860:7860 -p 8000:8000 \
  --ipc=host \
  -it mineru-sglang:latest \
  -it mineru-vllm:latest \
  /bin/bash
```

After running this command you will be in the Docker container's interactive terminal, with several ports mapped for services you may want to use; you can run MinerU commands directly inside the container.
You can also start a MinerU service directly by replacing `/bin/bash` with the service startup command; see [starting services via commands](https://opendatalab.github.io/MinerU/zh/usage/quick_usage/#apiwebuisglang-clientserver) for details.
You can also start a MinerU service directly by replacing `/bin/bash` with the service startup command; see [starting services via commands](https://opendatalab.github.io/MinerU/zh/usage/quick_usage/#apiwebuihttp-clientserver) for details.

## Start Services Directly via Docker Compose

@@ -51,19 +50,19 @@ wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml
>
>- The `compose.yaml` file contains configurations for multiple MinerU services; you can choose to start specific services as needed.
>- Different services may take additional parameters, which you can view and edit in the `compose.yaml` file.
>- Because the `sglang` inference acceleration framework pre-allocates VRAM, you may be unable to run several `sglang` services on one machine at the same time; make sure any other services that may use VRAM have been stopped before starting the `vlm-sglang-server` service or using the `vlm-sglang-engine` backend.
>- Because the `vllm` inference acceleration framework pre-allocates VRAM, you may be unable to run several `vllm` services on one machine at the same time; make sure any other services that may use VRAM have been stopped before starting the `vlm-vllm-server` service or using the `vlm-vllm-engine` backend.

---

### Start the sglang-server service
and connect to `sglang-server` via the `vlm-sglang-client` backend
### Start the vllm-server service
and connect to `vllm-server` via the `vlm-http-client` backend
```bash
docker compose -f compose.yaml --profile sglang-server up -d
docker compose -f compose.yaml --profile vllm-server up -d
```
>[!TIP]
>In another terminal, connect to the sglang server through the sglang client (requires only a CPU and network access, no sglang environment)
>In another terminal, connect to the vllm server through the http client (requires only a CPU and network access, no vllm environment)
> ```bash
> mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://<server_ip>:30000
> mineru -p <input_path> -o <output_path> -b vlm-http-client -u http://<server_ip>:30000
> ```

---

@@ -4,34 +4,26 @@ MinerU supports installing extension modules on demand to add functionality or

## Common Scenarios

### Core installation
The `core` module is MinerU's core dependency set and includes every feature module except `sglang`. Installing it ensures that MinerU's basic features work.
The `core` module is MinerU's core dependency set and includes every feature module except `vllm`. Installing it ensures that MinerU's basic features work.
```bash
uv pip install mineru[core]
```

---

### Accelerate VLM model inference with `sglang`
The `sglang` module accelerates VLM model inference and targets GPUs with Turing architecture or later (8 GB VRAM or more). Installing it can significantly speed up inference.
In the packaging configuration, `all` includes the `core` and `sglang` modules, so `mineru[all]` and `mineru[core,sglang]` are equivalent.
### Accelerate VLM model inference with `vllm`
The `vllm` module accelerates VLM model inference and targets GPUs with Turing architecture or later (8 GB VRAM or more). Installing it can significantly speed up inference.
In the packaging configuration, `all` includes the `core` and `vllm` modules, so `mineru[all]` and `mineru[core,vllm]` are equivalent.
```bash
uv pip install mineru[all]
```
> [!TIP]
> If anything goes wrong while installing the full package including sglang, consult the [sglang official documentation](https://docs.sglang.ai/start/install.html), or deploy the image directly with [Docker](./docker_deployment.md).
> If anything goes wrong while installing the full package including vllm, consult the [vllm official documentation](https://docs.vllm.ai/en/latest/getting_started/installation/index.html), or deploy the image directly with [Docker](./docker_deployment.md).

---

### Install the lightweight client to connect to sglang-server
If you need a lightweight client on an edge device to connect to `sglang-server`, you can install MinerU's base package; it is very small and suits devices with only a CPU and a network connection.
### Install the lightweight client to connect to vllm-server
If you need a lightweight client on an edge device to connect to `vllm-server`, you can install MinerU's base package; it is very small and suits devices with only a CPU and a network connection.
```bash
uv pip install mineru
```

---

### Use the pipeline backend on outdated Linux systems
If your system is too old to satisfy the dependency requirements of `mineru[core]`, this option meets MinerU's minimum runtime requirements; it is intended for legacy systems that cannot be upgraded and that only need the pipeline backend.
```bash
uv pip install mineru[pipeline_old_linux]
```
@@ -31,7 +31,7 @@
<td>Parsing backend</td>
<td>pipeline</td>
<td>vlm-transformers</td>
<td>vlm-sglang</td>
<td>vlm-vllm</td>
</tr>
<tr>
<td>Operating system</td>

@@ -80,8 +80,8 @@ uv pip install -e .[core] -i https://mirrors.aliyun.com/pypi/simple
```

> [!TIP]
> `mineru[core]` includes all core features except `sglang` acceleration, is compatible with Windows / Linux / macOS, and suits the vast majority of users.
> If you need `sglang`-accelerated VLM model inference or a lightweight client on an edge device, see the [extension modules installation guide](./extension_modules.md).
> `mineru[core]` includes all core features except `vllm` acceleration, is compatible with Windows / Linux / macOS, and suits the vast majority of users.
> If you need `vllm`-accelerated VLM model inference or a lightweight client on an edge device, see the [extension modules installation guide](./extension_modules.md).

---

@@ -51,14 +51,16 @@

## Structured Data Files

### Model inference results (model.json)
> [!IMPORTANT]
> The vlm backend's output changed substantially in version 2.5 and is incompatible with the pipeline version; if you are doing secondary development on top of the structured output, please read this document carefully.

> [!NOTE]
> Applies only to the pipeline backend
### pipeline backend output

#### Model inference results (model.json)

**File naming format**: `{original_filename}_model.json`

#### Data structure definition
##### Data structure definition

```python
from pydantic import BaseModel, Field

@@ -103,7 +105,7 @@ class PageInferenceResults(BaseModel):
inference_result: list[PageInferenceResults] = []
```

#### Coordinate system
##### Coordinate system

`poly` coordinate format: `[x0, y0, x1, y1, x2, y2, x3, y3]`

@@ -112,7 +114,7 @@ inference_result: list[PageInferenceResults] = []

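As a sketch, the 8-value `poly` above can be collapsed into an axis-aligned `[x0, y0, x1, y1]` box; the helper name is illustrative:

```python
def poly_to_bbox(poly):
    """Collapse a poly [x0, y0, x1, y1, x2, y2, x3, y3] into [xmin, ymin, xmax, ymax]."""
    xs, ys = poly[0::2], poly[1::2]  # even indices are x, odd indices are y
    return [min(xs), min(ys), max(xs), max(ys)]

print(poly_to_bbox([62, 480, 946, 480, 946, 904, 62, 904]))  # [62, 480, 946, 904]
```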
#### Example data
##### Example data

```json
[
@@ -165,52 +167,11 @@ inference_result: list[PageInferenceResults] = []
]
```

### VLM output (model_output.txt)

> [!NOTE]
> Applies only to the VLM backend

**File naming format**: `{original_filename}_model_output.txt`

#### File format

- `----` separates each page's output
- Each page contains multiple text blocks that start with `<|box_start|>` and end with `<|md_end|>`

#### Field meanings

| Marker | Format | Description |
|------|---|------|
| Bounding box | `<\|box_start\|>x0 y0 x1 y1<\|box_end\|>` | Quadrilateral coordinates (top-left and bottom-right points), on the page scaled to 1000×1000 |
| Type marker | `<\|ref_start\|>type<\|ref_end\|>` | Content block type identifier |
| Content | `<\|md_start\|>markdown content<\|md_end\|>` | The block's Markdown content |

#### Supported content types

```json
{
"text": "text",
"title": "title",
"image": "image",
"image_caption": "image caption",
"image_footnote": "image footnote",
"table": "table",
"table_caption": "table caption",
"table_footnote": "table footnote",
"equation": "display equation"
}
```

#### Special markers

- `<|txt_contd|>`: appears at the end of a text block, indicating that it can be joined with the following text block
- Table content uses the `otsl` format and must be converted to HTML before it can be rendered in Markdown

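Given the marker layout above, one way to split a page of model_output.txt into structured blocks is a regex over the paired markers. A minimal sketch; the sample line is illustrative, and real output may need extra handling (e.g. `<|txt_contd|>`):

```python
import re

BLOCK_RE = re.compile(
    r"<\|box_start\|>(?P<box>[^<]*)<\|box_end\|>"
    r"<\|ref_start\|>(?P<type>[^<]*)<\|ref_end\|>"
    r"<\|md_start\|>(?P<md>.*?)<\|md_end\|>",
    re.S,
)

def parse_page(page: str):
    """Return a list of {bbox, type, md} dicts for one page of model_output.txt."""
    blocks = []
    for m in BLOCK_RE.finditer(page):
        x0, y0, x1, y1 = map(int, m.group("box").split())
        blocks.append({"bbox": [x0, y0, x1, y1], "type": m.group("type"), "md": m.group("md")})
    return blocks

# Pages are separated by "----"; parse one illustrative page:
page = ("<|box_start|>157 228 833 253<|box_end|><|ref_start|>title<|ref_end|>"
        "<|md_start|># The response of flow duration curves to afforestation<|md_end|>")
print(parse_page(page))
```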
### Intermediate processing results (middle.json)
#### Intermediate processing results (middle.json)

**File naming format**: `{original_filename}_middle.json`

#### Top-level structure
##### Top-level structure

| Field | Type | Description |
|--------|------|------|
@@ -218,22 +179,20 @@ inference_result: list[PageInferenceResults] = []
| `_backend` | `string` | Parsing mode: `pipeline` or `vlm` |
| `_version_name` | `string` | MinerU version number |

#### Page information structure (pdf_info)
##### Page information structure (pdf_info)

| Field | Description |
|--------|------|
| `preproc_blocks` | Unsegmented intermediate result after PDF preprocessing |
| `layout_bboxes` | Layout segmentation result, including layout direction and bounding boxes, sorted in reading order |
| `page_idx` | Page number, starting from 0 |
| `page_size` | Page width and height `[width, height]` |
| `_layout_tree` | Layout tree structure |
| `images` | List of image block information |
| `tables` | List of table block information |
| `interline_equations` | List of display equation block information |
| `discarded_blocks` | Information on blocks to be discarded |
| `para_blocks` | Content block results after segmentation |

#### Block structure hierarchy
##### Block structure hierarchy

```
First-level block (table | image)
@@ -242,7 +201,7 @@ inference_result: list[PageInferenceResults] = []
└── span
```

#### First-level block fields
##### First-level block fields

| Field | Description |
|--------|------|
@@ -250,7 +209,7 @@ inference_result: list[PageInferenceResults] = []
| `bbox` | Rectangular box coordinates of the block `[x0, y0, x1, y1]` |
| `blocks` | List of contained second-level blocks |

#### Second-level block fields
##### Second-level block fields

| Field | Description |
|--------|------|
@@ -258,7 +217,7 @@ inference_result: list[PageInferenceResults] = []
| `bbox` | Rectangular box coordinates of the block |
| `lines` | List of contained line information |

#### Second-level block types
##### Second-level block types

| Type | Description |
|------|------|
@@ -274,7 +233,7 @@ inference_result: list[PageInferenceResults] = []
| `list` | List block |
| `interline_equation` | Display equation block |

#### Line and span structure
##### Line and span structure

**Line fields**:
- `bbox`: rectangular box coordinates of the line
@@ -285,7 +244,7 @@ inference_result: list[PageInferenceResults] = []
- `type`: span type (`image`, `table`, `text`, `inline_equation`, `interline_equation`)
- `content` | `img_path`: text content or image path

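The block → line → span hierarchy described above can be walked to recover plain text. A minimal sketch; the helper name and the tiny sample block are illustrative:

```python
def block_text(block):
    """Concatenate the text content of all spans in a second-level block."""
    parts = []
    for line in block.get("lines", []):
        for span in line.get("spans", []):
            if "content" in span:  # image/table spans carry img_path instead
                parts.append(span["content"])
    return " ".join(parts)

block = {
    "type": "text",
    "bbox": [62, 480, 946, 904],
    "lines": [{"bbox": [62, 480, 946, 904],
               "spans": [{"bbox": [62, 480, 946, 904], "type": "text", "content": "Abstract"}]}],
}
print(block_text(block))  # Abstract
```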
#### Example data
##### Example data

```json
{
@@ -388,15 +347,15 @@ inference_result: list[PageInferenceResults] = []
}
```

### Content list (content_list.json)
#### Content list (content_list.json)

**File naming format**: `{original_filename}_content_list.json`

#### Purpose
##### Purpose

This is a simplified version of `middle.json` that stores all readable content blocks flat, in reading order, with the complex layout information removed, making downstream processing easier.

#### Content types
##### Content types

| Type | Description |
|------|------|
@@ -405,7 +364,7 @@ inference_result: list[PageInferenceResults] = []
| `text` | Text / heading |
| `equation` | Display equation |

#### Text level markers
##### Text level markers

Text levels are distinguished via the `text_level` field:

@@ -414,49 +373,40 @@ inference_result: list[PageInferenceResults] = []
- `text_level: 2`: second-level heading
- and so on...

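The `text_level` convention maps naturally onto Markdown heading depth. A minimal sketch of rendering one text item; the helper name is illustrative:

```python
def to_markdown(item):
    """Render one content_list text item as Markdown, using text_level for heading depth."""
    level = item.get("text_level", 0)  # absent text_level means body text
    text = item["text"].strip()
    return ("#" * level + " " + text) if level else text

print(to_markdown({"type": "text", "text": "Abstract ", "text_level": 2, "page_idx": 0}))  # ## Abstract
```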
#### Common fields
##### Common fields

All content blocks contain a `page_idx` field indicating the page number (starting from 0).
- All content blocks contain a `page_idx` field indicating the page number (starting from 0).
- All content blocks contain a `bbox` field: the block's bounding box coordinates `[x0, y0, x1, y1]` mapped into the 0-1000 range.

#### Example data
##### Example data

```json
[
{
"type": "text",
"text": "The response of flow duration curves to afforestation ",
"text_level": 1,
"text_level": 1,
"bbox": [
62,
480,
946,
904
],
"page_idx": 0
},
{
"type": "text",
"text": "Received 1 October 2003; revised 22 December 2004; accepted 3 January 2005 ",
"page_idx": 0
},
{
"type": "text",
"text": "Abstract ",
"text_level": 2,
"page_idx": 0
},
{
"type": "text",
"text": "The hydrologic effect of replacing pasture or other short crops with trees is reasonably well understood on a mean annual basis. The impact on flow regime, as described by the annual flow duration curve (FDC) is less certain. A method to assess the impact of plantation establishment on FDCs was developed. The starting point for the analyses was the assumption that rainfall and vegetation age are the principal drivers of evapotranspiration. A key objective was to remove the variability in the rainfall signal, leaving changes in streamflow solely attributable to the evapotranspiration of the plantation. A method was developed to (1) fit a model to the observed annual time series of FDC percentiles; i.e. 10th percentile for each year of record with annual rainfall and plantation age as parameters, (2) replace the annual rainfall variation with the long term mean to obtain climate adjusted FDCs, and (3) quantify changes in FDC percentiles as plantations age. Data from 10 catchments from Australia, South Africa and New Zealand were used. The model was able to represent flow variation for the majority of percentiles at eight of the 10 catchments, particularly for the 10–50th percentiles. The adjusted FDCs revealed variable patterns in flow reductions with two types of responses (groups) being identified. Group 1 catchments show a substantial increase in the number of zero flow days, with low flows being more affected than high flows. Group 2 catchments show a more uniform reduction in flows across all percentiles. The differences may be partly explained by storage characteristics. The modelled flow reductions were in accord with published results of paired catchment experiments. An additional analysis was performed to characterise the impact of afforestation on the number of zero flow days $( N _ { \\mathrm { z e r o } } )$ for the catchments in group 1. This model performed particularly well, and when adjusted for climate, indicated a significant increase in $N _ { \\mathrm { z e r o } }$ . The zero flow day method could be used to determine change in the occurrence of any given flow in response to afforestation. The methods used in this study proved satisfactory in removing the rainfall variability, and have added useful insight into the hydrologic impacts of plantation establishment. This approach provides a methodology for understanding catchment response to afforestation, where paired catchment data is not available. ",
"page_idx": 0
},
{
"type": "text",
"text": "1. Introduction ",
"text_level": 2,
"page_idx": 1
},
{
"type": "image",
"img_path": "images/a8ecda1c69b27e4f79fce1589175a9d721cbdc1cf78b4cc06a015f3746f6b9d8.jpg",
"img_caption": [
"image_caption": [
"Fig. 1. Annual flow duration curves of daily flows from Pine Creek, Australia, 1989–2000. "
],
"img_footnote": [],
"image_footnote": [],
"bbox": [
62,
480,
946,
904
],
"page_idx": 1
},
{
@@ -464,6 +414,12 @@ inference_result: list[PageInferenceResults] = []
"img_path": "images/181ea56ef185060d04bf4e274685f3e072e922e7b839f093d482c29bf89b71e8.jpg",
"text": "$$\nQ _ { \\% } = f ( P ) + g ( T )\n$$",
"text_format": "latex",
"bbox": [
62,
480,
946,
904
],
"page_idx": 2
},
{
@@ -476,16 +432,396 @@ inference_result: list[PageInferenceResults] = []
"indicates that the rainfall term was significant at the $5 \\%$ level, $T$ indicates that the time term was significant at the $5 \\%$ level, \\* represents significance at the $10 \\%$ level, and na denotes too few data points for meaningful analysis. "
],
"table_body": "<html><body><table><tr><td rowspan=\"2\">Site</td><td colspan=\"10\">Percentile</td></tr><tr><td>10</td><td>20</td><td>30</td><td>40</td><td>50</td><td>60</td><td>70</td><td>80</td><td>90</td><td>100</td></tr><tr><td>Traralgon Ck</td><td>P</td><td>P,*</td><td>P</td><td>P</td><td>P,</td><td>P,</td><td>P,</td><td>P,</td><td>P</td><td>P</td></tr><tr><td>Redhill</td><td>P,T</td><td>P,T</td><td>,*</td><td>**</td><td>P.T</td><td>P,*</td><td>P*</td><td>P*</td><td>*</td><td>,*</td></tr><tr><td>Pine Ck</td><td></td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>T</td><td>T</td><td>T</td><td>na</td><td>na</td></tr><tr><td>Stewarts Ck 5</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P.T</td><td>P.T</td><td>P,T</td><td>na</td><td>na</td><td>na</td></tr><tr><td>Glendhu 2</td><td>P</td><td>P,T</td><td>P,*</td><td>P,T</td><td>P.T</td><td>P,ns</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td></tr><tr><td>Cathedral Peak 2</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>*,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>T</td></tr><tr><td>Cathedral Peak 3</td><td>P.T</td><td>P.T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>T</td></tr><tr><td>Lambrechtsbos A</td><td>P,T</td><td>P</td><td>P</td><td>P,T</td><td>*,T</td><td>*,T</td><td>*,T</td><td>*,T</td><td>*,T</td><td>T</td></tr><tr><td>Lambrechtsbos B</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>T</td><td>T</td></tr><tr><td>Biesievlei</td><td>P,T</td><td>P.T</td><td>P,T</td><td>P,T</td><td>*,T</td><td>*,T</td><td>T</td><td>T</td><td>P,T</td><td>P,T</td></tr></table></body></html>",
"bbox": [
62,
480,
946,
904
],
"page_idx": 5
}
]
```

### VLM backend output

#### Model inference results (model.json)

**File naming format**: `{original_filename}_model.json`

##### File format

- This file is the raw output of the VLM model. It contains two nested lists: the outer list represents pages, the inner list the content blocks on that page
- Each content block is a dict containing the `type`, `bbox`, `angle`, and `content` fields

##### Supported content types

```json
{
"text": "text",
"title": "title",
"equation": "display equation",
"image": "image",
"image_caption": "image caption",
"image_footnote": "image footnote",
"table": "table",
"table_caption": "table caption",
"table_footnote": "table footnote",
"phonetic": "phonetic notation",
"code": "code block",
"code_caption": "code caption",
"ref_text": "reference",
"algorithm": "algorithm block",
"list": "list",
"header": "page header",
"footer": "page footer",
"page_number": "page number",
"aside_text": "gutter-side note",
"page_footnote": "page footnote"
}
```

##### Coordinate system

`bbox` coordinate format: `[x0, y0, x1, y1]`

- The values are the coordinates of the top-left and bottom-right points
- The coordinate origin is at the top-left corner of the page
- Coordinates are expressed as fractions of the original page size, in the 0-1 range

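Since these coordinates are fractions of the page size, converting them back to pixels just scales by the page dimensions. A minimal sketch; the helper name and sample page size are illustrative:

```python
def to_absolute(bbox, page_w, page_h):
    """Map a relative [x0, y0, x1, y1] bbox (0-1 range) to pixel coordinates."""
    x0, y0, x1, y1 = bbox
    return [round(x0 * page_w), round(y0 * page_h), round(x1 * page_w), round(y1 * page_h)]

# Title bbox from the example data, on a hypothetical 1000x1400 px page:
print(to_absolute([0.157, 0.228, 0.833, 0.253], 1000, 1400))  # [157, 319, 833, 354]
```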
##### Example data

```json
[
[
{
"type": "header",
"bbox": [
0.077,
0.095,
0.18,
0.181
],
"angle": 0,
"score": null,
"block_tags": null,
"content": "ELSEVIER",
"format": null,
"content_tags": null
},
{
"type": "title",
"bbox": [
0.157,
0.228,
0.833,
0.253
],
"angle": 0,
"score": null,
"block_tags": null,
"content": "The response of flow duration curves to afforestation",
"format": null,
"content_tags": null
}
]
]
```

#### Intermediate processing results (middle.json)

**File naming format**: `{original_filename}_middle.json`

##### File format
The vlm backend's middle.json has a structure similar to the pipeline backend's, with the following differences:

- `list` becomes a second-level block, with a new `sub_type` field distinguishing list types:
    * `text` (text type)
    * `ref_text` (reference type)

- A new `code` block type; code blocks have two possible `sub_type` values:
    * `code` and `algorithm`
    * at least a `code_body`, optionally a `code_caption`

- The element types inside `discarded_blocks` gain the following values:
    * `header` (page header)
    * `footer` (page footer)
    * `page_number` (page number)
    * `aside_text` (gutter-side text)
    * `page_footnote` (page footnote)
- All blocks gain an `angle` field indicating the rotation angle: 0, 90, 180, or 270

##### Example data
- list block example
```json
{
"bbox": [
174,
155,
818,
333
],
"type": "list",
"angle": 0,
"index": 11,
"blocks": [
{
"bbox": [
174,
157,
311,
175
],
"type": "text",
"angle": 0,
"lines": [
{
"bbox": [
174,
157,
311,
175
],
"spans": [
{
"bbox": [
174,
157,
311,
175
],
"type": "text",
"content": "H.1 Introduction"
}
]
}
],
"index": 3
},
{
"bbox": [
175,
182,
464,
229
],
"type": "text",
"angle": 0,
"lines": [
{
"bbox": [
175,
182,
464,
229
],
"spans": [
{
"bbox": [
175,
182,
464,
229
],
"type": "text",
"content": "H.2 Example: Divide by Zero without Exception Handling"
}
]
}
],
"index": 4
}
],
"sub_type": "text"
}
```
- code block example
```json
{
"type": "code",
"bbox": [
114,
780,
885,
1231
],
"blocks": [
{
"bbox": [
114,
780,
885,
1231
],
"lines": [
{
"bbox": [
114,
780,
885,
1231
],
"spans": [
{
"bbox": [
114,
780,
885,
1231
],
"type": "text",
"content": "1 // Fig. H.1: DivideByZeroNoExceptionHandling.java \n2 // Integer division without exception handling. \n3 import java.util.Scanner; \n4 \n5 public class DivideByZeroNoExceptionHandling \n6 { \n7 // demonstrates throwing an exception when a divide-by-zero occurs \n8 public static int quotient( int numerator, int denominator ) \n9 { \n10 return numerator / denominator; // possible division by zero \n11 } // end method quotient \n12 \n13 public static void main(String[] args) \n14 { \n15 Scanner scanner = new Scanner(System.in); // scanner for input \n16 \n17 System.out.print(\"Please enter an integer numerator: \"); \n18 int numerator = scanner.nextInt(); \n19 System.out.print(\"Please enter an integer denominator: \"); \n20 int denominator = scanner.nextInt(); \n21"
}
]
}
],
"index": 17,
"angle": 0,
"type": "code_body"
},
{
"bbox": [
867,
160,
1280,
189
],
"lines": [
{
"bbox": [
867,
160,
1280,
189
],
"spans": [
{
"bbox": [
867,
160,
1280,
189
],
"type": "text",
"content": "Algorithm 1 Modules for MCTSteg"
}
]
}
],
"index": 19,
"angle": 0,
"type": "code_caption"
}
],
"index": 17,
"sub_type": "code"
}
```

#### Content list (content_list.json)

**File naming format**: `{original_filename}_content_list.json`

##### File format
The vlm backend's content_list.json has a structure similar to the pipeline backend's; along with the middle.json changes above, the following adjustments were made:

- A new `code` type; code items have two possible `sub_type` values:
    * `code` and `algorithm`
    * at least a `code_body`, optionally a `code_caption`

- A new `list` type; list items have two possible `sub_type` values:
    * `text`
    * `ref_text`

- The output now includes the content of all `discarded_blocks`:
    * `header`
    * `footer`
    * `page_number`
    * `aside_text`
    * `page_footnote`

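One common downstream step is rendering the new code-type items back into Markdown. A minimal sketch under the field layout described above; the helper name and sample item are illustrative:

```python
def code_to_markdown(item):
    """Render a code/algorithm content item as a fenced Markdown block with optional caption."""
    lines = []
    for cap in item.get("code_caption", []):
        lines.append("**" + cap.strip() + "**")
    lines.append("```")
    lines.append(item["code_body"])
    lines.append("```")
    return "\n".join(lines)

item = {"type": "code", "sub_type": "code", "code_caption": ["Listing 1"], "code_body": "print('hi')"}
print(code_to_markdown(item))
```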
##### Example data
- code type content
```json
{
"type": "code",
"sub_type": "algorithm",
"code_caption": [
"Algorithm 1 Modules for MCTSteg"
],
"code_body": "1: function GETCOORDINATE(d) \n2: $x \\gets d / l$ , $y \\gets d$ mod $l$ \n3: return $(x, y)$ \n4: end function \n5: function BESTCHILD(v) \n6: $C \\gets$ child set of $v$ \n7: $v' \\gets \\arg \\max_{c \\in C} \\mathrm{UCTScore}(c)$ \n8: $v'.n \\gets v'.n + 1$ \n9: return $v'$ \n10: end function \n11: function BACK PROPAGATE(v) \n12: Calculate $R$ using Equation 11 \n13: while $v$ is not a root node do \n14: $v.r \\gets v.r + R$ , $v \\gets v.p$ \n15: end while \n16: end function \n17: function RANDOMSEARCH(v) \n18: while $v$ is not a leaf node do \n19: Randomly select an untried action $a \\in A(v)$ \n20: Create a new node $v'$ \n21: $(x, y) \\gets \\mathrm{GETCOORDINATE}(v'.d)$ \n22: $v'.p \\gets v$ , $v'.d \\gets v.d + 1$ , $v'.\\Gamma \\gets v.\\Gamma$ \n23: $v'.\\gamma_{x,y} \\gets a$ \n24: if $a = -1$ then \n25: $v.lc \\gets v'$ \n26: else if $a = 0$ then \n27: $v.mc \\gets v'$ \n28: else \n29: $v.rc \\gets v'$ \n30: end if \n31: $v \\gets v'$ \n32: end while \n33: return $v$ \n34: end function \n35: function SEARCH(v) \n36: while $v$ is fully expanded do \n37: $v \\gets$ BESTCHILD(v) \n38: end while \n39: if $v$ is not a leaf node then \n40: $v \\gets$ RANDOMSEARCH(v) \n41: end if \n42: return $v$ \n43: end function",
"bbox": [
510,
87,
881,
740
],
"page_idx": 0
}
```
- list type content
```json
{
"type": "list",
"sub_type": "text",
"list_items": [
"H.1 Introduction",
"H.2 Example: Divide by Zero without Exception Handling",
"H.3 Example: Divide by Zero with Exception Handling",
"H.4 Summary"
],
"bbox": [
174,
155,
818,
333
],
"page_idx": 0
}
```
- discarded type content
```json
[{
"type": "header",
"text": "Journal of Hydrology 310 (2005) 253-265",
"bbox": [
363,
164,
623,
177
],
"page_idx": 0
},
{
"type": "page_footnote",
"text": "* Corresponding author. Address: Forest Science Centre, Department of Sustainability and Environment, P.O. Box 137, Heidelberg, Vic. 3084, Australia. Tel.: +61 3 9450 8719; fax: +61 3 9450 8644.",
"bbox": [
71,
815,
915,
841
],
"page_idx": 0
}]
```

## Summary

The files above are MinerU's complete output; users can pick the files that suit their downstream processing:

- **Model output**: use the raw output (model.json, model_output.txt)
- **Debugging and validation**: use the visualization files (layout.pdf, spans.pdf)
- **Content extraction**: use the simplified files (*.md, content_list.json)
- **Secondary development**: use the structured file (middle.json)
- **Model output** (use the raw output):
    * model.json

- **Debugging and validation** (use the visualization files):
    * layout.pdf
    * spans.pdf

- **Content extraction** (use the simplified files):
    * *.md
    * content_list.json

- **Secondary development** (use the structured file):
    * middle.json