mirror of
https://github.com/opendatalab/MinerU.git
synced 2026-04-01 21:48:36 +07:00
Compare commits
21 Commits
release-2.
...
release-2.
| Author | SHA1 | Date | |
|---|---|---|---|
| | e853563182 | | |
| | 238bf86e6f | | |
| | e387233c7d | | |
| | 868a7a5402 | | |
| | 9b28ed8a7a | | |
| | 5fe068d441 | | |
| | cdd7bef996 | | |
| | 4156a2b89d | | |
| | 2c702890a4 | | |
| | 3c8385c2c6 | | |
| | d29cf4e076 | | |
| | ec85af39dc | | |
| | b40c432741 | | |
| | 1cd683b944 | | |
| | 6162ae2be1 | | |
| | fa9aaaa7b7 | | |
| | ac5db5d455 | | |
| | 0031981e60 | | |
| | c47faa4d4f | | |
| | 5c579d8919 | | |
| | e8865a679a | | |
7
.github/ISSUE_TEMPLATE/bug_report.yml
vendored
@@ -109,14 +109,11 @@ body:
  - type: dropdown
    id: software_version
    attributes:
      label: Software version | 软件版本 (magic-pdf --version)
      label: Software version | 软件版本 (mineru --version)
      #multiple: false
      options:
        -
        - "1.0.x"
        - "1.1.x"
        - "1.2.x"
        - "1.3.x"
        - "2.0.x"
    validations:
      required: true

18
README.md
@@ -10,16 +10,13 @@
[](https://github.com/opendatalab/MinerU)
[](https://github.com/opendatalab/MinerU/issues)
[](https://github.com/opendatalab/MinerU/issues)

[](https://pypi.org/project/mineru/)
[](https://pypi.org/project/mineru/)
[](https://pepy.tech/project/mineru)
[](https://pepy.tech/project/mineru)

[](https://mineru.net/OpenSourceTools/Extractor?source=github)
[](https://huggingface.co/spaces/opendatalab/MinerU)
[](https://www.modelscope.cn/studios/OpenDataLab/MinerU)

[](https://huggingface.co/spaces/opendatalab/mineru2)
[](https://colab.research.google.com/gist/myhloli/3b3a00a4a0a61577b6c30f989092d20d/mineru_demo.ipynb)
[](https://arxiv.org/abs/2409.18839)
@@ -51,6 +48,9 @@ Easier to use: Just grab MinerU Desktop. No coding, no login, just a simple inte
</div>

# Changelog
- 2025/06/20 2.0.6 Released
  - Fixed occasional parsing interruptions caused by invalid block content in `vlm` mode
  - Fixed parsing interruptions caused by incomplete table structures in `vlm` mode
- 2025/06/17 2.0.5 Released
  - Fixed the issue where models were still required to be downloaded in the `sglang-client` mode
  - Fixed the issue where the `sglang-client` mode unnecessarily depended on packages like `torch` during runtime.
@@ -502,7 +502,11 @@ cd MinerU
uv pip install -e .[core]
```

#### 1.3 Install the Full Version (Supports sglang Acceleration)
> [!TIP]
> Linux and macOS systems automatically support CUDA/MPS acceleration after installation. For Windows users who want to use CUDA acceleration,
> please visit the [PyTorch official website](https://pytorch.org/get-started/locally/) to install PyTorch with the appropriate CUDA version.

#### 1.3 Install Full Version (supports sglang acceleration) (requires device with Ampere or newer architecture and at least 24GB GPU memory)

If you need to use **sglang to accelerate VLM model inference**, you can choose any of the following methods to install the full version:

@@ -661,6 +665,12 @@ mineru -p <input_path> -o <output_path> -b vlm-sglang-engine
mineru-sglang-server --port 30000
```

> [!TIP]
> sglang acceleration requires a GPU with Ampere architecture or newer, and at least 24GB VRAM. If you have two 12GB or 16GB GPUs, you can use Tensor Parallelism (TP) mode:
> `mineru-sglang-server --port 30000 --tp 2`
>
> If you still encounter out-of-memory errors with two GPUs, or if you need to improve throughput or inference speed using multi-GPU parallelism, please refer to the [sglang official documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands).

2. Use Client in another terminal:

```bash
@@ -10,16 +10,13 @@
[](https://github.com/opendatalab/MinerU)
[](https://github.com/opendatalab/MinerU/issues)
[](https://github.com/opendatalab/MinerU/issues)

[](https://pypi.org/project/mineru/)
[](https://pypi.org/project/mineru/)
[](https://pepy.tech/project/mineru)
[](https://pepy.tech/project/mineru)

[](https://mineru.net/OpenSourceTools/Extractor?source=github)
[](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
[](https://huggingface.co/spaces/opendatalab/MinerU)

[](https://huggingface.co/spaces/opendatalab/mineru2)
[](https://colab.research.google.com/gist/myhloli/3b3a00a4a0a61577b6c30f989092d20d/mineru_demo.ipynb)
[](https://arxiv.org/abs/2409.18839)
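@@ -50,6 +47,9 @@
</div>

# Changelog
- 2025/06/20 2.0.6 released
  - Fixed parsing interruptions occasionally caused by invalid block content in `vlm` mode
  - Fixed parsing interruptions caused by incomplete table structures in `vlm` mode
- 2025/06/17 2.0.5 released
  - Fixed the issue where models still had to be downloaded in `sglang-client` mode
  - Fixed the issue where `sglang-client` mode depended on packages such as `torch` that are not needed at runtime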
@@ -492,7 +492,11 @@ cd MinerU
uv pip install -e .[core] -i https://mirrors.aliyun.com/pypi/simple
```

#### 1.3 Install the full version (supports sglang acceleration)
> [!TIP]
> Linux and macOS automatically support CUDA/MPS acceleration after installation. Windows users who want CUDA acceleration should
> go to the [PyTorch website](https://pytorch.org/get-started/locally/) and install the PyTorch build for the appropriate CUDA version.

#### 1.3 Install the full version (supports sglang acceleration) (requires a GPU with Ampere or newer architecture and at least 24 GB of VRAM)

To use **sglang to accelerate VLM model inference**, choose a suitable way to install the full version:

@@ -650,6 +654,12 @@ mineru -p <input_path> -o <output_path> -b vlm-sglang-engine
mineru-sglang-server --port 30000
```

> [!TIP]
> sglang acceleration requires a GPU with Ampere or newer architecture and at least 24 GB of VRAM. If you have two 12 GB or 16 GB GPUs, you can use tensor parallelism (TP) mode:
> `mineru-sglang-server --port 30000 --tp 2`
>
> If you still run out of GPU memory with two cards, or need multi-GPU parallelism to increase throughput or inference speed, see the [sglang official documentation](https://docs.sglang.ai/backend/server_arguments.html#common-launch-commands)

2. Call it with the client from another terminal:

```bash
@@ -1,6 +1,8 @@
import re
from typing import Literal

from loguru import logger

from mineru.utils.boxbase import bbox_distance, is_in
from mineru.utils.enum_class import ContentType, BlockType, SplitFlag
from mineru.backend.vlm.vlm_middle_json_mkcontent import merge_para_with_text
@@ -22,25 +24,30 @@ class MagicModel:
        # Parse each block
        for index, block_info in enumerate(block_infos):
            block_bbox = block_info[0].strip()
            x1, y1, x2, y2 = map(int, block_bbox.split())
            x_1, y_1, x_2, y_2 = (
                int(x1 * width / 1000),
                int(y1 * height / 1000),
                int(x2 * width / 1000),
                int(y2 * height / 1000),
            )
            if x_2 < x_1:
                x_1, x_2 = x_2, x_1
            if y_2 < y_1:
                y_1, y_2 = y_2, y_1
            block_bbox = (x_1, y_1, x_2, y_2)
            block_type = block_info[1].strip()
            block_content = block_info[2].strip()
            try:
                x1, y1, x2, y2 = map(int, block_bbox.split())
                x_1, y_1, x_2, y_2 = (
                    int(x1 * width / 1000),
                    int(y1 * height / 1000),
                    int(x2 * width / 1000),
                    int(y2 * height / 1000),
                )
                if x_2 < x_1:
                    x_1, x_2 = x_2, x_1
                if y_2 < y_1:
                    y_1, y_2 = y_2, y_1
                block_bbox = (x_1, y_1, x_2, y_2)
                block_type = block_info[1].strip()
                block_content = block_info[2].strip()

            # print(f"coordinates: {block_bbox}")
            # print(f"type: {block_type}")
            # print(f"content: {block_content}")
            # print("-" * 50)
                # print(f"coordinates: {block_bbox}")
                # print(f"type: {block_type}")
                # print(f"content: {block_content}")
                # print("-" * 50)
            except Exception as e:
                # If parsing fails, the format is probably invalid, so skip this block
                logger.warning(f"Invalid block format: {block_info}, error: {e}")
                continue

            span_type = "unknown"
            if block_type in [
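The bounding-box handling in this hunk can be read in isolation as the sketch below. It assumes, as the `/ 1000` scaling suggests, that the model emits coordinates on a 0 to 1000 grid; `parse_block_bbox` is a hypothetical helper name and is not part of the repository. Unlike the hunk, which catches `Exception`, the sketch only catches the `ValueError` raised by a malformed coordinate string.

```python
from typing import Optional, Tuple

BBox = Tuple[int, int, int, int]


def parse_block_bbox(raw: str, width: int, height: int) -> Optional[BBox]:
    """Scale an 'x1 y1 x2 y2' string from a 0-1000 grid to page pixels.

    Returns None instead of raising when the string is malformed, so the
    caller can skip the block and keep parsing the rest of the page.
    """
    try:
        x1, y1, x2, y2 = map(int, raw.split())
    except ValueError:
        # Wrong number of fields, or a field that is not an integer.
        return None
    # sorted() puts the smaller coordinate first, like the two swaps in the hunk.
    x_1, x_2 = sorted((int(x1 * width / 1000), int(x2 * width / 1000)))
    y_1, y_2 = sorted((int(y1 * height / 1000), int(y2 * height / 1000)))
    return (x_1, y_1, x_2, y_2)
```

A caller would treat `None` the same way the hunk treats an exception: log a warning and `continue` to the next block.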
@@ -58,7 +58,7 @@ class PytorchPaddleOCR(TextSystem):

        device = get_device()
        if device == 'cpu' and self.lang in ['ch', 'ch_server', 'japan', 'chinese_cht']:
            logger.warning("The current device in use is CPU. To ensure the speed of parsing, the language is automatically switched to ch_lite.")
            # logger.warning("The current device in use is CPU. To ensure the speed of parsing, the language is automatically switched to ch_lite.")
            self.lang = 'ch_lite'

        if self.lang in latin_lang:
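For reference, the fallback rule visible in this hunk amounts to the small pure function sketched below. `resolve_ocr_lang` and `HEAVY_LANGS` are hypothetical names used only for illustration; the language codes are the ones listed in the hunk.

```python
# Languages that the hunk replaces with the lighter model on CPU.
HEAVY_LANGS = {"ch", "ch_server", "japan", "chinese_cht"}


def resolve_ocr_lang(device: str, lang: str) -> str:
    """Return 'ch_lite' for the heavy languages on CPU, else lang unchanged."""
    if device == "cpu" and lang in HEAVY_LANGS:
        return "ch_lite"
    return lang
```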
@@ -62,7 +62,7 @@ class Mineru2QwenForCausalLM(nn.Module):

        # load vision tower
        mm_vision_tower = self.config.mm_vision_tower
        model_root_path = auto_download_and_get_model_root_path("/", "vlm")
        model_root_path = auto_download_and_get_model_root_path(mm_vision_tower, "vlm")
        mm_vision_tower = f"{model_root_path}/{mm_vision_tower}"

        if "clip" in mm_vision_tower:
@@ -132,6 +132,35 @@ def otsl_parse_texts(texts, tokens):
    r_idx = 0
    c_idx = 0

    # Check and complete the matrix
    if split_row_tokens:
        max_cols = max(len(row) for row in split_row_tokens)

        # Insert additional <ecel> to tags
        for row_idx, row in enumerate(split_row_tokens):
            while len(row) < max_cols:
                row.append(OTSL_ECEL)

        # Insert additional <ecel> to texts
        new_texts = []
        text_idx = 0

        for row_idx, row in enumerate(split_row_tokens):
            for col_idx, token in enumerate(row):
                new_texts.append(token)
                if text_idx < len(texts) and texts[text_idx] == token:
                    text_idx += 1
                    if (text_idx < len(texts) and
                            texts[text_idx] not in [OTSL_NL, OTSL_FCEL, OTSL_ECEL, OTSL_LCEL, OTSL_UCEL, OTSL_XCEL]):
                        new_texts.append(texts[text_idx])
                        text_idx += 1

            new_texts.append(OTSL_NL)
            if text_idx < len(texts) and texts[text_idx] == OTSL_NL:
                text_idx += 1

        texts = new_texts

    def count_right(tokens, c_idx, r_idx, which_tokens):
        span = 0
        c_idx_iter = c_idx
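The first step of this hunk, padding short rows so the token matrix is rectangular, can be sketched on its own as follows. The value `"<ecel>"` for the empty-cell token is an assumption taken from the hunk's comments, and `pad_rows` is a hypothetical helper. Unlike the hunk, which appends to the rows in place, the sketch returns new lists.

```python
from typing import List

ECEL = "<ecel>"  # assumed value of the empty-cell token (OTSL_ECEL in the hunk)


def pad_rows(rows: List[List[str]], filler: str = ECEL) -> List[List[str]]:
    """Return copies of rows, each padded to the width of the longest row."""
    if not rows:
        return []
    max_cols = max(len(row) for row in rows)
    # Appending (max_cols - len(row)) fillers leaves full-width rows unchanged.
    return [row + [filler] * (max_cols - len(row)) for row in rows]
```

For example, `pad_rows([["<fcel>", "<fcel>"], ["<fcel>"]])` pads the second row with one `"<ecel>"` so both rows have two columns.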
@@ -235,10 +264,11 @@ def export_to_html(table_data: TableData):

    body = ""

    grid = table_data.grid
    for i in range(nrows):
        body += "<tr>"
        for j in range(ncols):
            cell: TableCell = table_data.grid[i][j]
            cell: TableCell = grid[i][j]

            rowspan, rowstart = (
                cell.row_span,
@@ -57,8 +57,12 @@ def auto_download_and_get_model_root_path(relative_path: str, repo_mode='pipelin
        relative_path = relative_path.strip('/')
        cache_dir = snapshot_download(repo, allow_patterns=[relative_path, relative_path+"/*"])
    elif repo_mode == 'vlm':
        # In VLM mode, download the entire model directory directly
        cache_dir = snapshot_download(repo)
        # In VLM mode, handle relative_path differently depending on its value
        if relative_path == "/":
            cache_dir = snapshot_download(repo)
        else:
            relative_path = relative_path.strip('/')
            cache_dir = snapshot_download(repo, allow_patterns=[relative_path, relative_path+"/*"])

    if not cache_dir:
        raise FileNotFoundError(f"Failed to download model: {relative_path} from {repo}")
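The branch added in this hunk decides whether to fetch the whole repository or only one subdirectory. A minimal sketch of that decision, kept free of any network call, is shown below; `vlm_allow_patterns` is a hypothetical helper, and the idea of passing `None` to mean "no filter" is an assumption about how a caller would use it, not something the hunk does.

```python
from typing import List, Optional


def vlm_allow_patterns(relative_path: str) -> Optional[List[str]]:
    """Patterns for a filtered download, or None to fetch the whole repo."""
    if relative_path == "/":
        # The root path means "everything", so no filter is applied.
        return None
    path = relative_path.strip("/")
    # Match the entry itself and everything directly beneath it.
    return [path, path + "/*"]
```

With a hypothetical subdirectory name, `vlm_allow_patterns("/vision/")` yields `["vision", "vision/*"]`, the same list shape the hunk passes as `allow_patterns`.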
@@ -1 +1 @@
__version__ = "2.0.4"
__version__ = "2.0.5"