Mirror of https://github.com/opendatalab/MinerU.git (synced 2026-03-27 19:18:34 +07:00)
Compare commits: release-2. ... release-2. (40 commits)
| Author | SHA1 | Date | |
|---|---|---|---|
| | e96e4a0ce4 | | |
| | c7bde0ab39 | | |
| | 8754c24e42 | | |
| | 4f8c00cc34 | | |
| | 89681f98ad | | |
| | 66d328dbc5 | | |
| | f0c1318545 | | |
| | 6e97f3cf70 | | |
| | aede62167e | | |
| | 5f2740f743 | | |
| | a888d2b625 | | |
| | 4275876331 | | |
| | ec9f7f54ab | | |
| | 7861e5e369 | | |
| | 159f3a89a3 | | |
| | d9452bbeb9 | | |
| | d808a32c0b | | |
| | 12ce3bd024 | | |
| | e3d7aece50 | | |
| | 7c55a0ea65 | | |
| | f1659eb7a7 | | |
| | c6bffd9382 | | |
| | 857dcb2ef5 | | |
| | ef69f98cd6 | | |
| | 6d5d1cf26b | | |
| | 7c481796f8 | | |
| | 7d62b7b7cc | | |
| | 5a0cf9af7f | | |
| | f5e0e67545 | | |
| | a4cac624df | | |
| | e1eb318b9b | | |
| | 31834b1e68 | | |
| | 100ace2e99 | | |
| | 6aac639686 | | |
| | 82f94a9a84 | | |
| | d928334c61 | | |
| | ebad82bd8c | | |
| | b03c5fb449 | | |
| | c343afd20c | | |
| | 6586c7c01e | | |
@@ -44,7 +44,13 @@
 
 # Changelog
 
-- 2025/09/19 2.5.1 Released
+- 2025/09/20 2.5.3 Released
+  - Dependency version range adjustment to enable Turing and earlier architecture GPUs to use vLLM acceleration for MinerU2.5 model inference.
+  - `pipeline` backend compatibility fixes for torch 2.8.0.
+  - Reduced default concurrency for vLLM async backend to lower server pressure and avoid connection closure issues caused by high load.
+  - More compatibility-related details can be found in the [announcement](https://github.com/opendatalab/MinerU/discussions/3548)
+
+- 2025/09/19 2.5.2 Released
 
 We are officially releasing MinerU2.5, currently the most powerful multimodal large model for document parsing.
 With only 1.2B parameters, MinerU2.5's accuracy on the OmniDocBench benchmark comprehensively surpasses top-tier multimodal models like Gemini 2.5 Pro, GPT-4o, and Qwen2.5-VL-72B. It also significantly outperforms leading specialized models such as dots.ocr, MonkeyOCR, and PP-StructureV3.
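@@ -43,10 +43,15 @@
 </div>
 
 # Changelog
+- 2025/09/20 2.5.3 Released
+  - Adjusted dependency version ranges so that GPUs with Turing and earlier architectures can use vLLM to accelerate MinerU2.5 model inference.
+  - Compatibility fixes in the `pipeline` backend for torch 2.8.0.
+  - Lowered the default concurrency of the vLLM async backend to reduce server pressure and avoid connection closures caused by high load.
+  - For more compatibility details, see the [announcement](https://github.com/opendatalab/MinerU/discussions/3547)
 
-- 2025/09/19 2.5.1 Released
+- 2025/09/19 2.5.2 Released
 We are officially releasing MinerU2.5, currently the most powerful multimodal large model for document parsing. With only 1.2B parameters, MinerU2.5's accuracy on the OmniDocBench document parsing benchmark comprehensively surpasses top-tier multimodal large models such as Gemini2.5-Pro, GPT-4o, and Qwen2.5-VL-72B, and it significantly outperforms mainstream specialized document parsing models (such as dots.ocr, MonkeyOCR, and PP-StructureV3).
-The model is available on [HuggingFace](https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B) and [ModelScope](https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B); you are welcome to download and use it!
+The model is available on [HuggingFace](https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B) and [ModelScope](https://modelscope.cn/models/opendatalab/MinerU2.5-2509-1.2B); you are welcome to download and use it!
 - Key highlights
   - Extreme efficiency, SOTA performance: at a lightweight 1.2B scale, it achieves SOTA performance that surpasses models with tens or even hundreds of billions of parameters, redefining the efficiency ratio of document parsing.
   - Advanced architecture, leading across the board: by combining "two-stage inference" (decoupling layout analysis from content recognition) with a native high-resolution architecture, it reaches SOTA level in five areas: layout analysis, text recognition, formula recognition, table recognition, and reading order.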
@@ -1,9 +1,16 @@
-# Use DaoCloud mirrored vllm image for China region
+# Use DaoCloud mirrored vllm image for China region for gpu with Ampere architecture and above (Compute Capability>=8.0)
+# Compute Capability version query (https://developer.nvidia.com/cuda-gpus)
 FROM docker.m.daocloud.io/vllm/vllm-openai:v0.10.1.1
 
 # Use the official vllm image
 # FROM vllm/vllm-openai:v0.10.1.1
 
+# Use DaoCloud mirrored vllm image for China region for gpu with Turing architecture and below (Compute Capability<8.0)
+# FROM docker.m.daocloud.io/vllm/vllm-openai:v0.10.2
+
+# Use the official vllm image
+# FROM vllm/vllm-openai:v0.10.2
+
 # Install libgl for opencv support & Noto fonts for Chinese characters
 RUN apt-get update && \
     apt-get install -y \
@@ -1,6 +1,10 @@
-# Use the official vllm image
+# Use the official vllm image for gpu with Ampere architecture and above (Compute Capability>=8.0)
+# Compute Capability version query (https://developer.nvidia.com/cuda-gpus)
 FROM vllm/vllm-openai:v0.10.1.1
 
+# Use the official vllm image for gpu with Turing architecture and below (Compute Capability<8.0)
+# FROM vllm/vllm-openai:v0.10.2
+
 # Install libgl for opencv support & Noto fonts for Chinese characters
 RUN apt-get update && \
     apt-get install -y \
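The Dockerfile comments above choose a base image by GPU Compute Capability. A minimal sketch of that rule follows. It assumes a hypothetical helper named `pick_base_image`, which is not part of MinerU, and reuses only the image tags quoted in the comments.

```python
def pick_base_image(major: int, minor: int, china_mirror: bool = False) -> str:
    """Return the vLLM base image suggested by the Dockerfile comments.

    Hypothetical helper for illustration only; MinerU does not ship it.
    """
    # Ampere and newer (Compute Capability >= 8.0) keep the default tag;
    # Turing and older use v0.10.2, as the comments recommend.
    tag = "v0.10.1.1" if (major, minor) >= (8, 0) else "v0.10.2"
    # The China-region Dockerfile pulls the same image through the DaoCloud mirror.
    prefix = "docker.m.daocloud.io/" if china_mirror else ""
    return f"{prefix}vllm/vllm-openai:{tag}"


print(pick_base_image(7, 5))        # a Compute Capability 7.5 (Turing) GPU
print(pick_base_image(8, 6, True))  # an Ampere GPU, using the mirror
```

The `(major, minor)` pair has the same shape as the value returned by `torch.cuda.get_device_capability()`, so the result of that call could be passed in directly.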
@@ -10,7 +10,8 @@ docker build -t mineru-vllm:latest -f Dockerfile .
 ```
 
 > [!TIP]
-> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/global/Dockerfile) uses `vllm/vllm-openai:v0.10.1.1` as the base image by default, supporting Turing/Ampere/Ada Lovelace/Hopper/Blackwell platforms.
+> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/global/Dockerfile) uses `vllm/vllm-openai:v0.10.1.1` as the base image by default. This version of vLLM v1 engine has limited support for GPU models.
+> If you cannot use vLLM accelerated inference on Turing and earlier architecture GPUs, you can resolve this issue by changing the base image to `vllm/vllm-openai:v0.10.2`.
 
 ## Docker Description
 
@@ -10,7 +10,8 @@ docker build -t mineru-vllm:latest -f Dockerfile .
 ```
 
 > [!TIP]
-> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/china/Dockerfile) uses `vllm/vllm-openai:v0.10.1.1` as the base image by default, supporting Turing/Ampere/Ada Lovelace/Hopper/Blackwell platforms,
+> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/china/Dockerfile) uses `vllm/vllm-openai:v0.10.1.1` as the base image by default.
+> The vLLM v1 engine in this version has limited support for GPU models. If you cannot use vLLM accelerated inference on Turing and earlier architecture GPUs, you can resolve the issue by changing the base image to `vllm/vllm-openai:v0.10.2`.
 
 ## Docker Description
 
@@ -116,9 +116,14 @@ class BatchAnalyze:
     atom_model_name=AtomicModel.ImgOrientationCls,
 )
 try:
-    img_orientation_cls_model.batch_predict(table_res_list_all_page,
-                                            det_batch_size=self.batch_ratio * OCR_DET_BASE_BATCH_SIZE,
-                                            batch_size=TABLE_ORI_CLS_BATCH_SIZE)
+    if self.enable_ocr_det_batch:
+        img_orientation_cls_model.batch_predict(table_res_list_all_page,
+                                                det_batch_size=self.batch_ratio * OCR_DET_BASE_BATCH_SIZE,
+                                                batch_size=TABLE_ORI_CLS_BATCH_SIZE)
+    else:
+        for table_res in table_res_list_all_page:
+            rotate_label = img_orientation_cls_model.predict(table_res['table_img'])
+            img_orientation_cls_model.img_rotate(table_res, rotate_label)
 except Exception as e:
     logger.warning(
         f"Image orientation classification failed: {e}, using original image"
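The hunk above switches between one batched call and a per-table loop, and in both cases falls back to the original images when classification raises. A minimal sketch of that control flow follows. `FakeOrientationModel` and `classify_orientation` are hypothetical stand-ins, not MinerU classes, and the batch sizes are placeholder values.

```python
class FakeOrientationModel:
    """Hypothetical stand-in for the orientation classifier (illustration only)."""

    def predict(self, img):
        # A real model would inspect the image; the stub always reports "0".
        return "0"

    def img_rotate(self, table_res, label):
        # Record the label instead of rotating pixels.
        table_res["label"] = label

    def batch_predict(self, items, det_batch_size, batch_size):
        for item in items:
            self.img_rotate(item, self.predict(item["table_img"]))


def classify_orientation(model, tables, enable_batch, det_batch_size=16, batch_size=16):
    """Run batched or per-table classification; keep original images on failure."""
    try:
        if enable_batch:
            model.batch_predict(tables, det_batch_size=det_batch_size, batch_size=batch_size)
        else:
            for table_res in tables:
                label = model.predict(table_res["table_img"])
                model.img_rotate(table_res, label)
    except Exception as exc:
        print(f"Image orientation classification failed: {exc}, using original image")
    return tables


tables = classify_orientation(FakeOrientationModel(), [{"table_img": None}], enable_batch=False)
print(tables)
```

Sharing `img_rotate` between both paths is what lets the per-table branch apply the same rotation rule as the batched one.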
41
mineru/backend/vlm/custom_logits_processors.py
Normal file
@@ -0,0 +1,41 @@
+import os
+
+from loguru import logger
+from packaging import version
+
+
+def enable_custom_logits_processors():
+    import torch
+    from vllm import __version__ as vllm_version
+
+    if not torch.cuda.is_available():
+        logger.info("CUDA not available, disabling custom_logits_processors")
+        return False
+
+    major, minor = torch.cuda.get_device_capability()
+    # Compute the Compute Capability correctly
+    compute_capability = f"{major}.{minor}"
+
+    # Handle the environment variable safely
+    vllm_use_v1_str = os.getenv('VLLM_USE_V1', "1")
+    if vllm_use_v1_str.isdigit():
+        vllm_use_v1 = int(vllm_use_v1_str)
+    else:
+        vllm_use_v1 = 1
+
+    if vllm_use_v1 == 0:
+        logger.info("VLLM_USE_V1 is set to 0, disabling custom_logits_processors")
+        return False
+    elif version.parse(vllm_version) < version.parse("0.10.1"):
+        logger.info(f"vllm version: {vllm_version} < 0.10.1, disable custom_logits_processors")
+        return False
+    elif version.parse(compute_capability) < version.parse("8.0"):
+        if version.parse(vllm_version) >= version.parse("0.10.2"):
+            logger.info(f"compute_capability: {compute_capability} < 8.0, but vllm version: {vllm_version} >= 0.10.2, enable custom_logits_processors")
+            return True
+        else:
+            logger.info(f"compute_capability: {compute_capability} < 8.0 and vllm version: {vllm_version} < 0.10.2, disable custom_logits_processors")
+            return False
+    else:
+        logger.info(f"compute_capability: {compute_capability} >= 8.0 and vllm version: {vllm_version} >= 0.10.1, enable custom_logits_processors")
+        return True
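The branching in `enable_custom_logits_processors` can be read as a small decision table. The sketch below restates it as a pure function so the cases can be checked without a GPU or a vLLM install. `parse_version` and `should_enable` are hypothetical names, and the tuple-based parser is a simplification that only handles dotted numeric versions, unlike `packaging.version`, which the source uses.

```python
def parse_version(text: str) -> tuple:
    """Parse a dotted numeric version such as '0.10.1.1' into a tuple of ints.

    Simplified stand-in for packaging.version.parse (no pre-release handling).
    """
    return tuple(int(part) for part in text.split("."))


def should_enable(vllm_version: str, capability: tuple, use_v1: int = 1,
                  cuda_available: bool = True) -> bool:
    """Mirror the decision order of the function in the diff above."""
    if not cuda_available or use_v1 == 0:
        return False
    parsed = parse_version(vllm_version)
    if parsed < (0, 10, 1):
        return False
    if capability < (8, 0):
        # Turing and older GPUs need vLLM 0.10.2 or newer.
        return parsed >= (0, 10, 2)
    return True


for vllm_version, capability in [("0.10.1.1", (7, 5)), ("0.10.2", (7, 5)), ("0.10.1.1", (8, 0))]:
    print(vllm_version, capability, should_enable(vllm_version, capability))
```

Separating the decision from the `torch` and `vllm` imports is a design choice of this sketch, not of the source; the source imports both inside the function body.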
@@ -4,6 +4,7 @@ import time
 
 from loguru import logger
 
+from .custom_logits_processors import enable_custom_logits_processors
 from .model_output_to_middle_json import result_to_middle_json
 from ...data.data_reader_writer import DataWriter
 from mineru.utils.pdf_image_tools import load_images_from_pdf
@@ -88,7 +89,6 @@ class ModelSingleton:
 elif backend == "vllm-engine":
     try:
         import vllm
-        vllm_version = vllm.__version__
         from mineru_vl_utils import MinerULogitsProcessor
     except ImportError:
         raise ImportError("Please install vllm to use the vllm-engine backend.")
@@ -96,7 +96,7 @@ class ModelSingleton:
     kwargs["gpu_memory_utilization"] = 0.5
 if "model" not in kwargs:
     kwargs["model"] = model_path
-if version.parse(vllm_version) >= version.parse("0.10.1") and "logits_processors" not in kwargs:
+if enable_custom_logits_processors() and ("logits_processors" not in kwargs):
     kwargs["logits_processors"] = [MinerULogitsProcessor]
 # Use kwargs as the vllm initialization arguments
 vllm_llm = vllm.LLM(**kwargs)
@@ -104,7 +104,6 @@ class ModelSingleton:
 try:
     from vllm.engine.arg_utils import AsyncEngineArgs
     from vllm.v1.engine.async_llm import AsyncLLM
-    from vllm import __version__ as vllm_version
     from mineru_vl_utils import MinerULogitsProcessor
 except ImportError:
     raise ImportError("Please install vllm to use the vllm-async-engine backend.")
@@ -112,7 +111,7 @@ class ModelSingleton:
     kwargs["gpu_memory_utilization"] = 0.5
 if "model" not in kwargs:
     kwargs["model"] = model_path
-if version.parse(vllm_version) >= version.parse("0.10.1") and "logits_processors" not in kwargs:
+if enable_custom_logits_processors() and ("logits_processors" not in kwargs):
     kwargs["logits_processors"] = [MinerULogitsProcessor]
 # Use kwargs as the vllm initialization arguments
 vllm_async_llm = AsyncLLM.from_engine_args(AsyncEngineArgs(**kwargs))
@@ -54,7 +54,7 @@ def mk_blocks_to_markdown(para_blocks, make_mode, formula_enable, table_enable,
 elif para_type == BlockType.LIST:
     for block in para_block['blocks']:
         item_text = merge_para_with_text(block, formula_enable=formula_enable, img_buket_path=img_buket_path)
-        para_text += f"{item_text}\n"
+        para_text += f"{item_text} \n"
 elif para_type == BlockType.TITLE:
     title_level = get_title_level(para_block)
     para_text = f'{"#" * title_level} {merge_para_with_text(para_block)}'
@@ -255,25 +255,28 @@ class PaddleOrientationClsModel:
                 results = self.sess.run(None, {"x": x})
                 for img_info, res in zip(rotated_imgs, results[0]):
                     label = self.labels[np.argmax(res)]
-                    if label == "270":
-                        img_info["table_img"] = cv2.rotate(
-                            np.asarray(img_info["table_img"]),
-                            cv2.ROTATE_90_CLOCKWISE,
-                        )
-                        img_info["wired_table_img"] = cv2.rotate(
-                            np.asarray(img_info["wired_table_img"]),
-                            cv2.ROTATE_90_CLOCKWISE,
-                        )
-                    elif label == "90":
-                        img_info["table_img"] = cv2.rotate(
-                            np.asarray(img_info["table_img"]),
-                            cv2.ROTATE_90_COUNTERCLOCKWISE,
-                        )
-                        img_info["wired_table_img"] = cv2.rotate(
-                            np.asarray(img_info["wired_table_img"]),
-                            cv2.ROTATE_90_COUNTERCLOCKWISE,
-                        )
-                    else:
-                        # No processing for 180 and 0 degrees
-                        pass
+                    self.img_rotate(img_info, label)
                     pbar.update(1)
+
+    def img_rotate(self, img_info, label):
+        if label == "270":
+            img_info["table_img"] = cv2.rotate(
+                np.asarray(img_info["table_img"]),
+                cv2.ROTATE_90_CLOCKWISE,
+            )
+            img_info["wired_table_img"] = cv2.rotate(
+                np.asarray(img_info["wired_table_img"]),
+                cv2.ROTATE_90_CLOCKWISE,
+            )
+        elif label == "90":
+            img_info["table_img"] = cv2.rotate(
+                np.asarray(img_info["table_img"]),
+                cv2.ROTATE_90_COUNTERCLOCKWISE,
+            )
+            img_info["wired_table_img"] = cv2.rotate(
+                np.asarray(img_info["wired_table_img"]),
+                cv2.ROTATE_90_COUNTERCLOCKWISE,
+            )
+        else:
+            # No processing for 180 and 0 degrees
+            pass
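The extracted `img_rotate` method maps a predicted label to a correcting rotation: "270" is undone with a 90-degree clockwise turn, "90" with a counterclockwise one, and "0" and "180" are left alone. The sketch below shows the same mapping on plain nested lists so it runs without OpenCV. `rotate_by_label` is a hypothetical helper, and the list-based rotations are a stand-in for the `cv2.rotate` calls in the source.

```python
def rotate_by_label(grid, label):
    """Apply the label-to-rotation mapping from img_rotate to a 2D list."""
    if label == "270":
        # 90 degrees clockwise: reverse the rows, then transpose.
        return [list(row) for row in zip(*grid[::-1])]
    if label == "90":
        # 90 degrees counterclockwise: transpose, then reverse the rows.
        return [list(row) for row in zip(*grid)][::-1]
    # "0" and "180" are returned unchanged, as in the source.
    return grid


grid = [[1, 2],
        [3, 4]]
print(rotate_by_label(grid, "270"))
print(rotate_by_label(grid, "90"))
```

Applying the "270" branch and then the "90" branch returns the original grid, which is a quick sanity check that the two rotations are inverses.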
@@ -1,10 +1,9 @@
 import sys
 
+from mineru.backend.vlm.custom_logits_processors import enable_custom_logits_processors
 from mineru.utils.models_download_utils import auto_download_and_get_model_root_path
 
 from vllm.entrypoints.cli.main import main as vllm_main
-from vllm import __version__ as vllm_version
-from packaging import version
 
 
 def main():
@@ -37,6 +36,8 @@ def main():
     for index in sorted(model_arg_indices, reverse=True):
         args.pop(index)
 
+    custom_logits_processors = enable_custom_logits_processors()
+
     # Add default arguments
     if not has_port_arg:
         args.extend(["--port", "30000"])
@@ -44,7 +45,7 @@ def main():
         args.extend(["--gpu-memory-utilization", "0.5"])
     if not model_path:
         model_path = auto_download_and_get_model_root_path("/", "vlm")
-    if not has_logits_processors_arg and version.parse(vllm_version) >= version.parse("0.10.1"):
+    if (not has_logits_processors_arg) and custom_logits_processors:
         args.extend(["--logits-processors", "mineru_vl_utils:MinerULogitsProcessor"])
 
     # Rebuild the arguments, passing the model path as a positional argument
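The server entry point appends defaults only for flags the user did not pass, and it now adds `--logits-processors` only when `enable_custom_logits_processors()` returned true. A minimal sketch of that pattern follows. `add_default_args` is a hypothetical function, and the plain membership test is an assumption: how the source computes `has_port_arg` and `has_logits_processors_arg` is not shown in the diff.

```python
def add_default_args(args, custom_logits_processors):
    """Return a copy of args with the defaults from the diff appended when absent."""
    args = list(args)  # do not mutate the caller's list
    if "--port" not in args:
        args.extend(["--port", "30000"])
    if "--gpu-memory-utilization" not in args:
        args.extend(["--gpu-memory-utilization", "0.5"])
    # Only add the processor when the caller did not set one and it is enabled.
    if "--logits-processors" not in args and custom_logits_processors:
        args.extend(["--logits-processors", "mineru_vl_utils:MinerULogitsProcessor"])
    return args


print(add_default_args(["--port", "8000"], custom_logits_processors=True))
print(add_default_args([], custom_logits_processors=False))
```

A membership test like this would miss the `--port=8000` spelling, so a real implementation would need to handle both forms.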
@@ -1 +1 @@
-__version__ = "2.5.0"
+__version__ = "2.5.2"
@@ -39,7 +39,7 @@ dependencies = [
     "openai>=1.70.0,<2",
     "beautifulsoup4>=4.13.5,<5",
     "magika>=0.6.2,<0.7.0",
-    "mineru-vl-utils>=0.1.7,<1",
+    "mineru-vl-utils>=0.1.8,<1",
 ]
 
 [project.optional-dependencies]
@@ -51,12 +51,12 @@ test = [
     "fuzzywuzzy"
 ]
 vlm = [
-    "torch>=2.6.0,<2.8.0",
+    "torch>=2.6.0,<3",
     "transformers>=4.51.1,<5.0.0",
     "accelerate>=1.5.1",
 ]
 vllm = [
-    "vllm==0.10.1.1",
+    "vllm>=0.10.1.1,<0.11",
 ]
 pipeline = [
     "matplotlib>=3.10,<4",
@@ -68,7 +68,7 @@ pipeline = [
     "shapely>=2.0.7,<3",
     "pyclipper>=1.3.0,<2",
     "omegaconf>=2.3.0,<3",
-    "torch>=2.6.0,<2.8.0",
+    "torch>=2.6.0,<3",
     "torchvision",
     "transformers>=4.49.0,!=4.51.0,<5.0.0",
     "onnxruntime>1.17.0",
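The dependency changes above replace an exact pin (`vllm==0.10.1.1`) with a half-open range (`vllm>=0.10.1.1,<0.11`) and widen the `torch` upper bound from `<2.8.0` to `<3`. The sketch below checks which versions fall inside such a range. `as_tuple` and `in_range` are hypothetical helpers that only understand dotted numeric versions; real resolvers follow the full version-specifier rules, which this does not implement.

```python
def as_tuple(text: str) -> tuple:
    """Turn a dotted numeric version into a tuple of ints (simplified parser)."""
    return tuple(int(part) for part in text.split("."))


def in_range(candidate: str, lower: str, upper: str) -> bool:
    """Return True when lower <= candidate < upper."""
    return as_tuple(lower) <= as_tuple(candidate) < as_tuple(upper)


# New vllm range from the diff: >=0.10.1.1,<0.11
for candidate in ["0.10.1.1", "0.10.2", "0.11.0"]:
    print("vllm", candidate, in_range(candidate, "0.10.1.1", "0.11"))

# torch 2.8.0 was excluded by the old bound (<2.8.0) and is allowed by the new one (<3).
print("torch 2.8.0 old:", in_range("2.8.0", "2.6.0", "2.8.0"))
print("torch 2.8.0 new:", in_range("2.8.0", "2.6.0", "3"))
```

The widened `vllm` range is what allows the `v0.10.2` base image recommended for Turing and earlier GPUs to satisfy the package constraint.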