mirror of
https://github.com/opendatalab/MinerU.git
synced 2026-03-27 11:08:32 +07:00
Merge pull request #4359 from opendatalab/dev
feat: add Hygon entry to acceleration cards list
34
docker/china/dcu.Dockerfile
Normal file
@@ -0,0 +1,34 @@
# Base image containing the vLLM inference environment; requires an amd64 (x86-64) CPU and a Hygon DCU.
FROM harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.9.2-ubuntu22.04-dtk25.04.2-1226-das1.7-py3.10-20251226


# Install Noto fonts (including CJK) so that Chinese characters render correctly
RUN apt-get update && \
    apt-get install -y \
        fonts-noto-core \
        fonts-noto-cjk \
        fontconfig && \
    fc-cache -fv && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Install the latest MinerU
RUN python3 -m pip install -U pip -i https://mirrors.aliyun.com/pypi/simple && \
    python3 -m pip install "mineru[api,gradio]" \
        "matplotlib>=3.10,<4" \
        "ultralytics>=8.3.48,<9" \
        "doclayout_yolo==0.0.4" \
        "ftfy>=6.3.1,<7" \
        "shapely>=2.0.7,<3" \
        "pyclipper>=1.3.0,<2" \
        "omegaconf>=2.3.0,<3" \
        numpy==1.25.0 \
        opencv-python==4.11.0.86 \
        -i https://mirrors.aliyun.com/pypi/simple && \
    python3 -m pip cache purge

# Download the models and update the configuration file
RUN /bin/bash -c "mineru-models-download -s modelscope -m all"

# Entry point: export MINERU_MODEL_SOURCE=local so MinerU uses the downloaded models, then run the given command
ENTRYPOINT ["/bin/bash", "-c", "export MINERU_MODEL_SOURCE=local && exec \"$@\"", "--"]
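The `ENTRYPOINT` relies on how `bash -c` treats extra arguments: the first one after the command string (here `--`) becomes `$0`, and the rest (the command given to `docker run`, such as `/bin/bash`) become `"$@"`, which `exec` then runs with `MINERU_MODEL_SOURCE=local` already exported. A minimal Python sketch that reproduces this wrapping with `subprocess`, assuming bash is installed at `/bin/bash` and using `printenv` as a stand-in for a MinerU command (the helper name `run_in_entrypoint` is hypothetical):

```python
import subprocess

# Same argument layout as the Dockerfile's ENTRYPOINT.
ENTRYPOINT = [
    "/bin/bash",
    "-c",
    'export MINERU_MODEL_SOURCE=local && exec "$@"',
    "--",  # becomes $0 inside the bash -c script
]


def run_in_entrypoint(command: list) -> subprocess.CompletedProcess:
    """Run `command` the way Docker appends it to the ENTRYPOINT (hypothetical helper)."""
    return subprocess.run(ENTRYPOINT + command, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # printenv stands in for a real MinerU command such as mineru-api.
    result = run_in_entrypoint(["printenv", "MINERU_MODEL_SOURCE"])
    print(result.stdout.strip())
```

Because the script uses `"$@"` rather than `$@`, arguments containing spaces reach the final command unchanged.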
115
docs/zh/usage/acceleration_cards/Hygon.md
Normal file
@@ -0,0 +1,115 @@
## 1. Test Platform
The following platform was used to test this guide and is listed for reference:
```
os: Ubuntu 22.04.3 LTS
cpu: Hygon Hygon C86-4G(x86-64)
dcu: BW200
driver: 6.3.13-V1.12.0a
docker: 20.10.24
```

## 2. Environment Preparation

### 2.1 Build the Image from the Dockerfile

```bash
wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/china/dcu.Dockerfile
docker build --network=host -t mineru:dcu-vllm-latest -f dcu.Dockerfile .
```


## 3. Start the Docker Container

```bash
docker run -u root --name mineru_docker \
  --network=host \
  --ipc=host \
  --shm-size=16G \
  --device=/dev/kfd \
  --device=/dev/mkfd \
  --device=/dev/dri \
  -v /opt/hyhal:/opt/hyhal \
  --group-add video \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  -e MINERU_MODEL_SOURCE=local \
  -it mineru:dcu-vllm-latest \
  /bin/bash
```

After running this command you will be in an interactive terminal inside the Docker container, where you can run MinerU commands directly.
You can also start a MinerU service directly by replacing `/bin/bash` with a service start command; for details, see [Starting services via commands](https://opendatalab.github.io/MinerU/zh/usage/quick_usage/#apiwebuihttp-clientserver).
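As a sketch of that substitution, the Python snippet below assembles the same `docker run` arguments as the command above and accepts a command to run in place of `/bin/bash`. The helper name `build_docker_run_args` is hypothetical, and the `mineru-api --host 0.0.0.0 --port 8000` invocation is an illustrative assumption; check the linked page for the exact service commands and options.

```python
import shlex
from typing import List, Optional

IMAGE = "mineru:dcu-vllm-latest"

# Options copied from the docker run command in section 3.
DOCKER_RUN_OPTIONS = [
    "-u", "root",
    "--name", "mineru_docker",
    "--network=host",
    "--ipc=host",
    "--shm-size=16G",
    "--device=/dev/kfd",
    "--device=/dev/mkfd",
    "--device=/dev/dri",
    "-v", "/opt/hyhal:/opt/hyhal",
    "--group-add", "video",
    "--cap-add=SYS_PTRACE",
    "--security-opt", "seccomp=unconfined",
    "-e", "MINERU_MODEL_SOURCE=local",
    "-it",
]


def build_docker_run_args(command: Optional[List[str]] = None) -> List[str]:
    """Return the docker run argv; `command` replaces the default /bin/bash."""
    return ["docker", "run", *DOCKER_RUN_OPTIONS, IMAGE, *(command or ["/bin/bash"])]


if __name__ == "__main__":
    # Hypothetical service command; see the linked docs for the real options.
    args = build_docker_run_args(["mineru-api", "--host", "0.0.0.0", "--port", "8000"])
    print(shlex.join(args))
```

Passing `args` to `subprocess.run(args, check=True)` would start the container; keeping the arguments as a list avoids shell-quoting problems, and `shlex.join` is used only to print a copy-pasteable command.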
## 4. Notes

The table below shows MinerU's support for Hygon accelerator cards in different scenarios:

<table border="1">
  <thead>
    <tr>
      <th rowspan="2" colspan="2">Scenario</th>
      <th>Container environment</th>
    </tr>
    <tr>
      <th>vllm</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td rowspan="3">Command-line tool (mineru)</td>
      <td>pipeline</td>
      <td>🟢</td>
    </tr>
    <tr>
      <td>&lt;vlm/hybrid&gt;-auto-engine</td>
      <td>🟢</td>
    </tr>
    <tr>
      <td>&lt;vlm/hybrid&gt;-http-client</td>
      <td>🟢</td>
    </tr>
    <tr>
      <td rowspan="3">FastAPI service (mineru-api)</td>
      <td>pipeline</td>
      <td>🟢</td>
    </tr>
    <tr>
      <td>&lt;vlm/hybrid&gt;-auto-engine</td>
      <td>🟢</td>
    </tr>
    <tr>
      <td>&lt;vlm/hybrid&gt;-http-client</td>
      <td>🟢</td>
    </tr>
    <tr>
      <td rowspan="3">Gradio UI (mineru-gradio)</td>
      <td>pipeline</td>
      <td>🟢</td>
    </tr>
    <tr>
      <td>&lt;vlm/hybrid&gt;-auto-engine</td>
      <td>🟢</td>
    </tr>
    <tr>
      <td>&lt;vlm/hybrid&gt;-http-client</td>
      <td>🟢</td>
    </tr>
    <tr>
      <td colspan="2">openai-server service (mineru-openai-server)</td>
      <td>🟢</td>
    </tr>
    <tr>
      <td colspan="2">Data parallelism (--data-parallel-size)</td>
      <td>🟢</td>
    </tr>
  </tbody>
</table>

Notes:
🟢: Supported; runs stably, with accuracy essentially the same as on Nvidia GPUs
🟡: Supported but less stable; may fail in some scenarios or show some accuracy differences
🔴: Not supported; cannot run, or accuracy differs significantly

>[!TIP]
>Selecting which DCU accelerator cards are available works similarly to AMD GPUs; see [GPU isolation techniques](https://rocm.docs.amd.com/en/docs-6.2.4/conceptual/gpu-isolation.html).
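For example, assuming the HIP runtime in the DCU software stack honors `HIP_VISIBLE_DEVICES` the way the linked ROCm page describes (not verified here on DCU), a process can be limited to specific cards by setting that variable in its environment. The hypothetical helper below builds such an environment without modifying the parent process's environment:

```python
import os
from typing import Dict, Mapping, Optional, Sequence


def env_with_visible_dcus(
    device_ids: Sequence[int], base: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with HIP_VISIBLE_DEVICES set."""
    if not device_ids:
        raise ValueError("at least one device index is required")
    env = dict(os.environ if base is None else base)
    # Comma-separated device indices, the format described in the ROCm isolation guide.
    env["HIP_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    return env


if __name__ == "__main__":
    # Assumed usage: pass env=... to subprocess.run() when launching a MinerU command.
    print(env_with_visible_dcus([0, 1])["HIP_VISIBLE_DEVICES"])
```

The same effect can presumably be had by exporting the variable in the container shell or adding `-e HIP_VISIBLE_DEVICES=0,1` to `docker run`; whether the other variables on that page (such as `ROCR_VISIBLE_DEVICES`) apply to DCU should be checked against Hygon's documentation.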
@@ -12,6 +12,7 @@
* [昇腾 Ascend](acceleration_cards/Ascend.md) 🚀
* [平头哥 T-Head](acceleration_cards/THead.md) 🚀
* [沐曦 METAX](acceleration_cards/METAX.md) 🚀
* [海光 Hygon](acceleration_cards/Hygon.md) 🚀
* [AMD](acceleration_cards/AMD.md) [#3662](https://github.com/opendatalab/MinerU/discussions/3662) ❤️
* [太初元碁 Tecorigin](acceleration_cards/Tecorigin.md) [#3767](https://github.com/opendatalab/MinerU/pull/3767) ❤️
* [寒武纪 Cambricon](acceleration_cards/Cambricon.md) [#4004](https://github.com/opendatalab/MinerU/discussions/4004) ❤️
@@ -9,13 +9,18 @@ from mineru.utils.char_utils import full_to_half
from mineru.utils.enum_class import BlockType, SplitFlag


CONTINUATION_MARKERS = [
CONTINUATION_END_MARKERS = [
"(续)",
"(续表)",
"(续上表)",
"(continued)",
"(cont.)",
"(cont’d)",
"(…continued)",
]

CONTINUATION_INLINE_MARKERS = [
"(continued)",
]
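The new lists separate markers that must end a caption (`CONTINUATION_END_MARKERS`) from markers that may appear anywhere in it (`CONTINUATION_INLINE_MARKERS`), which is how `can_merge_tables` below uses them. A self-contained sketch of that caption test follows. It works on plain strings rather than blocks passed through `merge_para_with_text`, `caption_has_continuation_marker` is a hypothetical name, and this `full_to_half` is a simplified stand-in (an assumption, not MinerU's implementation) that only maps the full-width ASCII range and the ideographic space:

```python
CONTINUATION_END_MARKERS = [
    "(续)", "(续表)", "(续上表)", "(continued)", "(cont.)", "(cont’d)", "(…continued)",
]
CONTINUATION_INLINE_MARKERS = ["(continued)"]


def full_to_half(text: str) -> str:
    """Simplified stand-in: map U+FF01..U+FF5E and U+3000 to their half-width forms."""
    chars = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:
            chars.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:
            chars.append(chr(code - 0xFEE0))
        else:
            chars.append(ch)
    return "".join(chars)


def caption_has_continuation_marker(caption: str) -> bool:
    """Suffix match against END markers, or substring match against INLINE markers."""
    text = full_to_half(caption.strip()).lower()
    return any(text.endswith(m.lower()) for m in CONTINUATION_END_MARKERS) or any(
        m.lower() in text for m in CONTINUATION_INLINE_MARKERS
    )
```

With this split, "Table 2 (Continued) Results" counts as a continuation because "(continued)" is also an inline marker, while "Table 5 (cont.) summary" does not, since "(cont.)" is only accepted at the end of the caption.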
@@ -163,20 +168,32 @@ def detect_table_headers(soup1, soup2, max_header_rows=5):
def can_merge_tables(current_table_block, previous_table_block):
"""Determine whether two tables can be merged"""
# Check whether the tables have captions and footnotes
# Count the footnotes in previous_table_block
footnote_count = sum(1 for block in previous_table_block["blocks"] if block["type"] == BlockType.TABLE_FOOTNOTE)
# If there are TABLE_CAPTION blocks, check whether at least one of them ends with "(续)"
caption_blocks = [block for block in current_table_block["blocks"] if block["type"] == BlockType.TABLE_CAPTION]
if caption_blocks:
# If no caption ends with "(续)", "(续表)", "(continued)" or "(cont.)", do not merge
# Check whether at least one caption contains a continuation marker
has_continuation_marker = False
for block in caption_blocks:
caption_text = full_to_half(merge_para_with_text(block).strip()).lower()
if (
any(caption_text.endswith(marker.lower()) for marker in CONTINUATION_END_MARKERS)
or any(marker.lower() in caption_text for marker in CONTINUATION_INLINE_MARKERS)
):
has_continuation_marker = True
break

if not any(
any(full_to_half(merge_para_with_text(block).strip()).lower().endswith(marker.lower())
for marker in CONTINUATION_MARKERS)
for block in caption_blocks
):
# If no caption contains a continuation marker, do not allow the merge
if not has_continuation_marker:
return False, None, None, None, None

if any(block["type"] == BlockType.TABLE_FOOTNOTE for block in previous_table_block["blocks"]):
return False, None, None, None, None
# If a caption of current_table_block contains a continuation marker, relax the footnote rule and allow previous_table_block at most one footnote
if footnote_count > 1:
return False, None, None, None, None
else:
if footnote_count > 0:
return False, None, None, None, None

# Get the HTML content of both tables
current_html = ""
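Taken together, the revised checks form a small decision rule: if the current table has captions, at least one must carry a continuation marker, and the previous table may then keep at most one footnote; if it has no caption, any footnote on the previous table blocks the merge. A minimal sketch of that rule, using the hypothetical name `footnotes_allow_merge`, plain caption strings, and an injected marker test (the real function works on block dicts and returns a 5-tuple):

```python
from typing import Callable, Sequence


def footnotes_allow_merge(
    captions: Sequence[str],
    previous_footnote_count: int,
    has_marker: Callable[[str], bool],
) -> bool:
    """Sketch of the caption/footnote gate at the top of can_merge_tables."""
    if captions:
        # At least one caption of the current table must carry a continuation marker.
        if not any(has_marker(c) for c in captions):
            return False
        # With a marker present, the previous table may keep at most one footnote.
        return previous_footnote_count <= 1
    # Without captions, any footnote on the previous table blocks the merge.
    return previous_footnote_count == 0
```

Injecting `has_marker` keeps the sketch independent of how markers are matched; the caption test from the marker lists above could be passed in directly.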
@@ -363,6 +380,11 @@ def perform_table_merge(soup1, soup2, previous_table_block, wait_merge_table_foo
row.extract()
tbody1.append(row)

# Clear the footnotes of previous_table_block
previous_table_block["blocks"] = [
block for block in previous_table_block["blocks"]
if block["type"] != BlockType.TABLE_FOOTNOTE
]
# Add the footnotes of the table being merged to the previous table
for table_footnote in wait_merge_table_footnotes:
temp_table_footnote = table_footnote.copy()
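These added lines make the merged table carry the footnotes of the table being merged in: the previous table's own footnote (at most one can pass the check in `can_merge_tables`) is dropped first, and the pending footnotes are then added. The hunk is cut off inside the loop, so the sketch below assumes the copies are simply appended to `previous_table_block["blocks"]`. It uses plain dicts, a hypothetical `replace_footnotes` helper, and the string `"table_footnote"` as a stand-in for `BlockType.TABLE_FOOTNOTE`:

```python
from typing import Any, Dict, List

TABLE_FOOTNOTE = "table_footnote"  # stand-in for BlockType.TABLE_FOOTNOTE


def replace_footnotes(
    previous_table_block: Dict[str, Any], wait_merge_table_footnotes: List[Dict[str, Any]]
) -> None:
    """Drop the previous table's footnotes, then append copies of the pending ones (assumed placement)."""
    previous_table_block["blocks"] = [
        block for block in previous_table_block["blocks"] if block["type"] != TABLE_FOOTNOTE
    ]
    for table_footnote in wait_merge_table_footnotes:
        # Shallow copy: reassigning keys on the copy leaves the original footnote dict unchanged.
        previous_table_block["blocks"].append(table_footnote.copy())
```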