Compare commits

5 Commits

| Author | SHA1 | Message | Date |
|--------|------|---------|------|
| Xiaomeng Zhao | d59a69236c | Merge pull request #4724 from opendatalab/dev<br>Dev | 2026-04-02 16:36:39 +08:00 |
| Xiaomeng Zhao | 904394497e | Merge pull request #4723 from myhloli/dev<br>docs: add detailed description of MinerU capabilities and integration… | 2026-04-02 16:35:55 +08:00 |
| myhloli | a25798b43a | docs: add detailed description of MinerU capabilities and integration in README files | 2026-04-02 16:29:53 +08:00 |
| Xiaomeng Zhao | d7011f42e2 | Merge pull request #4719 from opendatalab/master<br>master->Dev | 2026-04-01 21:29:00 +08:00 |
| myhloli | ede8d95bf1 | Update version.py with new version | 2026-04-01 13:20:54 +00:00 |

3 changed files with 66 additions and 1 deletion

@@ -43,6 +43,39 @@
</div>
<details>
<summary>MinerU — High-accuracy document parsing engine for LLM · RAG · Agent workflows</summary>
Converts PDF · Word · PPT · Images · Web pages into structured Markdown / JSON · VLM+OCR dual engine · 109 languages <br>
MCP Server · LangChain / Dify / FastGPT native integration · 10+ domestic AI chip support
**🔍 Core Parsing Capabilities**
- Formulas → LaTeX · Tables → HTML, accurate layout reconstruction
- Supports scanned docs, handwriting, multi-column layouts, cross-page table merging
- Output follows human reading order with automatic header/footer removal
- VLM + OCR dual engine, 109-language OCR recognition
**🔌 Integration**
| Use Case | Solution |
|----------|----------|
| AI Coding Tools | MCP Server — Cursor · Claude Desktop · Windsurf |
| RAG Frameworks | LangChain · LlamaIndex · RAGFlow · RAG-Anything · Flowise · Dify · FastGPT |
| Development | Python / Go / TypeScript SDK · CLI · REST API · Docker |
| No-Code | mineru.net online · Gradio WebUI · Desktop client |
**🖥️ Deployment (Private · Fully Offline)**
| Inference Backend | Best For |
|------------------|---------|
| pipeline | Fast & stable, no hallucination, runs on CPU or GPU |
| vlm-engine | High accuracy, supports vLLM / LMDeploy / mlx ecosystem |
| hybrid-engine | High accuracy, native text extraction, low hallucination |
Domestic AI chips: Ascend · Cambricon · Enflame · MetaX · Moore Threads · Kunlunxin · Iluvatar · Hygon · Biren · T-Head
</details>
# Changelog
- 2026/03/29 3.0.0 Released
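
The deployment table added in the hunk above can be read as a small lookup from backend name to advertised traits. The sketch below is illustrative only: the backend names and trait wording come from that table, while `BACKEND_TRAITS` and `backends_with` are hypothetical helpers and not part of MinerU's API. It records only traits the table states explicitly, so a missing trait means "not stated", not "unsupported".

```python
# Hypothetical lookup built from the README deployment table; not MinerU's API.
# Only traits the table states explicitly are listed.
BACKEND_TRAITS = {
    "pipeline": {"fast", "stable", "no hallucination", "cpu", "gpu"},
    "vlm-engine": {"high accuracy", "vllm", "lmdeploy", "mlx"},
    "hybrid-engine": {"high accuracy", "native text extraction", "low hallucination"},
}


def backends_with(*traits: str) -> list[str]:
    """Return backend names whose listed traits include every requested trait."""
    wanted = {t.lower() for t in traits}
    return [name for name, have in BACKEND_TRAITS.items() if wanted <= have]


print(backends_with("high accuracy"))  # ['vlm-engine', 'hybrid-engine']
print(backends_with("cpu"))            # ['pipeline']
```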

@@ -43,6 +43,38 @@
</div>
<details>
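<summary>MinerU — High-accuracy document parsing engine built for LLM · RAG · Agent scenarios</summary>
Converts PDF · Word · PPT · Images · Web pages into structured Markdown / JSON · VLM+OCR dual engine · 109 languages <br>
MCP Server · LangChain / Dify / FastGPT native integration · 10+ domestic AI chips supported <br>
**🔍 Core Parsing Capabilities**
- Formulas → LaTeX · Tables → HTML, accurately reconstructing complex layouts
- Supports scanned documents, handwriting, multi-column layouts, and cross-page table merging
- Output follows human reading order, with headers and footers removed automatically
- VLM + OCR dual engine, with recognition in 109 languages
**🔌 Integration**
| Use Case | Solution |
|----------|----------|
| AI Coding Tools | MCP Server — Cursor · Claude Desktop · Windsurf |
| RAG Frameworks | LangChain · LlamaIndex · RAGFlow · RAG-Anything · Flowise · Dify · FastGPT |
| Development Integration | Python / Go / TypeScript SDK · CLI · REST API · Docker |
| No-Code | mineru.net online version · Gradio WebUI · Desktop client |
**🖥️ Deployment Ecosystem (Private Deployment Supported · Fully Offline)**
| Inference Backend | Best For |
|-------------------|----------|
| pipeline | Fast, stable, no hallucination; runs on CPU or GPU |
| vlm-engine | High accuracy; supports the vLLM / LMDeploy / mlx ecosystem |
| hybrid-engine | High accuracy, native text extraction, low hallucination |
Domestic AI chips: Ascend · Cambricon · Enflame · MetaX · Moore Threads · Kunlunxin · Iluvatar · Vastai · Tecorigin · Hygon · T-Head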
</details>
# Changelog
- 2026/03/29 3.0.0 Released

@@ -1 +1 @@
-__version__ = "3.0.6"
+__version__ = "3.0.7"
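
The last hunk is a one-line version bump from 3.0.6 to 3.0.7. A minimal sketch of how such a bump could be checked, assuming the file holds a single `__version__ = "X.Y.Z"` assignment; `parse_version` and `is_patch_bump` are hypothetical helpers written for illustration and are not taken from the repository.

```python
import re

# Hypothetical helpers for illustration; not taken from the repository.
VERSION_RE = re.compile(r'__version__\s*=\s*"(\d+)\.(\d+)\.(\d+)"')


def parse_version(line: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from a `__version__ = "X.Y.Z"` line."""
    match = VERSION_RE.search(line)
    if match is None:
        raise ValueError(f"no version assignment found in: {line!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_patch_bump(old_line: str, new_line: str) -> bool:
    """True when major and minor are unchanged and patch increased by exactly one."""
    old, new = parse_version(old_line), parse_version(new_line)
    return old[:2] == new[:2] and new[2] == old[2] + 1


print(is_patch_bump('__version__ = "3.0.6"', '__version__ = "3.0.7"'))  # True
```

Because the pattern is matched with `search`, the same helpers also accept diff lines that carry a leading `-` or `+` marker.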