From 29e37933aa8b2ede4bb2183dc016eac63cc132f5 Mon Sep 17 00:00:00 2001
From: myhloli
Date: Fri, 5 Sep 2025 18:45:49 +0800
Subject: [PATCH] feat: update changelog for version 2.2.0 with new table recognition model and OCR enhancements

---
 README.md       | 151 +++++++++++++++++++++++++++++++++++------------
 README_zh-CN.md | 153 ++++++++++++++++++++++++++++++++++++------------
 2 files changed, 226 insertions(+), 78 deletions(-)

diff --git a/README.md b/README.md
index f749d7f4..28c57c73 100644
--- a/README.md
+++ b/README.md
@@ -43,48 +43,121 @@
 # Changelog
-- 2025/08/01 2.1.10 Released
-  - Fixed an issue in the `pipeline` backend where block overlap caused the parsing results to deviate from expectations #3232
-- 2025/07/30 2.1.9 Released
-  - `transformers` 4.54.1 version adaptation
-- 2025/07/28 2.1.8 Released
-  - `sglang` 0.4.9.post5 version adaptation
-- 2025/07/27 2.1.7 Released
-  - `transformers` 4.54.0 version adaptation
-- 2025/07/26 2.1.6 Released
-  - Fixed table parsing issues in handwritten documents when using `vlm` backend
-  - Fixed visualization box position drift issue when document is rotated #3175
-- 2025/07/24 2.1.5 Released
-  - `sglang` 0.4.9 version adaptation, synchronously upgrading the dockerfile base image to sglang 0.4.9.post3
-- 2025/07/23 2.1.4 Released
-  - Bug Fixes
-    - Fixed the issue of excessive memory consumption during the `MFR` step in the `pipeline` backend under certain scenarios #2771
-    - Fixed the inaccurate matching between `image`/`table` and `caption`/`footnote` under certain conditions #3129
-- 2025/07/16 2.1.1 Released
-  - Bug fixes
-    - Fixed text block content loss issue that could occur in certain `pipeline` scenarios #3005
-    - Fixed issue where `sglang-client` required unnecessary packages like `torch` #2968
-    - Updated `dockerfile` to fix incomplete text content parsing due to missing fonts in Linux #2915
-  - Usability improvements
-    - Updated `compose.yaml` to facilitate direct startup of `sglang-server`, `mineru-api`, and `mineru-gradio` services
-    - Launched brand new [online documentation site](https://opendatalab.github.io/MinerU/), simplified readme, providing better documentation experience
-- 2025/07/05 Version 2.1.0 Released
-  - This is the first major update of MinerU 2, which includes a large number of new features and improvements, covering significant performance optimizations, user experience enhancements, and bug fixes. The detailed update contents are as follows:
-  - **Performance Optimizations:**
-    - Significantly improved preprocessing speed for documents with specific resolutions (around 2000 pixels on the long side).
-    - Greatly enhanced post-processing speed when the `pipeline` backend handles batch processing of documents with fewer pages (<10 pages).
-    - Layout analysis speed of the `pipeline` backend has been increased by approximately 20%.
-  - **Experience Enhancements:**
-    - Built-in ready-to-use `fastapi service` and `gradio webui`. For detailed usage instructions, please refer to [Documentation](https://opendatalab.github.io/MinerU/usage/quick_usage/#advanced-usage-via-api-webui-sglang-clientserver).
-    - Adapted to `sglang` version `0.4.8`, significantly reducing the GPU memory requirements for the `vlm-sglang` backend. It can now run on graphics cards with as little as `8GB GPU memory` (Turing architecture or newer).
-    - Added transparent parameter passing for all commands related to `sglang`, allowing the `sglang-engine` backend to receive all `sglang` parameters consistently with the `sglang-server`.
-    - Supports feature extensions based on configuration files, including `custom formula delimiters`, `enabling heading classification`, and `customizing local model directories`. For detailed usage instructions, please refer to [Documentation](https://opendatalab.github.io/MinerU/usage/quick_usage/#extending-mineru-functionality-with-configuration-files).
-  - **New Features:**
-    - Updated the `pipeline` backend with the PP-OCRv5 multilingual text recognition model, supporting text recognition in 37 languages such as French, Spanish, Portuguese, Russian, and Korean, with an average accuracy improvement of over 30%. [Details](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/algorithm/PP-OCRv5/PP-OCRv5_multi_languages.html)
-    - Introduced limited support for vertical text layout in the `pipeline` backend.
+- 2025/09/05 2.2.0 Released
+  - Major Updates
+    - In this version, we focused on improving table parsing accuracy by introducing a new [wired table recognition model](https://github.com/RapidAI/TableStructureRec) and a brand-new hybrid table structure parsing algorithm, significantly enhancing the table recognition capabilities of the `pipeline` backend.
+    - We also added cross-page table merging, available in both the `pipeline` and `vlm` backends, further improving the completeness and accuracy of table parsing.
+  - Other Updates
+    - The `pipeline` backend now supports parsing tables rotated by 270 degrees, so tables in 0/90/270-degree orientations can all be parsed.
+    - The `pipeline` backend added OCR support for Thai and Greek and updated the English OCR model to the latest version. English recognition accuracy improved by 11%, the Thai recognition model reaches 82.68% accuracy, and the Greek recognition model reaches 89.28% accuracy (by PP-OCRv5).
+    - Added a `bbox` field (mapped to the 0-1000 range) to the output `content_list.json`, so users can directly obtain position information for each content block.
+
History Log + +
+ 2025/08/01 2.1.10 Released +
    +
  • Fixed an issue in the pipeline backend where block overlap caused the parsing results to deviate from expectations #3232
  • +
+
+ +
+ 2025/07/30 2.1.9 Released +
    +
  • transformers 4.54.1 version adaptation
  • +
+
+ +
+ 2025/07/28 2.1.8 Released +
    +
  • sglang 0.4.9.post5 version adaptation
  • +
+
+ +
+ 2025/07/27 2.1.7 Released +
    +
  • transformers 4.54.0 version adaptation
  • +
+
+ +
+ 2025/07/26 2.1.6 Released +
    +
  • Fixed table parsing issues in handwritten documents when using the vlm backend
  • +
  • Fixed visualization box position drift when the document is rotated #3175
  • +
+
+ +
+ 2025/07/24 2.1.5 Released +
    +
  • Adapted to sglang 0.4.9 and upgraded the dockerfile base image to sglang 0.4.9.post3 at the same time
  • +
+
+ +
+ 2025/07/23 2.1.4 Released +
    +
  • Bug Fixes +
      +
    • Fixed the issue of excessive memory consumption during the MFR step in the pipeline backend under certain scenarios #2771
    • +
    • Fixed the inaccurate matching between image/table and caption/footnote under certain conditions #3129
    • +
    +
  • +
+
+ +
+ 2025/07/16 2.1.1 Released +
    +
  • Bug fixes +
      +
    • Fixed text block content loss issue that could occur in certain pipeline scenarios #3005
    • +
    • Fixed issue where sglang-client required unnecessary packages like torch #2968
    • +
    • Updated dockerfile to fix incomplete text content parsing due to missing fonts in Linux #2915
    • +
    +
  • +
  • Usability improvements +
      +
    • Updated compose.yaml to facilitate direct startup of sglang-server, mineru-api, and mineru-gradio services
    • +
    • Launched a brand-new online documentation site and simplified the readme to provide a better documentation experience
    • +
    +
  • +
+
+ +
+ 2025/07/05 2.1.0 Released +
    +
  • This is the first major update of MinerU 2, which includes a large number of new features and improvements, covering significant performance optimizations, user experience enhancements, and bug fixes. The detailed update contents are as follows:
  • +
  • Performance Optimizations: +
      +
    • Significantly improved preprocessing speed for documents with specific resolutions (around 2000 pixels on the long side).
    • +
    • Greatly enhanced post-processing speed when the pipeline backend handles batch processing of documents with fewer pages (<10 pages).
    • +
    • Layout analysis speed of the pipeline backend has been increased by approximately 20%.
    • +
    +
  • +
  • Experience Enhancements: +
      +
    • Built-in ready-to-use fastapi service and gradio webui. For detailed usage instructions, please refer to Documentation.
    • +
    • Adapted to sglang version 0.4.8, significantly reducing the GPU memory requirements for the vlm-sglang backend. It can now run on graphics cards with as little as 8GB GPU memory (Turing architecture or newer).
    • +
    • Added transparent parameter passing for all commands related to sglang, allowing the sglang-engine backend to receive all sglang parameters consistently with the sglang-server.
    • +
    • Supports feature extensions based on configuration files, including custom formula delimiters, enabling heading classification, and customizing local model directories. For detailed usage instructions, please refer to Documentation.
    • +
    +
  • +
  • New Features: +
      +
    • Updated the pipeline backend with the PP-OCRv5 multilingual text recognition model, supporting text recognition in 37 languages such as French, Spanish, Portuguese, Russian, and Korean, with an average accuracy improvement of over 30%. Details
    • +
    • Introduced limited support for vertical text layout in the pipeline backend.
    • +
    +
  • +
+
+
2025/06/20 2.0.6 Released
diff --git a/README_zh-CN.md b/README_zh-CN.md
index a74b8a06..f58a2b86 100644
--- a/README_zh-CN.md
+++ b/README_zh-CN.md
@@ -43,48 +43,122 @@
 # 更新记录
-- 2025/08/01 2.1.10 发布
-  - 修复`pipeline`后端因block覆盖导致的解析结果与预期不符 #3232
-- 2025/07/30 2.1.9 发布
-  - `transformers` 4.54.1 版本适配
-- 2025/07/28 2.1.8 发布
-  - `sglang` 0.4.9.post5 版本适配
-- 2025/07/27 2.1.7 发布
-  - `transformers` 4.54.0 版本适配
-- 2025/07/26 2.1.6 发布
-  - 修复`vlm`后端解析部分手写文档时的表格异常问题
-  - 修复文档旋转时可视化框位置漂移问题 #3175
-- 2025/07/24 2.1.5 发布
-  - `sglang` 0.4.9 版本适配,同步升级dockerfile基础镜像为sglang 0.4.9.post3
-- 2025/07/23 2.1.4 发布
-  - bug修复
-    - 修复`pipeline`后端中`MFR`步骤在某些情况下显存消耗过大的问题 #2771
-    - 修复某些情况下`image`/`table`与`caption`/`footnote`匹配不准确的问题 #3129
-- 2025/07/16 2.1.1 发布
-  - bug修复
-    - 修复`pipeline`在某些情况可能发生的文本块内容丢失问题 #3005
-    - 修复`sglang-client`需要安装`torch`等不必要的包的问题 #2968
-    - 更新`dockerfile`以修复linux字体缺失导致的解析文本内容不完整问题 #2915
-  - 易用性更新
-    - 更新`compose.yaml`,便于用户直接启动`sglang-server`、`mineru-api`、`mineru-gradio`服务
-    - 启用全新的[在线文档站点](https://opendatalab.github.io/MinerU/zh/),简化readme,提供更好的文档体验
-- 2025/07/05 2.1.0 发布
-  - 这是 MinerU 2 的第一个大版本更新,包含了大量新功能和改进,包含众多性能优化、体验优化和bug修复,具体更新内容如下:
-  - 性能优化:
-    - 大幅提升某些特定分辨率(长边2000像素左右)文档的预处理速度
-    - 大幅提升`pipeline`后端批量处理大量页数较少(<10)文档时的后处理速度
-    - `pipeline`后端的layout分析速度提升约20%
-  - 体验优化:
-    - 内置开箱即用的`fastapi服务`和`gradio webui`,详细使用方法请参考[文档](https://opendatalab.github.io/MinerU/zh/usage/quick_usage/#apiwebuisglang-clientserver)
-    - `sglang`适配`0.4.8`版本,大幅降低`vlm-sglang`后端的显存要求,最低可在`8G显存`(Turing及以后架构)的显卡上运行
-    - 对所有命令增加`sglang`的参数透传,使得`sglang-engine`后端可以与`sglang-server`一致,接收`sglang`的所有参数
-    - 支持基于配置文件的功能扩展,包含`自定义公式标识符`、`开启标题分级功能`、`自定义本地模型目录`,详细使用方法请参考[文档](https://opendatalab.github.io/MinerU/zh/usage/quick_usage/#mineru_1)
-  - 新特性:
-    - `pipeline`后端更新 PP-OCRv5 多语种文本识别模型,支持法语、西班牙语、葡萄牙语、俄语、韩语等 37 种语言的文字识别,平均精度涨幅超30%。[详情](https://paddlepaddle.github.io/PaddleOCR/latest/version3.x/algorithm/PP-OCRv5/PP-OCRv5_multi_languages.html)
-    - `pipeline`后端增加对竖排文本的有限支持
+
+- 2025/09/05 2.2.0 发布
+  - 主要更新
+    - 在这个版本我们重点提升了表格的解析精度,通过引入新的[有线表识别模型](https://github.com/RapidAI/TableStructureRec)和全新的混合表格结构解析算法,显著提升了`pipeline`后端的表格识别能力。
+    - 另外我们增加了对跨页表格合并的支持,这一功能同时支持`pipeline`和`vlm`后端,进一步提升了表格解析的完整性和准确性。
+  - 其他更新
+    - `pipeline`后端增加270度旋转的表格解析能力,现已支持0/90/270度三个方向的表格解析
+    - `pipeline`增加对泰文、希腊文的ocr能力支持,并更新了英文ocr模型至最新,英文识别精度提升11%,泰文识别模型精度 82.68%,希腊文识别模型精度 89.28%(by PPOCRv5)
+    - 在输出的`content_list.json`中增加了`bbox`字段(映射至0-1000范围内),方便用户直接获取每个内容块的位置信息
+
    历史日志 + +
    + 2025/08/01 2.1.10 发布 +
      +
    • 修复pipeline后端因block覆盖导致的解析结果与预期不符 #3232
    • +
    +
    + +
    + 2025/07/30 2.1.9 发布 +
      +
    • transformers 4.54.1 版本适配
    • +
    +
    + +
    + 2025/07/28 2.1.8 发布 +
      +
    • sglang 0.4.9.post5 版本适配
    • +
    +
    + +
    + 2025/07/27 2.1.7 发布 +
      +
    • transformers 4.54.0 版本适配
    • +
    +
    + +
    + 2025/07/26 2.1.6 发布 +
      +
    • 修复vlm后端解析部分手写文档时的表格异常问题
    • +
    • 修复文档旋转时可视化框位置漂移问题 #3175
    • +
    +
    + +
    + 2025/07/24 2.1.5 发布 +
      +
    • sglang 0.4.9 版本适配,同步升级dockerfile基础镜像为sglang 0.4.9.post3
    • +
    +
    + +
    + 2025/07/23 2.1.4 发布 +
      +
    • bug修复 +
        +
      • 修复pipeline后端中MFR步骤在某些情况下显存消耗过大的问题 #2771
      • +
      • 修复某些情况下image/tablecaption/footnote匹配不准确的问题 #3129
      • +
      +
    • +
    +
    + +
    + 2025/07/16 2.1.1 发布 +
      +
    • bug修复 +
        +
      • 修复pipeline在某些情况可能发生的文本块内容丢失问题 #3005
      • +
      • 修复sglang-client需要安装torch等不必要的包的问题 #2968
      • +
      • 更新dockerfile以修复linux字体缺失导致的解析文本内容不完整问题 #2915
      • +
      +
    • +
    • 易用性更新 +
        +
      • 更新compose.yaml,便于用户直接启动sglang-servermineru-apimineru-gradio服务
      • +
      • 启用全新的在线文档站点,简化readme,提供更好的文档体验
      • +
      +
    • +
    +
    + +
    + 2025/07/05 2.1.0 发布 +

    这是 MinerU 2 的第一个大版本更新,包含了大量新功能和改进,包含众多性能优化、体验优化和bug修复,具体更新内容如下:

    +
      +
    • 性能优化: +
        +
      • 大幅提升某些特定分辨率(长边2000像素左右)文档的预处理速度
      • +
      • 大幅提升pipeline后端批量处理大量页数较少(<10)文档时的后处理速度
      • +
      • pipeline后端的layout分析速度提升约20%
      • +
      +
    • +
    • 体验优化: +
        +
      • 内置开箱即用的fastapi服务gradio webui,详细使用方法请参考文档
      • +
      • sglang适配0.4.8版本,大幅降低vlm-sglang后端的显存要求,最低可在8G显存(Turing及以后架构)的显卡上运行
      • +
      • 对所有命令增加sglang的参数透传,使得sglang-engine后端可以与sglang-server一致,接收sglang的所有参数
      • +
      • 支持基于配置文件的功能扩展,包含自定义公式标识符开启标题分级功能自定义本地模型目录,详细使用方法请参考文档
      • +
      +
    • +
    • 新特性: +
        +
      • pipeline后端更新 PP-OCRv5 多语种文本识别模型,支持法语、西班牙语、葡萄牙语、俄语、韩语等 37 种语言的文字识别,平均精度涨幅超30%。详情
      • +
      • pipeline后端增加对竖排文本的有限支持
      • +
      +
    • +
    +
    +
    2025/06/20 2.0.6发布
@@ -584,6 +658,7 @@ mineru -p -o
 - [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)
 - [UniMERNet](https://github.com/opendatalab/UniMERNet)
 - [RapidTable](https://github.com/RapidAI/RapidTable)
+- [TableStructureRec](https://github.com/RapidAI/TableStructureRec)
 - [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
 - [PaddleOCR2Pytorch](https://github.com/frotms/PaddleOCR2Pytorch)
 - [layoutreader](https://github.com/ppaanngggg/layoutreader)
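
Note on the 2.2.0 `bbox` entry: the changelog says the new `bbox` field in `content_list.json` is mapped to a 0-1000 range, so a consumer that wants page coordinates has to scale the values back. The sketch below shows one way that could look. It rests on assumptions the changelog does not confirm: that `bbox` holds four values ordered `[x0, y0, x1, y1]`, and that the caller already knows the page width and height. The helper `denormalize_bbox` and the sample block are hypothetical and are not part of MinerU.

```python
from typing import Sequence, Tuple

# The changelog states that bbox values are mapped to a 0-1000 range.
NORMALIZED_MAX = 1000


def denormalize_bbox(
    bbox: Sequence[float], page_width: float, page_height: float
) -> Tuple[float, float, float, float]:
    """Scale a 0-1000 normalized box back to page units.

    Assumption: bbox is ordered [x0, y0, x1, y1]. The changelog does not
    state the order, so check real output before relying on this.
    """
    if len(bbox) != 4:
        raise ValueError(f"expected 4 bbox values, got {len(bbox)}")
    x0, y0, x1, y1 = bbox
    return (
        x0 * page_width / NORMALIZED_MAX,
        y0 * page_height / NORMALIZED_MAX,
        x1 * page_width / NORMALIZED_MAX,
        y1 * page_height / NORMALIZED_MAX,
    )


# Hypothetical content block for illustration only, not real MinerU output.
block = {"type": "table", "bbox": [250, 500, 750, 1000]}
print(denormalize_bbox(block["bbox"], page_width=800, page_height=600))
# prints (200.0, 300.0, 600.0, 600.0)
```

Keeping the scaling in a small pure function makes it easy to swap in a different coordinate order if the actual output turns out to differ from the assumption above.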