Compare commits


76 Commits

Author SHA1 Message Date
Xiaomeng Zhao
d0ed731b9e Merge pull request #2199 from opendatalab/release-1.3.2
Release 1.3.2
2025-04-12 18:58:15 +08:00
Xiaomeng Zhao
d6c5700af2 Merge pull request #2198 from opendatalab/dev
Dev
2025-04-12 18:52:52 +08:00
Xiaomeng Zhao
196da7ea0e Merge pull request #2197 from myhloli/dev
docs(readme): update release notes for English and Chinese README files
2025-04-12 18:52:30 +08:00
myhloli
a69b97c9dd docs(readme): update release notes for English and Chinese README files
- Update version history in both English and Chinese README files
- Add note about model update required for fixing word concatenation issue
- Ensure consistency between English and Chinese versions
2025-04-12 18:51:39 +08:00
Xiaomeng Zhao
b951a6ccd5 Merge pull request #2196 from opendatalab/dev
Dev
2025-04-12 18:48:11 +08:00
Xiaomeng Zhao
15b9146e8d Merge pull request #2195 from myhloli/dev
docs(README): update version history and installation instructions
2025-04-12 18:47:28 +08:00
myhloli
437311f5bd docs(README): update version history and installation instructions
- Update version history in README.md and README_zh-CN.md
- Add details for 1.3.2 release and previous versions
- Update Windows CUDA acceleration installation instructions
- Refactor changelog entries for better readability and organization
2025-04-12 18:44:55 +08:00
Xiaomeng Zhao
983e8e6824 Merge pull request #2194 from myhloli/dev
feat(magic_pdf): add logging for batch image processing
2025-04-12 17:05:14 +08:00
myhloli
afe1b02c3d feat(magic_pdf): add logging for batch image processing
- Add batch processing logs to track the progress of image analysis
- Display the current batch number, total batches, and the number of processed pages
2025-04-12 16:59:06 +08:00
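A minimal sketch of the batch-progress logging this commit describes, using the standard `logging` module (the helper name and batch layout are illustrative assumptions, not the project's actual code):

```python
import logging
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("batch_analyze")

def iter_batches(pages: Sequence[T], batch_size: int) -> Iterable[List[T]]:
    """Split pages into fixed-size batches, logging progress per batch."""
    total_batches = (len(pages) + batch_size - 1) // batch_size
    for i in range(total_batches):
        batch = list(pages[i * batch_size:(i + 1) * batch_size])
        # Log the current batch number, total batches, and pages processed so far
        logger.info("batch %d/%d: %d pages processed",
                    i + 1, total_batches,
                    min((i + 1) * batch_size, len(pages)))
        yield batch
```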
Xiaomeng Zhao
af030021d2 Merge pull request #2193 from myhloli/dev
build(setup): update package versions and constraints
2025-04-12 16:39:59 +08:00
myhloli
15467730cf Merge remote-tracking branch 'origin/dev' into dev 2025-04-12 16:38:49 +08:00
myhloli
1b611a2e55 build(setup): update package versions and constraints
- Update matplotlib version range to >=3.10, <4
- Add upper version limit for ultralytics: <9
- Remove redundant version ranges for full_old_linux
2025-04-12 16:38:33 +08:00
Xiaomeng Zhao
852b841ab1 Merge pull request #2189 from myhloli/dev
refactor(model): optimize batch processing and inference
2025-04-11 19:25:37 +08:00
myhloli
54ce594bf6 refactor(tools): improve code readability and maintainability
- Remove unnecessary line breaks and adjust indentation
- Update function call to use named arguments for better readability
- Modify _do_parse function call to use MakeMode.MM_MD instead of
2025-04-11 11:12:30 +08:00
myhloli
d2fc9dabf4 refactor(model): optimize batch processing and inference
- Update batch processing logic for improved efficiency
- Refactor image analysis and inference methods
- Optimize dataset handling and image retrieval
- Improve error handling and logging in batch processes
2025-04-11 10:59:38 +08:00
myhloli
930bc47fe4 build(dependencies): update torch version requirements
- Remove upper version limit for torch
- This change allows for greater flexibility in installing compatible torch versions
2025-04-11 10:29:37 +08:00
Xiaomeng Zhao
1c7f41dd7c Merge pull request #2178 from myhloli/dev
feat(gui): update language options and default settings
2025-04-10 17:53:11 +08:00
Xiaomeng Zhao
e32704f102 Merge branch 'opendatalab:dev' into dev 2025-04-10 17:52:07 +08:00
Xiaomeng Zhao
a881ee89f6 Merge pull request #2177 from icecraft/feat/iterator_inference
feat: inference with iter style
2025-04-10 17:51:46 +08:00
icecraft
43164533fa feat: inference with iter style 2025-04-10 17:45:19 +08:00
myhloli
786da939e5 feat(gui): update language options and default settings
- Remove unused 'layoutlmv3' model option
- Update language options to include new 'add_lang' list
- Set default language to 'ch' (Chinese)
- Comment out old 'all_lang' definition for future reference
2025-04-10 15:39:51 +08:00
Xiaomeng Zhao
ce212da14b Merge pull request #2174 from myhloli/dev
refactor(ocr): comment out det_count update and update OCR models
2025-04-09 23:59:35 +08:00
myhloli
f8323ae07c refactor(ocr): comment out det_count update and update OCR models
- Comment out the line that updates det_count in batch_analyze.py
- Add a new OCR model configuration for Chinese (ch_lite) in models_config.yml
- Update the Chinese OCR model configuration to use a different recognition model
2025-04-09 23:56:47 +08:00
Xiaomeng Zhao
e5b74ae724 Merge pull request #2173 from myhloli/dev
fix(dataset): correct variable for language detection
2025-04-09 22:34:12 +08:00
myhloli
814bd4ea50 fix(dataset): correct variable for language detection
- Change `bits` to `self._data_bits` for language detection
- This fixes the TypeError when opening PDF files
2025-04-09 22:32:31 +08:00
Xiaomeng Zhao
1db6f89dcd Merge pull request #2172 from myhloli/dev
fix(ocr): handle NaN values in recognition scores, feat(table): add orientation detection and rotation for portrait tables
2025-04-09 19:06:20 +08:00
myhloli
4afdba3626 perf(table): optimize aspect ratio calculation for text boxes
- Simplify aspect ratio calculation using direct coordinate subtraction
- Remove unnecessary list append operation
- Improve code readability and performance in table rotation detection
2025-04-09 19:05:05 +08:00
myhloli
ac893f325a feat(table): add orientation detection and rotation for portrait tables
- Implement table orientation detection to identify if a table is in portrait mode
- Add rotation logic to turn portrait tables 90 degrees clockwise before OCR
- Update OCR processing to work with potentially rotated images
- Improve text box analysis to determine if a table is rotated
2025-04-09 18:47:53 +08:00
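The portrait-table heuristic described in the two table commits above could be sketched as follows: count text boxes that are taller than wide (aspect ratio via direct coordinate subtraction), and rotate the table image 90° clockwise before OCR when they dominate. The 1.2 threshold and function names are assumptions for illustration:

```python
import numpy as np

def is_portrait_table(text_boxes) -> bool:
    """Heuristic: if most detected text boxes are taller than wide,
    the table is likely rotated into portrait orientation."""
    vertical = 0
    for x0, y0, x1, y1 in text_boxes:
        w, h = x1 - x0, y1 - y0  # aspect ratio via direct subtraction
        if h > w * 1.2:          # threshold is an assumed value
            vertical += 1
    return vertical > len(text_boxes) / 2

def rotate_if_portrait(img: np.ndarray, text_boxes) -> np.ndarray:
    """Rotate a portrait table image 90 degrees clockwise before OCR."""
    if is_portrait_table(text_boxes):
        return np.rot90(img, k=-1)  # k=-1 rotates clockwise
    return img
```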
myhloli
c97959e4f5 fix(ocr): handle NaN values in recognition scores
- Update predict_rec.py to check for NaN values in recognition results
- Replace NaN scores with 0.0 to ensure stability and consistency
2025-04-09 18:00:30 +08:00
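The NaN guard this fix describes amounts to checking each recognition score and substituting 0.0. A small sketch (the function name and result shape are assumptions):

```python
import math

def sanitize_scores(rec_results):
    """Replace NaN confidence scores with 0.0 so downstream
    filtering and averaging stay stable."""
    cleaned = []
    for text, score in rec_results:
        if math.isnan(score):
            score = 0.0
        cleaned.append((text, score))
    return cleaned
```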
Xiaomeng Zhao
8aa61b0e9f Merge pull request #2166 from icecraft/fix/doc_analyze
fix: support page range
2025-04-09 17:18:42 +08:00
Xiaomeng Zhao
8e8103a8ce Merge pull request #2170 from myhloli/dev
feat(model): improve table recognition by merging and filtering tables
2025-04-09 17:17:21 +08:00
myhloli
df7ae4042d feat(model): improve table recognition by merging and filtering tables
- Add functions to calculate IoU, check if tables are inside each other, and merge tables
- Implement table merging for high IoU tables
- Add filtering to remove nested tables that don't overlap but cover a large area
- Update table_res_list and layout_res to reflect these changes
2025-04-09 17:14:33 +08:00
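The IoU computation and box merging named in this commit follow standard bounding-box geometry; a self-contained sketch (boxes as `[x0, y0, x1, y1]`, function names assumed):

```python
def bbox_iou(a, b) -> float:
    """Intersection-over-union of two [x0, y0, x1, y1] boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def merge_boxes(a, b):
    """Union bounding box covering two overlapping table regions."""
    return [min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3])]
```

Two detected tables whose IoU exceeds a chosen threshold would be replaced by `merge_boxes(a, b)`; nested tables (one box fully inside another) can be detected by checking whether the intersection equals the smaller box's area.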
icecraft
29c42a1add fix: support page range 2025-04-09 15:04:07 +08:00
Xiaomeng Zhao
b60166a541 Merge pull request #2157 from opendatalab/release-1.3.1
Release 1.3.1
2025-04-08 18:16:33 +08:00
Xiaomeng Zhao
ccf2ea04cb Merge pull request #2156 from opendatalab/dev
Dev
2025-04-08 18:16:07 +08:00
Xiaomeng Zhao
564991512c Merge branch 'release-1.3.1' into dev 2025-04-08 18:16:01 +08:00
Xiaomeng Zhao
a1595f1912 Merge pull request #2155 from myhloli/dev
docs: update version number in README files
2025-04-08 18:15:17 +08:00
myhloli
bc0ff1acb0 docs: update version number in README files
- Correct version number from 1.3.2 to 1.3.1 in both README.md and README_zh-CN.md
- Update changelog entries for the latest release
2025-04-08 18:14:29 +08:00
Xiaomeng Zhao
cb9c2e7616 Merge pull request #2154 from opendatalab/release-1.3.2
Release 1.3.2
2025-04-08 18:11:26 +08:00
Xiaomeng Zhao
b3ac3ac148 Merge branch 'master' into release-1.3.2 2025-04-08 18:11:16 +08:00
Xiaomeng Zhao
2c7094ff3d Merge pull request #2153 from opendatalab/dev
Dev
2025-04-08 18:10:16 +08:00
Xiaomeng Zhao
0ed231cb8b Merge pull request #2152 from myhloli/dev
docs(README): update version number and changelog in README files
2025-04-08 18:09:53 +08:00
myhloli
bd4728aaeb docs(README): update version number and changelog in README files
- Update version number from 1.3.1 to 1.3.2
2025-04-08 18:09:05 +08:00
Xiaomeng Zhao
2813e59905 Merge pull request #2151 from myhloli/dev
refactor(ocr): improve OCR score precision to three decimal places
2025-04-08 18:06:31 +08:00
myhloli
ea730ae2e9 refactor(ocr): improve OCR score precision to three decimal places
- Update OCR score formatting in batch_analyze.py and pdf_parse_union_core_v2.py
- Change score rounding method to preserve three decimal places
- Enhance accuracy representation without significantly altering the score value
2025-04-08 18:02:03 +08:00
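The three-decimal formatting is a one-liner; a sketch with an illustrative helper name:

```python
def round_score(score: float) -> float:
    """Keep three decimal places of an OCR confidence score.

    round() preserves precision without materially altering the value.
    """
    return round(float(score), 3)
```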
myhloli
0ab29cdbee docs(README): update version number in release notes
- Update version from 1.3.1 to 1.3.2 in both English and Chinese README files
- Keep other content unchanged
2025-04-08 17:37:39 +08:00
Xiaomeng Zhao
44665d3966 Update python-package.yml 2025-04-08 17:35:39 +08:00
myhloli
79feb926b7 Update version.py with new version 2025-04-08 09:23:09 +00:00
Xiaomeng Zhao
a2cde43b57 Merge pull request #2146 from opendatalab/release-1.3.1
Release 1.3.1
2025-04-08 17:20:21 +08:00
Xiaomeng Zhao
b8856ca96a Merge pull request #2148 from opendatalab/dev
Dev
2025-04-08 17:03:27 +08:00
Xiaomeng Zhao
098cf1df60 Merge pull request #2147 from myhloli/dev
docs: update badges and project URLs- Update PyPI version badge to us…
2025-04-08 17:02:43 +08:00
myhloli
90f0e7370a docs: update badges and project URLs
- Update PyPI version badge to use shields.io
- Add project URLs in setup.py for better discoverability
- Make consistent changes across README.md and README_zh-CN.md
2025-04-08 17:01:41 +08:00
Xiaomeng Zhao
714504864e Update python-package.yml 2025-04-08 16:49:56 +08:00
Xiaomeng Zhao
87fd4c2806 Update bug_report.yml 2025-04-08 16:49:02 +08:00
Xiaomeng Zhao
3251c73250 Merge pull request #2145 from opendatalab/dev
fix(table): add model path for slanet-plus to resolve RapidTableError
2025-04-08 16:47:45 +08:00
Xiaomeng Zhao
697da27cf7 Merge pull request #2144 from myhloli/dev
fix(table): add model path for slanet-plus to resolve RapidTableError
2025-04-08 16:47:09 +08:00
myhloli
e327e9bad5 fix(table): add model path for slanet-plus to resolve RapidTableError
- Import os and pathlib modules to handle file paths
- Define the path to the slanet-plus model
- Update RapidTableInput initialization to include the model path
2025-04-08 16:46:01 +08:00
Xiaomeng Zhao
99d5c022c4 Merge pull request #2142 from myhloli/dev
update 1.3.1
2025-04-08 16:13:28 +08:00
myhloli
7b61b418a3 ci: update Python version support and installation process
- Add support for Python 3.11, 3.12, and 3.13
- Replace requirements.txt based installation with editable install
2025-04-08 16:10:07 +08:00
myhloli
4fd8d626c4 docs(install): update Python version requirements and simplify torch installation
- Update Python version requirements to >=3.10
- Simplify torch installation command
- Remove numpy version restriction
- Update CUDA compatibility information
- Adjust environment creation commands across multiple documentation files
2025-04-08 16:06:02 +08:00
myhloli
cf6fa12767 build(setup): remove rapid_table dependency
- Remove rapid_table from install_requires in setup.py
2025-04-08 14:24:15 +08:00
myhloli
de4bc5a32d ci: update issue template options for Python version and dependency version
- Add "3.13" option for Python version
- Remove "3.9" option for Python version
- Update dependency version options:
  - Remove "0.8.x", "0.9.x", "0.10.x"
  - Add "1.1.x", "1.2.x", "1.3.x"
2025-04-08 14:22:06 +08:00
myhloli
9b5d2796f8 build(deps): update dependencies and add support for old Linux systems
- Update transformers to exclude version 4.51.0 due to compatibility issues
- Expand rapid_table version range to >=1.0.5,<2.0.0
- Add separate 'full_old_linux' extras_require for better support of older Linux systems
- Update matplotlib version requirements for different platforms
- Remove platform-specific paddlepaddle versions
2025-04-08 14:18:49 +08:00
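A hypothetical `extras_require` layout matching the constraints described in these build commits; only the pins named in the commit log are taken from the source, the rest of the structure is assumed:

```python
# Sketch of a setup.py extras split between current and legacy Linux
# environments; lower bounds other than those in the commit log are assumed.
extras_require = {
    "full": [
        "matplotlib>=3.10,<4",
        "transformers!=4.51.0",   # 4.51.0 excluded for compatibility
        "rapid_table>=1.0.5,<2.0.0",
    ],
    "full_old_linux": [
        "rapid_table==1.0.3",     # newer versions pull in onnxruntime,
                                  # unsupported on pre-2019 Linux systems
    ],
}
```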
myhloli
0f0591cf8f build(old_linux): add rapid_table dependency for PDF conversion
- Add rapid_table==1.0.3 to old_linux specific dependencies
- This version is compatible with Linux systems from 2019 and earlier
- Newer versions of rapid_table depend on onnxruntime, which is not supported on older Linux systems
2025-04-08 11:58:38 +08:00
Xiaomeng Zhao
cf6ffc6b1e Merge pull request #2128 from myhloli/dev
fix(model): improve VRAM detection and handling
2025-04-07 18:18:09 +08:00
myhloli
d32a63cada fix(model): improve VRAM detection and handling
- Refactor VRAM detection logic for better readability and efficiency
- Add fallback mechanism for unknown VRAM sizes
- Improve device checking in get_vram function
2025-04-07 18:15:37 +08:00
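The VRAM-detection fallback could look roughly like this: query torch for CUDA devices and return `None` for anything unknown, so callers can fall back to a conservative default. The function name and GB rounding are assumptions:

```python
def get_vram_gb(device: str = "cuda") -> "int | None":
    """Return total VRAM in GB for a CUDA device, or None when unknown.

    None signals the caller to use a conservative default batch size.
    """
    if not device.startswith("cuda"):
        return None  # cpu / mps: no VRAM figure to report
    try:
        import torch  # local import keeps CPU-only environments working
        if not torch.cuda.is_available():
            return None
        props = torch.cuda.get_device_properties(device)
        return round(props.total_memory / (1024 ** 3))
    except Exception:
        return None  # fallback for any query failure
```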
Xiaomeng Zhao
dfb3cbfb17 Merge pull request #2126 from icecraft/fix/image_ds_add_lang
fix: image dataset add lang field
2025-04-07 16:57:49 +08:00
icecraft
e36a083dc3 fix: image dataset add lang field 2025-04-07 15:40:06 +08:00
Xiaomeng Zhao
980f5c8cd7 Merge pull request #2125 from opendatalab/dev
docs: update torchvision version in CUDA installation guide
2025-04-07 15:26:13 +08:00
Xiaomeng Zhao
f442adfc95 Merge pull request #2124 from myhloli/dev
docs: update torchvision version in CUDA installation guide
2025-04-07 14:54:30 +08:00
myhloli
d4cda0a8c2 docs: update torchvision version in CUDA installation guide
- Update torchvision version from 0.21.1 to 0.21.0 in Windows CUDA acceleration guides
- Update both English and Chinese versions of the documentation
2025-04-07 14:53:25 +08:00
Xiaomeng Zhao
60fdf851a4 Merge pull request #2115 from myhloli/dev
build: remove accelerate dependency
2025-04-06 22:25:01 +08:00
myhloli
a10b9aec74 build: remove accelerate dependency
- Remove accelerate package from requirements.txt
- This change ensures only necessary external dependencies are introduced
2025-04-06 22:24:23 +08:00
Xiaomeng Zhao
e3261b0eea Merge pull request #2114 from myhloli/dev
build(deps): add accelerate package and update requirements https://github.com/opendatalab/MinerU/issues/2112
2025-04-06 22:17:20 +08:00
myhloli
09632dddc1 build(deps): add accelerate package and update requirements
- Add accelerate package to support model training acceleration
- Update requirements.txt to include new dependency
2025-04-06 22:16:01 +08:00
Xiaomeng Zhao
c5329a0722 Merge pull request #2093 from opendatalab/master
master -> dev
2025-04-03 23:33:35 +08:00
30 changed files with 817 additions and 349 deletions


@@ -64,10 +64,10 @@ body:
# Need quotes around `3.10` otherwise it is treated as a number and shows as `3.1`.
options:
-
- "3.13"
- "3.12"
- "3.11"
- "3.10"
- "3.9"
validations:
required: true
@@ -78,10 +78,10 @@ body:
#multiple: false
options:
-
- "0.8.x"
- "0.9.x"
- "0.10.x"
- "1.0.x"
- "1.1.x"
- "1.2.x"
- "1.3.x"
validations:
required: true


@@ -54,13 +54,13 @@ jobs:
run: |
git push origin HEAD:master
build:
check-install:
needs: [ update-version ]
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
python-version: ["3.10"]
python-version: ["3.10", "3.11", "3.12", "3.13"]
steps:
- name: Checkout code
@@ -79,10 +79,26 @@ jobs:
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
- name: Install magic-pdf
run: |
python -m pip install --upgrade pip
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
pip install -e .[full]
build:
needs: [ check-install ]
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
python-version: [ "3.10"]
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
ref: master
fetch-depth: 0
- name: Install wheel
run: |

README.md

@@ -10,7 +10,8 @@
[![forks](https://img.shields.io/github/forks/opendatalab/MinerU.svg)](https://github.com/opendatalab/MinerU)
[![open issues](https://img.shields.io/github/issues-raw/opendatalab/MinerU)](https://github.com/opendatalab/MinerU/issues)
[![issue resolution](https://img.shields.io/github/issues-closed-raw/opendatalab/MinerU)](https://github.com/opendatalab/MinerU/issues)
[![PyPI version](https://badge.fury.io/py/magic-pdf.svg)](https://badge.fury.io/py/magic-pdf)
[![PyPI version](https://img.shields.io/pypi/v/magic-pdf)](https://pypi.org/project/magic-pdf/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/magic-pdf)](https://pypi.org/project/magic-pdf/)
[![Downloads](https://static.pepy.tech/badge/magic-pdf)](https://pepy.tech/project/magic-pdf)
[![Downloads](https://static.pepy.tech/badge/magic-pdf/month)](https://pepy.tech/project/magic-pdf)
@@ -47,11 +48,20 @@ Easier to use: Just grab MinerU Desktop. No coding, no login, just a simple inte
</div>
# Changelog
- 2025/04/03 Release of 1.3.0, in this version we made many optimizations and improvements:
- 2025/04/12 1.3.2 released
- Fixed the issue of incompatible dependency package versions when installing in Python 3.13 environment on Windows systems.
- Optimized memory usage during batch inference.
- Improved the parsing effect of tables rotated by 90 degrees.
- Enhanced the parsing accuracy for large tables in financial report samples.
- Fixed the occasional word concatenation issue in English text areas when OCR language is not specified.(The model needs to be updated)
- 2025/04/08 1.3.1 released, fixed some compatibility issues
- Supported Python 3.13
- Made the final adaptation for some outdated Linux systems (e.g., CentOS 7), and no further support will be guaranteed for subsequent versions. [Installation Instructions](https://github.com/opendatalab/MinerU/issues/1004)
- 2025/04/03 1.3.0 released, in this version we made many optimizations and improvements:
- Installation and compatibility optimization
- By removing the use of `layoutlmv3` in layout, resolved compatibility issues caused by `detectron2`.
- Torch version compatibility extended to 2.2~2.6 (excluding 2.5).
- CUDA compatibility supports 11.8/12.4/12.6 (CUDA version determined by torch), resolving compatibility issues for some users with 50-series and H-series GPUs.
- CUDA compatibility supports 11.8/12.4/12.6/12.8 (CUDA version determined by torch), resolving compatibility issues for some users with 50-series and H-series GPUs.
- Python compatible versions expanded to 3.10~3.12, solving the problem of automatic downgrade to 0.6.1 during installation in non-3.10 environments.
- Offline deployment process optimized; no internet connection required after successful deployment to download any model files.
- Performance optimization
@@ -64,59 +74,154 @@ Easier to use: Just grab MinerU Desktop. No coding, no login, just a simple inte
- Usability Optimization
- By using `paddleocr2torch`, completely replaced the use of the `paddle` framework and `paddleocr` in the project, resolving conflicts between `paddle` and `torch`, as well as thread safety issues caused by the `paddle` framework.
- Added a real-time progress bar during the parsing process to accurately track progress, making the wait less painful.
- 2025/03/03 1.2.1 released, fixed several bugs:
- Fixed the impact on punctuation marks during full-width to half-width conversion of letters and numbers
- Fixed caption matching inaccuracies in certain scenarios
- Fixed formula span loss issues in certain scenarios
- 2025/02/24 1.2.0 released. This version includes several fixes and improvements to enhance parsing efficiency and accuracy:
- Performance Optimization
- Increased classification speed for PDF documents in auto mode.
- Parsing Optimization
- Improved parsing logic for documents containing watermarks, significantly enhancing the parsing results for such documents.
- Enhanced the matching logic for multiple images/tables and captions within a single page, improving the accuracy of image-text matching in complex layouts.
- Bug Fixes
- Fixed an issue where image/table spans were incorrectly filled into text blocks under certain conditions.
- Resolved an issue where title blocks were empty in some cases.
- 2025/01/22 1.1.0 released. In this version we have focused on improving parsing accuracy and efficiency:
- Model capability upgrade (requires re-executing the [model download process](docs/how_to_download_models_en.md) to obtain incremental updates of model files)
- The layout recognition model has been upgraded to the latest `doclayout_yolo(2501)` model, improving layout recognition accuracy.
- The formula parsing model has been upgraded to the latest `unimernet(2501)` model, improving formula recognition accuracy.
- Performance optimization
- On devices that meet certain configuration requirements (16GB+ VRAM), by optimizing resource usage and restructuring the processing pipeline, overall parsing speed has been increased by more than 50%.
- Parsing effect optimization
- Added a new heading classification feature (testing version, enabled by default) to the online demo([mineru.net](https://mineru.net/OpenSourceTools/Extractor)/[huggingface](https://huggingface.co/spaces/opendatalab/MinerU)/[modelscope](https://www.modelscope.cn/studios/OpenDataLab/MinerU)), which supports hierarchical classification of headings, thereby enhancing document structuring.
- 2025/01/10 1.0.1 released. This is our first official release, where we have introduced a completely new API interface and enhanced compatibility through extensive refactoring, as well as a brand new automatic language identification feature:
- New API Interface
- For the data-side API, we have introduced the Dataset class, designed to provide a robust and flexible data processing framework. This framework currently supports a variety of document formats, including images (.jpg and .png), PDFs, Word documents (.doc and .docx), and PowerPoint presentations (.ppt and .pptx). It ensures effective support for data processing tasks ranging from simple to complex.
- For the user-side API, we have meticulously designed the MinerU processing workflow as a series of composable Stages. Each Stage represents a specific processing step, allowing users to define new Stages according to their needs and creatively combine these stages to customize their data processing workflows.
- Enhanced Compatibility
- By optimizing the dependency environment and configuration items, we ensure stable and efficient operation on ARM architecture Linux systems.
- We have deeply integrated with Huawei Ascend NPU acceleration, providing autonomous and controllable high-performance computing capabilities. This supports the localization and development of AI application platforms in China. [Ascend NPU Acceleration](docs/README_Ascend_NPU_Acceleration_zh_CN.md)
- Automatic Language Identification
- By introducing a new language recognition model, setting the `lang` configuration to `auto` during document parsing will automatically select the appropriate OCR language model, improving the accuracy of scanned document parsing.
- 2024/11/22 0.10.0 released. Introducing hybrid OCR text extraction capabilities,
- Significantly improved parsing performance in complex text distribution scenarios such as dense formulas, irregular span regions, and text represented by images.
- Combines the dual advantages of accurate content extraction and faster speed in text mode, and more precise span/line region recognition in OCR mode.
- 2024/11/15 0.9.3 released. Integrated [RapidTable](https://github.com/RapidAI/RapidTable) for table recognition, improving single-table parsing speed by more than 10 times, with higher accuracy and lower GPU memory usage.
- 2024/11/06 0.9.2 released. Integrated the [StructTable-InternVL2-1B](https://huggingface.co/U4R/StructTable-InternVL2-1B) model for table recognition functionality.
- 2024/10/31 0.9.0 released. This is a major new version with extensive code refactoring, addressing numerous issues, improving performance, reducing hardware requirements, and enhancing usability:
- Refactored the sorting module code to use [layoutreader](https://github.com/ppaanngggg/layoutreader) for reading order sorting, ensuring high accuracy in various layouts.
- Refactored the paragraph concatenation module to achieve good results in cross-column, cross-page, cross-figure, and cross-table scenarios.
- Refactored the list and table of contents recognition functions, significantly improving the accuracy of list blocks and table of contents blocks, as well as the parsing of corresponding text paragraphs.
- Refactored the matching logic for figures, tables, and descriptive text, greatly enhancing the accuracy of matching captions and footnotes to figures and tables, and reducing the loss rate of descriptive text to near zero.
- Added multi-language support for OCR, supporting detection and recognition of 84 languages.For the list of supported languages, see [OCR Language Support List](https://paddlepaddle.github.io/PaddleOCR/latest/en/ppocr/blog/multi_languages.html#5-support-languages-and-abbreviations).
- Added memory recycling logic and other memory optimization measures, significantly reducing memory usage. The memory requirement for enabling all acceleration features except table acceleration (layout/formula/OCR) has been reduced from 16GB to 8GB, and the memory requirement for enabling all acceleration features has been reduced from 24GB to 10GB.
- Optimized configuration file feature switches, adding an independent formula detection switch to significantly improve speed and parsing results when formula detection is not needed.
- Integrated [PDF-Extract-Kit 1.0](https://github.com/opendatalab/PDF-Extract-Kit):
- Added the self-developed `doclayout_yolo` model, which speeds up processing by more than 10 times compared to the original solution while maintaining similar parsing effects, and can be freely switched with `layoutlmv3` via the configuration file.
- Upgraded formula parsing to `unimernet 0.2.1`, improving formula parsing accuracy while significantly reducing memory usage.
- Due to the repository change for `PDF-Extract-Kit 1.0`, you need to re-download the model. Please refer to [How to Download Models](docs/how_to_download_models_en.md) for detailed steps.
- 2024/09/27 Version 0.8.1 released, Fixed some bugs, and providing a [localized deployment version](projects/web_demo/README.md) of the [online demo](https://opendatalab.com/OpenSourceTools/Extractor/PDF/) and the [front-end interface](projects/web/README.md).
- 2024/09/09: Version 0.8.0 released, supporting fast deployment with Dockerfile, and launching demos on Huggingface and Modelscope.
- 2024/08/30: Version 0.7.1 released, add paddle tablemaster table recognition option
- 2024/08/09: Version 0.7.0b1 released, simplified installation process, added table recognition functionality
- 2024/08/01: Version 0.6.2b1 released, optimized dependency conflict issues and installation documentation
- 2024/07/05: Initial open-source release
<details>
<summary>2025/03/03 1.2.1 released</summary>
<ul>
<li>Fixed the impact on punctuation marks during full-width to half-width conversion of letters and numbers</li>
<li>Fixed caption matching inaccuracies in certain scenarios</li>
<li>Fixed formula span loss issues in certain scenarios</li>
</ul>
</details>
<details>
<summary>2025/02/24 1.2.0 released</summary>
<p>This version includes several fixes and improvements to enhance parsing efficiency and accuracy:</p>
<ul>
<li><strong>Performance Optimization</strong>
<ul>
<li>Increased classification speed for PDF documents in auto mode.</li>
</ul>
</li>
<li><strong>Parsing Optimization</strong>
<ul>
<li>Improved parsing logic for documents containing watermarks, significantly enhancing the parsing results for such documents.</li>
<li>Enhanced the matching logic for multiple images/tables and captions within a single page, improving the accuracy of image-text matching in complex layouts.</li>
</ul>
</li>
<li><strong>Bug Fixes</strong>
<ul>
<li>Fixed an issue where image/table spans were incorrectly filled into text blocks under certain conditions.</li>
<li>Resolved an issue where title blocks were empty in some cases.</li>
</ul>
</li>
</ul>
</details>
<details>
<summary>2025/01/22 1.1.0 released</summary>
<p>In this version we have focused on improving parsing accuracy and efficiency:</p>
<ul>
<li><strong>Model capability upgrade</strong> (requires re-executing the <a href="https://github.com/opendatalab/MinerU/blob/master/docs/how_to_download_models_en.md">model download process</a> to obtain incremental updates of model files)
<ul>
<li>The layout recognition model has been upgraded to the latest <code>doclayout_yolo(2501)</code> model, improving layout recognition accuracy.</li>
<li>The formula parsing model has been upgraded to the latest <code>unimernet(2501)</code> model, improving formula recognition accuracy.</li>
</ul>
</li>
<li><strong>Performance optimization</strong>
<ul>
<li>On devices that meet certain configuration requirements (16GB+ VRAM), by optimizing resource usage and restructuring the processing pipeline, overall parsing speed has been increased by more than 50%.</li>
</ul>
</li>
<li><strong>Parsing effect optimization</strong>
<ul>
<li>Added a new heading classification feature (testing version, enabled by default) to the online demo (<a href="https://mineru.net/OpenSourceTools/Extractor">mineru.net</a>/<a href="https://huggingface.co/spaces/opendatalab/MinerU">huggingface</a>/<a href="https://www.modelscope.cn/studios/OpenDataLab/MinerU">modelscope</a>), which supports hierarchical classification of headings, thereby enhancing document structuring.</li>
</ul>
</li>
</ul>
</details>
<details>
<summary>2025/01/10 1.0.1 released</summary>
<p>This is our first official release, where we have introduced a completely new API interface and enhanced compatibility through extensive refactoring, as well as a brand new automatic language identification feature:</p>
<ul>
<li><strong>New API Interface</strong>
<ul>
<li>For the data-side API, we have introduced the Dataset class, designed to provide a robust and flexible data processing framework. This framework currently supports a variety of document formats, including images (.jpg and .png), PDFs, Word documents (.doc and .docx), and PowerPoint presentations (.ppt and .pptx). It ensures effective support for data processing tasks ranging from simple to complex.</li>
<li>For the user-side API, we have meticulously designed the MinerU processing workflow as a series of composable Stages. Each Stage represents a specific processing step, allowing users to define new Stages according to their needs and creatively combine these stages to customize their data processing workflows.</li>
</ul>
</li>
<li><strong>Enhanced Compatibility</strong>
<ul>
<li>By optimizing the dependency environment and configuration items, we ensure stable and efficient operation on ARM architecture Linux systems.</li>
<li>We have deeply integrated with Huawei Ascend NPU acceleration, providing autonomous and controllable high-performance computing capabilities. This supports the localization and development of AI application platforms in China. <a href="https://github.com/opendatalab/MinerU/blob/master/docs/README_Ascend_NPU_Acceleration_zh_CN.md">Ascend NPU Acceleration</a></li>
</ul>
</li>
<li><strong>Automatic Language Identification</strong>
<ul>
<li>By introducing a new language recognition model, setting the <code>lang</code> configuration to <code>auto</code> during document parsing will automatically select the appropriate OCR language model, improving the accuracy of scanned document parsing.</li>
</ul>
</li>
</ul>
</details>
<details>
<summary>2024/11/22 0.10.0 released</summary>
<p>Introducing hybrid OCR text extraction capabilities:</p>
<ul>
<li>Significantly improved parsing performance in complex text distribution scenarios such as dense formulas, irregular span regions, and text represented by images.</li>
<li>Combines the dual advantages of accurate content extraction and faster speed in text mode, and more precise span/line region recognition in OCR mode.</li>
</ul>
</details>
<details>
<summary>2024/11/15 0.9.3 released</summary>
<p>Integrated <a href="https://github.com/RapidAI/RapidTable">RapidTable</a> for table recognition, improving single-table parsing speed by more than 10 times, with higher accuracy and lower GPU memory usage.</p>
</details>
<details>
<summary>2024/11/06 0.9.2 released</summary>
<p>Integrated the <a href="https://huggingface.co/U4R/StructTable-InternVL2-1B">StructTable-InternVL2-1B</a> model for table recognition functionality.</p>
</details>
<details>
<summary>2024/10/31 0.9.0 released</summary>
<p>This is a major new version with extensive code refactoring, addressing numerous issues, improving performance, reducing hardware requirements, and enhancing usability:</p>
<ul>
<li>Refactored the sorting module code to use <a href="https://github.com/ppaanngggg/layoutreader">layoutreader</a> for reading order sorting, ensuring high accuracy in various layouts.</li>
<li>Refactored the paragraph concatenation module to achieve good results in cross-column, cross-page, cross-figure, and cross-table scenarios.</li>
<li>Refactored the list and table of contents recognition functions, significantly improving the accuracy of list blocks and table of contents blocks, as well as the parsing of corresponding text paragraphs.</li>
<li>Refactored the matching logic for figures, tables, and descriptive text, greatly enhancing the accuracy of matching captions and footnotes to figures and tables, and reducing the loss rate of descriptive text to near zero.</li>
<li>Added multi-language support for OCR, supporting detection and recognition of 84 languages. For the list of supported languages, see <a href="https://paddlepaddle.github.io/PaddleOCR/latest/en/ppocr/blog/multi_languages.html#5-support-languages-and-abbreviations">OCR Language Support List</a>.</li>
<li>Added memory recycling logic and other memory optimization measures, significantly reducing memory usage. The memory requirement for enabling all acceleration features except table acceleration (layout/formula/OCR) has been reduced from 16GB to 8GB, and the memory requirement for enabling all acceleration features has been reduced from 24GB to 10GB.</li>
<li>Optimized configuration file feature switches, adding an independent formula detection switch to significantly improve speed and parsing results when formula detection is not needed.</li>
<li>Integrated <a href="https://github.com/opendatalab/PDF-Extract-Kit">PDF-Extract-Kit 1.0</a>:
<ul>
<li>Added the self-developed <code>doclayout_yolo</code> model, which speeds up processing by more than 10 times compared to the original solution while maintaining similar parsing effects, and can be freely switched with <code>layoutlmv3</code> via the configuration file.</li>
<li>Upgraded formula parsing to <code>unimernet 0.2.1</code>, improving formula parsing accuracy while significantly reducing memory usage.</li>
<li>Due to the repository change for <code>PDF-Extract-Kit 1.0</code>, you need to re-download the model. Please refer to <a href="https://github.com/opendatalab/MinerU/blob/master/docs/how_to_download_models_en.md">How to Download Models</a> for detailed steps.</li>
</ul>
</li>
</ul>
</details>
<details>
<summary>2024/09/27 Version 0.8.1 released</summary>
<p>Fixed some bugs, and provided a <a href="https://github.com/opendatalab/MinerU/blob/master/projects/web_demo/README.md">localized deployment version</a> of the <a href="https://opendatalab.com/OpenSourceTools/Extractor/PDF/">online demo</a> and the <a href="https://github.com/opendatalab/MinerU/blob/master/projects/web/README.md">front-end interface</a>.</p>
</details>
<details>
<summary>2024/09/09 Version 0.8.0 released</summary>
<p>Added support for fast deployment with Dockerfile, and launched demos on Hugging Face and ModelScope.</p>
</details>
<details>
<summary>2024/08/30 Version 0.7.1 released</summary>
<p>Added the paddle TableMaster table recognition option.</p>
</details>
<details>
<summary>2024/08/09 Version 0.7.0b1 released</summary>
<p>Simplified the installation process and added table recognition functionality.</p>
</details>
<details>
<summary>2024/08/01 Version 0.6.2b1 released</summary>
<p>Resolved dependency conflicts and improved the installation documentation.</p>
</details>
<details>
<summary>2024/07/05 Initial open-source release</summary>
</details>
<!-- TABLE OF CONTENT -->
@@ -232,7 +337,7 @@ There are three different ways to experience MinerU:
</tr>
<tr>
<td colspan="3">Python Version</td>
<td colspan="3">3.10~3.12</td>
<td colspan="3">>=3.10</td>
</tr>
<tr>
<td colspan="3">Nvidia Driver Version</td>
@@ -242,8 +347,8 @@ There are three different ways to experience MinerU:
</tr>
<tr>
<td colspan="3">CUDA Environment</td>
<td>11.8/12.4/12.6</td>
<td>11.8/12.4/12.6</td>
<td>11.8/12.4/12.6/12.8</td>
<td>11.8/12.4/12.6/12.8</td>
<td>None</td>
</tr>
<tr>
@@ -274,7 +379,7 @@ Synced with dev branch updates:
#### 1. Install magic-pdf
```bash
conda create -n mineru 'python<3.13' -y
conda create -n mineru 'python>=3.10' -y
conda activate mineru
pip install -U "magic-pdf[full]"
```


@@ -10,7 +10,8 @@
[![forks](https://img.shields.io/github/forks/opendatalab/MinerU.svg)](https://github.com/opendatalab/MinerU)
[![open issues](https://img.shields.io/github/issues-raw/opendatalab/MinerU)](https://github.com/opendatalab/MinerU/issues)
[![issue resolution](https://img.shields.io/github/issues-closed-raw/opendatalab/MinerU)](https://github.com/opendatalab/MinerU/issues)
[![PyPI version](https://badge.fury.io/py/magic-pdf.svg)](https://badge.fury.io/py/magic-pdf)
[![PyPI version](https://img.shields.io/pypi/v/magic-pdf)](https://pypi.org/project/magic-pdf/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/magic-pdf)](https://pypi.org/project/magic-pdf/)
[![Downloads](https://static.pepy.tech/badge/magic-pdf)](https://pepy.tech/project/magic-pdf)
[![Downloads](https://static.pepy.tech/badge/magic-pdf/month)](https://pepy.tech/project/magic-pdf)
@@ -46,11 +47,20 @@
</div>
# Changelog
- 2025/04/12 1.3.2 released
  - Fixed incompatible dependency package versions when installing in Python 3.13 environments on Windows
  - Optimized memory usage during batch inference
  - Improved parsing of tables rotated by 90 degrees
  - Improved parsing of oversized tables in financial report samples
  - Fixed the occasional word concatenation issue in English text areas when the OCR language is not specified (requires a model update)
- 2025/04/08 1.3.1 released, fixed some compatibility issues
  - Supported Python 3.13
  - Made a final adaptation for some outdated Linux systems (e.g. CentOS 7); subsequent versions are no longer guaranteed to support them. [Installation instructions](https://github.com/opendatalab/MinerU/issues/1004)
- 2025/04/03 1.3.0 released. In this version we made many optimizations and improvements:
  - Installation and compatibility optimization
    - Resolved compatibility issues caused by `detectron2` by removing the use of `layoutlmv3` in layout
    - Extended torch version compatibility to 2.2~2.6 (excluding 2.5)
    - CUDA compatibility supports 11.8/12.4/12.6 (the CUDA version is determined by torch), solving compatibility issues for some users with 50-series and H-series GPUs
    - CUDA compatibility supports 11.8/12.4/12.6/12.8 (the CUDA version is determined by torch), solving compatibility issues for some users with 50-series and H-series GPUs
    - Extended Python compatibility to 3.10~3.12, solving the problem of automatic downgrade to 0.6.1 when installing in non-3.10 environments
    - Optimized the offline deployment process; after a successful deployment, no model files need to be downloaded over the network
  - Performance optimization
@@ -63,60 +73,143 @@
  - Usability optimization
    - By using `paddleocr2torch`, completely replaced the `paddle` framework and `paddleocr` in the project, resolving conflicts between `paddle` and `torch` as well as thread-safety issues caused by the `paddle` framework
    - Added a real-time progress bar during parsing to accurately track progress, making the wait less painful
- 2025/03/03 1.2.1 released, fixed some issues:
  - Fixed the impact on punctuation marks during full-width to half-width conversion of letters and numbers
  - Fixed inaccurate caption matching in certain scenarios
  - Fixed formula span loss in certain scenarios
- 2025/02/24 1.2.0 released. In this version we fixed some issues and improved parsing efficiency and accuracy:
  - Performance optimization
    - Improved classification speed of PDF documents in auto mode
    - Added high-performance plugin support in Huawei Ascend NPU acceleration mode, with end-to-end acceleration of up to 300% in common scenarios ([application link](https://aicarrier.feishu.cn/share/base/form/shrcnb10VaoNQB8kQPA8DEfZC6d))
  - Parsing optimization
    - Optimized the parsing logic for documents containing watermarks, significantly improving their parsing results
    - Improved the matching logic between multiple images/tables and captions on a single page, improving image-text matching accuracy in complex layouts
  - Bug fixes
    - Fixed an exception caused by image/table spans being filled into text blocks in certain scenarios
    - Fixed title blocks being empty in certain scenarios
- 2025/01/22 1.1.0 released. In this version we focused on improving parsing accuracy and efficiency:
  - Model capability upgrade (requires re-running the [model download process](docs/how_to_download_models_zh_cn.md) to obtain incremental model file updates)
    - Upgraded the layout recognition model to the latest `doclayout_yolo(2501)` model, improving layout recognition accuracy
    - Upgraded the formula parsing model to the latest `unimernet(2501)` model, improving formula recognition accuracy
  - Performance optimization
    - On devices meeting certain configuration requirements (16GB+ VRAM), overall parsing speed is improved by more than 50% through optimized resource usage and a refactored processing pipeline
  - Parsing effect optimization
    - Added a heading-level classification feature (beta, enabled by default) to the online demo ([mineru.net](https://mineru.net/OpenSourceTools/Extractor)/[huggingface](https://huggingface.co/spaces/opendatalab/MinerU)/[modelscope](https://www.modelscope.cn/studios/OpenDataLab/MinerU)), which classifies heading levels and improves document structuring
- 2025/01/10 1.0.1 released. This is our first official release. In this version we introduced a brand-new API through extensive refactoring, along with broader compatibility and a new automatic language identification feature:
  - New API
    - For the data-side API, we introduced the Dataset class, designed to provide a robust and flexible data processing framework. The framework currently supports a variety of document formats, including images (.jpg and .png), PDFs, Word documents (.doc and .docx), and PowerPoint presentations (.ppt and .pptx), ensuring effective support for data processing tasks from simple to complex.
    - For the user-side API, we designed the MinerU processing workflow as a series of composable Stages. Each Stage represents a specific processing step, so users can define new Stages according to their needs and creatively combine these stages to customize their data processing workflows.
  - Broader compatibility
    - By optimizing the dependency environment and configuration items, we ensure stable and efficient operation on ARM-architecture Linux systems.
    - Deeply integrated with Huawei Ascend NPU acceleration, providing autonomous and controllable high-performance computing capabilities and supporting the localization and development of AI application platforms in China. [NPU acceleration tutorial](docs/README_Ascend_NPU_Acceleration_zh_CN.md)
  - Automatic language identification
    - By introducing a new language recognition model, setting the `lang` configuration to `auto` during document parsing will automatically select the appropriate OCR language model, improving the accuracy of scanned document parsing.
- 2024/11/22 0.10.0 released, introducing hybrid OCR text extraction capabilities
  - Significantly improved parsing performance in complex text distribution scenarios such as dense formulas, irregular span regions, and text represented by images
  - Combines the dual advantages of accurate content extraction and faster speed in text mode, and more precise span/line region recognition in OCR mode
- 2024/11/15 0.9.3 released, integrated [RapidTable](https://github.com/RapidAI/RapidTable) for table recognition, improving single-table parsing speed by more than 10 times, with higher accuracy and lower VRAM usage
- 2024/11/06 0.9.2 released, integrated the [StructTable-InternVL2-1B](https://huggingface.co/U4R/StructTable-InternVL2-1B) model for table recognition
- 2024/10/31 0.9.0 released. This is a major new version with extensive code refactoring, addressing numerous issues, improving performance, reducing hardware requirements, and enhancing usability:
  - Refactored the sorting module code to use [layoutreader](https://github.com/ppaanngggg/layoutreader) for reading-order sorting, ensuring high accuracy in various layouts
  - Refactored the paragraph concatenation module to achieve good results in cross-column, cross-page, cross-figure, and cross-table scenarios
  - Refactored the list and table-of-contents recognition functions, significantly improving the accuracy of list blocks and table-of-contents blocks, as well as the parsing of corresponding text paragraphs
  - Refactored the matching logic for figures, tables, and descriptive text, greatly enhancing the accuracy of matching captions and footnotes to figures and tables, and reducing the loss rate of descriptive text to near zero
  - Added multi-language support for OCR, supporting detection and recognition of 84 languages; for the list of supported languages, see the [OCR Language Support List](https://paddlepaddle.github.io/PaddleOCR/latest/ppocr/blog/multi_languages.html#5)
  - Added VRAM recycling logic and other VRAM optimization measures, significantly reducing VRAM usage. The VRAM requirement for enabling all acceleration features except table acceleration (layout/formula/OCR) has been reduced from 16GB to 8GB, and for enabling all acceleration features from 24GB to 10GB
  - Optimized configuration file feature switches, adding an independent formula detection switch to significantly improve speed and parsing results when formula detection is not needed
  - Integrated [PDF-Extract-Kit 1.0](https://github.com/opendatalab/PDF-Extract-Kit)
    - Added the self-developed `doclayout_yolo` model, which speeds up processing by more than 10 times compared to the original solution while maintaining similar parsing effects, and can be freely switched with `layoutlmv3` via the configuration file
    - Upgraded formula parsing to `unimernet 0.2.1`, improving formula parsing accuracy while significantly reducing VRAM usage
    - Due to the repository change for `PDF-Extract-Kit 1.0`, you need to re-download the model; see [How to Download Models](docs/how_to_download_models_zh_cn.md) for detailed steps
- 2024/09/27 0.8.1 released, fixed some bugs, and provided the [localized deployment version](projects/web_demo/README_zh-CN.md) of the [online demo](https://opendatalab.com/OpenSourceTools/Extractor/PDF/) and the [front-end interface](projects/web/README_zh-CN.md)
- 2024/09/09 0.8.0 released, supporting fast deployment with Dockerfile, and launching demos on Hugging Face and ModelScope
- 2024/08/30 0.7.1 released, integrated the paddle TableMaster table recognition feature
- 2024/08/09 0.7.0b1 released, simplified the installation process and added table recognition functionality
- 2024/08/01 0.6.2b1 released, optimized dependency conflict issues and installation documentation
- 2024/07/05 Initial open-source release
<details>
<summary>2025/03/03 1.2.1 released, fixed some issues</summary>
<ul>
<li>Fixed the impact on punctuation marks during full-width to half-width conversion of letters and numbers</li>
<li>Fixed inaccurate caption matching in certain scenarios</li>
<li>Fixed formula span loss in certain scenarios</li>
</ul>
</details>
<details>
<summary>2025/02/24 1.2.0 released. In this version we fixed some issues and improved parsing efficiency and accuracy:</summary>
<ul>
<li>Performance optimization
<ul>
<li>Improved classification speed of PDF documents in auto mode</li>
</ul>
</li>
<li>Parsing optimization
<ul>
<li>Optimized the parsing logic for documents containing watermarks, significantly improving their parsing results</li>
<li>Improved the matching logic between multiple images/tables and captions on a single page, improving image-text matching accuracy in complex layouts</li>
</ul>
</li>
<li>Bug fixes
<ul>
<li>Fixed an exception caused by image/table spans being filled into text blocks in certain scenarios</li>
<li>Fixed title blocks being empty in certain scenarios</li>
</ul>
</li>
</ul>
</details>
<details>
<summary>2025/01/22 1.1.0 released. In this version we focused on improving parsing accuracy and efficiency:</summary>
<ul>
<li>Model capability upgrade (requires re-running the <a href="https://github.com/opendatalab/MinerU/blob/master/docs/how_to_download_models_zh_cn.md">model download process</a> to obtain incremental model file updates)
<ul>
<li>Upgraded the layout recognition model to the latest <code>doclayout_yolo(2501)</code> model, improving layout recognition accuracy</li>
<li>Upgraded the formula parsing model to the latest <code>unimernet(2501)</code> model, improving formula recognition accuracy</li>
</ul>
</li>
<li>Performance optimization
<ul>
<li>On devices meeting certain configuration requirements (16GB+ VRAM), overall parsing speed is improved by more than 50% through optimized resource usage and a refactored processing pipeline</li>
</ul>
</li>
<li>Parsing effect optimization
<ul>
<li>Added a heading-level classification feature (beta, enabled by default) to the online demo (<a href="https://mineru.net/OpenSourceTools/Extractor">mineru.net</a> / <a href="https://huggingface.co/spaces/opendatalab/MinerU">huggingface</a> / <a href="https://www.modelscope.cn/studios/OpenDataLab/MinerU">modelscope</a>), which classifies heading levels and improves document structuring</li>
</ul>
</li>
</ul>
</details>
<details>
<summary>2025/01/10 1.0.1 released. This is our first official release. In this version we introduced a brand-new API through extensive refactoring, along with broader compatibility and a new automatic language identification feature:</summary>
<ul>
<li>New API
<ul>
<li>For the data-side API, we introduced the Dataset class, designed to provide a robust and flexible data processing framework. The framework currently supports a variety of document formats, including images (.jpg and .png), PDFs, Word documents (.doc and .docx), and PowerPoint presentations (.ppt and .pptx), ensuring effective support for data processing tasks from simple to complex.</li>
<li>For the user-side API, we designed the MinerU processing workflow as a series of composable Stages. Each Stage represents a specific processing step, so users can define new Stages according to their needs and creatively combine these stages to customize their data processing workflows.</li>
</ul>
</li>
<li>Broader compatibility
<ul>
<li>By optimizing the dependency environment and configuration items, we ensure stable and efficient operation on ARM-architecture Linux systems.</li>
<li>Deeply integrated with Huawei Ascend NPU acceleration, providing autonomous and controllable high-performance computing capabilities and supporting the localization and development of AI application platforms in China. <a href="https://github.com/opendatalab/MinerU/blob/master/docs/README_Ascend_NPU_Acceleration_zh_CN.md">NPU acceleration tutorial</a></li>
</ul>
</li>
<li>Automatic language identification
<ul>
<li>By introducing a new language recognition model, setting the <code>lang</code> configuration to <code>auto</code> during document parsing will automatically select the appropriate OCR language model, improving the accuracy of scanned document parsing.</li>
</ul>
</li>
</ul>
</details>
<details>
<summary>2024/11/22 0.10.0 released, introducing hybrid OCR text extraction capabilities</summary>
<ul>
<li>Significantly improved parsing performance in complex text distribution scenarios such as dense formulas, irregular span regions, and text represented by images</li>
<li>Combines the dual advantages of accurate content extraction and faster speed in text mode, and more precise span/line region recognition in OCR mode</li>
</ul>
</details>
<details>
<summary>2024/11/15 0.9.3 released, integrated <a href="https://github.com/RapidAI/RapidTable">RapidTable</a> for table recognition, improving single-table parsing speed by more than 10 times, with higher accuracy and lower VRAM usage</summary>
</details>
<details>
<summary>2024/11/06 0.9.2 released, integrated the <a href="https://huggingface.co/U4R/StructTable-InternVL2-1B">StructTable-InternVL2-1B</a> model for table recognition</summary>
</details>
<details>
<summary>2024/10/31 0.9.0 released. This is a major new version with extensive code refactoring, addressing numerous issues, improving performance, reducing hardware requirements, and enhancing usability:</summary>
<ul>
<li>Refactored the sorting module code to use <a href="https://github.com/ppaanngggg/layoutreader">layoutreader</a> for reading-order sorting, ensuring high accuracy in various layouts</li>
<li>Refactored the paragraph concatenation module to achieve good results in cross-column, cross-page, cross-figure, and cross-table scenarios</li>
<li>Refactored the list and table-of-contents recognition functions, significantly improving the accuracy of list blocks and table-of-contents blocks, as well as the parsing of corresponding text paragraphs</li>
<li>Refactored the matching logic for figures, tables, and descriptive text, greatly enhancing the accuracy of matching captions and footnotes to figures and tables, and reducing the loss rate of descriptive text to near zero</li>
<li>Added multi-language support for OCR, supporting detection and recognition of 84 languages; for the list of supported languages, see the <a href="https://paddlepaddle.github.io/PaddleOCR/latest/ppocr/blog/multi_languages.html#5">OCR Language Support List</a></li>
<li>Added VRAM recycling logic and other VRAM optimization measures, significantly reducing VRAM usage. The VRAM requirement for enabling all acceleration features except table acceleration (layout/formula/OCR) has been reduced from 16GB to 8GB, and for enabling all acceleration features from 24GB to 10GB</li>
<li>Optimized configuration file feature switches, adding an independent formula detection switch to significantly improve speed and parsing results when formula detection is not needed</li>
<li>Integrated <a href="https://github.com/opendatalab/PDF-Extract-Kit">PDF-Extract-Kit 1.0</a>
<ul>
<li>Added the self-developed <code>doclayout_yolo</code> model, which speeds up processing by more than 10 times compared to the original solution while maintaining similar parsing effects, and can be freely switched with <code>layoutlmv3</code> via the configuration file</li>
<li>Upgraded formula parsing to <code>unimernet 0.2.1</code>, improving formula parsing accuracy while significantly reducing VRAM usage</li>
<li>Due to the repository change for <code>PDF-Extract-Kit 1.0</code>, you need to re-download the model; see <a href="https://github.com/opendatalab/MinerU/blob/master/docs/how_to_download_models_zh_cn.md">How to Download Models</a> for detailed steps</li>
</ul>
</li>
</ul>
</details>
<details>
<summary>2024/09/27 0.8.1 released, fixed some bugs, and provided the <a href="https://github.com/opendatalab/MinerU/blob/master/projects/web_demo/README_zh-CN.md">localized deployment version</a> of the <a href="https://opendatalab.com/OpenSourceTools/Extractor/PDF/">online demo</a> and the <a href="https://github.com/opendatalab/MinerU/blob/master/projects/web/README_zh-CN.md">front-end interface</a></summary>
</details>
<details>
<summary>2024/09/09 0.8.0 released, supporting fast deployment with Dockerfile, and launching demos on Hugging Face and ModelScope</summary>
</details>
<details>
<summary>2024/08/30 0.7.1 released, integrated the paddle TableMaster table recognition feature</summary>
</details>
<details>
<summary>2024/08/09 0.7.0b1 released, simplified the installation process and added table recognition functionality</summary>
</details>
<details>
<summary>2024/08/01 0.6.2b1 released, optimized dependency conflict issues and installation documentation</summary>
</details>
<details>
<summary>2024/07/05 Initial open-source release</summary>
</details>
<!-- TABLE OF CONTENT -->
@@ -233,7 +326,7 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
</tr>
<tr>
<td colspan="3">Python Version</td>
<td colspan="3">>=3.9,<=3.12</td>
<td colspan="3">>=3.10</td>
</tr>
<tr>
<td colspan="3">Nvidia Driver Version</td>
@@ -243,8 +336,8 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
</tr>
<tr>
<td colspan="3">CUDA Environment</td>
<td>11.8/12.4/12.6</td>
<td>11.8/12.4/12.6</td>
<td>11.8/12.4/12.6/12.8</td>
<td>11.8/12.4/12.6/12.8</td>
<td>None</td>
</tr>
<tr>
@@ -279,7 +372,7 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
> Syncing of the latest version to domestic (China) mirror sources may be delayed; please be patient.
```bash
conda create -n mineru 'python<3.13' -y
conda create -n mineru 'python>=3.10' -y
conda activate mineru
pip install -U "magic-pdf[full]" -i https://mirrors.aliyun.com/pypi/simple
```


@@ -49,25 +49,3 @@ docker run -it -u root --name mineru-npu --privileged=true \
magic-pdf --help
```
## Known issues
- With the embedded onnx models, paddleocr recognizes Chinese and English at a reasonably fast speed only under the default language configuration
- When a custom lang parameter is set, paddleocr becomes noticeably slower
- The layout model crashes intermittently when layoutlmv3 is used; the default doclayout_yolo model is recommended
- Table parsing has only been adapted to the rapid_table model; other models may not work
## High-performance mode
- In specific hardware environments, a plugin can enable high-performance mode, improving overall speed by more than 300% over the default mode

| System requirement | Version/Model |
|--------------------|---------------|
| Chip type | Ascend 910B |
| CANN version | CANN 8.0.RC2 |
| Driver version | 24.1.rc2.1 |
| magic-pdf version | >= 1.2.0 |

- The high-performance plugin has certain hardware and qualification requirements; to apply for access, please fill out the [MinerU High-Performance Version Cooperation Application Form](https://aicarrier.feishu.cn/share/base/form/shrcnb10VaoNQB8kQPA8DEfZC6d)


@@ -54,7 +54,7 @@ In the final step, enter `yes`, close the terminal, and reopen it.
### 4. Create an Environment Using Conda
```bash
conda create -n mineru 'python<3.13' -y
conda create -n mineru 'python>=3.10' -y
conda activate mineru
```
@@ -63,14 +63,13 @@ conda activate mineru
```sh
pip install -U magic-pdf[full]
```
> [!IMPORTANT]
> After installation, make sure to check the version of `magic-pdf` using the following command:
> [!TIP]
> After installation, you can check the version of `magic-pdf` using the following command:
>
> ```sh
> magic-pdf --version
> ```
>
> If the version number is less than 1.3.0, please report the issue.
### 6. Download Models


@@ -54,7 +54,7 @@ bash Anaconda3-2024.06-1-Linux-x86_64.sh
## 4. Create an environment using conda
```bash
conda create -n mineru 'python<3.13' -y
conda create -n mineru 'python>=3.10' -y
conda activate mineru
```
@@ -64,14 +64,13 @@ conda activate mineru
pip install -U magic-pdf[full] -i https://mirrors.aliyun.com/pypi/simple
```
> [!IMPORTANT]
> After the download completes, be sure to verify the version of magic-pdf with the following command:
> [!TIP]
> After the download completes, you can check the version of `magic-pdf` with the following command:
>
> ```bash
> magic-pdf --version
> ```
>
> If the version number is lower than 1.3.0, please report it to us in an issue.
## 6. Download models


@@ -17,7 +17,7 @@ Download link: https://repo.anaconda.com/archive/Anaconda3-2024.06-1-Windows-x86
### 3. Create an Environment Using Conda
```bash
conda create -n mineru 'python<3.13' -y
conda create -n mineru 'python>=3.10' -y
conda activate mineru
```
@@ -28,13 +28,12 @@ pip install -U magic-pdf[full]
```
> [!IMPORTANT]
> After installation, verify the version of `magic-pdf`:
> After installation, you can check the version of `magic-pdf` using the following command:
>
> ```bash
> magic-pdf --version
> ```
>
> If the version number is less than 1.3.0, please report it in the issues section.
### 5. Download Models
@@ -64,7 +63,7 @@ If your graphics card has at least 6GB of VRAM, follow these steps to test CUDA-
1. **Overwrite the installation of torch and torchvision** supporting CUDA. (Select the appropriate index-url based on your CUDA version; for details, refer to the [PyTorch official website](https://pytorch.org/get-started/locally/).)
```
pip install --force-reinstall torch==2.6.0 torchvision==0.21.1 "numpy<2.0.0" --index-url https://download.pytorch.org/whl/cu124
pip install --force-reinstall torch torchvision "numpy<=2.1.1" --index-url https://download.pytorch.org/whl/cu124
```
2. **Modify the value of `"device-mode"`** in the `magic-pdf.json` configuration file located in your user directory.


@@ -18,7 +18,7 @@ https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2024.06-1-Window
## 3. Create an environment using conda
```bash
conda create -n mineru 'python<3.13' -y
conda create -n mineru 'python>=3.10' -y
conda activate mineru
```
@@ -29,13 +29,12 @@ pip install -U magic-pdf[full] -i https://mirrors.aliyun.com/pypi/simple
```
> [!IMPORTANT]
> After the download completes, be sure to verify the version of magic-pdf with the following command:
> After the download completes, you can check the version of magic-pdf with the following command:
>
> ```bash
> magic-pdf --version
> ```
>
> If the version number is lower than 1.3.0, please report it to us in an issue.
## 5. Download models
@@ -65,7 +64,7 @@ pip install -U magic-pdf[full] -i https://mirrors.aliyun.com/pypi/simple
**1. Overwrite the installation of CUDA-enabled torch and torchvision** (select the appropriate index-url based on your CUDA version; see the [PyTorch official website](https://pytorch.org/get-started/locally/) for details)
```bash
pip install --force-reinstall torch==2.6.0 torchvision==0.21.1 "numpy<2.0.0" --index-url https://download.pytorch.org/whl/cu124
pip install --force-reinstall torch torchvision "numpy<=2.1.1" --index-url https://download.pytorch.org/whl/cu124
```
**2. Modify the value of "device-mode" in the magic-pdf.json configuration file in your user directory**


@@ -18,15 +18,6 @@ The configuration file can be found in the user directory, with the filename `ma
# How to update models previously downloaded
## 1. Models downloaded via Git LFS
> [!IMPORTANT]
> Due to feedback from some users that downloading model files using git lfs was incomplete or resulted in corrupted model files, this method is no longer recommended.
>
> For versions 0.9.x and later, due to the repository change and the addition of the layout sorting model in PDF-Extract-Kit 1.0, the models cannot be updated using the `git pull` command. Instead, a Python script must be used for one-click updates.
When magic-pdf <= 0.8.1, if you have previously downloaded the model files via git lfs, you can navigate to the previous download directory and update the models using the `git pull` command.
## 2. Models downloaded via Hugging Face or Model Scope
## 1. Models downloaded via Hugging Face or Model Scope
If you previously downloaded models via Hugging Face or Model Scope, you can rerun the Python script used for the initial download. This will automatically update the model directory to the latest version.


@@ -32,16 +32,6 @@ python脚本会自动下载模型文件并配置好配置文件中的模型目
# How to update previously downloaded models
## 1. Models downloaded via git lfs
> [!IMPORTANT]
> Because some users reported incomplete downloads and corrupted model files when using git lfs, this download method is no longer recommended.
>
> For 0.9.x and later, because PDF-Extract-Kit 1.0 moved to a new repository and added a layout sorting model, the models cannot be updated with `git pull`; a Python script must be used for a one-click update instead.
When magic-pdf <= 0.8.1, if you previously downloaded the model files via git lfs, you can go to the original download directory and update the models with `git pull`.
## 2. Models downloaded via Hugging Face or Model Scope
## 1. Models downloaded via Hugging Face or Model Scope
If you previously downloaded models via Hugging Face or Model Scope, you can rerun the original Python download script; it will automatically update the model directory to the latest version.


@@ -103,54 +103,65 @@ def batch_build_dataset(pdf_paths, k, lang=None):
all_images : list
List of all processed images
"""
# Get page counts for each PDF
pdf_info = []
total_pages = 0
results = []
for pdf_path in pdf_paths:
try:
doc = fitz.open(pdf_path)
num_pages = len(doc)
pdf_info.append((pdf_path, num_pages))
total_pages += num_pages
doc.close()
except Exception as e:
print(f'Error opening {pdf_path}: {e}')
# Partition the jobs based on page count. Each job has 1 page.
partitions = partition_array_greedy(pdf_info, k)
# Process each partition in parallel
all_images_h = {}
with concurrent.futures.ProcessPoolExecutor(max_workers=k) as executor:
# Submit one task per partition
futures = []
for sn, partition in enumerate(partitions):
# Get the jobs for this partition
partition_jobs = [pdf_info[idx] for idx in partition]
# Submit the task
future = executor.submit(
process_pdf_batch,
partition_jobs,
sn
)
futures.append(future)
# Process results as they complete
for i, future in enumerate(concurrent.futures.as_completed(futures)):
try:
idx, images = future.result()
all_images_h[idx] = images
except Exception as e:
print(f'Error processing partition: {e}')
results = [None] * len(pdf_paths)
for i in range(len(partitions)):
partition = partitions[i]
for j in range(len(partition)):
with open(pdf_info[partition[j]][0], 'rb') as f:
pdf_bytes = f.read()
dataset = PymuDocDataset(pdf_bytes, lang=lang)
dataset.set_images(all_images_h[i][j])
results[partition[j]] = dataset
with open(pdf_path, 'rb') as f:
pdf_bytes = f.read()
dataset = PymuDocDataset(pdf_bytes, lang=lang)
results.append(dataset)
return results
#
# # Get page counts for each PDF
# pdf_info = []
# total_pages = 0
#
# for pdf_path in pdf_paths:
# try:
# doc = fitz.open(pdf_path)
# num_pages = len(doc)
# pdf_info.append((pdf_path, num_pages))
# total_pages += num_pages
# doc.close()
# except Exception as e:
# print(f'Error opening {pdf_path}: {e}')
#
# # Partition the jobs based on page count. Each job has 1 page.
# partitions = partition_array_greedy(pdf_info, k)
#
# # Process each partition in parallel
# all_images_h = {}
#
# with concurrent.futures.ProcessPoolExecutor(max_workers=k) as executor:
# # Submit one task per partition
# futures = []
# for sn, partition in enumerate(partitions):
# # Get the jobs for this partition
# partition_jobs = [pdf_info[idx] for idx in partition]
#
# # Submit the task
# future = executor.submit(
# process_pdf_batch,
# partition_jobs,
# sn
# )
# futures.append(future)
# # Process results as they complete
# for i, future in enumerate(concurrent.futures.as_completed(futures)):
# try:
# idx, images = future.result()
# all_images_h[idx] = images
# except Exception as e:
# print(f'Error processing partition: {e}')
# results = [None] * len(pdf_paths)
# for i in range(len(partitions)):
# partition = partitions[i]
# for j in range(len(partition)):
# with open(pdf_info[partition[j]][0], 'rb') as f:
# pdf_bytes = f.read()
# dataset = PymuDocDataset(pdf_bytes, lang=lang)
# dataset.set_images(all_images_h[i][j])
# results[partition[j]] = dataset
# return results
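The (now commented-out) implementation above distributes PDFs across `k` workers with `partition_array_greedy`, balancing partitions by total page count. A minimal standalone sketch of that balancing step, assuming a simple largest-first greedy heuristic (the real `partition_array_greedy` in magic_pdf may differ):

```python
def partition_array_greedy(pdf_info, k):
    """Greedily split (path, num_pages) jobs into k partitions balanced by page count.

    Returns a list of k partitions, each a list of indices into pdf_info.
    """
    # Visit jobs from largest to smallest so big PDFs are placed first
    order = sorted(range(len(pdf_info)), key=lambda i: pdf_info[i][1], reverse=True)
    partitions = [[] for _ in range(k)]
    loads = [0] * k
    for idx in order:
        # Assign each job to the currently lightest partition
        target = loads.index(min(loads))
        partitions[target].append(idx)
        loads[target] += pdf_info[idx][1]
    return partitions


pdf_info = [('a.pdf', 10), ('b.pdf', 3), ('c.pdf', 7), ('d.pdf', 2)]
parts = partition_array_greedy(pdf_info, 2)
```

Each partition then becomes one `process_pdf_batch` job, so the worker processes finish at roughly the same time.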


@@ -150,7 +150,7 @@ class PymuDocDataset(Dataset):
elif lang == 'auto':
from magic_pdf.model.sub_modules.language_detection.utils import \
auto_detect_lang
self._lang = auto_detect_lang(bits)
self._lang = auto_detect_lang(self._data_bits)
logger.info(f'lang: {lang}, detect_lang: {self._lang}')
else:
self._lang = lang
@@ -232,7 +232,7 @@ class PymuDocDataset(Dataset):
self._records[i].set_image(images[i])
class ImageDataset(Dataset):
def __init__(self, bits: bytes):
def __init__(self, bits: bytes, lang=None):
"""Initialize the dataset, which wraps the pymudoc documents.
Args:
@@ -244,6 +244,17 @@ class ImageDataset(Dataset):
self._raw_data = bits
self._data_bits = pdf_bytes
if lang == '':
self._lang = None
elif lang == 'auto':
from magic_pdf.model.sub_modules.language_detection.utils import \
auto_detect_lang
self._lang = auto_detect_lang(self._data_bits)
logger.info(f'lang: {lang}, detect_lang: {self._lang}')
else:
self._lang = lang
logger.info(f'lang: {lang}')
def __len__(self) -> int:
"""The length of the dataset."""
return len(self._records)
@@ -394,4 +405,4 @@ class Doc(PageableData):
fontsize (int): font size of the text
color (list[float] | None): three element tuple which describe the RGB of the board line, None will use the default font color!
"""
self._doc.insert_text(coord, content, fontsize=fontsize, color=color)
self._doc.insert_text(coord, content, fontsize=fontsize, color=color)
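The `lang` handling this hunk adds to `ImageDataset` mirrors the `PymuDocDataset` branch above it: an empty string means unset, `'auto'` runs detection on the raw bytes, and any other value passes through. A standalone sketch of that dispatch, with the detector injected as a parameter because the real `auto_detect_lang` needs downloaded model files:

```python
def resolve_lang(lang, data_bits, detect_fn):
    """Resolve the effective OCR language for a dataset.

    lang='' -> None (unset); lang='auto' -> detect from the raw bytes;
    any other value -> use as given. detect_fn stands in for magic_pdf's
    auto_detect_lang, which requires model files to be present.
    """
    if lang == '':
        return None
    elif lang == 'auto':
        return detect_fn(data_bits)
    else:
        return lang
```

Note the related one-line fix in `PymuDocDataset` above: detection now runs on `self._data_bits` (the converted PDF bytes) rather than the raw `bits` argument.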


@@ -1 +1 @@
__version__ = "1.3.0"
__version__ = "1.3.1"


@@ -30,8 +30,14 @@ class BatchAnalyze:
images_layout_res = []
layout_start_time = time.time()
_, fst_ocr, fst_lang = images_with_extra_info[0]
self.model = self.model_manager.get_model(fst_ocr, self.show_log, fst_lang, self.layout_model, self.formula_enable, self.table_enable)
self.model = self.model_manager.get_model(
    ocr=True,
    show_log=self.show_log,
    lang=None,
    layout_model=self.layout_model,
    formula_enable=self.formula_enable,
    table_enable=self.table_enable,
)
images = [image for image, _, _ in images_with_extra_info]
@@ -143,14 +149,14 @@ class BatchAnalyze:
if ocr_res:
ocr_result_list = get_ocr_result_list(ocr_res, useful_list, ocr_res_list_dict['ocr_enable'], new_image, _lang)
ocr_res_list_dict['layout_res'].extend(ocr_result_list)
det_count += len(ocr_res_list_dict['ocr_res_list'])
# det_count += len(ocr_res_list_dict['ocr_res_list'])
# logger.info(f'ocr-det time: {round(time.time()-det_start, 2)}, image num: {det_count}')
# 表格识别 table recognition
if self.model.apply_table:
table_start = time.time()
table_count = 0
# for table_res_list_dict in table_res_list_all_page:
for table_res_dict in tqdm(table_res_list_all_page, desc="Table Predict"):
_lang = table_res_dict['lang']
@@ -241,7 +247,7 @@ class BatchAnalyze:
for index, layout_res_item in enumerate(need_ocr_lists_by_lang[lang]):
ocr_text, ocr_score = ocr_res_list[index]
layout_res_item['text'] = ocr_text
layout_res_item['score'] = float(round(ocr_score, 2))
layout_res_item['score'] = float(f"{ocr_score:.3f}")
total_processed += len(img_crop_list)


@@ -146,10 +146,8 @@ def doc_analyze(
img_dict = page_data.get_image()
images.append(img_dict['img'])
page_wh_list.append((img_dict['width'], img_dict['height']))
if lang is None or lang == 'auto':
images_with_extra_info = [(images[index], ocr, dataset._lang) for index in range(len(dataset))]
else:
images_with_extra_info = [(images[index], ocr, lang) for index in range(len(dataset))]
images_with_extra_info = [(images[index], ocr, dataset._lang) for index in range(len(dataset))]
if len(images) >= MIN_BATCH_INFERENCE_SIZE:
batch_size = MIN_BATCH_INFERENCE_SIZE
@@ -158,8 +156,8 @@ def doc_analyze(
batch_images = [images_with_extra_info]
results = []
for sn, batch_image in enumerate(batch_images):
_, result = may_batch_image_analyze(batch_image, sn, ocr, show_log,layout_model, formula_enable, table_enable)
for batch_image in batch_images:
result = may_batch_image_analyze(batch_image, ocr, show_log,layout_model, formula_enable, table_enable)
results.extend(result)
model_json = []
@@ -181,7 +179,7 @@ def doc_analyze(
def batch_doc_analyze(
datasets: list[Dataset],
parse_method: str,
parse_method: str = 'auto',
show_log: bool = False,
lang=None,
layout_model=None,
@@ -190,30 +188,37 @@ def batch_doc_analyze(
):
MIN_BATCH_INFERENCE_SIZE = int(os.environ.get('MINERU_MIN_BATCH_INFERENCE_SIZE', 200))
batch_size = MIN_BATCH_INFERENCE_SIZE
images = []
page_wh_list = []
images_with_extra_info = []
for dataset in datasets:
for index in range(len(dataset)):
if lang is None or lang == 'auto':
_lang = dataset._lang
else:
_lang = lang
ocr = False
if parse_method == 'auto':
if dataset.classify() == SupportedPdfParseMethod.TXT:
ocr = False
elif dataset.classify() == SupportedPdfParseMethod.OCR:
ocr = True
elif parse_method == 'ocr':
ocr = True
elif parse_method == 'txt':
ocr = False
_lang = dataset._lang
for index in range(len(dataset)):
page_data = dataset.get_page(index)
img_dict = page_data.get_image()
images.append(img_dict['img'])
page_wh_list.append((img_dict['width'], img_dict['height']))
if parse_method == 'auto':
images_with_extra_info.append((images[-1], dataset.classify() == SupportedPdfParseMethod.OCR, _lang))
else:
images_with_extra_info.append((images[-1], parse_method == 'ocr', _lang))
images_with_extra_info.append((img_dict['img'], ocr, _lang))
batch_images = [images_with_extra_info[i:i+batch_size] for i in range(0, len(images_with_extra_info), batch_size)]
results = []
for sn, batch_image in enumerate(batch_images):
_, result = may_batch_image_analyze(batch_image, sn, True, show_log, layout_model, formula_enable, table_enable)
processed_images_count = 0
for index, batch_image in enumerate(batch_images):
processed_images_count += len(batch_image)
logger.info(f'Batch {index + 1}/{len(batch_images)}: {processed_images_count} pages/{len(images_with_extra_info)} pages')
result = may_batch_image_analyze(batch_image, True, show_log, layout_model, formula_enable, table_enable)
results.extend(result)
infer_results = []
@@ -233,7 +238,6 @@ def batch_doc_analyze(
def may_batch_image_analyze(
images_with_extra_info: list[tuple[np.ndarray, bool, str]],
idx: int,
ocr: bool,
show_log: bool = False,
layout_model=None,
@@ -255,8 +259,9 @@ def may_batch_image_analyze(
torch.npu.set_compile_mode(jit_compile=False)
if str(device).startswith('npu') or str(device).startswith('cuda'):
gpu_memory = int(os.getenv('VIRTUAL_VRAM_SIZE', round(get_vram(device))))
if gpu_memory is not None:
vram = get_vram(device)
if vram is not None:
gpu_memory = int(os.getenv('VIRTUAL_VRAM_SIZE', round(vram)))
if gpu_memory >= 16:
batch_ratio = 16
elif gpu_memory >= 12:
@@ -268,6 +273,10 @@ def may_batch_image_analyze(
else:
batch_ratio = 1
logger.info(f'gpu_memory: {gpu_memory} GB, batch_ratio: {batch_ratio}')
else:
# Default batch_ratio when VRAM can't be determined
batch_ratio = 1
logger.info(f'Could not determine GPU memory, using default batch_ratio: {batch_ratio}')
# doc_analyze_start = time.time()
@@ -286,4 +295,4 @@ def may_batch_image_analyze(
# f'doc analyze time: {round(time.time() - doc_analyze_start, 2)},'
# f' speed: {doc_analyze_speed} pages/second'
# )
return idx, results
return results
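The hunk above fixes an ordering bug: `get_vram(device)` can return `None`, and the old code passed that into `round()` before any `None` check, which would raise `TypeError`. The corrected flow can be sketched as a helper; the `VIRTUAL_VRAM_SIZE` override, the 16 GB tier, and the fallback of 1 come straight from the diff, while the function name is mine and the intermediate tiers are elided here because the hunk does not show them:

```python
import os

def pick_batch_ratio(vram_gb, env=None):
    """Map available VRAM (GB, or None when undetectable) to a batch ratio.

    VIRTUAL_VRAM_SIZE overrides the probed value; unknown VRAM falls
    back to batch_ratio = 1, matching the fixed code path in the diff.
    """
    if env is None:
        env = os.environ
    if vram_gb is not None:
        gpu_memory = int(env.get('VIRTUAL_VRAM_SIZE', round(vram_gb)))
        if gpu_memory >= 16:
            return 16
        # ... intermediate tiers elided in the hunk ...
        return 1
    # Default batch_ratio when VRAM can't be determined
    return 1
```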


@@ -29,22 +29,204 @@ def crop_img(input_res, input_np_img, crop_paste_x=0, crop_paste_y=0):
return return_image, return_list
# Select regions for OCR / formula regions / table regions
def get_res_list_from_layout_res(layout_res):
def get_coords_and_area(table):
"""Extract coordinates and area from a table."""
xmin, ymin = int(table['poly'][0]), int(table['poly'][1])
xmax, ymax = int(table['poly'][4]), int(table['poly'][5])
area = (xmax - xmin) * (ymax - ymin)
return xmin, ymin, xmax, ymax, area
def calculate_intersection(box1, box2):
"""Calculate intersection coordinates between two boxes."""
intersection_xmin = max(box1[0], box2[0])
intersection_ymin = max(box1[1], box2[1])
intersection_xmax = min(box1[2], box2[2])
intersection_ymax = min(box1[3], box2[3])
# Check if intersection is valid
if intersection_xmax <= intersection_xmin or intersection_ymax <= intersection_ymin:
return None
return intersection_xmin, intersection_ymin, intersection_xmax, intersection_ymax
def calculate_iou(box1, box2):
"""Calculate IoU between two boxes."""
intersection = calculate_intersection(box1[:4], box2[:4])
if not intersection:
return 0
intersection_xmin, intersection_ymin, intersection_xmax, intersection_ymax = intersection
intersection_area = (intersection_xmax - intersection_xmin) * (intersection_ymax - intersection_ymin)
area1, area2 = box1[4], box2[4]
union_area = area1 + area2 - intersection_area
return intersection_area / union_area if union_area > 0 else 0
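The helpers above carry boxes as `(xmin, ymin, xmax, ymax, area)` tuples. A compact, self-contained restatement of the IoU computation for a quick sanity check:

```python
def iou(box1, box2):
    """IoU of two (xmin, ymin, xmax, ymax) boxes, as in the helpers above."""
    ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    if ix2 <= ix1 or iy2 <= iy1:
        return 0.0  # no valid intersection
    inter = (ix2 - ix1) * (iy2 - iy1)
    a1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    a2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter / (a1 + a2 - inter)

# Two 10x10 boxes offset by 5 in x: intersection 50, union 150
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # → 0.3333333333333333
```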
def is_inside(small_box, big_box, overlap_threshold=0.8):
"""Check if small_box is inside big_box by at least overlap_threshold."""
intersection = calculate_intersection(small_box[:4], big_box[:4])
if not intersection:
return False
intersection_xmin, intersection_ymin, intersection_xmax, intersection_ymax = intersection
intersection_area = (intersection_xmax - intersection_xmin) * (intersection_ymax - intersection_ymin)
# Check if overlap exceeds threshold
return intersection_area >= overlap_threshold * small_box[4]
def do_overlap(box1, box2):
"""Check if two boxes overlap."""
return calculate_intersection(box1[:4], box2[:4]) is not None
def merge_high_iou_tables(table_res_list, layout_res, table_indices, iou_threshold=0.7):
"""Merge tables with IoU > threshold."""
if len(table_res_list) < 2:
return table_res_list, table_indices
table_info = [get_coords_and_area(table) for table in table_res_list]
merged = True
while merged:
merged = False
i = 0
while i < len(table_res_list) - 1:
j = i + 1
while j < len(table_res_list):
iou = calculate_iou(table_info[i], table_info[j])
if iou > iou_threshold:
# Merge tables by taking their union
x1_min, y1_min, x1_max, y1_max, _ = table_info[i]
x2_min, y2_min, x2_max, y2_max, _ = table_info[j]
union_xmin = min(x1_min, x2_min)
union_ymin = min(y1_min, y2_min)
union_xmax = max(x1_max, x2_max)
union_ymax = max(y1_max, y2_max)
# Create merged table
merged_table = table_res_list[i].copy()
merged_table['poly'][0] = union_xmin
merged_table['poly'][1] = union_ymin
merged_table['poly'][2] = union_xmax
merged_table['poly'][3] = union_ymin
merged_table['poly'][4] = union_xmax
merged_table['poly'][5] = union_ymax
merged_table['poly'][6] = union_xmin
merged_table['poly'][7] = union_ymax
# Update layout_res
to_remove = [table_indices[j], table_indices[i]]
for idx in sorted(to_remove, reverse=True):
del layout_res[idx]
layout_res.append(merged_table)
# Update tracking lists
table_indices = [k if k < min(to_remove) else
k - 1 if k < max(to_remove) else
k - 2 if k > max(to_remove) else
len(layout_res) - 1
for k in table_indices
if k not in to_remove]
table_indices.append(len(layout_res) - 1)
# Update table lists
table_res_list.pop(j)
table_res_list.pop(i)
table_res_list.append(merged_table)
# Update table_info
table_info = [get_coords_and_area(table) for table in table_res_list]
merged = True
break
j += 1
if merged:
break
i += 1
return table_res_list, table_indices
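The merge writes the union box back into the 8-value `poly` layout (corners clockwise from top-left), which reduces to this small sketch with hypothetical polys:

```python
def union_poly(poly_a, poly_b):
    """Union of two 8-value polys (x0,y0, x1,y0, x1,y1, x0,y1), as in the merge above."""
    xmin = min(poly_a[0], poly_b[0])
    ymin = min(poly_a[1], poly_b[1])
    xmax = max(poly_a[4], poly_b[4])
    ymax = max(poly_a[5], poly_b[5])
    return [xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax]

print(union_poly([0, 0, 10, 0, 10, 10, 0, 10],
                 [5, 5, 20, 5, 20, 15, 5, 15]))
# → [0, 0, 20, 0, 20, 15, 0, 15]
```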
def filter_nested_tables(table_res_list, overlap_threshold=0.8, area_threshold=0.8):
"""Remove big tables containing multiple smaller tables within them."""
if len(table_res_list) < 3:
return table_res_list
table_info = [get_coords_and_area(table) for table in table_res_list]
big_tables_idx = []
for i in range(len(table_res_list)):
# Find tables inside this one
tables_inside = [j for j in range(len(table_res_list))
if i != j and is_inside(table_info[j], table_info[i], overlap_threshold)]
# Continue if there are at least 2 tables inside
if len(tables_inside) >= 2:
# Check if inside tables overlap with each other
tables_overlap = any(do_overlap(table_info[tables_inside[idx1]], table_info[tables_inside[idx2]])
for idx1 in range(len(tables_inside))
for idx2 in range(idx1 + 1, len(tables_inside)))
# If no overlaps, check area condition
if not tables_overlap:
total_inside_area = sum(table_info[j][4] for j in tables_inside)
big_table_area = table_info[i][4]
if total_inside_area > area_threshold * big_table_area:
big_tables_idx.append(i)
return [table for i, table in enumerate(table_res_list) if i not in big_tables_idx]
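A big table is dropped only when at least two tables lie inside it, those inner tables do not overlap each other, and together they cover more than `area_threshold` of its area. A toy re-implementation of that decision (helper names are illustrative):

```python
def inter_area(a, b):
    """Intersection area of two (xmin, ymin, xmax, ymax, ...) boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def is_big_table_spurious(big, others, overlap_threshold=0.8, area_threshold=0.8):
    """True if `big` should be filtered because smaller tables tile it."""
    inside = [b for b in others if inter_area(b, big) >= overlap_threshold * b[4]]
    if len(inside) < 2:
        return False
    overlaps = any(inter_area(p, q) > 0
                   for i, p in enumerate(inside) for q in inside[i + 1:])
    return not overlaps and sum(b[4] for b in inside) > area_threshold * big[4]

# Boxes as (xmin, ymin, xmax, ymax, area): two halves exactly tile the big table
big = (0, 0, 100, 100, 10000)
halves = [(0, 0, 100, 50, 5000), (0, 50, 100, 100, 5000)]
print(is_big_table_spurious(big, halves))  # → True
```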
def get_res_list_from_layout_res(layout_res, iou_threshold=0.7, overlap_threshold=0.8, area_threshold=0.8):
"""Extract OCR, table and other regions from layout results."""
ocr_res_list = []
table_res_list = []
table_indices = []
single_page_mfdetrec_res = []
for res in layout_res:
if int(res['category_id']) in [13, 14]:
# Categorize regions
for i, res in enumerate(layout_res):
category_id = int(res['category_id'])
if category_id in [13, 14]: # Formula regions
single_page_mfdetrec_res.append({
"bbox": [int(res['poly'][0]), int(res['poly'][1]),
int(res['poly'][4]), int(res['poly'][5])],
})
elif int(res['category_id']) in [0, 1, 2, 4, 6, 7]:
elif category_id in [0, 1, 2, 4, 6, 7]: # OCR regions
ocr_res_list.append(res)
elif int(res['category_id']) in [5]:
elif category_id == 5: # Table regions
table_res_list.append(res)
return ocr_res_list, table_res_list, single_page_mfdetrec_res
table_indices.append(i)
# Process tables: merge high IoU tables first, then filter nested tables
table_res_list, table_indices = merge_high_iou_tables(
table_res_list, layout_res, table_indices, iou_threshold)
filtered_table_res_list = filter_nested_tables(
table_res_list, overlap_threshold, area_threshold)
# Remove filtered out tables from layout_res
if len(filtered_table_res_list) < len(table_res_list):
kept_tables = set(id(table) for table in filtered_table_res_list)
to_remove = [table_indices[i] for i, table in enumerate(table_res_list)
if id(table) not in kept_tables]
for idx in sorted(to_remove, reverse=True):
del layout_res[idx]
return ocr_res_list, filtered_table_res_list, single_page_mfdetrec_res
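The category split can be exercised with a hypothetical `layout_res` (13/14 are formula regions, 0/1/2/4/6/7 are OCR text regions, 5 is a table):

```python
layout_res = [
    {'category_id': 1,  'poly': [0, 0, 10, 0, 10, 10, 0, 10]},    # text
    {'category_id': 5,  'poly': [0, 20, 10, 20, 10, 30, 0, 30]},  # table
    {'category_id': 14, 'poly': [0, 40, 10, 40, 10, 50, 0, 50]},  # formula
]
ocr, tables, formulas = [], [], []
for i, res in enumerate(layout_res):
    cid = int(res['category_id'])
    if cid in [13, 14]:       # formula regions keep only the bbox corners
        formulas.append({'bbox': [res['poly'][0], res['poly'][1],
                                  res['poly'][4], res['poly'][5]]})
    elif cid in [0, 1, 2, 4, 6, 7]:
        ocr.append(res)
    elif cid == 5:
        tables.append(res)
print(len(ocr), len(tables), len(formulas))  # → 1 1 1
```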
def clean_vram(device, vram_threshold=8):
@@ -57,7 +239,7 @@ def clean_vram(device, vram_threshold=8):
def get_vram(device):
if torch.cuda.is_available() and device != 'cpu':
if torch.cuda.is_available() and str(device).startswith("cuda"):
total_memory = torch.cuda.get_device_properties(device).total_memory / (1024 ** 3)  # convert bytes to GB
return total_memory
elif str(device).startswith("npu"):


@@ -1,8 +1,12 @@
lang:
ch:
ch_lite:
det: ch_PP-OCRv3_det_infer.pth
rec: ch_PP-OCRv4_rec_infer.pth
dict: ppocr_keys_v1.txt
ch:
det: ch_PP-OCRv3_det_infer.pth
rec: ch_PP-OCRv4_rec_server_infer.pth
dict: ppocr_keys_v1.txt
en:
det: en_PP-OCRv3_det_infer.pth
rec: en_PP-OCRv4_rec_infer.pth


@@ -437,4 +437,10 @@ class TextRecognizer(BaseOCRV20):
index += 1
pbar.update(current_batch_size)
# Fix NaN values in recognition results
for i in range(len(rec_res)):
text, score = rec_res[i]
if isinstance(score, float) and math.isnan(score):
rec_res[i] = (text, 0.0)
return rec_res, elapse
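The guard matters because a NaN confidence propagates through later comparisons and serialization; a minimal reproduction of the fix:

```python
import math

rec_res = [('hello', 0.98), ('wor1d', float('nan')), ('!', 0.75)]
for i, (text, score) in enumerate(rec_res):
    if isinstance(score, float) and math.isnan(score):
        rec_res[i] = (text, 0.0)  # clamp NaN confidence to 0.0
print(rec_res)  # → [('hello', 0.98), ('wor1d', 0.0), ('!', 0.75)]
```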


@@ -1,3 +1,5 @@
import os
from pathlib import Path
import cv2
import numpy as np
import torch
@@ -17,7 +19,9 @@ class RapidTableModel(object):
if torch.cuda.is_available() and table_sub_model_name == "unitable":
input_args = RapidTableInput(model_type=table_sub_model_name, use_cuda=True, device=get_device())
else:
input_args = RapidTableInput(model_type=table_sub_model_name)
root_dir = Path(__file__).absolute().parent.parent.parent.parent.parent
slanet_plus_model_path = os.path.join(root_dir, 'resources', 'slanet_plus', 'slanet-plus.onnx')
input_args = RapidTableInput(model_type=table_sub_model_name, model_path=slanet_plus_model_path)
else:
raise ValueError(f"Invalid table_sub_model_name: {table_sub_model_name}. It must be one of {sub_model_list}")
@@ -31,26 +35,63 @@ class RapidTableModel(object):
# from rapidocr_onnxruntime import RapidOCR
# self.ocr_engine = RapidOCR()
self.ocr_model_name = "PaddleOCR"
# self.ocr_model_name = "PaddleOCR"
self.ocr_engine = ocr_engine
def predict(self, image):
bgr_image = cv2.cvtColor(np.asarray(image), cv2.COLOR_RGB2BGR)
if self.ocr_model_name == "RapidOCR":
ocr_result, _ = self.ocr_engine(np.asarray(image))
elif self.ocr_model_name == "PaddleOCR":
bgr_image = cv2.cvtColor(np.asarray(image), cv2.COLOR_RGB2BGR)
ocr_result = self.ocr_engine.ocr(bgr_image)[0]
if ocr_result:
ocr_result = [[item[0], item[1][0], item[1][1]] for item in ocr_result if
len(item) == 2 and isinstance(item[1], tuple)]
else:
ocr_result = None
# First check the overall image aspect ratio (height/width)
img_height, img_width = bgr_image.shape[:2]
img_aspect_ratio = img_height / img_width if img_width > 0 else 1.0
img_is_portrait = img_aspect_ratio > 1.2
if img_is_portrait:
det_res = self.ocr_engine.ocr(bgr_image, rec=False)[0]
# Check if table is rotated by analyzing text box aspect ratios
is_rotated = False
if det_res:
vertical_count = 0
for box_ocr_res in det_res:
p1, p2, p3, p4 = box_ocr_res
# Calculate width and height
width = p3[0] - p1[0]
height = p3[1] - p1[1]
aspect_ratio = width / height if height > 0 else 1.0
# Count vertical vs horizontal text boxes
if aspect_ratio < 0.8: # Taller than wide - vertical text
vertical_count += 1
# elif aspect_ratio > 1.2: # Wider than tall - horizontal text
# horizontal_count += 1
# If we have more vertical text boxes than horizontal ones,
# and vertical ones are significant, table might be rotated
if vertical_count >= len(det_res) * 0.3:
is_rotated = True
# logger.debug(f"Text orientation analysis: vertical={vertical_count}, det_res={len(det_res)}, rotated={is_rotated}")
# Rotate image if necessary
if is_rotated:
# logger.debug("Table appears to be in portrait orientation, rotating 90 degrees clockwise")
image = cv2.rotate(np.asarray(image), cv2.ROTATE_90_CLOCKWISE)
bgr_image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
# Continue with OCR on potentially rotated image
ocr_result = self.ocr_engine.ocr(bgr_image)[0]
if ocr_result:
ocr_result = [[item[0], item[1][0], item[1][1]] for item in ocr_result if
len(item) == 2 and isinstance(item[1], tuple)]
else:
logger.error("OCR model not supported")
ocr_result = None
if ocr_result:
table_results = self.table_model(np.asarray(image), ocr_result)
html_code = table_results.pred_html
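The rotation heuristic counts detection boxes that are taller than wide and flags the table when at least 30% qualify. A standalone sketch over hypothetical 4-point boxes (`p1..p4` clockwise from top-left; `looks_rotated` is an illustrative name):

```python
def looks_rotated(det_boxes, vertical_ratio=0.3, aspect_cutoff=0.8):
    """True when enough detected text boxes are taller than wide."""
    if not det_boxes:
        return False
    vertical = 0
    for p1, p2, p3, p4 in det_boxes:
        width = p3[0] - p1[0]
        height = p3[1] - p1[1]
        aspect = width / height if height > 0 else 1.0
        if aspect < aspect_cutoff:  # taller than wide -> vertical text
            vertical += 1
    return vertical >= len(det_boxes) * vertical_ratio

# Two tall boxes out of three -> treated as rotated
boxes = [
    [(0, 0), (10, 0), (10, 50), (0, 50)],
    [(20, 0), (30, 0), (30, 50), (20, 50)],
    [(0, 60), (100, 60), (100, 70), (0, 70)],
]
print(looks_rotated(boxes))  # → True
```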


@@ -997,7 +997,7 @@ def pdf_parse_union(
for index, span in enumerate(need_ocr_list):
ocr_text, ocr_score = ocr_res_list[index]
span['content'] = ocr_text
span['score'] = float(round(ocr_score, 2))
span['score'] = float(f"{ocr_score:.3f}")
# rec_time = time.time() - rec_start
# logger.info(f'ocr-dynamic-rec time: {round(rec_time, 2)}, total images processed: {len(img_crop_list)}')

Binary file not shown.


@@ -109,9 +109,7 @@ def _do_parse(
pdf_bytes = ds._raw_data
local_image_dir, local_md_dir = prepare_env(output_dir, pdf_file_name, parse_method)
image_writer, md_writer = FileBasedDataWriter(local_image_dir), FileBasedDataWriter(
local_md_dir
)
image_writer, md_writer = FileBasedDataWriter(local_image_dir), FileBasedDataWriter(local_md_dir)
image_dir = str(os.path.basename(local_image_dir))
if len(model_list) == 0:
@@ -317,7 +315,26 @@ def batch_do_parse(
infer_results = batch_doc_analyze(dss, parse_method, lang=lang, layout_model=layout_model, formula_enable=formula_enable, table_enable=table_enable)
for idx, infer_result in enumerate(infer_results):
_do_parse(output_dir, pdf_file_names[idx], dss[idx], infer_result.get_infer_res(), parse_method, debug_able, f_draw_span_bbox=f_draw_span_bbox, f_draw_layout_bbox=f_draw_layout_bbox, f_dump_md=f_dump_md, f_dump_middle_json=f_dump_middle_json, f_dump_model_json=f_dump_model_json, f_dump_orig_pdf=f_dump_orig_pdf, f_dump_content_list=f_dump_content_list, f_make_md_mode=f_make_md_mode, f_draw_model_bbox=f_draw_model_bbox, f_draw_line_sort_bbox=f_draw_line_sort_bbox, f_draw_char_bbox=f_draw_char_bbox, lang=lang)
_do_parse(
output_dir = output_dir,
pdf_file_name = pdf_file_names[idx],
pdf_bytes_or_dataset = dss[idx],
model_list = infer_result.get_infer_res(),
parse_method = parse_method,
debug_able = debug_able,
f_draw_span_bbox = f_draw_span_bbox,
f_draw_layout_bbox = f_draw_layout_bbox,
f_dump_md=f_dump_md,
f_dump_middle_json=f_dump_middle_json,
f_dump_model_json=f_dump_model_json,
f_dump_orig_pdf=f_dump_orig_pdf,
f_dump_content_list=f_dump_content_list,
f_make_md_mode=MakeMode.MM_MD,
f_draw_model_bbox=f_draw_model_bbox,
f_draw_line_sort_bbox=f_draw_line_sort_bbox,
f_draw_char_bbox=f_draw_char_bbox,
lang=lang,
)
parse_pdf_methods = click.Choice(['ocr', 'txt', 'auto'])


@@ -80,7 +80,7 @@ Specify Python version 3.10.
.. code:: sh
conda create -n mineru 'python<3.13' -y
conda create -n mineru 'python>=3.10' -y
conda activate mineru
5. Install Applications
@@ -90,16 +90,15 @@ Specify Python version 3.10.
pip install -U magic-pdf[full]
.. admonition:: Important
.. admonition:: TIP
:class: tip
After installation, make sure to check the version of ``magic-pdf`` using the following command:
After installation, you can check the version of ``magic-pdf`` using the following command:
.. code:: sh
magic-pdf --version
If the version number is less than 1.3.0, please report the issue.
6. Download Models
~~~~~~~~~~~~~~~~~~
@@ -178,7 +177,7 @@ Download link: https://repo.anaconda.com/archive/Anaconda3-2024.06-1-Windows-x86
::
conda create -n mineru 'python<3.13' -y
conda create -n mineru 'python>=3.10' -y
conda activate mineru
4. Install Applications
@@ -188,16 +187,15 @@ Download link: https://repo.anaconda.com/archive/Anaconda3-2024.06-1-Windows-x86
pip install -U magic-pdf[full]
.. admonition:: Important
.. admonition:: Tip
:class: tip
❗️After installation, verify the version of ``magic-pdf``:
After installation, you can check the version of ``magic-pdf``:
.. code:: bash
magic-pdf --version
If the version number is less than 1.3.0, please report it in the issues section.
5. Download Models
~~~~~~~~~~~~~~~~~~
@@ -237,7 +235,7 @@ test CUDA-accelerated parsing performance.
.. code:: sh
pip install --force-reinstall torch==2.6.0 torchvision==0.21.1 "numpy<2.0.0" --index-url https://download.pytorch.org/whl/cu124
pip install --force-reinstall torch torchvision --index-url https://download.pytorch.org/whl/cu124
2. **Modify the value of ``"device-mode"``** in the ``magic-pdf.json``


@@ -28,7 +28,7 @@ magic-pdf.json
"layoutreader-model-dir":"/tmp/layoutreader",
"device-mode":"cpu",
"layout-config": {
"model": "layoutlmv3"
"model": "doclayout_yolo"
},
"formula-config": {
"mfd_model": "yolo_v8_mfd",
@@ -37,7 +37,7 @@ magic-pdf.json
},
"table-config": {
"model": "rapid_table",
"enable": false,
"enable": true,
"max_time": 400
},
"config_version": "1.0.0"
@@ -88,10 +88,10 @@ layout-config
.. code:: json
{
"model": "layoutlmv3"
"model": "doclayout_yolo"
}
layout model cannot be disabled now, and we have only one kind of layout model currently.
layout model cannot be disabled now.
formula-config
@@ -132,14 +132,14 @@ table-config
{
"model": "rapid_table",
"enable": false,
"enable": true,
"max_time": 400
}
model
""""""""
Specify the table inference model, options are ['rapid_table', 'tablemaster', 'struct_eqtable']
Specify the table inference model, options are ['rapid_table']
max_time


@@ -29,18 +29,7 @@ filename ``magic-pdf.json``.
How to update models previously downloaded
-------------------------------------------
1. Models downloaded via Git LFS
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Due to feedback from some users that downloading model files using
git lfs was incomplete or resulted in corrupted model files, this
method is no longer recommended.
If you previously downloaded model files via git lfs, you can navigate
to the previous download directory and use the ``git pull`` command to
update the model.
2. Models downloaded via Hugging Face or Model Scope
1. Models downloaded via Hugging Face or Model Scope
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you previously downloaded models via Hugging Face or Model Scope, you


@@ -71,8 +71,8 @@ Also you can try `online demo <https://www.modelscope.cn/studios/OpenDataLab/Min
</tr>
<tr>
<td colspan="3">CUDA Environment</td>
<td>11.8/12.4/12.6</td>
<td>11.8/12.4/12.6</td>
<td>11.8/12.4/12.6/12.8</td>
<td>11.8/12.4/12.6/12.8</td>
<td>None</td>
</tr>
<tr>
@@ -97,7 +97,7 @@ Create an environment
.. code-block:: shell
conda create -n mineru 'python<3.13' -y
conda create -n mineru 'python>=3.10' -y
conda activate mineru
pip install -U "magic-pdf[full]"


@@ -159,9 +159,12 @@ devanagari_lang = [
'sa', 'bgc'
]
other_lang = ['ch', 'en', 'korean', 'japan', 'chinese_cht', 'ta', 'te', 'ka']
add_lang = ['latin', 'arabic', 'cyrillic', 'devanagari']
all_lang = ['', 'auto']
all_lang.extend([*other_lang, *latin_lang, *arabic_lang, *cyrillic_lang, *devanagari_lang])
# all_lang = ['', 'auto']
all_lang = []
# all_lang.extend([*other_lang, *latin_lang, *arabic_lang, *cyrillic_lang, *devanagari_lang])
all_lang.extend([*other_lang, *add_lang])
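With the commented-out lines removed, the dropdown's language list reduces to the eight core languages plus the four script groups:

```python
other_lang = ['ch', 'en', 'korean', 'japan', 'chinese_cht', 'ta', 'te', 'ka']
add_lang = ['latin', 'arabic', 'cyrillic', 'devanagari']
all_lang = []
all_lang.extend([*other_lang, *add_lang])
print(all_lang)
# → ['ch', 'en', 'korean', 'japan', 'chinese_cht', 'ta', 'te', 'ka',
#    'latin', 'arabic', 'cyrillic', 'devanagari']
```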
def to_pdf(file_path):
@@ -192,8 +195,8 @@ if __name__ == '__main__':
file = gr.File(label='Please upload a PDF or image', file_types=['.pdf', '.png', '.jpeg', '.jpg'])
max_pages = gr.Slider(1, 20, 10, step=1, label='Max convert pages')
with gr.Row():
layout_mode = gr.Dropdown(['layoutlmv3', 'doclayout_yolo'], label='Layout model', value='doclayout_yolo')
language = gr.Dropdown(all_lang, label='Language', value='auto')
layout_mode = gr.Dropdown(['doclayout_yolo'], label='Layout model', value='doclayout_yolo')
language = gr.Dropdown(all_lang, label='Language', value='ch')
with gr.Row():
formula_enable = gr.Checkbox(label='Enable formula recognition', value=True)
is_ocr = gr.Checkbox(label='Force enable OCR', value=False)


@@ -7,9 +7,9 @@ numpy>=1.21.6
pydantic>=2.7.2,<2.11
PyMuPDF>=1.24.9,<1.25.0
scikit-learn>=1.0.2
torch>=2.2.2,!=2.5.0,!=2.5.1,<=2.6.0
torch>=2.2.2,!=2.5.0,!=2.5.1
torchvision
transformers>=4.49.0,<5.0.0
transformers>=4.49.0,!=4.51.0,<5.0.0
pdfminer.six==20231228
tqdm>=4.67.1
# The requirements.txt must ensure that only necessary external dependencies are introduced. If there are new dependencies to add, please contact the project administrator.


@@ -26,6 +26,7 @@ if __name__ == '__main__':
setup(
name="magic_pdf", # project name
version=__version__, # version obtained automatically from the git tag
license="AGPL-3.0",
packages=find_packages() + ["magic_pdf.resources"] + ["magic_pdf.model.sub_modules.ocr.paddleocr2pytorch.pytorchocr.utils.resources"], # include all packages
package_data={
"magic_pdf.resources": ["**"], # include every file under the magic_pdf.resources directory
@@ -33,33 +34,54 @@ if __name__ == '__main__':
},
install_requires=parse_requirements('requirements.txt'), # third-party dependencies of the project
extras_require={
"lite": ["paddleocr==2.7.3",
"paddlepaddle==3.0.0b1;platform_system=='Linux'",
"paddlepaddle==2.6.1;platform_system=='Windows' or platform_system=='Darwin'",
],
"lite": [
"paddleocr==2.7.3",
"paddlepaddle==3.0.0b1;platform_system=='Linux'",
"paddlepaddle==2.6.1;platform_system=='Windows' or platform_system=='Darwin'",
],
"full": [
"matplotlib<=3.9.0;platform_system=='Windows'", # 3.9.1+ ships no prebuilt Windows wheels; capped to avoid install failures on Windows machines without a build toolchain
"matplotlib>=3.10;platform_system=='Linux' or platform_system=='Darwin'", # no upper bound on Linux/macOS so bug-fix updates stay available
"ultralytics>=8.3.48", # yolov8, formula detection
"matplotlib>=3.10,<4",
"ultralytics>=8.3.48,<9", # yolov8, formula detection
"doclayout_yolo==0.0.2b1", # doclayout_yolo
"dill>=0.3.9,<1", # doclayout_yolo
"rapid_table>=1.0.3,<2.0.0", # rapid_table
"rapid_table>=1.0.5,<2.0.0", # rapid_table
"PyYAML>=6.0.2,<7", # yaml
"ftfy>=6.3.1,<7", # unimernet_hf
"ftfy>=6.3.1,<7", # unimernet_hf
"openai>=1.70.0,<2", # openai SDK
"shapely>=2.0.7,<3", # imgaug-paddleocr2pytorch
"pyclipper>=1.3.0,<2", # paddleocr2pytorch
"omegaconf>=2.3.0,<3", # paddleocr2pytorch
],
"old_linux": [
"albumentations<=1.4.20", # simsimd, introduced in 1.4.21, does not support Linux systems from 2019 or earlier
],
"full_old_linux": [
"matplotlib>=3.10,<=3.10.1",
"ultralytics>=8.3.48,<=8.3.104", # yolov8, formula detection
"doclayout_yolo==0.0.2b1", # doclayout_yolo
"dill==0.3.9", # doclayout_yolo
"PyYAML==6.0.2", # yaml
"ftfy==6.3.1", # unimernet_hf
"openai==1.71.0", # openai SDK
"shapely==2.1.0", # imgaug-paddleocr2pytorch
"pyclipper==1.3.0.post6", # paddleocr2pytorch
"omegaconf==2.3.0", # paddleocr2pytorch
"albumentations==1.4.20", # simsimd, introduced in 1.4.21, does not support Linux systems from 2019 or earlier
"rapid_table==1.0.3", # newer rapid_table versions depend on an onnxruntime that does not support Linux systems from 2019 or earlier
],
},
description="A practical tool for converting PDF to Markdown", # short description
long_description=long_description, # long description
long_description_content_type="text/markdown", # the README is in Markdown format
url="https://github.com/opendatalab/MinerU",
python_requires=">=3.9", # required Python version
project_urls={
"Home": "https://mineru.net/",
"Repository": "https://github.com/opendatalab/MinerU",
},
keywords=["magic-pdf", "mineru", "MinerU", "convert", "pdf", "markdown"],
classifiers=[
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3.13",
],
python_requires=">=3.10,<4", # required Python version
entry_points={
"console_scripts": [
"magic-pdf = magic_pdf.tools.cli:cli",