diff --git a/README.md b/README.md
index f749d7f4..52492792 100644
--- a/README.md
+++ b/README.md
@@ -43,48 +43,122 @@ # Changelog
-- 2025/08/01 2.1.10 Released
-  - Fixed an issue in the `pipeline` backend where block overlap caused the parsing results to deviate from expectations #3232
-- 2025/07/30 2.1.9 Released
-  - `transformers` 4.54.1 version adaptation
-- 2025/07/28 2.1.8 Released
-  - `sglang` 0.4.9.post5 version adaptation
-- 2025/07/27 2.1.7 Released
-  - `transformers` 4.54.0 version adaptation
-- 2025/07/26 2.1.6 Released
-  - Fixed table parsing issues in handwritten documents when using `vlm` backend
-  - Fixed visualization box position drift issue when document is rotated #3175
-- 2025/07/24 2.1.5 Released
-  - `sglang` 0.4.9 version adaptation, synchronously upgrading the dockerfile base image to sglang 0.4.9.post3
-- 2025/07/23 2.1.4 Released
-  - Bug Fixes
-    - Fixed the issue of excessive memory consumption during the `MFR` step in the `pipeline` backend under certain scenarios #2771
-    - Fixed the inaccurate matching between `image`/`table` and `caption`/`footnote` under certain conditions #3129
-- 2025/07/16 2.1.1 Released
-  - Bug fixes
-    - Fixed text block content loss issue that could occur in certain `pipeline` scenarios #3005
-    - Fixed issue where `sglang-client` required unnecessary packages like `torch` #2968
-    - Updated `dockerfile` to fix incomplete text content parsing due to missing fonts in Linux #2915
-  - Usability improvements
-    - Updated `compose.yaml` to facilitate direct startup of `sglang-server`, `mineru-api`, and `mineru-gradio` services
-    - Launched brand new [online documentation site](https://opendatalab.github.io/MinerU/), simplified readme, providing better documentation experience
-- 2025/07/05 Version 2.1.0 Released
-  - This is the first major update of MinerU 2, which includes a large number of new features and improvements, covering significant performance optimizations, user experience enhancements, and bug fixes. The detailed update contents are as follows:
-  - **Performance Optimizations:**
-    - Significantly improved preprocessing speed for documents with specific resolutions (around 2000 pixels on the long side).
-    - Greatly enhanced post-processing speed when the `pipeline` backend handles batch processing of documents with fewer pages (<10 pages).
-    - Layout analysis speed of the `pipeline` backend has been increased by approximately 20%.
-  - **Experience Enhancements:**
-    - Built-in ready-to-use `fastapi service` and `gradio webui`. For detailed usage instructions, please refer to [Documentation](https://opendatalab.github.io/MinerU/usage/quick_usage/#advanced-usage-via-api-webui-sglang-clientserver).
-    - Adapted to `sglang` version `0.4.8`, significantly reducing the GPU memory requirements for the `vlm-sglang` backend. It can now run on graphics cards with as little as `8GB GPU memory` (Turing architecture or newer).
-    - Added transparent parameter passing for all commands related to `sglang`, allowing the `sglang-engine` backend to receive all `sglang` parameters consistently with the `sglang-server`.
-    - Supports feature extensions based on configuration files, including `custom formula delimiters`, `enabling heading classification`, and `customizing local model directories`. For detailed usage instructions, please refer to [Documentation](https://opendatalab.github.io/MinerU/usage/quick_usage/#extending-mineru-functionality-with-configuration-files).
-  - **New Features:**
-    - Updated the `pipeline` backend with the PP-OCRv5 multilingual text recognition model, supporting text recognition in 37 languages such as French, Spanish, Portuguese, Russian, and Korean, with an average accuracy improvement of over 30%. [Details](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/algorithm/PP-OCRv5/PP-OCRv5_multi_languages.html)
-    - Introduced limited support for vertical text layout in the `pipeline` backend.
+
+- 2025/09/05 2.2.0 Released
+  - Major Updates
+    - In this version, we focused on improving table parsing accuracy by introducing a new [wired table recognition model](https://github.com/RapidAI/TableStructureRec) and a brand-new hybrid table structure parsing algorithm, significantly enhancing the table recognition capabilities of the `pipeline` backend.
+    - We also added cross-page table merging, supported by both the `pipeline` and `vlm` backends, further improving the completeness and accuracy of table parsing.
+  - Other Updates
+    - The `pipeline` backend now supports parsing tables rotated by 270 degrees, so tables in 0/90/270-degree orientations are all supported
+    - The `pipeline` backend added OCR support for Thai and Greek and updated the English OCR model to the latest version. English recognition accuracy improved by 11%; the Thai and Greek recognition models reach 82.68% and 89.28% accuracy, respectively (by PP-OCRv5)
+    - Added a `bbox` field (mapped to the 0-1000 range) to the output `content_list.json`, making it convenient for users to directly obtain the position of each content block
+
History Log + +
+ 2025/08/01 2.1.10 Released + +
+ +
+ 2025/07/30 2.1.9 Released + +
+ +
+ 2025/07/28 2.1.8 Released + +
+ +
+ 2025/07/27 2.1.7 Released + +
+ +
+ 2025/07/26 2.1.6 Released + +
+ +
+ 2025/07/24 2.1.5 Released + +
+ +
+ 2025/07/23 2.1.4 Released + +
+ +
+ 2025/07/16 2.1.1 Released + +
+ +
+ 2025/07/05 2.1.0 Released + +
+
2025/06/20 2.0.6 Released