[](https://github.com/opendatalab/MinerU)
[](https://github.com/opendatalab/MinerU/issues)
[](https://pypi.org/project/mineru/)
[](https://pepy.tech/project/mineru)
[](https://mineru.net/OpenSourceTools/Extractor?source=github)
[](https://huggingface.co/spaces/opendatalab/MinerU)
[](https://www.modelscope.cn/studios/OpenDataLab/MinerU)
[](https://colab.research.google.com/gist/myhloli/a3cb16570ab3cfeadf9d8f0ac91b4fca/mineru_demo.ipynb)
[](https://arxiv.org/abs/2409.18839)
[](https://arxiv.org/abs/2509.22186)
[](https://deepwiki.com/opendatalab/MinerU)

[English](README.md) | [简体中文](README_zh-CN.md)
🚀 Access MinerU now: ✅ zero-install web version ✅ full-featured desktop client ✅ instant API access. Skip deployment headaches and get all product formats in one click. Developers, dive in!

👋 Join us on Discord and WeChat
# Changelog
- 2026/02/06 2.7.6 Release
  - Added support for the domestic computing platforms Kunlunxin and Tecorigin. The domestic computing platforms currently adapted and supported by the official team and vendors are:
    - [Ascend](https://opendatalab.github.io/MinerU/zh/usage/acceleration_cards/Ascend)
    - [T-Head](https://opendatalab.github.io/MinerU/zh/usage/acceleration_cards/THead)
    - [METAX](https://opendatalab.github.io/MinerU/zh/usage/acceleration_cards/METAX)
    - [Hygon](https://opendatalab.github.io/MinerU/zh/usage/acceleration_cards/Hygon/)
    - [Enflame](https://opendatalab.github.io/MinerU/zh/usage/acceleration_cards/Enflame/)
    - [MooreThreads](https://opendatalab.github.io/MinerU/zh/usage/acceleration_cards/MooreThreads/)
    - [IluvatarCorex](https://opendatalab.github.io/MinerU/zh/usage/acceleration_cards/IluvatarCorex/)
    - [Cambricon](https://opendatalab.github.io/MinerU/zh/usage/acceleration_cards/Cambricon/)
    - [Kunlunxin](https://opendatalab.github.io/MinerU/zh/usage/acceleration_cards/Kunlunxin/)
    - [Tecorigin](https://opendatalab.github.io/MinerU/zh/usage/acceleration_cards/Tecorigin/)
    - [Biren](https://opendatalab.github.io/MinerU/zh/usage/acceleration_cards/Biren/)
  - MinerU continues to support domestic hardware platforms and mainstream chip architectures. With secure and reliable technology, it helps research, government, and enterprise users reach new heights in document digitization!
- 2026/01/30 2.7.4 Release
  - Added support for the domestic computing platforms IluvatarCorex and Cambricon.
- 2026/01/23 2.7.2 Release
  - Added support for the domestic computing platforms Hygon, Enflame, and Moore Threads.
  - Optimized cross-page table merging, improving both merge success rate and merge quality.
- 2026/01/06 2.7.1 Release
  - Fixed bug #4300.
  - Updated the pdfminer.six dependency version to resolve [CVE-2025-64512](https://github.com/advisories/GHSA-wf5f-4jwr-ppcp).
  - Added automatic correction of input image EXIF orientation to improve OCR recognition accuracy (#4283).
- 2025/12/30 2.7.0 Release
  - Simplified the installation process. The `vlm` acceleration engine dependencies no longer need to be installed separately; running `uv pip install mineru[all]` installs all optional backend dependencies.
  - Added a new `hybrid` backend that combines the advantages of the `pipeline` and `vlm` backends. Built on `vlm`, it integrates some capabilities of `pipeline`, adding extensibility on top of high accuracy:
    - Extracts text directly from text PDFs, natively supports multi-language recognition in text PDF scenarios, and greatly reduces parsing hallucinations.
    - Supports text recognition in 109 languages for scanned PDF scenarios when the OCR language is specified.
    - Provides an independent inline formula recognition switch, which can be turned off when inline formula recognition is not needed, improving the visual quality of parsing results.
  - Simplified the engine selection logic for the `vlm/hybrid` backends. Users only need to specify the backend as `*-auto-engine`, and the system automatically selects the appropriate inference acceleration engine for the current environment.
  - Switched the default parsing backend from `pipeline` to `hybrid-auto-engine`, which makes out-of-the-box results more consistent for new users and reduces confusion over differing parsing results.
  - Added i18n support to the Gradio application, which can now switch between Chinese and English.
> 📝 View the complete [Changelog](https://opendatalab.github.io/MinerU/reference/changelog/) for more historical version information
# MinerU
## Project Introduction
MinerU is a tool that converts PDFs into machine-readable formats (e.g., markdown, JSON), allowing for easy extraction into any format.
MinerU was born during the pre-training process of [InternLM](https://github.com/InternLM/InternLM). We focus on solving symbol conversion issues in scientific literature and hope to contribute to technological development in the era of large models.
Compared to well-known commercial products, MinerU is still young. If you encounter any issues or the results are not as expected, please submit an [issue](https://github.com/opendatalab/MinerU/issues) and **attach the relevant PDF**.
https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
## Key Features
- Remove headers, footers, footnotes, page numbers, etc., to ensure semantic coherence.
- Output text in human-readable order, suitable for single-column, multi-column, and complex layouts.
- Preserve the structure of the original document, including headings, paragraphs, lists, etc.
- Extract images, image descriptions, tables, table titles, and footnotes.
- Automatically recognize and convert formulas in the document to LaTeX format.
- Automatically recognize and convert tables in the document to HTML format.
- Automatically detect scanned PDFs and garbled PDFs and enable OCR functionality.
- OCR supports detection and recognition of 109 languages.
- Supports multiple output formats, such as multimodal and NLP Markdown, JSON sorted by reading order, and rich intermediate formats.
- Supports various visualization results, including layout visualization and span visualization, for efficient confirmation of output quality.
- Supports running in a pure CPU environment, as well as GPU (CUDA), NPU (CANN), and MPS acceleration.
- Compatible with Windows, Linux, and Mac platforms.
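
Because tables are converted to HTML, downstream code can read them with an ordinary HTML parser. The following is a minimal sketch using only the Python standard library. It is not part of MinerU: the `TableRows` class, the `table_to_rows` helper, and the sample table string are all hypothetical and exist only for illustration. It also assumes simple tables with no nested tables, and it ignores `rowspan`/`colspan`.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_rows(html):
    """Return the table as a list of rows (hypothetical helper)."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample table, not real MinerU output.
sample = (
    "<table>"
    "<tr><th>Model</th><th>Score</th></tr>"
    "<tr><td>A</td><td>0.91</td></tr>"
    "</table>"
)
rows = table_to_rows(sample)
```

Here `rows` is a list of lists of strings, one inner list per table row, which can be passed on to `csv.writer` or a dataframe library as needed.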
# Quick Start
If you encounter any installation issues, please first consult the