Mirror of https://github.com/opendatalab/MinerU.git
Magic-PDF
Introduction
Magic-PDF is a tool that converts PDF documents to Markdown. It can process files stored locally or in object storage that supports the S3 protocol.
Key features include:
- Support for multiple front-end model inputs
- Removal of headers, footers, footnotes, and page numbers
- Human-readable layout formatting
- Extraction and display of images and tables within markdown
- Conversion of equations into LaTeX format
- Automatic detection and conversion of garbled PDFs
- Compatibility with CPU and GPU environments
- Available for Windows, Linux, and macOS platforms
Getting Started
Requirements
- Python 3.9 or newer
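Before installing, it can help to confirm that the interpreter meets the requirement above. The following is a minimal sketch; the helper name `meets_requirement` is hypothetical and not part of Magic-PDF.

```python
import sys

# Minimum interpreter version stated in the Requirements section.
MIN_VERSION = (3, 9)


def meets_requirement(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`.

    Hypothetical helper for illustration; not part of Magic-PDF.
    """
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_requirement():
        print(f"Python {current} satisfies the 3.9+ requirement.")
    else:
        print(f"Python {current} is too old; 3.9 or newer is required.")
```

Comparing tuples avoids the pitfalls of comparing version strings, where "3.10" would sort before "3.9".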
Usage Instructions
- Install Magic-PDF
pip install "magic-pdf[cpu]" # Install the CPU version
or
pip install "magic-pdf[gpu]" # Install the GPU version
- Usage via Command Line
magic-pdf --help
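The same help command can also be run from a Python script, for example to check that the installation put the `magic-pdf` executable on the PATH. This is a minimal sketch using only the standard library; the function name `cli_help` is hypothetical, and the only flag used is the `--help` shown above.

```python
import shutil
import subprocess


def cli_help(executable="magic-pdf"):
    """Return the CLI's --help text, or None if the executable is not on PATH.

    Hypothetical helper for illustration; it only wraps the documented
    `magic-pdf --help` invocation.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--help"],
        capture_output=True,
        text=True,
        check=False,  # do not raise on a non-zero exit code
    )
    # Some CLIs print help to stderr, so fall back to it if stdout is empty.
    return result.stdout or result.stderr


if __name__ == "__main__":
    text = cli_help()
    if text is None:
        print("magic-pdf was not found on PATH; install it first.")
    else:
        print(text)
```

Checking with `shutil.which` first gives a clear message when the package is not installed, rather than an exception from `subprocess.run`.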
License Information
See LICENSE.md for details.
Acknowledgments