
Magic-PDF

Introduction

Magic-PDF is a tool that converts PDF documents into Markdown. It can process files stored locally or on object storage that supports the S3 protocol.

Key features include:

  • Support for multiple front-end model inputs
  • Removal of headers, footers, footnotes, and page numbers
  • Human-readable layout formatting
  • Extraction and display of images and tables within markdown
  • Conversion of equations into LaTeX format
  • Automatic detection and conversion of garbled PDFs
  • Compatibility with CPU and GPU environments
  • Available for Windows, Linux, and macOS platforms

Getting Started

Requirements

  • Python 3.9 or newer
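The version requirement can be checked before installing. The following is a minimal sketch, not part of Magic-PDF itself; the names `MIN_PYTHON` and `meets_requirement` are hypothetical and exist only for this illustration.

```python
import sys

# Minimum interpreter version stated in the requirements above.
MIN_PYTHON = (3, 9)


def meets_requirement(version=None):
    """Return True if `version` (default: the running interpreter) is at least MIN_PYTHON.

    `version` may be any sequence whose first two items are the major and
    minor version numbers, such as sys.version_info or (3, 9).
    """
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if meets_requirement():
        print(f"Python {major}.{minor} satisfies the 3.9+ requirement")
    else:
        print(f"Python 3.9 or newer is required; found {major}.{minor}")
```

Comparing only the major and minor components keeps the check independent of patch releases.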

Usage Instructions

  1. Install Magic-PDF
     pip install magic-pdf[cpu]   # Install the CPU version
     or
     pip install magic-pdf[gpu]   # Install the GPU version
  2. Usage via Command Line
     magic-pdf --help
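To confirm from a script that the installation put the command on the PATH, something like the sketch below may help. It relies only on the `magic-pdf --help` invocation shown above and on the Python standard library; the helper name `cli_help` is hypothetical, and no other Magic-PDF options are assumed.

```python
import shutil
import subprocess


def cli_help(executable="magic-pdf"):
    """Return the help text printed by `<executable> --help`.

    Returns None when the executable cannot be found on PATH, which usually
    means the package is not installed in the active environment.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    # check=False: report whatever the CLI printed, even on a non-zero exit code.
    result = subprocess.run(
        [path, "--help"], capture_output=True, text=True, check=False
    )
    return result.stdout or result.stderr


if __name__ == "__main__":
    text = cli_help()
    if text is None:
        print("magic-pdf was not found on PATH; is it installed?")
    else:
        print(text)
```

Resolving the executable with `shutil.which` first separates "not installed" from "installed but failing", which is typically the first thing to rule out after a fresh install.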

License Information

See LICENSE.md for details.

Acknowledgments
