# MinerU v2.0 Multi-GPU Server

A streamlined multi-GPU server implementation.
## Quick Start

### 1. Install MinerU

```bash
pip install --upgrade pip
pip install uv
uv pip install -U "mineru[core]"
uv pip install litserve aiohttp loguru
```
### 2. Start the Server

```bash
python server.py
```
### 3. Start the Client

```bash
python client.py
```
Now, PDF files under the `demo` folder will be processed in parallel. For example, if you have 2 GPUs and set `workers_per_device` to 2, four PDF files will be processed at the same time!
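The scaling described above follows a simple capacity rule: each worker handles one PDF at a time, so the number of files processed in parallel is the GPU count times `workers_per_device`. A minimal sketch (an assumption based on LitServe's worker model; actual throughput also depends on GPU memory and model size):

```python
# Rough capacity rule: each worker processes one PDF at a time,
# so parallel capacity = number of GPUs x workers_per_device.
def max_parallel_pdfs(num_gpus: int, workers_per_device: int) -> int:
    return num_gpus * workers_per_device

print(max_parallel_pdfs(2, 2))  # 2 GPUs x 2 workers -> 4
```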
## Customize

### Server

Example showing how to start the server with custom settings:

```python
server = ls.LitServer(
    MinerUAPI(output_dir='/tmp/mineru_output'),
    accelerator='auto',    # You can specify 'cuda'
    devices='auto',        # "auto" uses all available GPUs
    workers_per_device=1,  # One worker instance per GPU
    timeout=False          # Disable timeout for long processing
)
server.run(port=8000, generate_client_file=False)
```
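If you want to call the server without `client.py`, LitServe exposes a `/predict` route by default. The sketch below only builds a JSON-friendly payload; the `file` field and option keys are assumed names, not the confirmed API — check `client.py` for the exact request format:

```python
import base64

# Hypothetical payload builder for a raw HTTP call to the server.
# The field names here are assumptions; see client.py for the real contract.
SERVER_URL = 'http://localhost:8000/predict'  # LitServe's default route

def build_request(pdf_bytes: bytes, **options) -> dict:
    # PDFs are binary, so base64-encode them for a JSON body
    return {'file': base64.b64encode(pdf_bytes).decode('ascii'), **options}

payload = build_request(b'%PDF-1.4', backend='pipeline', lang='ch')
# requests.post(SERVER_URL, json=payload)  # with a running server
```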
### Client

The client supports both synchronous and asynchronous processing:

```python
import asyncio

import aiohttp

from client import mineru_parse_async

async def process_documents():
    async with aiohttp.ClientSession() as session:
        # Basic usage
        result = await mineru_parse_async(session, 'document.pdf')

        # With custom options
        result = await mineru_parse_async(
            session,
            'document.pdf',
            backend='pipeline',
            lang='ch',
            formula_enable=True,
            table_enable=True
        )

# Run async processing
asyncio.run(process_documents())
```
### Concurrent Processing

Process multiple files simultaneously:

```python
async def process_multiple_files():
    files = ['doc1.pdf', 'doc2.pdf', 'doc3.pdf']
    async with aiohttp.ClientSession() as session:
        tasks = [mineru_parse_async(session, file) for file in files]
        results = await asyncio.gather(*tasks)
        return results
```
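For large batches, `asyncio.gather` over every file at once can flood the server. One way to stay within capacity is to bound the number of in-flight requests with a semaphore. A sketch using a placeholder coroutine in place of `mineru_parse_async` (swap in the real call and an `aiohttp` session):

```python
import asyncio

async def parse(path: str) -> str:
    # Stand-in for `await mineru_parse_async(session, path)`
    await asyncio.sleep(0)
    return f'parsed:{path}'

async def process_with_limit(files, max_in_flight=4):
    # Match max_in_flight to server capacity: GPUs x workers_per_device
    sem = asyncio.Semaphore(max_in_flight)

    async def bounded(path):
        async with sem:
            return await parse(path)

    return await asyncio.gather(*(bounded(f) for f in files))

results = asyncio.run(process_with_limit(['doc1.pdf', 'doc2.pdf']))
```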