Mirror of https://github.com/opendatalab/MinerU.git (synced 2026-03-27 11:08:32 +07:00)
build(docker): update docker build step (#471)
* build(docker): update base image to Ubuntu 22.04 and install PaddlePaddle

  Upgrade the Docker base image from ubuntu:latest to ubuntu:22.04 for improved performance and stability. Additionally, integrate PaddlePaddle GPU version 3.0.0b1 into the Docker build for enhanced AI capabilities. The MinIO configuration file has also been updated to the latest version.

* build(dockerfile): Updated the Dockerfile

* build(Dockerfile): update Dockerfile

* docs(docker): add instructions for quick deployment with Docker

  Include Docker-based deployment instructions in the README for both English and Chinese locales. This update provides users a quick-start guide to using Docker for deployment, with notes on GPU VRAM requirements and default acceleration features.

* build(docker): Layer the installation of dependencies, downloading the model, and the setup of the program itself.

* build(docker): Layer the installation of dependencies, downloading the model, and the setup of the program itself.
26
Dockerfile
@@ -1,5 +1,5 @@
 # Use the official Ubuntu base image
-FROM ubuntu:latest
+FROM ubuntu:22.04

 # Set environment variables to non-interactive to avoid prompts during installation
 ENV DEBIAN_FRONTEND=noninteractive
@@ -29,17 +29,23 @@ RUN python3 -m venv /opt/mineru_venv

 # Activate the virtual environment and install necessary Python packages
 RUN /bin/bash -c "source /opt/mineru_venv/bin/activate && \
-    pip install --upgrade pip && \
-    pip install magic-pdf[full-cpu] detectron2 --extra-index-url https://myhloli.github.io/wheels/"
+    pip3 install --upgrade pip && \
+    wget https://gitee.com/myhloli/MinerU/raw/master/requirements-docker.txt && \
+    pip3 install -r requirements-docker.txt --extra-index-url https://wheels.myhloli.com -i https://pypi.tuna.tsinghua.edu.cn/simple && \
+    pip3 install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/"

-# Copy the configuration file template and set up the model directory
-COPY magic-pdf.template.json /root/magic-pdf.json
+# Copy the configuration file template and install magic-pdf latest
+RUN /bin/bash -c "wget https://gitee.com/myhloli/MinerU/raw/master/magic-pdf.template.json && \
+    cp magic-pdf.template.json /root/magic-pdf.json && \
+    source /opt/mineru_venv/bin/activate && \
+    pip3 install magic-pdf==0.7.0b1"

-# Set the models directory in the configuration file (adjust the path as needed)
-RUN sed -i 's|/tmp/models|/opt/models|g' /root/magic-pdf.json
-
-# Create the models directory
-RUN mkdir -p /opt/models
+# Download models and update the configuration file
+RUN /bin/bash -c "pip3 install modelscope && \
+    wget https://gitee.com/myhloli/MinerU/raw/master/docs/download_models.py && \
+    python3 download_models.py && \
+    sed -i 's|/tmp/models|/root/.cache/modelscope/hub/wanderkid/PDF-Extract-Kit/models|g' /root/magic-pdf.json && \
+    sed -i 's|cpu|cuda|g' /root/magic-pdf.json"

 # Set the entry point to activate the virtual environment and run the command line tool
 ENTRYPOINT ["/bin/bash", "-c", "source /opt/mineru_venv/bin/activate && exec \"$@\"", "--"]
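The two `sed` substitutions at the end of the new model-download step can be tried outside the image. The following is a minimal sketch, assuming a config file with `models-dir` and `device-mode` keys; the sample file and its key names are assumptions made for illustration, not the confirmed contents of `magic-pdf.template.json`. It relies on GNU `sed` for `-i`, which is what the Ubuntu base image ships.

```shell
# Hypothetical stand-in for magic-pdf.template.json; the key names are assumptions.
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
{
  "models-dir": "/tmp/models",
  "device-mode": "cpu"
}
EOF

# The same two substitutions the Dockerfile runs against /root/magic-pdf.json.
sed -i 's|/tmp/models|/root/.cache/modelscope/hub/wanderkid/PDF-Extract-Kit/models|g' "$cfg"
sed -i 's|cpu|cuda|g' "$cfg"

# Show the rewritten config.
cat "$cfg"
```

Because `s|cpu|cuda|g` matches the bare string `cpu`, it rewrites every occurrence in the file rather than one specific value, which is worth keeping in mind if the template ever contains that substring elsewhere.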
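The unchanged `ENTRYPOINT` depends on how `bash -c` assigns positional parameters: the first argument after the script string becomes `$0` (here the placeholder `--`), and the remaining arguments become `$@`, which `exec` then runs in place of the shell. A minimal sketch of that mechanism, with the `source /opt/mineru_venv/bin/activate` step left out so it runs anywhere bash is installed:

```shell
# Same shape as the ENTRYPOINT, minus the virtualenv activation:
# "--" fills $0, so the command and its arguments land in "$@".
out="$(/bin/bash -c 'exec "$@"' -- echo hello from the container command)"
echo "$out"

# $0 holds the placeholder; it is not part of the command that gets exec'd.
zero="$(/bin/bash -c 'echo "$0"' -- ignored)"
echo "$zero"
```

Docker appends the arguments given to `docker run` after the `ENTRYPOINT` array, so `docker run ... mineru:0.7.0b1 /bin/bash` from the README ends up exec-ing `/bin/bash` with the virtual environment already activated.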
@@ -227,6 +227,14 @@ If your device supports CUDA and meets the GPU requirements of the mainline envi

 - [Ubuntu 22.04 LTS + GPU](docs/README_Ubuntu_CUDA_Acceleration_en_US.md)
 - [Windows 10/11 + GPU](docs/README_Windows_CUDA_Acceleration_en_US.md)
+- Quick Deployment with Docker
+> Docker requires a GPU with at least 16GB of VRAM, and all acceleration features are enabled by default.
+```bash
+wget https://github.com/opendatalab/MinerU/raw/master/Dockerfile
+docker build -t mineru:0.7.0b1 .
+docker run --rm -it --gpus=all mineru:0.7.0b1 /bin/bash
+magic-pdf --help
+```

 ## Usage

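The 16GB VRAM note in the added README text can be checked on the host before building the image. The sketch below is one possible pre-flight check, not part of the commit. It assumes "16GB" means 16 * 1024 MiB, which is a guess: some cards marketed as 16GB report slightly less, so `min_mib` may need lowering. The helper name `has_enough_vram` is hypothetical.

```shell
# Assumption: "16GB" is read as 16 * 1024 MiB; adjust if your card reports slightly less.
min_mib=$((16 * 1024))

# Hypothetical helper: succeeds when the given total memory (in MiB) meets the threshold.
has_enough_vram() {
  [ "$1" -ge "$min_mib" ]
}

# Only query the driver when nvidia-smi is available; otherwise skip the check.
if command -v nvidia-smi >/dev/null 2>&1; then
  # memory.total is reported in MiB; only the first GPU is inspected here.
  total_mib="$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)"
  if has_enough_vram "$total_mib"; then
    echo "First GPU reports ${total_mib} MiB: meets the stated requirement"
  else
    echo "First GPU reports ${total_mib} MiB: below the stated requirement"
  fi
else
  echo "nvidia-smi not found; cannot check VRAM"
fi
```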
@@ -230,6 +230,15 @@ cp magic-pdf.template.json ~/magic-pdf.json

 - [Ubuntu22.04LTS + GPU](docs/README_Ubuntu_CUDA_Acceleration_zh_CN.md)
 - [Windows10/11 + GPU](docs/README_Windows_CUDA_Acceleration_zh_CN.md)
+- Quick deployment with Docker
+> Docker requires a device with at least 16GB of GPU VRAM; all acceleration features are enabled by default.
+```bash
+wget https://github.com/opendatalab/MinerU/raw/master/Dockerfile
+docker build -t mineru:0.7.0b1 .
+docker run --rm -it --gpus=all mineru:0.7.0b1 /bin/bash
+magic-pdf --help
+```
+

 ## Usage

18
requirements-docker.txt
Normal file
@@ -0,0 +1,18 @@
+boto3>=1.28.43
+Brotli>=1.1.0
+click>=8.1.7
+PyMuPDF>=1.24.9
+loguru>=0.6.0
+numpy>=1.21.6,<2.0.0
+fast-langdetect==0.2.0
+wordninja>=2.0.0
+scikit-learn>=1.0.2
+pdfminer.six==20231228
+unimernet==0.1.6
+matplotlib
+ultralytics
+paddleocr==2.7.3
+paddlepaddle==3.0.0b1
+pypandoc
+struct-eqtable==0.1.0
+detectron2
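The new requirements file mixes exact pins, lower bounds, and unconstrained packages, which affects how reproducible an image build is over time. As a rough illustration, the sketch below copies the 18 lines from the diff into a temporary file and counts each kind with `grep`; the variable names are arbitrary.

```shell
# Contents copied from the requirements-docker.txt added in this commit.
req="$(mktemp)"
cat > "$req" <<'EOF'
boto3>=1.28.43
Brotli>=1.1.0
click>=8.1.7
PyMuPDF>=1.24.9
loguru>=0.6.0
numpy>=1.21.6,<2.0.0
fast-langdetect==0.2.0
wordninja>=2.0.0
scikit-learn>=1.0.2
pdfminer.six==20231228
unimernet==0.1.6
matplotlib
ultralytics
paddleocr==2.7.3
paddlepaddle==3.0.0b1
pypandoc
struct-eqtable==0.1.0
detectron2
EOF

pinned=$(grep -c '==' "$req")        # exact versions
ranged=$(grep -c '>=' "$req")        # lower bound, optionally with an upper bound
unpinned=$(grep -cv '[=<>]' "$req")  # no version constraint at all

echo "pinned=$pinned ranged=$ranged unpinned=$unpinned"
# prints: pinned=6 ranged=8 unpinned=4
```

The four unconstrained entries (`matplotlib`, `ultralytics`, `pypandoc`, `detectron2`) resolve to whatever the configured indexes serve at build time, so two builds of the same Dockerfile may not install identical versions.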