docs: update readme

This commit is contained in:
myhloli
2024-07-13 20:05:53 +08:00
parent 19fd0a401e
commit 4d6dcb008a
2 changed files with 24 additions and 26 deletions

View File

@@ -94,9 +94,9 @@ Alternatively, for built-in high-precision model parsing capabilities, use:
```bash
pip install magic-pdf[full-cpu]
```
The high-precision models depend on detectron2, which must be compiled before it can be installed.
To compile it yourself, refer to https://github.com/facebookresearch/detectron2/issues/5114
Alternatively, use our pre-compiled wheel packages (Python 3.10 only):
```bash
pip install detectron2 --extra-index-url https://myhloli.github.io/wheels/
```
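Because the pre-compiled wheels are limited to Python 3.10, it may help to check the interpreter before installing. The following is a minimal sketch; the helper name `prebuilt_wheel_supported` is hypothetical and is not part of magic-pdf or detectron2, and it assumes the wheels only care about the major.minor version.

```python
import sys


def prebuilt_wheel_supported(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.10.

    Assumption: the pre-compiled detectron2 wheels match on major.minor only.
    """
    return tuple(version_info[:2]) == (3, 10)


if not prebuilt_wheel_supported():
    # Fall back to the manual build route referenced above.
    print("Python 3.10 not detected; compile detectron2 yourself instead.")
```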
@@ -104,7 +104,7 @@ pip install detectron2 --extra-index-url https://myhloli.github.io/wheels/
#### 2. Downloading model weights files
For details, see [how_to_download_models](docs/how_to_download_models_en.md).
After downloading the model weights, move the 'models' directory to a disk with ample free space, preferably an SSD.
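Once the 'models' directory has been moved, the "models-dir" entry in magic-pdf.json has to point at the new location. This can be edited by hand; the sketch below shows one way to script it. The helper names and the example path `/mnt/ssd/models` are hypothetical, and the sketch assumes the configuration file is plain JSON with "models-dir" as a top-level key.

```python
import json
from pathlib import Path


def with_models_dir(config, models_dir):
    """Return a copy of the config dict with "models-dir" set."""
    updated = dict(config)
    updated["models-dir"] = str(models_dir)
    return updated


def update_config_file(config_path, models_dir):
    """Rewrite the JSON config at config_path with the new "models-dir"."""
    path = Path(config_path).expanduser()
    config = json.loads(path.read_text(encoding="utf-8"))
    path.write_text(
        json.dumps(with_models_dir(config, models_dir), indent=4),
        encoding="utf-8",
    )
```

For example, `update_config_file("~/magic-pdf.json", "/mnt/ssd/models")` would update a configuration file stored in the home directory, leaving all other keys unchanged.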
@@ -130,9 +130,9 @@ In magic-pdf.json, configure "models-dir" to point to the directory where the mo
```bash
magic-pdf pdf-command --pdf "pdf_path" --inside_model true
```
After the program finishes, the generated markdown files are in the "/tmp/magic-pdf" directory.
The corresponding xxx_model.json file is in the markdown directory.
To do secondary development on the post-processing pipeline, use the command:
```bash
magic-pdf pdf-command --pdf "pdf_path" --model "model_json_path"
```
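To find a value for "model_json_path", one option is to search the output directory for files ending in `_model.json`. This is a rough sketch only: the exact layout under "/tmp/magic-pdf" is not specified here, so the search is recursive, and the helper name `find_model_jsons` is hypothetical.

```python
from pathlib import Path


def find_model_jsons(output_dir="/tmp/magic-pdf"):
    """Return sorted paths of *_model.json files found under output_dir."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*_model.json"))


for model_json in find_model_jsons():
    # Each printed path is a candidate for the --model option.
    print(model_json)
```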
@@ -150,12 +150,12 @@ magic-pdf --help
##### CUDA
Install the PyTorch build that matches your CUDA version.
This example installs the CUDA 11.8 build. For more information, see https://pytorch.org/get-started/locally/
```bash
# When using the GPU solution, you need to reinstall PyTorch for the corresponding CUDA version. This example installs the CUDA 11.8 version.
pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118
```
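After reinstalling PyTorch, it can be useful to confirm which accelerator it actually detects before changing "device-mode" in magic-pdf.json. The sketch below is not part of magic-pdf; the helper names are hypothetical, and the "cpu" fallback value is an assumption, since only "cuda" and "mps" appear in this README.

```python
def choose_device_mode(cuda_available, mps_available):
    """Pick a "device-mode" value, preferring CUDA, then MPS.

    Assumption: "cpu" is an accepted fallback value; the README only
    shows "cuda" and "mps".
    """
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device_mode():
    """Query PyTorch, if installed, for the available accelerator."""
    try:
        import torch
    except ImportError:
        return choose_device_mode(False, False)
    # Older PyTorch builds may not ship the MPS backend module.
    mps = getattr(torch.backends, "mps", None)
    return choose_device_mode(
        torch.cuda.is_available(),
        mps is not None and mps.is_available(),
    )


print(detect_device_mode())
```

If this prints "cpu" on a machine with an NVIDIA GPU, the installed PyTorch build most likely does not match the local CUDA version.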
You also need to set the value of "device-mode" in the configuration file magic-pdf.json:
```json
{
"device-mode":"cuda"
@@ -164,9 +164,8 @@ Also, you need to modify the value of "device-mode" in the configuration file ma
##### MPS
On macOS devices with M-series chips, you can use MPS for inference acceleration.
You also need to set the value of "device-mode" in the configuration file magic-pdf.json:
```json
{
"device-mode":"mps"

View File

@@ -70,7 +70,7 @@ https://github.com/opendatalab/MinerU/assets/11393164/618937cb-dc6a-4646-b433-e3
python >= 3.9
A virtual environment is recommended to avoid possible dependency conflicts; both venv and conda work.
For example:
```bash
conda create -n MinerU python=3.10
@@ -90,19 +90,19 @@ pip install magic-pdf
```bash
pip install magic-pdf[full-cpu]
```
The high-precision models depend on detectron2, which must be compiled before it can be installed. To compile it yourself, refer to https://github.com/facebookresearch/detectron2/issues/5114
Alternatively, use our pre-compiled whl packages (Python 3.10 only):
```bash
pip install detectron2 --extra-index-url https://myhloli.github.io/wheels/
```
#### 2. Download the model weight files
For details, see [How to download model files](docs/how_to_download_models_zh_cn.md)
After downloading, move the models directory to an SSD with ample free space
#### 3. Copy the configuration file and configure it
The [magic-pdf.template.json](magic-pdf.template.json) file is available in the repository root
```bash
cp magic-pdf.template.json ~/magic-pdf.json
```
@@ -120,8 +120,8 @@ cp magic-pdf.template.json ~/magic-pdf.json
```bash
magic-pdf pdf-command --pdf "pdf_path" --inside_model true
```
After the program finishes, the generated markdown files are in the "/tmp/magic-pdf" directory, and the corresponding xxx_model.json file is in the markdown directory
To do secondary development on the post-processing pipeline, use the command:
```bash
magic-pdf pdf-command --pdf "pdf_path" --model "model_json_path"
```
@@ -138,9 +138,9 @@ magic-pdf --help
###### CUDA
Install the PyTorch build that matches your CUDA version
The command below installs the CUDA 11.8 build. For more information, see https://pytorch.org/get-started/locally/
```bash
# When using the GPU solution, reinstall the PyTorch build for your CUDA version. This example installs the CUDA 11.8 build.
pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118
```
@@ -152,9 +152,8 @@ pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https
```
###### MPS
On macOS (M-series chip devices), you can use MPS for inference acceleration
You need to set the value of "device-mode" in the configuration file magic-pdf.json:
```json
{
"device-mode":"mps"