Mirror of https://github.com/opendatalab/MinerU.git, synced 2026-04-02 14:08:34 +07:00
Compare commits: mineru-3.0...master (8 commits)
| Author | SHA1 | Date | |
|---|---|---|---|
| | ede8d95bf1 | | |
| | 54b68d4bf1 | | |
| | 1b478c24cf | | |
| | 39b62cc76a | | |
| | 13465ff43f | | |
| | d18b7df766 | | |
| | a97753c86f | | |
| | a3b65470cf | | |
@@ -43,12 +43,12 @@ If you need to adjust parsing options through custom parameters, you can also ch
 >- API outputs are controlled by the server and written to `./output` by default
 >- Uploads currently support `PDF`, image, and `DOCX` files
 >
->`POST /tasks` returns immediately with a `task_id`. `POST /file_parse` uses the same task manager internally, waits for the task to finish, and then returns the final result synchronously.
->When a task is waiting in the queue, both the submission response and task-status response may include `queued_ahead` to indicate how many tasks are ahead of it.
->Tasks are tracked only in-process for a single `mineru-api` instance. Task status is not preserved across service restarts, `--reload`, or multi-process deployments.
->Completed or failed tasks are retained for 24 hours by default, then their task state and output directory are cleaned automatically. After cleanup, task status and result endpoints return `404`.
->Use `MINERU_API_TASK_RETENTION_SECONDS` and `MINERU_API_TASK_CLEANUP_INTERVAL_SECONDS` to adjust retention and cleanup polling intervals.
->Use `--enable-vlm-preload true` to warm up the local VLM model during service startup instead of waiting for the first VLM or hybrid request.
+>- `POST /tasks` returns immediately with a `task_id`. `POST /file_parse` uses the same task manager internally, waits for the task to finish, and then returns the final result synchronously.
+>- When a task is waiting in the queue, both the submission response and task-status response may include `queued_ahead` to indicate how many tasks are ahead of it.
+>- Tasks are tracked only in-process for a single `mineru-api` instance. Task status is not preserved across service restarts, `--reload`, or multi-process deployments.
+>- Completed or failed tasks are retained for 24 hours by default, then their task state and output directory are cleaned automatically. After cleanup, task status and result endpoints return `404`.
+>- Use `MINERU_API_TASK_RETENTION_SECONDS` and `MINERU_API_TASK_CLEANUP_INTERVAL_SECONDS` to adjust retention and cleanup polling intervals.
+>- Use `--enable-vlm-preload true` to warm up the local VLM model during service startup instead of waiting for the first VLM or hybrid request.
 >
 >Asynchronous task submission example:
 >```bash
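
The retention behavior described in this hunk (24-hour default, periodic cleanup, `404` after cleanup) can be pictured with a minimal sketch. Everything below except the environment variable name and the 24-hour default is an assumption made for illustration: `TaskRegistry`, `retention_seconds`, and the `>=` expiry boundary are hypothetical and are not taken from the MinerU source.

```python
import os

# Assumption: 86400 s mirrors the documented 24-hour default.
DEFAULT_RETENTION_SECONDS = 24 * 60 * 60


def retention_seconds(env=None):
    """Read the retention window from the environment, falling back to 24 h."""
    env = os.environ if env is None else env
    raw = env.get("MINERU_API_TASK_RETENTION_SECONDS")
    return int(raw) if raw else DEFAULT_RETENTION_SECONDS


class TaskRegistry:
    """Hypothetical in-process task store; its state disappears with the process."""

    def __init__(self, retention):
        self.retention = retention
        self._finished_at = {}  # task_id -> completion timestamp (seconds)

    def mark_finished(self, task_id, now):
        """Record when a task completed or failed."""
        self._finished_at[task_id] = now

    def cleanup(self, now):
        """Drop tasks whose retention window has elapsed; return the removed ids."""
        expired = [
            task_id
            for task_id, finished in self._finished_at.items()
            if now - finished >= self.retention
        ]
        for task_id in expired:
            del self._finished_at[task_id]
        return expired

    def status_code(self, task_id):
        """200 for a retained task, 404 for an unknown or cleaned-up one."""
        return 200 if task_id in self._finished_at else 404
```

Because the registry is a plain dictionary held in one process, a restart or a second worker process starts with an empty store, which is the reason the documentation warns about `--reload` and multi-process deployments.
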
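@@ -43,12 +43,12 @@ mineru -p <input_path> -o <output_path>
 >- The API output directory is fixed by the server and defaults to `./output`
 >- Uploaded files currently support `PDF`, images, and `DOCX`
 >
->`POST /tasks` returns a `task_id` immediately; `POST /file_parse` submits to the same task manager internally, waits for the task to complete, and then returns the final result synchronously.
->While a task is queued, both the submission response and the status-query response may include a `queued_ahead` field indicating the number of tasks ahead of it in the queue.
->Tasks are implemented as single-process, in-process state; after a service restart, a `--reload` hot reload, or a multi-process deployment, historical task status is not guaranteed to remain queryable.
->By default, a task is retained for 24 hours after it completes or fails, after which its task state and output directory are cleaned up automatically; after cleanup, requests for the task status or result return `404`.
->Use the environment variables `MINERU_API_TASK_RETENTION_SECONDS` and `MINERU_API_TASK_CLEANUP_INTERVAL_SECONDS` to adjust the retention period and the cleanup polling interval.
->Use `--enable-vlm-preload true` to warm up the local VLM model during service startup, so that it is not initialized on the first VLM or hybrid request.
+>- `POST /tasks` returns a `task_id` immediately; `POST /file_parse` submits to the same task manager internally, waits for the task to complete, and then returns the final result synchronously.
+>- While a task is queued, both the submission response and the status-query response may include a `queued_ahead` field indicating the number of tasks ahead of it in the queue.
+>- Tasks are implemented as single-process, in-process state; after a service restart, a `--reload` hot reload, or a multi-process deployment, historical task status is not guaranteed to remain queryable.
+>- By default, a task is retained for 24 hours after it completes or fails, after which its task state and output directory are cleaned up automatically; after cleanup, requests for the task status or result return `404`.
+>- Use the environment variables `MINERU_API_TASK_RETENTION_SECONDS` and `MINERU_API_TASK_CLEANUP_INTERVAL_SECONDS` to adjust the retention period and the cleanup polling interval.
+>- Use `--enable-vlm-preload true` to warm up the local VLM model during service startup, so that it is not initialized on the first VLM or hybrid request.
 >
 >Asynchronous task submission example:
 >```bash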
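
One way to read the `queued_ahead` field is as a task's zero-based position in a FIFO queue. The sketch below is only an illustration of that reading: `TaskQueue`, its method names, and the payload shape are hypothetical, and whether MinerU also counts a currently running task is not stated in the documentation.

```python
from collections import deque


class TaskQueue:
    """Hypothetical FIFO queue used only to illustrate `queued_ahead`."""

    def __init__(self):
        self._pending = deque()

    def submit(self, task_id):
        """Enqueue a task and return an illustrative submission payload."""
        self._pending.append(task_id)
        return {"task_id": task_id, "queued_ahead": len(self._pending) - 1}

    def queued_ahead(self, task_id):
        """Number of tasks ahead of `task_id`, or None if it is not queued."""
        try:
            return list(self._pending).index(task_id)
        except ValueError:
            return None

    def start_next(self):
        """Pop the task at the head of the queue, as a worker would."""
        return self._pending.popleft() if self._pending else None
```

Under this reading the value shrinks as earlier tasks start, and it is absent (here `None`) once the task itself has left the queue, which fits the wording that responses "may include" the field.
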
@@ -701,7 +701,7 @@ def mk_blocks_to_markdown(para_blocks, make_mode, img_buket_path='', page_idx=No
             continue
         else:
             # page_markdown.append(para_text.strip())
-            page_markdown.append(para_text)
+            page_markdown.append(para_text.strip('\r\n'))
 
     return page_markdown
 
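
The effect of the change in this hunk can be checked in isolation. `str.strip` with an argument treats that argument as a set of characters, so `strip('\r\n')` removes any run of carriage returns and newlines at either end while leaving spaces and interior line breaks alone. The helper name `trim_line_endings` and the sample strings are made up for this sketch.

```python
def trim_line_endings(para_text):
    """Strip only leading/trailing CR and LF characters, as the new line does."""
    return para_text.strip('\r\n')


sample = "  indented text  \r\n"

# Previous behavior: the text was appended unchanged.
assert sample == "  indented text  \r\n"

# Commented-out alternative: strip() also removes the surrounding spaces.
assert sample.strip() == "indented text"

# New behavior: only the line endings at the edges are removed.
assert trim_line_endings(sample) == "  indented text  "

# The argument is a character set, so mixed runs are removed too,
# while line breaks inside the text are preserved.
assert trim_line_endings("\n\r\nline one\r\nline two\n\n") == "line one\r\nline two"
```

Keeping leading spaces matters for Markdown output, where indentation can be significant, which is a plausible reason the bare `strip()` call stays commented out.
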
@@ -1 +1 @@
-__version__ = "3.0.5"
+__version__ = "3.0.7"