132 Commits

Author SHA1 Message Date
myhloli
c4efdc53be feat: update README to clarify project submission status and improve layout 2026-03-20 19:19:34 +08:00
myhloli
1d96a77a19 Refactor FastAPI to add asynchronous task upload and download interfaces. 2026-03-19 19:05:54 +08:00
Xiaomeng Zhao
30c5d10e05 Archive MinerU Project List and update notes
Updated README to indicate the project is archived and added a note about community contributions.
2026-01-09 12:02:21 +08:00
Xiaomeng Zhao
4c3be9273c Fix typo in README_zh-CN.md 2026-01-09 12:00:26 +08:00
Xiaomeng Zhao
1833163b97 Mark MinerU project as archived in README
Updated README to indicate the project is archived and added a note about community contributions.
2026-01-09 12:00:10 +08:00
Xiaomeng Zhao
23c292409f Update projects/mineru_tianshu/api_server.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-02 03:39:13 +08:00
myhloli
bcb30fe79c fix: simplify VRAM size retrieval and improve error handling in memory management 2025-12-01 18:31:07 +08:00
zyileven
ab2c67d477 Fix MinIO API calls and improve error handling 2025-11-28 15:02:26 +08:00
zyileven
d800df6ae5 Add an interface for obtaining more MinerU processing data 2025-11-26 14:28:16 +08:00
Xiaomeng Zhao
dad59f7d52 Merge pull request #3760 from magicyuan876/master
feat(tianshu): v2.0 architecture upgrade - Worker active pull mode
2025-10-17 18:31:38 +08:00
Magic_yuan
cedc62a728 Complete markitdown dependencies 2025-10-17 16:17:03 +08:00
Magic_yuan
e7d8bf097a Address code review suggestions 2025-10-17 13:04:49 +08:00
Magic_yuan
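08a89aeca1 feat(tianshu): v2.0 architecture upgrade - Worker active pull mode
Main improvements:
- Workers actively pull tasks, making responses 10-20x faster (5-10s → 0.5s)
- Stronger database concurrency safety: atomic operations prevent duplicate tasks
- The scheduler becomes an optional monitoring component and is not started by default
- Fixed the multi-GPU VRAM usage issue; processes are now fully isolated

New features:
- The API automatically returns parsed content
- Automatic cleanup of result files (configurable)
- Support for uploading images to MinIO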
2025-10-17 11:46:42 +08:00
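The commit above (08a89aeca1) describes workers pulling tasks themselves and relying on atomic database operations to avoid duplicate processing. The following is a minimal sketch of one way such a claim step could work with SQLite; it is not taken from the project's `task_db.py`. The table layout, column names, and the `init_db` / `claim_next_task` helpers are all hypothetical.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per task, with a priority and a status.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tasks (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               file_name TEXT NOT NULL,
               priority INTEGER NOT NULL DEFAULT 0,
               status TEXT NOT NULL DEFAULT 'pending',
               worker_id TEXT
           )"""
    )


def claim_next_task(conn: sqlite3.Connection, worker_id: str):
    """Claim the highest-priority pending task, or return None if there is none.

    BEGIN IMMEDIATE takes SQLite's write lock up front, so the SELECT and the
    UPDATE run as one unit and two workers cannot claim the same row.
    The connection is assumed to be opened with isolation_level=None.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, file_name FROM tasks "
            "WHERE status = 'pending' "
            "ORDER BY priority DESC, id ASC LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE tasks SET status = 'processing', worker_id = ? WHERE id = ?",
                (worker_id, row[0]),
            )
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

A worker loop would call `claim_next_task` repeatedly and sleep briefly whenever it returns `None`. Opening the connection with `sqlite3.connect(path, isolation_level=None)` keeps transaction control explicit.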
myhloli
a36118f8ba Add mineru_tianshu project to README files for version 2.0 compatibility 2025-10-16 17:38:57 +08:00
Xiaomeng Zhao
504fe6ada3 Merge pull request #3742 from magicyuan876/master
feat: MinerU Tianshu project - an out-of-the-box multi-GPU document parsing service
2025-10-16 17:33:54 +08:00
Magic_yuan
484ff5a6f9 Fix code review issues 2025-10-16 16:04:42 +08:00
Magic_yuan
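3bf50d5267 feat: MinerU Tianshu project - an out-of-the-box multi-GPU document parsing service
Project overview:
Tianshu (天枢) is a document parsing service built on MinerU. It combines a SQLite task queue with
LitServe GPU load balancing, and supports asynchronous processing, task persistence, and intelligent parsing of multiple document formats.

Core features:
- Asynchronous task processing: the client gets an immediate response while tasks run in the background
- Smart parser selection: PDFs and images use MinerU (GPU-accelerated); Office and text files use MarkItDown
- GPU load balancing: automatic multi-GPU scheduling built on LitServe
- Task persistence: tasks are stored in SQLite and survive service restarts
- Priority queue: tasks can be assigned a priority
- RESTful API: a complete task management interface
- MinIO integration: images can be uploaded to object storage

Project architecture:
- api_server.py: FastAPI web server exposing the RESTful API
- task_db.py: SQLite task database manager
- litserve_worker.py: LitServe worker pool with GPU load balancing
- task_scheduler.py: asynchronous task scheduler
- start_all.py: unified startup script
- client_example.py: Python client example

Tech stack: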
FastAPI, LitServe, SQLite, MinerU, MarkItDown, MinIO, Loguru
2025-10-16 08:41:51 +08:00
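The commit above (3bf50d5267) says PDFs and images go to MinerU while Office and text files go to MarkItDown. The following is a rough sketch of how that routing could be decided by file extension. The extension sets and the `choose_parser` helper are assumptions for illustration, not the project's actual code.

```python
from pathlib import Path

# Assumed extension lists; the real project may support a different set.
MINERU_EXTS = {".pdf", ".png", ".jpg", ".jpeg"}
MARKITDOWN_EXTS = {".docx", ".pptx", ".xlsx", ".txt", ".md", ".html"}


def choose_parser(file_name: str) -> str:
    """Return which backend should handle the file: 'mineru' or 'markitdown'."""
    ext = Path(file_name).suffix.lower()
    if ext in MINERU_EXTS:
        return "mineru"       # GPU-accelerated path for PDFs and images
    if ext in MARKITDOWN_EXTS:
        return "markitdown"   # lightweight path for Office and text files
    raise ValueError(f"unsupported file type: {ext or file_name!r}")
```

Lowercasing the suffix means `report.PDF` and `report.pdf` are treated the same, and unknown types fail early instead of reaching a worker.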
myhloli
44fdeb663f Refactor async function and improve output directory handling in prediction 2025-10-13 11:32:28 +08:00
myhloli
3ec6479462 fix: update backend comment to reflect renaming from sglang-engine to vlm-vllm-engine 2025-09-15 02:00:58 +08:00
zhanluxianshen
1671e68367 fix error logs for multi_gpu endpoint.
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-08-26 10:26:10 +08:00
Xiaomeng Zhao
d3f6736e0a Update _config_endpoint.py 2025-07-05 04:33:49 +08:00
Xiaomeng Zhao
07b4cbc0ec Update projects/multi_gpu_v2/_config_endpoint.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-05 04:32:31 +08:00
Xiaomeng Zhao
c08a86d6c7 Update projects/multi_gpu_v2/server.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-05 04:29:36 +08:00
Xiaomeng Zhao
919280aaa1 Merge branch 'dev' into multi_gpu_v2 2025-07-05 04:20:52 +08:00
Xiaomeng Zhao
ea9336c0c1 Update server.py 2025-07-05 04:14:58 +08:00
myhloli
802ccd938c refactor: remove multi_gpu project reference from README files 2025-07-05 01:50:47 +08:00
myhloli
8ffdbe6a41 fix: update sglang version requirement in error message and clean up README files 2025-06-30 23:06:27 +08:00
yuanzhou
d3156f76ad Update compatible projects list 2025-06-18 22:02:52 +08:00
ca1yz
3f32f2a587 Update README.md 2025-06-18 19:08:39 +08:00
ca1yz
dbfd392f05 add updated example project based on 2.0 2025-06-18 19:07:53 +08:00
myhloli
4338b63337 feat: add MCP server documentation to README and README_zh-CN 2025-06-13 20:27:11 +08:00
Xiaomeng Zhao
bcbbee8cbd Merge pull request #2622 from myhloli/dev
Dev
2025-06-13 20:03:51 +08:00
myhloli
148e4660a3 feat: update README to correct link for gradio_app to point to English version 2025-06-13 15:56:54 +08:00
myhloli
dff1170053 feat: update project list in README files to reflect compatibility with version 2.0 2025-06-13 15:54:51 +08:00
Xiaomeng Zhao
0c7a08829b Merge pull request #2611 from myhloli/dev
Dev
2025-06-12 11:26:22 +08:00
myhloli
8078219eff refactor: remove unused uuid import to clean up code 2025-06-11 18:54:39 +08:00
myhloli
7d27726eb2 refactor: improve file naming logic and enhance unique filename generation 2025-06-11 18:48:14 +08:00
myhloli
02898cdd81 refactor: simplify file reading function and improve input validation 2025-06-11 00:32:51 +08:00
myhloli
7eed5ee9c8 refactor: streamline PDF parsing and enhance formula recognition handling 2025-06-11 00:10:24 +08:00
AdrianWang
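ee79dd659e feat(mcp): bump version to 1.0.0 and update installation instructions
Bumped the project version to 1.0.0 and updated the install command in the README to reflect the new version.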
2025-06-09 14:26:20 +08:00
AdrianWang
2ef7f9deee feat(mcp): add mcp-server for mineru 2025-06-05 19:14:08 +08:00
myhloli
bd9279198c refactor: rename init file and update app.py to enable parsing method 2025-05-27 11:14:28 +08:00
Xiaomeng Zhao
6c9645aa0c Merge pull request #2437 from myhloli/dev
docs(README): reorder installation commands for clarity
2025-05-08 18:56:34 +08:00
myhloli
71a429a32e docs(README): reorder installation commands for clarity 2025-05-08 18:54:39 +08:00
Wang Yubo
862891e294 Update app.py: Fix parameter parsing in /file_parse endpoint
I have updated the `/file_parse` endpoint in `app.py` to correctly handle boolean and string parameters when they are sent via `multipart/form-data` requests (commonly used for file uploads). Previously, these parameters were not being properly parsed because FastAPI expects them to be passed as query or JSON body parameters by default.

### Changes Made:
- Added `Form(...)` to all non-file parameters (`parse_method`, `is_json_md_dump`, `output_dir`, and return flags like `return_layout`, etc.).
- This ensures that FastAPI correctly reads these fields from form-data, allowing clients to send both files and structured configuration options in the same request.

### Why This Change Was Needed:
- When using `requests.post(..., data=data, files=files)`, the `data` dictionary is sent as form-encoded data.
- Without explicitly declaring these fields with `Form(...)`, FastAPI does not bind them correctly, leading to default values always being used (e.g., `False` for boolean flags).
- This change allows the API to accurately reflect the client's intent and enables features like `return_layout`, `return_images`, etc., to work as expected.

This update improves compatibility with HTTP clients that rely on standard form-based file upload mechanisms while preserving the existing behavior of the API.
2025-04-30 17:15:54 +08:00
myhloli
100e9c17a5 feat(latex): enhance LaTeX delimiter support and configurability
- Add support for \(\) and \[\] delimiters in addition to $$ and $$
- Make LaTeX delimiter configuration more flexible and user-defined
- Update configuration file to include LaTeX delimiter settings
- Modify OCR content generation to use configurable delimiters
2025-04-28 14:35:39 +08:00
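The commit above (100e9c17a5) makes LaTeX delimiters configurable. The following is a small sketch of how a formula could be wrapped with delimiters read from a configuration mapping. The mapping layout (`display` / `inline`, each with `left` / `right`) and the `wrap_formula` helper are assumptions for illustration, not the project's confirmed config schema.

```python
# Assumed default delimiter configuration.
DEFAULT_DELIMITERS = {
    "display": {"left": "$$", "right": "$$"},
    "inline": {"left": "$", "right": "$"},
}


def wrap_formula(latex: str, kind: str, delimiters: dict = DEFAULT_DELIMITERS) -> str:
    """Wrap a LaTeX string with the configured delimiters for 'inline' or 'display'."""
    if kind not in delimiters:
        raise ValueError(f"unknown formula kind: {kind!r}")
    pair = delimiters[kind]
    return f"{pair['left']}{latex}{pair['right']}"


# A user-defined configuration switching to \( \) and \[ \] delimiters.
BRACKET_DELIMITERS = {
    "display": {"left": "\\[", "right": "\\]"},
    "inline": {"left": "\\(", "right": "\\)"},
}
```

Passing `BRACKET_DELIMITERS` as the third argument switches the output style without touching the generation code.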
myhloli
4f88fcaa51 feat(ocr): add new Chinese OCR model and update language support
- Add new Chinese OCR model (ch_PP-OCRv4_rec_server_doc_infer) for server-side use
- Update language support in app.py to include new Chinese model
- Modify models_config.yml to add new model configuration
2025-04-23 18:06:12 +08:00
myhloli
fcb5660f6a feat: add support for JPEG images and update documentation
- Add '.jpeg' to the list of supported image extensions in app.py and read_api.py
- Update projects READMEs to indicate that web_demo is deprecated
2025-04-21 14:22:23 +08:00
myhloli
786da939e5 feat(gui): update language options and default settings
- Remove unused 'layoutlmv3' model option
- Update language options to include new 'add_lang' list
- Set default language to 'ch' (Chinese)
- Comment out old 'all_lang' definition for future reference
2025-04-10 15:39:51 +08:00
myhloli
3a820305c8 feat(web_api): update configuration and remove unused code
- Comment out PaddlePaddle GPU installation in Dockerfile
- Add OCR model download URL in download_models.py
- Update config version in magic-pdf.json
- Remove outdated information and simplify README.md
- Remove volume creation for PaddleOCR models in Dockerfile
2025-04-03 16:43:48 +08:00