MinerU

mirror of https://github.com/opendatalab/MinerU.git synced 2026-03-27 11:08:32 +07:00

Author	SHA1	Message	Date
myhloli	c4efdc53be	feat: update README to clarify project submission status and improve layout	2026-03-20 19:19:34 +08:00
myhloli	1d96a77a19	Refactor FastAPI to add asynchronous task upload and download interfaces.	2026-03-19 19:05:54 +08:00
Xiaomeng Zhao	30c5d10e05	Archive MinerU Project List and update notes. Updated README to indicate the project is archived and added a note about community contributions.	2026-01-09 12:02:21 +08:00
Xiaomeng Zhao	4c3be9273c	Fix typo in README_zh-CN.md	2026-01-09 12:00:26 +08:00
Xiaomeng Zhao	1833163b97	Mark MinerU project as archived in README. Updated README to indicate the project is archived and added a note about community contributions.	2026-01-09 12:00:10 +08:00
Xiaomeng Zhao	23c292409f	Update projects/mineru_tianshu/api_server.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-12-02 03:39:13 +08:00
myhloli	bcb30fe79c	fix: simplify VRAM size retrieval and improve error handling in memory management	2025-12-01 18:31:07 +08:00
zyileven	ab2c67d477	Fix MinIO API calls and improve error handling	2025-11-28 15:02:26 +08:00
zyileven	d800df6ae5	Add an interface for obtaining more Mineru processing data	2025-11-26 14:28:16 +08:00
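Xiaomeng Zhao	dad59f7d52	Merge pull request #3760 from magicyuan876/master feat(tianshu): v2.0 architecture upgrade - worker active-pull mode	2025-10-17 18:31:38 +08:00
Magic_yuan	cedc62a728	Complete the markitdown dependency	2025-10-17 16:17:03 +08:00
Magic_yuan	e7d8bf097a	Apply code review suggestions	2025-10-17 13:04:49 +08:00
Magic_yuan	08a89aeca1	feat(tianshu): v2.0 architecture upgrade - worker active-pull mode. Main improvements: - Workers actively pull tasks, improving response speed 10-20x (5-10s → 0.5s) - Stronger database concurrency safety, using atomic operations to prevent duplicate tasks - The scheduler becomes an optional monitoring component and is not started by default - Fix the multi-GPU VRAM usage issue by fully isolating each process. New features: - The API automatically returns the parsed content - Automatic cleanup of result files (configurable) - Support for uploading images to MinIO	2025-10-17 11:46:42 +08:00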
myhloli	a36118f8ba	Add mineru_tianshu project to README files for version 2.0 compatibility	2025-10-16 17:38:57 +08:00
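Xiaomeng Zhao	504fe6ada3	Merge pull request #3742 from magicyuan876/master feat: MinerU Tianshu project - an out-of-the-box multi-GPU document parsing service	2025-10-16 17:33:54 +08:00
Magic_yuan	484ff5a6f9	Fix code review issues	2025-10-16 16:04:42 +08:00
Magic_yuan	3bf50d5267	feat: MinerU Tianshu project - an out-of-the-box multi-GPU document parsing service. Project overview: Tianshu is a document parsing service built on MinerU. It uses a SQLite task queue plus LitServe GPU load balancing, and supports asynchronous processing, task persistence, and intelligent parsing of multiple document formats. Core features: - Asynchronous task processing: the client gets an immediate response while tasks are processed in the background - Intelligent parser selection: PDFs and images use MinerU (GPU-accelerated); Office and text files use MarkItDown - GPU load balancing: automatic multi-GPU scheduling based on LitServe - Task persistence: tasks are stored in SQLite and survive service restarts - Priority queue: supports setting task priority - RESTful API: a complete task management interface - MinIO integration: supports uploading images to object storage. Project architecture: - api_server.py: FastAPI web server providing the RESTful API - task_db.py: SQLite task database manager - litserve_worker.py: LitServe worker pool with GPU load balancing - task_scheduler.py: asynchronous task scheduler - start_all.py: unified startup script - client_example.py: Python client example. Tech stack: FastAPI, LitServe, SQLite, MinerU, MarkItDown, MinIO, Loguru	2025-10-16 08:41:51 +08:00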
myhloli	44fdeb663f	Refactor async function and improve output directory handling in prediction	2025-10-13 11:32:28 +08:00
myhloli	3ec6479462	fix: update backend comment to reflect renaming from sglang-engine to vlm-vllm-engine	2025-09-15 02:00:58 +08:00
zhanluxianshen	1671e68367	fix error logs for multi_gpu endpoint. Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>	2025-08-26 10:26:10 +08:00
Xiaomeng Zhao	d3f6736e0a	Update _config_endpoint.py	2025-07-05 04:33:49 +08:00
Xiaomeng Zhao	07b4cbc0ec	Update projects/multi_gpu_v2/_config_endpoint.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-07-05 04:32:31 +08:00
Xiaomeng Zhao	c08a86d6c7	Update projects/multi_gpu_v2/server.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-07-05 04:29:36 +08:00
Xiaomeng Zhao	919280aaa1	Merge branch 'dev' into multi_gpu_v2	2025-07-05 04:20:52 +08:00
Xiaomeng Zhao	ea9336c0c1	Update server.py	2025-07-05 04:14:58 +08:00
myhloli	802ccd938c	refactor: remove multi_gpu project reference from README files	2025-07-05 01:50:47 +08:00
myhloli	8ffdbe6a41	fix: update sglang version requirement in error message and clean up README files	2025-06-30 23:06:27 +08:00
yuanzhou	d3156f76ad	Update compatible projects list	2025-06-18 22:02:52 +08:00
ca1yz	3f32f2a587	Update README.md	2025-06-18 19:08:39 +08:00
ca1yz	dbfd392f05	add updated example project based on 2.0	2025-06-18 19:07:53 +08:00
myhloli	4338b63337	feat: add MCP server documentation to README and README_zh-CN	2025-06-13 20:27:11 +08:00
Xiaomeng Zhao	bcbbee8cbd	Merge pull request #2622 from myhloli/dev Dev	2025-06-13 20:03:51 +08:00
myhloli	148e4660a3	feat: update README to correct link for gradio_app to point to English version	2025-06-13 15:56:54 +08:00
myhloli	dff1170053	feat: update project list in README files to reflect compatibility with version 2.0	2025-06-13 15:54:51 +08:00
Xiaomeng Zhao	0c7a08829b	Merge pull request #2611 from myhloli/dev Dev	2025-06-12 11:26:22 +08:00
myhloli	8078219eff	refactor: remove unused uuid import to clean up code	2025-06-11 18:54:39 +08:00
myhloli	7d27726eb2	refactor: improve file naming logic and enhance unique filename generation	2025-06-11 18:48:14 +08:00
myhloli	02898cdd81	refactor: simplify file reading function and improve input validation	2025-06-11 00:32:51 +08:00
myhloli	7eed5ee9c8	refactor: streamline PDF parsing and enhance formula recognition handling	2025-06-11 00:10:24 +08:00
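AdrianWang	ee79dd659e	feat(mcp): bump version to 1.0.0 and update installation instructions. Bump the project version to 1.0.0 and update the install command in the README to reflect the new version.	2025-06-09 14:26:20 +08:00
AdrianWang	2ef7f9deee	feat(mcp): add the MinerU mcp-server	2025-06-05 19:14:08 +08:00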
myhloli	bd9279198c	refactor: rename init file and update app.py to enable parsing method	2025-05-27 11:14:28 +08:00
Xiaomeng Zhao	6c9645aa0c	Merge pull request #2437 from myhloli/dev docs(README): reorder installation commands for clarity	2025-05-08 18:56:34 +08:00
myhloli	71a429a32e	docs(README): reorder installation commands for clarity	2025-05-08 18:54:39 +08:00
Wang Yubo	862891e294	Update app.py: Fix parameter parsing in /file_parse endpoint I have updated the `/file_parse` endpoint in `app.py` to correctly handle boolean and string parameters when they are sent via `multipart/form-data` requests (commonly used for file uploads). Previously, these parameters were not being properly parsed because FastAPI expects them to be passed as query or JSON body parameters by default. ### Changes Made: - Added `Form(...)` to all non-file parameters (`parse_method`, `is_json_md_dump`, `output_dir`, and return flags like `return_layout`, etc.). - This ensures that FastAPI correctly reads these fields from form-data, allowing clients to send both files and structured configuration options in the same request. ### Why This Change Was Needed: - When using `requests.post(..., data=data, files=files)`, the `data` dictionary is sent as form-encoded data. - Without explicitly declaring these fields with `Form(...)`, FastAPI does not bind them correctly, leading to default values always being used (e.g., `False` for boolean flags). - This change allows the API to accurately reflect the client's intent and enables features like `return_layout`, `return_images`, etc., to work as expected. This update improves compatibility with HTTP clients that rely on standard form-based file upload mechanisms while preserving the existing behavior of the API.	2025-04-30 17:15:54 +08:00
myhloli	100e9c17a5	feat(latex): enhance LaTeX delimiter support and configurability - Add support for \(\) and \[\] delimiters in addition to $$ and $$ - Make LaTeX delimiter configuration more flexible and user-defined - Update configuration file to include LaTeX delimiter settings - Modify OCR content generation to use configurable delimiters	2025-04-28 14:35:39 +08:00
myhloli	4f88fcaa51	feat(ocr): add new Chinese OCR model and update language support - Add new Chinese OCR model (ch_PP-OCRv4_rec_server_doc_infer) for server-side use - Update language support in app.py to include new Chinese model - Modify models_config.yml to add new model configuration	2025-04-23 18:06:12 +08:00
myhloli	fcb5660f6a	feat: add support for JPEG images and update documentation - Add '.jpeg' to the list of supported image extensions in app.py and read_api.py - Update projects READMEs to indicate that web_demo is deprecated	2025-04-21 14:22:23 +08:00
myhloli	786da939e5	feat(gui): update language options and default settings - Remove unused 'layoutlmv3' model option - Update language options to include new 'add_lang' list - Set default language to 'ch' (Chinese) - Comment out old 'all_lang' definition for future reference	2025-04-10 15:39:51 +08:00
myhloli	3a820305c8	feat(web_api): update configuration and remove unused code - Comment out PaddlePaddle GPU installation in Dockerfile - Add OCR model download URL in download_models.py - Update config version in magic-pdf.json - Remove outdated information and simplify README.md - Remove volume creation for PaddleOCR models in Dockerfile	2025-04-03 16:43:48 +08:00
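
The Tianshu commits above (3bf50d5267, 08a89aeca1) describe a SQLite task queue with priorities, from which workers pull tasks and claim them atomically so that no task is processed twice. The sketch below illustrates that general idea only. It is not taken from the project's task_db.py: the table schema and the names open_queue, submit_task, and claim_next_task are hypothetical, chosen for illustration.

```python
import sqlite3

# Hypothetical schema for illustration; the real task_db.py may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    file_name TEXT NOT NULL,
    priority  INTEGER NOT NULL DEFAULT 0,
    status    TEXT NOT NULL DEFAULT 'pending',
    worker_id TEXT
)
"""


def open_queue(path=":memory:"):
    """Open (or create) the queue database."""
    # isolation_level=None puts the connection in autocommit mode,
    # so transactions are controlled explicitly with BEGIN/COMMIT.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(SCHEMA)
    return conn


def submit_task(conn, file_name, priority=0):
    """Insert a pending task and return its id."""
    cur = conn.execute(
        "INSERT INTO tasks (file_name, priority) VALUES (?, ?)",
        (file_name, priority),
    )
    return cur.lastrowid


def claim_next_task(conn, worker_id):
    """Claim the highest-priority pending task, or return None.

    BEGIN IMMEDIATE takes the write lock before the SELECT, so two
    workers cannot both read the same pending row and claim it.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, file_name FROM tasks "
            "WHERE status = 'pending' "
            "ORDER BY priority DESC, id ASC LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE tasks SET status = 'processing', worker_id = ? "
                "WHERE id = ?",
                (worker_id, row[0]),
            )
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

A worker loop would call claim_next_task repeatedly and sleep briefly when it returns None. Ties in priority are broken by insertion order in this sketch, which is a design assumption rather than documented project behavior.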


132 Commits