MinerU

mirror of https://github.com/opendatalab/MinerU.git synced 2026-04-01 13:38:33 +07:00

Author	SHA1	Message	Date
myhloli	9a96362db7	build(deps): update torch and torchvision version requirements - Specify torch==2.3.1 and torchvision==0.18.1 for Windows CUDA installation - Add torch and torchvision version constraints in setup.py: - torch>=2.2.2,<=2.3.1 - torchvision>=0.17.2,<=0.18.1 - Update installation instructions in both English and Chinese README files	2024-12-11 14:31:49 +08:00
myhloli	a296ea41f9	refactor(magic_pdf): optimize environment setup and dependencies - Add environment variables to disable albumentations and yolo updates - Import torchtext and disable deprecation warnings - Update unimernet to 0.2.2 - Specify ultralytics version as >=8.3.48 - Remove upper version limit for torch	2024-12-09 18:08:27 +08:00
myhloli	2ae1039408	build(deps): update dependency versions - Update ultralytics to >=8.3.47	2024-12-09 14:26:48 +08:00
myhloli	1f1335c290	build(deps): specify minimum version for ultralytics - Update `ultralytics` dependency to version >= 8.3.43 - This change ensures compatibility with yolov8 for formula detection	2024-12-06 17:28:24 +08:00
myhloli	d0f633e2d5	build(setup): add old_linux specific dependencies - Add albumentations package with version <=1.4.20 for old_linux - This version is compatible with Linux systems from 2019 and earlier - Version 1.4.21 and above introduced simsimd which is not supported on older Linux systems	2024-11-18 22:34:09 +08:00
myhloli	08f46125a0	refactor(model): rename and restructure model modules	2024-11-15 18:50:05 +08:00
myhloli	fe2c2c0d8e	feat(table): add RapidOCR support for RapidTable model - Integrate RapidOCR with RapidTable model for table recognition - Improve memory management for devices with <= 8GB VRAM - Update table recognition process to use RapidOCR for RapidTable - Add rapidocr-paddle dependency in setup.py	2024-11-09 00:59:59 +08:00
myhloli	240fe99e3c	feat(table): integrate RapidTable model for table recognition - Add RapidTable model support for table recognition - Update table model configuration and initialization - Modify table recognition process to use RapidTable when specified - Add RapidTable dependency to setup.py	2024-11-08 18:26:00 +08:00
myhloli	11f23843b1	feat(table): upgrade StructEqTable model and integrate into PDF Extract Kit - Update StructTableModel to use the latest struct-eqtable library - Add support for HTML table extraction in PDF Extract Kit - Improve error handling and model initialization - Update dependencies in setup.py for struct-eqtable	2024-11-04 17:08:19 +08:00
myhloli	73fe8914cb	build(setup): add doclayout_yolo dependency - Add doclayout_yolo==0.0.2 to the list of dependencies in setup.py	2024-10-23 17:32:07 +08:00
Xiaomeng Zhao	20212a3763	Update setup.py update UniMERNet to 0.2.1	2024-09-10 22:03:05 +08:00
myhloli	3e9bc7a457	refactor(pdf_extract_kit): update model config and weight paths for UniMERNet-0.2.0 Update the paths to model weights and configuration files for the UniMERNet architecture in both the demo.yaml and model_configs.yaml files. Adjust the mfr_model_init function to reflect the new weight and configuration paths. The changes include specifying more detailed paths to the unimernet_base directory and changing the weight file extension to .pth.	2024-09-10 16:11:58 +08:00
myhloli	252139099b	fix(setup): allow latest matplotlib versions on non-Windows platforms The restriction on the matplotlib version has been updated to only apply on Windows platforms, where precompiled packages are not available starting from version 3.9.1. This change enables users on Linux and macOS to install newer versions of matplotlib, addressing compatibility issues with recent bug fixes.	2024-08-04 20:29:39 +08:00
myhloli	9ececf3a1e	fix(dependencies): remove unnecessary pypandoc and struct-eqtable packages; fix matplotlib>=3.9.1 not supporting Windows systems without a compilation environment.	2024-08-04 20:23:21 +08:00
icecraft	40e0827e60	Feat/impl cli (#264) * feat: refactor cli command * feat: add docs to describe the output files of cli * feat: resolve review comments * feat: update docs about middle.json --------- Co-authored-by: shenguanlin <shenguanlin@pjlab.org.cn>	2024-08-01 19:21:15 +08:00
myhloli	2c09109ef0	fix(setup): pin unimernet version to 0.1.6 for compatibility	2024-07-30 10:50:03 +08:00
myhloli	46d7549926	fix(setup): update PyMuPDF and paddlepaddle dependencies	2024-07-28 15:49:35 +08:00
myhloli	5c963168fb	feat(setup.py): restructure extras_require options for clarity Refactor the `extras_require` section in `setup.py` to simplify and clarify the available options. Consolidate CPU and GPU requirements into single "lite" and "full" options to streamline installation for users.	2024-07-23 23:45:26 +08:00
myhloli	61fab96eae	fix(setup): specify paddleocr version to fix compatibility issue	2024-07-12 19:57:02 +08:00
myhloli	d458b705aa	feat(setup.py): include package data for magic_pdf.resources Update the setup.py file to explicitly include the package data for the magic_pdf.resources directory. This ensures that all files within this directory are packaged and available for use with the magic_pdf package.	2024-07-12 13:06:12 +08:00
myhloli	bc0f69321a	feat(model): add model mode selection for PDF analysis Introduce a new feature that allows users to choose between a "lite" and a "full" model mode for PDF document analysis. The "lite" mode uses a faster, less accurate model, while the "full" mode employs a higher-precision model at the cost of speed. This selection can be made through the CLI or API, providing flexibility for different use cases.	2024-07-11 17:10:14 +08:00
myhloli	1cedf4572e	update: Update the homepage link	2024-07-08 11:18:30 +08:00
赵小蒙	3aa8ccdceb	update requirements and setup	2024-06-25 19:05:42 +08:00
赵小蒙	129288aae6	update setup config	2024-06-20 17:43:45 +08:00
赵小蒙	756792a3f6	update: add entry points can exec in shell	2024-06-20 16:42:48 +08:00
赵小蒙	9dc5033cf7	update requirements	2024-06-18 14:51:06 +08:00
赵小蒙	9b5b116369	fix: change garbled_rate 0.1 -> 0.02	2024-06-05 15:21:14 +08:00
赵小蒙	07f6c49707	change update version logic	2024-06-04 11:33:57 +08:00
赵小蒙	1de37e4c65	add version_name to middle json	2024-06-04 11:15:52 +08:00
赵小蒙	bd1834284e	add version_name to middle json	2024-06-03 18:51:38 +08:00
赵小蒙	75478eda89	update setup	2024-05-30 10:26:10 +08:00
赵小蒙	3f3edc39f5	update setup	2024-05-30 10:25:02 +08:00
赵小蒙	a706743372	setup: automatically get the version number from the tag	2024-03-05 15:05:51 +08:00
赵小蒙	7242a4a76e	update module version number	2024-03-05 12:17:02 +08:00
赵小蒙	6cbf7fabcf	update module version number	2024-03-05 12:03:12 +08:00
赵小蒙	779d2e8aaf	fix the versions of some dependencies for compatibility with the Spark environment	2024-03-04 17:07:01 +08:00
赵小蒙	044b7de34b	version 0.1.0 released	2024-03-04 12:30:07 +08:00
赵小蒙	03bd97c54f	version 0.1.2 released	2024-03-04 12:25:55 +08:00
赵小蒙	38c7dc100a	version 0.1.1 released	2024-03-04 12:23:52 +08:00
赵小蒙	518005abeb	version 0.1 release	2024-03-04 12:12:51 +08:00
赵小蒙	7228545841	update version number	2024-03-01 18:23:55 +08:00
赵小蒙	4033ab154d	update workflow configuration	2024-03-01 18:03:11 +08:00
赵小蒙	d2380d5a14	update release configuration	2024-03-01 17:51:06 +08:00
赵小蒙	ec51cd8e6b	setup.py: read dependencies from requirements.txt	2024-03-01 17:07:50 +08:00
赵小蒙	1bbab88165	change the packaged project name	2024-03-01 16:16:53 +08:00
赵小蒙	d5dbed7325	restructure directories	2024-03-01 16:07:51 +08:00
赵小蒙	33e2922ae6	update dependency and packaging configuration	2024-03-01 15:17:42 +08:00
赵小蒙	9e7f7550de	configure packaging parameters	2024-02-29 19:17:36 +08:00

48 Commits
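
Several of the commits above describe version constraints rather than showing them. As a rough illustration only, the sketch below shows how constraints like those named in the messages (the torch and torchvision ranges, the ultralytics minimum, a Windows-only matplotlib cap, and the albumentations cap for old_linux) might be written as requirement strings. It is not the project's actual setup.py: the function name build_requirements, the exact matplotlib pin "<=3.9.0", and the assumption that "old_linux" is an extras_require key are all hypothetical.

```python
# Hypothetical sketch, NOT the actual MinerU setup.py.
# Version ranges for torch, torchvision, ultralytics and albumentations are
# taken from the commit messages; everything else is an assumption.

def build_requirements():
    """Return (install_requires, extras_require) as plain requirement strings."""
    install_requires = [
        # Ranges quoted in commit 9a96362db7.
        "torch>=2.2.2,<=2.3.1",
        "torchvision>=0.17.2,<=0.18.1",
        # Minimum quoted in commit a296ea41f9.
        "ultralytics>=8.3.48",
        # Commit 252139099b limits matplotlib only on Windows; the exact
        # pin "<=3.9.0" is assumed from "not available starting from 3.9.1".
        'matplotlib<=3.9.0; platform_system == "Windows"',
        'matplotlib; platform_system != "Windows"',
    ]
    extras_require = {
        # Commit d0f633e2d5: albumentations capped for older Linux systems.
        # Treating "old_linux" as an extras key is an assumption.
        "old_linux": ["albumentations<=1.4.20"],
    }
    return install_requires, extras_require


if __name__ == "__main__":
    requires, extras = build_requirements()
    for line in requires:
        print(line)
    print(extras)
```

The strings after the semicolon are standard environment markers, so pip would apply the capped matplotlib requirement on Windows only. A real setup.py would pass the two return values to setuptools.setup(install_requires=..., extras_require=...).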