Commit Graph

  • 6a75b39940 feat: add support for MUSA and NPU devices in device management functions myhloli 2026-01-22 15:45:26 +08:00
  • 78e83d00a7 feat: refine DOCX processing to adjust title level handling and streamline list item references myhloli 2026-01-21 18:48:40 +08:00
  • 7bcd3afb86 feat: refactor DOCX processing to unify block type handling for captions and list items myhloli 2026-01-21 17:21:34 +08:00
  • 517257e058 feat: enhance DOCX text processing to include equations and hyperlinks in content generation myhloli 2026-01-21 17:04:22 +08:00
  • 313ec8afa0 feat: add MinerU configuration options for vlm/hybrid backend myhloli 2026-01-21 11:16:29 +08:00
  • ee4065ffd5 feat: refactor line object construction in DOCX processing to streamline block handling myhloli 2026-01-21 11:08:19 +08:00
  • 940289d083 feat: add Enflame setup guide and Dockerfile for GCU support myhloli 2026-01-20 18:55:29 +08:00
  • 21c9267a93 feat: enhance memory management for GCU device support in clean_memory function myhloli 2026-01-20 17:34:27 +08:00
  • 10d996d14e feat: add 2.7.2 release notes with cross-page table merging optimization and new device support myhloli 2026-01-20 17:26:21 +08:00
  • 1d697c20bf feat: update region area calculation for compatibility with skimage version 0.26.0 myhloli 2026-01-20 17:07:53 +08:00
  • 333f6d3a32 Merge branch 'opendatalab:dev' into dev Xiaomeng Zhao 2026-01-20 16:26:22 +08:00
  • 7620bd4ccc feat: add GCU device support for bf16 and memory calculations myhloli 2026-01-20 16:06:20 +08:00
  • 5026faa458 feat: update DOCX processing to change test document path and enhance chart handling with improved content structure myhloli 2026-01-19 20:18:24 +08:00
  • 8381e61f0c feat: update DOCX processing to change test document path and enhance chart handling with improved content structure myhloli 2026-01-19 18:37:56 +08:00
  • a8c4b6c2fe Merge pull request #4388 from myhloli/add_docx Xiaomeng Zhao 2026-01-19 17:02:34 +08:00
  • a0b0eb704c Merge branch 'opendatalab:add_docx' into add_docx Xiaomeng Zhao 2026-01-19 17:01:40 +08:00
  • 1ed570a205 Merge pull request #4386 from Sidney233/add_docx Xiaomeng Zhao 2026-01-19 17:00:58 +08:00
  • 21ebf6bdb1 Merge branch 'opendatalab:add_docx' into add_docx Sidney233 2026-01-19 16:28:13 +08:00
  • 11513dd44c feat: add Excel table handling for chart processing Sidney233 2026-01-19 16:26:27 +08:00
  • 5706011633 feat: update device handling in YOLO model initialization for improved compatibility myhloli 2026-01-19 15:58:10 +08:00
  • df07baea6c feat: enhance table merging logic with effective column calculations and visual consistency checks myhloli 2026-01-16 18:58:10 +08:00
  • c73c1d3847 feat: add support for Chinese continuation marker in table merging logic myhloli 2026-01-16 17:17:47 +08:00
  • 32592cd27f feat: enhance DOCX header and footer processing by adding deduplication for inline equations and hyperlinks myhloli 2026-01-16 17:16:51 +08:00
  • 9137f84591 feat: enhance DOCX header and footer processing by adding deduplication for inline equations and hyperlinks myhloli 2026-01-16 16:14:04 +08:00
  • 56c3bb3570 feat: enhance DOCX processing by adding hyperlink support and improving text formatting with equations myhloli 2026-01-16 16:04:45 +08:00
  • 23e3a73f33 feat: enhance DOCX processing by adding hyperlink support and improving text formatting with equations myhloli 2026-01-16 15:22:30 +08:00
  • e7c67a95b6 feat: enhance DOCX processing by adding main execution block and improving text content handling with equations myhloli 2026-01-15 20:01:22 +08:00
  • ea6bb2ede9 feat: update result_to_middle_json function to streamline parameters and enhance JSON output structure myhloli 2026-01-14 19:50:38 +08:00
  • c6543b4aeb Merge pull request #4368 from myhloli/dev Xiaomeng Zhao 2026-01-14 17:01:59 +08:00
  • 5116192d32 feat: replace Gradio app script with iframe for improved integration myhloli 2026-01-14 17:01:11 +08:00
  • 810717b42a feat: refactor DOCX processing by consolidating image handling and introducing MagicModel for block management myhloli 2026-01-14 16:56:49 +08:00
  • 7554127ff7 Merge pull request #4367 from Sidney233/add_docx Xiaomeng Zhao 2026-01-14 16:47:31 +08:00
  • 201ba86072 Merge pull request #4365 from tommygood/docs/fix-typos-spans-pdf Xiaomeng Zhao 2026-01-14 16:44:38 +08:00
  • 087d3686c5 @tommygood has signed the CLA in opendatalab/MinerU#4365 github-actions[bot] 2026-01-14 08:06:04 +00:00
  • d629ede38a fix duplicate counting of list_item, filter out blocks with empty content, do not add purely numeric header/footer content to page, caption recognition Sidney233 2026-01-14 16:04:47 +08:00
  • 11252a5636 docs: correct file naming format to use '_span.pdf' tommygood 2026-01-14 15:48:31 +08:00
  • db40932e6d Merge pull request #4359 from opendatalab/dev Xiaomeng Zhao 2026-01-13 19:51:36 +08:00
  • 03698c656e Merge pull request #4358 from myhloli/dev Xiaomeng Zhao 2026-01-13 19:50:19 +08:00
  • 48ded6b06c feat: add Hygon entry to acceleration cards list myhloli 2026-01-13 19:47:08 +08:00
  • 4e66217909 Merge pull request #4357 from myhloli/dev Xiaomeng Zhao 2026-01-13 19:46:07 +08:00
  • bdec40487e feat: add Dockerfile for vLLM inference environment and Hygon platform documentation myhloli 2026-01-13 19:45:18 +08:00
  • ec9b05003d fix: add support for ellipsis continuation marker in table merging logic myhloli 2026-01-13 15:50:15 +08:00
  • 6501ad878d Merge pull request #4353 from myhloli/add_docx Xiaomeng Zhao 2026-01-13 15:15:15 +08:00
  • 6c8fa9776f feat: simplify element handling in DOCX processing by removing unnecessary references and improving structure myhloli 2026-01-13 15:13:38 +08:00
  • 1d93aa8ab9 Merge branch 'opendatalab:add_docx' into add_docx Xiaomeng Zhao 2026-01-12 19:08:34 +08:00
  • 9aba297545 Merge pull request #4349 from Sidney233/add_docx Xiaomeng Zhao 2026-01-12 19:08:01 +08:00
  • dec84a9b5a feat: disable model output dumping in DOCX processing for improved performance myhloli 2026-01-12 18:33:13 +08:00
  • cbe39f4a5a Merge branch 'opendatalab:add_docx' into add_docx Sidney233 2026-01-12 17:10:51 +08:00
  • e042384953 section and header/footer handling completed Sidney233 2026-01-12 17:08:37 +08:00
  • a644a8a074 TODO for section handling issues Sidney233 2026-01-09 17:39:16 +08:00
  • c10f248721 Merge pull request #4330 from opendatalab/master Xiaomeng Zhao 2026-01-09 12:04:07 +08:00
  • 30c5d10e05 Archive MinerU Project List and update notes Xiaomeng Zhao 2026-01-09 12:02:21 +08:00
  • 4c3be9273c Fix typo in README_zh-CN.md Xiaomeng Zhao 2026-01-09 12:00:26 +08:00
  • 1833163b97 Mark MinerU project as archived in README Xiaomeng Zhao 2026-01-09 12:00:10 +08:00
  • eb55029adf Merge pull request #4318 from myhloli/dev Xiaomeng Zhao 2026-01-07 20:27:38 +08:00
  • 2eef53a9f0 fix: improve continuation marker handling in table caption merging logic myhloli 2026-01-07 20:17:13 +08:00
  • 9e6e2bde85 fix: refine caption merging logic to improve handling of continuation markers myhloli 2026-01-07 20:05:26 +08:00
  • c73e93bec0 fix: enhance table merging logic to handle footnotes more effectively myhloli 2026-01-07 19:50:46 +08:00
  • 07db6839b8 feat: refactor logging to use loguru and add BlockType and ContentBlock classes for structured content handling myhloli 2026-01-07 19:24:18 +08:00
  • 17394682e2 feat: add lxml dependency for enhanced XML processing in DOCX handling myhloli 2026-01-07 14:22:57 +08:00
  • 97bd2a2b94 Merge pull request #4310 from myhloli/add_docx Xiaomeng Zhao 2026-01-07 10:38:52 +08:00
  • ad175df3d2 feat: enhance DOCX processing by refining image handling and improving logging for inference timing myhloli 2026-01-06 20:04:06 +08:00
  • 0cbe965d97 feat: enhance DOCX processing by adding support for office file types and refactoring related functions myhloli 2026-01-06 19:49:40 +08:00
  • 74f6d4d0e7 Merge pull request #4309 from myhloli/add_docx Xiaomeng Zhao 2026-01-06 17:19:32 +08:00
  • 648fb1f7cf feat: update array formatting in latex_dict.py for improved LaTeX output myhloli 2026-01-06 17:19:01 +08:00
  • b6fc07cf9e feat: replace logging with loguru for improved logging functionality in OMML processing myhloli 2026-01-06 17:11:06 +08:00
  • 57be6926a9 feat: enhance OMML processing with additional LaTeX functions and improve unicode handling myhloli 2026-01-06 16:22:05 +08:00
  • 7abcfa39a0 Merge pull request #4308 from myhloli/add_docx Xiaomeng Zhao 2026-01-06 16:17:19 +08:00
  • 6f76664141 feat: refactor DOCX utilities and update dependencies for improved processing myhloli 2026-01-06 16:15:44 +08:00
  • 1dfbea157a Merge pull request #4306 from opendatalab/master Xiaomeng Zhao 2026-01-06 15:03:27 +08:00
  • 96840733c4 Update version.py with new version myhloli 2026-01-06 06:55:29 +00:00
  • 45f8ad1d5c Merge pull request #4305 from opendatalab/release-2.7.1 mineru-2.7.1-released Xiaomeng Zhao 2026-01-06 14:47:23 +08:00
  • b69191ba2b Merge pull request #4304 from opendatalab/dev release-2.7.1 Xiaomeng Zhao 2026-01-06 14:46:18 +08:00
  • 0028514ced Merge pull request #4303 from myhloli/dev Xiaomeng Zhao 2026-01-06 14:45:35 +08:00
  • 8d8daf6851 fix: add qwen-vl-utils dependency to pyproject.toml myhloli 2026-01-06 14:44:53 +08:00
  • 815280dd23 fix: update pdfminer.six dependency to resolve CVE-2025-64512 and improve EXIF handling myhloli 2026-01-06 14:42:48 +08:00
  • 7b52f92aea fix: update pdfminer.six dependency to resolve CVE-2025-64512 and improve EXIF handling myhloli 2026-01-06 14:41:47 +08:00
  • 33543b76c9 Merge pull request #4301 from myhloli/dev Xiaomeng Zhao 2026-01-06 14:10:08 +08:00
  • ea5f8e98dd fix: update pdfminer.six version to 20251230 in pyproject.toml myhloli 2026-01-06 11:54:17 +08:00
  • 8996e06448 fix: restore hybrid analyze imports in common.py for backend processing myhloli 2026-01-06 11:51:31 +08:00
  • 23bc263b85 Merge pull request #4299 from myhloli/add_docx Xiaomeng Zhao 2026-01-06 11:27:20 +08:00
  • 53fb1cd055 feat: implement DOCX processing in the converter myhloli 2026-01-06 11:26:18 +08:00
  • f0ce905c7d fix: update pdfminer.six version to 20251230 in pyproject.toml myhloli 2026-01-05 19:08:25 +08:00
  • df33d483de Merge remote-tracking branch 'origin/add_docx' into add_docx myhloli 2026-01-05 17:44:38 +08:00
  • f44fb174ea feat: add support for DOCX file format in converter myhloli 2026-01-05 17:44:17 +08:00
  • 70b1e73606 Merge pull request #4293 from Sidney233/docx-dev Xiaomeng Zhao 2026-01-05 17:43:10 +08:00
  • 11fb0a0199 fix: add util files Sidney233 2026-01-05 17:40:37 +08:00
  • 66f8f0e93a Merge pull request #4292 from Sidney233/docx-dev Xiaomeng Zhao 2026-01-05 14:10:21 +08:00
  • 942c1693c7 Merge branch 'add_docx' into docx-dev Xiaomeng Zhao 2026-01-05 14:10:04 +08:00
  • 7387797b17 work review Sidney233 2026-01-05 13:33:55 +08:00
  • bfb304ef1f fix: improve EXIF handling and save PDF logic in pdf_image_tools.py myhloli 2026-01-05 00:27:01 +08:00
  • 17e6016b58 Merge pull request #4283 from kingdomad/fix/image-exif-rotation Xiaomeng Zhao 2026-01-04 18:31:06 +08:00
  • ba06cd14ef Update pdf_image_tools.py Xiaomeng Zhao 2026-01-04 18:29:51 +08:00
  • 0209ada8d0 Merge pull request #4287 from myhloli/dev Xiaomeng Zhao 2026-01-04 15:26:16 +08:00
  • e2140222bc docs: update VastAI.md with new version numbers and improved instructions myhloli 2026-01-04 15:24:23 +08:00
  • d679d99192 docs: update heading from '快速开始' to '快速入门' for consistency myhloli 2026-01-04 15:16:15 +08:00
  • 4bfcc0b808 Merge pull request #4286 from opendatalab/master Xiaomeng Zhao 2026-01-04 15:12:00 +08:00
  • ead29489ff Merge pull request #4285 from myhloli/dev Xiaomeng Zhao 2026-01-04 15:11:29 +08:00
  • c01e35b4c6 docs: update navigation and terminology in documentation for clarity myhloli 2026-01-04 15:10:37 +08:00
  • a89249069c Merge pull request #4284 from myhloli/dev Xiaomeng Zhao 2026-01-04 14:34:15 +08:00