mirror of
https://github.com/opendatalab/MinerU.git
synced 2026-03-27 11:08:32 +07:00
- Replace pdfminer with PyMuPDF for character detection
- Implement new method detect_invalid_chars_by_pymupdf
- Update check_invalid_chars in pdf_meta_scan.py to use new method
- Add __replace_0xfffd function in pdf_parse_union_core_v2.py to handle special characters
- Remove unused imports and update requirements.txt
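
The invalid-character handling named in this commit can be illustrated in isolation. What follows is only a minimal sketch, not MinerU's implementation: it assumes the check amounts to measuring the share of U+FFFD replacement characters in text that has already been extracted, and that special characters are handled by plain substitution. The names `invalid_char_ratio`, `replace_0xfffd`, `has_too_many_invalid_chars` and the 0.05 threshold are hypothetical.

```python
# Hypothetical sketch; names and threshold are assumptions, not MinerU code.
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, produced when a glyph cannot be mapped to text


def invalid_char_ratio(text: str) -> float:
    """Return the share of U+FFFD characters among non-whitespace characters."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    return sum(ch == REPLACEMENT_CHAR for ch in visible) / len(visible)


def replace_0xfffd(text: str, substitute: str = " ") -> str:
    """Replace every U+FFFD character with a substitute string."""
    return text.replace(REPLACEMENT_CHAR, substitute)


def has_too_many_invalid_chars(text: str, threshold: float = 0.05) -> bool:
    """Flag text whose U+FFFD share exceeds the (assumed) threshold."""
    return invalid_char_ratio(text) > threshold
```

In a real pipeline the input string would come from the PDF text extractor; here it is left as a plain argument so the sketch has no third-party dependencies.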
14 lines
391 B
Plaintext
boto3>=1.28.43
Brotli>=1.1.0
click>=8.1.7
fast-langdetect==0.2.0
loguru>=0.6.0
numpy>=1.21.6,<2.0.0
pydantic>=2.7.2,<2.8.0
PyMuPDF>=1.24.9
scikit-learn>=1.0.2
torch>=2.2.2,<=2.3.1
transformers
# pdfminer.six==20231228
# The requirements.txt must ensure that only necessary external dependencies are introduced. If there are new dependencies to add, please contact the project administrator.