MinerU/requirements.txt
myhloli ac88815620 refactor(pdf_check): improve character detection using PyMuPDF
- Replace pdfminer with PyMuPDF for character detection
- Implement new method detect_invalid_chars_by_pymupdf
- Update check_invalid_chars in pdf_meta_scan.py to use new method
- Add __replace_0xfffd function in pdf_parse_union_core_v2.py to handle special characters
- Remove unused imports and update requirements.txt
2024-11-28 22:34:23 +08:00
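
The commit message above describes detecting invalid characters and handling the Unicode replacement character (U+FFFD). A minimal sketch of that idea follows, assuming the page text has already been extracted as a string (for example with PyMuPDF's `page.get_text()`). The names `invalid_char_ratio`, `has_too_many_invalid_chars`, and `replace_0xfffd`, and the 5% threshold, are hypothetical illustrations and not MinerU's actual implementation.

```python
# Illustrative only: the function names and the threshold are assumptions,
# not the code from pdf_meta_scan.py or pdf_parse_union_core_v2.py.

REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, emitted when a glyph cannot be decoded


def invalid_char_ratio(text: str) -> float:
    """Return the share of non-whitespace characters that are U+FFFD."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    invalid = sum(1 for ch in visible if ch == REPLACEMENT_CHAR)
    return invalid / len(visible)


def has_too_many_invalid_chars(text: str, threshold: float = 0.05) -> bool:
    """Flag text whose U+FFFD share exceeds the (assumed) threshold."""
    return invalid_char_ratio(text) > threshold


def replace_0xfffd(text: str, substitute: str = " ") -> str:
    """Replace every U+FFFD with a substitute string (a space by default)."""
    return text.replace(REPLACEMENT_CHAR, substitute)
```

A document flagged by a check like this would typically be routed to OCR rather than direct text extraction; the right threshold depends on the corpus.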


boto3>=1.28.43
Brotli>=1.1.0
click>=8.1.7
fast-langdetect==0.2.0
loguru>=0.6.0
numpy>=1.21.6,<2.0.0
pydantic>=2.7.2,<2.8.0
PyMuPDF>=1.24.9
scikit-learn>=1.0.2
torch>=2.2.2,<=2.3.1
transformers
# pdfminer.six==20231228
# Keep requirements.txt limited to necessary external dependencies. To add a new dependency, please contact the project administrator.