- Refactor remove_outside_spans function to filter spans more accurately
- Add image_footnote, index, and list block types to output file documentation
- Update draw_span_bbox to use preproc_blocks instead of para_blocks
- Bump version to 0.9.0
- Update remove_outside_spans function to handle all content types
- Add processing for text and equation spans
- Improve overlap calculation for better accuracy
- Add new function `remove_outside_spans` to filter spans based on image and table blocks
- Reorder span processing steps to improve efficiency
- Update imports to include `calculate_overlap_area_in_bbox1_area_ratio`
- Add check for 'image_path' in spans to avoid errors when it's missing
- Update image handling in both paragraph text and content dictionary
- Improve error handling and make the code more robust
- Update image content extraction to iterate through all spans in a block
- Add support for extracting table content from spans within a block
- Handle multiple content types within table spans (latex, html, image)
- Refactor code to be more modular and easier to maintain
- Update PyPI mirror from Tsinghua to Aliyun in multiple Dockerfiles and installation scripts
- This change may improve package download speed and reliability for users in China
- Update README.md and README_zh-CN.md to include new model download instructions
- Provide detailed steps on how to download models after PDF-Extract-Kit 1.0 repository change
- Emphasize the need to re-download models due to repository change
- Remove import and usage of StructTableModel
- Add support for TableMaster model
- Update table model initialization logic to support TableMaster
- Log an error and exit if StructEqTable is selected, as it is currently being upgraded
- Update README files to reflect changes in table parsing capabilities
- Change the logo path from 'docs/images/MinerU-logo.png' to 'old_docs/images/MinerU-logo.png' in both README.md and README_zh-CN.md
- This update ensures that the correct logo is displayed in the project's README files
- Add changelog for v0.9.0 release with major refactoring and improvements
- Update key features list to include new functionalities
- Modify system requirements and hardware support information
- Add section for deploying derived projects
- Update known issues and TODO list
- Modify the logic for splitting wide blocks exceeding 0.4 page width
- Remove the specific case for blocks exceeding 0.25 page width
- Add comments to explain the reasoning behind different splitting strategies
- Update model download instructions for versions 0.9.x and later
- Simplify demo scripts by removing unnecessary model configuration
- Add visualization function to draw bounding boxes
- Update CLI help message with new URL
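
Several entries above concern filtering spans by how much of their area overlaps a block. The following is a minimal sketch of that idea, not the project's actual implementation. The helper names `overlap_ratio_in_bbox1` and `keep_spans_inside_blocks`, the `(x0, y0, x1, y1)` box layout, the `"bbox"` key, and the `0.5` threshold are all assumptions made for illustration.

```python
def overlap_ratio_in_bbox1(bbox1, bbox2):
    """Return the fraction of bbox1's area covered by bbox2.

    Boxes are assumed to be (x0, y0, x1, y1) tuples with x0 < x1 and y0 < y1.
    """
    # Corners of the intersection rectangle.
    ix0 = max(bbox1[0], bbox2[0])
    iy0 = max(bbox1[1], bbox2[1])
    ix1 = min(bbox1[2], bbox2[2])
    iy1 = min(bbox1[3], bbox2[3])
    if ix1 <= ix0 or iy1 <= iy0:
        return 0.0  # the boxes do not overlap
    area1 = (bbox1[2] - bbox1[0]) * (bbox1[3] - bbox1[1])
    if area1 <= 0:
        return 0.0  # degenerate box, treat as no overlap
    return (ix1 - ix0) * (iy1 - iy0) / area1


def keep_spans_inside_blocks(spans, blocks, threshold=0.5):
    """Keep spans whose area lies mostly inside at least one block.

    `threshold` is a hypothetical cutoff chosen for this sketch.
    """
    return [
        span
        for span in spans
        if any(
            overlap_ratio_in_bbox1(span["bbox"], block["bbox"]) > threshold
            for block in blocks
        )
    ]


# Usage: one span inside the block, one far outside it.
blocks = [{"bbox": (0, 0, 20, 20)}]
spans = [{"bbox": (0, 0, 10, 10)}, {"bbox": (100, 100, 110, 110)}]
kept = keep_spans_inside_blocks(spans, blocks)
```

Dividing by the span's own area rather than by the union means a small span fully contained in a large block scores 1.0, which is usually what is wanted when deciding whether a span belongs to a block.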