myhloli
05b6ed3d8d
feat: enhance logging by adding dynamic log level configuration and performance metrics
2025-12-30 16:43:24 +08:00
myhloli
984b303dfa
refactor: update default backend to hybrid-auto-engine and enhance documentation for parsing options
2025-12-25 19:17:08 +08:00
myhloli
5c743dc169
fix: update device handling and backend configuration in analysis scripts
2025-11-11 11:40:52 +08:00
myhloli
2f1369a877
feat: add Mac environment checks and support for Apple Silicon in backend selection
2025-10-28 17:03:56 +08:00
myhloli
8d178b2b7e
feat: enhance file type detection by using guess_suffix_by_path for document parsing
2025-09-18 22:41:58 +08:00
myhloli
de5449fd40
refactor: consolidate output processing into a single _process_output function
2025-09-15 11:24:21 +08:00
myhloli
6608615012
docs: update demo.py to reflect changes in backend naming from sglang to vllm
2025-09-15 01:52:14 +08:00
myhloli
54f065d00c
refactor: standardize parameter names for formula and table parsing in demo.py
2025-07-05 04:17:24 +08:00
zhanluxianshen
9ba95ba1ec
clean demo function notes.
...
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-06-25 11:56:22 +08:00
myhloli
b7d7a1bf99
fix: correct syntax error in demo.py for VLM client backend
2025-06-15 18:19:19 +08:00
myhloli
d41179da84
feat: update parse_doc function to support backend options and add environment variable instructions
2025-06-13 15:48:34 +08:00
myhloli
8737ebb2e2
feat: remove model path parameter from vlm_doc_analyze and streamline model loading
2025-06-13 15:34:42 +08:00
myhloli
91defbb09f
feat: enhance PDF parsing functionality with new backend options and improved output handling
2025-06-12 21:50:16 +08:00
myhloli
b0e220c5f0
refactor(demo): simplify batch_demo.py and update demo.py
...
- Remove unnecessary imports and code in batch_demo.py
- Update demo.py to use relative paths and improve code structure
- Adjust output directory structure in both scripts
- Remove redundant code and simplify functions
2025-04-02 23:58:17 +08:00
icecraft
d91367159f
feat: add batch example
2025-04-02 19:40:33 +08:00
myhloli
52efe94da8
feat(api): simplify markdown and content list generation
...
- Remove DropMode and MakeMode imports from user code
- Set default drop_mode to DropMode.NONE in get_markdown and get_content_list methods
- Remove md_make_mode parameter from get_content_list method
- Add dump_middle_json method to PipeResult
- Update examples in API documentation and demo script
2025-01-07 10:39:53 +08:00
Xiaomeng Zhao
15db6fe95c
Update demo.py
2025-01-07 10:18:26 +08:00
Xiaomeng Zhao
3e8d8a3a3b
Update demo.py
2025-01-07 10:14:55 +08:00
myhloli
d6a291623b
feat(demo): add demo script for PDF processing
...
- Create demo.py script for PDF file processing
- Implement PDF reading, classification, and inference using Opendatalab's magic_pdf library
- Add pipelines for OCR and text modes
- Include result visualization and markdown export
2024-12-19 18:23:52 +08:00
icecraft
9ec5afaf92
fix: remove deprecated code
2024-12-12 14:39:50 +08:00
myhloli
e11e6b3255
test: batch process demo PDFs
...
- Update test block to iterate through multiple demo PDF files
- Use os.path.join to construct file paths for better cross-platform compatibility
- Remove hardcoded file path
2024-11-25 11:29:12 +08:00
myhloli
17ef5c0f69
feat(demo): add visualization bbox parameter and refactor parsing process
...
- Add is_draw_visualization_bbox parameter to enable/disable visualization of bounding boxes
- Refactor the parsing process to improve code readability and maintainability
- Update function documentation to reflect new parameter
- Simplify test code by using a more generic variable name
2024-11-25 10:57:46 +08:00
icecraft
ae379e6b59
fix: rewrite projects/ and demos with new data api
2024-11-24 16:07:48 +08:00
icecraft
b1adde8e66
fix: rewrite projects/ and demos with new data api
2024-11-24 16:06:55 +08:00
myhloli
1fc053d57a
refactor(magic_pdf_parse_main): optimize model data handling and JSON output
...
- Add orig_model_list parameter to maintain original model data
- Deep copy model_json and pipe.model_list to preserve data integrity
- Update json_md_dump function call to include orig_model_list
- Improve condition check for empty model_json
2024-11-08 18:49:59 +08:00
myhloli
acab8de50f
docs: update model download instructions and simplify demo scripts
...
- Update model download instructions for versions 0.9.x and later
- Simplify demo scripts by removing unnecessary model configuration
- Add visualization function to draw bounding boxes
- Update CLI help message with new URL
2024-10-27 12:12:56 +08:00
myhloli
7bca348d57
upload ocr_demo pdf
2024-08-01 19:32:30 +08:00
yzz
79fa23f876
add a new file to use MinerU
2024-07-25 18:48:27 +08:00
myhloli
720db843c5
fix(demo): add fallback to internal model when external model data is missing
...
If no valid model data is provided, the system now checks if an internal model
should be used. This enhances the robustness of the demo pipeline by providing
a default behavior when essential data is not available.
2024-07-18 14:41:40 +08:00
Xiaomeng Zhao
30f06136ec
Update demo.py
2024-07-16 16:23:27 +08:00
Xiaomeng Zhao
b77ac57676
Update demo.py
2024-07-16 14:59:35 +08:00
myhloli
6b76f5cbd8
update(readme): Optimizing the Installation Process
2024-07-15 19:41:42 +08:00
myhloli
1e73b9fca0
fix: fasttext does not support numpy>=2.0.0
2024-07-07 22:06:02 +08:00
赵小蒙
63a4a06255
update demo model json and code
2024-06-25 17:38:11 +08:00
赵小蒙
8e537ed554
add demo pdf
2024-06-25 15:08:23 +08:00
赵小蒙
c9af3457f5
delete useless files
2024-06-25 11:15:50 +08:00
赵小蒙
4adc761b2e
remove old demo
2024-05-06 19:04:08 +08:00
赵小蒙
709a65008a
Adjust the intermediate-state dict structure
...
Refactor some functions
2024-04-15 18:51:58 +08:00
赵小蒙
1b9d65b3d3
1. Add a leading underscore to the keys of the Trace class
...
2. Implement UNIPipe
2024-04-11 17:43:00 +08:00
myhloli
c8b06ad589
Merge branch 'master' into master
2024-04-10 17:48:18 +08:00
kernel.h@qq.com
c3b8f6d7bb
If the left or right side of an OCR line extends beyond the layoutbox, let the layoutbox truncate that side
2024-04-10 16:45:45 +08:00
赵小蒙
00f16239c6
Implement the parse_ocr_pdf API; image-cropping logic uses flat paths on S3 and hierarchical paths locally; remove the preset s3_image_save_path
2024-04-10 15:21:33 +08:00
赵小蒙
c81f699e68
Update libs/config_reader and delete spark/s3.py
...
Move pipeline_cor.py, pipeline_txt.py, and pipeline.py to code_clean and fix some dependencies
2024-04-09 15:25:16 +08:00
赵小蒙
f65be6e094
pdf_parse_by_model.py ---> pdf_parse_by_txt.py
2024-04-08 15:12:26 +08:00
赵小蒙
f52c6249be
Update path input and markdown output logic
2024-04-08 14:56:13 +08:00
赵小蒙
016cde3ece
Fix init error
2024-03-29 17:29:44 +08:00
赵小蒙
575ca00e01
Remove the app.common dependency; refactor pipeline_ocr
2024-03-29 14:04:57 +08:00
赵小蒙
7f0c734ff6
Refactor pipeline
2024-03-28 19:02:03 +08:00
赵小蒙
7fcbae01fe
Refactor demo
2024-03-28 17:01:53 +08:00
赵小蒙
8ebb79a43a
Update standard_format dump logic
2024-03-26 16:37:38 +08:00