78 Commits

Author SHA1 Message Date
myhloli
05b6ed3d8d feat: enhance logging by adding dynamic log level configuration and performance metrics 2025-12-30 16:43:24 +08:00
myhloli
984b303dfa refactor: update default backend to hybrid-auto-engine and enhance documentation for parsing options 2025-12-25 19:17:08 +08:00
myhloli
5c743dc169 fix: update device handling and backend configuration in analysis scripts 2025-11-11 11:40:52 +08:00
myhloli
2f1369a877 feat: add Mac environment checks and support for Apple Silicon in backend selection 2025-10-28 17:03:56 +08:00
myhloli
8d178b2b7e feat: enhance file type detection by using guess_suffix_by_path for document parsing 2025-09-18 22:41:58 +08:00
myhloli
de5449fd40 refactor: consolidate output processing into a single _process_output function 2025-09-15 11:24:21 +08:00
myhloli
6608615012 docs: update demo.py to reflect changes in backend naming from sglang to vllm 2025-09-15 01:52:14 +08:00
myhloli
54f065d00c refactor: standardize parameter names for formula and table parsing in demo.py 2025-07-05 04:17:24 +08:00
zhanluxianshen
9ba95ba1ec clean demo function notes.
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-06-25 11:56:22 +08:00
myhloli
b7d7a1bf99 fix: correct syntax error in demo.py for VLM client backend 2025-06-15 18:19:19 +08:00
myhloli
d41179da84 feat: update parse_doc function to support backend options and add environment variable instructions 2025-06-13 15:48:34 +08:00
myhloli
8737ebb2e2 feat: remove model path parameter from vlm_doc_analyze and streamline model loading 2025-06-13 15:34:42 +08:00
myhloli
91defbb09f feat: enhance PDF parsing functionality with new backend options and improved output handling 2025-06-12 21:50:16 +08:00
myhloli
b0e220c5f0 refactor(demo): simplify batch_demo.py and update demo.py
- Remove unnecessary imports and code in batch_demo.py
- Update demo.py to use relative paths and improve code structure
- Adjust output directory structure in both scripts
- Remove redundant code and simplify functions
2025-04-02 23:58:17 +08:00
icecraft
d91367159f feat: add batch example 2025-04-02 19:40:33 +08:00
myhloli
52efe94da8 feat(api): simplify markdown and content list generation
- Remove DropMode and MakeMode imports from user code
- Set default drop_mode to DropMode.NONE in get_markdown and get_content_list methods
- Remove md_make_mode parameter from get_content_list method
- Add dump_middle_json method to PipeResult
- Update examples in API documentation and demo script
2025-01-07 10:39:53 +08:00
Xiaomeng Zhao
15db6fe95c Update demo.py 2025-01-07 10:18:26 +08:00
Xiaomeng Zhao
3e8d8a3a3b Update demo.py 2025-01-07 10:14:55 +08:00
myhloli
d6a291623b feat(demo): add demo script for PDF processing
- Create demo.py script for PDF file processing
- Implement PDF reading, classification, and inference using Opendatalab's magic_pdf library
- Add pipelines for OCR and text modes
- Include result visualization and markdown export
2024-12-19 18:23:52 +08:00
icecraft
9ec5afaf92 fix: remove deprecated code 2024-12-12 14:39:50 +08:00
myhloli
e11e6b3255 test: batch process demo PDFs
- Update test block to iterate through multiple demo PDF files
- Use os.path.join to construct file paths for better cross-platform compatibility
- Remove hardcoded file path
2024-11-25 11:29:12 +08:00
myhloli
17ef5c0f69 feat(demo): add visualization bbox parameter and refactor parsing process
- Add is_draw_visualization_bbox parameter to enable/disable visualization of bounding boxes
- Refactor the parsing process to improve code readability and maintainability
- Update function documentation to reflect new parameter
- Simplify test code by using a more generic variable name
2024-11-25 10:57:46 +08:00
icecraft
ae379e6b59 fix: rewrite projects/ and demos with new data api 2024-11-24 16:07:48 +08:00
icecraft
b1adde8e66 fix: rewrite projects/ and demos with new data api 2024-11-24 16:06:55 +08:00
myhloli
1fc053d57a refactor(magic_pdf_parse_main): optimize model data handling and JSON output
- Add orig_model_list parameter to maintain original model data
- Deep copy model_json and pipe.model_list to preserve data integrity
- Update json_md_dump function call to include orig_model_list
- Improve condition check for empty model_json
2024-11-08 18:49:59 +08:00
myhloli
acab8de50f docs: update model download instructions and simplify demo scripts
- Update model download instructions for versions 0.9.x and later
- Simplify demo scripts by removing unnecessary model configuration
- Add visualization function to draw bounding boxes
- Update CLI help message with new URL
2024-10-27 12:12:56 +08:00
myhloli
7bca348d57 upload ocr_demo pdf 2024-08-01 19:32:30 +08:00
yzz
79fa23f876 add a new file to use MinerU 2024-07-25 18:48:27 +08:00
myhloli
720db843c5 fix(demo): add fallback to internal model when external model data is missing
If no valid model data is provided, the system now checks if an internal model
should be used. This enhances the robustness of the demo pipeline by providing
a default behavior when essential data is not available.
2024-07-18 14:41:40 +08:00
Xiaomeng Zhao
30f06136ec Update demo.py 2024-07-16 16:23:27 +08:00
Xiaomeng Zhao
b77ac57676 Update demo.py 2024-07-16 14:59:35 +08:00
myhloli
6b76f5cbd8 update(readme): Optimizing the Installation Process 2024-07-15 19:41:42 +08:00
myhloli
1e73b9fca0 fix: fasttext not support numpy>=2.0.0 2024-07-07 22:06:02 +08:00
赵小蒙
63a4a06255 update demo model json and code 2024-06-25 17:38:11 +08:00
赵小蒙
8e537ed554 add demo pdf 2024-06-25 15:08:23 +08:00
赵小蒙
c9af3457f5 delete useless files 2024-06-25 11:15:50 +08:00
赵小蒙
4adc761b2e remove old demo 2024-05-06 19:04:08 +08:00
赵小蒙
709a65008a Adjust the intermediate-state dict structure
Refactor some functions
2024-04-15 18:51:58 +08:00
赵小蒙
1b9d65b3d3 1. Add a leading underscore to the keys of the Trace class
2. Implement UNIPipe
2024-04-11 17:43:00 +08:00
myhloli
c8b06ad589 Merge branch 'master' into master 2024-04-10 17:48:18 +08:00
kernel.h@qq.com
c3b8f6d7bb If the left or right side of an OCR line extends beyond the layout box, truncate it at the layout box 2024-04-10 16:45:45 +08:00
赵小蒙
00f16239c6 Implement the parse_ocr_pdf API; image-cropping logic uses flat paths on S3 and hierarchical paths locally; remove the preset s3_image_save_path 2024-04-10 15:21:33 +08:00
赵小蒙
c81f699e68 Update libs/config_reader, delete spark/s3.py
Move pipeline_cor.py, pipeline_txt.py, and pipeline.py to code_clean and fix some dependencies
2024-04-09 15:25:16 +08:00
赵小蒙
f65be6e094 pdf_parse_by_model.py ---> pdf_parse_by_txt.py 2024-04-08 15:12:26 +08:00
赵小蒙
f52c6249be Update path input and markdown output logic 2024-04-08 14:56:13 +08:00
赵小蒙
016cde3ece Fix init error 2024-03-29 17:29:44 +08:00
赵小蒙
575ca00e01 Remove app.common dependency, refactor pipeline_ocr 2024-03-29 14:04:57 +08:00
赵小蒙
7f0c734ff6 Refactor pipeline 2024-03-28 19:02:03 +08:00
赵小蒙
7fcbae01fe Refactor demo 2024-03-28 17:01:53 +08:00
赵小蒙
8ebb79a43a Update standard_format dump logic 2024-03-26 16:37:38 +08:00