Commit Graph

217 Commits

Author SHA1 Message Date
赵小蒙
892f522aea update 2024-04-07 10:12:20 +08:00
赵小蒙
26e19fd220 Update how tables are assembled in mk_nlp_markdown magic_pdf-0.3.37-released 2024-03-29 17:55:58 +08:00
赵小蒙
cd8b2d2c78 Fix import error magic_pdf-0.3.36-released 2024-03-29 17:48:41 +08:00
赵小蒙
d35c49268d Fix import error magic_pdf-0.3.35-released 2024-03-29 17:41:53 +08:00
赵小蒙
016cde3ece Fix init error magic_pdf-0.3.34-released 2024-03-29 17:29:44 +08:00
赵小蒙
4b8dbd7cfb ocr_pdf_intermediate_dict_to_markdown_with_para supports both mm and nlp modes magic_pdf-0.3.33-released 2024-03-29 17:18:32 +08:00
赵小蒙
d6a5724b26 Add table_latex support 2024-03-29 15:56:18 +08:00
赵小蒙
50a543ce0e Change the path of the s3 configuration info 2024-03-29 14:57:12 +08:00
赵小蒙
575ca00e01 Remove app.common dependency; refactor pipeline_ocr 2024-03-29 14:04:57 +08:00
赵小蒙
7f0c734ff6 Refactor pipeline 2024-03-28 19:02:03 +08:00
赵小蒙
872cd73f4a Refactor pipeline 2024-03-28 17:02:16 +08:00
赵小蒙
7fcbae01fe Refactor demo 2024-03-28 17:01:53 +08:00
赵小蒙
752d620a0c Merge remote-tracking branch 'origin/master' 2024-03-28 14:45:11 +08:00
赵小蒙
fc10772503 Move ocr_construct_page_component to a new location 2024-03-28 14:45:00 +08:00
liusilu
fd616c5778 Merge branch 'master' of https://github.com/myhloli/Magic-PDF 2024-03-28 13:35:22 +08:00
liusilu
acb9cbd6d2 add pdf tools 2024-03-28 13:35:12 +08:00
kernel.h@qq.com
433684c646 Implement multimodal markdown assembly magic_pdf-0.3.30-released 2024-03-27 14:46:56 +08:00
liusilu
fffee0ae97 Merge branch 'master' of https://github.com/myhloli/Magic-PDF 2024-03-27 10:03:46 +08:00
liusilu
e73606250e add pdf tools 2024-03-27 10:03:20 +08:00
kernel.h@qq.com
7162debc38 Implement assembling the text of PDF parsing results into the standard format 2024-03-26 21:19:19 +08:00
赵小蒙
a343175d66 Restore pipeline magic_pdf-0.3.29-released 2024-03-26 18:10:53 +08:00
赵小蒙
671ce1d97c Merge remote-tracking branch 'origin/master' magic_pdf-0.3.28-released 2024-03-26 16:52:57 +08:00
赵小蒙
6f80beaa31 Split up the original pipeline 2024-03-26 16:51:58 +08:00
许瑞
cb1b02e716 feat: disable auto include table title magic_pdf-0.3.27-released 2024-03-26 16:46:05 +08:00
赵小蒙
8ebb79a43a Update standard_format dump logic 2024-03-26 16:37:38 +08:00
赵小蒙
154eed1ade Update footnote drop logic 2024-03-26 16:37:07 +08:00
赵小蒙
b7652171ea Update make_standard_format_with_para logic 2024-03-26 16:36:45 +08:00
许瑞
f0c463ed6d Merge branch 'master' of https://github.com/myhloli/Magic-PDF 2024-03-26 10:17:05 +08:00
赵小蒙
3d2fcc9dce Remove unused code magic_pdf-0.3.26-released 2024-03-25 19:10:22 +08:00
赵小蒙
d3c9cb84f8 Restrict paragraph-segmentation logs to debug mode only 2024-03-25 17:07:51 +08:00
赵小蒙
8c089976ed Update comments 2024-03-25 16:09:29 +08:00
赵小蒙
473a0a7de0 When assembling markdown, skip the paragraph if para_text is empty 2024-03-25 14:26:15 +08:00
kernel.h@qq.com
61c970f7da Fix list index error 2024-03-25 13:24:48 +08:00
赵小蒙
d3ee9abbab Update ocr_mk_mm_markdown_with_para_core logic magic_pdf-0.3.25-released 2024-03-24 20:39:48 +08:00
赵小蒙
07e4f115e6 ocr_pdf_intermediate_dict_to_markdown_with_para outputs nlp-format markdown magic_pdf-0.3.24-released 2024-03-24 19:40:41 +08:00
赵小蒙
bf8d8e217d Add ocr_mk_nlp_markdown_with_para method magic_pdf-0.3.23-released 2024-03-24 19:31:58 +08:00
kernel.h@qq.com
744b3f75eb Merge centered text that has the same line height 2024-03-23 19:32:02 +08:00
kernel.h@qq.com
2e772467ee Join lists that span pages 2024-03-23 16:19:23 +08:00
许瑞
efed5faa53 feat: modify foot note bbox tmp magic_pdf-0.3.22-released 2024-03-23 14:34:25 +08:00
xu rui
05161c6e62 feat: backup footnote_bbox_tmp magic_pdf-0.3.21-released 2024-03-23 14:11:50 +08:00
xu rui
15c8830416 feat: comment parse_title magic_pdf-0.3.20-released 2024-03-23 13:15:32 +08:00
xu rui
432e1ae5e3 feat: process title and footnote magic_pdf-0.3.19-released 2024-03-22 18:11:44 +08:00
kernel.h@qq.com
e3e125baef Merge lists that span layouts magic_pdf-0.3.18-released 2024-03-22 16:35:04 +08:00
赵小蒙
2277e31ff4 Simplify the ocr_demo main function magic_pdf-0.3.17-released 2024-03-22 16:33:54 +08:00
赵小蒙
7d010e1969 Optimize the logic of ocr_mk_mm_markdown_with_para and ocr_mk_mm_markdown_with_para_and_pagination 2024-03-22 16:11:55 +08:00
赵小蒙
dbe79ba1b2 Update ocr_mk_mm_markdown_with_para_and_pagination logic 2024-03-22 15:41:59 +08:00
kernel.h@qq.com
f36c26565e Use the area ratio to decide whether a line of text lies inside a layoutbox 2024-03-22 14:48:42 +08:00
赵小蒙
a36ef4f8d5 Update pymupdf dependency version magic_pdf-0.3.16-released 2024-03-22 14:30:17 +08:00
赵小蒙
e9aa103cae Add paginated markdown output format to ocr 2024-03-22 11:21:40 +08:00
赵小蒙
27c080a944 Adjust pipeline magic_pdf-0.3.15-released 2024-03-21 17:39:30 +08:00