赵小蒙
|
97153fabb8
|
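(standard format) Fix text loss in long Chinese text caused by word segmentation
(standard format) Fix extra spaces inserted between content items in Chinese text
Fix formula content being escaped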
|
2024-04-08 11:42:55 +08:00 |
|
赵小蒙
|
05fe0548b1
|
Fix text loss in long Chinese text caused by word segmentation, and extra spaces inserted between content items
|
2024-04-08 11:14:29 +08:00 |
|
kernel.h@qq.com
|
02d805ea9b
|
Add location for refactored functions
|
2024-04-07 17:00:17 +08:00 |
|
kernel.h@qq.com
|
47d5ea96e5
|
update
|
2024-04-07 13:43:00 +08:00 |
|
kernel.h@qq.com
|
044bd0191b
|
Delete unused fields
|
2024-04-07 13:43:00 +08:00 |
|
myhloli
|
696906ed02
|
Update README.md
|
2024-04-07 10:30:28 +08:00 |
|
赵小蒙
|
892f522aea
|
update
|
2024-04-07 10:12:20 +08:00 |
|
赵小蒙
|
26e19fd220
|
Update how tables are concatenated in mk_nlp_markdown
magic_pdf-0.3.37-released
|
2024-03-29 17:55:58 +08:00 |
|
赵小蒙
|
cd8b2d2c78
|
Fix import error
magic_pdf-0.3.36-released
|
2024-03-29 17:48:41 +08:00 |
|
赵小蒙
|
d35c49268d
|
Fix import error
magic_pdf-0.3.35-released
|
2024-03-29 17:41:53 +08:00 |
|
赵小蒙
|
016cde3ece
|
Fix init error
magic_pdf-0.3.34-released
|
2024-03-29 17:29:44 +08:00 |
|
赵小蒙
|
4b8dbd7cfb
|
ocr_pdf_intermediate_dict_to_markdown_with_para now supports both mm and nlp modes
magic_pdf-0.3.33-released
|
2024-03-29 17:18:32 +08:00 |
|
赵小蒙
|
d6a5724b26
|
Add table_latex support
|
2024-03-29 15:56:18 +08:00 |
|
赵小蒙
|
50a543ce0e
|
Change the path of the s3 configuration
|
2024-03-29 14:57:12 +08:00 |
|
赵小蒙
|
575ca00e01
|
Remove app.common dependency; refactor pipeline_ocr
|
2024-03-29 14:04:57 +08:00 |
|
赵小蒙
|
7f0c734ff6
|
Refactor pipeline
|
2024-03-28 19:02:03 +08:00 |
|
赵小蒙
|
872cd73f4a
|
Refactor pipeline
|
2024-03-28 17:02:16 +08:00 |
|
赵小蒙
|
7fcbae01fe
|
Refactor demo
|
2024-03-28 17:01:53 +08:00 |
|
赵小蒙
|
752d620a0c
|
Merge remote-tracking branch 'origin/master'
|
2024-03-28 14:45:11 +08:00 |
|
赵小蒙
|
fc10772503
|
Move ocr_construct_page_component
|
2024-03-28 14:45:00 +08:00 |
|
liusilu
|
fd616c5778
|
Merge branch 'master' of https://github.com/myhloli/Magic-PDF
|
2024-03-28 13:35:22 +08:00 |
|
liusilu
|
acb9cbd6d2
|
add pdf tools
|
2024-03-28 13:35:12 +08:00 |
|
kernel.h@qq.com
|
433684c646
|
Implement multimodal markdown assembly
magic_pdf-0.3.30-released
|
2024-03-27 14:46:56 +08:00 |
|
liusilu
|
fffee0ae97
|
Merge branch 'master' of https://github.com/myhloli/Magic-PDF
|
2024-03-27 10:03:46 +08:00 |
|
liusilu
|
e73606250e
|
add pdf tools
|
2024-03-27 10:03:20 +08:00 |
|
kernel.h@qq.com
|
7162debc38
|
Implement assembly of text-PDF parsing results into the standard format
|
2024-03-26 21:19:19 +08:00 |
|
赵小蒙
|
a343175d66
|
Restore pipeline
magic_pdf-0.3.29-released
|
2024-03-26 18:10:53 +08:00 |
|
赵小蒙
|
671ce1d97c
|
Merge remote-tracking branch 'origin/master'
magic_pdf-0.3.28-released
|
2024-03-26 16:52:57 +08:00 |
|
赵小蒙
|
6f80beaa31
|
Split the original pipeline
|
2024-03-26 16:51:58 +08:00 |
|
许瑞
|
cb1b02e716
|
feat: disable auto include table title
magic_pdf-0.3.27-released
|
2024-03-26 16:46:05 +08:00 |
|
赵小蒙
|
8ebb79a43a
|
Update standard_format dump logic
|
2024-03-26 16:37:38 +08:00 |
|
赵小蒙
|
154eed1ade
|
Update footnote drop logic
|
2024-03-26 16:37:07 +08:00 |
|
赵小蒙
|
b7652171ea
|
Update make_standard_format_with_para logic
|
2024-03-26 16:36:45 +08:00 |
|
许瑞
|
f0c463ed6d
|
Merge branch 'master' of https://github.com/myhloli/Magic-PDF
|
2024-03-26 10:17:05 +08:00 |
|
赵小蒙
|
3d2fcc9dce
|
Remove unused code
magic_pdf-0.3.26-released
|
2024-03-25 19:10:22 +08:00 |
|
赵小蒙
|
d3c9cb84f8
|
Only emit paragraph-splitting logs in debug mode
|
2024-03-25 17:07:51 +08:00 |
|
赵小蒙
|
8c089976ed
|
Update comments
|
2024-03-25 16:09:29 +08:00 |
|
赵小蒙
|
473a0a7de0
|
Skip concatenation when para_text is empty while assembling markdown
|
2024-03-25 14:26:15 +08:00 |
|
kernel.h@qq.com
|
61c970f7da
|
Fix list index error
|
2024-03-25 13:24:48 +08:00 |
|
赵小蒙
|
d3ee9abbab
|
Update ocr_mk_mm_markdown_with_para_core logic
magic_pdf-0.3.25-released
|
2024-03-24 20:39:48 +08:00 |
|
赵小蒙
|
07e4f115e6
|
ocr_pdf_intermediate_dict_to_markdown_with_para now outputs nlp-format markdown
magic_pdf-0.3.24-released
|
2024-03-24 19:40:41 +08:00 |
|
赵小蒙
|
bf8d8e217d
|
Add ocr_mk_nlp_markdown_with_para method
magic_pdf-0.3.23-released
|
2024-03-24 19:31:58 +08:00 |
|
kernel.h@qq.com
|
744b3f75eb
|
Merge centered text and text with the same line height
|
2024-03-23 19:32:02 +08:00 |
|
kernel.h@qq.com
|
2e772467ee
|
Join lists that span pages
|
2024-03-23 16:19:23 +08:00 |
|
许瑞
|
efed5faa53
|
feat: modify foot note bbox tmp
magic_pdf-0.3.22-released
|
2024-03-23 14:34:25 +08:00 |
|
xu rui
|
05161c6e62
|
feat: backup footnote_bbox_tmp
magic_pdf-0.3.21-released
|
2024-03-23 14:11:50 +08:00 |
|
xu rui
|
15c8830416
|
feat: comment parse_title
magic_pdf-0.3.20-released
|
2024-03-23 13:15:32 +08:00 |
|
xu rui
|
432e1ae5e3
|
feat: process title and footnote
magic_pdf-0.3.19-released
|
2024-03-22 18:11:44 +08:00 |
|
kernel.h@qq.com
|
e3e125baef
|
Merge lists across layouts
magic_pdf-0.3.18-released
|
2024-03-22 16:35:04 +08:00 |
|
赵小蒙
|
2277e31ff4
|
Simplify ocr_demo main function
magic_pdf-0.3.17-released
|
2024-03-22 16:33:54 +08:00 |
|