Merge pull request #2514 from opendatalab/release-1.3.12

Release 1.3.12
Merge pull request #2513 from myhloli/dev
2026-03-27 11:08:32 +07:00 · 2025-05-24 16:02:43 +08:00 · 2025-05-24 15:55:39 +08:00 · 2025-05-24 15:47:31 +08:00 · 2025-05-24 13:46:17 +08:00 · 2025-05-24 13:39:34 +08:00
15 changed files with 19385 additions and 39 deletions
--- a/README.md
+++ b/README.md
@@ -48,6 +48,20 @@ Easier to use: Just grab MinerU Desktop. No coding, no login, just a simple inte
 </div>

 # Changelog
+- 2025/05/24 1.3.12 Released
+  - Added support for ppocrv5 model, updated `ch_server` model to `PP-OCRv5_rec_server` and `ch_lite` model to `PP-OCRv5_rec_mobile` (model update required)
+    - In testing, ppocrv5 (server) improved results on handwritten documents but was slightly less accurate than v4_server_doc on other document types, so the default `ch` model remains `PP-OCRv4_rec_server_doc`.
+    - Because ppocrv5 improves recognition of handwritten text and special characters, you can manually select a ppocrv5 model for mixed Japanese/Traditional Chinese documents and for handwritten documents
+    - You can select the appropriate model through the lang parameter `lang='ch_server'` (python api) or `--lang ch_server` (command line):
+      - `ch`: `PP-OCRv4_rec_server_doc` (default) (Chinese, English, Japanese, Traditional Chinese mixed/15k dictionary)
+      - `ch_server`: `PP-OCRv5_rec_server` (Chinese, English, Japanese, Traditional Chinese mixed + handwriting/18k dictionary)
+      - `ch_lite`: `PP-OCRv5_rec_mobile` (Chinese, English, Japanese, Traditional Chinese mixed + handwriting/18k dictionary)
+      - `ch_server_v4`: `PP-OCRv4_rec_server` (Chinese, English mixed/6k dictionary)
+      - `ch_lite_v4`: `PP-OCRv4_rec_mobile` (Chinese, English mixed/6k dictionary)
+  - Added support for handwritten documents by optimizing layout recognition of handwritten text areas
+    - This feature is enabled by default; no additional configuration is needed
+    - You can follow the instructions above to manually select a ppocrv5 model for better handwritten document parsing
+  - The demos on `huggingface` and `modelscope` have been updated to support handwriting recognition and ppocrv5 models, which you can experience online
 - 2025/04/29 1.3.10 Released
  - Support for custom formula delimiters can be achieved by modifying the `latex-delimiter-config` item in the `magic-pdf.json` file under the user directory.
 - 2025/04/27 1.3.9 Released  
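
The `lang` values listed in the 1.3.12 changelog entry can be read as a lookup table. The sketch below only restates that table; the names `LANG_TO_REC_MODEL` and `resolve_rec_model` are hypothetical and are not part of the project's API.

```python
# Mapping copied from the 1.3.12 changelog entry.
# The dict and helper names are illustrative assumptions, not project code.
LANG_TO_REC_MODEL = {
    "ch": "PP-OCRv4_rec_server_doc",      # default, 15k dictionary
    "ch_server": "PP-OCRv5_rec_server",   # adds handwriting, 18k dictionary
    "ch_lite": "PP-OCRv5_rec_mobile",     # adds handwriting, 18k dictionary
    "ch_server_v4": "PP-OCRv4_rec_server",  # 6k dictionary
    "ch_lite_v4": "PP-OCRv4_rec_mobile",    # 6k dictionary
}


def resolve_rec_model(lang="ch"):
    """Return the recognition model name the changelog associates with `lang`."""
    try:
        return LANG_TO_REC_MODEL[lang]
    except KeyError:
        known = ", ".join(sorted(LANG_TO_REC_MODEL))
        raise ValueError(f"unknown lang {lang!r}; expected one of: {known}") from None


print(resolve_rec_model("ch_server"))
```

Keeping `ch` as the default mirrors the changelog's statement that the default model is unchanged, so handwriting-oriented models are opt-in.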
--- a/README_zh-CN.md
+++ b/README_zh-CN.md
@@ -47,6 +47,20 @@
 </div>

 # 更新记录
+- 2025/05/24 1.3.12 发布
+  - 增加ppocrv5模型的支持，将`ch_server`模型更新为`PP-OCRv5_rec_server`，`ch_lite`模型更新为`PP-OCRv5_rec_mobile`（需更新模型）
+    - 在测试中，发现ppocrv5(server)对手写文档效果有一定提升，但在其余类别文档的精度略差于v4_server_doc，因此默认的ch模型保持不变，仍为`PP-OCRv4_rec_server_doc`。
+    - 由于ppocrv5强化了手写场景和特殊字符的识别能力，因此您可以在日繁混合场景以及手写文档场景下手动选择使用ppocrv5模型
+    - 您可通过lang参数`lang='ch_server'`(python api)或`--lang ch_server`(命令行)自行选择相应的模型：
+      - `ch` ：`PP-OCRv4_rec_server_doc`（默认）（中英日繁混合/1.5w字典）
+      - `ch_server` ：`PP-OCRv5_rec_server`（中英日繁混合+手写场景/1.8w字典）
+      - `ch_lite` ：`PP-OCRv5_rec_mobile`（中英日繁混合+手写场景/1.8w字典）
+      - `ch_server_v4` ：`PP-OCRv4_rec_server`（中英混合/6k字典）
+      - `ch_lite_v4` ：`PP-OCRv4_rec_mobile`（中英混合/6k字典）
+  - 增加手写文档的支持，通过优化layout对手写文本区域的识别，现已支持手写文档的解析
+    - 默认支持此功能，无需额外配置 
+    - 可以参考上述说明，手动选择ppocrv5模型以获得更好的手写文档解析效果
+  - `huggingface`和`modelscope`的demo已更新为支持手写识别和ppocrv5模型的版本，可自行在线体验
 - 2025/04/29 1.3.10 发布
  - 支持使用自定义公式标识符，可通过修改用户目录下的`magic-pdf.json`文件中的`latex-delimiter-config`项实现。
 - 2025/04/27 1.3.9 发布
--- a/magic_pdf/data/utils.py
+++ b/magic_pdf/data/utils.py
@@ -10,22 +10,22 @@ from loguru import logger



-def fitz_doc_to_image(doc, dpi=200) -> dict:
+def fitz_doc_to_image(page, dpi=200) -> dict:
    """Convert fitz.Document to image, Then convert the image to numpy array.

    Args:
-        doc (_type_): pymudoc page
+        page (fitz.Page): PyMuPDF page
        dpi (int, optional): reset the dpi of dpi. Defaults to 200.

    Returns:
        dict:  {'img': numpy array, 'width': width, 'height': height }
    """
    mat = fitz.Matrix(dpi / 72, dpi / 72)
-    pm = doc.get_pixmap(matrix=mat, alpha=False)
+    pm = page.get_pixmap(matrix=mat, alpha=False)

    # If the width or height exceeds 4500 after scaling, do not scale further.
    if pm.width > 4500 or pm.height > 4500:
-        pm = doc.get_pixmap(matrix=fitz.Matrix(1, 1), alpha=False)
+        pm = page.get_pixmap(matrix=fitz.Matrix(1, 1), alpha=False)

    # Convert pixmap samples directly to numpy array
    img = np.frombuffer(pm.samples, dtype=np.uint8).reshape(pm.height, pm.width, 3)
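
The scale selection in `fitz_doc_to_image` can be illustrated without PyMuPDF. This is a minimal sketch under one simplifying assumption: it estimates the rendered size as page size in points times the scale, whereas the real function inspects the rendered pixmap's integer `width` and `height`. The name `choose_render_scale` is hypothetical.

```python
def choose_render_scale(width_pt, height_pt, dpi=200, max_side=4500):
    """Approximate the scale factor fitz_doc_to_image ends up using.

    PDF user space has 72 units per inch, so rendering at `dpi` multiplies
    both dimensions by dpi / 72. If either scaled side would exceed
    `max_side` pixels, the function in the diff re-renders with an identity
    matrix, which corresponds to a scale of 1.0 here.
    """
    scale = dpi / 72
    if width_pt * scale > max_side or height_pt * scale > max_side:
        return 1.0
    return scale


# US Letter (612 x 792 pt) stays at the requested DPI.
print(choose_render_scale(612, 792))
# A 1700 pt wide page would be about 4722 px at 200 DPI, so it falls back to 1.0.
print(choose_render_scale(1700, 800))
```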
--- a/magic_pdf/dict2md/ocr_mkcontent.py
+++ b/magic_pdf/dict2md/ocr_mkcontent.py
@@ -70,19 +70,34 @@ def ocr_mk_markdown_with_para_core_v2(paras_of_layout,
            if mode == 'nlp':
                continue
            elif mode == 'mm':
-                for block in para_block['blocks']:  # 1st: append image_body
-                    if block['type'] == BlockType.ImageBody:
-                        for line in block['lines']:
-                            for span in line['spans']:
-                                if span['type'] == ContentType.Image:
-                                    if span.get('image_path', ''):
-                                        para_text += f"\n![]({join_path(img_buket_path, span['image_path'])})  \n"
-                for block in para_block['blocks']:  # 2nd: append image_caption
-                    if block['type'] == BlockType.ImageCaption:
-                        para_text += merge_para_with_text(block) + '  \n'
-                for block in para_block['blocks']:  # 3rd: append image_footnote
-                    if block['type'] == BlockType.ImageFootnote:
-                        para_text += merge_para_with_text(block) + '  \n'
+                # Check whether the image has a footnote block
+                has_image_footnote = any(block['type'] == BlockType.ImageFootnote for block in para_block['blocks'])
+                # If an image footnote exists, append it after the image body (the caption goes first)
+                if has_image_footnote:
+                    for block in para_block['blocks']:  # 1st: append image_caption
+                        if block['type'] == BlockType.ImageCaption:
+                            para_text += merge_para_with_text(block) + '  \n'
+                    for block in para_block['blocks']:  # 2nd: append image_body
+                        if block['type'] == BlockType.ImageBody:
+                            for line in block['lines']:
+                                for span in line['spans']:
+                                    if span['type'] == ContentType.Image:
+                                        if span.get('image_path', ''):
+                                            para_text += f"![]({img_buket_path}/{span['image_path']})"
+                    for block in para_block['blocks']:  # 3rd: append image_footnote
+                        if block['type'] == BlockType.ImageFootnote:
+                            para_text += '  \n' + merge_para_with_text(block)
+                else:
+                    for block in para_block['blocks']:  # 1st: append image_body
+                        if block['type'] == BlockType.ImageBody:
+                            for line in block['lines']:
+                                for span in line['spans']:
+                                    if span['type'] == ContentType.Image:
+                                        if span.get('image_path', ''):
+                                            para_text += f"![]({img_buket_path}/{span['image_path']})"
+                    for block in para_block['blocks']:  # 2nd: append image_caption
+                        if block['type'] == BlockType.ImageCaption:
+                            para_text += '  \n' + merge_para_with_text(block)
        elif para_type == BlockType.Table:
            if mode == 'nlp':
                continue
@@ -96,20 +111,19 @@ def ocr_mk_markdown_with_para_core_v2(paras_of_layout,
                            for span in line['spans']:
                                if span['type'] == ContentType.Table:
                                    # if processed by table model
-                                    if span.get('latex', ''):
-                                        para_text += f"\n\n$\n {span['latex']}\n$\n\n"
-                                    elif span.get('html', ''):
-                                        para_text += f"\n\n{span['html']}\n\n"
+                                    if span.get('html', ''):
+                                        para_text += f"\n{span['html']}\n"
                                    elif span.get('image_path', ''):
-                                        para_text += f"\n![]({join_path(img_buket_path, span['image_path'])})  \n"
+                                        para_text += f"![]({img_buket_path}/{span['image_path']})"
                for block in para_block['blocks']:  # 3rd: append table_footnote
                    if block['type'] == BlockType.TableFootnote:
-                        para_text += merge_para_with_text(block) + '  \n'
+                        para_text += '\n' + merge_para_with_text(block) + '  '

        if para_text.strip() == '':
            continue
        else:
-            page_markdown.append(para_text.strip() + '  ')
+            # page_markdown.append(para_text.strip() + '  ')
+            page_markdown.append(para_text.strip())

    return page_markdown

@@ -257,9 +271,9 @@ def para_to_standard_format_v2(para_block, img_buket_path, page_idx, drop_reason
                        if span['type'] == ContentType.Table:

                            if span.get('latex', ''):
-                                para_content['table_body'] = f"\n\n$\n {span['latex']}\n$\n\n"
+                                para_content['table_body'] = f"{span['latex']}"
                            elif span.get('html', ''):
-                                para_content['table_body'] = f"\n\n{span['html']}\n\n"
+                                para_content['table_body'] = f"{span['html']}"

                            if span.get('image_path', ''):
                                para_content['img_path'] = join_path(img_buket_path, span['image_path'])
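
The new ordering in the image branch of `ocr_mk_markdown_with_para_core_v2` can be sketched on simplified data. The block dictionaries below (`type`, `text`, `path` keys) and the function name `assemble_image_markdown` are assumptions made for illustration; the real code walks `lines` and `spans` and calls `merge_para_with_text`.

```python
def assemble_image_markdown(blocks, img_bucket_path="images"):
    """Mimic the ordering rules of the new image branch on simplified blocks.

    With a footnote present: caption, then image, then footnote.
    Without a footnote: image, then caption.
    """
    has_footnote = any(b["type"] == "image_footnote" for b in blocks)
    text = ""
    if has_footnote:
        for b in blocks:
            if b["type"] == "image_caption":
                text += b["text"] + "  \n"
        for b in blocks:
            if b["type"] == "image_body":
                text += f"![]({img_bucket_path}/{b['path']})"
        for b in blocks:
            if b["type"] == "image_footnote":
                text += "  \n" + b["text"]
    else:
        for b in blocks:
            if b["type"] == "image_body":
                text += f"![]({img_bucket_path}/{b['path']})"
        for b in blocks:
            if b["type"] == "image_caption":
                text += "  \n" + b["text"]
    # The diff now appends para_text.strip() without the trailing two spaces.
    return text.strip()


blocks = [
    {"type": "image_body", "path": "fig1.jpg"},
    {"type": "image_caption", "text": "Figure 1"},
    {"type": "image_footnote", "text": "Source: survey"},
]
print(assemble_image_markdown(blocks))
```

The two trailing spaces before each `\n` are Markdown hard line breaks, so caption, image, and footnote render on separate lines within a single paragraph.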
--- a/magic_pdf/libs/version.py
+++ b/magic_pdf/libs/version.py
@@ -1 +1 @@
-__version__ = "1.3.10"
+__version__ = "1.3.11"
--- a/magic_pdf/model/batch_analyze.py
+++ b/magic_pdf/model/batch_analyze.py
@@ -6,7 +6,7 @@ from tqdm import tqdm
 from magic_pdf.config.constants import MODEL_NAME
 from magic_pdf.model.sub_modules.model_init import AtomModelSingleton
 from magic_pdf.model.sub_modules.model_utils import (
-    clean_vram, crop_img, get_res_list_from_layout_res)
+    clean_vram, crop_img, get_res_list_from_layout_res, get_coords_and_area)
 from magic_pdf.model.sub_modules.ocr.paddleocr2pytorch.ocr_utils import (
    get_adjusted_mfdetrec_res, get_ocr_result_list)

@@ -148,6 +148,19 @@ class BatchAnalyze:
                # Integration results
                if ocr_res:
                    ocr_result_list = get_ocr_result_list(ocr_res, useful_list, ocr_res_list_dict['ocr_enable'], new_image, _lang)
+
+                    if res["category_id"] == 3:
+                        # Total area of all bboxes in ocr_result_list
+                        ocr_res_area = sum(get_coords_and_area(ocr_res_item)[4] for ocr_res_item in ocr_result_list if 'poly' in ocr_res_item)
+                        # Ratio of ocr_res_area to the area of res
+                        res_area = get_coords_and_area(res)[4]
+                        if res_area > 0:
+                            ratio = ocr_res_area / res_area
+                            if ratio > 0.25:
+                                res["category_id"] = 1
+                            else:
+                                continue
+
                    ocr_res_list_dict['layout_res'].extend(ocr_result_list)

            # det_count += len(ocr_res_list_dict['ocr_res_list'])
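
The coverage check added to `BatchAnalyze` can be worked through with plain dictionaries. This is a sketch, not the project's code: `rect_poly` and `reclassify_by_text_coverage` are hypothetical helpers, and the reading of category 3 as an image-like region and category 1 as text is an assumption based on the changelog's mention of handwritten text areas.

```python
def get_coords_and_area(block_with_poly):
    """Same arithmetic as the helper in model_utils.py: corners 0/1 and 4/5."""
    poly = block_with_poly["poly"]
    xmin, ymin = int(poly[0]), int(poly[1])
    xmax, ymax = int(poly[4]), int(poly[5])
    return xmin, ymin, xmax, ymax, (xmax - xmin) * (ymax - ymin)


def rect_poly(xmin, ymin, xmax, ymax):
    """Hypothetical helper: 8-value polygon, clockwise from the top-left corner."""
    return [xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax]


def reclassify_by_text_coverage(region, ocr_items, threshold=0.25):
    """Return 'text', 'drop', or 'unchanged' for a category-3 region.

    'text'      -> OCR boxes cover more than `threshold` of the region,
                   so the diff sets category_id to 1 and keeps the OCR lines.
    'drop'      -> coverage is at or below the threshold; the OCR lines are skipped.
    'unchanged' -> the region has no positive area, so no ratio is computed.
    """
    text_area = sum(get_coords_and_area(i)[4] for i in ocr_items if "poly" in i)
    region_area = get_coords_and_area(region)[4]
    if region_area <= 0:
        return "unchanged"
    return "text" if text_area / region_area > threshold else "drop"


region = {"poly": rect_poly(0, 0, 100, 100)}        # area 10000
lines = [
    {"poly": rect_poly(0, 0, 50, 30)},               # area 1500
    {"poly": rect_poly(0, 40, 50, 70)},              # area 1500
]
print(reclassify_by_text_coverage(region, lines))      # coverage 0.30
print(reclassify_by_text_coverage(region, lines[:1]))  # coverage 0.15
```

Note that the sum does not account for overlapping OCR boxes, so heavily overlapping detections can push the ratio above the true covered fraction.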
--- a/magic_pdf/model/doc_analyze_by_custom_model.py
+++ b/magic_pdf/model/doc_analyze_by_custom_model.py
@@ -189,7 +189,7 @@ def batch_doc_analyze(
    formula_enable=None,
    table_enable=None,
 ):
-    MIN_BATCH_INFERENCE_SIZE = int(os.environ.get('MINERU_MIN_BATCH_INFERENCE_SIZE', 200))
+    MIN_BATCH_INFERENCE_SIZE = int(os.environ.get('MINERU_MIN_BATCH_INFERENCE_SIZE', 100))
    batch_size = MIN_BATCH_INFERENCE_SIZE
    page_wh_list = []
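
The lowered default only applies when `MINERU_MIN_BATCH_INFERENCE_SIZE` is unset. A minimal sketch of the lookup follows; the wrapper `min_batch_inference_size` and its `environ` parameter are hypothetical and exist only to make the behaviour easy to exercise.

```python
import os


def min_batch_inference_size(environ=None, default=100):
    """Read the batch size the way batch_doc_analyze does, with an injectable env."""
    env = os.environ if environ is None else environ
    # os.environ values are strings, so int() handles both the string override
    # and the integer default.
    return int(env.get("MINERU_MIN_BATCH_INFERENCE_SIZE", default))


print(min_batch_inference_size({}))
print(min_batch_inference_size({"MINERU_MIN_BATCH_INFERENCE_SIZE": "50"}))
```

A value that is not a valid integer makes `int()` raise `ValueError`, so the variable should be set to a plain number.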

--- a/magic_pdf/model/sub_modules/model_utils.py
+++ b/magic_pdf/model/sub_modules/model_utils.py
@@ -31,10 +31,10 @@ def crop_img(input_res, input_np_img, crop_paste_x=0, crop_paste_y=0):
    return return_image, return_list


-def get_coords_and_area(table):
+def get_coords_and_area(block_with_poly):
    """Extract coordinates and area from a table."""
-    xmin, ymin = int(table['poly'][0]), int(table['poly'][1])
-    xmax, ymax = int(table['poly'][4]), int(table['poly'][5])
+    xmin, ymin = int(block_with_poly['poly'][0]), int(block_with_poly['poly'][1])
+    xmax, ymax = int(block_with_poly['poly'][4]), int(block_with_poly['poly'][5])
    area = (xmax - xmin) * (ymax - ymin)
    return xmin, ymin, xmax, ymax, area

@@ -243,7 +243,7 @@ def get_res_list_from_layout_res(layout_res, iou_threshold=0.7, overlap_threshol
                "bbox": [int(res['poly'][0]), int(res['poly'][1]),
                         int(res['poly'][4]), int(res['poly'][5])],
            })
-        elif category_id in [0, 2, 4, 6, 7]:  # OCR regions
+        elif category_id in [0, 2, 4, 6, 7, 3]:  # OCR regions
            ocr_res_list.append(res)
        elif category_id == 5:  # Table regions
            table_res_list.append(res)
--- a/magic_pdf/model/sub_modules/ocr/paddleocr2pytorch/pytorchocr/modeling/backbones/init.py
+++ b/magic_pdf/model/sub_modules/ocr/paddleocr2pytorch/pytorchocr/modeling/backbones/init.py
@@ -35,7 +35,7 @@ def build_backbone(config, model_type):
        from .rec_mobilenet_v3 import MobileNetV3
        from .rec_svtrnet import SVTRNet
        from .rec_mv1_enhance import MobileNetV1Enhance
-
+        from .rec_pphgnetv2 import PPHGNetV2_B4
        support_dict = [
            "MobileNetV1Enhance",
            "MobileNetV3",
@@ -48,6 +48,7 @@ def build_backbone(config, model_type):
            "DenseNet",
            "PPLCNetV3",
            "PPHGNet_small",
+            "PPHGNetV2_B4",
        ]
    else:
        raise NotImplementedError
--- a/magic_pdf/model/sub_modules/ocr/paddleocr2pytorch/pytorchocr/modeling/backbones/rec_pphgnetv2.py
+++ b/magic_pdf/model/sub_modules/ocr/paddleocr2pytorch/pytorchocr/modeling/backbones/rec_pphgnetv2.py
@@ -0,0 +1,810 @@
+import math
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+
+class AdaptiveAvgPool2D(nn.AdaptiveAvgPool2d):
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+
+        if isinstance(self.output_size, int) and self.output_size == 1:
+            self._gap = True
+        elif (
+            isinstance(self.output_size, tuple)
+            and self.output_size[0] == 1
+            and self.output_size[1] == 1
+        ):
+            self._gap = True
+        else:
+            self._gap = False
+
+    def forward(self, x):
+        if self._gap:
+            # Global Average Pooling
+            N, C, _, _ = x.shape
+            x_mean = torch.mean(x, dim=[2, 3])
+            x_mean = torch.reshape(x_mean, [N, C, 1, 1])
+            return x_mean
+        else:
+            return F.adaptive_avg_pool2d(
+                x,
+                output_size=self.output_size
+            )
+
+class LearnableAffineBlock(nn.Module):
+    """
+    Create a learnable affine block module. This module can significantly improve accuracy on smaller models.
+
+    Args:
+        scale_value (float): The initial value of the scale parameter, default is 1.0.
+        bias_value (float): The initial value of the bias parameter, default is 0.0.
+        lr_mult (float): The learning rate multiplier, default is 1.0.
+        lab_lr (float): The learning rate, default is 0.01.
+    """
+
+    def __init__(self, scale_value=1.0, bias_value=0.0, lr_mult=1.0, lab_lr=0.01):
+        super().__init__()
+        self.scale = nn.Parameter(torch.Tensor([scale_value]))
+        self.bias = nn.Parameter(torch.Tensor([bias_value]))
+
+    def forward(self, x):
+        return self.scale * x + self.bias
+
+
+class ConvBNAct(nn.Module):
+    """
+    ConvBNAct is a combination of convolution and batchnorm layers.
+
+    Args:
+        in_channels (int): Number of input channels.
+        out_channels (int): Number of output channels.
+        kernel_size (int): Size of the convolution kernel. Defaults to 3.
+        stride (int): Stride of the convolution. Defaults to 1.
+        padding (int/str): Padding or padding type for the convolution. Defaults to 1.
+        groups (int): Number of groups for the convolution. Defaults to 1.
+        use_act (bool): Whether to use activation function. Defaults to True.
+        use_lab (bool): Whether to use the LAB operation. Defaults to False.
+        lr_mult (float): Learning rate multiplier for the layer. Defaults to 1.0.
+    """
+
+    def __init__(
+        self,
+        in_channels,
+        out_channels,
+        kernel_size=3,
+        stride=1,
+        padding=1,
+        groups=1,
+        use_act=True,
+        use_lab=False,
+        lr_mult=1.0,
+    ):
+        super().__init__()
+        self.use_act = use_act
+        self.use_lab = use_lab
+
+        self.conv = nn.Conv2d(
+            in_channels,
+            out_channels,
+            kernel_size,
+            stride,
+            padding=padding if isinstance(padding, str) else (kernel_size - 1) // 2,
+            # padding=(kernel_size - 1) // 2,
+            groups=groups,
+            bias=False,
+        )
+        self.bn = nn.BatchNorm2d(
+            out_channels,
+        )
+        if self.use_act:
+            self.act = nn.ReLU()
+            if self.use_lab:
+                self.lab = LearnableAffineBlock(lr_mult=lr_mult)
+
+    def forward(self, x):
+        x = self.conv(x)
+        x = self.bn(x)
+        if self.use_act:
+            x = self.act(x)
+            if self.use_lab:
+                x = self.lab(x)
+        return x
+
+
+class LightConvBNAct(nn.Module):
+    """
+    LightConvBNAct is a combination of a pointwise (pw) and a depthwise (dw) convolution layer.
+
+    Args:
+        in_channels (int): Number of input channels.
+        out_channels (int): Number of output channels.
+        kernel_size (int): Size of the depth-wise convolution kernel.
+        use_lab (bool): Whether to use the LAB operation. Defaults to False.
+        lr_mult (float): Learning rate multiplier for the layer. Defaults to 1.0.
+    """
+
+    def __init__(
+        self,
+        in_channels,
+        out_channels,
+        kernel_size,
+        use_lab=False,
+        lr_mult=1.0,
+        **kwargs,
+    ):
+        super().__init__()
+        self.conv1 = ConvBNAct(
+            in_channels=in_channels,
+            out_channels=out_channels,
+            kernel_size=1,
+            use_act=False,
+            use_lab=use_lab,
+            lr_mult=lr_mult,
+        )
+        self.conv2 = ConvBNAct(
+            in_channels=out_channels,
+            out_channels=out_channels,
+            kernel_size=kernel_size,
+            groups=out_channels,
+            use_act=True,
+            use_lab=use_lab,
+            lr_mult=lr_mult,
+        )
+
+    def forward(self, x):
+        x = self.conv1(x)
+        x = self.conv2(x)
+        return x
+
+
+class CustomMaxPool2d(nn.Module):
+    def __init__(
+            self,
+            kernel_size,
+            stride=None,
+            padding=0,
+            dilation=1,
+            return_indices=False,
+            ceil_mode=False,
+            data_format="NCHW",
+    ):
+        super(CustomMaxPool2d, self).__init__()
+        self.kernel_size = kernel_size if isinstance(kernel_size, (tuple, list)) else (kernel_size, kernel_size)
+        self.stride = stride if stride is not None else self.kernel_size
+        self.stride = self.stride if isinstance(self.stride, (tuple, list)) else (self.stride, self.stride)
+        self.dilation = dilation if isinstance(dilation, (tuple, list)) else (dilation, dilation)
+        self.return_indices = return_indices
+        self.ceil_mode = ceil_mode
+        self.padding_mode = padding
+
+        # Use the standard MaxPool2d when padding is not "same"
+        if padding != "same":
+            self.padding = padding if isinstance(padding, (tuple, list)) else (padding, padding)
+            self.pool = nn.MaxPool2d(
+                kernel_size=self.kernel_size,
+                stride=self.stride,
+                padding=self.padding,
+                dilation=self.dilation,
+                return_indices=self.return_indices,
+                ceil_mode=self.ceil_mode
+            )
+
+    def forward(self, x):
+        # Handle "same" padding
+        if self.padding_mode == "same":
+            input_height, input_width = x.size(2), x.size(3)
+
+            # Compute the expected output size
+            out_height = math.ceil(input_height / self.stride[0])
+            out_width = math.ceil(input_width / self.stride[1])
+
+            # Compute the total padding required
+            pad_height = max((out_height - 1) * self.stride[0] + self.kernel_size[0] - input_height, 0)
+            pad_width = max((out_width - 1) * self.stride[1] + self.kernel_size[1] - input_width, 0)
+
+            # Split the padding between the two sides
+            pad_top = pad_height // 2
+            pad_bottom = pad_height - pad_top
+            pad_left = pad_width // 2
+            pad_right = pad_width - pad_left
+
+            # Apply the padding
+            x = F.pad(x, (pad_left, pad_right, pad_top, pad_bottom))
+
+            # Use the standard max_pool2d function
+            if self.return_indices:
+                return F.max_pool2d_with_indices(
+                    x,
+                    kernel_size=self.kernel_size,
+                    stride=self.stride,
+                    padding=0,  # already padded manually above
+                    dilation=self.dilation,
+                    ceil_mode=self.ceil_mode
+                )
+            else:
+                return F.max_pool2d(
+                    x,
+                    kernel_size=self.kernel_size,
+                    stride=self.stride,
+                    padding=0,  # already padded manually above
+                    dilation=self.dilation,
+                    ceil_mode=self.ceil_mode
+                )
+        else:
+            # Use the predefined MaxPool2d
+            return self.pool(x)
+
+class StemBlock(nn.Module):
+    """
+    StemBlock for PP-HGNetV2.
+
+    Args:
+        in_channels (int): Number of input channels.
+        mid_channels (int): Number of middle channels.
+        out_channels (int): Number of output channels.
+        use_lab (bool): Whether to use the LAB operation. Defaults to False.
+        lr_mult (float): Learning rate multiplier for the layer. Defaults to 1.0.
+    """
+
+    def __init__(
+        self,
+        in_channels,
+        mid_channels,
+        out_channels,
+        use_lab=False,
+        lr_mult=1.0,
+        text_rec=False,
+    ):
+        super().__init__()
+        self.stem1 = ConvBNAct(
+            in_channels=in_channels,
+            out_channels=mid_channels,
+            kernel_size=3,
+            stride=2,
+            use_lab=use_lab,
+            lr_mult=lr_mult,
+        )
+        self.stem2a = ConvBNAct(
+            in_channels=mid_channels,
+            out_channels=mid_channels // 2,
+            kernel_size=2,
+            stride=1,
+            padding="same",
+            use_lab=use_lab,
+            lr_mult=lr_mult,
+        )
+        self.stem2b = ConvBNAct(
+            in_channels=mid_channels // 2,
+            out_channels=mid_channels,
+            kernel_size=2,
+            stride=1,
+            padding="same",
+            use_lab=use_lab,
+            lr_mult=lr_mult,
+        )
+        self.stem3 = ConvBNAct(
+            in_channels=mid_channels * 2,
+            out_channels=mid_channels,
+            kernel_size=3,
+            stride=1 if text_rec else 2,
+            use_lab=use_lab,
+            lr_mult=lr_mult,
+        )
+        self.stem4 = ConvBNAct(
+            in_channels=mid_channels,
+            out_channels=out_channels,
+            kernel_size=1,
+            stride=1,
+            use_lab=use_lab,
+            lr_mult=lr_mult,
+        )
+        self.pool = CustomMaxPool2d(
+            kernel_size=2, stride=1, ceil_mode=True, padding="same"
+        )
+        # self.pool = nn.MaxPool2d(
+        #     kernel_size=2, stride=1, ceil_mode=True, padding=1
+        # )
+
+    def forward(self, x):
+        x = self.stem1(x)
+        x2 = self.stem2a(x)
+        x2 = self.stem2b(x2)
+        x1 = self.pool(x)
+
+        # if x1.shape[2:] != x2.shape[2:]:
+        #     x1 = F.interpolate(x1, size=x2.shape[2:], mode='bilinear', align_corners=False)
+
+        x = torch.cat([x1, x2], 1)
+        x = self.stem3(x)
+        x = self.stem4(x)
+
+        return x
+
+
+class HGV2_Block(nn.Module):
+    """
+    HGV2_Block, the basic unit that constitutes the HGV2_Stage.
+
+    Args:
+        in_channels (int): Number of input channels.
+        mid_channels (int): Number of middle channels.
+        out_channels (int): Number of output channels.
+        kernel_size (int): Size of the convolution kernel. Defaults to 3.
+        layer_num (int): Number of layers in the HGV2 block. Defaults to 6.
+        identity (bool): Whether to add the block input to its output as a residual connection. Defaults to False.
+        light_block (bool): Whether to build the layers from LightConvBNAct instead of ConvBNAct. Defaults to True.
+        use_lab (bool): Whether to use the LAB operation. Defaults to False.
+        lr_mult (float): Learning rate multiplier for the layer. Defaults to 1.0.
+    """
+
+    def __init__(
+        self,
+        in_channels,
+        mid_channels,
+        out_channels,
+        kernel_size=3,
+        layer_num=6,
+        identity=False,
+        light_block=True,
+        use_lab=False,
+        lr_mult=1.0,
+    ):
+        super().__init__()
+        self.identity = identity
+
+        self.layers = nn.ModuleList()
+        block_type = "LightConvBNAct" if light_block else "ConvBNAct"
+        for i in range(layer_num):
+            self.layers.append(
+                eval(block_type)(
+                    in_channels=in_channels if i == 0 else mid_channels,
+                    out_channels=mid_channels,
+                    stride=1,
+                    kernel_size=kernel_size,
+                    use_lab=use_lab,
+                    lr_mult=lr_mult,
+                )
+            )
+        # feature aggregation
+        total_channels = in_channels + layer_num * mid_channels
+        self.aggregation_squeeze_conv = ConvBNAct(
+            in_channels=total_channels,
+            out_channels=out_channels // 2,
+            kernel_size=1,
+            stride=1,
+            use_lab=use_lab,
+            lr_mult=lr_mult,
+        )
+        self.aggregation_excitation_conv = ConvBNAct(
+            in_channels=out_channels // 2,
+            out_channels=out_channels,
+            kernel_size=1,
+            stride=1,
+            use_lab=use_lab,
+            lr_mult=lr_mult,
+        )
+
+    def forward(self, x):
+        identity = x
+        output = []
+        output.append(x)
+        for layer in self.layers:
+            x = layer(x)
+            output.append(x)
+        x = torch.cat(output, dim=1)
+        x = self.aggregation_squeeze_conv(x)
+        x = self.aggregation_excitation_conv(x)
+        if self.identity:
+            x += identity
+        return x
+
+
+class HGV2_Stage(nn.Module):
+    """
+    HGV2_Stage, the basic unit that constitutes the PPHGNetV2.
+
+    Args:
+        in_channels (int): Number of input channels.
+        mid_channels (int): Number of middle channels.
+        out_channels (int): Number of output channels.
+        block_num (int): Number of blocks in the HGV2 stage.
+        layer_num (int): Number of layers in the HGV2 block. Defaults to 6.
+        is_downsample (bool): Whether to use downsampling operation. Defaults to True.
+        light_block (bool): Whether to use light block. Defaults to True.
+        kernel_size (int): Size of the convolution kernel. Defaults to 3.
+        use_lab (bool, optional): Whether to use the LAB operation. Defaults to False.
+        lr_mult (float, optional): Learning rate multiplier for the layer. Defaults to 1.0.
+    """
+
+    def __init__(
+        self,
+        in_channels,
+        mid_channels,
+        out_channels,
+        block_num,
+        layer_num=6,
+        is_downsample=True,
+        light_block=True,
+        kernel_size=3,
+        use_lab=False,
+        stride=2,
+        lr_mult=1.0,
+    ):
+
+        super().__init__()
+        self.is_downsample = is_downsample
+        if self.is_downsample:
+            self.downsample = ConvBNAct(
+                in_channels=in_channels,
+                out_channels=in_channels,
+                kernel_size=3,
+                stride=stride,
+                groups=in_channels,
+                use_act=False,
+                use_lab=use_lab,
+                lr_mult=lr_mult,
+            )
+
+        blocks_list = []
+        for i in range(block_num):
+            blocks_list.append(
+                HGV2_Block(
+                    in_channels=in_channels if i == 0 else out_channels,
+                    mid_channels=mid_channels,
+                    out_channels=out_channels,
+                    kernel_size=kernel_size,
+                    layer_num=layer_num,
+                    identity=False if i == 0 else True,
+                    light_block=light_block,
+                    use_lab=use_lab,
+                    lr_mult=lr_mult,
+                )
+            )
+        self.blocks = nn.Sequential(*blocks_list)
+
+    def forward(self, x):
+        if self.is_downsample:
+            x = self.downsample(x)
+        x = self.blocks(x)
+        return x
+
+
+class DropoutInferDownscale(nn.Module):
+    """
+    Dropout equivalent to Paddle's mode="downscale_in_infer".
+    Training: out = input * mask (the mask is applied directly, without upscaling).
+    Inference: out = input * (1.0 - p) (the output is scaled down at inference time).
+    """
+
+    def __init__(self, p=0.5):
+        super().__init__()
+        self.p = p
+
+    def forward(self, x):
+        if self.training:
+            # Training: apply the random mask without upscaling (cancels F.dropout's 1 / (1 - p) factor)
+            return F.dropout(x, self.p, training=True) * (1.0 - self.p)
+        else:
+            # Inference: scale the output down by (1 - p)
+            return x * (1.0 - self.p)
+
+class PPHGNetV2(nn.Module):
+    """
+    PPHGNetV2
+
+    Args:
+        stage_config (dict): Config for PPHGNetV2 stages, such as the number of channels, stride, etc.
+        stem_channels (list): Number of channels of the stem of the PPHGNetV2.
+        use_lab (bool): Whether to use the LAB operation. Defaults to False.
+        use_last_conv (bool): Whether to use the last conv layer as the output channel. Defaults to True.
+        class_expand (int): Number of channels for the last 1x1 convolutional layer.
+        dropout_prob (float): Dropout probability for the last 1x1 convolutional layer. Defaults to 0.0.
+        class_num (int): The number of classes for the classification layer. Defaults to 1000.
+        lr_mult_list (list): Learning rate multiplier for the stages. Defaults to [1.0, 1.0, 1.0, 1.0, 1.0].
+    Returns:
+        model: nn.Module. Specific PPHGNetV2 model depends on args.
+    """
+
+    def __init__(
+        self,
+        stage_config,
+        stem_channels=[3, 32, 64],
+        use_lab=False,
+        use_last_conv=True,
+        class_expand=2048,
+        dropout_prob=0.0,
+        class_num=1000,
+        lr_mult_list=[1.0, 1.0, 1.0, 1.0, 1.0],
+        det=False,
+        text_rec=False,
+        out_indices=None,
+        **kwargs,
+    ):
+        super().__init__()
+        self.det = det
+        self.text_rec = text_rec
+        self.use_lab = use_lab
+        self.use_last_conv = use_last_conv
+        self.class_expand = class_expand
+        self.class_num = class_num
+        self.out_indices = out_indices if out_indices is not None else [0, 1, 2, 3]
+        self.out_channels = []
+
+        # stem
+        self.stem = StemBlock(
+            in_channels=stem_channels[0],
+            mid_channels=stem_channels[1],
+            out_channels=stem_channels[2],
+            use_lab=use_lab,
+            lr_mult=lr_mult_list[0],
+            text_rec=text_rec,
+        )
+
+        # stages
+        self.stages = nn.ModuleList()
+        for i, k in enumerate(stage_config):
+            (
+                in_channels,
+                mid_channels,
+                out_channels,
+                block_num,
+                is_downsample,
+                light_block,
+                kernel_size,
+                layer_num,
+                stride,
+            ) = stage_config[k]
+            self.stages.append(
+                HGV2_Stage(
+                    in_channels,
+                    mid_channels,
+                    out_channels,
+                    block_num,
+                    layer_num,
+                    is_downsample,
+                    light_block,
+                    kernel_size,
+                    use_lab,
+                    stride,
+                    lr_mult=lr_mult_list[i + 1],
+                )
+            )
+            if i in self.out_indices:
+                self.out_channels.append(out_channels)
+        if not self.det:
+            self.out_channels = stage_config["stage4"][2]
+
+        self.avg_pool = AdaptiveAvgPool2D(1)
+
+        if self.use_last_conv:
+            self.last_conv = nn.Conv2d(
+                in_channels=out_channels,
+                out_channels=self.class_expand,
+                kernel_size=1,
+                stride=1,
+                padding=0,
+                bias=False,
+            )
+            self.act = nn.ReLU()
+            if self.use_lab:
+                self.lab = LearnableAffineBlock()
+            self.dropout = DropoutInferDownscale(p=dropout_prob)
+
+        self.flatten = nn.Flatten(start_dim=1, end_dim=-1)
+        if not self.det:
+            self.fc = nn.Linear(
+                self.class_expand if self.use_last_conv else out_channels,
+                self.class_num,
+            )
+
+        self._init_weights()
+
+    def _init_weights(self):
+        for m in self.modules():
+            if isinstance(m, nn.Conv2d):
+                nn.init.kaiming_normal_(m.weight)
+            elif isinstance(m, nn.BatchNorm2d):
+                nn.init.ones_(m.weight)
+                nn.init.zeros_(m.bias)
+            elif isinstance(m, nn.Linear):
+                nn.init.zeros_(m.bias)
+
+    def forward(self, x):
+        x = self.stem(x)
+        out = []
+        for i, stage in enumerate(self.stages):
+            x = stage(x)
+            if self.det and i in self.out_indices:
+                out.append(x)
+        if self.det:
+            return out
+
+        if self.text_rec:
+            if self.training:
+                x = F.adaptive_avg_pool2d(x, [1, 40])
+            else:
+                x = F.avg_pool2d(x, [3, 2])
+        return x
+
+
+def PPHGNetV2_B0(pretrained=False, use_ssld=False, **kwargs):
+    """
+    PPHGNetV2_B0
+    Args:
+        pretrained (bool/str): If `True` load pretrained parameters, `False` otherwise.
+                    If str, means the path of the pretrained model.
+        use_ssld (bool): Whether to use the SSLD pretrained model when pretrained is True.
+    Returns:
+        model: nn.Module. Specific `PPHGNetV2_B0` model depends on args.
+    """
+    stage_config = {
+        # in_channels, mid_channels, out_channels, num_blocks, is_downsample, light_block, kernel_size, layer_num
+        "stage1": [16, 16, 64, 1, False, False, 3, 3],
+        "stage2": [64, 32, 256, 1, True, False, 3, 3],
+        "stage3": [256, 64, 512, 2, True, True, 5, 3],
+        "stage4": [512, 128, 1024, 1, True, True, 5, 3],
+    }
+
+    model = PPHGNetV2(
+        stem_channels=[3, 16, 16], stage_config=stage_config, use_lab=True, **kwargs
+    )
+    return model
+
+
+def PPHGNetV2_B1(pretrained=False, use_ssld=False, **kwargs):
+    """
+    PPHGNetV2_B1
+    Args:
+        pretrained (bool/str): If `True` load pretrained parameters, `False` otherwise.
+                    If str, means the path of the pretrained model.
+        use_ssld (bool): Whether to use the SSLD pretrained model when pretrained is True.
+    Returns:
+        model: nn.Module. Specific `PPHGNetV2_B1` model depends on args.
+    """
+    stage_config = {
+        # in_channels, mid_channels, out_channels, num_blocks, is_downsample, light_block, kernel_size, layer_num
+        "stage1": [32, 32, 64, 1, False, False, 3, 3],
+        "stage2": [64, 48, 256, 1, True, False, 3, 3],
+        "stage3": [256, 96, 512, 2, True, True, 5, 3],
+        "stage4": [512, 192, 1024, 1, True, True, 5, 3],
+    }
+
+    model = PPHGNetV2(
+        stem_channels=[3, 24, 32], stage_config=stage_config, use_lab=True, **kwargs
+    )
+    return model
+
+
+def PPHGNetV2_B2(pretrained=False, use_ssld=False, **kwargs):
+    """
+    PPHGNetV2_B2
+    Args:
+        pretrained (bool/str): If `True` load pretrained parameters, `False` otherwise.
+                    If str, means the path of the pretrained model.
+        use_ssld (bool): Whether to use the SSLD pretrained model when pretrained is True.
+    Returns:
+        model: nn.Module. Specific `PPHGNetV2_B2` model depends on args.
+    """
+    stage_config = {
+        # in_channels, mid_channels, out_channels, num_blocks, is_downsample, light_block, kernel_size, layer_num
+        "stage1": [32, 32, 96, 1, False, False, 3, 4],
+        "stage2": [96, 64, 384, 1, True, False, 3, 4],
+        "stage3": [384, 128, 768, 3, True, True, 5, 4],
+        "stage4": [768, 256, 1536, 1, True, True, 5, 4],
+    }
+
+    model = PPHGNetV2(
+        stem_channels=[3, 24, 32], stage_config=stage_config, use_lab=True, **kwargs
+    )
+    return model
+
+
+def PPHGNetV2_B3(pretrained=False, use_ssld=False, **kwargs):
+    """
+    PPHGNetV2_B3
+    Args:
+        pretrained (bool/str): If `True` load pretrained parameters, `False` otherwise.
+                    If str, means the path of the pretrained model.
+        use_ssld (bool): Whether to use the SSLD pretrained model when pretrained is True.
+    Returns:
+        model: nn.Module. Specific `PPHGNetV2_B3` model depends on args.
+    """
+    stage_config = {
+        # in_channels, mid_channels, out_channels, num_blocks, is_downsample, light_block, kernel_size, layer_num
+        "stage1": [32, 32, 128, 1, False, False, 3, 5],
+        "stage2": [128, 64, 512, 1, True, False, 3, 5],
+        "stage3": [512, 128, 1024, 3, True, True, 5, 5],
+        "stage4": [1024, 256, 2048, 1, True, True, 5, 5],
+    }
+
+    model = PPHGNetV2(
+        stem_channels=[3, 24, 32], stage_config=stage_config, use_lab=True, **kwargs
+    )
+    return model
+
+
+def PPHGNetV2_B4(pretrained=False, use_ssld=False, det=False, text_rec=False, **kwargs):
+    """
+    PPHGNetV2_B4
+    Args:
+        pretrained (bool/str): If `True` load pretrained parameters, `False` otherwise.
+                    If str, means the path of the pretrained model.
+        use_ssld (bool): Whether to use the SSLD pretrained model when pretrained is True.
+    Returns:
+        model: nn.Module. Specific `PPHGNetV2_B4` model depends on args.
+    """
+    stage_config_rec = {
+        # in_channels, mid_channels, out_channels, num_blocks, is_downsample, light_block, kernel_size, layer_num, stride
+        "stage1": [48, 48, 128, 1, True, False, 3, 6, [2, 1]],
+        "stage2": [128, 96, 512, 1, True, False, 3, 6, [1, 2]],
+        "stage3": [512, 192, 1024, 3, True, True, 5, 6, [2, 1]],
+        "stage4": [1024, 384, 2048, 1, True, True, 5, 6, [2, 1]],
+    }
+
+    stage_config_det = {
+        # in_channels, mid_channels, out_channels, num_blocks, is_downsample, light_block, kernel_size, layer_num, stride
+        "stage1": [48, 48, 128, 1, False, False, 3, 6, 2],
+        "stage2": [128, 96, 512, 1, True, False, 3, 6, 2],
+        "stage3": [512, 192, 1024, 3, True, True, 5, 6, 2],
+        "stage4": [1024, 384, 2048, 1, True, True, 5, 6, 2],
+    }
+    model = PPHGNetV2(
+        stem_channels=[3, 32, 48],
+        stage_config=stage_config_det if det else stage_config_rec,
+        use_lab=False,
+        det=det,
+        text_rec=text_rec,
+        **kwargs,
+    )
+    return model
+
+
+def PPHGNetV2_B5(pretrained=False, use_ssld=False, **kwargs):
+    """
+    PPHGNetV2_B5
+    Args:
+        pretrained (bool/str): If `True` load pretrained parameters, `False` otherwise.
+                    If str, means the path of the pretrained model.
+        use_ssld (bool): Whether to use the SSLD pretrained model when pretrained is True.
+    Returns:
+        model: nn.Module. Specific `PPHGNetV2_B5` model depends on args.
+    """
+    stage_config = {
+        # in_channels, mid_channels, out_channels, num_blocks, is_downsample, light_block, kernel_size, layer_num
+        "stage1": [64, 64, 128, 1, False, False, 3, 6],
+        "stage2": [128, 128, 512, 2, True, False, 3, 6],
+        "stage3": [512, 256, 1024, 5, True, True, 5, 6],
+        "stage4": [1024, 512, 2048, 2, True, True, 5, 6],
+    }
+
+    model = PPHGNetV2(
+        stem_channels=[3, 32, 64], stage_config=stage_config, use_lab=False, **kwargs
+    )
+    return model
+
+
+def PPHGNetV2_B6(pretrained=False, use_ssld=False, **kwargs):
+    """
+    PPHGNetV2_B6
+    Args:
+        pretrained (bool/str): If `True` load pretrained parameters, `False` otherwise.
+                    If str, means the path of the pretrained model.
+        use_ssld (bool): Whether to use the SSLD pretrained model when pretrained is True.
+    Returns:
+        model: nn.Module. Specific `PPHGNetV2_B6` model depends on args.
+    """
+    stage_config = {
+        # in_channels, mid_channels, out_channels, num_blocks, is_downsample, light_block, kernel_size, layer_num
+        "stage1": [96, 96, 192, 2, False, False, 3, 6],
+        "stage2": [192, 192, 512, 3, True, False, 3, 6],
+        "stage3": [512, 384, 1024, 6, True, True, 5, 6],
+        "stage4": [1024, 768, 2048, 3, True, True, 5, 6],
+    }
+
+    model = PPHGNetV2(
+        stem_channels=[3, 48, 96], stage_config=stage_config, use_lab=False, **kwargs
+    )
+    return model
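
The `PPHGNetV2` constructor in this diff unpacks nine values per stage, the ninth being `stride`, while several of the variant configs above list only eight. A minimal sketch of an explicit length check follows, assuming a hypothetical `StageSpec` tuple and `parse_stage` helper that are not part of the PR; it only illustrates how such a mismatch could surface with a clearer message than a bare unpacking error.

```python
from typing import NamedTuple, Sequence, Union


class StageSpec(NamedTuple):
    """Hypothetical container mirroring the nine fields the constructor unpacks."""
    in_channels: int
    mid_channels: int
    out_channels: int
    block_num: int
    is_downsample: bool
    light_block: bool
    kernel_size: int
    layer_num: int
    stride: Union[int, Sequence[int]]


def parse_stage(name: str, fields: Sequence) -> StageSpec:
    """Validate the field count before building the spec (illustrative helper)."""
    expected = len(StageSpec._fields)
    if len(fields) != expected:
        raise ValueError(
            f"{name}: expected {expected} fields {StageSpec._fields}, got {len(fields)}"
        )
    return StageSpec(*fields)


# Nine-field entry copied from stage_config_rec in PPHGNetV2_B4.
rec_stage1 = parse_stage("stage1", [48, 48, 128, 1, True, False, 3, 6, [2, 1]])
print(rec_stage1.out_channels, rec_stage1.stride)

# Eight-field entry copied from PPHGNetV2_B0: rejected because stride is missing.
try:
    parse_stage("stage1", [16, 16, 64, 1, False, False, 3, 3])
except ValueError as err:
    print("rejected:", err)
```

Named fields also make the `block_num` and `layer_num` positions harder to mix up, since `HGV2_Stage` receives them in a different order than the config lists them.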
--- a/magic_pdf/model/sub_modules/ocr/paddleocr2pytorch/pytorchocr/modeling/necks/rnn.py
+++ b/magic_pdf/model/sub_modules/ocr/paddleocr2pytorch/pytorchocr/modeling/necks/rnn.py
@@ -9,14 +9,27 @@ class Im2Seq(nn.Module):
        super().__init__()
        self.out_channels = in_channels

+    # def forward(self, x):
+    #     B, C, H, W = x.shape
+    #     # assert H == 1
+    #     x = x.squeeze(dim=2)
+    #     # x = x.transpose([0, 2, 1])  # paddle (NTC)(batch, width, channels)
+    #     x = x.permute(0, 2, 1)
+    #     return x
+
    def forward(self, x):
        B, C, H, W = x.shape
-        # assert H == 1
-        x = x.squeeze(dim=2)
-        # x = x.transpose([0, 2, 1])  # paddle (NTC)(batch, width, channels)
-        x = x.permute(0, 2, 1)
-        return x
+        # Handle a 4-D tensor by flattening the spatial dimensions into a sequence
+        if H == 1:
+            # Original logic, for the H == 1 case
+            x = x.squeeze(dim=2)
+            x = x.permute(0, 2, 1)  # (B, W, C)
+        else:
+            # Handle the case where H != 1
+            x = x.permute(0, 2, 3, 1)  # (B, H, W, C)
+            x = x.reshape(B, H * W, C)  # (B, H*W, C)

+        return x

 class EncoderWithRNN_(nn.Module):
    def __init__(self, in_channels, hidden_size):
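
The new `Im2Seq.forward` branch turns a `(B, C, H, W)` feature map into a `(B, H*W, C)` sequence. The sketch below reproduces that index mapping with plain nested lists instead of tensors, so it is an illustration of the ordering rather than the PyTorch code itself; `im2seq` is a hypothetical name.

```python
def im2seq(x):
    """Flatten nested lists shaped (B, C, H, W) into (B, H*W, C).

    Sequence position h * W + w holds the C channel values at pixel (h, w),
    which is the layout produced by permute(0, 2, 3, 1) followed by reshape.
    """
    B, C = len(x), len(x[0])
    H, W = len(x[0][0]), len(x[0][0][0])
    return [
        [[x[b][c][h][w] for c in range(C)] for h in range(H) for w in range(W)]
        for b in range(B)
    ]


# Each value encodes its own position as 100*c + 10*h + w (B=1, C=2, H=2, W=3).
x = [[[[100 * c + 10 * h + w for w in range(3)] for h in range(2)] for c in range(2)]]
seq = im2seq(x)
print(len(seq), len(seq[0]), len(seq[0][0]))  # 1 6 2
print(seq[0][4])  # [11, 111]: pixel (h=1, w=1), channels 0 and 1
```

With `H == 1` the same mapping yields `(B, W, C)`, which matches the squeeze-and-permute branch kept for the original case.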
--- a/magic_pdf/model/sub_modules/ocr/paddleocr2pytorch/pytorchocr/utils/resources/arch_config.yaml
+++ b/magic_pdf/model/sub_modules/ocr/paddleocr2pytorch/pytorchocr/utils/resources/arch_config.yaml
@@ -104,6 +104,22 @@ ch_PP-OCRv4_det_infer:
    name: DBHead
    k: 50

+ch_PP-OCRv5_det_infer:
+  model_type: det
+  algorithm: DB
+  Transform: null
+  Backbone:
+    name: PPLCNetV3
+    scale: 0.75
+    det: True
+  Neck:
+    name: RSEFPN
+    out_channels: 96
+    shortcut: True
+  Head:
+    name: DBHead
+    k: 50
+
 ch_PP-OCRv4_det_server_infer:
  model_type: det
  algorithm: DB
@@ -196,6 +212,58 @@ ch_PP-OCRv4_rec_server_doc_infer:
          nrtr_dim: 384
          max_text_length: 25

+ch_PP-OCRv5_rec_server_infer:
+  model_type: rec
+  algorithm: SVTR_HGNet
+  Transform:
+  Backbone:
+    name: PPHGNetV2_B4
+    text_rec: True
+  Head:
+    name: MultiHead
+    out_channels_list:
+      CTCLabelDecode: 18385
+    head_list:
+      - CTCHead:
+          Neck:
+            name: svtr
+            dims: 120
+            depth: 2
+            hidden_dims: 120
+            kernel_size: [ 1, 3 ]
+            use_guide: True
+          Head:
+            fc_decay: 0.00001
+      - NRTRHead:
+          nrtr_dim: 384
+          max_text_length: 25
+
+ch_PP-OCRv5_rec_infer:
+  model_type: rec
+  algorithm: SVTR_HGNet
+  Transform:
+  Backbone:
+    name: PPLCNetV3
+    scale: 0.95
+  Head:
+    name: MultiHead
+    out_channels_list:
+      CTCLabelDecode: 18385
+    head_list:
+      - CTCHead:
+          Neck:
+            name: svtr
+            dims: 120
+            depth: 2
+            hidden_dims: 120
+            kernel_size: [ 1, 3 ]
+            use_guide: True
+          Head:
+            fc_decay: 0.00001
+      - NRTRHead:
+          nrtr_dim: 384
+          max_text_length: 25
+
 chinese_cht_PP-OCRv3_rec_infer:
  model_type: rec
  algorithm: SVTR
--- a/magic_pdf/model/sub_modules/ocr/paddleocr2pytorch/pytorchocr/utils/resources/dict/ppocrv5_dict.txt
+++ b/magic_pdf/model/sub_modules/ocr/paddleocr2pytorch/pytorchocr/utils/resources/dict/ppocrv5_dict.txt
--- a/magic_pdf/model/sub_modules/ocr/paddleocr2pytorch/pytorchocr/utils/resources/models_config.yml
+++ b/magic_pdf/model/sub_modules/ocr/paddleocr2pytorch/pytorchocr/utils/resources/models_config.yml
@@ -1,9 +1,17 @@
 lang:
  ch_lite:
+    det: ch_PP-OCRv3_det_infer.pth
+    rec: ch_PP-OCRv5_rec_infer.pth
+    dict: ppocrv5_dict.txt
+  ch_lite_v4:
    det: ch_PP-OCRv3_det_infer.pth
    rec: ch_PP-OCRv4_rec_infer.pth
    dict: ppocr_keys_v1.txt
  ch_server:
+    det: ch_PP-OCRv3_det_infer.pth
+    rec: ch_PP-OCRv5_rec_server_infer.pth
+    dict: ppocrv5_dict.txt
+  ch_server_v4:
    det: ch_PP-OCRv3_det_infer.pth
    rec: ch_PP-OCRv4_rec_server_infer.pth
    dict: ppocr_keys_v1.txt
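
After this change `ch_lite` and `ch_server` point at the PP-OCRv5 recognition weights, and the previous v4 entries stay reachable as `ch_lite_v4` and `ch_server_v4`. A minimal sketch of that lookup follows, assuming a hand-copied dictionary of only the four entries visible in the hunk and a hypothetical `resolve_models` helper; how the project itself reads `models_config.yml` is not shown in this diff.

```python
# Hand-copied from the models_config.yml hunk; kept as a dict to stay stdlib-only.
LANG_MODELS = {
    "ch_lite": {
        "det": "ch_PP-OCRv3_det_infer.pth",
        "rec": "ch_PP-OCRv5_rec_infer.pth",
        "dict": "ppocrv5_dict.txt",
    },
    "ch_lite_v4": {
        "det": "ch_PP-OCRv3_det_infer.pth",
        "rec": "ch_PP-OCRv4_rec_infer.pth",
        "dict": "ppocr_keys_v1.txt",
    },
    "ch_server": {
        "det": "ch_PP-OCRv3_det_infer.pth",
        "rec": "ch_PP-OCRv5_rec_server_infer.pth",
        "dict": "ppocrv5_dict.txt",
    },
    "ch_server_v4": {
        "det": "ch_PP-OCRv3_det_infer.pth",
        "rec": "ch_PP-OCRv4_rec_server_infer.pth",
        "dict": "ppocr_keys_v1.txt",
    },
}


def resolve_models(lang: str) -> dict:
    """Return the det/rec/dict file names for a lang key (hypothetical helper)."""
    try:
        return LANG_MODELS[lang]
    except KeyError:
        raise KeyError(f"unknown lang {lang!r}; known: {sorted(LANG_MODELS)}") from None


print(resolve_models("ch_server")["rec"])
print(resolve_models("ch_server_v4")["dict"])
```

Each recognition model is paired with the dictionary file listed beside it, so the `rec` and `dict` values are returned together rather than looked up separately.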
--- a/signatures/version1/cla.json
+++ b/signatures/version1/cla.json
@@ -255,6 +255,14 @@
      "created_at": "2025-04-25T02:54:20Z",
      "repoId": 765083837,
      "pullRequestNo": 2367
+    },
+    {
+      "name": "CharlesKeeling65",
+      "id": 94165417,
+      "comment_id": 2841356871,
+      "created_at": "2025-04-30T09:25:31Z",
+      "repoId": 765083837,
+      "pullRequestNo": 2411
    }
  ]
 }
Author	SHA1	Message	Date
Xiaomeng Zhao	a989444e2f	Merge pull request #2514 from opendatalab/release-1.3.12 Release 1.3.12	2025-05-24 16:02:43 +08:00
Xiaomeng Zhao	e3a4295527	Merge pull request #2513 from myhloli/dev feat(docs): update changelog for PP-OCRv5 model support and handwritten document recognition enhancements	2025-05-24 15:55:39 +08:00
myhloli	73f0530d16	feat(docs): update changelog for PP-OCRv5 model support and handwritten document recognition enhancements	2025-05-24 15:47:31 +08:00
Xiaomeng Zhao	e92b5b698e	Merge pull request #2512 from myhloli/dev fix(ocr): adjust area ratio threshold and update fitz document handling in image conversion	2025-05-24 13:46:17 +08:00
myhloli	1e01ffcf78	fix(ocr): adjust area ratio threshold and update fitz document handling in image conversion	2025-05-24 13:39:34 +08:00
Xiaomeng Zhao	04b81dc1ab	Merge pull request #2511 from myhloli/dev Merge pull request #10 from myhloli/img2text	2025-05-24 12:01:47 +08:00
Xiaomeng Zhao	90585b67a9	Merge pull request #2510 from myhloli/img2text feat(ocr): add area ratio calculation for OCR results and enhance get_coords_and_area function	2025-05-24 12:00:34 +08:00
Xiaomeng Zhao	4949dd0c18	Merge pull request #10 from myhloli/img2text feat(ocr): add area ratio calculation for OCR results and enhance get_coords_and_area function	2025-05-24 11:59:52 +08:00
myhloli	a2b848136b	feat(ocr): add area ratio calculation for OCR results and enhance get_coords_and_area function	2025-05-24 11:58:02 +08:00
Xiaomeng Zhao	04a712f940	Merge pull request #2506 from myhloli/dev feat(ocr): implement PPHGNetV2 architecture with multiple stages and layers	2025-05-23 18:09:27 +08:00
myhloli	27cad566fa	feat(ocr): implement PPHGNetV2 architecture with multiple stages and layers	2025-05-23 18:06:21 +08:00
Xiaomeng Zhao	ea3003f6ef	Merge pull request #2505 from myhloli/dev feat(ocr): add PPHGNetV2_B4 backbone and update OCR models	2025-05-23 17:34:56 +08:00
myhloli	93ad41edce	feat(ocr): add PPHGNetV2_B4 backbone and update OCR models - Add PPHGNetV2_B4 backbone to the list of supported backbones - Introduce new OCR model configuration for PP-OCRv5 with PPHGNetV2_B4 - Update existing model configurations to use the new backbone - Modify RNN neck to support input with H > 1 - Adjust batch size for inference	2025-05-23 17:06:52 +08:00
Xiaomeng Zhao	8f8b8c4c1f	Merge pull request #2501 from myhloli/dev feat(ocr): add PP-OCRv5 models and update configurations	2025-05-22 17:40:43 +08:00
myhloli	048f6af406	feat(ocr): add PP-OCRv5 models and update configurations - Add new PP-OCRv5 detection and recognition models - Update arch_config.yaml with new model architectures - Modify models_config.yml to include PP-OCRv5 models for ch_lite configuration - Change dictionary file for ch_lite to ppocrv5_dict.txt	2025-05-22 17:29:47 +08:00
Xiaomeng Zhao	b122b86e8a	Merge pull request #2487 from myhloli/dev fix(ocr_mkcontent): improve image handling and footnote integration in markdown output	2025-05-19 15:47:48 +08:00
myhloli	002333a8d7	fix(ocr_mkcontent): improve image handling and footnote integration in markdown output	2025-05-19 15:45:26 +08:00
Xiaomeng Zhao	e3f22e84ab	Merge pull request #2468 from opendatalab/master master->dev	2025-05-14 10:46:50 +08:00
myhloli	40851b1c61	Update version.py with new version	2025-05-14 02:34:34 +00:00
Xiaomeng Zhao	ea619281ef	Merge pull request #2467 from opendatalab/release-1.3.11 Release 1.3.11	2025-05-14 10:33:00 +08:00
Xiaomeng Zhao	0b8c614280	Merge pull request #2464 from opendatalab/release-1.3.11 Release 1.3.11	2025-05-14 10:22:18 +08:00
github-actions[bot]	50700646e4	@CharlesKeeling65 has signed the CLA in opendatalab/MinerU#2411	2025-04-30 09:25:44 +00:00