Merge branch 'main' of https://github.com/mindspore-lab/mindocr into doc

mindspore-lab · Dec 14, 2023 · 9429602
2 parents 8bf0224 + 6604ed9

Showing 14 changed files with 1,120 additions and 573 deletions.
diff --git a/README.md b/README.md
@@ -184,6 +184,13 @@ You can do MindSpore Lite inference in MindOCR using **MindOCR models** or **Thi
 
 </details>
 
+<details open markdown>
+<summary>Layout Analysis</summary>
+
+- [x] [YOLOv8](configs/layout/yolov8/README.md) ([Ultralytics Inc.](https://github.com/ultralytics/ultralytics))
+
+</details>
+
 For the detailed performance of the trained models, please refer to [configs](./configs).
 
 For details of MindSpore Lite and ACL inference models support, please refer to [MindOCR Models Support List](docs/en/inference/inference_quickstart.md) and [Third-party Models Support List](docs/en/inference/inference_thirdparty_quickstart.md) (PaddleOCR, MMOCR, etc.).
@@ -232,13 +239,12 @@ Frequently asked questions about configuring environment and mindocr, please ref
 <summary>News</summary>
 - 2023/12/05
 1. Add new trained models
-    - [YOLOv8 nano]()
+    - [YOLOv8](configs/layout/yolov8) for layout analysis
     - [VI-LayoutXLM](configs/kie/vi_layoutxlm/README_CN.md) for key information extraction
     - [PP-OCRv3](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/doc/doc_ch/PP-OCRv3_introduction.md)
         - [PP-OCRv3 DBNet](deploy/py_infer/src/configs/det/ppocr/ch_PP-OCRv3_det_cml.yaml) for text detection
         - [PP-OCRv3 SVTR](deploy/py_infer/src/configs/rec/ppocr/ch_PP-OCRv3_rec_distillation.yml) for text recognition
 2. Add new offline inference models
-    - [YOLOv8 nano]() for table recognition, inference on Ascend310
     - [PP-OCRv4](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/doc/doc_ch/PP-OCRv4_introduction.md) inference on Ascend310
         - [PP-OCRv4 DBNet](deploy/py_infer/src/configs/det/ppocr/ch_PP-OCRv4_det_cml.yaml) for text detection
         - [PP-OCRv4 CRNN](deploy/py_infer/src/configs/rec/ppocr/ch_PP-OCRv4_rec_distillation.yaml) for text recognition

diff --git a/README_CN.md b/README_CN.md
@@ -183,6 +183,12 @@ python tools/infer/text/predict_system.py --image_dir {path_to_img or dir_to_img
 - [x] [ABINet](configs/rec/abinet/README_CN.md) (CVPR'2021)
 </details>
 
+<details open markdown>
+<summary>Layout Analysis</summary>
+
+- [x] [YOLOv8](configs/layout/yolov8/README_CN.md) ([Ultralytics Inc.](https://github.com/ultralytics/ultralytics))
+</details>
+
 For the training methods and results of the models above, see the README in each model subdirectory under [configs](./configs).
 
 For the list of models supported for [MindSpore Lite](https://www.mindspore.cn/lite) and [ACL](https://www.hiascend.com/document/detail/zh/canncommercial/63RC1/inferapplicationdev/aclcppdevg/aclcppdevg_000004.html) inference,
@@ -233,7 +239,7 @@ MindOCR提供了[数据格式转换工具](tools/dataset_converters) ，以支
 
 - 2023/12/05
 1. Add new models
-    - Document layout recognition [YOLOv8 nano]()
+    - Layout analysis [YOLOv8](configs/layout/yolov8)
     - Key information extraction [VI-LayoutXLM](configs/kie/vi_layoutxlm/README_CN.md): online training and inference
     - [PP-OCRv3](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/doc/doc_ch/PP-OCRv3_introduction.md): third-party model training and inference
         - Text detection [PP-OCRv3 DBNet](deploy/py_infer/src/configs/det/ppocr/ch_PP-OCRv3_det_cml.yaml)

diff --git a/configs/det/dbnet/db_mobilenetv3_ppocrv3_param_map.json b/configs/det/dbnet/db_mobilenetv3_ppocrv3_param_map.json
diff --git a/configs/kie/layoutlm_series/ser_layoutxlm_xfund_zh.yaml b/configs/kie/layoutlm_series/ser_layoutxlm_xfund_zh.yaml
@@ -84,7 +84,7 @@ train:
 
   loader:
     shuffle: True
-    batch_size: 4
+    batch_size: 8
     drop_remainder: True
     num_workers: 8
 

diff --git a/configs/kie/vi_layoutxlm/README_CN.md b/configs/kie/vi_layoutxlm/README_CN.md
@@ -54,7 +54,7 @@ Table Format:
 
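 | **Model** | **Task** | **Context** | **Training Set** | **Params** | **Batch Size per Card** | **Graph-Mode Single-Card Training (s/epoch)** | **Graph-Mode Single-Card Training (ms/step)** | **Graph-Mode Single-Card Training (FPS)** | **hmean** | **Config** | **Weights Download** |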
 | :-----: | :-----: |:-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----:
-| VI-LayoutXLM | SER | D910Ax1-MS2.1-G | XFUND_zh | 265.7 M | 4 |  7.53 | 203.48 | 19.66 | 93.31%  | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yaml)     | [ckpt](https://download.mindspore.cn/toolkits/mindocr/vi-layoutxlm/ser_vi_layoutxlm.ckpt) |
+| VI-LayoutXLM | SER | D910Ax1-MS2.1-G | XFUND_zh | 265.7 M | 8 |  3.06 | 169.7 | 47.2 | 93.31%  | [yaml](ser_vi_layoutxlm_xfund_zh.yaml)     | [ckpt](https://download.mindspore.cn/toolkits/mindocr/vi-layoutxlm/ser_vi_layoutxlm-f3c83585.ckpt) |
 </div>
 
 

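The updated VI-LayoutXLM row can be sanity-checked from its own columns: for a single card, throughput in samples per second should be roughly the per-card batch size divided by the step time. The following is a minimal sketch of that arithmetic; the helper name `throughput_fps` is hypothetical and not part of MindOCR, and the inputs are copied from the old and new table rows.

```python
def throughput_fps(batch_size: int, ms_per_step: float) -> float:
    """Samples per second on one card, given the per-step latency in ms."""
    return batch_size * 1000.0 / ms_per_step


# Values copied from the removed and added table rows.
old_fps = throughput_fps(batch_size=4, ms_per_step=203.48)
new_fps = throughput_fps(batch_size=8, ms_per_step=169.7)

print(f"old: {old_fps:.2f} FPS, new: {new_fps:.2f} FPS")
```

Under this formula the old row reproduces its reported 19.66 FPS, and the new row lands within about 0.1 FPS of the reported 47.2, so the table change appears consistent with the `batch_size: 4` to `batch_size: 8` change in the yaml.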
diff --git a/configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yaml b/configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yaml
@@ -19,7 +19,7 @@ model:
   head :
     name: TokenClassificationHead
     num_classes: 7
-    use_visual_backbone: True
+    use_visual_backbone: False
     use_float16: True
   pretrained:
 
@@ -85,7 +85,7 @@ train:
 
   loader:
     shuffle: True
-    batch_size: 4
+    batch_size: 8
     drop_remainder: True
     num_workers: 8
 

diff --git a/configs/layout/yolov8/README.md b/configs/layout/yolov8/README.md
@@ -140,7 +140,7 @@ Please [download](#2-results) the exported MindIR file first, or refer to the [M
 python tools/export.py --model_name_or_config configs/layout/yolov8/yolov8n.yaml --data_shape 800 800 --local_ckpt_path /path/to/local_ckpt.ckpt
 ```
 
-The `data_shape` is the model input shape of height and width for MindIR file. The shape value of MindIR in the download link can be found in [Notes](#2-results) under results table.
+The `data_shape` is the model input shape of height and width for MindIR file. The shape value of MindIR in the download link can be found in [Notes](#2-results) under results table. `distribute` in yaml shall be set to False.
 
 **2. Environment Installation**
 

diff --git a/configs/layout/yolov8/README_CN.md b/configs/layout/yolov8/README_CN.md
@@ -43,7 +43,7 @@ Table Format:
 
 **Notes:**
 
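-- Context: the training context is denoted as {processor}x{number of processors}-{MS mode}, where the MindSpore mode is G (graph mode) or F (pynative mode). For example, D910x8-MS2.2-G denotes training on 4 Ascend 910 NPUs in graph mode with MindSpore 2.2.
+- Context: the training context is denoted as {processor}x{number of processors}-{MS mode}, where the MindSpore mode is G (graph mode) or F (pynative mode). For example, D910x4-MS2.2-G denotes training on 4 Ascend 910 NPUs in graph mode with MindSpore 2.2.
 - To reproduce the training results in another context, make sure the global batch size matches the original configuration file.
 - The models are trained from scratch without any pretraining. For details on the training and evaluation datasets, see the [PubLayNet Dataset Preparation](#3.1.2 PubLayNet数据集准备) section.
 - The input shape used when exporting the YOLOv8 MindIR is (1, 3, 800, 800) in all cases.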
@@ -154,7 +154,7 @@ python tools/eval.py --config configs/layout/yolov8/yolov8n.yaml
 python tools/export.py --model_name_or_config configs/layout/yolov8/yolov8n.yaml --data_shape 800 800 --local_ckpt_path /path/to/local_ckpt.ckpt
 ```
 
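-Here, `data_shape` is the height and width of the model input shape used when exporting the MindIR. The shape of the MindIR in the download link is given in the [Notes](#2-评估结果).
+Here, `data_shape` is the height and width of the model input shape used when exporting the MindIR. The shape of the MindIR in the download link is given in the [Notes](#2-评估结果). `distribute` in the yaml must be set to False.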
 
 **2. Environment Installation**
 

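The context notation corrected in this README (for example `D910x4-MS2.2-G`) and the note about keeping the global batch size constant can be illustrated with a small parser. This is only a sketch: `EnvConfig`, `parse_env`, and `global_batch_size` are hypothetical names, not MindOCR APIs, and the regular expression assumes the string also carries the MindSpore version, as the README's examples do.

```python
import re
from typing import NamedTuple


class EnvConfig(NamedTuple):
    processor: str
    num_devices: int
    ms_version: str
    mode: str


# Assumed layout: {processor}x{count}-MS{version}-{G|F}
_ENV_PATTERN = re.compile(
    r"^(?P<processor>[A-Za-z0-9]+?)x(?P<count>\d+)"
    r"-MS(?P<version>[\d.]+)-(?P<mode>[GF])$"
)
_MODES = {"G": "graph", "F": "pynative"}


def parse_env(text: str) -> EnvConfig:
    """Split a context string such as 'D910x4-MS2.2-G' into its fields."""
    match = _ENV_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized context string: {text!r}")
    return EnvConfig(
        processor=match["processor"],
        num_devices=int(match["count"]),
        ms_version=match["version"],
        mode=_MODES[match["mode"]],
    )


def global_batch_size(env: EnvConfig, per_device_batch: int) -> int:
    """Global batch size is the per-device batch times the device count."""
    return env.num_devices * per_device_batch


env = parse_env("D910x4-MS2.2-G")
print(env)
```

When moving to a different number of devices, the per-device batch would be adjusted so that `global_batch_size` returns the same value as in the original configuration.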
diff --git a/configs/layout/yolov8/yolov8n.yaml b/configs/layout/yolov8/yolov8n.yaml
@@ -7,7 +7,7 @@ system:
   log_interval: 100
   val_while_train: False
   drop_overflow_update: False
-  ckpt_max_keep: 100
+  ckpt_max_keep: 500
   device_id: 0
 
 common: