partial refactoring

mindspore-lab · Dec 20, 2023 · 59facc7
2 parents 2e35846 + 5842c7f
commit 59facc7
Showing 49 changed files with 4,318 additions and 1,767 deletions.
diff --git a/README.md b/README.md
@@ -17,6 +17,7 @@ English | [中文](README_CN.md)
 [📚Tutorials](#tutorials) |
 [🎁Model List](#model-list) |
 [📰Dataset List](#dataset-list) |
+[📖Frequently Asked Questions](#frequently-asked-questions) |
 [🎉Notes](#notes)
 
 </div>
@@ -36,17 +37,18 @@ MindOCR is an open-source toolbox for OCR development and application based on [
 
 ## Installation
 
-<details close markdown>
+<details open markdown>
+<summary> Details </summary>
 
 #### Prerequisites
 
 MindOCR is built on MindSpore AI framework, which supports CPU/GPU/NPU devices.
 MindOCR is compatible with the following framework versions. For details and installation guideline, please refer to the installation links shown below.
 
-- mindspore >= 1.9 (ABINet requires mindspore >= 2.0) [[install](https://www.mindspore.cn/install)]
+- mindspore >= 2.2.0 [[install](https://www.mindspore.cn/install)]
 - python >= 3.7
 - openmpi 4.0.3 (for distributed training/evaluation)  [[install](https://www.open-mpi.org/software/ompi/v4.0/)]
-- mindspore lite (for inference)  [[install](docs/en/inference/environment.md)]
+- mindspore lite (for offline inference) >= 2.2.0  [[install](docs/en/inference/environment.md)]
 
 
 #### Dependency
@@ -126,10 +128,12 @@ python tools/eval.py \
 
 For more illustration and usage, please refer to the model training section in [Tutorials](#tutorials).
 
-### 3. Model Inference - Quick Guideline
+### 3. Model Offline Inference - Quick Guideline
 
-You can do MindSpore Lite inference in MindOCR using **MindOCR models** or **Third-party models** (PaddleOCR, MMOCR, etc.).
-Please refer to [MindOCR Models Inference - Quick Start](docs/en/inference/inference_quickstart.md) or [Third-party Models Inference - Quick Start](docs/en/inference/inference_thirdparty_quickstart.md).
+You can run MindSpore Lite inference in MindOCR using **MindOCR models** or **Third-party models** (PaddleOCR, MMOCR, etc.). Please refer to the following documents:
+ - [Python/C++ Inference on Ascend 310](docs/en/inference/inference_tutorial.md)
+ - [MindOCR Models Offline Inference - Quick Start](docs/en/inference/inference_quickstart.md)
+ - [Third-party Models Offline Inference - Quick Start](docs/en/inference/inference_thirdparty_quickstart.md)
 
 ## Tutorials
 
@@ -142,9 +146,12 @@ Please refer to [MindOCR Models Inference - Quick Start](docs/en/inference/infer
     - [Text Recognition](docs/en/tutorials/training_recognition_custom_dataset.md)
     - [Distributed Training](docs/en/tutorials/distribute_train.md)
     - [Advance: Gradient Accumulation, EMA, Resume Training, etc](docs/en/tutorials/advanced_train.md)
-- Inference and Deployment
-    - [Python/C++ Inference on Ascend 310](docs/en/inference/inference_tutorial.md)
+- Inference with MindSpore
     - [Python Online Inference](tools/infer/text/README.md)
+- Inference with MindSpore Lite
+    - [Python/C++ Inference on Ascend 310](docs/en/inference/inference_tutorial.md)
+    - [MindOCR Models Offline Inference - Quick Start](docs/en/inference/inference_quickstart.md)
+    - [Third-party Models Offline Inference - Quick Start](docs/en/inference/inference_thirdparty_quickstart.md)
 - Developer Guides
     - [Customize Dataset](mindocr/data/README.md)
     - [Customize Data Transformation](mindocr/data/transforms/README.md)
@@ -177,6 +184,20 @@ Please refer to [MindOCR Models Inference - Quick Start](docs/en/inference/infer
 
 </details>
 
+<details open markdown>
+<summary>Layout Analysis</summary>
+
+- [x] [YOLOv8](configs/layout/yolov8/README.md) ([Ultralytics Inc.](https://github.com/ultralytics/ultralytics))
+
+</details>
+
+<details open markdown>
+<summary>Key Information Extraction</summary>
+
+- [x] [LayoutXLM SER](configs/kie/vi_layoutxlm/README_CN.md) (arXiv'2021)
+
+</details>
+
 For the detailed performance of the trained models, please refer to [configs](./configs).
 
 For details of MindSpore Lite and ACL inference models support, please refer to [MindOCR Models Support List](docs/en/inference/inference_quickstart.md) and [Third-party Models Support List](docs/en/inference/inference_thirdparty_quickstart.md) (PaddleOCR, MMOCR, etc.).
@@ -212,11 +233,49 @@ MindOCR provides a [dataset conversion tool](tools/dataset_converters) to OCR da
 
 </details>
 
+<details close markdown>
+<summary>Layout Analysis Datasets</summary>
+
+- [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet) [[paper](https://arxiv.org/abs/1908.07836)] [[download](https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/publaynet.tar.gz)]
+
+</details>
+
+<details close markdown>
+<summary>Key Information Extraction Datasets</summary>
+
+- [XFUND](https://github.com/doc-analysis/XFUND) [[paper](https://aclanthology.org/2022.findings-acl.253/)] [[download](https://github.com/doc-analysis/XFUND/releases/tag/v1.0)]
+
+</details>
+
 We will include more datasets for training and evaluation. This list will be continuously updated.
 
+## Frequently Asked Questions
+For frequently asked questions about configuring the environment and using MindOCR, please refer to the [FAQ](docs/en/tutorials/frequently_asked_questions.md).
+
 ## Notes
 
 ### What is New
+
+<details close markdown>
+<summary>News</summary>
+
+- 2023/12/14
+1. Add new trained models
+    - [LayoutXLM SER](configs/kie/vi_layoutxlm) for key information extraction
+    - [VI-LayoutXLM SER](configs/kie/layoutlm_series) for key information extraction
+    - [PP-OCRv3 DBNet](configs/det/dbnet/db_mobilenetv3_ppocrv3.yaml) for text detection and [PP-OCRv3 SVTR](configs/rec/svtr/svtr_ppocrv3_ch.yaml) for text recognition, supporting online inference and fine-tuning
+2. Add more benchmark datasets and their results
+    - [XFUND](configs/kie/vi_layoutxlm/README_CN.md)
+3. Add multi-specification support for Ascend 910: DBNet ResNet-50, DBNet++ ResNet-50, CRNN VGG7, SVTR-Tiny, FCENet, ABINet
+- 2023/11/28
+1. Add offline inference support for PP-OCRv4
+    - [PP-OCRv4 DBNet](deploy/py_infer/src/configs/det/ppocr/ch_PP-OCRv4_det_cml.yaml) for text detection and [PP-OCRv4 CRNN](deploy/py_infer/src/configs/rec/ppocr/ch_PP-OCRv4_rec_distillation.yaml) for text recognition, supporting offline inference
+2. Fix bugs in offline inference for third-party models
+- 2023/11/17
+1. Add new trained models
+    - [YOLOv8](configs/layout/yolov8) for layout analysis
+2. Add more benchmark datasets and their results
+    - [PubLayNet](configs/layout/yolov8/README_CN.md)
 - 2023/07/06
 1. Add new trained models
     - [RobustScanner](configs/rec/robustscanner) for text recognition
@@ -275,13 +334,14 @@ which can be enabled by add "shape_list" to the `eval.dataset.output_columns` li
 - 2023/03/13
 1. Add system test and CI workflow.
 2. Add modelarts adapter to allow training on OpenI platform. To train on OpenI:
-  ```text
-    i)   Create a new training task on the openi cloud platform.
-    ii)  Link the dataset (e.g., ic15_mindocr) on the webpage.
-    iii) Add run parameter `config` and write the yaml file path on the website UI interface, e.g., '/home/work/user-job-dir/V0001/configs/rec/test.yaml'
-    iv)  Add run parameter `enable_modelarts` and set True on the website UI interface.
-    v)   Fill in other blanks and launch.
-  ```
+    ```text
+        i)   Create a new training task on the openi cloud platform.
+        ii)  Link the dataset (e.g., ic15_mindocr) on the webpage.
+        iii) Add run parameter `config` and write the yaml file path on the website UI interface, e.g., '/home/work/user-job-dir/V0001/configs/rec/test.yaml'
+        iv)  Add run parameter `enable_modelarts` and set True on the website UI interface.
+        v)   Fill in other blanks and launch.
+    ```
+</details>
 
 ### How to Contribute
 

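Note on the README.md change above: the minimum MindSpore version is raised from 1.9 to 2.2.0, and MindSpore Lite now also requires >= 2.2.0. A minimal sketch of how such a minimum could be checked before running training or inference is given below. The helper names `parse_version` and `meets_minimum` are hypothetical and are not part of MindOCR or MindSpore; the comparison only looks at the leading numeric components of each version string.

```python
def parse_version(text):
    """Turn a dotted version such as '2.2.10' into a tuple of ints, e.g. (2, 2, 10).

    Only the leading digits of each dot-separated piece are kept, so a
    pre-release suffix such as 'rc1' is ignored (an assumption of this sketch).
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first piece with no leading number
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (numeric comparison)."""
    a = parse_version(installed)
    b = parse_version(minimum)
    # Pad the shorter tuple with zeros so that '2.2' compares equal to '2.2.0'.
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    # Hypothetical installed versions checked against the README minimum.
    for candidate in ("1.9.0", "2.2.0", "2.2.10"):
        print(candidate, meets_minimum(candidate, "2.2.0"))
```

In practice the installed version string would come from the environment, for example from the package's `__version__` attribute, assuming the installed build exposes one. Comparing integer tuples rather than raw strings avoids the lexicographic trap where "2.10" sorts before "2.2".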
diff --git a/README_CN.md b/README_CN.md
@@ -17,6 +17,7 @@
 [📚使用教程](#使用教程) |
 [🎁模型列表](#模型列表) |
 [📰数据集列表](#数据集列表) |
+[📖常见问题](#常见问题) |
 [🎉更新日志](#更新日志)
 
 </div>
@@ -36,16 +37,16 @@ MindOCR是一个基于[MindSpore](https://www.mindspore.cn/en) 框架开发的OC
 
 ## 安装教程
 
-<details close markdown>
+<details open markdown>
 
 #### MindSpore相关环境准备
 
 MindOCR基于MindSpore AI框架（支持CPU/GPU/NPU）开发，并适配以下框架版本。安装方式请参见下方的安装链接。
 
-- mindspore >= 1.9 (ABINet 需要 mindspore >= 2.0) [[安装](https://www.mindspore.cn/install)]
+- mindspore >= 2.2.0 [[安装](https://www.mindspore.cn/install)]
 - python >= 3.7
-- openmpi 4.0.3 (for distributed training/evaluation)  [[安装](https://www.open-mpi.org/software/ompi/v4.0/)]
-- mindspore lite (for inference)  [[安装](docs/cn/inference/environment.md)]
+- openmpi 4.0.3 (用于分布式训练与验证)  [[安装](https://www.open-mpi.org/software/ompi/v4.0/)]
+- mindspore lite (用于离线推理) >= 2.2.0  [[安装](docs/cn/inference/environment.md)]
 
 #### 包依赖
 
@@ -93,9 +94,9 @@ python tools/infer/text/predict_system.py --image_dir {path_to_img or dir_to_img
 
 可以看到图像中的文字块均被检测出来并正确识别。更详细的用法介绍，请参考推理[教程](#使用教程)。
 
-### 2. 模型训练与评估-快速指南
+### 2. 模型训练、评估与推理-快速指南
 
-使用`tools/train.py`脚本可以很容易地训练OCR模型，该脚本可支持文本检测和识别模型训练。
+使用`tools/train.py`脚本可以进行OCR模型训练，该脚本可支持文本检测和识别模型训练。
 ```shell
 python tools/train.py --config {path/to/model_config.yaml}
 ```
@@ -112,19 +113,28 @@ python tools/train.py --config configs/det/dbnet/db++_r50_icdar15.yaml
 python tools/train.py --config configs/rec/crnn/crnn_icdar15.yaml
 ```
 
-类似的，使用`tools/eval.py` 脚本可以很容易地评估已训练好的模型，如下所示：
+使用`tools/eval.py` 脚本可以评估已训练好的模型，如下所示：
 ```shell
 python tools/eval.py \
     --config {path/to/model_config.yaml} \
     --opt eval.dataset_root={path/to/your_dataset} eval.ckpt_load_path={path/to/ckpt_file}
 ```
 
-更多使用方法，请参考[使用教程](#使用教程)中的模型训练章节。
+使用`tools/infer/text/predict_system.py` 脚本可进行模型的在线推理，如下所示：
+```shell
+python tools/infer/text/predict_system.py --image_dir {path_to_img or dir_to_imgs} \
+                                          --det_algorithm DB++  \
+                                          --rec_algorithm CRNN
+```
+
+更多使用方法，请参考[使用教程](#使用教程)中的模型训练、推理章节。
 
-### 3. 模型推理-快速指南
+### 3. 模型离线推理-快速指南
 
-你可以在MindOCR中对**MindOCR自研模型**或**第三方模型**（如PaddleOCR、MMOCR等）进行MindSpore Lite推理。
-请见[MindOCR自研模型推理-快速开始](docs/cn/inference/inference_quickstart.md)或[第三方模型推理-快速开始](docs/cn/inference/inference_thirdparty_quickstart.md)。
+你可以在MindOCR中对**MindOCR原生模型**或**第三方模型**（如PaddleOCR、MMOCR等）进行MindSpore Lite推理。请参考以下文档
+ - [基于Python/C++和昇腾310的OCR推理](docs/cn/inference/inference_tutorial.md)
+ - [MindOCR原生模型离线推理 - 快速开始](docs/cn/inference/inference_quickstart.md)
+ - [第三方模型离线推理 - 快速开始](docs/cn/inference/inference_thirdparty_quickstart.md)
 
 ## 使用教程
 
@@ -137,9 +147,12 @@ python tools/eval.py \
     - [文本识别](docs/cn/tutorials/training_recognition_custom_dataset.md)
     - [分布式训练](docs/cn/tutorials/distribute_train.md)
     - [进阶技巧：梯度累积，EMA，断点续训等](docs/cn/tutorials/advanced_train.md)
-- 推理与部署
-    - [基于Python/C++和昇腾310的OCR推理](docs/cn/inference/inference_tutorial.md)
+- 使用MindSpore进行在线推理
     - [基于Python的OCR在线推理](tools/infer/text/README.md)
+- 使用MindSpore Lite进行离线推理
+    - [基于Python/C++和昇腾310的OCR推理](docs/cn/inference/inference_tutorial.md)
+    - [MindOCR原生模型离线推理 - 快速开始](docs/cn/inference/inference_quickstart.md)
+    - [第三方模型离线推理 - 快速开始](docs/cn/inference/inference_thirdparty_quickstart.md)
 - 开发者指南
     - [如何自定义数据集](mindocr/data/README.md)
     - [如何自定义数据增强方法](mindocr/data/transforms/README.md)
@@ -170,10 +183,24 @@ python tools/eval.py \
 - [x] [ABINet](configs/rec/abinet/README_CN.md) (CVPR'2021)
 </details>
 
+<details open markdown>
+<summary>版面分析</summary>
+
+- [x] [YOLOv8](configs/layout/yolov8/README_CN.md) ([Ultralytics Inc.](https://github.com/ultralytics/ultralytics))
+</details>
+
+<details open markdown>
+<summary>关键信息抽取</summary>
+
+- [x] [LayoutXLM SER](configs/kie/vi_layoutxlm/README_CN.md) (arXiv'2021)
+
+</details>
+
+
 关于以上模型的具体训练方法和结果，请参见[configs](./configs)下各模型子目录的readme文档。
 
 关于[MindSpore Lite](https://www.mindspore.cn/lite)和[ACL](https://www.hiascend.com/document/detail/zh/canncommercial/63RC1/inferapplicationdev/aclcppdevg/aclcppdevg_000004.html)模型推理的支持列表，
-请参见[MindOCR自研模型推理支持列表](docs/cn/inference/inference_quickstart.md) 和 [第三方模型推理支持列表](docs/cn/inference/inference_thirdparty_quickstart.md)（如PaddleOCR、MMOCR等）。
+请参见[MindOCR原生模型推理支持列表](docs/cn/inference/inference_quickstart.md) 和 [第三方模型推理支持列表](docs/cn/inference/inference_thirdparty_quickstart.md)（如PaddleOCR、MMOCR等）。
 
 ## 数据集列表
 
@@ -207,26 +234,63 @@ MindOCR提供了[数据格式转换工具](tools/dataset_converters) ，以支
 
 </details>
 
+<details close markdown>
+<summary>版面分析数据集</summary>
+
+- [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet) [[paper](https://arxiv.org/abs/1908.07836)] [[download](https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/publaynet.tar.gz)]
+
+</details>
+
+<details close markdown>
+<summary>关键信息抽取数据集</summary>
+
+- [XFUND](https://github.com/doc-analysis/XFUND) [[paper](https://aclanthology.org/2022.findings-acl.253/)] [[download](https://github.com/doc-analysis/XFUND/releases/tag/v1.0)]
+
+</details>
+
 我们会在更多的数据集上进行模型训练和验证。该列表将持续更新。
 
+## 常见问题
+关于配置环境、使用mindocr遇到的高频问题，可以参考[常见问题](docs/cn/tutorials/frequently_asked_questions.md)。
+
 ## 重要信息
 
 ### 更新日志
+<details close markdown>
+<summary>详细</summary>
+
+- 2023/12/14
+1. 增加新模型
+    - 关键信息抽取[LayoutXLM SER](configs/kie/vi_layoutxlm)
+    - 关键信息抽取[VI-LayoutXLM SER](configs/kie/layoutlm_series)
+    - 文本检测[PP-OCRv3 DBNet](configs/det/dbnet/db_mobilenetv3_ppocrv3.yaml)和文本识别[PP-OCRv3 SVTR](configs/rec/svtr/svtr_ppocrv3_ch.yaml)，支持在线推理和微调训练
+2. 添加更多基准数据集及其结果
+    - [XFUND](configs/kie/vi_layoutxlm/README_CN.md)
+3. 昇腾910硬件多规格支持：DBNet ResNet-50、DBNet++ ResNet-50、CRNN VGG7、SVTR-Tiny、FCENet、ABINet
+- 2023/11/28
+1. 增加支持PP-OCRv4模型离线推理
+    - 文本检测 [PP-OCRv4 DBNet](deploy/py_infer/src/configs/det/ppocr/ch_PP-OCRv4_det_cml.yaml)和文本识别 [PP-OCRv4 CRNN](deploy/py_infer/src/configs/rec/ppocr/ch_PP-OCRv4_rec_distillation.yaml)，支持离线推理
+2. 修复第三方模型离线推理bug
+- 2023/11/17
+1. 增加新模型
+    - 版面分析[YOLOv8](configs/layout/yolov8)
+2. 添加更多基准数据集及其结果
+    - [PubLayNet](configs/layout/yolov8/README_CN.md)
 - 2023/07/06
 1. 增加新模型
-    - 文本识别[RobustScanner](configs/rec/robustscanner)
+    - 文本识别 [RobustScanner](configs/rec/robustscanner)
 - 2023/07/05
 1. 增加新模型
-    - 文本识别[VISIONLAN](configs/rec/visionlan)
+    - 文本识别 [VISIONLAN](configs/rec/visionlan)
 - 2023/06/29
 1. 新增2个SoTA模型
-    - 文本检测[FCENet](configs/det/fcenet)
-    - 文本识别[MASTER](configs/rec/master)
+    - 文本检测 [FCENet](configs/det/fcenet)
+    - 文本识别 [MASTER](configs/rec/master)
 - 2023/06/07
 1. 增加新模型
-    - 文本检测[PSENet](configs/det/psenet)
-    - 文本检测[EAST](configs/det/east)
-    - 文本识别[SVTR](configs/rec/svtr)
+    - 文本检测 [PSENet](configs/det/psenet)
+    - 文本检测 [EAST](configs/det/east)
+    - 文本识别 [SVTR](configs/rec/svtr)
 2. 添加更多基准数据集及其结果
     - [totaltext](docs/cn/datasets/totaltext.md)
     - [mlt2017](docs/cn/datasets/mlt2017.md)
@@ -237,8 +301,8 @@ MindOCR提供了[数据格式转换工具](tools/dataset_converters) ，以支
 
 - 2023/05/15
 1. 增加新模型
-    - 文本检测[DBNet++](configs/det/dbnet)
-    - 文本识别[CRNN-Seq2Seq](configs/rec/rare)
+    - 文本检测 [DBNet++](configs/det/dbnet)
+    - 文本识别 [CRNN-Seq2Seq](configs/rec/rare)
     - 在SynthText数据集上预训练的[DBNet](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50_synthtext-40655acb.ckpt)
 2. 添加更多基准数据集及其结果
     - [SynthText](docs/cn/datasets/synthtext.md), [MSRA-TD500](docs/cn/datasets/td500.md), [CTW1500](docs/cn/datasets/ctw1500.md)
@@ -276,6 +340,7 @@ MindOCR提供了[数据格式转换工具](tools/dataset_converters) ，以支
     iv)  在网页的UI界面增加运行参数`enable_modelarts`并将其设置为True；
     v)   填写其他项并启动训练任务。
   ```
+</details>
 
 ### 如何贡献