Add and update doc
horcham committed Dec 8, 2023
2 parents c8d7cb9 + 03036ee commit 357e50d
Showing 181 changed files with 19,876 additions and 1,341 deletions.
63 changes: 48 additions & 15 deletions README.md
@@ -17,6 +17,7 @@ English | [中文](README_CN.md)
[📚Tutorials](#tutorials) |
[🎁Model List](#model-list) |
[📰Dataset List](#dataset-list) |
[📖Frequently Asked Questions](#frequently-asked-questions) |
[🎉Notes](#notes)

</div>
@@ -36,17 +37,18 @@ MindOCR is an open-source toolbox for OCR development and application based on [

## Installation

<details close markdown>
<details open markdown>
<summary> Details </summary>

#### Prerequisites

MindOCR is built on MindSpore AI framework, which supports CPU/GPU/NPU devices.
MindOCR is compatible with the following framework versions. For details and installation guideline, please refer to the installation links shown below.

- mindspore >= 1.9 (ABINet requires mindspore >= 2.0) [[install](https://www.mindspore.cn/install)]
- mindspore >= 2.2.0 [[install](https://www.mindspore.cn/install)]
- python >= 3.7
- openmpi 4.0.3 (for distributed training/evaluation) [[install](https://www.open-mpi.org/software/ompi/v4.0/)]
- mindspore lite (for inference) [[install](docs/en/inference/environment.md)]
- mindspore lite (for offline inference) >= 2.2.0 [[install](docs/en/inference/environment.md)]
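
As a rough aid for checking the prerequisites above, the sketch below compares an installed package version against a minimum. It assumes Python 3.8 or later (for `importlib.metadata`) and that the framework is installed under the distribution name `mindspore`. The helper names `parse_version`, `meets_minimum`, and `check_package` are hypothetical and not part of MindOCR.

```python
from importlib import metadata


def parse_version(text, width=3):
    """Parse '2.2.10' into (2, 2, 10), padding with zeros to `width` fields.

    Only the leading digits of each dot-separated field are used, so a
    suffix such as 'rc1' is ignored. This is a simplification, not PEP 440.
    """
    parts = []
    for field in text.split(".")[:width]:
        digits = ""
        for ch in field:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    parts += [0] * (width - len(parts))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


def check_package(name, minimum):
    """Describe whether distribution `name` satisfies the minimum version."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return f"{name}: not installed (need >= {minimum})"
    if meets_minimum(installed, minimum):
        return f"{name} {installed}: ok"
    return f"{name} {installed}: too old (need >= {minimum})"


if __name__ == "__main__":
    # "mindspore" as the distribution name is an assumption.
    print(check_package("mindspore", "2.2.0"))
```

Comparing integer tuples instead of raw strings avoids the usual pitfall where `"2.10"` sorts before `"2.2"`.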


#### Dependency
@@ -126,10 +128,12 @@ python tools/eval.py \

For more illustration and usage, please refer to the model training section in [Tutorials](#tutorials).

### 3. Model Inference - Quick Guideline
### 3. Model Offline Inference - Quick Guideline

You can do MindSpore Lite inference in MindOCR using **MindOCR models** or **Third-party models** (PaddleOCR, MMOCR, etc.).
Please refer to [MindOCR Models Inference - Quick Start](docs/en/inference/inference_quickstart.md) or [Third-party Models Inference - Quick Start](docs/en/inference/inference_thirdparty_quickstart.md).
You can do MindSpore Lite inference in MindOCR using **MindOCR models** or **Third-party models** (PaddleOCR, MMOCR, etc.). Please refer to the following documents:
- [Python/C++ Inference on Ascend 310](docs/en/inference/inference_tutorial.md)
- [MindOCR Models Offline Inference](docs/en/inference/inference_quickstart.md)
- [Third-party Models Offline Inference](docs/en/inference/inference_thirdparty_quickstart.md).

## Tutorials

@@ -142,9 +146,12 @@ Please refer to [MindOCR Models Inference - Quick Start](docs/en/inference/infer
- [Text Recognition](docs/en/tutorials/training_recognition_custom_dataset.md)
- [Distributed Training](docs/en/tutorials/distribute_train.md)
- [Advance: Gradient Accumulation, EMA, Resume Training, etc](docs/en/tutorials/advanced_train.md)
- Inference and Deployment
- [Python/C++ Inference on Ascend 310](docs/en/inference/inference_tutorial.md)
- Inference with MindSpore
- [Python Online Inference](tools/infer/text/README.md)
- Inference with MindSpore Lite
- [Python/C++ Inference on Ascend 310](docs/en/inference/inference_tutorial.md)
- [MindOCR Models Offline Inference](docs/en/inference/inference_quickstart.md)
- [Third-party Models Offline Inference](docs/en/inference/inference_thirdparty_quickstart.md).
- Developer Guides
- [Customize Dataset](mindocr/data/README.md)
- [Customize Data Transformation](mindocr/data/transforms/README.md)
@@ -177,6 +184,12 @@ Please refer to [MindOCR Models Inference - Quick Start](docs/en/inference/infer

</details>

<details open markdown>
<summary>Key Information Extraction</summary>

- [x] [LayoutXLM](configs/kie/vi_layoutxlm/README_CN.md)
</details>

For the detailed performance of the trained models, please refer to [configs](./configs).

For details of MindSpore Lite and ACL inference models support, please refer to [MindOCR Models Support List](docs/en/inference/inference_quickstart.md) and [Third-party Models Support List](docs/en/inference/inference_thirdparty_quickstart.md) (PaddleOCR, MMOCR, etc.).
@@ -214,9 +227,28 @@ MindOCR provides a [dataset conversion tool](tools/dataset_converters) to OCR da

We will include more datasets for training and evaluation. This list will be continuously updated.

## Frequently Asked Questions
For frequently asked questions about the environment and MindOCR, please refer to the [FAQ](docs/en/tutorials/frequently_asked_questions.md).

## Notes

### What is New

<details close markdown>
<summary>News</summary>
- 2023/12/05
1. Add new trained models
- [YOLOv8 nano]()
- [VI-LayoutXLM](configs/kie/vi_layoutxlm/README_CN.md) for key information extraction
- [PP-OCRv3](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/doc/doc_ch/PP-OCRv3_introduction.md)
- [PP-OCRv3 DBNet](deploy/py_infer/src/configs/det/ppocr/ch_PP-OCRv3_det_cml.yaml) for text detection
- [PP-OCRv3 SVTR](deploy/py_infer/src/configs/rec/ppocr/ch_PP-OCRv3_rec_distillation.yml) for text recognition
2. Add new offline inference models
- [YOLOv8 nano]() for table recognition, inference on Ascend310
- [PP-OCRv4](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/doc/doc_ch/PP-OCRv4_introduction.md) inference on Ascend310
- [PP-OCRv4 DBNet](deploy/py_infer/src/configs/det/ppocr/ch_PP-OCRv4_det_cml.yaml) for text detection
- [PP-OCRv4 CRNN](deploy/py_infer/src/configs/rec/ppocr/ch_PP-OCRv4_rec_distillation.yaml) for text recognition

- 2023/07/06
1. Add new trained models
- [RobustScanner](configs/rec/robustscanner) for text recognition
@@ -275,13 +307,14 @@ which can be enabled by add "shape_list" to the `eval.dataset.output_columns` li
- 2023/03/13
1. Add system test and CI workflow.
2. Add modelarts adapter to allow training on OpenI platform. To train on OpenI:
```text
i) Create a new training task on the openi cloud platform.
ii) Link the dataset (e.g., ic15_mindocr) on the webpage.
iii) Add run parameter `config` and write the yaml file path on the website UI interface, e.g., '/home/work/user-job-dir/V0001/configs/rec/test.yaml'
iv) Add run parameter `enable_modelarts` and set True on the website UI interface.
v) Fill in other blanks and launch.
```
```text
i) Create a new training task on the openi cloud platform.
ii) Link the dataset (e.g., ic15_mindocr) on the webpage.
iii) Add run parameter `config` and write the yaml file path on the website UI interface, e.g., '/home/work/user-job-dir/V0001/configs/rec/test.yaml'
iv) Add run parameter `enable_modelarts` and set True on the website UI interface.
v) Fill in other blanks and launch.
```
</details>
### How to Contribute
85 changes: 62 additions & 23 deletions README_CN.md
@@ -2,7 +2,7 @@

# MindOCR

[![CI](https://github.com/mindspore-lab/mindocr/actions/workflows/ci.yml/badge.svg)](https://github.com/mindspore-lab/mindocr/actions/workflows/ci.yml)
[![CI](shttps://github.com/mindspore-lab/mindocr/actions/workflow/ci.yml/badge.svg)](https://github.com/mindspore-lab/mindocr/actions/workflows/ci.yml)
[![license](https://img.shields.io/github/license/mindspore-lab/mindocr.svg)](https://github.com/mindspore-lab/mindocr/blob/main/LICENSE)
[![open issues](https://img.shields.io/github/issues/mindspore-lab/mindocr)](https://github.com/mindspore-lab/mindocr/issues)
[![PRs](https://img.shields.io/badge/PRs-welcome-pink.svg)](https://github.com/mindspore-lab/mindocr/pulls)
@@ -17,6 +17,7 @@
[📚Tutorials](#使用教程) |
[🎁Model List](#模型列表) |
[📰Dataset List](#数据集列表) |
[📖Frequently Asked Questions](#常见问题) |
[🎉Changelog](#更新日志)

</div>
Expand All @@ -36,16 +37,16 @@ MindOCR是一个基于[MindSpore](https://www.mindspore.cn/en) 框架开发的OC

## Installation

<details close markdown>
<details open markdown>

#### MindSpore Environment Setup

MindOCR is built on the MindSpore AI framework (which supports CPU/GPU/NPU) and is compatible with the following framework versions. For installation, see the install links below.

- mindspore >= 1.9 (ABINet requires mindspore >= 2.0) [[install](https://www.mindspore.cn/install)]
- mindspore >= 2.2.0 [[install](https://www.mindspore.cn/install)]
- python >= 3.7
- openmpi 4.0.3 (for distributed training/evaluation) [[install](https://www.open-mpi.org/software/ompi/v4.0/)]
- mindspore lite (for inference) [[install](docs/cn/inference/environment.md)]
- openmpi 4.0.3 (for distributed training and validation) [[install](https://www.open-mpi.org/software/ompi/v4.0/)]
- mindspore lite (for offline inference) >= 2.2.0 [[install](docs/cn/inference/environment.md)]

#### Dependencies

@@ -93,9 +94,9 @@ python tools/infer/text/predict_system.py --image_dir {path_to_img or dir_to_img

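As shown, all text blocks in the image are detected and correctly recognized. For more detailed usage, please refer to the inference [tutorials](#使用教程).

### 2. Model Training and Evaluation - Quick Guideline
### 2. Model Training, Evaluation and Inference - Quick Guideline

It is easy to train your OCR model with the `tools/train.py` script, which supports training both text detection and recognition models.
You can train OCR models with the `tools/train.py` script, which supports training both text detection and recognition models.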
```shell
python tools/train.py --config {path/to/model_config.yaml}
```
@@ -112,19 +113,28 @@ python tools/train.py --config configs/det/dbnet/db++_r50_icdar15.yaml
python tools/train.py --config configs/rec/crnn/crnn_icdar15.yaml
```

Similarly, it is easy to evaluate a trained model with the `tools/eval.py` script, as shown below:
You can evaluate a trained model with the `tools/eval.py` script, as shown below:
```shell
python tools/eval.py \
--config {path/to/model_config.yaml} \
--opt eval.dataset_root={path/to/your_dataset} eval.ckpt_load_path={path/to/ckpt_file}
```

For more usage, please refer to the model training section in [Tutorials](#使用教程).
You can run model inference with the `tools/infer/text/predict_system.py` script, as shown below:
```shell
python tools/infer/text/predict_system.py --image_dir {path_to_img or dir_to_imgs} \
--det_algorithm DB++ \
--rec_algorithm CRNN
```
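
When the inference command above needs to be launched from a script, it can help to build it as an argument list rather than a single shell string. The following is a minimal sketch under that assumption; `build_predict_command` is a hypothetical helper, and only the script path and the three flags are taken from the command above.

```python
import shlex


def build_predict_command(image_dir, det_algorithm="DB++", rec_algorithm="CRNN"):
    """Return the argument list for the predict_system.py command shown above."""
    return [
        "python",
        "tools/infer/text/predict_system.py",
        "--image_dir", image_dir,
        "--det_algorithm", det_algorithm,
        "--rec_algorithm", rec_algorithm,
    ]


if __name__ == "__main__":
    cmd = build_predict_command("path/to/images")
    # Print a copy-pasteable command line. To actually run it, pass `cmd` to
    # subprocess.run(cmd, check=True) from the repository root.
    print(shlex.join(cmd))
```

Keeping the arguments as a list means paths containing spaces do not need manual quoting.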

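For more usage, please refer to the model training and inference sections in [Tutorials](#使用教程).

### 3. Model Inference - Quick Guideline
### 3. Model Offline Inference - Quick Guideline

You can run MindSpore Lite inference in MindOCR using **MindOCR models** or **third-party models** (PaddleOCR, MMOCR, etc.).
Please refer to [MindOCR Models Inference - Quick Start](docs/cn/inference/inference_quickstart.md) or [Third-party Models Inference - Quick Start](docs/cn/inference/inference_thirdparty_quickstart.md).
You can run MindSpore Lite inference in MindOCR using **MindOCR models** or **third-party models** (PaddleOCR, MMOCR, etc.). Please refer to the following documents:
- [Python/C++ Inference on Ascend 310](docs/cn/inference/inference_tutorial.md)
- [MindOCR Models Offline Inference](docs/cn/inference/inference_quickstart.md)
- [Third-party Models Offline Inference](docs/cn/inference/inference_thirdparty_quickstart.md)

## Tutorials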

@@ -137,9 +147,12 @@ python tools/eval.py \
- [Text Recognition](docs/cn/tutorials/training_recognition_custom_dataset.md)
- [Distributed Training](docs/cn/tutorials/distribute_train.md)
- [Advanced: Gradient Accumulation, EMA, Resume Training, etc.](docs/cn/tutorials/advanced_train.md)
- Inference and Deployment
- [Python/C++ Inference on Ascend 310](docs/cn/inference/inference_tutorial.md)
- Online Inference with MindSpore
- [Python Online Inference](tools/infer/text/README.md)
- Offline Inference with MindSpore Lite
- [Python/C++ Inference on Ascend 310](docs/cn/inference/inference_tutorial.md)
- [MindOCR Models Offline Inference](docs/cn/inference/inference_quickstart.md)
- [Third-party Models Offline Inference](docs/cn/inference/inference_thirdparty_quickstart.md)
- Developer Guides
- [Customize Dataset](mindocr/data/README.md)
- [Customize Data Transformation](mindocr/data/transforms/README.md)
@@ -170,6 +183,12 @@ python tools/eval.py \
- [x] [ABINet](configs/rec/abinet/README_CN.md) (CVPR'2021)
</details>

<details open markdown>
<summary>Key Information Extraction</summary>

- [x] [LayoutXLM](configs/kie/vi_layoutxlm/README_CN.md)
</details>

For the training details and results of the models above, please refer to the readme in each model subdirectory under [configs](./configs).

For the support list of [MindSpore Lite](https://www.mindspore.cn/lite) and [ACL](https://www.hiascend.com/document/detail/zh/canncommercial/63RC1/inferapplicationdev/aclcppdevg/aclcppdevg_000004.html) model inference,
@@ -209,24 +228,43 @@ MindOCR提供了[数据格式转换工具](tools/dataset_converters) ,以支

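We will train and validate models on more datasets. This list will be continuously updated.

## Frequently Asked Questions
For frequently asked questions about environment setup and using mindocr, please refer to the [FAQ](docs/cn/tutorials/frequently_asked_questions.md).

## Notes

### Changelog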
<details close markdown>
<summary>Details</summary>

- 2023/12/05
1. Add new models
- Document layout recognition [YOLOv8 nano]()
- Key information extraction [VI-LayoutXLM](configs/kie/vi_layoutxlm/README_CN.md) online training and inference
- [PP-OCRv3](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/doc/doc_ch/PP-OCRv3_introduction.md) third-party model training and inference
- Text detection [PP-OCRv3 DBNet](deploy/py_infer/src/configs/det/ppocr/ch_PP-OCRv3_det_cml.yaml)
- Text recognition [PP-OCRv3 SVTR](deploy/py_infer/src/configs/rec/ppocr/ch_PP-OCRv3_rec_distillation.yml)
2. Offline inference
- Document layout recognition [YOLOv8 nano]() inference on Ascend 310
- [PP-OCRv4](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/doc/doc_ch/PP-OCRv4_introduction.md) third-party model inference on Ascend 310
- Text detection [PP-OCRv4 DBNet](deploy/py_infer/src/configs/det/ppocr/ch_PP-OCRv4_det_cml.yaml)
- Text recognition [PP-OCRv4 CRNN](deploy/py_infer/src/configs/rec/ppocr/ch_PP-OCRv4_rec_distillation.yaml)

- 2023/07/06
1. Add new models
- Text recognition[RobustScanner](configs/rec/robustscanner)
- Text recognition [RobustScanner](configs/rec/robustscanner)
- 2023/07/05
1. Add new models
- Text recognition[VISIONLAN](configs/rec/visionlan)
- Text recognition [VISIONLAN](configs/rec/visionlan)
- 2023/06/29
1. Add 2 new SoTA models
- Text detection[FCENet](configs/det/fcenet)
- Text recognition[MASTER](configs/rec/master)
- Text detection [FCENet](configs/det/fcenet)
- Text recognition [MASTER](configs/rec/master)
- 2023/06/07
1. Add new models
- Text detection[PSENet](configs/det/psenet)
- Text detection[EAST](configs/det/east)
- Text recognition[SVTR](configs/rec/svtr)
- Text detection [PSENet](configs/det/psenet)
- Text detection [EAST](configs/det/east)
- Text recognition [SVTR](configs/rec/svtr)
2. Add more benchmark datasets and their results
- [totaltext](docs/cn/datasets/totaltext.md)
- [mlt2017](docs/cn/datasets/mlt2017.md)
@@ -237,8 +275,8 @@ MindOCR提供了[数据格式转换工具](tools/dataset_converters) ,以支

- 2023/05/15
1. Add new models
- Text detection[DBNet++](configs/det/dbnet)
- Text recognition[CRNN-Seq2Seq](configs/rec/rare)
- Text detection [DBNet++](configs/det/dbnet)
- Text recognition [CRNN-Seq2Seq](configs/rec/rare)
- [DBNet](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50_synthtext-40655acb.ckpt) pretrained on the SynthText dataset
2. Add more benchmark datasets and their results
- [SynthText](docs/cn/datasets/synthtext.md), [MSRA-TD500](docs/cn/datasets/td500.md), [CTW1500](docs/cn/datasets/ctw1500.md)
@@ -276,6 +314,7 @@ MindOCR提供了[数据格式转换工具](tools/dataset_converters) ,以支
iv) Add the run parameter `enable_modelarts` in the web UI and set it to True;
v) Fill in the other fields and launch the training task.
```
</details>

### How to Contribute

1 change: 1 addition & 0 deletions configs/det/dbnet/README.md
@@ -91,6 +91,7 @@ DBNet and DBNet++ were trained on the ICDAR2015, MSRA-TD500, SCUT-CTW1500, Total
| DBNet | D910x1-MS2.0-G | ResNet-50 | ImageNet | 83.53% | 86.62% | 85.05% | 13.3 s/epoch | 75.2 img/s | [yaml](db_r50_icdar15.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50-c3a4aa24.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50-c3a4aa24-fbf95c82.mindir) |
| | | | | | | | | | | |
| DBNet++ | D910x1-MS2.0-G | ResNet-50 | SynthText | 85.70% | 87.81% | 86.74% | 17.7 s/epoch | 56 img/s | [yaml](db++_r50_icdar15.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50-068166c2.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50-068166c2-9934aff0.mindir) |
| DBNet++ | D910x1-MS2.2-G | ResNet-50 | SynthText | 86.81% | 86.85% | 86.86% | 12.7 s/epoch | 78.2 img/s | [yaml](db++_r50_icdar15_910.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50_910-35dc71f2.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50_910-35dc71f2-e61a9c37.mindir) |
</div>

> The input_shape for exported DBNet MindIR and DBNet++ MindIR in the links are `(1,3,736,1280)` and `(1,3,1152,2048)`, respectively.
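
The column headers of the table above are not shown in this hunk. Assuming the three percentage columns are recall, precision, and F-score in that order, each row can be sanity-checked by recomputing the F-score as the harmonic mean of the other two. The sketch below does that; the function name `f_score` and the row labels are illustrative only.

```python
def f_score(recall, precision):
    """Harmonic mean of recall and precision (both given in percent)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (label, recall %, precision %, listed F-score %) copied from the rows above;
# the column order is an assumption, since the header row is not in this hunk.
rows = [
    ("DBNet, MS2.0", 83.53, 86.62, 85.05),
    ("DBNet++, MS2.0", 85.70, 87.81, 86.74),
    ("DBNet++, MS2.2", 86.81, 86.85, 86.86),
]

for label, recall, precision, listed in rows:
    computed = f_score(recall, precision)
    print(f"{label}: computed {computed:.2f}%, listed {listed:.2f}%")
```

Under that assumption the first two rows reproduce their listed values to two decimals, while the newly added row computes to about 86.83% against a listed 86.86%. A harmonic mean cannot exceed both of its inputs, so that row may be worth double-checking against the training log.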
2 changes: 1 addition & 1 deletion configs/det/dbnet/README_CN.md
@@ -73,7 +73,7 @@ DBNet和DBNet++在ICDAR2015,MSRA-TD500,SCUT-CTW1500,Total-Text和MLT2017
| DBNet | D910x1-MS2.0-G | ResNet-50 | ImageNet | 83.53% | 86.62% | 85.05% | 13.3 s/epoch | 75.2 img/s | [yaml](db_r50_icdar15.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50-c3a4aa24.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50-c3a4aa24-fbf95c82.mindir) |
| | | | | | | | | | | |
| DBNet++ | D910x1-MS2.0-G | ResNet-50 | SynthText | 85.70% | 87.81% | 86.74% | 17.7 s/epoch | 56 img/s | [yaml](db++_r50_icdar15.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50-068166c2.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50-068166c2-9934aff0.mindir) |

| DBNet++ | D910x1-MS2.2-G | ResNet-50 | SynthText | 86.81% | 86.85% | 86.86% | 12.7 s/epoch | 78.2 img/s | [yaml](db++_r50_icdar15_910.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50_910-35dc71f2.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50_910-35dc71f2-e61a9c37.mindir) |
</div>

> The input shape used when exporting the DBNet MindIR in the links is `(1,3,736,1280)`, and the input shape used when exporting the DBNet++ MindIR is `(1,3,1152,2048)`.