partial refactoring
horcham committed Dec 20, 2023
2 parents 2e35846 + 5842c7f commit 59facc7
Showing 49 changed files with 4,318 additions and 1,767 deletions.
90 changes: 75 additions & 15 deletions README.md
@@ -17,6 +17,7 @@ English | [中文](README_CN.md)
[📚Tutorials](#tutorials) |
[🎁Model List](#model-list) |
[📰Dataset List](#dataset-list) |
[📖Frequently Asked Questions](#frequently-asked-questions) |
[🎉Notes](#notes)

</div>
@@ -36,17 +37,18 @@ MindOCR is an open-source toolbox for OCR development and application based on [

## Installation

<details close markdown>
<details open markdown>
<summary> Details </summary>

#### Prerequisites

MindOCR is built on MindSpore AI framework, which supports CPU/GPU/NPU devices.
MindOCR is compatible with the following framework versions. For details and installation guidelines, please refer to the installation links shown below.

- mindspore >= 1.9 (ABINet requires mindspore >= 2.0) [[install](https://www.mindspore.cn/install)]
- mindspore >= 2.2.0 [[install](https://www.mindspore.cn/install)]
- python >= 3.7
- openmpi 4.0.3 (for distributed training/evaluation) [[install](https://www.open-mpi.org/software/ompi/v4.0/)]
- mindspore lite (for inference) [[install](docs/en/inference/environment.md)]
- mindspore lite (for offline inference) >= 2.2.0 [[install](docs/en/inference/environment.md)]
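
The version requirements listed above can be checked before training with a short script. The following is a minimal sketch, not part of MindOCR: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, it assumes the framework is installed under the distribution name `mindspore`, and the script itself needs Python 3.8 or later because it uses `importlib.metadata`.

```python
import sys
from importlib import metadata  # standard library since Python 3.8


def parse_version(text, width=3):
    """Turn '2.2.10' or '2.2.0rc1' into a numeric tuple padded to `width` parts."""
    numbers = []
    for piece in text.split(".")[:width]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        numbers.append(int(digits))
    return tuple(numbers + [0] * (width - len(numbers)))


def meets_minimum(installed, minimum):
    """Return True when `installed` is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment():
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append("python >= 3.7 is required")
    try:
        # Assumption: the package is installed under the name 'mindspore'.
        installed = metadata.version("mindspore")
    except metadata.PackageNotFoundError:
        problems.append("mindspore is not installed")
    else:
        if not meets_minimum(installed, "2.2.0"):
            problems.append(f"mindspore >= 2.2.0 is required, found {installed}")
    return problems


for problem in check_environment():
    print("environment issue:", problem)
```

The check only covers Python and MindSpore; OpenMPI and MindSpore Lite are installed outside of pip in many setups, so they are better verified by following the linked installation guides.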


#### Dependency
@@ -126,10 +128,12 @@ python tools/eval.py \

For more illustration and usage, please refer to the model training section in [Tutorials](#tutorials).

### 3. Model Inference - Quick Guideline
### 3. Model Offline Inference - Quick Guideline

You can do MindSpore Lite inference in MindOCR using **MindOCR models** or **Third-party models** (PaddleOCR, MMOCR, etc.).
Please refer to [MindOCR Models Inference - Quick Start](docs/en/inference/inference_quickstart.md) or [Third-party Models Inference - Quick Start](docs/en/inference/inference_thirdparty_quickstart.md).
You can run MindSpore Lite inference in MindOCR using **MindOCR models** or **Third-party models** (PaddleOCR, MMOCR, etc.). Please refer to the following documents:
- [Python/C++ Inference on Ascend 310](docs/en/inference/inference_tutorial.md)
- [MindOCR Models Offline Inference - Quick Start](docs/en/inference/inference_quickstart.md)
- [Third-party Models Offline Inference - Quick Start](docs/en/inference/inference_thirdparty_quickstart.md)

## Tutorials

@@ -142,9 +146,12 @@ Please refer to [MindOCR Models Inference - Quick Start](docs/en/inference/infer
- [Text Recognition](docs/en/tutorials/training_recognition_custom_dataset.md)
- [Distributed Training](docs/en/tutorials/distribute_train.md)
- [Advanced: Gradient Accumulation, EMA, Resume Training, etc.](docs/en/tutorials/advanced_train.md)
- Inference and Deployment
- [Python/C++ Inference on Ascend 310](docs/en/inference/inference_tutorial.md)
- Inference with MindSpore
- [Python Online Inference](tools/infer/text/README.md)
- Inference with MindSpore Lite
- [Python/C++ Inference on Ascend 310](docs/en/inference/inference_tutorial.md)
- [MindOCR Models Offline Inference - Quick Start](docs/en/inference/inference_quickstart.md)
- [Third-party Models Offline Inference - Quick Start](docs/en/inference/inference_thirdparty_quickstart.md).
- Developer Guides
- [Customize Dataset](mindocr/data/README.md)
- [Customize Data Transformation](mindocr/data/transforms/README.md)
@@ -177,6 +184,20 @@ Please refer to [MindOCR Models Inference - Quick Start](docs/en/inference/infer

</details>

<details open markdown>
<summary>Layout Analysis</summary>

- [x] [YOLOv8](configs/layout/yolov8/README.md) ([Ultralytics Inc.](https://github.com/ultralytics/ultralytics))

</details>

<details open markdown>
<summary>Key Information Extraction</summary>

- [x] [LayoutXLM SER](configs/kie/vi_layoutxlm/README_CN.md) (arXiv'2021)

</details>

For the detailed performance of the trained models, please refer to [configs](./configs).

For details of MindSpore Lite and ACL inference models support, please refer to [MindOCR Models Support List](docs/en/inference/inference_quickstart.md) and [Third-party Models Support List](docs/en/inference/inference_thirdparty_quickstart.md) (PaddleOCR, MMOCR, etc.).
@@ -212,11 +233,49 @@ MindOCR provides a [dataset conversion tool](tools/dataset_converters) to OCR da

</details>

<details close markdown>
<summary>Layout Analysis Datasets</summary>

- [PublayNet](https://github.com/ibm-aur-nlp/PubLayNet) [[paper](https://arxiv.org/abs/1908.07836)] [[download](https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/publaynet.tar.gz)]

</details>

<details close markdown>
<summary>Key Information Extraction Datasets</summary>

- [XFUND](https://github.com/doc-analysis/XFUND) [[paper](https://aclanthology.org/2022.findings-acl.253/)] [[download](https://github.com/doc-analysis/XFUND/releases/tag/v1.0)]

</details>

We will include more datasets for training and evaluation. This list will be continuously updated.

## Frequently Asked Questions
For frequently asked questions about configuring the environment and using MindOCR, please refer to the [FAQ](docs/en/tutorials/frequently_asked_questions.md).

## Notes

### What is New

<details close markdown>
<summary>News</summary>

- 2023/12/14
1. Add new trained models
- [LayoutXLM SER](configs/kie/vi_layoutxlm) for key information extraction
- [VI-LayoutXLM SER](configs/kie/layoutlm_series) for key information extraction
- [PP-OCRv3 DBNet](configs/det/dbnet/db_mobilenetv3_ppocrv3.yaml) for text detection and [PP-OCRv3 SVTR](configs/rec/svtr/svtr_ppocrv3_ch.yaml) for text recognition, supporting online inference and fine-tuning
2. Add more benchmark datasets and their results
- [XFUND](configs/kie/vi_layoutxlm/README_CN.md)
3. Multiple specifications support for Ascend 910: DBNet ResNet-50, DBNet++ ResNet-50, CRNN VGG7, SVTR-Tiny, FCENet, ABINet
- 2023/11/28
1. Add offline inference support for PP-OCRv4
- [PP-OCRv4 DBNet](deploy/py_infer/src/configs/det/ppocr/ch_PP-OCRv4_det_cml.yaml) for text detection and [PP-OCRv4 CRNN](deploy/py_infer/src/configs/rec/ppocr/ch_PP-OCRv4_rec_distillation.yaml) for text recognition, supporting offline inference
2. Fix bugs of third-party models offline inference
- 2023/11/17
1. Add new trained models
- [YOLOv8](configs/layout/yolov8) for layout analysis
2. Add more benchmark datasets and their results
- [PublayNet](configs/layout/yolov8/README_CN.md)
- 2023/07/06
1. Add new trained models
- [RobustScanner](configs/rec/robustscanner) for text recognition
@@ -275,13 +334,14 @@ which can be enabled by add "shape_list" to the `eval.dataset.output_columns` li
- 2023/03/13
1. Add system test and CI workflow.
2. Add modelarts adapter to allow training on OpenI platform. To train on OpenI:
```text
i) Create a new training task on the openi cloud platform.
ii) Link the dataset (e.g., ic15_mindocr) on the webpage.
iii) Add run parameter `config` and write the yaml file path on the website UI interface, e.g., '/home/work/user-job-dir/V0001/configs/rec/test.yaml'
iv) Add run parameter `enable_modelarts` and set True on the website UI interface.
v) Fill in other blanks and launch.
```
```text
i) Create a new training task on the openi cloud platform.
ii) Link the dataset (e.g., ic15_mindocr) on the webpage.
iii) Add run parameter `config` and write the yaml file path on the website UI interface, e.g., '/home/work/user-job-dir/V0001/configs/rec/test.yaml'
iv) Add run parameter `enable_modelarts` and set True on the website UI interface.
v) Fill in other blanks and launch.
```
</details>
### How to Contribute
111 changes: 88 additions & 23 deletions README_CN.md
@@ -17,6 +17,7 @@
[📚Tutorials](#使用教程) |
[🎁Model List](#模型列表) |
[📰Dataset List](#数据集列表) |
[📖Frequently Asked Questions](#常见问题) |
[🎉Changelog](#更新日志)

</div>
Expand All @@ -36,16 +37,16 @@ MindOCR是一个基于[MindSpore](https://www.mindspore.cn/en) 框架开发的OC

## Installation

<details close markdown>
<details open markdown>

#### MindSpore Environment Prerequisites

MindOCR is built on the MindSpore AI framework (which supports CPU/GPU/NPU devices) and is compatible with the following framework versions. For installation instructions, please refer to the installation links below.

- mindspore >= 1.9 (ABINet requires mindspore >= 2.0) [[install](https://www.mindspore.cn/install)]
- mindspore >= 2.2.0 [[install](https://www.mindspore.cn/install)]
- python >= 3.7
- openmpi 4.0.3 (for distributed training/evaluation) [[install](https://www.open-mpi.org/software/ompi/v4.0/)]
- mindspore lite (for inference) [[install](docs/cn/inference/environment.md)]
- openmpi 4.0.3 (for distributed training and evaluation) [[install](https://www.open-mpi.org/software/ompi/v4.0/)]
- mindspore lite (for offline inference) >= 2.2.0 [[install](docs/cn/inference/environment.md)]

#### Package Dependencies

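@@ -93,9 +94,9 @@ python tools/infer/text/predict_system.py --image_dir {path_to_img or dir_to_img

As the result shows, all text blocks in the image are detected and correctly recognized. For more detailed usage, please refer to the inference section in [Tutorials](#使用教程).

### 2. Model Training and Evaluation - Quick Guideline
### 2. Model Training, Evaluation and Inference - Quick Guideline

It is easy to train your OCR model with the `tools/train.py` script, which supports training both text detection and recognition models.
You can train your OCR model with the `tools/train.py` script, which supports training both text detection and recognition models.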
```shell
python tools/train.py --config {path/to/model_config.yaml}
```
@@ -112,19 +113,28 @@ python tools/train.py --config configs/det/dbnet/db++_r50_icdar15.yaml
python tools/train.py --config configs/rec/crnn/crnn_icdar15.yaml
```

Similarly, it is easy to evaluate a trained model with the `tools/eval.py` script, as shown below:
You can evaluate a trained model with the `tools/eval.py` script, as shown below:
```shell
python tools/eval.py \
--config {path/to/model_config.yaml} \
--opt eval.dataset_root={path/to/your_dataset} eval.ckpt_load_path={path/to/ckpt_file}
```
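
The `--opt` arguments in the command above are dotted `key=value` pairs that override entries of the YAML config. The following is a minimal sketch of how such overrides could be merged into a nested dictionary. It is an illustration only, not MindOCR's actual implementation: the function name `apply_overrides` is hypothetical, the paths are placeholders, and values are kept as plain strings.

```python
def apply_overrides(config, overrides):
    """Merge dotted 'a.b=value' overrides into a nested dict, in place."""
    for item in overrides:
        dotted_key, separator, value = item.partition("=")
        if not separator:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Create intermediate sections that the config does not have yet.
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


# Hypothetical config contents and paths, for illustration only.
config = {"eval": {"dataset_root": "", "ckpt_load_path": ""}}
apply_overrides(
    config,
    ["eval.dataset_root=/data/ic15", "eval.ckpt_load_path=/ckpt/best.ckpt"],
)
print(config)
```

Keeping the overrides as strings avoids guessing types; a real tool would usually parse each value (for example as YAML) before storing it.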

For more usage, please refer to the model training section in [Tutorials](#使用教程).
You can run online inference with the `tools/infer/text/predict_system.py` script, as shown below:
```shell
python tools/infer/text/predict_system.py --image_dir {path_to_img or dir_to_imgs} \
--det_algorithm DB++ \
--rec_algorithm CRNN
```
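
When the same inference has to be launched from another Python program, the command above can be assembled as an argument list. The following is a minimal sketch: the function name `build_predict_command` and the image directory are hypothetical, only the three flags shown in the command above are used, and the script is assumed to be run from the repository root.

```python
import shlex
import sys


def build_predict_command(image_dir, det_algorithm="DB++", rec_algorithm="CRNN"):
    """Return the argument list for the online inference script."""
    return [
        sys.executable,
        "tools/infer/text/predict_system.py",
        "--image_dir", image_dir,
        "--det_algorithm", det_algorithm,
        "--rec_algorithm", rec_algorithm,
    ]


command = build_predict_command("path/to/images")
# Print the command in a form that can be pasted into a shell.
print(shlex.join(command))
# To actually launch it from the repository root:
# import subprocess; subprocess.run(command, check=True)
```

Passing a list rather than a single string avoids shell quoting problems when the image path contains spaces.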

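For more usage, please refer to the model training and inference sections in [Tutorials](#使用教程).

### 3. Model Inference - Quick Guideline
### 3. Model Offline Inference - Quick Guideline

You can run MindSpore Lite inference in MindOCR using **MindOCR in-house models** or **third-party models** (PaddleOCR, MMOCR, etc.).
Please refer to [MindOCR In-house Models Inference - Quick Start](docs/cn/inference/inference_quickstart.md) or [Third-party Models Inference - Quick Start](docs/cn/inference/inference_thirdparty_quickstart.md).
You can run MindSpore Lite inference in MindOCR using **MindOCR native models** or **third-party models** (PaddleOCR, MMOCR, etc.). Please refer to the following documents:
- [OCR Inference with Python/C++ on Ascend 310](docs/cn/inference/inference_tutorial.md)
- [MindOCR Native Models Offline Inference - Quick Start](docs/cn/inference/inference_quickstart.md)
- [Third-party Models Offline Inference - Quick Start](docs/cn/inference/inference_thirdparty_quickstart.md)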

## Tutorials

@@ -137,9 +147,12 @@ python tools/eval.py \
- [Text Recognition](docs/cn/tutorials/training_recognition_custom_dataset.md)
- [Distributed Training](docs/cn/tutorials/distribute_train.md)
- [Advanced: Gradient Accumulation, EMA, Resume Training, etc.](docs/cn/tutorials/advanced_train.md)
- Inference and Deployment
- [OCR Inference with Python/C++ on Ascend 310](docs/cn/inference/inference_tutorial.md)
- Online Inference with MindSpore
- [Python-based OCR Online Inference](tools/infer/text/README.md)
- Offline Inference with MindSpore Lite
- [OCR Inference with Python/C++ on Ascend 310](docs/cn/inference/inference_tutorial.md)
- [MindOCR Native Models Offline Inference - Quick Start](docs/cn/inference/inference_quickstart.md)
- [Third-party Models Offline Inference - Quick Start](docs/cn/inference/inference_thirdparty_quickstart.md)
- Developer Guides
- [How to Customize a Dataset](mindocr/data/README.md)
- [How to Customize Data Augmentation Methods](mindocr/data/transforms/README.md)
@@ -170,10 +183,24 @@ python tools/eval.py \
- [x] [ABINet](configs/rec/abinet/README_CN.md) (CVPR'2021)
</details>

<details open markdown>
<summary>Layout Analysis</summary>

- [x] [YOLOv8](configs/layout/yolov8/README_CN.md) ([Ultralytics Inc.](https://github.com/ultralytics/ultralytics))
</details>

<details open markdown>
<summary>Key Information Extraction</summary>

- [x] [LayoutXLM SER](configs/kie/vi_layoutxlm/README_CN.md) (arXiv'2021)

</details>


For the detailed training methods and results of the models above, please refer to the README in each model subdirectory under [configs](./configs).

For the list of models supported by [MindSpore Lite](https://www.mindspore.cn/lite) and [ACL](https://www.hiascend.com/document/detail/zh/canncommercial/63RC1/inferapplicationdev/aclcppdevg/aclcppdevg_000004.html) inference,
please refer to [MindOCR In-house Models Inference Support List](docs/cn/inference/inference_quickstart.md) and [Third-party Models Inference Support List](docs/cn/inference/inference_thirdparty_quickstart.md) (PaddleOCR, MMOCR, etc.).
please refer to [MindOCR Native Models Inference Support List](docs/cn/inference/inference_quickstart.md) and [Third-party Models Inference Support List](docs/cn/inference/inference_thirdparty_quickstart.md) (PaddleOCR, MMOCR, etc.).

## Dataset List

@@ -207,26 +234,63 @@ MindOCR提供了[数据格式转换工具](tools/dataset_converters) ,以支

</details>

<details close markdown>
<summary>Layout Analysis Datasets</summary>

- [PublayNet](https://github.com/ibm-aur-nlp/PubLayNet) [[paper](https://arxiv.org/abs/1908.07836)] [[download](https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/publaynet.tar.gz)]

</details>

<details close markdown>
<summary>Key Information Extraction Datasets</summary>

- [XFUND](https://github.com/doc-analysis/XFUND) [[paper](https://aclanthology.org/2022.findings-acl.253/)] [[download](https://github.com/doc-analysis/XFUND/releases/tag/v1.0)]

</details>

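We will train and validate models on more datasets. This list will be continuously updated.

## Frequently Asked Questions
For frequently encountered problems when configuring the environment or using MindOCR, please refer to the [FAQ](docs/cn/tutorials/frequently_asked_questions.md).

## Notes

### Changelog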
<details close markdown>
<summary>Details</summary>

- 2023/12/14
1. Add new models
- [LayoutXLM SER](configs/kie/vi_layoutxlm) for key information extraction
- [VI-LayoutXLM SER](configs/kie/layoutlm_series) for key information extraction
- [PP-OCRv3 DBNet](configs/det/dbnet/db_mobilenetv3_ppocrv3.yaml) for text detection and [PP-OCRv3 SVTR](configs/rec/svtr/svtr_ppocrv3_ch.yaml) for text recognition, supporting online inference and fine-tuning
2. Add more benchmark datasets and their results
- [XFUND](configs/kie/vi_layoutxlm/README_CN.md)
3. Support multiple hardware specifications of Ascend 910: DBNet ResNet-50, DBNet++ ResNet-50, CRNN VGG7, SVTR-Tiny, FCENet, ABINet
- 2023/11/28
1. Add offline inference support for PP-OCRv4 models
- [PP-OCRv4 DBNet](deploy/py_infer/src/configs/det/ppocr/ch_PP-OCRv4_det_cml.yaml) for text detection and [PP-OCRv4 CRNN](deploy/py_infer/src/configs/rec/ppocr/ch_PP-OCRv4_rec_distillation.yaml) for text recognition, supporting offline inference
2. Fix bugs in third-party model offline inference
- 2023/11/17
1. Add new models
- [YOLOv8](configs/layout/yolov8) for layout analysis
2. Add more benchmark datasets and their results
- [PublayNet](configs/layout/yolov8/README_CN.md)
- 2023/07/06
1. Add new models
- Text recognition[RobustScanner](configs/rec/robustscanner)
- Text recognition [RobustScanner](configs/rec/robustscanner)
- 2023/07/05
1. Add new models
- Text recognition[VISIONLAN](configs/rec/visionlan)
- Text recognition [VISIONLAN](configs/rec/visionlan)
- 2023/06/29
1. Add 2 new SoTA models
- Text detection[FCENet](configs/det/fcenet)
- Text recognition[MASTER](configs/rec/master)
- Text detection [FCENet](configs/det/fcenet)
- Text recognition [MASTER](configs/rec/master)
- 2023/06/07
1. Add new models
- Text detection[PSENet](configs/det/psenet)
- Text detection[EAST](configs/det/east)
- Text recognition[SVTR](configs/rec/svtr)
- Text detection [PSENet](configs/det/psenet)
- Text detection [EAST](configs/det/east)
- Text recognition [SVTR](configs/rec/svtr)
2. Add more benchmark datasets and their results
- [totaltext](docs/cn/datasets/totaltext.md)
- [mlt2017](docs/cn/datasets/mlt2017.md)
@@ -237,8 +301,8 @@ MindOCR提供了[数据格式转换工具](tools/dataset_converters) ,以支

- 2023/05/15
1. Add new models
- Text detection[DBNet++](configs/det/dbnet)
- Text recognition[CRNN-Seq2Seq](configs/rec/rare)
- Text detection [DBNet++](configs/det/dbnet)
- Text recognition [CRNN-Seq2Seq](configs/rec/rare)
- [DBNet](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50_synthtext-40655acb.ckpt) pretrained on the SynthText dataset
2. Add more benchmark datasets and their results
- [SynthText](docs/cn/datasets/synthtext.md), [MSRA-TD500](docs/cn/datasets/td500.md), [CTW1500](docs/cn/datasets/ctw1500.md)
@@ -276,6 +340,7 @@ MindOCR提供了[数据格式转换工具](tools/dataset_converters) ,以支
iv) Add the run parameter `enable_modelarts` in the web UI and set it to True.
v) Fill in the other fields and launch the training task.
```
</details>

### How to Contribute
