From 85a339470751edd1e6f186277202d6954412cd9a Mon Sep 17 00:00:00 2001 From: katekong Date: Mon, 12 Jun 2023 17:53:58 +0800 Subject: [PATCH 01/12] update readme --- configs/rec/crnn/README.md | 72 ++++++++++++++++++++++++++++---------- 1 file changed, 54 insertions(+), 18 deletions(-) diff --git a/configs/rec/crnn/README.md b/configs/rec/crnn/README.md index 20d560f60..25e49f09f 100644 --- a/configs/rec/crnn/README.md +++ b/configs/rec/crnn/README.md @@ -39,33 +39,22 @@ According to our experiments, the evaluation results on public benchmark dataset
-| **Model** | **Context** | **Backbone** | **Avg Accuracy** | **Train T.** | **Recipe** | **Download** | -| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | -| CRNN | D910x8-MS1.8-G | VGG7 | 82.03% | 2445 s/epoch | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_vgg7.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c-3a19e349.mindir) | -| CRNN | D910x8-MS1.8-G | ResNet34_vd | 84.45% | 2118 s/epoch | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07-2f016384.mindir) | +| **Model** | **Context** | **Backbone** | **Avg Accuracy** | **Train T.** | **FPS** | **Recipe** | **Download** | +| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | +| CRNN | D910x8-MS1.8-G | VGG7 | 82.03% | 2445 s/epoch | 5802.71 | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_vgg7.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c-573dbd61.mindir) | +| CRNN | D910x8-MS1.8-G | ResNet34_vd | 84.45% | 2118 s/epoch | 6694.84 | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07-eb10a0c9.mindir) |
-
+- Detailed accuracy results for each benchmark dataset:
- Detailed accuracy results for each benchmark dataset | **Model** | **Backbone** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **Average** | | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | | CRNN | VGG7 | 94.53% | 94.00% | 92.18% | 90.74% | 71.95% | 66.06% | 84.10% | 83.93% | 73.33% | 69.44% | 82.03% | | CRNN | ResNet34_vd | 94.42% | 94.23% | 93.35% | 92.02% | 75.92% | 70.15% | 87.73% | 86.40% | 76.28% | 73.96% | 84.45% |
-
-### Performance - -#### Training Perf. - -| Device | Model | Backbone | Dataset | Params | Batch size per card | Graph train 8P (s/epoch) | Graph train 8P (ms/step) | Graph train 8P (FPS) | -| :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | -| Ascend910| CRNN | VGG7 | MJ+ST | 8.72 M | 16 | 2488.82 | 22.06 | 5802.71 | -| Ascend910| CRNN | ResNet34_vd | MJ+ST | 24.48 M | 64 | 2157.18 | 76.48 | 6694.84 | - -#### Inference Perf. +### Inference Perf. | Device | Env | Model | Backbone | Params | Test Dataset | Batch size | Graph infer 1P (FPS) | | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | | Ascend310P | Lite2.0 | CRNN | ResNet34_vd | 24.48 M | IC15 | 1 | 361.09 | @@ -76,6 +65,7 @@ According to our experiments, the evaluation results on public benchmark dataset - To reproduce the result on other contexts, please ensure the global batch size is the same. - The characters supported by model are lowercase English characters from a to z and numbers from 0 to 9. More explanation on dictionary, please refer to [4. Character Dictionary](#4-character-dictionary). - The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to [Dataset Download & Dataset Usage](#312-dataset-download) section. +- The input shape for exported MindIR file in the download link is (32, 100). ## 3. 
Quick Start @@ -378,13 +368,59 @@ After training, evaluation results on the benchmark test set are as follows, whe | **Model** | **Language** | **Context** |**Backbone** | **Scene** | **Web** | **Document** | **Train T.** | **FPS** | **Recipe** | **Download** | | :-----: | :-----: | :--------: | :--------: | :--------: | :--------: | :--------: | :---------: | :--------: | :---------: | :-----------: | -| CRNN | Chinese | D910x4-MS1.10-G | ResNet34_vd | 60.45% | 65.95% | 97.68% | 647 s/epoch | 1180 | [crnn_resnet34_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_ch-7a342e3c.ckpt) \| [mindir]() | +| CRNN | Chinese | D910x4-MS1.10-G | ResNet34_vd | 60.45% | 65.95% | 97.68% | 647 s/epoch | 1180 | [crnn_resnet34_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_ch-7a342e3c.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_ch-7a342e3c-105bccb2.mindir) | +**Notes:** +- The input shape for exported MindIR file in the download link is (32, 320). + ### Training with Custom Datasets You can train models for different languages with your own custom datasets. Loading the pretrained Chinese model to finetune on your own dataset usually yields better results than training from scratch. Please refer to the tutorial [Training Recognition Network with Custom Datasets](../../../docs/en/tutorials/training_recognition_custom_dataset.md). +## 6. MindSpore Lite Inference + +To inference with MindSpot Lite on Ascend 310, please refer to the tutorial [MindOCR Inference](../../../docs/en/inference/inference_tutorial_en.md). In short, the whole process consists of the following steps: + +**1. 
Model Export** + +Please [download](#2-results) the exported MindIR file first, or refer to the [Model Export](../../README.md) tutorial and use the following command to export the trained ckpt model to MindIR file: + +```shell +python tools/export.py --model_name crnn --data_shape 32 100 --local_ckpt_path /path/to/local_ckpt.ckpt +# or +python tools/export.py --model_name configs/rec/crnn/crnn_resnet34.yaml --data_shape 32 100 --local_ckpt_path /path/to/local_ckpt.ckpt +``` + +The `data_shape` is the model input shape of height and width for MindIR file. The shape value of MindIR in the download link can be found in [Notes](#2-results) under results table. + + +**2. Environment Installation** + +Please refer to [Environment Installation](../../../docs/en/inference/environment_en.md#2-mindspore-lite-inference) tutorial to configure the MindSpore Lite inference environment. + +**3. Model Conversion** + +Please refer to [Model Conversion](../../../docs/en/inference/convert_tutorial_en.md#1-mindocr-models), +and use the `converter_lite` tool for offline conversion of the MindIR file, where the `input_shape` in `configFile` needs to be filled in with the value from MindIR export, +as mentioned above (32, 100), and the format is NCHW. + +**4. Inference** + +Assuming that you obtain output.mindir after model conversion, go to the `deploy/py_infer` directory, and use the following command for inference: + +```shell +python infer.py \ + --input_images_dir=/your_path_to/test_images \ + --device=Ascend \ + --device_id=0 \ + --det_model_path=your_path_to/output.mindir \ + --det_config_path=../../configs/rec/crnn/crnn_resnet34.yaml \ + --backend=lite \ + --res_save_dir=results_dir +``` + + ## References From 575e050b2591372d685be210184be4e78617d6bb Mon Sep 17 00:00:00 2001 From: katekong Date: Mon, 12 Jun 2023 17:57:01 +0800 Subject: [PATCH 02/12] Revert "update readme" This reverts commit 85a339470751edd1e6f186277202d6954412cd9a. 
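Note on the `norm_before_pad` option introduced in PATCH 03/12 of this series: the docstring there says that normalizing before padding leaves the padded region at zero. The following is a minimal NumPy-only sketch of that difference, assuming the default mean/std of 127.0 used in the patch. The helper name `pad_and_norm` is hypothetical and is not part of MindOCR; it only mirrors the ordering logic, not the resize step.

```python
import numpy as np


def pad_and_norm(img, target_w, mean, std, norm_before_pad):
    """Right-pad an HWC image with zeros to target_w and normalize it.

    Hypothetical helper for illustration only (not a MindOCR function).
    """
    h, w, c = img.shape
    img = img.astype(np.float32)
    if norm_before_pad:
        # Normalize first, so the zero padding added below stays at 0.
        img = (img - mean) / std
    padded = np.zeros((h, target_w, c), dtype=np.float32)
    padded[:, :w, :] = img
    if not norm_before_pad:
        # Normalize after padding, so the padding becomes (0 - mean) / std.
        padded = (padded - mean) / std
    return padded


mean = np.array([127.0, 127.0, 127.0], dtype=np.float32)
std = np.array([127.0, 127.0, 127.0], dtype=np.float32)
img = np.full((32, 60, 3), 254, dtype=np.uint8)  # dummy 32x60 RGB crop

before = pad_and_norm(img, 100, mean, std, norm_before_pad=True)
after = pad_and_norm(img, 100, mean, std, norm_before_pad=False)

print(before[0, 99], after[0, 99])  # padded pixel under each ordering
```

Under these assumptions the image region is identical in both cases and only the padded columns differ, which is why the two settings should not be mixed between training and inference.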
--- configs/rec/crnn/README.md | 72 ++++++++++---------------------------- 1 file changed, 18 insertions(+), 54 deletions(-) diff --git a/configs/rec/crnn/README.md b/configs/rec/crnn/README.md index 25e49f09f..20d560f60 100644 --- a/configs/rec/crnn/README.md +++ b/configs/rec/crnn/README.md @@ -39,22 +39,33 @@ According to our experiments, the evaluation results on public benchmark dataset
-| **Model** | **Context** | **Backbone** | **Avg Accuracy** | **Train T.** | **FPS** | **Recipe** | **Download** | -| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | -| CRNN | D910x8-MS1.8-G | VGG7 | 82.03% | 2445 s/epoch | 5802.71 | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_vgg7.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c-573dbd61.mindir) | -| CRNN | D910x8-MS1.8-G | ResNet34_vd | 84.45% | 2118 s/epoch | 6694.84 | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07-eb10a0c9.mindir) | +| **Model** | **Context** | **Backbone** | **Avg Accuracy** | **Train T.** | **Recipe** | **Download** | +| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | +| CRNN | D910x8-MS1.8-G | VGG7 | 82.03% | 2445 s/epoch | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_vgg7.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c-3a19e349.mindir) | +| CRNN | D910x8-MS1.8-G | ResNet34_vd | 84.45% | 2118 s/epoch | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07-2f016384.mindir) |
-- Detailed accuracy results for each benchmark dataset: +
+ Detailed accuracy results for each benchmark dataset | **Model** | **Backbone** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **Average** | | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | | CRNN | VGG7 | 94.53% | 94.00% | 92.18% | 90.74% | 71.95% | 66.06% | 84.10% | 83.93% | 73.33% | 69.44% | 82.03% | | CRNN | ResNet34_vd | 94.42% | 94.23% | 93.35% | 92.02% | 75.92% | 70.15% | 87.73% | 86.40% | 76.28% | 73.96% | 84.45% |
+
-### Inference Perf. +### Performance + +#### Training Perf. + +| Device | Model | Backbone | Dataset | Params | Batch size per card | Graph train 8P (s/epoch) | Graph train 8P (ms/step) | Graph train 8P (FPS) | +| :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | +| Ascend910| CRNN | VGG7 | MJ+ST | 8.72 M | 16 | 2488.82 | 22.06 | 5802.71 | +| Ascend910| CRNN | ResNet34_vd | MJ+ST | 24.48 M | 64 | 2157.18 | 76.48 | 6694.84 | + +#### Inference Perf. | Device | Env | Model | Backbone | Params | Test Dataset | Batch size | Graph infer 1P (FPS) | | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | | Ascend310P | Lite2.0 | CRNN | ResNet34_vd | 24.48 M | IC15 | 1 | 361.09 | @@ -65,7 +76,6 @@ According to our experiments, the evaluation results on public benchmark dataset - To reproduce the result on other contexts, please ensure the global batch size is the same. - The characters supported by model are lowercase English characters from a to z and numbers from 0 to 9. More explanation on dictionary, please refer to [4. Character Dictionary](#4-character-dictionary). - The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to [Dataset Download & Dataset Usage](#312-dataset-download) section. -- The input shape for exported MindIR file in the download link is (32, 100). ## 3. 
Quick Start @@ -368,59 +378,13 @@ After training, evaluation results on the benchmark test set are as follows, whe | **Model** | **Language** | **Context** |**Backbone** | **Scene** | **Web** | **Document** | **Train T.** | **FPS** | **Recipe** | **Download** | | :-----: | :-----: | :--------: | :--------: | :--------: | :--------: | :--------: | :---------: | :--------: | :---------: | :-----------: | -| CRNN | Chinese | D910x4-MS1.10-G | ResNet34_vd | 60.45% | 65.95% | 97.68% | 647 s/epoch | 1180 | [crnn_resnet34_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_ch-7a342e3c.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_ch-7a342e3c-105bccb2.mindir) | +| CRNN | Chinese | D910x4-MS1.10-G | ResNet34_vd | 60.45% | 65.95% | 97.68% | 647 s/epoch | 1180 | [crnn_resnet34_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_ch-7a342e3c.ckpt) \| [mindir]() | -**Notes:** -- The input shape for exported MindIR file in the download link is (32, 320). - ### Training with Custom Datasets You can train models for different languages with your own custom datasets. Loading the pretrained Chinese model to finetune on your own dataset usually yields better results than training from scratch. Please refer to the tutorial [Training Recognition Network with Custom Datasets](../../../docs/en/tutorials/training_recognition_custom_dataset.md). -## 6. MindSpore Lite Inference - -To inference with MindSpot Lite on Ascend 310, please refer to the tutorial [MindOCR Inference](../../../docs/en/inference/inference_tutorial_en.md). In short, the whole process consists of the following steps: - -**1. 
Model Export** - -Please [download](#2-results) the exported MindIR file first, or refer to the [Model Export](../../README.md) tutorial and use the following command to export the trained ckpt model to MindIR file: - -```shell -python tools/export.py --model_name crnn --data_shape 32 100 --local_ckpt_path /path/to/local_ckpt.ckpt -# or -python tools/export.py --model_name configs/rec/crnn/crnn_resnet34.yaml --data_shape 32 100 --local_ckpt_path /path/to/local_ckpt.ckpt -``` - -The `data_shape` is the model input shape of height and width for MindIR file. The shape value of MindIR in the download link can be found in [Notes](#2-results) under results table. - - -**2. Environment Installation** - -Please refer to [Environment Installation](../../../docs/en/inference/environment_en.md#2-mindspore-lite-inference) tutorial to configure the MindSpore Lite inference environment. - -**3. Model Conversion** - -Please refer to [Model Conversion](../../../docs/en/inference/convert_tutorial_en.md#1-mindocr-models), -and use the `converter_lite` tool for offline conversion of the MindIR file, where the `input_shape` in `configFile` needs to be filled in with the value from MindIR export, -as mentioned above (32, 100), and the format is NCHW. - -**4. 
Inference** - -Assuming that you obtain output.mindir after model conversion, go to the `deploy/py_infer` directory, and use the following command for inference: - -```shell -python infer.py \ - --input_images_dir=/your_path_to/test_images \ - --device=Ascend \ - --device_id=0 \ - --det_model_path=your_path_to/output.mindir \ - --det_config_path=../../configs/rec/crnn/crnn_resnet34.yaml \ - --backend=lite \ - --res_save_dir=results_dir -``` - - ## References From dfe53139417b7c5e6157a5426a637e6792c92fe8 Mon Sep 17 00:00:00 2001 From: katekong Date: Mon, 19 Jun 2023 10:04:30 +0800 Subject: [PATCH 03/12] update rec img transform --- mindocr/data/transforms/rec_transforms.py | 173 +++++++++++++--------- 1 file changed, 99 insertions(+), 74 deletions(-) diff --git a/mindocr/data/transforms/rec_transforms.py b/mindocr/data/transforms/rec_transforms.py index 0897844cd..d019e1b9b 100644 --- a/mindocr/data/transforms/rec_transforms.py +++ b/mindocr/data/transforms/rec_transforms.py @@ -11,9 +11,9 @@ "RecCTCLabelEncode", "RecAttnLabelEncode", "RecResizeImg", + "RecResizeNormImg", "RecResizeNormForInfer", "SVTRRecResizeImg", - "Rotate90IfVertical", "ClsLabelEncode", ] @@ -247,7 +247,13 @@ def str2idx(text: str, label_dict: Dict[str, int], max_text_len: int = 23, lower # TODO: reorganize the code for different resize transformation in rec task -def resize_norm_img(img, image_shape, padding=True, interpolation=cv2.INTER_LINEAR): +def resize_norm_img(img, + image_shape, + padding=True, + norm_before_pad=False, + mean=[127.0, 127.0, 127.0], + std=[127.0, 127.0, 127.0], + interpolation=cv2.INTER_LINEAR): """ resize image Args: @@ -261,7 +267,8 @@ def resize_norm_img(img, image_shape, padding=True, interpolation=cv2.INTER_LINE w = img.shape[1] c = img.shape[2] if not padding: - resized_image = cv2.resize(img, (imgW, imgH), interpolation=interpolation) + resized_image = cv2.resize( + img, (imgW, imgH), interpolation=interpolation) resized_w = imgW else: ratio = w / float(h) @@ 
-271,32 +278,45 @@ def resize_norm_img(img, image_shape, padding=True, interpolation=cv2.INTER_LINE resized_w = int(math.ceil(imgH * ratio)) resized_image = cv2.resize(img, (resized_w, imgH)) - """ - resized_image = resized_image.astype('float32') - if image_shape[0] == 1: - resized_image = resized_image / 255 - resized_image = resized_image[np.newaxis, :] - else: - resized_image = resized_image.transpose((2, 0, 1)) / 255 - resized_image -= 0.5 - resized_image /= 0.5 - """ - padding_im = np.zeros((imgH, imgW, c), dtype=np.uint8) - padding_im[:, 0:resized_w, :] = resized_image valid_ratio = min(1.0, float(resized_w / imgW)) - return padding_im, valid_ratio + + if padding: + if norm_before_pad: + resized_image = (resized_image - mean) / std + + padded_img = np.zeros((imgH, imgW, c), dtype=resized_image.dtype) + padded_img[:, 0:resized_w, :] = resized_image + + if not norm_before_pad: + padded_img = (padded_img - mean) / std + + return padded_img, valid_ratio + else: + resized_image = (resized_image - mean) / std + return resized_image, valid_ratio # TODO: check diff from resize_norm_img -def resize_norm_img_chinese(img, image_shape): - """adopted from paddle""" +def resize_norm_img_chinese(img, + image_shape, + norm_before_pad=False, + mean=[127.0, 127.0, 127.0], + std=[127.0, 127.0, 127.0], + interpolation=cv2.INTER_LINEAR): + ''' + resize image with aspect-ratio keeping and padding + Args: + img: shape (H, W, C) + image_shape: image shape after resize, in (C, H, W) + + ''' imgH, imgW = image_shape # todo: change to 0 and modified image shape max_wh_ratio = imgW * 1.0 / imgH h, w = img.shape[0], img.shape[1] c = img.shape[2] ratio = w * 1.0 / h - + max_wh_ratio = min(max(max_wh_ratio, ratio), max_wh_ratio) imgW = int(imgH * max_wh_ratio) if math.ceil(imgH * ratio) > imgW: resized_w = imgW @@ -304,48 +324,80 @@ def resize_norm_img_chinese(img, image_shape): resized_w = int(math.ceil(imgH * ratio)) resized_image = cv2.resize(img, (resized_w, imgH)) - """ - 
resized_image = resized_image.astype('float32') - if image_shape[0] == 1: - resized_image = resized_image / 255 - resized_image = resized_image[np.newaxis, :] - else: - resized_image = resized_image.transpose((2, 0, 1)) / 255 - resized_image -= 0.5 - resized_image /= 0.5 - """ - # padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32) - padding_im = np.zeros((imgH, imgW, c), dtype=np.uint8) - # padding_im[:, :, 0:resized_w] = resized_image - padding_im[:, 0:resized_w, :] = resized_image valid_ratio = min(1.0, float(resized_w / imgW)) - return padding_im, valid_ratio + if norm_before_pad: + resized_image = (resized_image - mean) / std -# TODO: remove infer_mode and character_dict_path if they are not necesary -class RecResizeImg(object): - """adopted from paddle - resize, convert from hwc to chw, rescale pixel value to -1 to 1 - """ + padded_img = np.zeros((imgH, imgW, c), dtype=resized_image.dtype) + padded_img[:, 0:resized_w, :] = resized_image - def __init__(self, image_shape, infer_mode=False, character_dict_path=None, padding=True, **kwargs): + if not norm_before_pad: + padded_img = (padded_img - mean) / std + + return padded_img, valid_ratio + + +class RecResizeNormImg(object): + ''' adopted from paddle + Resize and normalize image, and pad image if needed. + + Args: + norm_before_pad: If True, perform normalization before padding (by doing so, the padding values will be all zero. Good practice.). 
Otherwise, perform padding before normalization. Default: False + ''' + def __init__(self, + image_shape, + infer_mode=False, + character_dict_path=None, + padding=True, + norm_before_pad=False, + mean=[127.0, 127.0, 127.0], + std=[127.0, 127.0, 127.0], + **kwargs): self.image_shape = image_shape self.infer_mode = infer_mode self.character_dict_path = character_dict_path self.padding = padding + self.norm_before_pad = norm_before_pad + self.mean = np.array(mean, dtype="float32") + self.std = np.array(std, dtype="float32") def __call__(self, data): - img = data["image"] + img = data['image'] if self.infer_mode and self.character_dict_path is not None: - norm_img, valid_ratio = resize_norm_img_chinese(img, self.image_shape) + norm_img, valid_ratio = resize_norm_img_chinese(img, + self.image_shape, + self.norm_before_pad, + self.mean, + self.std + ) else: - norm_img, valid_ratio = resize_norm_img(img, self.image_shape, self.padding) - data["image"] = norm_img - data["valid_ratio"] = valid_ratio - # TODO: data['shape_list'] = ? + norm_img, valid_ratio = resize_norm_img(img, + self.image_shape, + self.padding, + self.norm_before_pad, + self.mean, + self.std, + ) + data['image'] = norm_img + data['valid_ratio'] = valid_ratio return data +# TODO: remove infer_mode and character_dict_path if they are not necessary +class RecResizeImg(RecResizeNormImg): + ''' + This is kept for compatibility with older code that uses RecResizeImg, which is to be updated. + + TODO: replace RecResizeImg followed by NormalizeImage in yaml files with RecResizeNormImg op. 
+ ''' + def __init__(self, image_shape, infer_mode=False, character_dict_path=None, padding=True, **kwargs): + super().__init__( + image_shape, infer_mode, character_dict_path, padding, norm_before_pad=False, + mean=[0., 0., 0.], std=[1., 1., 1.], + ) + + class SVTRRecResizeImg(object): def __init__(self, image_shape, padding=True, **kwargs): self.image_shape = image_shape @@ -425,9 +477,7 @@ def __call__(self, data): # TODO: norm before padding - data["shape_list"] = np.array( - [h, w, resize_h / h, resize_w / w], dtype=np.float32 - ) # TODO: reformat, currently align to det + data['shape_list'] = [h, w, resize_h / h, resize_w / w] # TODO: reformat, currently align to det if self.norm_before_pad: resized_img = self.norm(resized_img) @@ -444,31 +494,6 @@ def __call__(self, data): return data -class Rotate90IfVertical: - """Rotate the image by 90 degree when the height/width ratio is larger than the given threshold. - Note: It needs to be called before image resize.""" - - def __init__(self, threshold: float = 1.5, direction: str = "counterclockwise", **kwargs): - self.threshold = threshold - - if direction == "counterclockwise": - self.flag = cv2.ROTATE_90_COUNTERCLOCKWISE - elif direction == "clockwise": - self.flag = cv2.ROTATE_90_CLOCKWISE - else: - raise ValueError("Unsupported direction") - - def __call__(self, data): - img = data["image"] - - h, w, _ = img.shape - if h / w > self.threshold: - img = cv2.rotate(img, self.flag) - - data["image"] = img - return data - - class ClsLabelEncode(object): def __init__(self, label_list, **kwargs): self.label_list = label_list From 95c5b52958d3fbfdab03878545e905bbc1d99e67 Mon Sep 17 00:00:00 2001 From: katekong Date: Mon, 19 Jun 2023 10:28:36 +0800 Subject: [PATCH 04/12] add config --- configs/rec/crnn/crnn_resnet34_server.yaml | 150 +++++++++++++++++++++ 1 file changed, 150 insertions(+) create mode 100644 configs/rec/crnn/crnn_resnet34_server.yaml diff --git a/configs/rec/crnn/crnn_resnet34_server.yaml 
b/configs/rec/crnn/crnn_resnet34_server.yaml new file mode 100644 index 000000000..756266868 --- /dev/null +++ b/configs/rec/crnn/crnn_resnet34_server.yaml @@ -0,0 +1,150 @@ +system: + mode: 0 # 0 for graph mode, 1 for pynative mode in MindSpore + distribute: True + amp_level: 'O3' + seed: 42 + log_interval: 100 + val_while_train: True + drop_overflow_update: False + +common: + character_dict_path: &character_dict_path mindocr/utils/dict/en_dict.txt + num_classes: &num_classes 96 # num_chars_in_dict+1, TODO: retrieve it from dict or check correctness + max_text_len: &max_text_len 24 + infer_mode: &infer_mode False + use_space_char: &use_space_char True + lower: &lower False + batch_size: &batch_size 64 + +model: + type: rec + transform: null + backbone: + name: rec_resnet34 + pretrained: False + neck: + name: RNNEncoder + hidden_size: 256 + head: + name: CTCHead + weight_init: crnn_customised + bias_init: crnn_customised + out_channels: *num_classes + +postprocess: + name: RecCTCLabelDecode + character_dict_path: *character_dict_path + use_space_char: *use_space_char + +metric: + name: RecMetric + main_indicator: acc + character_dict_path: *character_dict_path + ignore_space: True + print_flag: False + +loss: + name: CTCLoss + pred_seq_len: 25 # TODO: retrieve from the network output shape. 
+ max_label_len: *max_text_len # this value should be smaller than pred_seq_len + batch_size: *batch_size + +scheduler: + scheduler: warmup_cosine_decay + min_lr: 0.000001 + lr: 0.001 + num_epochs: 30 + warmup_epochs: 2 + decay_epochs: 28 + +optimizer: + opt: adamw + filter_bias_and_bn: True + momentum: 0.95 + weight_decay: 0.0001 + nesterov: False + +loss_scaler: + type: dynamic + loss_scale: 512 + scale_factor: 2.0 + scale_window: 1000 + +train: + ckpt_save_dir: './crnn_resnet34_server_adj' + pred_cast_fp32: False # let CTCLoss cast internally + ema: True # added + dataset_sink_mode: False + dataset: + type: LMDBDataset + dataset_root: /path/to/data_lmdb_release/ + data_dir: training/ + # label_file: # not required when using LMDBDataset + sample_ratio: 1.0 + shuffle: True + transform_pipeline: + - DecodeImage: + img_mode: RGB # changed + to_float32: False + - RecCTCLabelEncode: + max_text_len: *max_text_len + character_dict_path: *character_dict_path + use_space_char: *use_space_char + lower: *lower + - RecResizeNormImg: + image_shape: [32, 100] # H, W + infer_mode: *infer_mode + character_dict_path: *character_dict_path + padding: True # aspect ratio will be preserved if true. 
changed + norm_before_pad: True # changed + - ToCHWImage: + # the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visualize + output_columns: ['image', 'text_seq'] #, 'length'] #'img_path'] + net_input_column_index: [0] # input indices for network forward func in output_columns + label_column_index: [1] # input indices marked as label + #keys_for_loss: 4 # num labels for loss func + + loader: + shuffle: True + batch_size: *batch_size + drop_remainder: True + max_rowsize: 12 + num_workers: 8 + +eval: + ckpt_load_path: ./crnn_resnet34_server_adj/best.ckpt + dataset_sink_mode: False + dataset: + type: LMDBDataset + dataset_root: /path/to/data_lmdb_release/ + data_dir: validation/ + # label_file: # not required when using LMDBDataset + sample_ratio: 1.0 + shuffle: False + transform_pipeline: + - DecodeImage: + img_mode: RGB # changed + to_float32: False + - RecCTCLabelEncode: + max_text_len: *max_text_len + character_dict_path: *character_dict_path + use_space_char: *use_space_char + lower: *lower + - RecResizeNormImg: + image_shape: [32, 100] # H, W + infer_mode: *infer_mode + character_dict_path: *character_dict_path + padding: True # aspect ratio will be preserved if true. 
changed + norm_before_pad: True # changed + - ToCHWImage: + # the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visualize + output_columns: ['image', 'text_padded', 'text_length'] # TODO return text string padding w/ fixed length, and a scalar to indicate the length + net_input_column_index: [0] # input indices for network forward func in output_columns + label_column_index: [1, 2] # input indices marked as label + + loader: + shuffle: False # TODO: tbc + batch_size: 64 + drop_remainder: False + max_rowsize: 12 + num_workers: 8 \ No newline at end of file From fbf174f60d199c8754843533fda4a0ef7ed9796d Mon Sep 17 00:00:00 2001 From: katekong Date: Tue, 20 Jun 2023 09:31:06 +0800 Subject: [PATCH 05/12] add back rotate transform --- configs/rec/crnn/crnn_resnet34_server.yaml | 4 ++-- mindocr/data/transforms/rec_transforms.py | 26 ++++++++++++++++++++++ 2 files changed, 28 insertions(+), 2 deletions(-) diff --git a/configs/rec/crnn/crnn_resnet34_server.yaml b/configs/rec/crnn/crnn_resnet34_server.yaml index 756266868..5932284b9 100644 --- a/configs/rec/crnn/crnn_resnet34_server.yaml +++ b/configs/rec/crnn/crnn_resnet34_server.yaml @@ -71,7 +71,7 @@ loss_scaler: scale_window: 1000 train: - ckpt_save_dir: './crnn_resnet34_server_adj' + ckpt_save_dir: './crnn_resnet34_server' pred_cast_fp32: False # let CTCLoss cast internally ema: True # added dataset_sink_mode: False @@ -112,7 +112,7 @@ train: num_workers: 8 eval: - ckpt_load_path: ./crnn_resnet34_server_adj/best.ckpt + ckpt_load_path: ./crnn_resnet34_server/best.ckpt dataset_sink_mode: False dataset: type: LMDBDataset diff --git a/mindocr/data/transforms/rec_transforms.py b/mindocr/data/transforms/rec_transforms.py index d019e1b9b..3fd0db2d0 100644 --- a/mindocr/data/transforms/rec_transforms.py +++ b/mindocr/data/transforms/rec_transforms.py @@ -14,6 +14,7 @@ "RecResizeNormImg", "RecResizeNormForInfer", "SVTRRecResizeImg", + 
"Rotate90IfVertical", "ClsLabelEncode", ] @@ -494,6 +495,31 @@ def __call__(self, data): return data +class Rotate90IfVertical: + """Rotate the image by 90 degree when the height/width ratio is larger than the given threshold. + Note: It needs to be called before image resize.""" + + def __init__(self, threshold: float = 1.5, direction: str = "counterclockwise", **kwargs): + self.threshold = threshold + + if direction == "counterclockwise": + self.flag = cv2.ROTATE_90_COUNTERCLOCKWISE + elif direction == "clockwise": + self.flag = cv2.ROTATE_90_CLOCKWISE + else: + raise ValueError("Unsupported direction") + + def __call__(self, data): + img = data["image"] + + h, w, _ = img.shape + if h / w > self.threshold: + img = cv2.rotate(img, self.flag) + + data["image"] = img + return data + + class ClsLabelEncode(object): def __init__(self, label_list, **kwargs): self.label_list = label_list From 3c9ac0cb3344b351f59c14447ed301d0c8dec479 Mon Sep 17 00:00:00 2001 From: katekong Date: Tue, 20 Jun 2023 14:40:11 +0800 Subject: [PATCH 06/12] add docstrings --- configs/rec/crnn/crnn_resnet34.yaml | 12 +- configs/rec/crnn/crnn_resnet34_server.yaml | 10 +- mindocr/data/transforms/rec_transforms.py | 130 ++++++++++++--------- 3 files changed, 87 insertions(+), 65 deletions(-) diff --git a/configs/rec/crnn/crnn_resnet34.yaml b/configs/rec/crnn/crnn_resnet34.yaml index 1325467c1..0f9cace2d 100644 --- a/configs/rec/crnn/crnn_resnet34.yaml +++ b/configs/rec/crnn/crnn_resnet34.yaml @@ -73,14 +73,14 @@ train: dataset_sink_mode: False dataset: type: LMDBDataset - dataset_root: path/to/data_lmdb_release/ # Optional, if set, dataset_root will be used as a prefix for data_dir + dataset_root: /home/konghuanqi/datasets/data_lmdb_release/ # Optional, if set, dataset_root will be used as a prefix for data_dir data_dir: training/ # label_file: # not required when using LMDBDataset sample_ratio: 1.0 shuffle: True transform_pipeline: - DecodeImage: - img_mode: BGR + img_mode: RGB to_float32: False - 
RecCTCLabelEncode: max_text_len: *max_text_len @@ -92,11 +92,7 @@ train: infer_mode: *infer_mode character_dict_path: *character_dict_path padding: False # aspect ratio will be preserved if true. - - NormalizeImage: # different from paddle (paddle wrongly normalize BGR image with RGB mean/std from ImageNet for det, and simple rescale to [-1, 1] in rec. - bgr_to_rgb: True - is_hwc: True - mean : [127.0, 127.0, 127.0] - std : [127.0, 127.0, 127.0] + norm_before_pad: False - ToCHWImage: # the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visaulize output_columns: ['image', 'text_seq'] #, 'length'] #'img_path'] @@ -116,7 +112,7 @@ eval: dataset_sink_mode: False dataset: type: LMDBDataset - dataset_root: path/to/data_lmdb_release/ + dataset_root: /home/konghuanqi/datasets/data_lmdb_release/ data_dir: validation/ # label_file: # not required when using LMDBDataset sample_ratio: 1.0 diff --git a/configs/rec/crnn/crnn_resnet34_server.yaml b/configs/rec/crnn/crnn_resnet34_server.yaml index 5932284b9..47104a9b5 100644 --- a/configs/rec/crnn/crnn_resnet34_server.yaml +++ b/configs/rec/crnn/crnn_resnet34_server.yaml @@ -3,7 +3,7 @@ system: distribute: True amp_level: 'O3' seed: 42 - log_interval: 100 + log_interval: 1000 val_while_train: True drop_overflow_update: False @@ -14,7 +14,7 @@ common: infer_mode: &infer_mode False use_space_char: &use_space_char True lower: &lower False - batch_size: &batch_size 64 + batch_size: &batch_size 32 model: type: rec @@ -77,7 +77,7 @@ train: dataset_sink_mode: False dataset: type: LMDBDataset - dataset_root: /path/to/data_lmdb_release/ + dataset_root: /home/konghuanqi/datasets/data_lmdb_release/ data_dir: training/ # label_file: # not required when using LMDBDataset sample_ratio: 1.0 @@ -116,7 +116,7 @@ eval: dataset_sink_mode: False dataset: type: LMDBDataset - dataset_root: /path/to/data_lmdb_release/ + dataset_root: 
/home/konghuanqi/datasets/data_lmdb_release/ data_dir: validation/ # label_file: # not required when using LMDBDataset sample_ratio: 1.0 @@ -147,4 +147,4 @@ eval: batch_size: 64 drop_remainder: False max_rowsize: 12 - num_workers: 8 \ No newline at end of file + num_workers: 8 diff --git a/mindocr/data/transforms/rec_transforms.py b/mindocr/data/transforms/rec_transforms.py index 3fd0db2d0..241bd75a0 100644 --- a/mindocr/data/transforms/rec_transforms.py +++ b/mindocr/data/transforms/rec_transforms.py @@ -248,19 +248,25 @@ def str2idx(text: str, label_dict: Dict[str, int], max_text_len: int = 23, lower # TODO: reorganize the code for different resize transformation in rec task -def resize_norm_img(img, - image_shape, - padding=True, - norm_before_pad=False, - mean=[127.0, 127.0, 127.0], - std=[127.0, 127.0, 127.0], - interpolation=cv2.INTER_LINEAR): +def resize_norm_img( + img, + image_shape, + padding=True, + norm_before_pad=False, + mean=[127.0, 127.0, 127.0], + std=[127.0, 127.0, 127.0], + interpolation=cv2.INTER_LINEAR, +): """ resize image Args: img: shape (H, W, C) image_shape: image shape after resize, in (C, H, W) - padding: if Ture, resize while preserving the H/W ratio, then pad the blank. + padding (bool): if Ture, resize while preserving the H/W ratio, then pad the blank. + norm_before_pad (bool): if True, normalize the image array before padding. + mean: shape (3), mean value for normalization. + std: shape (3), std value for normalization. + interpolation: image interpolation mode. 
""" imgH, imgW = image_shape @@ -268,8 +274,7 @@ def resize_norm_img(img, w = img.shape[1] c = img.shape[2] if not padding: - resized_image = cv2.resize( - img, (imgW, imgH), interpolation=interpolation) + resized_image = cv2.resize(img, (imgW, imgH), interpolation=interpolation) resized_w = imgW else: ratio = w / float(h) @@ -298,19 +303,25 @@ def resize_norm_img(img, # TODO: check diff from resize_norm_img -def resize_norm_img_chinese(img, - image_shape, - norm_before_pad=False, - mean=[127.0, 127.0, 127.0], - std=[127.0, 127.0, 127.0], - interpolation=cv2.INTER_LINEAR): - ''' +def resize_norm_img_chinese( + img, + image_shape, + norm_before_pad=False, + mean=[127.0, 127.0, 127.0], + std=[127.0, 127.0, 127.0], + interpolation=cv2.INTER_LINEAR, +): + """ resize image with aspect-ratio keeping and padding Args: img: shape (H, W, C) image_shape: image shape after resize, in (C, H, W) + norm_before_pad (bool): if True, normalize the image array before padding. + mean: shape (3), mean value for normalization. + std: shape (3), std value for normalization. + interpolation: image interpolation mode. - ''' + """ imgH, imgW = image_shape # todo: change to 0 and modified image shape max_wh_ratio = imgW * 1.0 / imgH @@ -340,21 +351,32 @@ def resize_norm_img_chinese(img, class RecResizeNormImg(object): - ''' adopted from paddle + """adopted from paddle Resize and normalize image, and pad image if needed. Args: - norm_before_pad: If True, perform normalization before padding (by doing so, the padding values will beall zero. Good practice.). Otherwise, per Default: False - ''' - def __init__(self, - image_shape, - infer_mode=False, - character_dict_path=None, - padding=True, - norm_before_pad=False, - mean=[127.0, 127.0, 127.0], - std=[127.0, 127.0, 127.0], - **kwargs): + image_shape: image shape after resize, in (C, H, W) + padding (bool): if Ture, resize while preserving the H/W ratio, then pad the blank. 
+ norm_before_pad (bool): if True, normalize the image array before padding. + mean: shape (3), mean value for normalization. + std: shape (3), std value for normalization. + interpolation: image interpolation mode. + norm_before_pad: If True, perform normalization before padding \ + (by doing so, the padding values will beall zero. Good practice.). \ + Otherwise, per Default: False + """ + + def __init__( + self, + image_shape, + infer_mode=False, + character_dict_path=None, + padding=True, + norm_before_pad=False, + mean=[127.0, 127.0, 127.0], + std=[127.0, 127.0, 127.0], + **kwargs, + ): self.image_shape = image_shape self.infer_mode = infer_mode self.character_dict_path = character_dict_path @@ -364,39 +386,43 @@ def __init__(self, self.std = np.array(std, dtype="float32") def __call__(self, data): - img = data['image'] + img = data["image"] if self.infer_mode and self.character_dict_path is not None: - norm_img, valid_ratio = resize_norm_img_chinese(img, - self.image_shape, - self.norm_before_pad, - self.mean, - self.std - ) + norm_img, valid_ratio = resize_norm_img_chinese( + img, self.image_shape, self.norm_before_pad, self.mean, self.std + ) else: - norm_img, valid_ratio = resize_norm_img(img, - self.image_shape, - self.padding, - self.norm_before_pad, - self.mean, - self.std, - ) - data['image'] = norm_img - data['valid_ratio'] = valid_ratio + norm_img, valid_ratio = resize_norm_img( + img, + self.image_shape, + self.padding, + self.norm_before_pad, + self.mean, + self.std, + ) + data["image"] = norm_img + data["valid_ratio"] = valid_ratio return data # TODO: remove infer_mode and character_dict_path if they are not necesary class RecResizeImg(RecResizeNormImg): - ''' + """ This is to make compatible with older version code that uses RecResizeImg, which is to be updated. TODO: replace RecResizeImg followed by NormlaizeImage in yaml files with RecResizeNormImg op. 
- ''' + """ + def __init__(self, image_shape, infer_mode=False, character_dict_path=None, padding=True, **kwargs): super.__init__( - image_shape, infer_mode, character_dict_path, padding, norm_befoer_pad=False, - mean=[0., 0., 0.], std=[1., 1., 1.], - ) + image_shape, + infer_mode, + character_dict_path, + padding, + norm_befoer_pad=False, + mean=[0.0, 0.0, 0.0], + std=[1.0, 1.0, 1.0], + ) class SVTRRecResizeImg(object): @@ -478,7 +504,7 @@ def __call__(self, data): # TODO: norm before padding - data['shape_list'] = [h, w, resize_h / h, resize_w / w] # TODO: reformat, currently align to det + data["shape_list"] = [h, w, resize_h / h, resize_w / w] # TODO: reformat, currently align to det if self.norm_before_pad: resized_img = self.norm(resized_img) From 57da9e2268ef253676d4040f6788500572431222 Mon Sep 17 00:00:00 2001 From: katekong Date: Tue, 20 Jun 2023 14:43:48 +0800 Subject: [PATCH 07/12] rebase some unnecessary changes --- configs/rec/crnn/crnn_resnet34.yaml | 12 ++++++++---- configs/rec/crnn/crnn_resnet34_server.yaml | 8 ++++---- 2 files changed, 12 insertions(+), 8 deletions(-) diff --git a/configs/rec/crnn/crnn_resnet34.yaml b/configs/rec/crnn/crnn_resnet34.yaml index 0f9cace2d..1325467c1 100644 --- a/configs/rec/crnn/crnn_resnet34.yaml +++ b/configs/rec/crnn/crnn_resnet34.yaml @@ -73,14 +73,14 @@ train: dataset_sink_mode: False dataset: type: LMDBDataset - dataset_root: /home/konghuanqi/datasets/data_lmdb_release/ # Optional, if set, dataset_root will be used as a prefix for data_dir + dataset_root: path/to/data_lmdb_release/ # Optional, if set, dataset_root will be used as a prefix for data_dir data_dir: training/ # label_file: # not required when using LMDBDataset sample_ratio: 1.0 shuffle: True transform_pipeline: - DecodeImage: - img_mode: RGB + img_mode: BGR to_float32: False - RecCTCLabelEncode: max_text_len: *max_text_len @@ -92,7 +92,11 @@ train: infer_mode: *infer_mode character_dict_path: *character_dict_path padding: False # aspect ratio will be 
preserved if true. - norm_before_pad: False + - NormalizeImage: # different from paddle (paddle wrongly normalize BGR image with RGB mean/std from ImageNet for det, and simple rescale to [-1, 1] in rec. + bgr_to_rgb: True + is_hwc: True + mean : [127.0, 127.0, 127.0] + std : [127.0, 127.0, 127.0] - ToCHWImage: # the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visaulize output_columns: ['image', 'text_seq'] #, 'length'] #'img_path'] @@ -112,7 +116,7 @@ eval: dataset_sink_mode: False dataset: type: LMDBDataset - dataset_root: /home/konghuanqi/datasets/data_lmdb_release/ + dataset_root: path/to/data_lmdb_release/ data_dir: validation/ # label_file: # not required when using LMDBDataset sample_ratio: 1.0 diff --git a/configs/rec/crnn/crnn_resnet34_server.yaml b/configs/rec/crnn/crnn_resnet34_server.yaml index 47104a9b5..ab2c0c6e3 100644 --- a/configs/rec/crnn/crnn_resnet34_server.yaml +++ b/configs/rec/crnn/crnn_resnet34_server.yaml @@ -3,7 +3,7 @@ system: distribute: True amp_level: 'O3' seed: 42 - log_interval: 1000 + log_interval: 100 val_while_train: True drop_overflow_update: False @@ -14,7 +14,7 @@ common: infer_mode: &infer_mode False use_space_char: &use_space_char True lower: &lower False - batch_size: &batch_size 32 + batch_size: &batch_size 64 model: type: rec @@ -77,7 +77,7 @@ train: dataset_sink_mode: False dataset: type: LMDBDataset - dataset_root: /home/konghuanqi/datasets/data_lmdb_release/ + dataset_root: /path/to/data_lmdb_release/ data_dir: training/ # label_file: # not required when using LMDBDataset sample_ratio: 1.0 @@ -116,7 +116,7 @@ eval: dataset_sink_mode: False dataset: type: LMDBDataset - dataset_root: /home/konghuanqi/datasets/data_lmdb_release/ + dataset_root: /path/to/data_lmdb_release/ data_dir: validation/ # label_file: # not required when using LMDBDataset sample_ratio: 1.0 From 804af55c102f19a4b5fb8d90079cb9cc2b0e9a2e Mon Sep 17 00:00:00 2001 From: 
katekong Date: Tue, 20 Jun 2023 14:56:48 +0800 Subject: [PATCH 08/12] replace RecResizeImg followed by NormlaizeImage in yaml files with RecResizeNormImg op. --- configs/rec/crnn/crnn_icdar15.yaml | 8 ++------ configs/rec/crnn/crnn_resnet34.yaml | 10 +++------- configs/rec/crnn/crnn_resnet34_ch.yaml | 14 +++++--------- configs/rec/crnn/crnn_resnet34_server.yaml | 12 ++++++------ configs/rec/crnn/crnn_vgg7.yaml | 10 +++------- configs/rec/rare/rare_resnet34.yaml | 8 ++------ configs/rec/rare/rare_resnet34_ch.yaml | 12 ++++-------- mindocr/data/transforms/rec_transforms.py | 2 -- 8 files changed, 25 insertions(+), 51 deletions(-) diff --git a/configs/rec/crnn/crnn_icdar15.yaml b/configs/rec/crnn/crnn_icdar15.yaml index 18139f435..358a1f31b 100644 --- a/configs/rec/crnn/crnn_icdar15.yaml +++ b/configs/rec/crnn/crnn_icdar15.yaml @@ -96,16 +96,12 @@ train: character_dict_path: *character_dict_path use_space_char: *use_space_char lower: True - - RecResizeImg: # different from paddle (paddle converts image from HWC to CHW and rescale to [-1, 1] after resize. + - RecResizeNormImg: image_shape: [32, 100] # H, W infer_mode: *infer_mode character_dict_path: *character_dict_path padding: False # aspect ratio will be preserved if true. - - NormalizeImage: # different from paddle (paddle wrongly normalize BGR image with RGB mean/std from ImageNet for det, and simple rescale to [-1, 1] in rec. 
- bgr_to_rgb: True - is_hwc: True - mean : [127.0, 127.0, 127.0] - std : [127.0, 127.0, 127.0] + norm_before_pad: False - ToCHWImage: # the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visaulize output_columns: ['image', 'text_seq'] #, 'length'] #'img_path'] diff --git a/configs/rec/crnn/crnn_resnet34.yaml b/configs/rec/crnn/crnn_resnet34.yaml index 1325467c1..bc37c7ea5 100644 --- a/configs/rec/crnn/crnn_resnet34.yaml +++ b/configs/rec/crnn/crnn_resnet34.yaml @@ -80,23 +80,19 @@ train: shuffle: True transform_pipeline: - DecodeImage: - img_mode: BGR + img_mode: RGB to_float32: False - RecCTCLabelEncode: max_text_len: *max_text_len character_dict_path: *character_dict_path use_space_char: *use_space_char lower: True - - RecResizeImg: # different from paddle (paddle converts image from HWC to CHW and rescale to [-1, 1] after resize. + - RecResizeNormImg: image_shape: [32, 100] # H, W infer_mode: *infer_mode character_dict_path: *character_dict_path padding: False # aspect ratio will be preserved if true. - - NormalizeImage: # different from paddle (paddle wrongly normalize BGR image with RGB mean/std from ImageNet for det, and simple rescale to [-1, 1] in rec. 
- bgr_to_rgb: True - is_hwc: True - mean : [127.0, 127.0, 127.0] - std : [127.0, 127.0, 127.0] + norm_before_pad: False - ToCHWImage: # the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visaulize output_columns: ['image', 'text_seq'] #, 'length'] #'img_path'] diff --git a/configs/rec/crnn/crnn_resnet34_ch.yaml b/configs/rec/crnn/crnn_resnet34_ch.yaml index bf954cbae..6465a7cd4 100644 --- a/configs/rec/crnn/crnn_resnet34_ch.yaml +++ b/configs/rec/crnn/crnn_resnet34_ch.yaml @@ -84,7 +84,7 @@ train: max_text_len: *max_text_len transform_pipeline: - DecodeImage: - img_mode: BGR + img_mode: RGB to_float32: False - RecCTCLabelEncode: max_text_len: *max_text_len @@ -94,16 +94,12 @@ train: - Rotate90IfVertical: threshold: 2.0 direction: counterclockwise - - RecResizeImg: - image_shape: [32, 320] + - RecResizeNormImg: + image_shape: [32, 320] # H, W infer_mode: *infer_mode character_dict_path: *character_dict_path - padding: True - - NormalizeImage: - bgr_to_rgb: True - is_hwc: True - mean: [127.0, 127.0, 127.0] - std: [127.0, 127.0, 127.0] + padding: True # aspect ratio will be preserved if true. + norm_before_pad: False - ToCHWImage: output_columns: ["image", "text_seq"] net_input_column_index: [0] diff --git a/configs/rec/crnn/crnn_resnet34_server.yaml b/configs/rec/crnn/crnn_resnet34_server.yaml index ab2c0c6e3..7518981ea 100644 --- a/configs/rec/crnn/crnn_resnet34_server.yaml +++ b/configs/rec/crnn/crnn_resnet34_server.yaml @@ -84,7 +84,7 @@ train: shuffle: True transform_pipeline: - DecodeImage: - img_mode: RGB # changed + img_mode: RGB to_float32: False - RecCTCLabelEncode: max_text_len: *max_text_len @@ -95,8 +95,8 @@ train: image_shape: [32, 100] # H, W infer_mode: *infer_mode character_dict_path: *character_dict_path - padding: True # aspect ratio will be preserved if true. changed - norm_before_pad: True # changed + padding: True # aspect ratio will be preserved if true. 
+ norm_before_pad: True - ToCHWImage: # the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visaulize output_columns: ['image', 'text_seq'] #, 'length'] #'img_path'] @@ -123,7 +123,7 @@ eval: shuffle: False transform_pipeline: - DecodeImage: - img_mode: RGB # changed + img_mode: RGB to_float32: False - RecCTCLabelEncode: max_text_len: *max_text_len @@ -134,8 +134,8 @@ eval: image_shape: [32, 100] # H, W infer_mode: *infer_mode character_dict_path: *character_dict_path - padding: True # aspect ratio will be preserved if true. changed - norm_before_pad: True # changed + padding: True # aspect ratio will be preserved if true. + norm_before_pad: True - ToCHWImage: # the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visaulize output_columns: ['image', 'text_padded', 'text_length'] # TODO return text string padding w/ fixed length, and a scaler to indicate the length diff --git a/configs/rec/crnn/crnn_vgg7.yaml b/configs/rec/crnn/crnn_vgg7.yaml index 5647e3421..a5a750463 100644 --- a/configs/rec/crnn/crnn_vgg7.yaml +++ b/configs/rec/crnn/crnn_vgg7.yaml @@ -81,23 +81,19 @@ train: shuffle: True transform_pipeline: - DecodeImage: - img_mode: BGR + img_mode: RGB to_float32: False - RecCTCLabelEncode: max_text_len: *max_text_len character_dict_path: *character_dict_path use_space_char: *use_space_char lower: True - - RecResizeImg: # different from paddle (paddle converts image from HWC to CHW and rescale to [-1, 1] after resize. + - RecResizeNormImg: image_shape: [32, 100] # H, W infer_mode: *infer_mode character_dict_path: *character_dict_path padding: False # aspect ratio will be preserved if true. - - NormalizeImage: # different from paddle (paddle wrongly normalize BGR image with RGB mean/std from ImageNet for det, and simple rescale to [-1, 1] in rec. 
- bgr_to_rgb: True - is_hwc: True - mean : [127.0, 127.0, 127.0] - std : [127.0, 127.0, 127.0] + norm_before_pad: False - ToCHWImage: # the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visaulize output_columns: ['image', 'text_seq'] #, 'length'] #'img_path'] diff --git a/configs/rec/rare/rare_resnet34.yaml b/configs/rec/rare/rare_resnet34.yaml index d910b7c21..85609b5ea 100644 --- a/configs/rec/rare/rare_resnet34.yaml +++ b/configs/rec/rare/rare_resnet34.yaml @@ -83,16 +83,12 @@ train: character_dict_path: *character_dict_path use_space_char: *use_space_char lower: True - - RecResizeImg: # different from paddle (paddle converts image from HWC to CHW and rescale to [-1, 1] after resize. + - RecResizeNormImg: image_shape: [32, 100] # H, W infer_mode: *infer_mode character_dict_path: *character_dict_path padding: False # aspect ratio will be preserved if true. - - NormalizeImage: # different from paddle (paddle wrongly normalize BGR image with RGB mean/std from ImageNet for det, and simple rescale to [-1, 1] in rec. 
- bgr_to_rgb: True - is_hwc: True - mean: [127.0, 127.0, 127.0] - std: [127.0, 127.0, 127.0] + norm_before_pad: False - ToCHWImage: output_columns: ["image", "text_seq"] net_input_column_index: [0, 1] # input indices for network forward func in output_columns diff --git a/configs/rec/rare/rare_resnet34_ch.yaml b/configs/rec/rare/rare_resnet34_ch.yaml index 624c70b3d..5bd8fd705 100644 --- a/configs/rec/rare/rare_resnet34_ch.yaml +++ b/configs/rec/rare/rare_resnet34_ch.yaml @@ -93,16 +93,12 @@ train: - Rotate90IfVertical: threshold: 2.0 direction: counterclockwise - - RecResizeImg: - image_shape: [32, 320] + - RecResizeNormImg: + image_shape: [32, 320] # H, W infer_mode: *infer_mode character_dict_path: *character_dict_path - padding: True - - NormalizeImage: - bgr_to_rgb: True - is_hwc: True - mean: [127.0, 127.0, 127.0] - std: [127.0, 127.0, 127.0] + padding: True # aspect ratio will be preserved if true. + norm_before_pad: False - ToCHWImage: output_columns: ["image", "text_seq"] net_input_column_index: [0, 1] diff --git a/mindocr/data/transforms/rec_transforms.py b/mindocr/data/transforms/rec_transforms.py index 241bd75a0..778b212ba 100644 --- a/mindocr/data/transforms/rec_transforms.py +++ b/mindocr/data/transforms/rec_transforms.py @@ -409,8 +409,6 @@ def __call__(self, data): class RecResizeImg(RecResizeNormImg): """ This is to make compatible with older version code that uses RecResizeImg, which is to be updated. - - TODO: replace RecResizeImg followed by NormlaizeImage in yaml files with RecResizeNormImg op. 
""" def __init__(self, image_shape, infer_mode=False, character_dict_path=None, padding=True, **kwargs): From 81cd5ca6897ef918bf36983a949e0d0ac5a821c7 Mon Sep 17 00:00:00 2001 From: katekong Date: Wed, 21 Jun 2023 14:43:17 +0800 Subject: [PATCH 09/12] bugfix --- mindocr/data/transforms/rec_transforms.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mindocr/data/transforms/rec_transforms.py b/mindocr/data/transforms/rec_transforms.py index 778b212ba..1b0a3621d 100644 --- a/mindocr/data/transforms/rec_transforms.py +++ b/mindocr/data/transforms/rec_transforms.py @@ -328,7 +328,7 @@ def resize_norm_img_chinese( h, w = img.shape[0], img.shape[1] c = img.shape[2] ratio = w * 1.0 / h - max_wh_ratio = min(max(max_wh_ratio, ratio), max_wh_ratio) + max_wh_ratio = max(max_wh_ratio, ratio) imgW = int(imgH * max_wh_ratio) if math.ceil(imgH * ratio) > imgW: resized_w = imgW From e1aca38a5f87a7f8504ee29de06964db340b37ea Mon Sep 17 00:00:00 2001 From: katekong Date: Mon, 26 Jun 2023 11:33:25 +0800 Subject: [PATCH 10/12] bugfix --- mindocr/data/transforms/rec_transforms.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mindocr/data/transforms/rec_transforms.py b/mindocr/data/transforms/rec_transforms.py index 1b0a3621d..f3664e262 100644 --- a/mindocr/data/transforms/rec_transforms.py +++ b/mindocr/data/transforms/rec_transforms.py @@ -412,7 +412,7 @@ class RecResizeImg(RecResizeNormImg): """ def __init__(self, image_shape, infer_mode=False, character_dict_path=None, padding=True, **kwargs): - super.__init__( + super().__init__( image_shape, infer_mode, character_dict_path, From 4b0d8405f5a7cfdcbcab28bf8e34217da68d41e7 Mon Sep 17 00:00:00 2001 From: katekong Date: Wed, 28 Jun 2023 10:05:03 +0800 Subject: [PATCH 11/12] update readme --- configs/rec/crnn/README.md | 20 +++++++++++--------- configs/rec/crnn/README_CN.md | 20 +++++++++++--------- 2 files changed, 22 insertions(+), 18 deletions(-) diff --git a/configs/rec/crnn/README.md 
b/configs/rec/crnn/README.md
index 950ade180..52c3a0298 100644
--- a/configs/rec/crnn/README.md
+++ b/configs/rec/crnn/README.md
@@ -39,19 +39,21 @@ According to our experiments, the training (following the steps in [Model Traini
-| **Model** | **Context** | **Backbone** | **Train Dataset** | **Model Params** | **Batch size per card** | **Graph train 8P (s/epoch)** | **Graph train 8P (ms/step)** | **Graph train 8P (FPS)** | **Avg Eval Accuracy** | **Recipe** | **Download** |
-| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: |
-| CRNN | D910x8-MS1.8-G | VGG7 | MJ+ST | 8.72 M | 16 | 2488.82 | 22.06 | 5802.71 | 82.03% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_vgg7.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c-573dbd61.mindir) |
-| CRNN | D910x8-MS1.8-G | ResNet34_vd | MJ+ST | 24.48 M | 64 | 2157.18 | 76.48 | 6694.84 | 84.45% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07-eb10a0c9.mindir) |
+| **Model** | **Context** | **Backbone** | **Train Dataset** | **Num Classes** | **Model Params** | **Batch size per card** | **Graph train (s/epoch)** | **Graph train (ms/step)** | **Graph train (FPS)** | **Avg Eval Accuracy** | **Recipe** | **Download** |
+| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: |
+| CRNN | D910x8-MS1.8-G | VGG7 | MJ+ST | 37 | 8.72 M | 16 | 2488.82 | 22.06 | 5802.71 | 82.03% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_vgg7.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c-573dbd61.mindir) |
+| CRNN | D910x8-MS1.8-G | ResNet34_vd | MJ+ST | 37 | 24.48 M | 64 | 2157.18 | 76.48 | 6694.84 | 84.45% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07-eb10a0c9.mindir) |
+| CRNN | D910x4-MS2.0-G | ResNet34_vd | MJ+ST | 96 | 24.51 M | 64 | 4292.18 | 76.08 | 3364.72 | 83.50% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34_server.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_server-e0d66c0c.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_server-e0d66c0c-55748731.mindir) |
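The throughput columns in the table above look mutually consistent if FPS is read as the global batch size divided by the step time. That relation is an assumption inferred from the numbers, not something the patch states. The device count is taken from the `x8`/`x4` part of the Context column, and `estimated_fps` is an illustrative helper, not a MindOCR function. A minimal sketch under those assumptions:

```python
def estimated_fps(batch_per_card, n_devices, ms_per_step):
    """Images per second, assuming one global batch is consumed per step."""
    return batch_per_card * n_devices / (ms_per_step / 1000.0)


# (row label, batch size per card, devices, ms/step, reported FPS),
# copied from the added table rows.
table = [
    ("VGG7, D910x8", 16, 8, 22.06, 5802.71),
    ("ResNet34_vd, D910x8", 64, 8, 76.48, 6694.84),
    ("ResNet34_vd, D910x4", 64, 4, 76.08, 3364.72),
]

for name, batch, devices, ms, reported in table:
    est = estimated_fps(batch, devices, ms)
    print(f"{name}: estimated {est:.2f} FPS, reported {reported:.2f} FPS")
```

The estimates agree with the reported values to well under 0.1%. The remaining gap is presumably rounding in the ms/step column.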
- Detailed accuracy results for each benchmark dataset (IC03, IC13, IC15, IIIT, SVT, SVTP, CUTE):
- | **Model** | **Backbone** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **Average** |
- | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: |
- | CRNN | VGG7 | 94.53% | 94.00% | 92.18% | 90.74% | 71.95% | 66.06% | 84.10% | 83.93% | 73.33% | 69.44% | 82.03% |
- | CRNN | ResNet34_vd | 94.42% | 94.23% | 93.35% | 92.02% | 75.92% | 70.15% | 87.73% | 86.40% | 76.28% | 73.96% | 84.45% |
+ | **Model** | **Backbone** | **Num Classes** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **Average** |
+ | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: |
+ | CRNN | VGG7 | 37 | 94.53% | 94.00% | 92.18% | 90.74% | 71.95% | 66.06% | 84.10% | 83.93% | 73.33% | 69.44% | 82.03% |
+ | CRNN | ResNet34_vd | 37 | 94.42% | 94.23% | 93.35% | 92.02% | 75.92% | 70.15% | 87.73% | 86.40% | 76.28% | 73.96% | 84.45% |
+ | CRNN | ResNet34_vd | 96 | 94.65% | 94.70% | 94.28% | 93.20% | 72.5% | 63.94% | 87.63% | 86.09% | 74.42% | 73.61% | 83.50% |
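As a sanity check on the table above, the Average column appears to be the unweighted mean of the ten per-benchmark accuracies. A minimal sketch, assuming that reading (the `rows` and `reported` dicts and the `mean_accuracy` helper are illustrative and not part of the patch or of MindOCR):

```python
# Per-benchmark accuracies (%) copied from the added table rows, in column order:
# IC03_860, IC03_867, IC13_857, IC13_1015, IC15_1811, IC15_2077,
# IIIT5k_3000, SVT, SVTP, CUTE80.
rows = {
    "VGG7 / 37 classes": [94.53, 94.00, 92.18, 90.74, 71.95, 66.06, 84.10, 83.93, 73.33, 69.44],
    "ResNet34_vd / 37 classes": [94.42, 94.23, 93.35, 92.02, 75.92, 70.15, 87.73, 86.40, 76.28, 73.96],
    "ResNet34_vd / 96 classes": [94.65, 94.70, 94.28, 93.20, 72.5, 63.94, 87.63, 86.09, 74.42, 73.61],
}

# Values from the Average column of the same rows.
reported = {
    "VGG7 / 37 classes": 82.03,
    "ResNet34_vd / 37 classes": 84.45,
    "ResNet34_vd / 96 classes": 83.50,
}


def mean_accuracy(values):
    """Unweighted mean over the benchmark subsets."""
    return sum(values) / len(values)


for name, values in rows.items():
    print(f"{name}: computed {mean_accuracy(values):.2f}%, reported {reported[name]:.2f}%")
```

Under this reading the mean is not weighted by subset size, so a small set such as CUTE80 counts as much as IIIT5k_3000.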
 ### Inference Perf.
@@ -70,7 +72,7 @@ The inference performance is tested on Mindspore Lite, please take a look at [Mi
 **Notes:**
 - Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G-graph mode or F-pynative mode with ms function. For example, D910x8-MS1.8-G is for training on 8 pieces of Ascend 910 NPU using graph mode based on Minspore version 1.8.
 - To reproduce the result on other contexts, please ensure the global batch size is the same.
-- The characters supported by model are lowercase English characters from a to z and numbers from 0 to 9. More explanation on dictionary, please refer to [4. Character Dictionary](#4-character-dictionary).
+- The number of classes of the model is determined by the dictionary used for training. The default dictionary contains lowercase English characters from a to z and digits from 0 to 9. More explanation on dictionary, please refer to [4. Character Dictionary](#4-character-dictionary).
 - The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to [Dataset Download & Dataset Usage](#312-dataset-download) section.
 - The input Shapes of MindIR of CRNN_VGG7 and CRNN_ResNet34_vd are both (1, 3, 32, 100).
diff --git a/configs/rec/crnn/README_CN.md b/configs/rec/crnn/README_CN.md
index 77f508807..f6b407f33 100644
--- a/configs/rec/crnn/README_CN.md
+++ b/configs/rec/crnn/README_CN.md
@@ -39,20 +39,22 @@ Table Format:
-| **Model** | **Context** | **Backbone** | **Train Dataset** | **Model Params** | **Batch size per card** | **Graph train 8P (s/epoch)** | **Graph train 8P (ms/step)** | **Graph train 8P (FPS)** | **Avg Eval Accuracy** | **Recipe** | **Download** |
-| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: |
-| CRNN | D910x8-MS1.8-G | VGG7 | MJ+ST | 8.72 M | 16 | 2488.82 | 22.06 | 5802.71 | 82.03% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_vgg7.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c-573dbd61.mindir) |
-| CRNN | D910x8-MS1.8-G | ResNet34_vd | MJ+ST | 24.48 M | 64 | 2157.18 | 76.48 | 6694.84 | 84.45% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07-eb10a0c9.mindir) |
+| **Model** | **Context** | **Backbone** | **Train Dataset** | **Num Classes** | **Model Params** | **Batch size per card** | **Graph train 8P (s/epoch)** | **Graph train 8P (ms/step)** | **Graph train 8P (FPS)** | **Avg Eval Accuracy** | **Recipe** | **Download** |
+| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :------: |
+| CRNN | D910x8-MS1.8-G | VGG7 | MJ+ST | 37 |8.72 M | 16 | 2488.82 | 22.06 | 5802.71 | 82.03% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_vgg7.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c-573dbd61.mindir) |
+| CRNN | D910x8-MS1.8-G | ResNet34_vd | MJ+ST | 37 | 24.48 M | 64 | 2157.18 | 76.48 | 6694.84 | 84.45% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07-eb10a0c9.mindir) |
+| CRNN | D910x4-MS2.0-G | ResNet34_vd | MJ+ST | 96 | 24.51 M | 64 | 4292.18 | 76.08 | 3364.72 | 83.50% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34_server.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_server-e0d66c0c.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_server-e0d66c0c-55748731.mindir) |
- Accuracy on each benchmark dataset (IC03, IC13, IC15, IIIT, SVT, SVTP, CUTE):
- | **Model** | **Backbone** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **Average** |
- | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: |
- | CRNN | VGG7 | 94.53% | 94.00% | 92.18% | 90.74% | 71.95% | 66.06% | 84.10% | 83.93% | 73.33% | 69.44% | 82.03% |
- | CRNN | ResNet34_vd | 94.42% | 94.23% | 93.35% | 92.02% | 75.92% | 70.15% | 87.73% | 86.40% | 76.28% | 73.96% | 84.45% |
+ | **Model** | **Backbone** | **Num Classes** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **Average** |
+ | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: |
+ | CRNN | VGG7 | 37 | 94.53% | 94.00% | 92.18% | 90.74% | 71.95% | 66.06% | 84.10% | 83.93% | 73.33% | 69.44% | 82.03% |
+ | CRNN | ResNet34_vd | 37 |94.42% | 94.23% | 93.35% | 92.02% | 75.92% | 70.15% | 87.73% | 86.40% | 76.28% | 73.96% | 84.45% |
+ | CRNN | ResNet34_vd | 96 | 94.65% | 94.70% | 94.28% | 93.20% | 72.5% | 63.94% | 87.63% | 86.09% | 74.42% | 73.61% | 83.50% |
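@@ -72,7 +74,7 @@ Table Format:
 **Notes:**
 - Context: the training context is denoted as {processor}x{number of processors}-{MS mode}, where the MindSpore mode is either G (graph mode) or F (pynative mode). For example, D910x8-MS1.8-G denotes training on 8 Ascend 910 NPUs in graph mode with MindSpore 1.8.
 - To reproduce the training results in a different context, make sure the global batch size matches that of the original configuration file.
-- The characters the models can recognize are the default setting, i.e. all lowercase English letters a to z and the digits 0 to 9. See [4. Character Dictionary](#4-字符词典) for details.
+- The number of classes of a model is determined by the dictionary used for training. The default dictionary contains the lowercase English letters a to z and the digits 0 to 9. See [4. Character Dictionary](#4-字符词典) for details.
 - All models are trained from scratch without any pre-training. For details on the training and evaluation datasets, see the [Dataset Download and Usage](#312-数据集下载) section.
 - The MindIR files of CRNN_VGG7 and CRNN_ResNet34_vd are both exported with an input shape of (1, 3, 32, 100).

From 4f87096d00055f0d0b358800f3d044fa5b07eb7a Mon Sep 17 00:00:00 2001
From: katekong
Date: Wed, 28 Jun 2023 10:08:56 +0800
Subject: [PATCH 12/12] minor fix

---
 configs/rec/crnn/README_CN.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configs/rec/crnn/README_CN.md b/configs/rec/crnn/README_CN.md
index f6b407f33..be16c29f5 100644
--- a/configs/rec/crnn/README_CN.md
+++ b/configs/rec/crnn/README_CN.md
@@ -39,7 +39,7 @@ Table Format: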
-| **Model** | **Context** | **Backbone** | **Train Set** | **Classes** | **Params** | **Batch Size per Card** | **Graph Mode 8-Card Training (s/epoch)** | **Graph Mode 8-Card Training (ms/step)** | **Graph Mode 8-Card Training (FPS)** | **Avg Accuracy** | **Recipe** | **Download** |
+| **Model** | **Context** | **Backbone** | **Train Set** | **Classes** | **Params** | **Batch Size per Card** | **Graph Mode Training (s/epoch)** | **Graph Mode Training (ms/step)** | **Graph Mode Training (FPS)** | **Avg Accuracy** | **Recipe** | **Download** |
 | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :------: |
 | CRNN | D910x8-MS1.8-G | VGG7 | MJ+ST | 37 | 8.72 M | 16 | 2488.82 | 22.06 | 5802.71 | 82.03% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_vgg7.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c-573dbd61.mindir) |
 | CRNN | D910x8-MS1.8-G | ResNet34_vd | MJ+ST | 37 | 24.48 M | 64 | 2157.18 | 76.48 | 6694.84 | 84.45% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07-eb10a0c9.mindir) |
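
The context notation used in the table (for example `D910x8-MS1.8-G`) and the note about keeping the global batch size consistent can be illustrated with a small Python sketch. The helper names `parse_context`, `global_batch_size`, and `estimated_fps` are hypothetical and not part of MindOCR. The sketch also assumes that the global batch size equals the per-card batch size times the number of devices, and that FPS is roughly the global batch size divided by the step time; the README does not state these formulas, but they come close to the values in the FPS column.

```python
import re
from typing import NamedTuple


class Context(NamedTuple):
    device: str       # e.g. "D910" (Ascend 910)
    num_devices: int  # e.g. 8
    ms_version: str   # e.g. "1.8"
    mode: str         # "graph" or "pynative"


# Pattern for strings such as "D910x8-MS1.8-G" (hypothetical helper, not a MindOCR API).
_CONTEXT_RE = re.compile(
    r"^(?P<device>[A-Za-z]+\d*)x(?P<n>\d+)-MS(?P<ver>\d+(?:\.\d+)*)-(?P<mode>[GF])$"
)


def parse_context(text: str) -> Context:
    """Split a context string into device, device count, MindSpore version, and mode."""
    match = _CONTEXT_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized context string: {text!r}")
    mode = "graph" if match["mode"] == "G" else "pynative"
    return Context(match["device"], int(match["n"]), match["ver"], mode)


def global_batch_size(ctx: Context, batch_per_card: int) -> int:
    """Assumption: global batch size = per-card batch size x number of devices."""
    return batch_per_card * ctx.num_devices


def estimated_fps(ctx: Context, batch_per_card: int, ms_per_step: float) -> float:
    """Assumption: FPS ~ samples processed per step / step time in seconds."""
    return global_batch_size(ctx, batch_per_card) / (ms_per_step / 1000.0)


if __name__ == "__main__":
    ctx = parse_context("D910x8-MS1.8-G")
    print(ctx)
    print(global_batch_size(ctx, 16))
    print(estimated_fps(ctx, 16, 22.06))
```

Under these assumptions, reproducing the `D910x8` ResNet34_vd run (per-card batch 64) on 4 devices would call for a per-card batch of 128 to keep the global batch size at 512.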