Commit c570a7e

Release cityscapes model
1 parent 477b5ed commit c570a7e

11 files changed, 626 additions and 13 deletions

segmentation/README.md

Lines changed: 25 additions & 11 deletions
Original file line number | Diff line number | Diff line change
@@ -114,17 +114,18 @@ Prepare datasets according to the [guidelines](https://github.com/open-mmlab/mms
114114
<br>
115115
<div>
116116

117-
| method | backbone | resolution | mIoU (ss/ms) | #params | FLOPs | Config | Download |
118-
| :---------: | :------------: | :--------: | :-----------: | :-----: | :---: | :-------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
119-
| UperNet | InternImage-T | 512x1024 | 82.58 / 83.40 | 59M | 1889G | [config](./configs/cityscapes/upernet_internimage_t_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_t_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_t_512x1024_160k_cityscapes.log.json) |
120-
| UperNet | InternImage-S | 512x1024 | 82.74 / 83.45 | 80M | 2035G | [config](./configs/cityscapes/upernet_internimage_s_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_s_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_s_512x1024_160k_cityscapes.log.json) |
121-
| UperNet | InternImage-B | 512x1024 | 83.18 / 83.97 | 128M | 2369G | [config](./configs/cityscapes/upernet_internimage_b_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_b_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_b_512x1024_160k_cityscapes.log.json) |
122-
| UperNet | InternImage-L | 512x1024 | 83.68 / 84.41 | 256M | 3234G | [config](./configs/cityscapes/upernet_internimage_l_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_l_512x1024_160k_cityscapes.log.json) |
123-
| UperNet\* | InternImage-L | 512x1024 | 85.94 / 86.22 | 256M | 3234G | [config](./configs/cityscapes/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.log.json) |
124-
| UperNet | InternImage-XL | 512x1024 | 83.62 / 84.28 | 368M | 4022G | [config](./configs/cityscapes/upernet_internimage_xl_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_xl_512x1024_160k_cityscapes.log.json) |
125-
| UperNet\* | InternImage-XL | 512x1024 | 86.20 / 86.42 | 368M | 4022G | [config](./configs/cityscapes/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.log.json) |
126-
| SegFormer\* | InternImage-L | 512x1024 | 85.16 / 85.67 | 220M | 1580G | [config](./configs/cityscapes/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.log.json) |
127-
| SegFormer\* | InternImage-XL | 512x1024 | 85.41 / 85.93 | 330M | 2364G | [config](./configs/cityscapes/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.log.json) |
117+
| method | backbone | resolution | mIoU (ss/ms) | #params | FLOPs | Config | Download |
118+
| :---------: | :------------: | :--------: | :-----------: | :-----: | :---: | :-----------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
119+
| UperNet | InternImage-T | 512x1024 | 82.58 / 83.40 | 59M | 1889G | [config](./configs/cityscapes/upernet_internimage_t_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_t_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_t_512x1024_160k_cityscapes.log.json) |
120+
| UperNet | InternImage-S | 512x1024 | 82.74 / 83.45 | 80M | 2035G | [config](./configs/cityscapes/upernet_internimage_s_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_s_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_s_512x1024_160k_cityscapes.log.json) |
121+
| UperNet | InternImage-B | 512x1024 | 83.18 / 83.97 | 128M | 2369G | [config](./configs/cityscapes/upernet_internimage_b_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_b_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_b_512x1024_160k_cityscapes.log.json) |
122+
| UperNet | InternImage-L | 512x1024 | 83.68 / 84.41 | 256M | 3234G | [config](./configs/cityscapes/upernet_internimage_l_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_l_512x1024_160k_cityscapes.log.json) |
123+
| UperNet\* | InternImage-L | 512x1024 | 85.94 / 86.22 | 256M | 3234G | [config](./configs/cityscapes/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.log.json) |
124+
| UperNet | InternImage-XL | 512x1024 | 83.62 / 84.28 | 368M | 4022G | [config](./configs/cityscapes/upernet_internimage_xl_512x1024_160k_cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_xl_512x1024_160k_cityscapes.log.json) |
125+
| UperNet\* | InternImage-XL | 512x1024 | 86.20 / 86.42 | 368M | 4022G | [config](./configs/cityscapes/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.log.json) |
126+
| SegFormer\* | InternImage-L | 512x1024 | 85.16 / 85.67 | 220M | 1580G | [config](./configs/cityscapes/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.log.json) |
127+
| SegFormer\* | InternImage-XL | 512x1024 | 85.41 / 85.93 | 330M | 2364G | [config](./configs/cityscapes/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.log.json) |
128+
| Mask2Former | InternImage-H | 1024x1024 | 86.37 / 86.96 | 1094M | 7878G | [config](./configs/cityscapes/mask2former_internimage_h_1024x1024_80k_mapillary2cityscapes_ss.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_1024x1024_80k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/mask2former_internimage_h_1024x1024_80k_mapillary2cityscapes.log.json) |
128129

129130
\* denotes the model is trained using extra Mapillary dataset.
130131

@@ -145,6 +146,19 @@ Prepare datasets according to the [guidelines](https://github.com/open-mmlab/mms
145146

146147
</details>
147148

149+
<details>
150+
<summary> Dataset: COCO-Stuff-10K </summary>
151+
<br>
152+
<div>
153+
154+
| method | backbone | resolution | mIoU (ss) | #params | FLOPs | Config | Download |
155+
| :---------: | :-----------: | :--------: | :-------: | :-----: | :---: | :------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
156+
| Mask2Former | InternImage-H | 896x896 | 52.6 | 1.31B | 4635G | [config](./configs/coco_stuff10k/mask2former_internimage_h_896_80k_cocostuff10k_ss.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_896_80k_cocostuff10k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/mask2former_internimage_h_896_80k_cocostuff10k.log.json) |
157+
158+
</div>
159+
160+
</details>
161+
148162
## Evaluation
149163

150164
To evaluate our `InternImage` on ADE20K val, run:
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
1+
_base_ = './cityscapes_extra.py'
2+
img_norm_cfg = dict(
3+
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
4+
crop_size = (1024, 1024)
5+
train_pipeline = [
6+
dict(type='LoadImageFromFile'),
7+
dict(type='LoadAnnotations'),
8+
dict(type='Resize', img_scale=(2048, 1024), ratio_range=(0.5, 2.0)),
9+
dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
10+
dict(type='RandomFlip', prob=0.5),
11+
dict(type='PhotoMetricDistortion'),
12+
dict(type='Normalize', **img_norm_cfg),
13+
dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
14+
dict(type='DefaultFormatBundle'),
15+
dict(type='Collect', keys=['img', 'gt_semantic_seg']),
16+
]
17+
test_pipeline = [
18+
dict(type='LoadImageFromFile'),
19+
dict(
20+
type='MultiScaleFlipAug',
21+
img_scale=(2048, 1024),
22+
# img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
23+
flip=False,
24+
transforms=[
25+
dict(type='Resize', keep_ratio=True),
26+
dict(type='RandomFlip'),
27+
dict(type='Normalize', **img_norm_cfg),
28+
dict(type='ImageToTensor', keys=['img']),
29+
dict(type='Collect', keys=['img']),
30+
])
31+
]
32+
data = dict(
33+
train=dict(pipeline=train_pipeline),
34+
val=dict(pipeline=test_pipeline),
35+
test=dict(pipeline=test_pipeline))
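Review note: the commented-out `img_ratios` line in `test_pipeline` appears to be what separates a single-scale (ss) run from a multi-scale (ms) one. The sketch below lists the sizes a multi-scale run would presumably visit, assuming each ratio simply multiplies the base `img_scale` and the result is truncated to an integer. `scaled_sizes` is a hypothetical helper written for illustration, not a function from mmsegmentation.

```python
# Hypothetical helper: list the (width, height) targets a multi-scale test
# would use if each ratio multiplies the base img_scale (an assumption).
IMG_SCALE = (2048, 1024)                         # img_scale from the config
IMG_RATIOS = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]   # the commented-out values


def scaled_sizes(img_scale, ratios):
    """Return one (width, height) tuple per ratio, truncated to integers."""
    width, height = img_scale
    return [(int(width * r), int(height * r)) for r in ratios]


sizes = scaled_sizes(IMG_SCALE, IMG_RATIOS)
for ratio, size in zip(IMG_RATIOS, sizes):
    print(ratio, size)
```

With the line left commented out, only the base scale is evaluated, which is consistent with the `_ss` suffix on the Mask2Former config name.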
Lines changed: 55 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,55 @@
1+
# dataset settings
2+
dataset_type = 'MapillaryDataset'
3+
data_root = 'data/Mapillary/'
4+
img_norm_cfg = dict(
5+
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
6+
crop_size = (896, 896)
7+
train_pipeline = [
8+
dict(type='LoadImageFromFile'),
9+
dict(type='LoadAnnotations'),
10+
dict(type='MapillaryHack'),
11+
dict(type='Resize', img_scale=(2048, 1024), ratio_range=(0.5, 1.0)),
12+
dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
13+
dict(type='RandomFlip', prob=0.5),
14+
dict(type='PhotoMetricDistortion'),
15+
dict(type='Normalize', **img_norm_cfg),
16+
dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
17+
dict(type='DefaultFormatBundle'),
18+
dict(type='Collect', keys=['img', 'gt_semantic_seg']),
19+
]
20+
test_pipeline = [
21+
dict(type='LoadImageFromFile'),
22+
dict(
23+
type='MultiScaleFlipAug',
24+
img_scale=(2048, 1024),
25+
# img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
26+
flip=False,
27+
transforms=[
28+
dict(type='Resize', keep_ratio=True),
29+
dict(type='RandomFlip'),
30+
dict(type='Normalize', **img_norm_cfg),
31+
dict(type='ImageToTensor', keys=['img']),
32+
dict(type='Collect', keys=['img']),
33+
])
34+
]
35+
data = dict(
36+
samples_per_gpu=2,
37+
workers_per_gpu=2,
38+
train=dict(
39+
type=dataset_type,
40+
data_root='data/Mapillary/',
41+
img_dir=['training/images', 'validation/images'],
42+
ann_dir=['training/labels', 'validation/labels'],
43+
pipeline=train_pipeline),
44+
val=dict(
45+
type='CityscapesDataset',
46+
data_root='data/cityscapes/',
47+
img_dir='leftImg8bit/val',
48+
ann_dir='gtFine/val',
49+
pipeline=test_pipeline),
50+
test=dict(
51+
type='CityscapesDataset',
52+
data_root='data/cityscapes/',
53+
img_dir='leftImg8bit/val',
54+
ann_dir='gtFine/val',
55+
pipeline=test_pipeline))
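Review note: both pipelines apply `Normalize` with the same `img_norm_cfg`. As a rough illustration of what that step computes per pixel, the sketch below subtracts the per-channel mean and divides by the per-channel standard deviation. It assumes the image loader yields BGR channel order, so `to_rgb=True` reverses the channels first. `normalize_pixel` is an illustrative helper, not the library's implementation.

```python
# Values copied from img_norm_cfg in the config above (RGB order).
MEAN = [123.675, 116.28, 103.53]
STD = [58.395, 57.12, 57.375]


def normalize_pixel(pixel_bgr, mean=MEAN, std=STD, to_rgb=True):
    """Normalize one pixel; assumes the input is in BGR channel order."""
    channels = list(pixel_bgr)
    if to_rgb:
        channels.reverse()  # BGR -> RGB so the RGB-ordered stats line up
    return [(c - m) / s for c, m, s in zip(channels, mean, std)]


# A BGR pixel whose RGB value equals the mean maps to all zeros.
print(normalize_pixel([103.53, 116.28, 123.675]))
```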

segmentation/configs/ade20k/upernet_internimage_s_512_160k_ade20k.py

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@
2323
offset_scale=1.0,
2424
post_norm=True,
2525
with_cp=False,
26+
out_indices=(0, 1, 2, 3),
2627
init_cfg=dict(type='Pretrained', checkpoint=pretrained)),
2728
decode_head=dict(num_classes=150, in_channels=[80, 160, 320, 640]),
2829
auxiliary_head=dict(num_classes=150, in_channels=320),
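Review note: the added `out_indices=(0, 1, 2, 3)` requests all four backbone stage outputs, which lines up with the four-entry `in_channels` of the decode head. The sketch below reproduces those numbers under the assumption, suggested by the listed values, that the channel width doubles at each stage from a base of 80. `stage_channels` is an illustrative helper and not part of the repository.

```python
def stage_channels(base_channels, out_indices):
    """Channel width per requested stage, assuming it doubles each stage."""
    return [base_channels * 2 ** i for i in out_indices]


# Four stages feed the decode head; a single stage feeds the auxiliary head.
decode_in_channels = stage_channels(80, (0, 1, 2, 3))
aux_in_channels = stage_channels(80, (2,))[0]
print(decode_in_channels, aux_in_channels)
```

Under that assumption the auxiliary head's `in_channels=320` matches the width of stage index 2.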

segmentation/configs/cityscapes/README.md

Lines changed: 8 additions & 0 deletions
@@ -36,3 +36,11 @@ Mapillary 80k + Cityscapes (w/ coarse data) 160k
3636
| :------------: | :--------: | :-----------: | :----------: | :--------: | :-----: | :---: | :------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
3737
| InternImage-L | 512x1024 | 85.16 / 85.67 | 0.37s / iter | 17h | 220M | 1580G | [config](./segformer_internimage_l_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.log.json) |
3838
| InternImage-XL | 512x1024 | 85.41 / 85.93 | 0.43s / iter | 19.5h | 330M | 2364G | [config](./segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.log.json) |
39+
40+
### Mask2Former + InternImage (with additional data)
41+
42+
Mapillary 80k + Cityscapes (w/ coarse data) 80k
43+
44+
| backbone | resolution | mIoU (ss/ms) | #params | FLOPs | Config | Download |
45+
| :-----------: | :--------: | :-----------: | :-----: | :---: | :----------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
46+
| InternImage-H | 1024x1024 | 86.37 / 86.96 | 1094M | 7878G | [config](./mask2former_internimage_h_1024x1024_80k_mapillary2cityscapes_ss.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_1024x1024_80k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/mask2former_internimage_h_1024x1024_80k_mapillary2cityscapes.log.json) |
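
Review note: the mIoU columns above are percentages. As a reminder of what the metric measures, here is a minimal sketch that computes mean intersection-over-union from a confusion matrix. It is not the evaluation code used by this repository, and the toy pixel counts are made up for illustration.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix (rows: truth, cols: prediction)."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # truth c, predicted other
        fp = sum(confusion[r][c] for r in range(n)) - tp   # predicted c, truth other
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both truth and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Toy 2-class example with made-up pixel counts.
toy = [[8, 2],
       [1, 9]]
print(round(100 * mean_iou(toy), 2))
```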
