OpenGVLab
diff --git a/segmentation/README.md b/segmentation/README.md
Lines changed: 25 additions & 11 deletions
diff --git a/segmentation/configs/_base_/datasets/cityscapes_extra_1024x1024.py b/segmentation/configs/_base_/datasets/cityscapes_extra_1024x1024.py
Lines changed: 35 additions & 0 deletions
diff --git a/segmentation/configs/_base_/datasets/mapillary_896x896.py b/segmentation/configs/_base_/datasets/mapillary_896x896.py
Lines changed: 55 additions & 0 deletions
diff --git a/segmentation/configs/ade20k/upernet_internimage_s_512_160k_ade20k.py b/segmentation/configs/ade20k/upernet_internimage_s_512_160k_ade20k.py
Lines changed: 1 addition & 0 deletions
diff --git a/segmentation/configs/cityscapes/README.md b/segmentation/configs/cityscapes/README.md
Lines changed: 8 additions & 0 deletions
@@ -114,17 +114,18 @@ Prepare datasets according to the [guidelines](https://github.com/open-mmlab/mms
 <br>
 <div>
 
-|   method    |    backbone    | resolution | mIoU (ss/ms)  | #params | FLOPs |                                            Config                                             |                                                                                                                                Download                                                                                                                                |
-| :---------: | :------------: | :--------: | :-----------: | :-----: | :---: | :-------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
-|   UperNet   | InternImage-T  |  512x1024  | 82.58 / 83.40 |   59M   | 1889G |       [config](./configs/cityscapes/upernet_internimage_t_512x1024_160k_cityscapes.py)        |              [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_t_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_t_512x1024_160k_cityscapes.log.json)              |
-|   UperNet   | InternImage-S  |  512x1024  | 82.74 / 83.45 |   80M   | 2035G |       [config](./configs/cityscapes/upernet_internimage_s_512x1024_160k_cityscapes.py)        |              [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_s_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_s_512x1024_160k_cityscapes.log.json)              |
-|   UperNet   | InternImage-B  |  512x1024  | 83.18 / 83.97 |  128M   | 2369G |       [config](./configs/cityscapes/upernet_internimage_b_512x1024_160k_cityscapes.py)        |              [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_b_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_b_512x1024_160k_cityscapes.log.json)              |
-|   UperNet   | InternImage-L  |  512x1024  | 83.68 / 84.41 |  256M   | 3234G |       [config](./configs/cityscapes/upernet_internimage_l_512x1024_160k_cityscapes.py)        |              [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_l_512x1024_160k_cityscapes.log.json)              |
-|  UperNet\*  | InternImage-L  |  512x1024  | 85.94 / 86.22 |  256M   | 3234G |  [config](./configs/cityscapes/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.py)   |   [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.pth)  \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.log.json)    |
-|   UperNet   | InternImage-XL |  512x1024  | 83.62 / 84.28 |  368M   | 4022G |       [config](./configs/cityscapes/upernet_internimage_xl_512x1024_160k_cityscapes.py)       |             [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_xl_512x1024_160k_cityscapes.log.json)             |
-|  UperNet\*  | InternImage-XL |  512x1024  | 86.20 / 86.42 |  368M   | 4022G |  [config](./configs/cityscapes/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.py)  |   [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.log.json)   |
-| SegFormer\* | InternImage-L  |  512x1024  | 85.16 / 85.67 |  220M   | 1580G | [config](./configs/cityscapes/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.py)  |  [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.log.json)  |
-| SegFormer\* | InternImage-XL |  512x1024  | 85.41 / 85.93 |  330M   | 2364G | [config](./configs/cityscapes/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.log.json) |
+|   method    |    backbone    | resolution | mIoU (ss/ms)  | #params | FLOPs |                                              Config                                               |                                                                                                                                 Download                                                                                                                                 |
+| :---------: | :------------: | :--------: | :-----------: | :-----: | :---: | :-----------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+|   UperNet   | InternImage-T  |  512x1024  | 82.58 / 83.40 |   59M   | 1889G |         [config](./configs/cityscapes/upernet_internimage_t_512x1024_160k_cityscapes.py)          |               [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_t_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_t_512x1024_160k_cityscapes.log.json)               |
+|   UperNet   | InternImage-S  |  512x1024  | 82.74 / 83.45 |   80M   | 2035G |         [config](./configs/cityscapes/upernet_internimage_s_512x1024_160k_cityscapes.py)          |               [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_s_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_s_512x1024_160k_cityscapes.log.json)               |
+|   UperNet   | InternImage-B  |  512x1024  | 83.18 / 83.97 |  128M   | 2369G |         [config](./configs/cityscapes/upernet_internimage_b_512x1024_160k_cityscapes.py)          |               [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_b_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_b_512x1024_160k_cityscapes.log.json)               |
+|   UperNet   | InternImage-L  |  512x1024  | 83.68 / 84.41 |  256M   | 3234G |         [config](./configs/cityscapes/upernet_internimage_l_512x1024_160k_cityscapes.py)          |               [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_l_512x1024_160k_cityscapes.log.json)               |
+|  UperNet\*  | InternImage-L  |  512x1024  | 85.94 / 86.22 |  256M   | 3234G |    [config](./configs/cityscapes/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.py)     |    [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.pth)  \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_l_512x1024_160k_mapillary2cityscapes.log.json)     |
+|   UperNet   | InternImage-XL |  512x1024  | 83.62 / 84.28 |  368M   | 4022G |         [config](./configs/cityscapes/upernet_internimage_xl_512x1024_160k_cityscapes.py)         |              [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_512x1024_160k_cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_xl_512x1024_160k_cityscapes.log.json)              |
+|  UperNet\*  | InternImage-XL |  512x1024  | 86.20 / 86.42 |  368M   | 4022G |    [config](./configs/cityscapes/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.py)    |    [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/upernet_internimage_xl_512x1024_160k_mapillary2cityscapes.log.json)    |
+| SegFormer\* | InternImage-L  |  512x1024  | 85.16 / 85.67 |  220M   | 1580G |   [config](./configs/cityscapes/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.py)    |   [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.log.json)   |
+| SegFormer\* | InternImage-XL |  512x1024  | 85.41 / 85.93 |  330M   | 2364G |   [config](./configs/cityscapes/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.py)   |  [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.log.json)  |
+| Mask2Former\* | InternImage-H  | 1024x1024  | 86.37 / 86.96 |  1094M  | 7878G | [config](./configs/cityscapes/mask2former_internimage_h_1024x1024_80k_mapillary2cityscapes_ss.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_1024x1024_80k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/mask2former_internimage_h_1024x1024_80k_mapillary2cityscapes.log.json) |
 
 \* denotes that the model is trained with the additional Mapillary dataset.
 
@@ -145,6 +146,19 @@ Prepare datasets according to the [guidelines](https://github.com/open-mmlab/mms
 
 </details>
 
+<details>
+<summary> Dataset: COCO-Stuff-10K </summary>
+<br>
+<div>
+
+|   method    |   backbone    | resolution | mIoU (ss) | #params | FLOPs |                                         Config                                         |                                                                                                                   Download                                                                                                                   |
+| :---------: | :-----------: | :--------: | :-------: | :-----: | :---: | :------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+| Mask2Former | InternImage-H |  896x896   |   52.6    |  1.31B  | 4635G | [config](./configs/coco_stuff10k/mask2former_internimage_h_896_80k_cocostuff10k_ss.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_896_80k_cocostuff10k.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/mask2former_internimage_h_896_80k_cocostuff10k.log.json) |
+
+</div>
+
+</details>
+
 ## Evaluation
 
 To evaluate our `InternImage` on ADE20K val, run:
 
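The tables in this README report mIoU under single-scale (ss) and multi-scale (ms) testing. As a rough illustration of what that number measures, here is a minimal pure-Python sketch that accumulates per-class intersection and union from flattened label maps and averages the IoU over classes, skipping ground-truth pixels labelled 255 (the `seg_pad_val` used by the configs in this commit, assumed here to also be the ignore index). `mean_iou` is a hypothetical helper, not mmseg's evaluation code; it excludes classes that appear in neither map, which is one common convention.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], gt: Sequence[int], num_classes: int,
             ignore_index: int = 255) -> float:
    """Mean IoU over the classes present in the prediction or the ground truth."""
    intersect = [0] * num_classes
    union = [0] * num_classes
    for p, g in zip(pred, gt):
        if g == ignore_index:
            continue  # padded / unlabeled pixels are not scored
        if p == g:
            intersect[g] += 1
            union[g] += 1
        else:
            # A mismatched pixel enlarges the union of both classes involved.
            union[p] += 1
            union[g] += 1
    ious = [i / u for i, u in zip(intersect, union) if u > 0]
    return sum(ious) / len(ious)


# Class 0: IoU 1/2, class 1: IoU 2/3, last pixel ignored -> mIoU about 0.583
print(mean_iou([0, 1, 1, 1, 0], [0, 0, 1, 1, 255], num_classes=2))
```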
@@ -0,0 +1,35 @@
+_base_ = './cityscapes_extra.py'
+img_norm_cfg = dict(
+    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
+crop_size = (1024, 1024)
+train_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(type='LoadAnnotations'),
+    dict(type='Resize', img_scale=(2048, 1024), ratio_range=(0.5, 2.0)),
+    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
+    dict(type='RandomFlip', prob=0.5),
+    dict(type='PhotoMetricDistortion'),
+    dict(type='Normalize', **img_norm_cfg),
+    dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
+    dict(type='DefaultFormatBundle'),
+    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
+]
+test_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(
+        type='MultiScaleFlipAug',
+        img_scale=(2048, 1024),
+        # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
+        flip=False,
+        transforms=[
+            dict(type='Resize', keep_ratio=True),
+            dict(type='RandomFlip'),
+            dict(type='Normalize', **img_norm_cfg),
+            dict(type='ImageToTensor', keys=['img']),
+            dict(type='Collect', keys=['img']),
+        ])
+]
+data = dict(
+    train=dict(pipeline=train_pipeline),
+    val=dict(pipeline=test_pipeline),
+    test=dict(pipeline=test_pipeline))
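This new file inherits from `./cityscapes_extra.py` through `_base_` and only overrides the `pipeline` entries inside `data`. The sketch below is a simplified, hypothetical re-implementation of the recursive dict merge that mmcv's `Config` applies to `_base_` files (nested dicts merge key by key, and a dict carrying `_delete_=True` replaces the base entry instead). It is meant to show why fields such as the dataset `type` and `img_dir` from the base file survive; the `base` dict below is invented for illustration and is not the real content of `cityscapes_extra.py`.

```python
import copy
from typing import Any, Dict


def merge_config(child: Dict[str, Any], base: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `child` into a copy of `base`; values in `child` win."""
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if (isinstance(value, dict) and isinstance(merged.get(key), dict)
                and not value.get('_delete_', False)):
            merged[key] = merge_config(value, merged[key])
        else:
            # New keys, non-dict values, and `_delete_=True` dicts replace the base entry.
            value = copy.deepcopy(value)
            if isinstance(value, dict):
                value.pop('_delete_', None)
            merged[key] = value
    return merged


# Hypothetical base contents, for illustration only.
base = dict(data=dict(
    samples_per_gpu=1,
    train=dict(type='CityscapesDataset', img_dir='leftImg8bit/train',
               pipeline=['old']),
))
# Mirrors the shape of the override in this file: only `pipeline` changes.
child = dict(data=dict(train=dict(pipeline=['new'])))
cfg = merge_config(child, base)
print(cfg['data']['train'])
```

Because the merge is recursive, a child file only needs to spell out the leaves it changes, which is why this config can stay so short.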
@@ -0,0 +1,55 @@
+# dataset settings
+dataset_type = 'MapillaryDataset'
+data_root = 'data/Mapillary/'
+img_norm_cfg = dict(
+    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
+crop_size = (896, 896)
+train_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(type='LoadAnnotations'),
+    dict(type='MapillaryHack'),
+    dict(type='Resize', img_scale=(2048, 1024), ratio_range=(0.5, 1.0)),
+    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
+    dict(type='RandomFlip', prob=0.5),
+    dict(type='PhotoMetricDistortion'),
+    dict(type='Normalize', **img_norm_cfg),
+    dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
+    dict(type='DefaultFormatBundle'),
+    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
+]
+test_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(
+        type='MultiScaleFlipAug',
+        img_scale=(2048, 1024),
+        # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
+        flip=False,
+        transforms=[
+            dict(type='Resize', keep_ratio=True),
+            dict(type='RandomFlip'),
+            dict(type='Normalize', **img_norm_cfg),
+            dict(type='ImageToTensor', keys=['img']),
+            dict(type='Collect', keys=['img']),
+        ])
+]
+data = dict(
+    samples_per_gpu=2,
+    workers_per_gpu=2,
+    train=dict(
+        type=dataset_type,
+        data_root=data_root,
+        img_dir=['training/images', 'validation/images'],
+        ann_dir=['training/labels', 'validation/labels'],
+        pipeline=train_pipeline),
+    val=dict(
+        type='CityscapesDataset',
+        data_root='data/cityscapes/',
+        img_dir='leftImg8bit/val',
+        ann_dir='gtFine/val',
+        pipeline=test_pipeline),
+    test=dict(
+        type='CityscapesDataset',
+        data_root='data/cityscapes/',
+        img_dir='leftImg8bit/val',
+        ann_dir='gtFine/val',
+        pipeline=test_pipeline))
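In the Mapillary training pipeline, `Resize` receives `img_scale=(2048, 1024)` and `ratio_range=(0.5, 1.0)`. The sketch below estimates the resulting image size, assuming mmseg 0.x semantics: a ratio is drawn uniformly from the range, the target scale becomes `(int(2048 * r), int(1024 * r))`, and with the default `keep_ratio=True` the image is rescaled so its long side fits the larger target value and its short side the smaller one. `resized_size` and `needs_padding` are hypothetical helpers, not mmseg functions. The example shows why the later `Pad` step matters: at small ratios the short side drops below the 896-pixel crop.

```python
import random
from typing import Optional, Tuple


def resized_size(width: int, height: int,
                 img_scale: Tuple[int, int] = (2048, 1024),
                 ratio_range: Tuple[float, float] = (0.5, 1.0),
                 ratio: Optional[float] = None) -> Tuple[int, int]:
    """Approximate (width, height) after a keep-ratio Resize with ratio_range."""
    if ratio is None:
        ratio = random.uniform(*ratio_range)  # sample a scale ratio
    target = (int(img_scale[0] * ratio), int(img_scale[1] * ratio))
    long_edge, short_edge = max(target), min(target)
    # Largest factor keeping the long side <= long_edge and short side <= short_edge.
    factor = min(long_edge / max(width, height), short_edge / min(width, height))
    return int(width * factor + 0.5), int(height * factor + 0.5)


def needs_padding(size: Tuple[int, int], crop: Tuple[int, int] = (896, 896)) -> bool:
    """True if an image of `size` cannot fill the crop (square, so axis order is moot)."""
    return size[0] < crop[0] or size[1] < crop[1]


print(resized_size(4000, 3000, ratio=1.0))  # (1365, 1024)
print(resized_size(4000, 3000, ratio=0.5))  # (683, 512): smaller than 896, so padded
```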
@@ -23,6 +23,7 @@
         offset_scale=1.0,
         post_norm=True,
         with_cp=False,
+        out_indices=(0, 1, 2, 3),
         init_cfg=dict(type='Pretrained', checkpoint=pretrained)),
     decode_head=dict(num_classes=150, in_channels=[80, 160, 320, 640]),
     auxiliary_head=dict(num_classes=150, in_channels=320),
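This hunk makes the InternImage-S backbone return all four stages through `out_indices=(0, 1, 2, 3)`, which is what the UperNet decode head consumes (`in_channels=[80, 160, 320, 640]`). The sketch below is a hypothetical consistency check, not InternImage code. It assumes the stage widths double from a base of 80 channels (consistent with the `in_channels` list above) and that the auxiliary head reads output index 2, as in mmseg's stock UperNet configs; under those assumptions the auxiliary head's `in_channels=320` lines up.

```python
from typing import List, Sequence


def stage_channels(base: int = 80, num_stages: int = 4) -> List[int]:
    """Channel width of each backbone stage, assuming it doubles per stage."""
    return [base * 2 ** i for i in range(num_stages)]


def check_heads(out_indices: Sequence[int], decode_in: Sequence[int],
                aux_in: int, aux_index: int = 2) -> None:
    """Raise ValueError if the heads' in_channels disagree with the selected stages."""
    channels = stage_channels()
    selected = [channels[i] for i in out_indices]
    if list(decode_in) != selected:
        raise ValueError(f'decode head expects {list(decode_in)}, backbone gives {selected}')
    # The auxiliary head indexes into the list of returned feature maps.
    if selected[aux_index] != aux_in:
        raise ValueError(f'aux head expects {aux_in}, backbone gives {selected[aux_index]}')


# Passes silently for the values in this config.
check_heads(out_indices=(0, 1, 2, 3), decode_in=[80, 160, 320, 640], aux_in=320)
```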
 
@@ -36,3 +36,11 @@ Mapillary 80k + Cityscapes (w/ coarse data) 160k
 | :------------: | :--------: | :-----------: | :----------: | :--------: | :-----: | :---: | :------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
 | InternImage-L  |  512x1024  | 85.16 / 85.67 | 0.37s / iter |    17h     |  220M   | 1580G | [config](./segformer_internimage_l_512x1024_160k_mapillary2cityscapes.py)  |  [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/segformer_internimage_l_512x1024_160k_mapillary2cityscapes.log.json)  |
 | InternImage-XL |  512x1024  | 85.41 / 85.93 | 0.43s / iter |   19.5h    |  330M   | 2364G | [config](./segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/segformer_internimage_xl_512x1024_160k_mapillary2cityscapes.log.json) |
+
+### Mask2Former + InternImage (with additional data)
+
+Mapillary 80k + Cityscapes (w/ coarse data) 80k
+
+|   backbone    | resolution | mIoU (ss/ms)  | #params | FLOPs |                                     Config                                     |                                                                                                                                 Download                                                                                                                                 |
+| :-----------: | :--------: | :-----------: | :-----: | :---: | :----------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+| InternImage-H | 1024x1024  | 86.37 / 86.96 |  1094M  | 7878G | [config](./mask2former_internimage_h_1024x1024_80k_mapillary2cityscapes_ss.py) | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/mask2former_internimage_h_1024x1024_80k_mapillary2cityscapes.pth) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/mask2former_internimage_h_1024x1024_80k_mapillary2cityscapes.log.json) |
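The `_ss` suffix on the config name and the "ss/ms" column refer to single-scale and multi-scale testing. In the dataset configs added in this commit, the multi-scale ratios are commented out (`# img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75]`) and `flip=False`. Assuming mmseg 0.x `MultiScaleFlipAug` behavior, where each ratio scales the base `img_scale` and enabling `flip` doubles the number of views, the test-time inputs can be enumerated roughly as below. `test_views` is a hypothetical helper; how the per-view predictions are merged afterwards is not shown.

```python
from typing import List, Optional, Sequence, Tuple


def test_views(img_scale: Tuple[int, int] = (2048, 1024),
               img_ratios: Optional[Sequence[float]] = None,
               flip: bool = False) -> List[Tuple[Tuple[int, int], bool]]:
    """List the (scale, flipped) pairs that a ss or ms+flip test would evaluate."""
    if img_ratios is None:
        scales = [img_scale]  # single-scale: one resize target
    else:
        scales = [(int(img_scale[0] * r), int(img_scale[1] * r)) for r in img_ratios]
    flips = [False, True] if flip else [False]
    return [(scale, f) for scale in scales for f in flips]


ss = test_views()
ms = test_views(img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], flip=True)
print(len(ss), len(ms))  # 1 12
```

The ms setting therefore runs about twelve forward passes per image, which is why the single-scale config is the one published for routine evaluation.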