
Commit 477b5ed

Update README.md
1 parent f37f9c2 commit 477b5ed

7 files changed: +344 −377 lines changed


.gitignore

Lines changed: 1 addition & 1 deletion
@@ -1,8 +1,8 @@
-
 .idea/
 .DS_Store
 __pycache__/
 classification/convertor/
 segmentation/convertor/
 checkpoint_dir/
 demo/
+pretrained/

README.md

Lines changed: 62 additions & 134 deletions
Large diffs are not rendered by default.

README_CN.md

Lines changed: 78 additions & 167 deletions
Large diffs are not rendered by default.

classification/README.md

Lines changed: 100 additions & 50 deletions
@@ -10,7 +10,7 @@ This folder contains the implementation of the InternImage for image classificat
 - [Evaluation](#evaluation)
 - [Training from Scratch on ImageNet-1K](#training-from-scratch-on-imagenet-1k)
 - [Manage Jobs with Slurm](#manage-jobs-with-slurm)
-- [Training with Deepspeed](#training-with-deepspeed)
+- [Training with DeepSpeed](#training-with-deepspeed)
 - [Extracting Intermediate Features](#extracting-intermediate-features)
 - [Export](#export)

@@ -47,6 +47,7 @@ pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.p
 ```bash
 pip install -U openmim
 mim install mmcv-full==1.5.0
+mim install mmsegmentation==0.27.0
 pip install timm==0.6.11 mmdet==2.28.1
 ```

@@ -59,7 +60,7 @@ pip install numpy==1.26.4
 pip install pydantic==1.10.13
 ```

-- Compiling CUDA operators
+- Compile CUDA operators

 Before compiling, please use the `nvcc -V` command to check whether your `nvcc` version matches the CUDA version of PyTorch.

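For reference, that version check amounts to two quick commands; this is a minimal sketch, assuming `nvcc` is on `PATH` and the cu113 PyTorch build installed above:

```bash
# CUDA toolkit version that will compile the repository's CUDA operators
nvcc -V
# CUDA version this PyTorch build was compiled against (cu113 -> 11.3)
python -c "import torch; print(torch.version.cuda)"
```

The two reported versions should agree (at least in major.minor) before building the operators.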
@@ -79,8 +80,9 @@ We provide the following ways to prepare data:

 <details open>
 <summary>Standard ImageNet-1K</summary>
+<br>

-We use standard ImageNet dataset, you can download it from http://image-net.org/.
+- We use standard ImageNet dataset, you can download it from http://image-net.org/.

 - For standard folder dataset, move validation images to labeled sub-folders. The file structure should look like:

@@ -195,12 +197,12 @@ We use standard ImageNet dataset, you can download it from http://image-net.org/
 <br>
 <div>

-| name | pretrain | pre-training resolution | #param | download |
-| :------------: | :----------: | :----------------------: | :----: | :---------------------------------------------------------------: |
-| InternImage-L | ImageNet-22K | 384x384 | 223M | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_l_22k_192to384.pth) |
-| InternImage-XL | ImageNet-22K | 384x384 | 335M | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_xl_22k_192to384.pth) |
-| InternImage-H | Joint 427M | 384x384 | 1.08B | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_jointto22k_384.pth) |
-| InternImage-G | - | 384x384 | 3B | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_g_pretrainto22k_384.pth) |
+| name | pretrain | resolution | #param | download |
+| :------------: | :----------: | :--------: | :----: | :---------------------------------------------------------------: |
+| InternImage-L | ImageNet-22K | 384x384 | 223M | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_l_22k_192to384.pth) |
+| InternImage-XL | ImageNet-22K | 384x384 | 335M | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_xl_22k_192to384.pth) |
+| InternImage-H | Joint 427M | 384x384 | 1.08B | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_jointto22k_384.pth) |
+| InternImage-G | Joint 427M | 384x384 | 3B | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_g_pretrainto22k_384.pth) |

 </div>

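For reference, the evaluation commands later in this README look for checkpoints under `pretrained/` (the directory added to `.gitignore` in this commit), so fetching one of the files linked above might look like this minimal sketch:

```bash
# Download a checkpoint from the table into the pretrained/ directory
mkdir -p pretrained
wget -P pretrained https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_l_22k_192to384.pth
```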
@@ -211,15 +213,15 @@ We use standard ImageNet dataset, you can download it from http://image-net.org/
 <br>
 <div>

-| name | pretrain | resolution | acc@1 | #param | FLOPs | download |
-| :------------: | :----------: | :--------: | :---: | :----: | :---: | :-----------------------------------------------------------------------------------: |
-| InternImage-T | ImageNet-1K | 224x224 | 83.5 | 30M | 5G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_t_1k_224.pth) \| [cfg](configs/without_lr_decay/internimage_t_1k_224.yaml) |
-| InternImage-S | ImageNet-1K | 224x224 | 84.2 | 50M | 8G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_s_1k_224.pth) \| [cfg](configs/without_lr_decay/internimage_s_1k_224.yaml) |
-| InternImage-B | ImageNet-1K | 224x224 | 84.9 | 97M | 16G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_b_1k_224.pth) \| [cfg](configs/without_lr_decay/internimage_b_1k_224.yaml) |
-| InternImage-L | ImageNet-22K | 384x384 | 87.7 | 223M | 108G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_l_22kto1k_384.pth) \| [cfg](configs/without_lr_decay/internimage_l_22kto1k_384.yaml) |
-| InternImage-XL | ImageNet-22K | 384x384 | 88.0 | 335M | 163G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_xl_22kto1k_384.pth) \| [cfg](configs/without_lr_decay/internimage_xl_22kto1k_384.yaml) |
-| InternImage-H | Joint 427M | 640x640 | 89.6 | 1.08B | 1478G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_22kto1k_640.pth) \| [cfg](configs/without_lr_decay/internimage_h_22kto1k_640.yaml) |
-| InternImage-G | - | 512x512 | 90.1 | 3B | 2700G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_g_22kto1k_512.pth) \| [cfg](configs/without_lr_decay/internimage_g_22kto1k_512.yaml) |
+| name | pretrain | resolution | acc@1 | #param | FLOPs | download |
+| :------------: | :----------: | :--------: | :---: | :----: | :---: | :-----------------------------------------------------------------------------------: |
+| InternImage-T | ImageNet-1K | 224x224 | 83.5 | 30M | 5G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_t_1k_224.pth) \| [cfg](configs/without_lr_decay/internimage_t_1k_224.yaml) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/internimage_t_1k_224.log) |
+| InternImage-S | ImageNet-1K | 224x224 | 84.2 | 50M | 8G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_s_1k_224.pth) \| [cfg](configs/without_lr_decay/internimage_s_1k_224.yaml) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/internimage_s_1k_224.log) |
+| InternImage-B | ImageNet-1K | 224x224 | 84.9 | 97M | 16G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_b_1k_224.pth) \| [cfg](configs/without_lr_decay/internimage_b_1k_224.yaml) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/internimage_b_1k_224.log) |
+| InternImage-L | ImageNet-22K | 384x384 | 87.7 | 223M | 108G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_l_22kto1k_384.pth) \| [cfg](configs/without_lr_decay/internimage_l_22kto1k_384.yaml) |
+| InternImage-XL | ImageNet-22K | 384x384 | 88.0 | 335M | 163G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_xl_22kto1k_384.pth) \| [cfg](configs/without_lr_decay/internimage_xl_22kto1k_384.yaml) |
+| InternImage-H | Joint 427M | 640x640 | 89.6 | 1.08B | 1478G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_22kto1k_640.pth) \| [cfg](configs/without_lr_decay/internimage_h_22kto1k_640.yaml) |
+| InternImage-G | Joint 427M | 512x512 | 90.1 | 3B | 2700G | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_g_22kto1k_512.pth) \| [cfg](configs/without_lr_decay/internimage_g_22kto1k_512.yaml) |

 </div>

@@ -230,9 +232,9 @@ We use standard ImageNet dataset, you can download it from http://image-net.org/
 <br>
 <div>

-| name | pretrain | resolution | acc@1 | #param | download |
-| :-----------: | :--------: | :--------: | :---: | :----: | :-----------------------------------------------------------------------------: |
-| InternImage-H | Joint 427M | 384x384 | 92.6 | 1.1B | [ckpt](<>) \| [cfg](configs/inaturalist2018/internimage_h_22ktoinat18_384.yaml) |
+| name | pretrain | resolution | acc@1 | #param | download |
+| :-----------: | :--------: | :--------: | :---: | :----: | :-----------------------------------------------------------------------------: |
+| InternImage-H | Joint 427M | 384x384 | 92.6 | 1.1B | [ckpt](https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_22ktoinat18_384.pth) \| [cfg](configs/inaturalist2018/internimage_h_22ktoinat18_384.yaml) \| [log](https://huggingface.co/OpenGVLab/InternImage/raw/main/internimage_h_22ktoinat18_384.log) |

 </div>

@@ -267,56 +269,104 @@ python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> --maste

 ## Manage Jobs with Slurm

-For example, to train or evaluate `InternImage` with 8 GPU on a single node, run:
+For example, to train or evaluate `InternImage` with slurm cluster, run:

-`InternImage-T`:
+<details open>
+<summary> InternImage-T (IN-1K) </summary>
+<br>

 ```bash
-# Train for 300 epochs
-GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_t_1k_224.yaml
-# Evaluate on ImageNet-1K
+# Train for 300 epochs with 8 GPUs
+GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_t_1k_224.yaml --batch-size 512
+# Train for 300 epochs with 32 GPUs
+GPUS=32 sh train_in1k.sh <partition> <job-name> configs/internimage_t_1k_224.yaml --batch-size 128
+# Evaluate on ImageNet-1K with 8 GPUs
 GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_t_1k_224.yaml --resume pretrained/internimage_t_1k_224.pth --eval
 ```

-`InternImage-S`:
+</details>
+
+<details>
+<summary> InternImage-S (IN-1K) </summary>
+<br>

 ```bash
-# Train for 300 epochs
-GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_s_1k_224.yaml
-# Evaluate on ImageNet-1K
+# Train for 300 epochs with 8 GPUs
+GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_s_1k_224.yaml --batch-size 512
+# Train for 300 epochs with 32 GPUs
+GPUS=32 sh train_in1k.sh <partition> <job-name> configs/internimage_s_1k_224.yaml --batch-size 128
+# Evaluate on ImageNet-1K with 8 GPUs
 GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_s_1k_224.yaml --resume pretrained/internimage_s_1k_224.pth --eval
 ```

-`InternImage-XL`:
+</details>
+
+<details>
+<summary> InternImage-B (IN-1K) </summary>
+<br>
+
+```bash
+# Train for 300 epochs with 8 GPUs
+GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_b_1k_224.yaml --batch-size 512
+# Train for 300 epochs with 32 GPUs
+GPUS=32 sh train_in1k.sh <partition> <job-name> configs/internimage_b_1k_224.yaml --batch-size 128
+# Evaluate on ImageNet-1K with 8 GPUs
+GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_b_1k_224.yaml --resume pretrained/internimage_b_1k_224.pth --eval
+```
+
+</details>
+
+<details>
+<summary> InternImage-L (IN-22K to IN-1K) </summary>
+<br>

 ```bash
-# Train for 300 epochs
-GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_xl_22kto1k_384.yaml
-# Evaluate on ImageNet-1K
+# Train for 20 epochs with 32 GPUs
+GPUS=32 sh train_in1k.sh <partition> <job-name> configs/internimage_l_22kto1k_384.yaml --batch-size 16
+# Evaluate on ImageNet-1K with 8 GPUs
+GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_l_22kto1k_384.yaml --resume pretrained/internimage_l_22kto1k_384.pth --eval
+```
+
+</details>
+
+<details>
+<summary> InternImage-XL (IN-22K to IN-1K) </summary>
+<br>
+
+```bash
+# Train for 20 epochs with 32 GPUs
+GPUS=32 sh train_in1k.sh <partition> <job-name> configs/internimage_xl_22kto1k_384.yaml --batch-size 16
+# Evaluate on ImageNet-1K with 8 GPUs
 GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_xl_22kto1k_384.yaml --resume pretrained/internimage_xl_22kto1k_384.pth --eval
 ```

-<!--
-### Test pretrained model on ImageNet-22K
+</details>

-For example, to evaluate the `InternImage-L-22k`:
+<details>
+<summary> InternImage-H (IN-22K to IN-1K) </summary>
+<br>

 ```bash
-python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> --master_port 12345 main.py \
---cfg configs/internimage_xl_22k_192to384.yaml --data-path <imagenet-path> [--batch-size <batch-size-per-gpu> --output <output-directory>] \
---resume internimage_xl_22k_192to384.pth --eval
-``` -->
+# Train for 20 epochs with 32 GPUs
+GPUS=32 sh train_in1k.sh <partition> <job-name> configs/internimage_h_22kto1k_640.yaml --batch-size 16
+# Evaluate on ImageNet-1K with 8 GPUs
+GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_h_22kto1k_640.yaml --resume pretrained/internimage_h_22kto1k_640.pth --eval
+```

-<!-- ### Fine-tuning from a ImageNet-22K pretrained model
+</details>

-For example, to fine-tune a `InternImage-XL-22k` model pretrained on ImageNet-22K:
+<details>
+<summary> InternImage-G (IN-22K to IN-1K) </summary>
+<br>

-```bashs
-GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_image_.yaml --pretrained intern_image_b.pth --eval
-python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py \
---cfg configs/.yaml --pretrained swin_base_patch4_window7_224_22k.pth \
---data-path <imagenet-path> --batch-size 64 --accumulation-steps 2 [--use-checkpoint]
-``` -->
+```bash
+# Train for 20 epochs with 64 GPUs
+GPUS=64 sh train_in1k.sh <partition> <job-name> configs/internimage_g_22kto1k_512.yaml --batch-size 8
+# Evaluate on ImageNet-1K with 8 GPUs
+GPUS=8 sh train_in1k.sh <partition> <job-name> configs/internimage_g_22kto1k_512.yaml --resume pretrained/internimage_g_22kto1k_512.pth --eval
+```
+
+</details>

 ## Training with DeepSpeed

@@ -394,7 +444,7 @@ python extract_feature.py --cfg configs/internimage_t_1k_224.yaml --img b.png --
 Install `mmdeploy` at first:

 ```shell
-pip
+pip install mmdeploy==0.14.0
 ```

 To export `InternImage-T` from PyTorch to ONNX, run:

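For reference, exports through `mmdeploy` generally go via its `tools/deploy.py` entry point; the sketch below uses placeholder config paths rather than the repository's actual files, so the README's own export section remains the authoritative command:

```bash
# Sketch only: <deploy-cfg> and <model-cfg> are placeholders for an mmdeploy
# deployment config and the model config; point them at the real files.
python <path-to-mmdeploy>/tools/deploy.py \
    <deploy-cfg>.py \
    <model-cfg>.py \
    pretrained/internimage_t_1k_224.pth \
    demo.jpg \
    --work-dir onnx_export \
    --device cuda:0
```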
classification/configs/inaturalist2018/internimage_h_22ktoinat18_384.yaml

Lines changed: 1 addition & 5 deletions
@@ -28,19 +28,15 @@ MODEL:
   PRETRAINED: 'pretrained/internimage_h_jointto22k_384.pth'
 TRAIN:
   EMA:
-    ENABLE: true
+    ENABLE: false
     DECAY: 0.9999
   EPOCHS: 100
   WARMUP_EPOCHS: 0
   WEIGHT_DECAY: 0.05
   BASE_LR: 2e-05 # 512
   WARMUP_LR: .0
   MIN_LR: .0
-  LR_LAYER_DECAY: true
-  LR_LAYER_DECAY_RATIO: 0.9
   USE_CHECKPOINT: true
   RAND_INIT_FT_HEAD: true
-  OPTIMIZER:
-    DCN_LR_MUL: 0.1
 AMP_OPT_LEVEL: O0
 EVAL_FREQ: 1

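For reference, a quick way to confirm the edited config reads back as intended; this sketch assumes the nesting shown above and that PyYAML is available:

```bash
# Print the EMA block and check that the removed keys are really gone (requires PyYAML)
python -c "
import yaml
cfg = yaml.safe_load(open('classification/configs/inaturalist2018/internimage_h_22ktoinat18_384.yaml'))
print('EMA:', cfg['TRAIN']['EMA'])
print('LR_LAYER_DECAY removed:', 'LR_LAYER_DECAY' not in cfg['TRAIN'])
print('OPTIMIZER removed:', 'OPTIMIZER' not in cfg['TRAIN'])
"
```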