@@ -10,7 +10,7 @@ This folder contains the implementation of the InternImage for image classificat
1010- [ Evaluation] ( #evaluation )
1111- [ Training from Scratch on ImageNet-1K] ( #training-from-scratch-on-imagenet-1k )
1212- [ Manage Jobs with Slurm] ( #manage-jobs-with-slurm )
13- - [ Training with Deepspeed ] ( #training-with-deepspeed )
13+ - [ Training with DeepSpeed ] ( #training-with-deepspeed )
1414- [ Extracting Intermediate Features] ( #extracting-intermediate-features )
1515- [ Export] ( #export )
1616
@@ -47,6 +47,7 @@ pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.p
4747``` bash
4848pip install -U openmim
4949mim install mmcv-full==1.5.0
50+ mim install mmsegmentation==0.27.0
5051pip install timm==0.6.11 mmdet==2.28.1
5152```
5253
@@ -59,7 +60,7 @@ pip install numpy==1.26.4
5960pip install pydantic==1.10.13
6061```
6162
62- - Compiling CUDA operators
63+ - Compile CUDA operators
6364
6465Before compiling, please use the ` nvcc -V ` command to check whether your ` nvcc ` version matches the CUDA version of PyTorch.
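
The version check described above can be sketched as follows. This is a minimal sketch, assuming `nvcc` and a `python` with PyTorch installed are both on `PATH`; the helper names `nvcc_release` and `versions_match` are hypothetical and not part of this repository.

```bash
# Hypothetical helper: extract "major.minor" from `nvcc -V` output,
# e.g. "Cuda compilation tools, release 11.3, V11.3.109" -> "11.3".
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: succeeds only when both version strings are identical.
versions_match() {
  [ "$1" = "$2" ]
}

# Run the comparison only when both tools are available.
if command -v nvcc >/dev/null 2>&1 && command -v python >/dev/null 2>&1; then
  nvcc_ver=$(nvcc -V | nvcc_release || true)
  # torch.version.cuda is the CUDA version PyTorch was built against.
  torch_ver=$(python -c "import torch; print(torch.version.cuda)" 2>/dev/null || true)
  if versions_match "$nvcc_ver" "$torch_ver"; then
    echo "OK: nvcc $nvcc_ver matches PyTorch CUDA $torch_ver"
  else
    echo "Mismatch: nvcc '$nvcc_ver' vs PyTorch CUDA '$torch_ver'"
  fi
fi
```

With the `cu113` wheels pinned earlier in this README, both values would be expected to read `11.3`.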
6566
@@ -79,8 +80,9 @@ We provide the following ways to prepare data:
7980
8081<details open >
8182 <summary >Standard ImageNet-1K</summary >
83+ <br >
8284
83- We use standard ImageNet dataset, you can download it from http://image-net.org/ .
85+ - We use the standard ImageNet dataset, which you can download from http://image-net.org/ .
8486
8587- For standard folder dataset, move validation images to labeled sub-folders. The file structure should look like:
8688
@@ -195,12 +197,12 @@ We use standard ImageNet dataset, you can download it from http://image-net.org/
195197<br >
196198<div >
197199
198- | name | pretrain | pre-training resolution | #param | download |
199- | :------------: | :----------: | :---------------------- : | :----: | :---------------------------------------------------------------------------------------------------: |
200- | InternImage-L | ImageNet-22K | 384x384 | 223M | [ ckpt] ( https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_l_22k_192to384.pth ) |
201- | InternImage-XL | ImageNet-22K | 384x384 | 335M | [ ckpt] ( https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_xl_22k_192to384.pth ) |
202- | InternImage-H | Joint 427M | 384x384 | 1.08B | [ ckpt] ( https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_jointto22k_384.pth ) |
203- | InternImage-G | - | 384x384 | 3B | [ ckpt] ( https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_g_pretrainto22k_384.pth ) |
200+ | name | pretrain | resolution | #param | download |
201+ | :------------: | :----------: | :--------: | :----: | :---------------------------------------------------------------------------------------------------: |
202+ | InternImage-L | ImageNet-22K | 384x384 | 223M | [ ckpt] ( https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_l_22k_192to384.pth ) |
203+ | InternImage-XL | ImageNet-22K | 384x384 | 335M | [ ckpt] ( https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_xl_22k_192to384.pth ) |
204+ | InternImage-H | Joint 427M | 384x384 | 1.08B | [ ckpt] ( https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_jointto22k_384.pth ) |
205+ | InternImage-G | Joint 427M | 384x384 | 3B | [ ckpt] ( https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_g_pretrainto22k_384.pth ) |
204206
205207</div >
206208
@@ -211,15 +213,15 @@ We use standard ImageNet dataset, you can download it from http://image-net.org/
211213<br >
212214<div >
213215
214- | name | pretrain | resolution | acc@1 | #param | FLOPs | download |
215- | :------------: | :----------: | :--------: | :---: | :----: | :---: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------: |
216- | InternImage-T | ImageNet-1K | 224x224 | 83.5 | 30M | 5G | [ ckpt] ( https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_t_1k_224.pth ) \| [ cfg] ( configs/without_lr_decay/internimage_t_1k_224.yaml ) |
217- | InternImage-S | ImageNet-1K | 224x224 | 84.2 | 50M | 8G | [ ckpt] ( https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_s_1k_224.pth ) \| [ cfg] ( configs/without_lr_decay/internimage_s_1k_224.yaml ) |
218- | InternImage-B | ImageNet-1K | 224x224 | 84.9 | 97M | 16G | [ ckpt] ( https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_b_1k_224.pth ) \| [ cfg] ( configs/without_lr_decay/internimage_b_1k_224.yaml ) |
219- | InternImage-L | ImageNet-22K | 384x384 | 87.7 | 223M | 108G | [ ckpt] ( https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_l_22kto1k_384.pth ) \| [ cfg] ( configs/without_lr_decay/internimage_l_22kto1k_384.yaml ) |
220- | InternImage-XL | ImageNet-22K | 384x384 | 88.0 | 335M | 163G | [ ckpt] ( https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_xl_22kto1k_384.pth ) \| [ cfg] ( configs/without_lr_decay/internimage_xl_22kto1k_384.yaml ) |
221- | InternImage-H | Joint 427M | 640x640 | 89.6 | 1.08B | 1478G | [ ckpt] ( https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_22kto1k_640.pth ) \| [ cfg] ( configs/without_lr_decay/internimage_h_22kto1k_640.yaml ) |
222- | InternImage-G | - | 512x512 | 90.1 | 3B | 2700G | [ ckpt] ( https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_g_22kto1k_512.pth ) \| [ cfg] ( configs/without_lr_decay/internimage_g_22kto1k_512.yaml ) |
216+ | name | pretrain | resolution | acc@1 | #param | FLOPs | download |
217+ | :------------: | :----------: | :--------: | :---: | :----: | :---: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- : |
218+ | InternImage-T | ImageNet-1K | 224x224 | 83.5 | 30M | 5G | [ ckpt] ( https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_t_1k_224.pth ) \| [ cfg] ( configs/without_lr_decay/internimage_t_1k_224.yaml ) \| [ log ] ( https://huggingface.co/OpenGVLab/InternImage/raw/main/internimage_t_1k_224.log ) |
219+ | InternImage-S | ImageNet-1K | 224x224 | 84.2 | 50M | 8G | [ ckpt] ( https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_s_1k_224.pth ) \| [ cfg] ( configs/without_lr_decay/internimage_s_1k_224.yaml ) \| [ log ] ( https://huggingface.co/OpenGVLab/InternImage/raw/main/internimage_s_1k_224.log ) |
220+ | InternImage-B | ImageNet-1K | 224x224 | 84.9 | 97M | 16G | [ ckpt] ( https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_b_1k_224.pth ) \| [ cfg] ( configs/without_lr_decay/internimage_b_1k_224.yaml ) \| [ log ] ( https://huggingface.co/OpenGVLab/InternImage/raw/main/internimage_b_1k_224.log ) |
221+ | InternImage-L | ImageNet-22K | 384x384 | 87.7 | 223M | 108G | [ ckpt] ( https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_l_22kto1k_384.pth ) \| [ cfg] ( configs/without_lr_decay/internimage_l_22kto1k_384.yaml ) |
222+ | InternImage-XL | ImageNet-22K | 384x384 | 88.0 | 335M | 163G | [ ckpt] ( https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_xl_22kto1k_384.pth ) \| [ cfg] ( configs/without_lr_decay/internimage_xl_22kto1k_384.yaml ) |
223+ | InternImage-H | Joint 427M | 640x640 | 89.6 | 1.08B | 1478G | [ ckpt] ( https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_22kto1k_640.pth ) \| [ cfg] ( configs/without_lr_decay/internimage_h_22kto1k_640.yaml ) |
224+ | InternImage-G | Joint 427M | 512x512 | 90.1 | 3B | 2700G | [ ckpt] ( https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_g_22kto1k_512.pth ) \| [ cfg] ( configs/without_lr_decay/internimage_g_22kto1k_512.yaml ) |
223225
224226</div >
225227
@@ -230,9 +232,9 @@ We use standard ImageNet dataset, you can download it from http://image-net.org/
230232<br >
231233<div >
232234
233- | name | pretrain | resolution | acc@1 | #param | download |
234- | :-----------: | :--------: | :--------: | :---: | :----: | :-----------------------------------------------------------------------------: |
235- | InternImage-H | Joint 427M | 384x384 | 92.6 | 1.1B | [ ckpt] ( < > ) \| [ cfg] ( configs/inaturalist2018/internimage_h_22ktoinat18_384.yaml ) |
235+ | name | pretrain | resolution | acc@1 | #param | download |
236+ | :-----------: | :--------: | :--------: | :---: | :----: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ : |
237+ | InternImage-H | Joint 427M | 384x384 | 92.6 | 1.1B | [ ckpt] ( https://huggingface.co/OpenGVLab/InternImage/resolve/main/internimage_h_22ktoinat18_384.pth ) \| [ cfg] ( configs/inaturalist2018/internimage_h_22ktoinat18_384.yaml ) \| [ log ] ( https://huggingface.co/OpenGVLab/InternImage/raw/main/internimage_h_22ktoinat18_384.log ) |
236238
237239</div >
238240
@@ -267,56 +269,104 @@ python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> --maste
267269
268270## Manage Jobs with Slurm
269271
270- For example, to train or evaluate ` InternImage ` with 8 GPU on a single node , run:
272+ For example, to train or evaluate ` InternImage ` on a Slurm cluster, run:
271273
272- ` InternImage-T ` :
274+ <details open >
275+ <summary > InternImage-T (IN-1K) </summary >
276+ <br >
273277
274278``` bash
275- # Train for 300 epochs
276- GPUS=8 sh train_in1k.sh < partition> < job-name> configs/internimage_t_1k_224.yaml
277- # Evaluate on ImageNet-1K
279+ # Train for 300 epochs with 8 GPUs
280+ GPUS=8 sh train_in1k.sh < partition> < job-name> configs/internimage_t_1k_224.yaml --batch-size 512
281+ # Train for 300 epochs with 32 GPUs
282+ GPUS=32 sh train_in1k.sh < partition> < job-name> configs/internimage_t_1k_224.yaml --batch-size 128
283+ # Evaluate on ImageNet-1K with 8 GPUs
278284GPUS=8 sh train_in1k.sh < partition> < job-name> configs/internimage_t_1k_224.yaml --resume pretrained/internimage_t_1k_224.pth --eval
279285```
280286
281- ` InternImage-S ` :
287+ </details >
288+
289+ <details >
290+ <summary > InternImage-S (IN-1K) </summary >
291+ <br >
282292
283293``` bash
284- # Train for 300 epochs
285- GPUS=8 sh train_in1k.sh < partition> < job-name> configs/internimage_s_1k_224.yaml
286- # Evaluate on ImageNet-1K
294+ # Train for 300 epochs with 8 GPUs
295+ GPUS=8 sh train_in1k.sh < partition> < job-name> configs/internimage_s_1k_224.yaml --batch-size 512
296+ # Train for 300 epochs with 32 GPUs
297+ GPUS=32 sh train_in1k.sh < partition> < job-name> configs/internimage_s_1k_224.yaml --batch-size 128
298+ # Evaluate on ImageNet-1K with 8 GPUs
287299GPUS=8 sh train_in1k.sh < partition> < job-name> configs/internimage_s_1k_224.yaml --resume pretrained/internimage_s_1k_224.pth --eval
288300```
289301
290- ` InternImage-XL ` :
302+ </details >
303+
304+ <details >
305+ <summary > InternImage-B (IN-1K) </summary >
306+ <br >
307+
308+ ``` bash
309+ # Train for 300 epochs with 8 GPUs
310+ GPUS=8 sh train_in1k.sh < partition> < job-name> configs/internimage_b_1k_224.yaml --batch-size 512
311+ # Train for 300 epochs with 32 GPUs
312+ GPUS=32 sh train_in1k.sh < partition> < job-name> configs/internimage_b_1k_224.yaml --batch-size 128
313+ # Evaluate on ImageNet-1K with 8 GPUs
314+ GPUS=8 sh train_in1k.sh < partition> < job-name> configs/internimage_b_1k_224.yaml --resume pretrained/internimage_b_1k_224.pth --eval
315+ ```
316+
317+ </details >
318+
319+ <details >
320+ <summary > InternImage-L (IN-22K to IN-1K) </summary >
321+ <br >
291322
292323``` bash
293- # Train for 300 epochs
294- GPUS=8 sh train_in1k.sh < partition> < job-name> configs/internimage_xl_22kto1k_384.yaml
295- # Evaluate on ImageNet-1K
324+ # Train for 20 epochs with 32 GPUs
325+ GPUS=32 sh train_in1k.sh < partition> < job-name> configs/internimage_l_22kto1k_384.yaml --batch-size 16
326+ # Evaluate on ImageNet-1K with 8 GPUs
327+ GPUS=8 sh train_in1k.sh < partition> < job-name> configs/internimage_l_22kto1k_384.yaml --resume pretrained/internimage_l_22kto1k_384.pth --eval
328+ ```
329+
330+ </details >
331+
332+ <details >
333+ <summary > InternImage-XL (IN-22K to IN-1K) </summary >
334+ <br >
335+
336+ ``` bash
337+ # Train for 20 epochs with 32 GPUs
338+ GPUS=32 sh train_in1k.sh < partition> < job-name> configs/internimage_xl_22kto1k_384.yaml --batch-size 16
339+ # Evaluate on ImageNet-1K with 8 GPUs
296340GPUS=8 sh train_in1k.sh < partition> < job-name> configs/internimage_xl_22kto1k_384.yaml --resume pretrained/internimage_xl_22kto1k_384.pth --eval
297341```
298342
299- <!--
300- ### Test pretrained model on ImageNet-22K
343+ </details >
301344
302- For example, to evaluate the `InternImage-L-22k`:
345+ <details >
346+ <summary > InternImage-H (IN-22K to IN-1K) </summary >
347+ <br >
303348
304349``` bash
305- python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> --master_port 12345 main.py \
306- --cfg configs/internimage_xl_22k_192to384.yaml --data-path <imagenet-path> [--batch-size <batch-size-per-gpu> --output <output-directory>] \
307- --resume internimage_xl_22k_192to384.pth --eval
308- ``` -->
350+ # Train for 20 epochs with 32 GPUs
351+ GPUS=32 sh train_in1k.sh < partition> < job-name> configs/internimage_h_22kto1k_640.yaml --batch-size 16
352+ # Evaluate on ImageNet-1K with 8 GPUs
353+ GPUS=8 sh train_in1k.sh < partition> < job-name> configs/internimage_h_22kto1k_640.yaml --resume pretrained/internimage_h_22kto1k_640.pth --eval
354+ ```
309355
310- <!-- ### Fine-tuning from a ImageNet-22K pretrained model
356+ </details >
311357
312- For example, to fine-tune a `InternImage-XL-22k` model pretrained on ImageNet-22K:
358+ <details >
359+ <summary > InternImage-G (IN-22K to IN-1K) </summary >
360+ <br >
313361
314- ```bashs
315- GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_image_.yaml --pretrained intern_image_b.pth --eval
316- python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py \
317- --cfg configs/.yaml --pretrained swin_base_patch4_window7_224_22k.pth \
318- --data-path <imagenet-path> --batch-size 64 --accumulation-steps 2 [--use-checkpoint]
319- ``` -->
362+ ``` bash
363+ # Train for 20 epochs with 64 GPUs
364+ GPUS=64 sh train_in1k.sh < partition> < job-name> configs/internimage_g_22kto1k_512.yaml --batch-size 8
365+ # Evaluate on ImageNet-1K with 8 GPUs
366+ GPUS=8 sh train_in1k.sh < partition> < job-name> configs/internimage_g_22kto1k_512.yaml --resume pretrained/internimage_g_22kto1k_512.pth --eval
367+ ```
368+
369+ </details >
320370
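
The Slurm commands above vary `--batch-size` with the GPU count. Assuming `--batch-size` is the per-GPU batch size (the commented-out example removed in this diff labels it `<batch-size-per-gpu>`) and that no gradient accumulation is used, the global batch size is the product of the two. The helper `global_batch` below is hypothetical and only illustrates the arithmetic.

```bash
# Hypothetical helper (not part of the repo):
# global batch size = number of GPUs x per-GPU batch size.
global_batch() {
  echo $(( $1 * $2 ))
}

global_batch 8 512    # IN-1K training on 8 GPUs
global_batch 32 128   # IN-1K training on 32 GPUs
global_batch 32 16    # 22K-to-1K fine-tuning on 32 GPUs (L, XL, H)
global_batch 64 8     # 22K-to-1K fine-tuning on 64 GPUs (G)
```

Under these assumptions the two IN-1K settings give the same global batch size, and so do the two fine-tuning settings, so changing the GPU count should not change the effective batch.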
321371## Training with DeepSpeed
322372
@@ -394,7 +444,7 @@ python extract_feature.py --cfg configs/internimage_t_1k_224.yaml --img b.png --
394444Install ` mmdeploy ` first:
395445
396446``` shell
397- pip
447+ pip install mmdeploy==0.14.0
398448```
399449
400450To export ` InternImage-T ` from PyTorch to ONNX, run: