---
layout: hub_detail
background-class: hub-background
body-class: hub
title: ResNext WSL
summary: ResNext models trained with billion-scale weakly-supervised data.
category: researchers
image: wsl-image.png
author: Facebook AI
tags: [vision]
github-link: https://github.com/facebookresearch/WSL-Images/blob/master/hubconf.py
github-id: facebookresearch/WSL-Images
featured_image_1: wsl-image.png
featured_image_2: no-image
accelerator: cuda-optional
order: 10
demo-model-link: https://huggingface.co/spaces/pytorch/ResNext_WSL
---

```python
import torch
model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x8d_wsl')
# or
# model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x16d_wsl')
# or
# model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x32d_wsl')
# or
# model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x48d_wsl')
model.eval()
```

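The four entry points above are defined in the repository's `hubconf.py`. If you would rather discover them programmatically than copy names by hand, `torch.hub.list` queries a repo for its entry points; a minimal sketch (requires network access):

```python
import torch

# Print every model entry point exposed by the WSL-Images hubconf
print(torch.hub.list('facebookresearch/WSL-Images'))
```
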
All pre-trained models expect input images normalized in the same way,
i.e. mini-batches of 3-channel RGB images of shape `(3 x H x W)`, where `H` and `W` are expected to be at least `224`.
The images have to be loaded into a range of `[0, 1]` and then normalized using `mean = [0.485, 0.456, 0.406]`
and `std = [0.229, 0.224, 0.225]`.

Here's a sample execution.

```python
# Download an example image from the pytorch website
import urllib.request
url, filename = ("https://github.com/pytorch/hub/raw/master/images/dog.jpg", "dog.jpg")
urllib.request.urlretrieve(url, filename)
```

```python
# sample execution (requires torchvision)
from PIL import Image
from torchvision import transforms
input_image = Image.open(filename)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0)  # create a mini-batch as expected by the model

# move the input and model to GPU for speed if available
if torch.cuda.is_available():
    input_batch = input_batch.to('cuda')
    model.to('cuda')

with torch.no_grad():
    output = model(input_batch)
# Tensor of shape 1000, with confidence scores over ImageNet's 1000 classes
print(output[0])
# The output has unnormalized scores. To get probabilities, you can run a softmax on it.
print(torch.nn.functional.softmax(output[0], dim=0))
```

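To turn these probabilities into human-readable predictions, map the top scores to class names. A minimal sketch, assuming the label file published in the pytorch/hub repository (`imagenet_classes.txt`, one class name per line) and the `model`/`output` from the blocks above:

```python
# Download the ImageNet class labels
import urllib.request
urllib.request.urlretrieve(
    "https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt",
    "imagenet_classes.txt")
with open("imagenet_classes.txt") as f:
    categories = [line.strip() for line in f]

# Show the top 5 predictions with their probabilities
probabilities = torch.nn.functional.softmax(output[0], dim=0)
top5_prob, top5_idx = torch.topk(probabilities, 5)
for prob, idx in zip(top5_prob, top5_idx):
    print(categories[idx], prob.item())
```
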
### Model Description
The provided ResNeXt models are pre-trained in a weakly-supervised fashion on **940 million** public images with 1.5K hashtags matching 1000 ImageNet-1K synsets, followed by fine-tuning on the ImageNet-1K dataset. Please refer to ["Exploring the Limits of Weakly Supervised Pretraining"](https://arxiv.org/abs/1805.00932), presented at ECCV 2018, for the details of model training.

We provide four models of different capacities:

| Model              | #Parameters | FLOPs | Top-1 Acc. | Top-5 Acc. |
| ------------------ | :---------: | :---: | :--------: | :--------: |
| ResNeXt-101 32x8d  | 88M         | 16B   | 82.2       | 96.4       |
| ResNeXt-101 32x16d | 193M        | 36B   | 84.2       | 97.2       |
| ResNeXt-101 32x32d | 466M        | 87B   | 85.1       | 97.5       |
| ResNeXt-101 32x48d | 829M        | 153B  | 85.4       | 97.6       |

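As a quick sanity check, you can count the parameters of whichever variant you load and compare against the table above; a minimal sketch (the larger checkpoints are multi-gigabyte downloads, so the 32x8d variant is used here):

```python
import torch

model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x8d_wsl')
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.0f}M parameters")  # roughly 88M for 32x8d
```
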
Our models significantly improve classification accuracy on ImageNet compared to training from scratch. **We achieve state-of-the-art accuracy of 85.4% on ImageNet with our ResNeXt-101 32x48d model.**

### References

- [Exploring the Limits of Weakly Supervised Pretraining](https://arxiv.org/abs/1805.00932)