diff --git a/facebookresearch_semi-supervised-ImageNet1K-models_resnext.md b/facebookresearch_semi-supervised-ImageNet1K-models_resnext.md
index e82de1cc..466481cc 100644
--- a/facebookresearch_semi-supervised-ImageNet1K-models_resnext.md
+++ b/facebookresearch_semi-supervised-ImageNet1K-models_resnext.md
@@ -85,7 +85,7 @@ This project includes the semi-supervised and semi-weakly supervised ImageNet mo
 
 "Semi-supervised" (SSL) ImageNet models are pre-trained on a subset of unlabeled YFCC100M public image dataset and fine-tuned with the ImageNet1K training dataset, as described by the semi-supervised training framework in the paper mentioned above. In this case, the high capacity teacher model was trained only with labeled examples.
 
-"Semi-weakly" supervised (SWSL) ImageNet models are pre-trained on **940 million** public images with 1.5K hashtags matching with 1000 ImageNet1K synsets, followed by fine-tuning on ImageNet1K dataset. In this case, the associated hashtags are only used for building a better teacher model. During training the student model, those hashtags are ingored and the student model is pretrained with a subset of 64M images selected by the teacher model from the same 940 million public image dataset.
+"Semi-weakly" supervised (SWSL) ImageNet models are pre-trained on **940 million** public images with 1.5K hashtags matching with 1000 ImageNet1K synsets, followed by fine-tuning on ImageNet1K dataset. In this case, the associated hashtags are only used for building a better teacher model. During training the student model, those hashtags are ignored and the student model is pretrained with a subset of 64M images selected by the teacher model from the same 940 million public image dataset.
 
 Semi-weakly supervised ResNet and ResNext models provided in the table below significantly improve the top-1 accuracy on the ImageNet validation set compared to training from scratch or other training mechanisms introduced in the literature as of September 2019. For example, **We achieve state-of-the-art accuracy of 81.2% on ImageNet for the widely used/adopted ResNet-50 model architecture**.
 
diff --git a/nvidia_deeplearningexamples_efficientnet.md b/nvidia_deeplearningexamples_efficientnet.md
index 0e75ff2e..223a4f9e 100644
--- a/nvidia_deeplearningexamples_efficientnet.md
+++ b/nvidia_deeplearningexamples_efficientnet.md
@@ -108,7 +108,7 @@ for uri, result in zip(uris, results):
 ```
 
 ### Details
-For detailed information on model input and output, training recipies, inference and performance visit:
+For detailed information on model input and output, training recipes, inference and performance visit:
 [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Classification/ConvNets/efficientnet)
 and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:efficientnet_for_pytorch)
 
@@ -123,4 +123,4 @@ and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:efficientnet_for_py
 - [pretrained model on NGC (efficientnet-widese-b4)](https://ngc.nvidia.com/catalog/models/nvidia:efficientnet_widese_b4_pyt_amp)
 - [pretrained, quantized model on NGC (efficientnet-widese-b0)](https://ngc.nvidia.com/catalog/models/nvidia:efficientnet_widese_b0_pyt_amp)
 - [pretrained, quantized model on NGC (efficientnet-widese-b4)](https://ngc.nvidia.com/catalog/models/nvidia:efficientnet_widese_b4_pyt_amp)
-
\ No newline at end of file
+
diff --git a/nvidia_deeplearningexamples_fastpitch.md b/nvidia_deeplearningexamples_fastpitch.md
index 271f1777..7d86ff3f 100644
--- a/nvidia_deeplearningexamples_fastpitch.md
+++ b/nvidia_deeplearningexamples_fastpitch.md
@@ -41,7 +41,7 @@ In the example below:
 - HiFiGAN generates sound given the mel spectrogram
 - the output sound is saved in an 'audio.wav' file
 
-To run the example you need some extra python packages installed. These are needed for preprocessing of text and audio, as well as for display and input/output handling. Finally, for better performance of FastPitch model, we download the CMU pronounciation dictionary.
+To run the example you need some extra python packages installed. These are needed for preprocessing of text and audio, as well as for display and input/output handling. Finally, for better performance of the FastPitch model, we download the CMU pronunciation dictionary.
 ```bash
 apt-get update
 apt-get install -y libsndfile1 wget
@@ -99,7 +99,7 @@ Load text processor.
 ```python
 tp = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_textprocessing_utils', cmudict_path="cmudict-0.7b", heteronyms_path="heteronyms")
 ```
-Set the text to be synthetized, prepare input and set additional generation parameters.
+Set the text to be synthesized, prepare input and set additional generation parameters.
 ```python
 text = "Say this smoothly, to prove you are not a robot."
 ```
@@ -136,7 +136,7 @@ plt.ylabel('frequency')
 _=plt.title('Spectrogram')
 ```
 
-Syntesize audio.
+Synthesize audio.
 ```python
 audio_numpy = audios[0].cpu().numpy()
 Audio(audio_numpy, rate=22050)
@@ -149,7 +149,7 @@ write("audio.wav", vocoder_train_setup['sampling_rate'], audio_numpy)
 ```
 
 ### Details
-For detailed information on model input and output, training recipies, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/HiFiGAN) and/or [NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/resources/fastpitch_pyt)
+For detailed information on model input and output, training recipes, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/HiFiGAN) and/or [NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/resources/fastpitch_pyt)
 
 ### References
 
diff --git a/nvidia_deeplearningexamples_gpunet.md b/nvidia_deeplearningexamples_gpunet.md
index cc526df8..d2591dd8 100644
--- a/nvidia_deeplearningexamples_gpunet.md
+++ b/nvidia_deeplearningexamples_gpunet.md
@@ -122,7 +122,7 @@ for uri, result in zip(uris, results):
 ```
 
 ### Details
-For detailed information on model input and output, training recipies, inference and performance visit:
+For detailed information on model input and output, training recipes, inference and performance visit:
 [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Classification/GPUNet)
 
 ### References
diff --git a/nvidia_deeplearningexamples_hifigan.md b/nvidia_deeplearningexamples_hifigan.md
index 9e5f669c..e925f876 100644
--- a/nvidia_deeplearningexamples_hifigan.md
+++ b/nvidia_deeplearningexamples_hifigan.md
@@ -34,7 +34,7 @@ In the example below:
 - HiFiGAN generates sound given the mel spectrogram
 - the output sound is saved in an 'audio.wav' file
 
-To run the example you need some extra python packages installed. These are needed for preprocessing of text and audio, as well as for display and input/output handling. Finally, for better performance of FastPitch model, we download the CMU pronounciation dictionary.
+To run the example you need some extra python packages installed. These are needed for preprocessing of text and audio, as well as for display and input/output handling. Finally, for better performance of the FastPitch model, we download the CMU pronunciation dictionary.
 ```bash
 pip install numpy scipy librosa unidecode inflect librosa matplotlib==3.6.3
 apt-get update
@@ -92,7 +92,7 @@ Load text processor.
 ```python
 tp = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_textprocessing_utils', cmudict_path="cmudict-0.7b", heteronyms_path="heteronyms")
 ```
-Set the text to be synthetized, prepare input and set additional generation parameters.
+Set the text to be synthesized, prepare input and set additional generation parameters.
 ```python
 text = "Say this smoothly, to prove you are not a robot."
 ```
@@ -129,7 +129,7 @@ plt.ylabel('frequency')
 _=plt.title('Spectrogram')
 ```
 
-Syntesize audio.
+Synthesize audio.
 ```python
 audio_numpy = audios[0].cpu().numpy()
 Audio(audio_numpy, rate=22050)
@@ -142,7 +142,7 @@ write("audio.wav", vocoder_train_setup['sampling_rate'], audio_numpy)
 ```
 
 ### Details
-For detailed information on model input and output, training recipies, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/HiFiGAN) and/or [NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/resources/hifigan_pyt)
+For detailed information on model input and output, training recipes, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/HiFiGAN) and/or [NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/resources/hifigan_pyt)
 
 ### References
 
@@ -150,4 +150,4 @@ For detailed information on model input and output, training recipies, inference
 - [Original implementation](https://github.com/jik876/hifi-gan)
 - [FastPitch on NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/resources/fastpitch_pyt)
 - [HiFi-GAN on NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/resources/hifigan_pyt)
- - [FastPitch and HiFi-GAN on github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/HiFi-GAN)
\ No newline at end of file
+ - [FastPitch and HiFi-GAN on github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/HiFi-GAN)
diff --git a/nvidia_deeplearningexamples_resnet50.md b/nvidia_deeplearningexamples_resnet50.md
index 6dea0a3c..9b58df7b 100644
--- a/nvidia_deeplearningexamples_resnet50.md
+++ b/nvidia_deeplearningexamples_resnet50.md
@@ -105,7 +105,7 @@ for uri, result in zip(uris, results):
 
 ### Details
 
-For detailed information on model input and output, training recipies, inference and performance visit:
+For detailed information on model input and output, training recipes, inference and performance visit:
 [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Classification/ConvNets/resnet50v1.5)
 and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:resnet_50_v1_5_for_pytorch)
 
diff --git a/nvidia_deeplearningexamples_resnext.md b/nvidia_deeplearningexamples_resnext.md
index 5f3c07d1..a889e439 100644
--- a/nvidia_deeplearningexamples_resnext.md
+++ b/nvidia_deeplearningexamples_resnext.md
@@ -107,7 +107,7 @@ for uri, result in zip(uris, results):
 ```
 
 ### Details
-For detailed information on model input and output, training recipies, inference and performance visit:
+For detailed information on model input and output, training recipes, inference and performance visit:
 [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Classification/ConvNets/resnext101-32x4d)
 and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:resnext_for_pytorch)
 
diff --git a/nvidia_deeplearningexamples_se-resnext.md b/nvidia_deeplearningexamples_se-resnext.md
index 4737f88f..dd4fbe14 100644
--- a/nvidia_deeplearningexamples_se-resnext.md
+++ b/nvidia_deeplearningexamples_se-resnext.md
@@ -107,7 +107,7 @@ for uri, result in zip(uris, results):
 ```
 
 ### Details
-For detailed information on model input and output, training recipies, inference and performance visit:
+For detailed information on model input and output, training recipes, inference and performance visit:
 [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Classification/ConvNets/se-resnext101-32x4d)
 and/or
 [NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/resources/se_resnext_for_pytorch).
diff --git a/nvidia_deeplearningexamples_ssd.md b/nvidia_deeplearningexamples_ssd.md
index 21aa5661..0b114921 100644
--- a/nvidia_deeplearningexamples_ssd.md
+++ b/nvidia_deeplearningexamples_ssd.md
@@ -123,7 +123,7 @@ plt.show()
 
 ### Details
 For detailed information on model input and output,
-training recipies, inference and performance visit:
+training recipes, inference and performance visit:
 [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Detection/SSD) and/or
 [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:ssd_for_pytorch)
 
diff --git a/nvidia_deeplearningexamples_tacotron2.md b/nvidia_deeplearningexamples_tacotron2.md
index 5dc15166..d2427d31 100644
--- a/nvidia_deeplearningexamples_tacotron2.md
+++ b/nvidia_deeplearningexamples_tacotron2.md
@@ -89,7 +89,7 @@ Audio(audio_numpy, rate=rate)
 ```
 
 ### Details
-For detailed information on model input and output, training recipies, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2) and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:tacotron_2_and_waveglow_for_pytorch)
+For detailed information on model input and output, training recipes, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2) and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:tacotron_2_and_waveglow_for_pytorch)
 
 ### References
 
diff --git a/nvidia_deeplearningexamples_waveglow.md b/nvidia_deeplearningexamples_waveglow.md
index e899533c..0ebffcfc 100644
--- a/nvidia_deeplearningexamples_waveglow.md
+++ b/nvidia_deeplearningexamples_waveglow.md
@@ -91,7 +91,7 @@ Audio(audio_numpy, rate=rate)
 ```
 
 ### Details
-For detailed information on model input and output, training recipies, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2) and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:tacotron_2_and_waveglow_for_pytorch)
+For detailed information on model input and output, training recipes, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2) and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:tacotron_2_and_waveglow_for_pytorch)
 
 ### References
 
diff --git a/pytorch_vision_deeplabv3_resnet101.md b/pytorch_vision_deeplabv3_resnet101.md
index ecc52607..ad0b00f2 100644
--- a/pytorch_vision_deeplabv3_resnet101.md
+++ b/pytorch_vision_deeplabv3_resnet101.md
@@ -74,7 +74,7 @@ To get the maximum prediction of each class, and then use it for a downstream ta
 Here's a small snippet that plots the predictions, with each color being assigned to each class (see the visualized image on the left).
 
 ```python
-# create a color pallette, selecting a color for each class
+# create a color palette, selecting a color for each class
 palette = torch.tensor([2 ** 25 - 1, 2 ** 15 - 1, 2 ** 21 - 1])
 colors = torch.as_tensor([i for i in range(21)])[:, None] * palette
 colors = (colors % 255).numpy().astype("uint8")
diff --git a/pytorch_vision_fcn_resnet101.md b/pytorch_vision_fcn_resnet101.md
index bb501f26..916e9083 100644
--- a/pytorch_vision_fcn_resnet101.md
+++ b/pytorch_vision_fcn_resnet101.md
@@ -31,7 +31,7 @@ The images have to be loaded in to a range of `[0, 1]` and then normalized using
 and `std = [0.229, 0.224, 0.225]`.
 
 The model returns an `OrderedDict` with two Tensors that are of the same height and width as the input Tensor, but with 21 classes.
-`output['out']` contains the semantic masks, and `output['aux']` contains the auxillary loss values per-pixel. In inference mode, `output['aux']` is not useful.
+`output['out']` contains the semantic masks, and `output['aux']` contains the auxiliary loss values per-pixel. In inference mode, `output['aux']` is not useful.
 So, `output['out']` is of shape `(N, 21, H, W)`.
 More documentation can be found [here](https://pytorch.org/vision/stable/models.html#object-detection-instance-segmentation-and-person-keypoint-detection).
 
@@ -73,7 +73,7 @@ To get the maximum prediction of each class, and then use it for a downstream ta
 Here's a small snippet that plots the predictions, with each color being assigned to each class (see the visualized image on the left).
 
 ```python
-# create a color pallette, selecting a color for each class
+# create a color palette, selecting a color for each class
 palette = torch.tensor([2 ** 25 - 1, 2 ** 15 - 1, 2 ** 21 - 1])
 colors = torch.as_tensor([i for i in range(21)])[:, None] * palette
 colors = (colors % 255).numpy().astype("uint8")
diff --git a/pytorch_vision_googlenet.md b/pytorch_vision_googlenet.md
index d1e7b66c..b7d6314b 100644
--- a/pytorch_vision_googlenet.md
+++ b/pytorch_vision_googlenet.md
@@ -84,7 +84,7 @@ for i in range(top5_prob.size(0)):
 
 ### Model Description
 
-GoogLeNet was based on a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014). The 1-crop error rates on the ImageNet dataset with a pretrained model are list below.
+GoogLeNet was based on a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014). The 1-crop error rates on the ImageNet dataset with a pretrained model are listed below.
 
 | Model structure | Top-1 error | Top-5 error |
 | --------------- | ----------- | ----------- |
diff --git a/pytorch_vision_once_for_all.md b/pytorch_vision_once_for_all.md
index 10bc7a64..423b87a7 100644
--- a/pytorch_vision_once_for_all.md
+++ b/pytorch_vision_once_for_all.md
@@ -74,7 +74,7 @@ model, image_size = ofa_specialized_get("flops@595M_top1@80.0_finetune@75", pret
 model.eval()
 ```
 
-The model's prediction can be evalutaed by
+The model's prediction can be evaluated by
 ```python
 # Download an example image from pytorch website
 import urllib
diff --git a/pytorch_vision_proxylessnas.md b/pytorch_vision_proxylessnas.md
index 1cf115a3..22972d24 100644
--- a/pytorch_vision_proxylessnas.md
+++ b/pytorch_vision_proxylessnas.md
@@ -20,7 +20,7 @@ demo-model-link: https://huggingface.co/spaces/pytorch/ProxylessNAS
 ```python
 import torch
 target_platform = "proxyless_cpu"
-# proxyless_gpu, proxyless_mobile, proxyless_mobile14 are also avaliable.
+# proxyless_gpu, proxyless_mobile, proxyless_mobile14 are also available.
 model = torch.hub.load('mit-han-lab/ProxylessNAS', target_platform, pretrained=True)
 model.eval()
 ```
@@ -87,7 +87,7 @@ for i in range(top5_prob.size(0)):
 
 ProxylessNAS models are from the [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/abs/1812.00332) paper.
 
-Conventionally, people tend to design *one efficient model* for *all hardware platforms*. But different hardware has different properties, for example, CPU has higher frequency and GPU is better at parallization. Therefore, instead of generalizing, we need to **specialize** CNN architectures for different hardware platforms. As shown in below, with similar accuracy, specialization offers free yet significant performance boost on all three platforms.
+Conventionally, people tend to design *one efficient model* for *all hardware platforms*. But different hardware has different properties, for example, CPU has higher frequency and GPU is better at parallelization. Therefore, instead of generalizing, we need to **specialize** CNN architectures for different hardware platforms. As shown in below, with similar accuracy, specialization offers free yet significant performance boost on all three platforms.
 
 | Model structure | GPU Latency | CPU Latency | Mobile Latency
 | --------------- | ----------- | ----------- | ----------- |
diff --git a/pytorch_vision_resnext.md b/pytorch_vision_resnext.md
index 4798647d..91955407 100644
--- a/pytorch_vision_resnext.md
+++ b/pytorch_vision_resnext.md
@@ -2,7 +2,7 @@
 layout: hub_detail
 background-class: hub-background
 body-class: hub
-title: ResNext
+title: ResNeXt
 summary: Next generation ResNets, more efficient and accurate
 category: researchers
 image: resnext.png
@@ -87,9 +87,9 @@ for i in range(top5_prob.size(0)):
 
 ### Model Description
 
-Resnext models were proposed in [Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/abs/1611.05431).
-Here we have the 2 versions of resnet models, which contains 50, 101 layers repspectively.
-A comparison in model archetechure between resnet50 and resnext50 can be found in Table 1.
+ResNeXt models were proposed in [Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/abs/1611.05431).
+Here are the two versions of ResNeXt models, which contain 50 and 101 layers, respectively.
+A comparison of model architecture between ResNet-50 and ResNeXt-50 can be found in Table 1.
 Their 1-crop error rates on ImageNet dataset with pretrained models are listed below.
 
 | Model structure | Top-1 error | Top-5 error |
diff --git a/sigsep_open-unmix-pytorch_umx.md b/sigsep_open-unmix-pytorch_umx.md
index 54963210..47b0c18c 100644
--- a/sigsep_open-unmix-pytorch_umx.md
+++ b/sigsep_open-unmix-pytorch_umx.md
@@ -61,7 +61,7 @@ Furthermore, we provide a model for speech enhancement trained by [Sony Corporat
 
 * __`umxse`__ speech enhancement model is trained on the 28-speaker version of the [Voicebank+DEMAND corpus](https://datashare.is.ed.ac.uk/handle/10283/1942?show=full).
 
-All three models are also available as spectrogram (core) models, which take magnitude spectrogram inputs and ouput separated spectrograms.
+All three models are also available as spectrogram (core) models, which take magnitude spectrogram inputs and output separated spectrograms.
 These models can be loaded using `umxhq_spec`, `umx_spec` and `umxse_spec`.
 
 ### Details
@@ -77,4 +77,4 @@ pip install openunmix
 
 ### References
 - [Open-Unmix - A Reference Implementation for Music Source Separation](https://doi.org/10.21105/joss.01667)
-- [SigSep - Open Ressources for Music Separation](https://sigsep.github.io/)
+- [SigSep - Open Resources for Music Separation](https://sigsep.github.io/)
diff --git a/test_run_python_code.py b/test_run_python_code.py
index f44a2856..10e1f191 100644
--- a/test_run_python_code.py
+++ b/test_run_python_code.py
@@ -11,7 +11,7 @@
 @pytest.mark.parametrize('file_path', ALL_FILES)
 def test_run_file(file_path):
     if 'nvidia' in file_path:
-        # FIXME: NVIDIA models checkoints are on cuda
+        # FIXME: NVIDIA models checkpoints are on CUDA
         pytest.skip("temporarily disabled")
     if 'pytorch_fairseq_translation' in file_path:
         pytest.skip("temporarily disabled")
@@ -26,11 +26,11 @@ def test_run_file(file_path):
 
     # We just run the python files in a separate sub-process. We really want a
     # subprocess here because otherwise we might run into package versions
-    # issues: imagine script A that needs torchvivion 0.9 and script B that
+    # issues: imagine script A that needs torchvision 0.9 and script B that
     # needs torchvision 0.10. If script A is run prior to script B in the same
     # process, script B will still be run with torchvision 0.9 because the only
     # "import torchvision" statement that counts is the first one, and even
-    # torchub sys.path shenanigans can do nothing about this. By creating
+    # torchhub sys.path shenanigans can do nothing about this. By creating
     # subprocesses we're sure that all file executions are fully independent.
     try:
         # This is inspired (and heavily simplified) from