Compute-similarity-between-images-using-CNN

Comparison of cosine similarity performances between VGG16 and ResNet50

Table of contents 📝

  • My goals 🎯
  • Technologies 🖥️
  • Project composition 📂
  • Description 📋
  • Sources ⚙️

Estimated reading time: ⏱️ 5 min

My goals 🎯

  • Learn how to extract feature vectors
  • Compute the similarity between images
  • Apply data augmentation to enlarge the dataset

Technologies 🖥️

Programming languages:

- Python (TensorFlow framework)

Project composition 📂

.
├── README.md
│
├── data
│   ├── flowerpot.jpg
│   ├── vase.jpg
│   └── vase2.jpg
│
├── notebooks
│   └── extract_features.ipynb
│
└── report
    ├── augmented_img
    │   ├── vaseAI0.jpg
    │   ├── vaseAI1.jpg
    │   └── ..
    │
    └── cos_sim
        ├── resnet50
        │   ├── vase_flowerpot.jpg
        │   ├── vase_vase.jpg
        │   └── vase_vase2.jpg
        │
        └── vgg16
            ├── vase_flowerpot.jpg
            ├── vase_vase.jpg
            ├── vase_vase2.jpg
            ├── vase_vaseAI0.jpg
            ├── vase_vaseAI1.jpg
            └── ..

Description 📋

This project aims to deepen my knowledge of CNNs, especially feature extraction and image similarity computation. I decided to work with two CNNs pre-trained on ImageNet, VGG16 and ResNet50, and to compare their cosine similarity performance. A model can be loaded in two ways (see the sketch below):
- for predictions (include_top=True: the model keeps all of its layers, i.e. the 'feature learning block' + the 'classification block')
- for feature extraction (include_top=False: the classification block is omitted)
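
A minimal sketch of both loading modes with Keras (assuming TensorFlow 2.x; the pooling='avg' option is my addition, a convenient way to get a flat feature vector):

import numpy as np
from tensorflow.keras.applications import VGG16, ResNet50

# Full model, for predictions: feature learning block + classification block
vgg16_clf = VGG16(weights='imagenet', include_top=True)

# Truncated models, for feature extraction: classification block omitted;
# pooling='avg' (assumption) collapses the last feature maps into a flat vector
vgg16_feat = VGG16(weights='imagenet', include_top=False, pooling='avg')
resnet50_feat = ResNet50(weights='imagenet', include_top=False, pooling='avg')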



[Figure 1]: Architecture of the VGG16 (left) and ResNet50 (right)

First, I wondered which model could classify an image with the highest confidence. I compared their predictions on a vase image: ResNet50 came out on top with a 99.89% confidence score against 95.06% for VGG16. The goal of this step was to experiment with the models and understand how prediction works.
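
A minimal sketch of such a prediction with Keras (data/vase.jpg comes from the project tree above; the code itself is illustrative, not the notebook's exact code):

import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

model = ResNet50(weights='imagenet', include_top=True)

# Load and preprocess the image to the 224x224 input size expected by ResNet50
img = image.load_img('data/vase.jpg', target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

# Top-3 ImageNet classes with their confidence scores
preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])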



[Figure 2]: Comparison of predictions (VGG16/ResNet50)

Then I decided to visualize the feature maps produced by the main blocks of the VGG16. The feature maps output by each block are collected in a single forward pass to create an image. The VGG16 has 5 main blocks (block1, block2, etc.), each ending in a pooling layer. You can choose which blocks to visualize through their layer indices: idx = [2, 5, 9, 13, 17] # [block1, block2, block3, block4, block5]. Figure 3 highlights that the abstraction level of the extracted features increases with network depth.
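
A sketch of how the five block outputs can be collected in one pass with a multi-output Model (illustrative code reusing the idx list above):

import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing import image

base = VGG16(weights='imagenet', include_top=True)

# One output per main block, selected by layer index
idx = [2, 5, 9, 13, 17]  # [block1, block2, block3, block4, block5]
extractor = Model(inputs=base.inputs, outputs=[base.layers[i].output for i in idx])

img = image.load_img('data/vase.jpg', target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

# A single forward pass returns the feature maps of all five blocks
feature_maps = extractor.predict(x)
for fmap in feature_maps:
    print(fmap.shape)  # e.g. (1, 224, 224, 64) for block1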



[Figure 3]: Visualization of the 5 main blocks from the VGG16

Now let's focus on feature vector extraction. Removing the classification block of the model makes it possible to extract a feature vector, as explained previously. Each input image is first preprocessed (resizing, RGB->BGR conversion, zero-centering with respect to the ImageNet dataset). The overall process in Figure 4 depicts how to compute the similarity between two images. The images were stored on AWS S3 and I used a notebook instance in AWS SageMaker. A feature vector was extracted for each image, and the two vectors were then compared with the compute_similarity_img() function, which computes the cosine of the angle between them.
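
The compute_similarity_img() name comes from the notebook; the body below is an assumption, a minimal sketch assembled from the steps just described:

import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image

# Classification block omitted; pooling='avg' (assumption) yields a flat vector
model = VGG16(weights='imagenet', include_top=False, pooling='avg')

def extract_features(path):
    """Preprocess an image (resize, RGB->BGR, zero-center) and extract its feature vector."""
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x)[0]

def compute_similarity_img(path_a, path_b):
    """Cosine similarity: a.b / (||a|| * ||b||), the cosine of the angle between the vectors."""
    a, b = extract_features(path_a), extract_features(path_b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(compute_similarity_img('data/vase.jpg', 'data/flowerpot.jpg'))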



[Figure 4]: Similarity computation process

Here are the cosine similarity results obtained with the VGG16:



[Figure 5]: Cosine similarity using VGG16

I then decided to enlarge the dataset and compare the results with data augmentation, as shown in Figure 6. I used an ImageDataGenerator object to set up the augmentation parameters; it generates batches of tensor image data with real-time data augmentation:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

gen = ImageDataGenerator(
    rotation_range=30,       # Int: degree range for random rotations
    width_shift_range=0.1,   # Float: fraction of total width if < 1, or pixels if >= 1
    height_shift_range=0.1,  # Float: fraction of total height if < 1, or pixels if >= 1
    shear_range=0.15,        # Float: shear intensity (shear angle in counter-clockwise direction, in degrees)
    zoom_range=0.1,          # Float: range for random zoom
    channel_shift_range=10., # Float: range for random channel shifts
    horizontal_flip=True     # Boolean: randomly flip inputs horizontally
)
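
As an illustrative sketch (not necessarily the notebook's exact code), the augmented images in report/augmented_img can then be produced with gen.flow(); Keras appends its own index and hash to save_prefix, so the saved file names may differ slightly from vaseAI0.jpg, vaseAI1.jpg:

import numpy as np
from tensorflow.keras.preprocessing import image

# Load the source image as a batch of one
img = image.img_to_array(image.load_img('data/vase.jpg'))
batch = np.expand_dims(img, axis=0)

# Each iteration yields (and saves) one randomly augmented variant;
# the target directory must already exist
flow = gen.flow(batch, batch_size=1,
                save_to_dir='report/augmented_img',
                save_prefix='vaseAI', save_format='jpg')
for _ in range(5):
    next(flow)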



[Figure 6]: Cosine similarity with augmented images using VGG16

Finally, I compared the cosine similarity performance of the two models:



[Figure 7]: Comparison of cosine similarity between VGG16 and ResNet50

Sources ⚙️

  • Help for image classification here
  • Help for data augmentation here
