
Building a Smile Detector with PyTorch

  • Writer: Peter Ma
  • Feb 2
  • 3 min read

Have you ever wondered how machines can detect smiles in photos? In this post, I’ll walk you through a project in which I built a neural network using PyTorch to detect whether a person is smiling.


Model Overview

The goal of this project was to create a binary classification model that predicts one of two outcomes: smiling (1) or not smiling (0). To achieve this, I used the binary cross-entropy loss function, the standard choice for binary tasks. It compares the true label (y) with the predicted probability (ŷ) and penalizes a prediction more heavily the more confidently wrong it is.


Loss Function Used: L(y, ŷ) = -[y·log(ŷ) + (1 - y)·log(1 - ŷ)]
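
To make the penalty concrete, here is a minimal plain-Python sketch of binary cross-entropy averaged over a batch. It is an illustration, not the code used in this project; the function name and the `eps` clamp are assumptions added for the example.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1 - y)*log(1 - p)] over a batch."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Same labels, opposite predictions: confident and right vs. confident and wrong.
confident_right = binary_cross_entropy([1, 0], [0.9, 0.1])
confident_wrong = binary_cross_entropy([1, 0], [0.1, 0.9])
print(f"right: {confident_right:.3f}  wrong: {confident_wrong:.3f}")
```

The confidently wrong batch receives a loss roughly twenty times larger, which is the behavior that pushes the model toward calibrated probabilities.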

Dataset

For training and testing, I used the CelebA dataset, which contains over 162,000 images in the training set and around 20,000 images in the test set. Each image was resized to 128x128 pixels with 3 color channels (RGB) and transformed into tensors to feed into the model.


Model Architecture

I opted for a Convolutional Neural Network (CNN) because CNNs excel at image classification tasks. Here's the architecture breakdown:

  1. Input Layer: Images are resized and converted to 128x128x3 tensors (PyTorch stores them channels-first, as 3x128x128).

  2. Convolution Blocks (3 total): Each block includes:

    • Convolution Layer: Extracts features using multiple filters.

    • Batch Normalization: Stabilizes learning by normalizing the outputs.

    • ReLU Activation: Adds non-linearity, making the model more flexible.

    • Max Pooling: Reduces the spatial dimensions while retaining key features.

  3. Fully Connected Layers (2 layers):

    • Dropout (50%): Applied during training to prevent overfitting.

    • Sigmoid Activation: Outputs the probability of a smile (between 0 and 1).
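
A minimal PyTorch sketch of an architecture matching this breakdown is shown below. The filter counts (16, 32, 64), the hidden size (128), the `padding=1` setting, and the exact placement of dropout are assumptions made for illustration; the post does not specify them.

```python
import torch
from torch import nn

class SmileCNN(nn.Module):
    """Hypothetical sketch: 3 convolution blocks followed by 2 fully connected layers."""

    def __init__(self, channels=(16, 32, 64), hidden=128):
        super().__init__()
        blocks = []
        in_ch = 3  # RGB input
        for out_ch in channels:
            blocks += [
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(),
                nn.MaxPool2d(2),  # halves height and width
            ]
            in_ch = out_ch
        self.features = nn.Sequential(*blocks)
        # Spatial size shrinks 128 -> 64 -> 32 -> 16 after three 2x2 poolings.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(channels[-1] * 16 * 16, hidden),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),  # probability of a smile
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SmileCNN()
model.eval()  # disables dropout and uses running batch-norm statistics
with torch.no_grad():
    probs = model(torch.randn(4, 3, 128, 128))  # one probability per image
print(probs.shape)
```

Because the last layer is a sigmoid, the matching loss is `nn.BCELoss`. A common alternative is to drop the final sigmoid and use `nn.BCEWithLogitsLoss`, which is more numerically stable.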


    Model Architecture

    Sigmoid Activation Function

How Does a Convolution Layer Work?

Convolution layers use small filters filled with learnable weights; for an RGB input, a 3x3 filter spans all three channels, making it a 3x3x3 grid. The filter slides over the image, multiplying each pixel value it covers by the corresponding weight, summing the products, and adding a bias. This process captures spatial features like edges, textures, and patterns.


The depth of the output tensor depends on the number of filters used. For instance, using N filters results in an output with N channels.
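
The sliding-filter arithmetic can be written out in plain Python. This is a deliberately slow sketch meant to show the mechanics (no padding, stride 1), not how PyTorch implements it; the toy image and the two filters are made up for the example.

```python
def conv2d(image, filters, biases):
    """Slide each filter over the image (no padding, stride 1).

    image:   [C][H][W] nested lists
    filters: [N][C][k][k] nested lists
    biases:  [N]
    returns: [N][H-k+1][W-k+1], one output channel per filter
    """
    channels, height, width = len(image), len(image[0]), len(image[0][0])
    k = len(filters[0][0])
    out = []
    for filt, bias in zip(filters, biases):
        fmap = []
        for i in range(height - k + 1):
            row = []
            for j in range(width - k + 1):
                total = bias
                for c in range(channels):
                    for di in range(k):
                        for dj in range(k):
                            total += image[c][i + di][j + dj] * filt[c][di][dj]
                row.append(total)
            fmap.append(row)
        out.append(fmap)
    return out

# Toy 3-channel 5x5 image: two dark columns on the left, three bright on the right.
image = [[[0, 0, 1, 1, 1] for _ in range(5)] for _ in range(3)]
# Filter 1 responds to dark-to-bright vertical edges; filter 2 averages its window.
edge = [[[-1, 0, 1] for _ in range(3)] for _ in range(3)]
blur = [[[1 / 27] * 3 for _ in range(3)] for _ in range(3)]

feature_maps = conv2d(image, [edge, blur], [0.0, 0.0])
print(len(feature_maps), "channels;", "edge map row:", feature_maps[0][0])
```

Two filters produce two output channels, and the edge filter responds strongly where the window straddles the dark-to-bright boundary and not at all where the window is uniformly bright.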


Why Use a CNN?

  • Efficient Computation: CNNs require far fewer parameters than fully connected networks because each filter's weights are reused at every position in the image.

  • Spatial Awareness: Filters detect local features, like edges or textures, regardless of their position in the image.

  • Abstract Learning: Deeper layers capture complex patterns beyond basic shapes, enabling higher-level feature recognition.
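
The parameter savings are easy to check with a little arithmetic. The comparison below assumes a first layer with 32 units or 32 filters, a number chosen purely for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

def conv_params(in_channels, n_filters, kernel_size):
    """Weights plus biases of a convolution layer with square filters."""
    return kernel_size * kernel_size * in_channels * n_filters + n_filters

# First layer on a 128x128 RGB image, 32 outputs either way (assumed size).
dense = dense_params(128 * 128 * 3, 32)
conv = conv_params(in_channels=3, n_filters=32, kernel_size=3)
print(f"dense: {dense:,} parameters, conv: {conv:,} parameters")
```

The fully connected layer needs about 1.57 million parameters, while the convolution layer needs fewer than a thousand, because its 3x3x3 filters are shared across the whole image.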



Key Components Explained

  • Batch Normalization: Normalizes each layer's activations across the mini-batch, which reduces the risk of vanishing or exploding gradients and improves training stability.


  • ReLU Activation: Introduces non-linearity cheaply while avoiding the vanishing gradients that saturating functions like sigmoid or tanh are prone to.


  • Max Pooling: Reduces dimensionality by selecting the maximum value from small regions, preserving important features while reducing computation.


  • Dropout: Randomly deactivates neurons during training to prevent overfitting and improve generalization.
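
Each of these components is simple enough to sketch in a few lines of plain Python. The versions below are simplified for illustration: the batch normalization omits the learnable scale and shift that `nn.BatchNorm2d` also has, and the dropout uses the "inverted" scaling that PyTorch applies, so no rescaling is needed at test time.

```python
import random

def batch_norm(values, eps=1e-5):
    """Shift and scale a batch of activations to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]

def relu(values):
    """Replace negative activations with zero."""
    return [max(0.0, v) for v in values]

def max_pool_2x2(grid):
    """Keep the largest value in each non-overlapping 2x2 window."""
    return [
        [max(grid[i][j], grid[i][j + 1], grid[i + 1][j], grid[i + 1][j + 1])
         for j in range(0, len(grid[0]) - 1, 2)]
        for i in range(0, len(grid) - 1, 2)
    ]

def dropout(values, p=0.5, training=True, rng=random):
    """Zero each value with probability p during training; pass through otherwise."""
    if not training:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]

normalized = batch_norm([2.0, 4.0, 6.0, 8.0])
activated = relu([-1.5, 0.0, 2.0])
pooled = max_pool_2x2([[1, 2, 5, 6],
                       [3, 4, 7, 8],
                       [9, 1, 2, 3],
                       [4, 5, 6, 0]])
print(normalized, activated, pooled)
```

Note that `dropout` only changes its input when `training=True`, which mirrors the difference between `model.train()` and `model.eval()` in PyTorch.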



Hyperparameter Tuning

Two hyperparameters, if tuned well, could substantially improve model performance:

  1. Learning Rate (α): Affects how quickly the model learns. Too high, and the model overshoots optimal values; too low, and training becomes slow.



  2. Batch Size: Impacts memory usage and training stability. Larger batches give smoother gradient estimates, while smaller batches add noise that can help generalization.
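
Both effects can be seen on a toy problem. The sketch below is an illustration under simple assumptions, unrelated to the actual smile model: it minimizes f(w) = w² with three learning rates, then measures how much a mini-batch average of noisy gradients fluctuates for two batch sizes.

```python
import random
import statistics

def gradient_descent(lr, steps=20, w=1.0):
    """Minimize f(w) = w**2, whose gradient is 2*w, starting from w."""
    for _ in range(steps):
        w -= lr * 2 * w
    return w

too_high = gradient_descent(lr=1.1)    # overshoots: |w| grows every step
reasonable = gradient_descent(lr=0.1)  # shrinks steadily toward 0
too_low = gradient_descent(lr=0.001)   # barely moves in 20 steps

def gradient_spread(batch_size, trials=500, seed=0):
    """Std. dev. of the mini-batch mean of noisy gradients (true value 1.0)."""
    rng = random.Random(seed)
    estimates = [
        statistics.fmean(rng.gauss(1.0, 1.0) for _ in range(batch_size))
        for _ in range(trials)
    ]
    return statistics.stdev(estimates)

small_batch_noise = gradient_spread(batch_size=4)
large_batch_noise = gradient_spread(batch_size=64)
print(too_high, reasonable, too_low)
print(small_batch_noise, large_batch_noise)
```

The high learning rate diverges, the low one makes almost no progress, and the larger batch yields a noticeably less noisy gradient estimate.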


Results

After training the model for just 1 epoch (around 7 minutes on an NVIDIA T4 GPU), I achieved:

  • Training Accuracy: 89%

  • Testing Accuracy: 91%

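Interestingly, the testing accuracy was slightly higher than the training accuracy. One likely reason is the 0.5 dropout rate: dropout is active while the model trains but switched off at test time, so the model is evaluated at full capacity. If the training accuracy is averaged over the whole epoch, as is common, it also includes the early batches when the model had barely learned anything. Finally, with only one epoch of training the model is probably still underfitting, which leaves room for further improvement.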


Final Thoughts

This project was an exciting dive into image classification using PyTorch. It helped me understand not only the mechanics of convolutional neural networks but also the importance of hyperparameter tuning, regularization techniques like dropout, and the intricacies of training deep learning models.


If you have questions about the project or suggestions for improvement, feel free to reach out through the contact page. I’d love to hear your thoughts!

 
 
 
