[Review] ImageNet Classification with Deep Convolutional Neural Networks

This paper introduces the CNN architecture famously known as AlexNet.

 

1. Introduction

This paper presents the AlexNet architecture, which uses a combination of convolutional and fully-connected layers to effectively classify images. Prior to the development of this neural network, deep learning faced challenges due to insufficient datasets and computing resources.

In this review, I will focus on the key characteristics of the AlexNet architecture and discuss how each of them addressed the limitations of existing image-classification networks and what is unique about each suggested solution. At the end of the review, I will cover how the idea was evaluated.

 

2. AlexNet Architecture

  • 5 convolutional layers
    • overlapping pooling
    • ReLU
  • 3 fully connected layers
    • Dropout
    • ReLU
  • Details of learning
    • stochastic gradient descent
    • batch size of 128, momentum of 0.9, weight decay of 0.0005
    • initialized weights using zero-mean Gaussian distribution with standard deviation of 0.01
    • initialized neuron biases to 1 in the 2nd, 4th, and 5th conv layers and in the fully-connected hidden layers, and initialized neuron biases to 0 in the remaining layers
    • initialized learning rate to 0.01
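
To make the configuration above concrete, here is a minimal single-GPU PyTorch sketch of the layer stack (my own reconstruction with torchvision-style padding, not the authors' code; the original implementation splits the kernels across two GPUs, which is omitted here):

```python
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),        # conv1
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),                        # overlapping pooling
            nn.Conv2d(96, 256, kernel_size=5, padding=2),                 # conv2
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1),                # conv3
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),                # conv4
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),                # conv5
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(256 * 6 * 6, 4096),                                 # fc6
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),                                        # fc7
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),                                 # fc8
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

model = AlexNetSketch()
# Optimizer settings from the "Details of learning" bullets above
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=0.0005)
```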

(1) ReLU

  • Current problem: saturating nonlinearities such as tanh or sigmoid are much slower to train with gradient descent than non-saturating nonlinearities like ReLU.
  • Solution: Using ReLU (f(x) = max(0, x)), which still introduces non-linearity into the network while allowing training to proceed much faster (a small demo follows this list)
  • Result: a network with ReLUs reached a 25% training error rate on CIFAR-10 six times faster than an equivalent network with tanh neurons
  • What makes this method unique: It was the first time ReLU was used in a deep convolutional neural network (CNN) for image classification tasks
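
A tiny demo (my own, not from the paper) of why saturating units slow down gradient-descent training: for inputs of large magnitude, tanh's gradient is nearly zero, while ReLU's gradient stays at 1 for any positive input:

```python
import torch

x = torch.tensor([-6.0, -1.0, 0.5, 6.0], requires_grad=True)

# Gradient of tanh: nearly zero at the tails -> learning stalls there
torch.tanh(x).sum().backward()
print(x.grad)        # ~[0.0000, 0.4200, 0.7864, 0.0000]

# Gradient of ReLU: exactly 1 for every positive input -> no saturation
x.grad = None
torch.relu(x).sum().backward()
print(x.grad)        # [0., 0., 1., 1.]
```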

(2) Multiple GPUs

  • Current problem: a single GPU (a GTX 580 with 3 GB of memory) is too small to hold a network of this size
  • Solution: Cross-GPU parallelization using 2 GPUs, putting half of the kernels (or neurons) on each GPU (a simplified sketch follows this list)
  • Result: Reduced the top-1 and top-5 error rates by 1.7% and 1.2%, respectively, compared with a network with half as many kernels in each convolutional layer trained on one GPU. It also enabled faster training by distributing the workload between the two GPUs, and this kind of model parallelism is now a common technique in deep learning.
  • What makes this method unique: It was the first neural network architecture that utilized cross-GPU parallelization to train a large-scale deep CNN for image classification
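
As a rough illustration of the idea (a simplification of the paper's scheme, in which the two GPU "columns" communicate only at certain layers), half of a layer's kernels can be placed on each device and the resulting feature maps concatenated; this sketch assumes two CUDA devices are available:

```python
import torch
import torch.nn as nn

# Half of conv1's 96 kernels on each GPU
conv_half_a = nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2).to("cuda:0")
conv_half_b = nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2).to("cuda:1")

def split_conv1(x: torch.Tensor) -> torch.Tensor:
    out_a = conv_half_a(x.to("cuda:0"))          # computed on GPU 0
    out_b = conv_half_b(x.to("cuda:1"))          # computed on GPU 1
    # Gather both halves on one device and stack along the channel axis
    return torch.cat([out_a, out_b.to("cuda:0")], dim=1)

x = torch.randn(8, 3, 224, 224)
print(split_conv1(x).shape)                      # torch.Size([8, 96, 55, 55])
```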

(3) Local Response Normalization

  • Problem: ReLUs do not require input normalization to prevent saturation, but the network still benefits from an additional scheme that aids generalization
  • Solution: Local Response Normalization, which normalizes each activation by the summed squared activity of neighbouring kernel maps at the same spatial position, creating competition among neurons computed with different kernels (a form of lateral inhibition; see the sketch after this list)
  • Result: Response normalization reduces their top-1 and top-5 error rates by 1.4% and 1.2%, respectively.
  • What makes this method unique: This technique was first applied in a deep convolutional neural network for image classification tasks.
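
In code, the paper's normalization with its hyperparameters (k = 2, n = 5, alpha = 1e-4, beta = 0.75) maps closely onto PyTorch's built-in layer. A minimal sketch, noting that PyTorch additionally divides the summed squares by the window size, a small difference from the paper's formula:

```python
import torch
import torch.nn as nn

# Paper: b^i_{x,y} = a^i_{x,y} / (k + alpha * sum over the n neighbouring
#        kernel maps of (a^j_{x,y})^2) ** beta
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)

activations = torch.randn(1, 96, 55, 55)   # e.g. conv1 output after ReLU
print(lrn(activations).shape)              # torch.Size([1, 96, 55, 55])
```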

(4) Overlapping Pooling

Problem: The need to further reduce overfitting and error rates

Solution: Overlapping pooling

 

Traditional Pooling

  • The neighborhoods summarized by adjacent pooling units do not overlap

Overlapping Pooling

  • Pooling units overlap each other because the stride s is smaller than the pooling window size z

[Figure: non-overlapping pooling (s = 2, z = 2) vs. overlapping pooling (s = 1, z = 2)]

Result: models with overlapping pooling (s = 2, z = 3) are slightly more difficult to overfit during training. This scheme reduces the top-1 and top-5 error rates by 0.4% and 0.3%, respectively, compared with the non-overlapping scheme (s = 2, z = 2), which produces output of the same dimensions (a small shape check follows below).

What makes this method unique: It was the first time this technique was applied in a deep convolutional neural network for image classification tasks.
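
A quick shape check (illustration only) comparing the two schemes on a conv1-sized feature map: with z = 3, s = 2 each pooling unit looks at a 3x3 neighbourhood that overlaps its neighbours, yet the output keeps the same spatial dimensions as the traditional z = s = 2 scheme:

```python
import torch
import torch.nn as nn

feature_map = torch.randn(1, 96, 55, 55)

non_overlapping = nn.MaxPool2d(kernel_size=2, stride=2)  # z = s = 2
overlapping     = nn.MaxPool2d(kernel_size=3, stride=2)  # z = 3 > s = 2

print(non_overlapping(feature_map).shape)  # torch.Size([1, 96, 27, 27])
print(overlapping(feature_map).shape)      # torch.Size([1, 96, 27, 27])
```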

(5) Data Augmentation

  • Problem: Network overfitting problem
  • Solution: Artificially enlarging the dataset by image translation, horizontal reflection, and altering RGB intensities

a. Image translation

  • extracting random 224 * 224 patches from the 256 * 256 images

b. Horizontal reflection

  • Each extracted patch is also reflected horizontally, doubling the dataset size again
  • Number of training examples: original dataset * (256 - 224) * (256 - 224) * 2 = original dataset * 2048
  • At test time: ten 224 * 224 patches per image (the four corner patches and the center patch, plus their horizontal reflections) are fed through the network and the predictions are averaged

c. Altering RGB intensities (PCA-based color augmentation)

  • Result: Substantially reduced overfitting; the color augmentation approximately captures the fact that object identity is invariant to changes in the intensity and color of the illumination, and it reduces the top-1 error rate by over 1%
  • What makes this method unique: Data augmentation had been used in various forms before AlexNet, but this paper played a significant role in showcasing its effectiveness in improving the performance of deep learning models.
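
A hedged sketch of the first two augmentations using torchvision transforms (my own pipeline, not the authors' original implementation; the PCA-based color augmentation would have to be written separately):

```python
import torch
import torchvision.transforms as T

# Training time: random 224 * 224 patch from a 256 * 256 image + random flip
train_transform = T.Compose([
    T.RandomCrop(224),
    T.RandomHorizontalFlip(p=0.5),
    T.ToTensor(),
])

# Test time: four corner patches + center patch, plus their reflections (10 crops);
# the network's predictions on the ten crops are then averaged
test_transform = T.Compose([
    T.TenCrop(224),
    T.Lambda(lambda crops: torch.stack([T.ToTensor()(c) for c in crops])),
])
```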

(6) Dropout

Problem: Combining the predictions of many separately trained models is a reliable way to reduce test error, but it is too expensive for networks that already take several days to train

Solution: Applying dropout in the first two fully connected layers. It randomly sets the output of each hidden neuron to zero with a probability of 0.5 during training; at test time all neurons are used, but their outputs are multiplied by 0.5 (a small demo follows below).

Result: Dropout prevents substantial overfitting and acts as an efficient approximation to combining the predictions of many different architectures at roughly the cost of training a single model, although it roughly doubles the number of iterations required to converge.

What makes this method unique: It was the first time this technique was extensively used in a deep convolutional neural network for image classification tasks.
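
A small demo of the mechanism as implemented in PyTorch (which uses "inverted" dropout: surviving activations are rescaled during training, so nothing needs to be multiplied by 0.5 at test time; the end effect matches the paper's description):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(10)

drop.train()
print(drop(x))   # roughly half the entries zeroed, survivors scaled to 2.0

drop.eval()
print(drop(x))   # identity at inference time: all ones
```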

 

3. Evaluation

(1) How it is evaluated

  • The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is an annual competition focused on object detection and image classification using the ImageNet dataset.
  • The ILSVRC-2012 dataset contained 1.2 million training images, 50,000 validation images, and 100,000 test images, spanning 1,000 object categories.
  • The main evaluation metrics used in this challenge were the top-1 and top-5 error rates. The top-1 error rate is the percentage of test images for which the model's highest-confidence prediction does not match the true label, while the top-5 error rate is the percentage of test images for which the true label does not appear among the model's top five predictions (a small helper below shows how these are computed).
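
A small utility (my own, not from the paper) showing how the two error rates are computed from model outputs:

```python
import torch

def top_k_error(logits: torch.Tensor, labels: torch.Tensor, k: int) -> float:
    """Fraction of examples whose true label is NOT among the top-k predictions."""
    _, top_k_preds = logits.topk(k, dim=1)                     # (batch, k) class ids
    correct = (top_k_preds == labels.unsqueeze(1)).any(dim=1)  # (batch,) booleans
    return 1.0 - correct.float().mean().item()

logits = torch.randn(8, 1000)              # fake model outputs for 8 images
labels = torch.randint(0, 1000, (8,))      # fake ground-truth labels
print("top-1 error:", top_k_error(logits, labels, k=1))
print("top-5 error:", top_k_error(logits, labels, k=5))
```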

(2) Evaluation result

Comparison of results on ILSVRC-2010 test set

  • AlexNet achieved a top-1 error rate of 37.5% and a top-5 error rate of 17.0% on the ILSVRC-2010 test set, well ahead of the previous best published results (e.g., 45.7% / 25.7% top-1 / top-5 for SIFT + Fisher Vectors)

Comparison of error rates on ILSVRC-2012 validation and test sets.

  • AlexNet was also evaluated in ILSVRC-2012: a single CNN achieved a top-5 validation error rate of 18.2%, and averaging the predictions of multiple CNNs (including networks pre-trained on the full ImageNet Fall 2011 release) brought the top-5 test error rate down to 15.3%, compared with 26.2% for the second-best entry.