[Review] ImageNet Classification with Deep Convolutional Neural Networks

This paper introduces the CNN architecture famously known as AlexNet.

 

1. Introduction

This paper presents the AlexNet architecture, which uses a combination of convolutional and fully-connected layers to effectively classify images. Prior to the development of this neural network, deep learning faced challenges due to insufficient datasets and computing resources.

In this review, I will focus on the key characteristics of the AlexNet architecture and discuss how each of them addressed the limitations of existing image-classification networks and what is unique about each suggested solution. At the end of the review, I will cover how the idea was evaluated.

 

2. AlexNet Architecture

  • 5 convolutional layers
    • overlapping pooling
    • ReLU
  • 3 fully connected layers
    • Dropout
    • ReLU
  • Details of learning
    • stochastic gradient descent
    • batch size of 128, momentum of 0.9, weight decay of 0.0005
    • initialized weights using zero-mean Gaussian distribution with standard deviation of 0.01
    • initialized neuron biases to 1 in the 2nd, 4th, and 5th conv layers and in the fully-connected hidden layers, and initialized neuron biases to 0 in the remaining layers
    • initialized learning rate to 0.01
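
To make the configuration above concrete, here is a minimal single-GPU PyTorch sketch of the layer stack (my own reconstruction with torchvision-style padding, not the authors' code; the original implementation splits the kernels across two GPUs, which is omitted here):

```python
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),        # conv1
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),                        # overlapping pooling
            nn.Conv2d(96, 256, kernel_size=5, padding=2),                 # conv2
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1),                # conv3
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),                # conv4
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),                # conv5
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(256 * 6 * 6, 4096),                                 # fc6
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),                                        # fc7
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),                                 # fc8
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

model = AlexNetSketch()
# Optimizer settings from the "Details of learning" bullets above
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=0.0005)
```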

(1) ReLU

  • Current problem: saturating nonlinearities such as tanh or sigmoid are much slower to train with gradient descent than non-saturating nonlinearities like ReLU.
  • Solution: Using ReLU (f(x) = max(0, x)), which still introduces non-linearity into the network while allowing training to proceed much faster (a small demo follows this list)
  • Result: a network with ReLUs reached a 25% training error rate on CIFAR-10 six times faster than an equivalent network with tanh neurons
  • What makes this method unique: It was the first time ReLU was used in a deep convolutional neural network (CNN) for image classification tasks
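
A tiny demo (my own, not from the paper) of why saturating units slow down gradient-descent training: for inputs of large magnitude, tanh's gradient is nearly zero, while ReLU's gradient stays at 1 for any positive input:

```python
import torch

x = torch.tensor([-6.0, -1.0, 0.5, 6.0], requires_grad=True)

# Gradient of tanh: nearly zero at the tails -> learning stalls there
torch.tanh(x).sum().backward()
print(x.grad)        # ~[0.0000, 0.4200, 0.7864, 0.0000]

# Gradient of ReLU: exactly 1 for every positive input -> no saturation
x.grad = None
torch.relu(x).sum().backward()
print(x.grad)        # [0., 0., 1., 1.]
```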

(2) Multiple GPUs

  • Current problem: a single GPU (a GTX 580 with 3 GB of memory) is too small to hold a network of this size
  • Solution: Cross-GPU parallelization using 2 GPUs, putting half of the kernels (or neurons) on each GPU (a simplified sketch follows this list)
  • Result: Reduced the top-1 and top-5 error rates by 1.7% and 1.2%, respectively, compared with a network with half as many kernels in each convolutional layer trained on one GPU. It also enabled faster training by distributing the workload between the two GPUs, and this kind of model parallelism is now a common technique in deep learning.
  • What makes this method unique: It was the first neural network architecture that utilized cross-GPU parallelization to train a large-scale deep CNN for image classification
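
As a rough illustration of the idea (a simplification of the paper's scheme, in which the two GPU "columns" communicate only at certain layers), half of a layer's kernels can be placed on each device and the resulting feature maps concatenated; this sketch assumes two CUDA devices are available:

```python
import torch
import torch.nn as nn

# Half of conv1's 96 kernels on each GPU
conv_half_a = nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2).to("cuda:0")
conv_half_b = nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2).to("cuda:1")

def split_conv1(x: torch.Tensor) -> torch.Tensor:
    out_a = conv_half_a(x.to("cuda:0"))          # computed on GPU 0
    out_b = conv_half_b(x.to("cuda:1"))          # computed on GPU 1
    # Gather both halves on one device and stack along the channel axis
    return torch.cat([out_a, out_b.to("cuda:0")], dim=1)

x = torch.randn(8, 3, 224, 224)
print(split_conv1(x).shape)                      # torch.Size([8, 96, 55, 55])
```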

(3) Local Response Normalization

  • Problem: ReLUs do not require input normalization to prevent saturation, but the network still benefits from an additional scheme that aids generalization
  • Solution: Local Response Normalization, which normalizes each activation by the summed squared activity of neighbouring kernel maps at the same spatial position, creating competition among neurons computed with different kernels (a form of lateral inhibition; see the sketch after this list)
  • Result: Response normalization reduces their top-1 and top-5 error rates by 1.4% and 1.2%, respectively.
  • What makes this method unique: This technique was first applied in a deep convolutional neural network for image classification tasks.
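
In code, the paper's normalization with its hyperparameters (k = 2, n = 5, alpha = 1e-4, beta = 0.75) maps closely onto PyTorch's built-in layer. A minimal sketch, noting that PyTorch additionally divides the summed squares by the window size, a small difference from the paper's formula:

```python
import torch
import torch.nn as nn

# Paper: b^i_{x,y} = a^i_{x,y} / (k + alpha * sum over the n neighbouring
#        kernel maps of (a^j_{x,y})^2) ** beta
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)

activations = torch.randn(1, 96, 55, 55)   # e.g. conv1 output after ReLU
print(lrn(activations).shape)              # torch.Size([1, 96, 55, 55])
```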

(4) Overlapping Pooling

Problem: The need to further reduce overfitting and error rates

Solution: Overlapping pooling

 

Traditional Pooling

  • The neighborhoods summarized by adjacent pooling units do not overlap

Overlapping Pooling

  • Pooling units overlap each other because the stride s is smaller than the pooling window size z

[Figure: non-overlapping pooling (s = 2, z = 2) vs. overlapping pooling (s = 1, z = 2)]

Result: models with overlapping pooling (s = 2, z = 3) are slightly more difficult to overfit during training. This scheme reduces the top-1 and top-5 error rates by 0.4% and 0.3%, respectively, compared with the non-overlapping scheme (s = 2, z = 2), which produces output of the same dimensions (a small shape check follows below).

What makes this method unique: It was the first time this technique was applied in a deep convolutional neural network for image classification tasks.
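
A quick shape check (illustration only) comparing the two schemes on a conv1-sized feature map: with z = 3, s = 2 each pooling unit looks at a 3x3 neighbourhood that overlaps its neighbours, yet the output keeps the same spatial dimensions as the traditional z = s = 2 scheme:

```python
import torch
import torch.nn as nn

feature_map = torch.randn(1, 96, 55, 55)

non_overlapping = nn.MaxPool2d(kernel_size=2, stride=2)  # z = s = 2
overlapping     = nn.MaxPool2d(kernel_size=3, stride=2)  # z = 3 > s = 2

print(non_overlapping(feature_map).shape)  # torch.Size([1, 96, 27, 27])
print(overlapping(feature_map).shape)      # torch.Size([1, 96, 27, 27])
```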

(5) Data Augmentation

  • Problem: Network overfitting problem
  • Solution: Artificially enlarging the dataset by image translation, horizontal reflection, and altering RGB intensities

a. Image translation

  • extracting random 224 * 224 patches from the 256 * 256 images

b. Horizontal reflection

  • Each extracted patch is also reflected horizontally, doubling the dataset size again
  • Number of training examples: original dataset * (256 - 224) * (256 - 224) * 2 = original dataset * 2048
  • At test time: ten 224 * 224 patches per image (the four corner patches and the center patch, plus their horizontal reflections) are fed through the network and the predictions are averaged

c. Altering RGB intensities (PCA-based color augmentation)

  • Result: Substantially reduced overfitting; the color augmentation approximately captures the fact that object identity is invariant to changes in the intensity and color of the illumination, and it reduces the top-1 error rate by over 1%
  • What makes this method unique: Data augmentation had been used in various forms before AlexNet, but this paper played a significant role in showcasing its effectiveness in improving the performance of deep learning models.
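
A hedged sketch of the first two augmentations using torchvision transforms (my own pipeline, not the authors' original implementation; the PCA-based color augmentation would have to be written separately):

```python
import torch
import torchvision.transforms as T

# Training time: random 224 * 224 patch from a 256 * 256 image + random flip
train_transform = T.Compose([
    T.RandomCrop(224),
    T.RandomHorizontalFlip(p=0.5),
    T.ToTensor(),
])

# Test time: four corner patches + center patch, plus their reflections (10 crops);
# the network's predictions on the ten crops are then averaged
test_transform = T.Compose([
    T.TenCrop(224),
    T.Lambda(lambda crops: torch.stack([T.ToTensor()(c) for c in crops])),
])
```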

(6) Dropout

Problem: Combining the predictions of many separately trained models is a reliable way to reduce test error, but it is too expensive for networks that already take several days to train

Solution: Applying dropout in the first two fully connected layers. It randomly sets the output of each hidden neuron to zero with a probability of 0.5 during training; at test time all neurons are used, but their outputs are multiplied by 0.5 (a small demo follows below).

Result: Dropout prevents substantial overfitting and acts as an efficient approximation to combining the predictions of many different architectures at roughly the cost of training a single model, although it roughly doubles the number of iterations required to converge.

What makes this method unique: It was the first time this technique was extensively used in a deep convolutional neural network for image classification tasks.
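
A small demo of the mechanism as implemented in PyTorch (which uses "inverted" dropout: surviving activations are rescaled during training, so nothing needs to be multiplied by 0.5 at test time; the end effect matches the paper's description):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(10)

drop.train()
print(drop(x))   # roughly half the entries zeroed, survivors scaled to 2.0

drop.eval()
print(drop(x))   # identity at inference time: all ones
```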

 

3. Evaluation

(1) How it is evaluated

  • The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is an annual competition focused on object detection and image classification using the ImageNet dataset.
  • The ILSVRC-2012 dataset contained 1.2 million training images, 50,000 validation images, and 100,000 test images, spanning 1,000 object categories.
  • The main evaluation metrics used in this challenge were the top-1 and top-5 error rates. The top-1 error rate is the percentage of test images for which the model's highest-confidence prediction does not match the true label, while the top-5 error rate is the percentage of test images for which the true label does not appear among the model's top five predictions (a small helper below shows how these are computed).
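
A small utility (my own, not from the paper) showing how the two error rates are computed from model outputs:

```python
import torch

def top_k_error(logits: torch.Tensor, labels: torch.Tensor, k: int) -> float:
    """Fraction of examples whose true label is NOT among the top-k predictions."""
    _, top_k_preds = logits.topk(k, dim=1)                     # (batch, k) class ids
    correct = (top_k_preds == labels.unsqueeze(1)).any(dim=1)  # (batch,) booleans
    return 1.0 - correct.float().mean().item()

logits = torch.randn(8, 1000)              # fake model outputs for 8 images
labels = torch.randint(0, 1000, (8,))      # fake ground-truth labels
print("top-1 error:", top_k_error(logits, labels, k=1))
print("top-5 error:", top_k_error(logits, labels, k=5))
```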

(2) Evaluation result

Comparison of results on ILSVRC-2010 test set

  • AlexNet achieved a top-1 error rate of 37.5% and a top-5 error rate of 17.0% on the ILSVRC-2010 test set, well ahead of the previous best published results (e.g., 45.7% / 25.7% top-1 / top-5 for SIFT + Fisher Vectors)

Comparison of error rates on ILSVRC-2012 validation and test sets.

  • AlexNet was also evaluated in ILSVRC-2012: a single CNN achieved a top-5 validation error rate of 18.2%, and averaging the predictions of multiple CNNs (including networks pre-trained on the full ImageNet Fall 2011 release) brought the top-5 test error rate down to 15.3%, compared with 26.2% for the second-best entry.