ImageNet classification with deep convolutional neural networks
Why this mattered
This paper marked the point at which deep convolutional neural networks became the dominant approach to large-scale visual recognition. Its importance was not just that it won ILSVRC-2012, but that it won by an unusually large margin: a 15.3% top-5 error rate versus 26.2% for the second-best entry. That result showed that learned hierarchical visual features, trained end-to-end on a very large labeled dataset, could outperform pipelines built around hand-engineered descriptors and task-specific classifiers. ImageNet’s scale was central: the paper demonstrated that deep models could exploit millions of labeled natural images rather than being limited by smaller academic benchmarks.
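The headline figures above are top-5 error rates: an image counts as correctly classified if its true class appears anywhere among the model's five highest-scoring predictions, while top-1 error uses only the single best guess. A minimal sketch of the metric, assuming each image comes with a list of per-class scores and an integer label; the function name `top_k_error` and the toy scores are hypothetical, for illustration only.

```python
from typing import List, Sequence


def top_k_error(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of examples whose true label is not among the k highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices ordered by descending score; sorted() is stable, so tied
        # scores keep ascending class-index order.
        ranked: List[int] = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


# Toy example (hypothetical scores): 3 images, 6 classes.
scores = [
    [0.10, 0.50, 0.20, 0.04, 0.11, 0.05],  # true class 1 ranks first
    [0.30, 0.10, 0.40, 0.12, 0.05, 0.03],  # true class 0 ranks second
    [0.04, 0.30, 0.25, 0.20, 0.15, 0.06],  # true class 5 ranks fifth
]
labels = [1, 0, 5]

print(top_k_error(scores, labels, k=1))  # 0.666... (2 of 3 images missed)
print(top_k_error(scores, labels, k=5))  # 0.0
```

The gap between the two numbers shows why top-5 error is the more forgiving measure on a 1000-class task, where several classes can be visually close to the true one.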
The work also made clear which ingredients were beginning to make deep learning practically viable: GPUs for tractable training, rectified nonlinearities for faster optimization, data augmentation and dropout for generalization, and sufficiently large networks to learn rich visual representations. None of these components was entirely new in isolation, but their combination produced a system whose empirical performance changed the field’s expectations. After this result, computer vision rapidly reorganized around deep convolutional architectures, and benchmarks that had previously advanced incrementally began to see large gains from deeper, larger, and better-regularized neural networks.
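Two of the ingredients named above are simple enough to state in a few lines. The sketch below assumes the forms described in the full paper rather than in this summary: the rectified linear unit f(x) = max(0, x), and dropout that zeroes each hidden unit's output with probability 0.5 during training and, at test time, keeps every unit but multiplies its output by 0.5. Plain Python lists stand in for activation tensors, and the function names are hypothetical.

```python
import random
from typing import List


def relu(xs: List[float]) -> List[float]:
    # Rectified linear unit f(x) = max(0, x); unlike tanh, it does not saturate for large positive inputs.
    return [max(0.0, x) for x in xs]


def dropout_train(xs: List[float], p_drop: float, rng: random.Random) -> List[float]:
    # Training: set each hidden unit's output to zero independently with probability p_drop.
    return [0.0 if rng.random() < p_drop else x for x in xs]


def dropout_test(xs: List[float], p_drop: float) -> List[float]:
    # Test: keep every unit but scale its output by (1 - p_drop), i.e. by 0.5 when p_drop = 0.5.
    return [x * (1.0 - p_drop) for x in xs]


hidden = relu([-1.0, 0.5, 2.0, -0.3, 3.0])
print(hidden)                     # [0.0, 0.5, 2.0, 0.0, 3.0]
print(dropout_test(hidden, 0.5))  # [0.0, 0.25, 1.0, 0.0, 1.5]

# Averaging many random training-time masks approaches the test-time output.
rng = random.Random(0)
trials = 20000
mean = [0.0] * len(hidden)
for _ in range(trials):
    for i, v in enumerate(dropout_train(hidden, 0.5, rng)):
        mean[i] += v / trials
print([round(m, 2) for m in mean])
```

Scaling at test time keeps each unit's expected output equal to what the next layer saw during training. Many modern frameworks use the equivalent "inverted" variant, which divides by the keep probability during training so that inference needs no rescaling.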
Its influence extended beyond image classification. The success of this model helped establish the broader recipe of large datasets, high-capacity neural networks, specialized hardware, and end-to-end training that later powered advances in detection, segmentation, speech recognition, machine translation, reinforcement learning, and eventually foundation models. In that sense, the paper was paradigm-shifting because it converted deep learning from a promising but contested approach into the default experimental starting point for perception problems, and it helped launch the modern era in which progress is often driven by scaling model capacity, data, and computation together.
Abstract
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, respectively, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully connected layers we employed a recently developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
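The abstract's figure of 60 million parameters can be checked with a short count. The sketch below assumes the layer shapes given in the paper's architecture section, which this summary does not reproduce: convolutional layers with 96 kernels of 11×11×3, 256 of 5×5×48, 384 of 3×3×256, 384 of 3×3×192, and 256 of 3×3×192, followed by fully connected layers of 4096, 4096, and 1000 units on a flattened 6×6×256 input. The reduced kernel depths in the second, fourth, and fifth layers reflect the two-GPU split, in which those layers see only the kernel maps on their own GPU. The names `LAYERS` and `layer_params` are hypothetical.

```python
from typing import Dict, List, Tuple

# (layer name, number of kernels or units, weights per kernel or unit).
# Assumed two-GPU AlexNet layout: conv2, conv4 and conv5 see only the kernel maps
# on their own GPU, so their kernel depth is half the previous layer's kernel count.
LAYERS: List[Tuple[str, int, int]] = [
    ("conv1", 96, 11 * 11 * 3),
    ("conv2", 256, 5 * 5 * 48),
    ("conv3", 384, 3 * 3 * 256),
    ("conv4", 384, 3 * 3 * 192),
    ("conv5", 256, 3 * 3 * 192),
    ("fc6", 4096, 6 * 6 * 256),  # flattened 6x6x256 output of the last max-pooling layer
    ("fc7", 4096, 4096),
    ("fc8", 1000, 4096),         # 1000-way layer feeding the softmax
]


def layer_params(units: int, fan_in: int) -> int:
    # One weight per input connection plus one bias per kernel or unit.
    return units * fan_in + units


params: Dict[str, int] = {name: layer_params(n, fan_in) for name, n, fan_in in LAYERS}
conv_total = sum(v for name, v in params.items() if name.startswith("conv"))
fc_total = sum(v for name, v in params.items() if name.startswith("fc"))

print(conv_total)             # 2334080
print(fc_total)               # 58631144
print(conv_total + fc_total)  # 60965224
```

The total of 60,965,224 matches the abstract's "60 million", and roughly 96% of it sits in the three fully connected layers, which is consistent with the abstract's choice to apply dropout to those layers specifically.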
Related
- cite → Going deeper with convolutions — AlexNet's convolutional-network success is a direct precursor to GoogLeNet's deeper Inception architecture for ImageNet classification.
- cite → ImageNet: A large-scale hierarchical image database — AlexNet uses the ImageNet large-scale labeled image database as the benchmark dataset for training and evaluating classification accuracy.
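- cite → Random Forests — AlexNet cites Random Forests, alongside other ensemble work, as evidence that combining the predictions of many models reduces test error, the observation it uses to motivate dropout as an affordable alternative to training many large networks.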
- cite ← Mastering the game of Go without human knowledge — AlphaGo Zero uses deep convolutional neural networks, the model family whose large-scale image-recognition success AlexNet established on ImageNet.
- cite ← The Cityscapes Dataset for Semantic Urban Scene Understanding — Cityscapes cites AlexNet for showing that deep convolutional networks can dominate large-scale image recognition.
- cite ← Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network — SRGAN uses features from an ImageNet-trained deep convolutional classifier, following the representation-learning breakthrough of AlexNet.
- cite ← Going deeper with convolutions — GoogLeNet follows AlexNet in using deep convolutional networks for ImageNet-scale visual classification.
- cite ← Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization — Grad-CAM explains predictions from AlexNet-style deep convolutional classifiers introduced for ImageNet classification.
- cite ← Dermatologist-level classification of skin cancer with deep neural networks — The skin-cancer classifier builds on AlexNet-style deep convolutional image classification trained at large scale.
- cite ← Learning Deep Features for Discriminative Localization — CAM builds on the finding from AlexNet that deep convolutional networks learn discriminative visual features for ImageNet classification.
- cite ← Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification — The rectifier-network paper improves on AlexNet-style ImageNet convolutional classification by using deeper networks and parametric rectified linear units.
- cite ← Deep Residual Learning for Image Recognition — ResNet cites AlexNet as the breakthrough ImageNet convolutional network that established deep CNNs for visual recognition.
- cite ← DeepWalk — DeepWalk cites AlexNet as evidence that learned dense representations and deep models can outperform hand-engineered features.
- cite ← Human-level control through deep reinforcement learning — The DQN paper cites AlexNet as evidence that deep convolutional networks can achieve breakthrough performance on large-scale visual recognition.
- cite ← Squeeze-and-Excitation Networks — Squeeze-and-Excitation Networks cite AlexNet as the landmark deep CNN that established large-scale ImageNet classification as a core benchmark.
- cite ← Mastering the game of Go with deep neural networks and tree search — AlphaGo uses convolutional neural networks for board-position evaluation, following the deep CNN success demonstrated by AlexNet on ImageNet classification.
- cite ← Image Super-Resolution Using Deep Convolutional Networks — SRCNN cites AlexNet-style deep convolutional image classification as motivation for applying deep CNNs to image super-resolution.
- cite ← Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation — R-CNN adopts the AlexNet-style deep convolutional network breakthrough from ImageNet classification for detection feature extraction.
- cite ← Deep Learning with Differential Privacy — Deep Learning with Differential Privacy uses AlexNet-style ImageNet convolutional networks as representative deep models for private learning.
- cite ← Convolutional Neural Networks for Sentence Classification — Kim cites AlexNet as a landmark demonstration that deep convolutional networks trained with dropout and ReLU-style nonlinearities can achieve major classification gains.
- cite ← ImageNet Large Scale Visual Recognition Challenge — ILSVRC cites AlexNet because its deep convolutional network dramatically improved ImageNet classification performance.
- cite ← Image Style Transfer Using Convolutional Neural Networks — Neural style transfer relies on the hierarchical convolutional features popularized by AlexNet-style ImageNet classifiers to separate content and texture statistics.
- cite ← Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs — The retinopathy algorithm applies deep convolutional neural network methods popularized by AlexNet-style ImageNet classification to retinal fundus images.
- cite ← Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations — Physics-informed neural networks cite AlexNet as evidence that deep neural networks can learn high-dimensional nonlinear representations effectively.
- enables ← ImageNet: A large-scale hierarchical image database — ImageNet provided the large labeled visual dataset and classification benchmark on which the deep convolutional neural network achieved its breakthrough result.
- enables ← Random Forests — Random Forests illustrates the test-error reduction gained by combining many models' predictions, which the paper cites when motivating dropout as an efficient form of model combination for large neural networks.
Sources
- DOI: https://doi.org/10.1145/3065386
- OpenAlex: https://openalex.org/W2163605009