Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Why this mattered
The paper mattered less because PReLU itself became a dominant activation than because it solved a practical bottleneck in training very deep rectifier networks from scratch. By deriving an initialization rule matched to ReLU-family nonlinearities (for plain ReLU, zero-mean Gaussian weights with variance 2/n, where n is a layer's fan-in), He, Zhang, Ren, and Sun showed that depth could be increased without the stalled convergence that had made very deep plain convolutional networks difficult to train. This turned initialization from a secondary implementation detail into a central condition for scalable deep learning, and the resulting “He initialization” became a standard default for ReLU-based neural networks.
Its ImageNet result also marked an important symbolic threshold. The reported 4.94% top-5 test error was presented as the first result below the commonly cited human-level estimate of 5.1% on ImageNet, reinforcing the idea that convolutional networks were no longer merely improving benchmark numbers but were reaching and exceeding human reference performance on constrained visual recognition tasks. The paper helped shift attention from handcrafted vision pipelines and shallow architectural tweaks toward systematically engineering depth, nonlinearities, and optimization conditions.
The work also sits directly before the next major leap from the same research line: residual networks. By showing that rectifier-specific initialization enabled much deeper models to be trained directly, it clarified both the promise and the remaining difficulty of depth. ResNet would soon address that remaining difficulty, the degradation problem in which adding layers to a plain network raises its training error, with skip connections. It relied on the same broader paradigm this paper strengthened: large-scale vision progress would come from making ever-deeper neural architectures trainable, stable, and empirically scalable.
Abstract
Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on the learnable activation and advanced initialization, we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66% [33]). To our knowledge, our result is the first to surpass the reported human-level performance (5.1%, [26]) on this dataset.
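The two contributions named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`prelu`, `he_std`, `forward_second_moments`, `relative_improvement`) are hypothetical, fully connected layers stand in for convolutions, and the width, depth, and the fixed standard deviation of 0.01 used as a baseline are arbitrary illustrative choices. The sketch assumes the paper's rule of zero-mean Gaussian weights with variance 2 / ((1 + a^2) n), where n is the fan-in and a is the PReLU slope (a = 0 gives plain ReLU).

```python
import numpy as np


def prelu(x, a):
    """PReLU: returns x where x > 0 and a * x elsewhere.

    `a` is the negative-side slope (learned in the paper, fixed here);
    a = 0 recovers the ordinary ReLU.
    """
    return np.where(x > 0, x, a * x)


def he_std(fan_in, a=0.0):
    """Weight standard deviation sqrt(2 / ((1 + a^2) * fan_in))."""
    return np.sqrt(2.0 / ((1.0 + a ** 2) * fan_in))


def forward_second_moments(depth, width, a, std, rng, batch=256):
    """Push Gaussian inputs through `depth` dense PReLU layers.

    Weights are zero-mean Gaussian with standard deviation `std`.
    Returns the mean squared pre-activation of every layer, which shows
    whether the signal magnitude is preserved or shrinks with depth.
    """
    x = rng.standard_normal((batch, width))
    moments = []
    for _ in range(depth):
        w = rng.standard_normal((width, width)) * std
        y = x @ w                      # pre-activation of this layer
        moments.append(float(np.mean(y ** 2)))
        x = prelu(y, a)                # input to the next layer
    return moments


def relative_improvement(baseline_error, new_error):
    """Fractional reduction in error relative to the baseline."""
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    width, depth = 512, 20

    scaled = forward_second_moments(depth, width, 0.0, he_std(width), rng)
    fixed = forward_second_moments(depth, width, 0.0, 0.01, rng)

    print("rectifier-aware std, last layer:", scaled[-1])
    print("fixed std 0.01, last layer:     ", fixed[-1])
    print("relative improvement over 6.66%:",
          relative_improvement(6.66, 4.94))
```

Under these assumptions the rectifier-aware scaling should keep the pre-activation magnitude at roughly the same order across layers, while the fixed standard deviation makes it shrink geometrically with depth. The last line reproduces the abstract's arithmetic: (6.66 - 4.94) / 6.66 rounds to 26%.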
Related
- cite → The Pascal Visual Object Classes (VOC) Challenge — The rectifier-network paper uses PASCAL VOC as an object-recognition benchmark for evaluating transfer from ImageNet-trained convolutional features.
- cite → Going deeper with convolutions — The rectifier-network paper compares its very deep PReLU networks against GoogLeNet's Inception architecture on ImageNet classification.
- cite → Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation — The rectifier-network paper uses R-CNN as the detection framework in which its ImageNet-trained features are transferred to object detection.
- cite → ImageNet: A large-scale hierarchical image database — The rectifier-network paper trains and evaluates on the large-scale ImageNet dataset introduced by Deng et al.
- cite → ImageNet Large Scale Visual Recognition Challenge — The rectifier-network paper reports performance on the ImageNet Large Scale Visual Recognition Challenge classification benchmark.
- cite → Backpropagation Applied to Handwritten Zip Code Recognition — The rectifier-network paper traces its supervised convolutional-network training lineage to LeCun et al.'s backpropagation-based handwritten digit recognizer.
- cite → ImageNet classification with deep convolutional neural networks — The rectifier-network paper improves on AlexNet-style ImageNet convolutional classification by using deeper networks and parametric rectified linear units.
- cite ← Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network — SRGAN uses PReLU activations introduced in Delving Deep into Rectifiers within its deep convolutional generator architecture.
- cite ← Deep Residual Learning for Image Recognition — ResNet cites this paper's PReLU and initialization work and adopts its weight initialization when training very deep ImageNet classifiers.
- cite ← Squeeze-and-Excitation Networks — Squeeze-and-Excitation Networks build on PReLU/rectifier advances as part of the deep CNN design space for improving ImageNet accuracy.
- cite ← Deep Learning with Differential Privacy — Deep Learning with Differential Privacy uses rectifier-based deep networks like those improved by PReLU initialization as target models for private training.
- enables ← The Pascal Visual Object Classes (VOC) Challenge — PASCAL VOC provided object-recognition benchmark practices that helped frame ImageNet-era evaluation for deep rectifier networks.
- enables ← ImageNet: A large-scale hierarchical image database — ImageNet supplied the large-scale labeled classification benchmark on which PReLU networks with rectifier-aware initialization went below the reported human-level top-5 error.
- enables ← Backpropagation Applied to Handwritten Zip Code Recognition — Backpropagation for convolutional networks supplied the supervised gradient-training method used to optimize the 2015 deep rectifier ImageNet model.