
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Why this mattered

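The paper mattered less because PReLU itself became a dominant activation than because it removed a practical bottleneck in training very deep rectifier networks from scratch. He, Zhang, Ren, and Sun derived an initialization rule matched to ReLU-family nonlinearities: draw each layer's weights from a zero-mean Gaussian with standard deviation sqrt(2/n_l), where n_l is the number of input connections to a unit (k^2 c for a k×k convolution over c input channels). This keeps the variance of each layer's responses constant from layer to layer instead of letting it shrink or explode exponentially with depth. With it, depth could be increased without the optimization instabilities that had made very deep plain convolutional networks difficult to train. This turned initialization from a secondary implementation detail into a central condition for scalable deep learning, and the resulting “He initialization” became a standard default for ReLU-based neural networks.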
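The variance argument behind the rule can be checked directly. In the paper's derivation, a rectifier layer scales response variance by 1/2 · n_l · Var[w_l], so across layers 2..L the factors multiply. The sketch below is a minimal illustration under stated assumptions, not the authors' code. The helper names (`fan_in_conv`, `he_std`, `variance_gain`, `he_normal`) are hypothetical. The `negative_slope` argument covers the PReLU variant of the condition, 1/2 (1 + a^2) n_l Var[w_l] = 1. For comparison it uses a 1/n_l weight variance, which is what Glorot ("Xavier") initialization reduces to when fan-in equals fan-out.

```python
import math
import random


def fan_in_conv(kernel_size: int, in_channels: int) -> int:
    """n_l = k^2 * c: input connections feeding one response of a conv layer."""
    return kernel_size * kernel_size * in_channels


def he_std(fan_in: int, negative_slope: float = 0.0) -> float:
    """Std of zero-mean Gaussian weights satisfying 1/2 (1 + a^2) n_l Var[w] = 1.

    negative_slope = 0 is the plain ReLU case, giving sqrt(2 / n_l).
    """
    return math.sqrt(2.0 / ((1.0 + negative_slope ** 2) * fan_in))


def variance_gain(fan_ins, weight_stds) -> float:
    """Var[y_L] / Var[y_1] = prod over layers 2..L of 1/2 * n_l * Var[w_l] (ReLU case)."""
    gain = 1.0
    for n_l, std in zip(fan_ins[1:], weight_stds[1:]):
        gain *= 0.5 * n_l * std ** 2
    return gain


def he_normal(fan_in: int, fan_out: int, seed: int = 0):
    """Sample a fan_out x fan_in weight matrix from N(0, 2 / fan_in)."""
    rng = random.Random(seed)
    std = he_std(fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


if __name__ == "__main__":
    # Hypothetical stack: 22 layers of 3x3 convolutions over 256 channels.
    layers = [fan_in_conv(3, 256)] * 22
    he = [he_std(n) for n in layers]
    unit = [math.sqrt(1.0 / n) for n in layers]  # 1/n_l variance for comparison
    print(variance_gain(layers, he))    # ~1.0: signal scale preserved
    print(variance_gain(layers, unit))  # ~4.8e-07 (= 0.5 ** 21): signal vanishes
```

The comparison isolates the factor of 2. Because a ReLU zeroes roughly half of its symmetric inputs, a 1/n_l variance halves the signal at every layer. Doubling the weight variance cancels that loss exactly.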

Its ImageNet result also marked an important symbolic threshold. The reported 4.94% top-5 test error was presented as the first result below the commonly cited 5.1% human-level estimate on ImageNet. This reinforced the idea that convolutional networks were no longer merely improving benchmark numbers but were matching and exceeding human reference performance on constrained visual recognition tasks. The paper helped shift attention from handcrafted vision pipelines and shallow architectural tweaks toward systematically engineering depth, nonlinearities, and optimization conditions.

The work also sits directly before the next major leap from the same research line: residual networks. By making much deeper rectifier models trainable from scratch, it exposed both the promise of depth and its remaining difficulty, since being able to train extra layers did not guarantee that they improved accuracy. ResNet would soon address that degradation problem with skip connections, relying on the same broader paradigm this paper strengthened: large-scale vision progress would come from making ever-deeper neural architectures trainable, stable, and empirically scalable.

Abstract

Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on the learnable activation and advanced initialization, we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66% [33]). To our knowledge, our result is the first to surpass the reported human-level performance (5.1%, [26]) on this dataset.
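To make the "generalizes the traditional rectified unit" claim concrete, the sketch below implements the PReLU definition f(y) = max(0, y) + a · min(0, y) together with its two gradients. Setting a = 0 recovers ReLU, and a small fixed a gives Leaky ReLU. This is a minimal pure-Python illustration, not the authors' implementation. The class and method names are hypothetical. It shows the channel-wise variant with one learnable slope per channel, initialized to 0.25 as the paper reports. The update is a plain gradient step with no weight decay on a; that simplification is an assumption standing in for the paper's momentum-based update, and the paper does recommend excluding a from weight decay.

```python
from typing import List


def prelu(y: float, a: float) -> float:
    """f(y) = max(0, y) + a * min(0, y)."""
    return max(0.0, y) + a * min(0.0, y)


class ChannelwisePReLU:
    """Hypothetical layer: one slope a_c shared by all positions of channel c."""

    def __init__(self, channels: int, init: float = 0.25):
        self.a = [init] * channels  # the only extra parameters: one per channel

    def forward(self, x: List[List[float]]) -> List[List[float]]:
        # x[c] holds the responses of channel c at every spatial position.
        return [[prelu(v, self.a[c]) for v in row] for c, row in enumerate(x)]

    def input_grads(self, x, upstream):
        # df/dy = 1 for y > 0, else a_c (at y = 0 this picks the subgradient a_c).
        return [[g * (1.0 if v > 0 else self.a[c]) for v, g in zip(row, grow)]
                for c, (row, grow) in enumerate(zip(x, upstream))]

    def slope_grads(self, x, upstream) -> List[float]:
        # df/da_c = 0 for y > 0, else y; summed over all positions in channel c.
        return [sum(g * (0.0 if v > 0 else v) for v, g in zip(row, grow))
                for row, grow in zip(x, upstream)]

    def sgd_step(self, grads: List[float], lr: float) -> None:
        # Plain gradient step with no weight decay, which would push a toward 0.
        self.a = [a - lr * g for a, g in zip(self.a, grads)]


if __name__ == "__main__":
    layer = ChannelwisePReLU(channels=2)
    x = [[1.0, -2.0], [-4.0, 3.0]]
    print(layer.forward(x))  # [[1.0, -0.5], [-1.0, 3.0]]
```

The extra cost is one multiply on negative inputs and one parameter per channel, which is consistent with the abstract's "nearly zero extra computational cost".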

Sources