Learning representations by back-propagating errors
Why this mattered
Before this paper, multilayer neural networks were widely viewed as difficult to train in a principled way: Minsky and Papert's analysis of perceptrons had shown the limits of single-layer systems, and there was no obvious method for assigning credit to hidden units, whose target activations the task does not specify. Rumelhart, Hinton, and Williams made the error signal itself the mechanism for learning, showing how gradients could be propagated backward through layers so that each hidden unit's weights could be adjusted according to their contribution to the output error. The paper did not invent every mathematical ingredient of backpropagation, but it made the method concrete, influential, and experimentally persuasive for connectionist learning.
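The backward propagation of gradients described above can be written compactly with the chain rule. The notation here is an assumption chosen for illustration, not necessarily the paper's own: layer $l$ has weights $W^{(l)}$, biases $b^{(l)}$, pre-activations $z^{(l)}$, activations $a^{(l)}$ produced by an elementwise differentiable function $f$, and $E$ is the output error of a network with $L$ layers.

```latex
\begin{aligned}
z^{(l)} &= W^{(l)} a^{(l-1)} + b^{(l)}, \qquad a^{(l)} = f\big(z^{(l)}\big) \\
\delta^{(L)} &= \nabla_{a^{(L)}} E \odot f'\big(z^{(L)}\big) \\
\delta^{(l)} &= \big(W^{(l+1)}\big)^{\top} \delta^{(l+1)} \odot f'\big(z^{(l)}\big) \\
\frac{\partial E}{\partial W^{(l)}} &= \delta^{(l)} \big(a^{(l-1)}\big)^{\top}, \qquad
\frac{\partial E}{\partial b^{(l)}} = \delta^{(l)}
\end{aligned}
```

The third line is the step that solves credit assignment for hidden units: the error signal of a hidden layer, $\delta^{(l)}$, is computed from the error signal of the layer above it, so no explicit target for the hidden activations is needed.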
What became newly possible was the practical training of multilayer networks that learned internal representations rather than relying only on hand-designed features or linear decision boundaries. This shifted neural networks from simple adaptive classifiers toward systems capable of distributed, hierarchical representation learning. In historical terms, the paper helped reopen neural-network research after skepticism about perceptrons and supplied a general recipe that could scale across tasks wherever differentiable components could be composed.
Its later importance lies in how directly it underlies modern deep learning. The breakthroughs in speech recognition, computer vision, machine translation, reinforcement learning, and large-scale language modeling all depend on variants of the same core idea: define a differentiable system, measure error, and use backpropagation to tune many layers of parameters. Later advances such as convolutional architectures, GPUs, better initialization, normalization, regularization, and massive datasets changed the scale and reliability of training, but the 1986 paper provided the central learning mechanism that made deep, representation-learning systems a practical scientific program.
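The recipe in this paragraph (define a differentiable system, measure error, tune the parameters by backpropagation) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the paper's code: the network shape (2 inputs, a few sigmoid hidden units, 1 sigmoid output), the squared-error loss, the XOR data, the learning rate, and all function names are chosen here for the example.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(params, x):
    """Forward pass: returns hidden activations h and output y."""
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, y

def loss(params, data):
    """Squared error summed over the data set: E = 1/2 * sum (y - t)^2."""
    return 0.5 * sum((forward(params, x)[1] - t) ** 2 for x, t in data)

def backprop(params, data):
    """Gradient of E with respect to every parameter, via the chain rule."""
    W1, b1, W2, b2 = params
    gW1 = [[0.0] * len(row) for row in W1]
    gb1 = [0.0] * len(b1)
    gW2 = [0.0] * len(W2)
    gb2 = 0.0
    for x, t in data:
        h, y = forward(params, x)
        # Output error signal: dE/dz_out = (y - t) * sigmoid'(z_out),
        # where sigmoid'(z) = y * (1 - y).
        d_out = (y - t) * y * (1.0 - y)
        gb2 += d_out
        for j, hj in enumerate(h):
            gW2[j] += d_out * hj
            # Hidden error signal: the output error sent backward through
            # weight W2[j], then through the hidden unit's own sigmoid.
            d_hid = d_out * W2[j] * hj * (1.0 - hj)
            gb1[j] += d_hid
            for i, xi in enumerate(x):
                gW1[j][i] += d_hid * xi
    return gW1, gb1, gW2, gb2

def gradient_step(params, grads, lr):
    """One full-batch gradient-descent update; returns new parameters."""
    W1, b1, W2, b2 = params
    gW1, gb1, gW2, gb2 = grads
    return (
        [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, gW1)],
        [b - lr * g for b, g in zip(b1, gb1)],
        [w - lr * g for w, g in zip(W2, gW2)],
        b2 - lr * gb2,
    )

def init_params(n_hidden, rng):
    """Small random weights for a 2-input, n_hidden-unit, 1-output network."""
    W1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = rng.uniform(-1, 1)
    return W1, b1, W2, b2

# XOR: a task that no single-layer perceptron can represent.
XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
       ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

params = init_params(3, random.Random(0))
print("loss before training:", loss(params, XOR))
for _ in range(2000):
    params = gradient_step(params, backprop(params, XOR), lr=0.5)
print("loss after training: ", loss(params, XOR))
```

The loop structure is the point of the sketch: `forward` defines the differentiable system, `loss` measures error, and `backprop` reuses the output error signal to compute the hidden-layer gradients. Whether this particular run reaches a good XOR solution depends on the initialization and learning rate, which are arbitrary choices here.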
Abstract
(no abstract available)
Related
- enables → Support-Vector Networks — Backpropagation popularized gradient-based learning of internal representations, providing a neural-network baseline and context for support-vector classification.
- enables → Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups — Back-propagation provided the gradient-training method that made multilayer neural networks practical for acoustic modeling.
- enables → Greedy function approximation: A gradient boosting machine — Back-propagation popularized gradient-based error minimization; gradient boosting applies a related idea, fitting models stagewise to loss-function gradients.
- enables → TensorFlow: a system for large-scale machine learning — Backpropagation provided the gradient-based training algorithm for neural networks that TensorFlow generalized into scalable automatic differentiation and distributed computation graphs.
- enables → Deep Learning with Differential Privacy — Backpropagation supplies the gradient computations that Abadi et al. privatize with clipped, noise-added stochastic gradient descent.
- enables → FaceNet: A unified embedding for face recognition and clustering — Back-propagation enabled training deep neural embeddings, the optimization basis for FaceNet's end-to-end triplet-loss face representation.
- cite ← Support-Vector Networks — Support-vector networks contrast margin-based kernel learning with neural networks trained by back-propagation.
- cite ← Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups — The speech-recognition DNN paper cites back-propagation as the core training method for multilayer neural networks.
- cite ← Greedy function approximation: A gradient boosting machine — Gradient boosting links to back-propagation through gradient-based function optimization, but applies it stagewise to additive models rather than neural-network weights.
- cite ← TensorFlow: a system for large-scale machine learning — TensorFlow supports automatic differentiation and gradient-based training rooted in the back-propagation method for neural networks.
- cite ← Deep Learning with Differential Privacy — Deep Learning with Differential Privacy trains neural networks with differentially private stochastic gradient descent based on backpropagation.
- cite ← FaceNet: A unified embedding for face recognition and clustering — FaceNet relies on neural-network representation learning made practical by back-propagation for training deep embeddings.
Sources
- DOI: https://doi.org/10.1038/323533a0
- OpenAlex: https://openalex.org/W1498436455