
Approximation by superpositions of a sigmoidal function

Why this mattered

Cybenko’s 1989 paper gave one of the cleanest early proofs that feedforward neural networks with a single hidden layer and a continuous sigmoidal activation function can uniformly approximate any continuous function on a compact subset of Euclidean space to any desired accuracy, provided enough hidden units are available. This mattered because it turned neural networks from a suggestive engineering heuristic into an object with rigorous approximation-theoretic status. The result is an existence statement: it did not say such networks could be trained efficiently, nor that small networks would suffice, but it established that their representational capacity was not the fundamental obstacle.
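To make the claim concrete, the networks in question are finite sums of the form G(x) = sum_j alpha_j * sigma(w_j . x + theta_j). The sketch below is an illustration only and is not Cybenko's argument, which is nonconstructive. It hand-builds such a sum in one dimension by placing one steep logistic unit between each pair of neighbouring sample points, so the sum traces a smoothed staircase through the target. The target function, the interval [0, 1], the names build_network and sup_error, and the steepness setting are all assumptions chosen for the example.

```python
import math


def sigmoid(t):
    """Logistic function, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def build_network(f, n_units, steepness_per_unit=20.0):
    """Hand-construct G(x) = sum_j alpha_j * sigmoid(w_j * x + theta_j) on [0, 1].

    Hypothetical helper: the weights are chosen by hand, not learned.
    """
    knots = [j / n_units for j in range(n_units + 1)]
    w = steepness_per_unit * n_units  # steeper units as the grid gets finer

    # A unit with zero input weight is a constant: 2*f(0) * sigmoid(0) = f(0).
    units = [(2.0 * f(knots[0]), 0.0, 0.0)]

    # One unit per interval: it switches on at the midpoint and adds the
    # increment of f across that interval.
    for j in range(1, n_units + 1):
        alpha = f(knots[j]) - f(knots[j - 1])
        midpoint = 0.5 * (knots[j - 1] + knots[j])
        units.append((alpha, w, -w * midpoint))

    def G(x):
        return sum(a * sigmoid(wj * x + th) for a, wj, th in units)

    return G


def sup_error(f, G, n_grid=2001):
    """Largest |f(x) - G(x)| over an evenly spaced grid on [0, 1]."""
    xs = (i / (n_grid - 1) for i in range(n_grid))
    return max(abs(f(x) - G(x)) for x in xs)


def target(x):
    # Assumed example target; any continuous function on [0, 1] would do.
    return math.cos(2.0 * math.pi * x)


for n in (10, 100):
    print(n, "hidden units, sup error:", sup_error(target, build_network(target, n)))
```

Increasing the number of units shrinks the worst-case error on the grid, which is the qualitative content of the theorem. The construction says nothing about how few units suffice or whether training would find these weights.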

The paradigm shift was conceptual: finite sums of many simple, uniform nonlinear units could still be dense in the space of continuous functions on a compact set. That gave mathematical legitimacy to the idea that neural networks could serve as general-purpose function approximators in control, signal processing, statistics, and later machine learning. It also helped separate two questions that would shape the field for decades: what networks can represent versus what learning algorithms can find from data.

Cybenko’s theorem became part of the foundation on which later neural-network theory was built, alongside related universal approximation results by others, notably Hornik, Stinchcombe, and White (1989) and Funahashi (1989). Work on backpropagation-based training, convolutional networks, deep architectures, large-scale optimization, and modern deep learning addressed issues the theorem left open: sample efficiency, architecture design, depth, generalization, and trainability. Its lasting importance is that it made a minimal but powerful claim precise: even very simple neural-network architectures are, in principle, universal approximators.

