Approximation by superpositions of a sigmoidal function

Why this mattered

Cybenko’s 1989 paper gave one of the cleanest early proofs that feedforward neural networks with a single hidden layer and a continuous sigmoidal activation function can approximate any continuous function on a compact subset of Euclidean space to arbitrary accuracy in the uniform norm, provided enough hidden units are available. This mattered because it moved neural networks from a suggestive engineering heuristic onto a rigorous approximation-theoretic footing. The result said nothing about whether such networks could be trained efficiently or whether small networks would suffice, but it established that representational capacity was not the fundamental obstacle.
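For reference, the central claim can be written out as follows. This is a restatement in the notation commonly used when quoting the result, not a verbatim copy of the paper; the paper works on the unit hypercube I_n = [0,1]^n. Note that N depends on f and ε, and the theorem gives no bound on how large it must be.

```latex
\begin{aligned}
&\sigma \text{ continuous},\quad \lim_{t\to+\infty}\sigma(t)=1,\quad \lim_{t\to-\infty}\sigma(t)=0,\\
&G(x) = \sum_{j=1}^{N} \alpha_j\,\sigma\!\left(y_j^{\top}x + \theta_j\right),
\qquad y_j\in\mathbb{R}^n,\ \alpha_j,\theta_j\in\mathbb{R},\\
&\forall f\in C(I_n),\ \forall\varepsilon>0\ \ \exists\, N,\ \{\alpha_j, y_j, \theta_j\}_{j=1}^{N}:\quad
\sup_{x\in I_n}\lvert G(x)-f(x)\rvert<\varepsilon .
\end{aligned}
```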

The paradigm shift was conceptual: networks built from many simple, identical units compute a set of functions that is dense in a broad function class, so nonlinear computation of essentially any continuous kind can be assembled from such units. That gave mathematical legitimacy to the idea that neural networks could serve as general-purpose function approximators in control, signal processing, statistics, and later machine learning. It also helped separate two questions that would shape the field for decades: what networks can represent, and what learning algorithms can actually find from data.
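Cybenko's proof is non-constructive (it argues via the Hahn-Banach and Riesz representation theorems), so it does not say how to find the weights. The sketch below is only a numerical illustration of the density claim in one dimension, under assumptions that are not from the paper: a logistic sigmoid, hidden weights and biases drawn at random, and output weights fitted by linear least squares (a random-features shortcut, not gradient training). The helper names `fit_superposition` and `max_error`, and the target function, are hypothetical.

```python
import numpy as np


def sigmoid(t):
    # Logistic function: tends to 1 as t -> +inf and to 0 as t -> -inf,
    # so it is sigmoidal in the sense used by the theorem.
    return 1.0 / (1.0 + np.exp(-t))


def fit_superposition(f, n_units, n_samples=2000, scale=20.0, seed=0):
    """Fit G(x) = sum_j alpha_j * sigmoid(w_j * x + theta_j) to f on [0, 1].

    Assumption (not from the paper): w_j and theta_j are random and fixed;
    only the output weights alpha_j are fitted, by least squares.
    """
    rng = np.random.default_rng(seed)
    w = rng.uniform(-scale, scale, size=n_units)
    # Each unit switches near x = c_j, with c_j drawn uniformly from [0, 1].
    c = rng.uniform(0.0, 1.0, size=n_units)
    theta = -w * c
    x = np.linspace(0.0, 1.0, n_samples)
    hidden = sigmoid(np.outer(x, w) + theta)  # shape (n_samples, n_units)
    alpha, *_ = np.linalg.lstsq(hidden, f(x), rcond=None)

    def G(z):
        z = np.asarray(z, dtype=float)
        return sigmoid(np.outer(z, w) + theta) @ alpha

    return G


def max_error(f, G, n_points=5000):
    # Estimate the uniform (sup-norm) error on a fine grid over [0, 1].
    x = np.linspace(0.0, 1.0, n_points)
    return float(np.max(np.abs(f(x) - G(x))))


def target(x):
    # Hypothetical continuous target function on [0, 1].
    return np.sin(2 * np.pi * x)


if __name__ == "__main__":
    for n in (5, 20, 80):
        G = fit_superposition(target, n)
        print(n, max_error(target, G))
```

Running it prints the estimated sup-norm error for 5, 20, and 80 hidden units. The error typically shrinks as units are added, which mirrors the "enough hidden units" clause of the theorem without saying anything about rates or about how a learning algorithm would find these weights.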

Cybenko’s theorem became part of the foundation of later neural-network theory, alongside related universal approximation results by others. Later advances in training (building on backpropagation, which predates the paper), convolutional networks, deep architectures, large-scale optimization, and modern deep learning addressed the issues the theorem left open: sample efficiency, architecture design, the role of depth, generalization, and trainability. Its lasting importance is that it made a minimal but powerful claim precise: even very simple neural-network architectures are, in principle, universal approximators.

Abstract

(no abstract available)

Related

cite → Multilayer feedforward networks are universal approximators: both papers prove universal approximation for multilayer neural networks with sigmoidal nonlinearities.
enables → Gradient-based learning applied to document recognition: universal approximation by sigmoidal networks justified multilayer neural networks as expressive function approximators for LeNet-style recognition.
cite ← Multilayer feedforward networks are universal approximators: Hornik, Stinchcombe, and White generalize Cybenko's sigmoidal-function approximation theorem to multilayer feedforward neural networks.
cite ← Gradient-based learning applied to document recognition: the CNN document-recognition paper cites the sigmoidal universal approximation theorem to justify neural networks' capacity to approximate complex decision functions.

Sources

DOI: https://doi.org/10.1007/bf02551274
OpenAlex: https://openalex.org/W2103496339
