Human-level control through deep reinforcement learning¶
Why this mattered¶
Before this paper, reinforcement learning had produced strong results in controlled settings, but it had not convincingly shown that a single agent could learn a broad range of complex behaviors directly from high-dimensional sensory input. Mnih et al. introduced the deep Q-network (DQN), combining Q-learning with a convolutional neural network, experience replay, and a target network to stabilize training. On Atari 2600 games, the same architecture learned policies from raw pixels and game scores, reaching human-level or better performance on many titles without game-specific feature engineering. The paradigm shift was not merely higher scores; it was the demonstration that deep learning could turn perception and action into one trainable system.
What became newly possible was a practical route from raw observation to competent control in environments where hand-designed state representations were costly or unavailable. The paper helped reframe reinforcement learning as a scalable representation-learning problem: instead of giving the agent compact symbolic features, researchers could let neural networks learn task-relevant features jointly with value estimates. This made deep reinforcement learning a central research program and gave the field a benchmark-driven proof of concept that general-purpose agents could acquire diverse skills through trial and error.
Its influence is visible in later breakthroughs that extended the same basic ambition: learning powerful policies and value functions with deep networks. AlphaGo and AlphaZero used deep reinforcement learning together with search and self-play to exceed human expert performance in board games; later work in robotics, simulated control, and large-scale game agents built on the idea that learned representations could support sequential decision-making. The paper did not solve general intelligence, sample efficiency, or robustness, but it made the modern deep RL agenda concrete: agents could learn nontrivial control directly from rich perceptual input, using one broadly applicable neural architecture.
Abstract¶
(no abstract available)
Related¶
- cite → Reducing the Dimensionality of Data with Neural Networks — The DQN paper cites deep autoencoders as evidence that multilayer neural networks can learn compact representations from high-dimensional inputs.
- cite → Gradient-based learning applied to document recognition — The DQN paper cites LeNet to ground its use of convolutional neural networks for learning visual features directly from raw images.
- cite → ImageNet classification with deep convolutional neural networks — The DQN paper cites AlexNet as evidence that deep convolutional networks can achieve breakthrough performance on large-scale visual recognition.
- cite ← Mastering the game of Go without human knowledge — AlphaGo Zero builds on the deep reinforcement learning paradigm popularized by DQN for learning policies from trial-and-error experience.
- cite ← Dermatologist-level classification of skin cancer with deep neural networks — The skin-cancer classifier cites deep reinforcement learning as evidence that deep neural networks can reach human-level performance on complex tasks.
- cite ← TensorFlow: a system for large-scale machine learning — TensorFlow cites deep Q-networks as an example of deep reinforcement learning workloads supported by its dataflow execution model.
- cite ← Mastering the game of Go with deep neural networks and tree search — AlphaGo builds on deep reinforcement learning with neural networks by combining value and policy learning with Monte Carlo tree search.
- enables ← Reducing the Dimensionality of Data with Neural Networks — Deep autoencoder work helped establish multilayer neural representations that made deep learning practical for later deep reinforcement learning systems.
- enables ← Gradient-based learning applied to document recognition — LeNet's convolutional feature learning provided the visual representation method that DQN used to process Atari screen pixels.