A Fast Learning Algorithm for Deep Belief Nets
Why this mattered
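Greedy, layer-wise pretraining of deep belief nets made deep neural networks easier to optimize. It was later used to initialize deep autoencoders for dimensionality reduction and deep acoustic models for speech recognition.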
Abstract
We show how to use "complementary priors" to eliminate the explaining-away effects that make inference difficult in densely connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modeled by long ravines in the free-energy landscape of the top-level associative memory, and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.
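The greedy, one-layer-at-a-time procedure described in the abstract can be illustrated with a minimal pure-Python sketch. Each layer is a binary restricted Boltzmann machine (RBM) trained with one-step contrastive divergence (CD-1). Once a layer is trained, it is frozen and its hidden-unit probabilities become the training data for the layer above. This is an illustration under stated assumptions, not the paper's implementation. The class `RBM`, the function `train_greedy_stack`, the learning rate, and the toy data are hypothetical. The top-level associative memory with label units and the contrastive wake-sleep fine-tuning are omitted. Passing probabilities (rather than sampled binary states) up to the next layer is a common simplification. The `free_energy` method computes the RBM free energy F(v) = -Σ_i b_i v_i - Σ_j log(1 + exp(c_j + Σ_i v_i W_ij)), the quantity whose landscape the abstract refers to.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


class RBM:
    """Binary restricted Boltzmann machine trained with CD-1 (hypothetical sketch)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        # Small random weights: W[i][j] connects visible unit i to hidden unit j.
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_input(self, v, j):
        return self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v)))

    def hidden_probs(self, v):
        # p(h_j = 1 | v): hidden units are conditionally independent given v.
        return [sigmoid(self.hidden_input(v, j)) for j in range(len(self.c))]

    def visible_probs(self, h):
        # p(v_i = 1 | h): visible units are conditionally independent given h.
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def free_energy(self, v):
        # F(v) = -sum_i b_i v_i - sum_j log(1 + exp(c_j + sum_i v_i W_ij))
        visible_term = sum(bi * vi for bi, vi in zip(self.b, v))
        hidden_term = sum(softplus(self.hidden_input(v, j)) for j in range(len(self.c)))
        return -visible_term - hidden_term

    def cd1_update(self, v0, lr):
        """One CD-1 step: data statistics minus one-step reconstruction statistics."""
        ph0 = self.hidden_probs(v0)
        h0 = self.sample(ph0)          # stochastic binary hidden states
        pv1 = self.visible_probs(h0)   # mean-field reconstruction of the data
        ph1 = self.hidden_probs(pv1)
        for i in range(len(v0)):
            for j in range(len(ph0)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
            self.b[i] += lr * (v0[i] - pv1[i])
        for j in range(len(ph0)):
            self.c[j] += lr * (ph0[j] - ph1[j])


def train_greedy_stack(data, layer_sizes, epochs=100, lr=0.1, seed=0):
    """Greedily train one RBM per layer; each layer's hidden probabilities feed the next."""
    rng = random.Random(seed)
    rbms = []
    layer_input = data
    n_in = len(data[0])
    for n_hidden in layer_sizes:
        rbm = RBM(n_in, n_hidden, rng)
        for _ in range(epochs):
            for v in layer_input:
                rbm.cd1_update(v, lr)
        rbms.append(rbm)
        # Freeze this layer and re-represent the data for the layer above.
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
        n_in = n_hidden
    return rbms


if __name__ == "__main__":
    data = [[1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1]]
    stack = train_greedy_stack(data, layer_sizes=[3, 2], epochs=300)
    bottom = stack[0]
    for v in data + [[0, 0, 1, 1, 0, 0]]:
        print(v, round(bottom.free_energy(v), 3))
```

On the toy data above, the two training patterns should end up with lower free energy under the bottom RBM than the unseen pattern `[0, 0, 1, 1, 0, 0]`, because CD-1 lowers the free energy of training data relative to configurations the data never visit.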
Related
- cite → Gradient-based learning applied to document recognition — Hinton et al. contrast deep belief net pretraining with the supervised gradient-based convolutional learning demonstrated by LeCun et al. for document recognition.
- enables → Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups — Deep belief net pretraining supplied a layer-wise learning strategy that enabled effective training of deep acoustic models.
- cite ← Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups — The speech-recognition DNN paper cites deep belief nets as a pretraining method that made deep neural networks easier to optimize.
- cite ← Reducing the Dimensionality of Data with Neural Networks — The autoencoder paper uses deep belief net pretraining from Hinton et al. to initialize deep autoencoders before fine-tuning them for dimensionality reduction.
- enables ← Gradient-based learning applied to document recognition — LeCun's backpropagation-trained neural networks enabled deep belief nets by demonstrating that multilayer representations could be learned effectively for perception tasks.