We present two-layered neural network models with Q (≥ 2)-state neurons for a system of middle temporal (MT) neurons and medial superior temporal (MST) neurons, using the wake-sleep algorithm proposed by Hinton et al.; we note that the wake-sleep algorithm consists of local learning rules. We first investigate a model with binary neurons for the response properties of the MST neurons to optical flows for various types of motion. We next extend the model with binary neurons to a model with Q (≥ 3)-state neurons and investigate the response properties of the MST neurons for various values of Q (≥ 3). We obtain better response properties for the model with Q (≥ 3)-state neurons than for the one with binary neurons. (C) 2003 Elsevier Ltd. All rights reserved.
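For the binary (Q = 2) case, the locality of the learning rules can be made concrete with a minimal sketch of a wake-sleep update for a two-layer network of binary stochastic neurons: a hidden ("MST-like") layer driven by a visible ("MT-like") layer. All names, sizes, and the uniform sleep prior below are illustrative assumptions, not the paper's exact model.

```python
# Minimal wake-sleep sketch for a two-layer binary stochastic network.
# Each update uses only pre- and post-synaptic activities (a local delta rule).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_binary(p):
    return (rng.random(p.shape) < p).astype(float)

n_vis, n_hid, lr = 32, 8, 0.05
W_gen = rng.normal(0, 0.1, (n_hid, n_vis))   # generative weights (hidden -> visible)
W_rec = rng.normal(0, 0.1, (n_vis, n_hid))   # recognition weights (visible -> hidden)

def wake_sleep_step(v_data):
    global W_gen, W_rec
    # Wake phase: drive hidden units with the recognition model,
    # then update the generative weights with a local delta rule.
    h = sample_binary(sigmoid(v_data @ W_rec))
    v_pred = sigmoid(h @ W_gen)
    W_gen += lr * np.outer(h, v_data - v_pred)

    # Sleep phase: "dream" a visible pattern from the generative model,
    # then update the recognition weights with the analogous local rule.
    h_dream = sample_binary(np.full(n_hid, 0.5))   # assumed uniform prior over hidden units
    v_dream = sample_binary(sigmoid(h_dream @ W_gen))
    h_pred = sigmoid(v_dream @ W_rec)
    W_rec += lr * np.outer(v_dream, h_dream - h_pred)

for _ in range(1000):
    wake_sleep_step(sample_binary(np.full(n_vis, 0.3)))
```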
We employ statistical dynamics to study the convergence of the wake-sleep (W-S) algorithm, a learning algorithm for neural network models with hidden units. Although there have been several experimental reports on the effectiveness of the W-S algorithm, its theoretical behavior is not clear even for simple networks. In this paper, we investigate the dynamic characteristics of the W-S algorithm applied to a single-factor analysis problem, the simplest such setting. The advantage of our approach is that it allows a quantitative evaluation of the effect the learning coefficients have on convergence, which is difficult with other methods. We find that the settings of the learning coefficients, particularly in the sleep step, have a substantial effect on the convergence of the algorithm. (C) 2001 Scripta Technica.
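The single-factor setting can be probed numerically with a sketch like the one below, which runs wake-sleep on a one-dimensional factor-analysis model with separate learning coefficients for the wake and sleep steps. The teacher weight, noise level, and coefficient values are assumptions chosen for illustration, not the paper's analytical setting.

```python
# Wake-sleep dynamics for a single-factor (1-D) factor-analysis model.
# Varying eta_sleep relative to eta_wake lets one observe how the sleep-step
# coefficient influences convergence of the recognition weight.
import numpy as np

rng = np.random.default_rng(1)

g_true, sigma = 2.0, 0.5          # "teacher" generative weight and noise level
g, r = 0.1, 0.1                   # student generative and recognition weights
eta_wake, eta_sleep = 0.01, 0.01  # learning coefficients for the two steps

for t in range(20000):
    # Wake step: observe data, infer the factor with the recognition model,
    # and move the generative weight toward reconstructing the observation.
    x = g_true * rng.normal() + sigma * rng.normal()
    y_hat = r * x + sigma * rng.normal()
    g += eta_wake * y_hat * (x - g * y_hat)

    # Sleep step: dream (y, x) from the generative model and move the
    # recognition weight toward recovering the dreamed factor.
    y = rng.normal()
    x_dream = g * y + sigma * rng.normal()
    r += eta_sleep * x_dream * (y - r * x_dream)

print(f"learned g = {g:.2f} (teacher {g_true}), learned r = {r:.2f}")
```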
We study the natural gradient method for learning in deep Bayesian networks, including neural networks. There are two natural geometries associated with such learning systems consisting of visible and hidden units. One geometry is related to the full system, the other to the visible sub-system. These two geometries imply different natural gradients. As a first step, we demonstrate a great simplification of the natural gradient with respect to the first geometry, due to locality properties of the Fisher information matrix. This simplification does not directly translate to a corresponding simplification with respect to the second geometry. We develop the theory for studying the relation between the two versions of the natural gradient and outline a method for simplifying the natural gradient with respect to the second geometry based on the first one. This method suggests incorporating a recognition model as an auxiliary model for the efficient application of the natural gradient method in deep networks.
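For orientation, the generic natural-gradient step that both geometries specialize is a Fisher-preconditioned gradient. The sketch below shows it for a small categorical softmax model; the model, loss, and damping constant are illustrative choices, not the paper's deep-network construction.

```python
# Natural-gradient descent on a categorical softmax model:
# theta <- theta - lr * F(theta)^{-1} grad, with F the Fisher information.
import numpy as np

def softmax(theta):
    e = np.exp(theta - theta.max())
    return e / e.sum()

def fisher(theta):
    # Fisher information of the categorical softmax model:
    # F = diag(p) - p p^T, plus a small damping term for invertibility.
    p = softmax(theta)
    return np.diag(p) - np.outer(p, p) + 1e-6 * np.eye(theta.size)

def nat_grad_step(theta, target, lr=0.5):
    # Ordinary gradient of the cross-entropy to a target distribution,
    # preconditioned with the inverse Fisher matrix.
    grad = softmax(theta) - target
    return theta - lr * np.linalg.solve(fisher(theta), grad)

theta = np.zeros(4)
target = np.array([0.7, 0.1, 0.1, 0.1])
for _ in range(50):
    theta = nat_grad_step(theta, target)
print(softmax(theta))
```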
ISBN (Print): 9783319700878; 9783319700861
Variational Autoencoders (VAEs) are known to easily suffer from the KL-vanishing problem when combined with powerful autoregressive models such as recurrent neural networks (RNNs), which limits their application in natural language processing. In this paper, we tackle this problem by splitting the training procedure into two steps: learning effective mechanisms to encode and decode discrete tokens (wake step) and learning meaningful latent variables by reconstructing dreamed encodings (sleep step). The training pattern is similar to the wake-sleep algorithm: the two steps are trained alternately until an equilibrium is reached. We test our model on a language modeling task. The results demonstrate significant improvement over current state-of-the-art latent variable models.
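A rough sketch of this alternating wake/sleep training scheme is given below, using toy MLP encoder/decoder modules on continuous "encodings" in place of the paper's RNN components over discrete tokens. All module names, sizes, losses, and the shared optimizer are assumptions for illustration.

```python
# Alternating wake/sleep training: the wake step fits the encoder/decoder to
# real encodings; the sleep step samples latents from the prior, "dreams"
# encodings with the decoder, and trains the encoder to recover the latents.
import torch
import torch.nn as nn

enc_dim, lat_dim = 16, 4
encoder = nn.Sequential(nn.Linear(enc_dim, 32), nn.Tanh(), nn.Linear(32, lat_dim))
decoder = nn.Sequential(nn.Linear(lat_dim, 32), nn.Tanh(), nn.Linear(32, enc_dim))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

def wake_step(x_enc):
    # Wake step: reconstruct real encodings so the latent variable stays useful.
    z = encoder(x_enc)
    loss = ((decoder(z) - x_enc) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

def sleep_step(batch_size=32):
    # Sleep step: dreamed encodings are detached, so only the encoder is
    # updated toward recovering the sampled latent variables.
    z = torch.randn(batch_size, lat_dim)
    dreamed = decoder(z).detach()
    loss = ((encoder(dreamed) - z) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

for epoch in range(200):          # alternate until an approximate equilibrium
    wake_step(torch.randn(32, enc_dim))
    sleep_step()
```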