This paper rigorously establishes that standard multilayer feedforward networks with as few as one hidden layer using arbitrary squashing functions are capable of approximating any Borel measurable function from one finite dimensional space to another to any desired degree of accuracy, provided sufficiently many hidden units are available. In this sense, multilayer feedforward networks are a class of universal approximators.
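As an illustrative companion to this result (a sketch, not the paper's construction), the snippet below fits a single hidden layer of logistic squashing units to a smooth target on a compact interval. The hidden-unit count, the random weight scales, and the least-squares fit of the output layer are all assumptions made purely for the illustration.

```python
import numpy as np

# Illustrative sketch: approximate a target on a compact set with a single
# hidden layer of logistic ("squashing") units.
rng = np.random.default_rng(0)

def squash(z):
    return 1.0 / (1.0 + np.exp(-z))          # logistic squashing function

target = lambda x: np.sin(2 * np.pi * x)      # function to approximate on [0, 1]

x = np.linspace(0.0, 1.0, 200)[:, None]       # inputs from a compact interval
H = 50                                        # number of hidden units (chosen freely)

# Random input weights and biases; only the output layer is fitted here,
# which already drives the error down as H grows.
W = rng.normal(scale=10.0, size=(1, H))
b = rng.normal(scale=5.0, size=H)
Phi = squash(x @ W + b)                       # hidden-layer activations, shape (200, H)

beta, *_ = np.linalg.lstsq(Phi, target(x), rcond=None)   # output weights
max_err = np.max(np.abs(Phi @ beta - target(x)))          # sup-norm error on the grid
print(f"max |f - fhat| on the grid: {max_err:.3e}")
```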
Theoretical results and practical experience indicate that feedforward networks are very good at approximating a wide class of functional relationships. Training networks to approximate functions takes place by using exemplars to find interconnect weights that maximize some goodness-of-fit criterion. Given finite data sets, it can be important in the training process to take advantage of any a priori information regarding the underlying functional relationship to improve the approximation and the ability of the network to generalize. This paper describes methods for incorporating a priori information of this type into feedforward networks. Two general approaches, one based upon architectural constraints and a second upon connection weight constraints, form the basis of the methods presented. These two approaches can be used either alone or in combination to help solve specific training problems. Several examples covering a variety of types of a priori information, including information about curvature, interpolation points, and output layer interrelationships, are presented.
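A minimal sketch of the connection-weight-constraint idea, assuming a soft-penalty formulation: the names `net`, `params`, the known interpolation point `(x0, y0)`, and the penalty weight `lam` are hypothetical placeholders, not the authors' method.

```python
import numpy as np

def penalized_loss(params, net, X, y, x0, y0, lam=10.0):
    """Data fit plus a soft constraint that the network pass through (x0, y0).

    `net(params, X)` is an assumed forward function; `lam` weights the prior term.
    """
    fit = np.mean((net(params, X) - y) ** 2)                  # goodness-of-fit term
    prior = np.mean((net(params, np.atleast_2d(x0)) - y0) ** 2)  # interpolation-point prior
    return fit + lam * prior
```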
It is well known that standard single-hidden-layer feedforward networks (SLFNs) with at most N hidden neurons (including biases) can learn N distinct samples (x_i, t_i) with zero error, and that the weights connecting the input neurons and the hidden neurons can be chosen "almost" arbitrarily. However, these results have been obtained for the case when the activation function for the hidden neurons is the signum function. This paper rigorously proves that standard single-hidden-layer feedforward networks (SLFNs) with at most N hidden neurons and with any bounded nonlinear activation function which has a limit at one infinity can learn N distinct samples (x_i, t_i) with zero error. The previous method of arbitrarily choosing weights is not feasible for any SLFN. The proof of our result is constructive and thus gives a method to directly find the weights of standard SLFNs with any such bounded nonlinear activation function, as opposed to iterative training algorithms in the literature.
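The constructive flavor of such results can be illustrated as follows (an assumed sketch, not the paper's proof): choose the input-to-hidden weights of N hidden neurons essentially arbitrarily, then solve the resulting N-by-N linear system for the output weights so that all N samples are reproduced exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 20, 3                                         # N distinct samples in d dimensions
X, t = rng.normal(size=(N, d)), rng.normal(size=N)

act = np.tanh                                        # a bounded nonlinear activation
W, b = rng.normal(size=(d, N)), rng.normal(size=N)   # N hidden neurons, weights picked freely
H = act(X @ W + b)                                   # N x N hidden-layer matrix

beta = np.linalg.solve(H, t)                         # output weights: exact interpolation
print("training error:", np.max(np.abs(H @ beta - t)))   # ~0 up to round-off
```

With distinct samples and generic hidden weights the matrix H is invertible almost surely, which is the sense in which the hidden weights can be chosen "almost" arbitrarily.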
It has been recently shown (e.g., Hornik, Stinchcombe & White, 1989, 1990) that sufficiently complex multilayer feedforward networks are capable of representing arbitrarily accurate approximations to arbitrary mappings. We show here that these approximations are learnable by proving the consistency of a class of connectionist nonparametric regression estimators for arbitrary (square integrable) regression functions. The consistency property ensures that as network “experience” accumulates (as indexed by the size of the training set), the probability of network approximation error exceeding any specified level tends to zero. A key feature of the demonstration of consistency is the proper control of the growth of network complexity as a function of network experience. We give specific growth rates for network complexity compatible with consistency. We also consider automatic and semi-automatic data-driven methods for determining network complexity in applications, based on minimization of a cross-validated average squared error measure of network performance. We recommend cross-validated average squared error as a generally applicable criterion for comparing relative performance of differing network architectures and configurations.
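A hedged sketch of the cross-validated average squared error criterion recommended here, used to compare candidate hidden-layer sizes; `fit_network` and `predict` stand in for whatever training and prediction routines are actually in use.

```python
import numpy as np

def cv_ase(X, y, hidden_sizes, fit_network, predict, k=5, seed=0):
    """k-fold cross-validated average squared error for each candidate network size."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    scores = {}
    for h in hidden_sizes:
        errs = []
        for i in range(k):
            test = folds[i]
            train = np.concatenate(folds[:i] + folds[i + 1:])
            model = fit_network(X[train], y[train], hidden_units=h)
            errs.append(np.mean((predict(model, X[test]) - y[test]) ** 2))
        scores[h] = float(np.mean(errs))
    return scores   # pick the h with the smallest cross-validated ASE
```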
Approximation of real functions by feedforward networks of the usual kind is shown to be based on the fundamental principle of approximation by piecewise-constant functions. This principle underlies a simple construction given for three-layer networks and suggests possible difficulties in determining two-layer networks.
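The piecewise-constant principle can be illustrated with the following sketch (illustrative, not the paper's construction): a difference of two steep sigmoids acts as an approximate indicator of an interval, and a weighted sum of such terms realizes a piecewise-constant approximant.

```python
import numpy as np

def sigmoid(z, k=200.0):
    # steep logistic unit, written via tanh to avoid overflow for large |z|
    return 0.5 * (1.0 + np.tanh(0.5 * k * z))

def piecewise_constant_net(x, edges, levels):
    """Sum of near-indicator 'bumps': sigmoid(x - a) - sigmoid(x - b) ~ 1 on [a, b]."""
    y = np.zeros_like(x)
    for (a, b), c in zip(zip(edges[:-1], edges[1:]), levels):
        y += c * (sigmoid(x - a) - sigmoid(x - b))
    return y

x = np.linspace(0.0, 1.0, 500)
edges = np.linspace(0.0, 1.0, 11)                              # 10 constant pieces
levels = np.sin(2 * np.pi * (edges[:-1] + edges[1:]) / 2.0)    # piece heights at midpoints
fhat = piecewise_constant_net(x, edges, levels)
```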
We give conditions ensuring that multilayer feedforward networks with as few as a single hidden layer and an appropriately smooth hidden layer activation function are capable of arbitrarily accurate approximation to an arbitrary function and its derivatives. In fact, these networks can approximate functions that are not differentiable in the classical sense, but possess only a generalized derivative, as is the case for certain piecewise differentiable functions. The conditions imposed on the hidden layer activation function are relatively mild; the conditions imposed on the domain of the function to be approximated have practical implications. Our approximation results provide a previously missing theoretical justification for the use of multilayer feedforward networks in applications requiring simultaneous approximation of a function and its derivatives.
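In notation not taken from the paper, the simultaneous approximation concerns single-hidden-layer networks of the form below, whose input derivatives are available in closed form whenever the activation $\sigma$ is appropriately smooth:

```latex
f(x) \;=\; \sum_{j=1}^{H} \beta_j \,\sigma\!\left(w_j^{\top} x + b_j\right),
\qquad
\frac{\partial f}{\partial x_i}(x) \;=\; \sum_{j=1}^{H} \beta_j \, w_{ji}\,
\sigma'\!\left(w_j^{\top} x + b_j\right).
```

Approximating a target $g$ together with its derivatives then amounts to making both $f - g$ and the corresponding derivative expressions small simultaneously.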
Spike-timing-dependent plasticity (STDP) with asymmetric learning windows is commonly found in the brain and useful for a variety of spike-based computations such as input filtering and associative memory. A natural consequence of STDP is establishment of causality in the sense that a neuron learns to fire with a lag after specific presynaptic neurons have fired. The effect of STDP on synchrony is elusive because spike synchrony implies unitary spike events of different neurons rather than a causal delayed relationship between neurons. We explore how synchrony can be facilitated by STDP in oscillator networks with a pacemaker. We show that STDP with asymmetric learning windows leads to self-organization of feedforward networks starting from the pacemaker. As a result, STDP drastically facilitates frequency synchrony. Even though differences in spike times are lessened as a result of synaptic plasticity, the finite time lag remains so that perfect spike synchrony is not realized. In contrast to traditional mechanisms of large-scale synchrony based on mutual interaction of coupled neurons, the route to synchrony discovered here is enslavement of downstream neurons by upstream ones. Facilitation of such feedforward synchrony does not occur for STDP with symmetric learning windows.
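As a hedged illustration of an asymmetric learning window of the kind discussed (a standard exponential parameterization, not necessarily the one used in the paper):

```python
import numpy as np

def stdp_window(dt, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Asymmetric STDP window: potentiation when the presynaptic spike precedes the
    postsynaptic spike (dt > 0), depression otherwise. dt is in milliseconds."""
    dt = np.asarray(dt, dtype=float)
    return np.where(dt >= 0.0,
                    a_plus * np.exp(-dt / tau_plus),
                    -a_minus * np.exp(dt / tau_minus))
```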
This paper investigates new learning algorithms (LF I and LF II) based on a Lyapunov function for the training of feedforward neural networks. It is observed that such algorithms have an interesting parallel with the popular backpropagation (BP) algorithm, where the fixed learning rate is replaced by an adaptive learning rate computed using a convergence theorem based on Lyapunov stability theory. LF II, a modified version of LF I, has been introduced to avoid local minima. This modification also helps in improving the convergence speed in some cases. Conditions for achieving a global minimum for this kind of algorithm have been studied in detail. The performances of the proposed algorithms are compared with the BP algorithm and extended Kalman filtering (EKF) on three benchmark function approximation problems: XOR, 3-bit parity, and 8-3 encoder. The comparisons are made in terms of the number of learning iterations and the computational time required for convergence. It is found that the proposed algorithms (LF I and LF II) converge much faster than the other two algorithms while attaining the same accuracy. Finally, the comparison is made on a complex two-dimensional (2-D) Gabor function, and the effect of the adaptive learning rate on convergence speed is verified. In a nutshell, the investigations made in this paper help us better understand the learning procedure of feedforward neural networks in terms of adaptive learning rate, convergence speed, and local minima.
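A generic sketch of the Lyapunov-motivated idea, assuming V = ½‖e‖² as the candidate function and a backtracking choice of step length; the names `grad_fn` and `error_fn` and the shrink factor are illustrative, and this is not the LF I/LF II update itself.

```python
import numpy as np

def lyapunov_step(params, grad_fn, error_fn, eta=1.0, shrink=0.5, max_tries=20):
    """Choose a learning rate so that V = 0.5 * ||e||^2 strictly decreases.

    `grad_fn(params)` and `error_fn(params)` are assumed user-supplied callables;
    this is a generic Lyapunov-decrease sketch, not the paper's derivation.
    """
    g = grad_fn(params)
    v0 = 0.5 * np.sum(error_fn(params) ** 2)          # current Lyapunov value
    for _ in range(max_tries):
        trial = params - eta * g
        if 0.5 * np.sum(error_fn(trial) ** 2) < v0:   # decrease condition satisfied
            return trial, eta                         # accept the adaptive step
        eta *= shrink                                 # otherwise shrink the rate
    return params, 0.0                                # no admissible step found
```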
We present a new network topology to avoid overfitting in two-layered feedforward networks. We use two additional linear layers and principal component analysis to reduce the dimension of both inputs and internal representations and to transmit the essential information. Thereby, neurons with small variance in the output are removed, which results in better generalization properties. Our network and learning rules can also be seen as a procedure to reduce the number of free parameters without using second-order information of the error function. As a second strategy, we derive a penalty term which drives the network to keep the variances of the hidden layer outputs small. Experimental results show that this limits the transmitted information, which reduces the noise and gives better generalization. The variances of the outputs of the hidden neurons are used again as a pruning criterion.
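A minimal sketch of the variance-penalty strategy described here, assuming a mean-squared-error data term; the function name and penalty weight `lam` are illustrative, not the authors' formulation.

```python
import numpy as np

def variance_penalized_loss(y_pred, y_true, hidden_out, lam=1e-3):
    """MSE plus a penalty on the per-neuron variance of the hidden-layer outputs.

    Hidden neurons whose output variance stays small carry little information and
    can later be pruned, mirroring the pruning criterion described in the abstract.
    """
    mse = np.mean((y_pred - y_true) ** 2)
    var_penalty = np.sum(np.var(hidden_out, axis=0))   # sum over hidden neurons
    return mse + lam * var_penalty
```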
We investigate bifurcations in feedforward coupled cell networks. The feedforward structure (the absence of feedback) can be defined by a partial order on the cells. We use this property to study generic one-parameter steady state bifurcations for such networks. Branching solutions and their asymptotics are described in terms of the Taylor coefficients of the internal dynamics. They can be determined via an algorithm that only exploits the network structure. Similar to previous results on feedforward chains, we observe amplifications of the growth rates of steady state branches induced by the feedforward structure. However, contrary to these earlier results, as the interaction scenarios can be more complicated in general feedforward networks, different branching patterns and different amplifications can occur for different regions in the space of Taylor coefficients.
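In illustrative notation (not the paper's), a three-cell feedforward chain with a common self-coupling convention for the first cell shows the defining property: each cell receives input only from cells below it in the partial order, so steady state branches can be analyzed cell by cell along that order.

```latex
\dot{x}_1 = f(x_1, x_1, \lambda), \qquad
\dot{x}_2 = f(x_2, x_1, \lambda), \qquad
\dot{x}_3 = f(x_3, x_2, \lambda).
```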