We consider the problem of learning an unknown large-margin halfspace in the context of parallel computation, giving both positive and negative results. As our main positive result, we give a parallel algorithm for learning a large-margin halfspace, based on an algorithm of Nesterov's that performs gradient descent with a momentum term. We show that this algorithm can learn an unknown γ-margin halfspace over n dimensions using n · poly(1/γ) processors and running in time Õ(1/γ) + O(log n). In contrast, naive parallel algorithms that learn a γ-margin halfspace in time that depends polylogarithmically on n have an inverse quadratic running time dependence on the margin parameter γ. Our negative result deals with boosting, which is a standard approach to learning large-margin halfspaces. We prove that in the original PAC framework, in which a weak learning algorithm is provided as an oracle that is called by the booster, boosting cannot be parallelized. More precisely, we show that, if the algorithm is allowed to call the weak learner multiple times in parallel within a single boosting stage, this ability does not reduce the overall number of successive stages of boosting needed for learning by even a single stage. Our proof is information-theoretic and does not rely on unproven assumptions.
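The core positive result builds on gradient descent with a momentum term. As a minimal sketch of that idea (not the paper's analyzed algorithm), the following learns a separating halfspace by minimizing a smooth logistic surrogate with Nesterov-style look-ahead momentum; all hyperparameter values here are illustrative assumptions:

```python
import numpy as np

def nesterov_halfspace(X, y, lr=0.1, momentum=0.9, epochs=200):
    """Sketch: learn a halfspace normal w by minimizing a smooth logistic
    surrogate of the 0/1 loss with Nesterov-style momentum. Hyperparameters
    are illustrative, not the values analyzed in the paper."""
    n, d = X.shape
    w = np.zeros(d)
    v = np.zeros(d)  # momentum (velocity) term
    for _ in range(epochs):
        # gradient of the logistic loss evaluated at the look-ahead point w + momentum*v
        z = y * (X @ (w + momentum * v))
        grad = -(X * (y / (1.0 + np.exp(z)))[:, None]).mean(axis=0)
        v = momentum * v - lr * grad
        w = w + v
    return w
```

The gradient step itself is embarrassingly parallel across the n examples, which is where the processor count in the abstract's bound enters.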
Self-organizing maps (SOMs) have become popular for tasks in data visualization, pattern classification, and natural language processing, and can be seen as one of the major concepts for artificial neural networks of today. Their general idea is to approximate a high-dimensional and previously unknown input distribution by a lower-dimensional neural network structure, with the goal of modeling the topology of the input space as closely as possible. Classical SOMs read the input values in random but sequential order, one by one, and thus adjust the network structure over space: the network is built while reading larger and larger parts of the input. In contrast to this approach, we present a SOM that processes the whole input in parallel and organizes itself over time. The main reason for parallel input processing lies in the fact that knowledge can be used to recognize parts of patterns in the input space that have already been learned. This way, networks can be developed that do not reorganize their structure from scratch every time a new set of input vectors is presented, but rather adjust their internal architecture in accordance with previous mappings. One basic application could be a modeling of the whole-part relationship through layered architectures. (c) 2004 Elsevier B.V. All rights reserved.
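For contrast with the parallel variant the abstract proposes, the classical sequential SOM it describes can be sketched as follows: inputs are read one by one, and the best-matching unit plus its grid neighborhood are pulled toward each sample. Grid size, decay schedules, and all parameter values are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def som_train(data, grid_h=8, grid_w=8, epochs=10, lr0=0.5, sigma0=3.0, seed=0):
    """Sketch of the classical sequential SOM: one sample at a time,
    best-matching unit (BMU) search, Gaussian neighborhood update."""
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    weights = rng.random((grid_h, grid_w, d))
    # grid coordinates, used to measure neighborhood distance on the map
    ys, xs = np.mgrid[0:grid_h, 0:grid_w]
    coords = np.stack([ys, xs], axis=-1).astype(float)
    steps = epochs * len(data)
    t = 0
    for _ in range(epochs):
        for x in rng.permutation(data):  # random but sequential order
            lr = lr0 * (1 - t / steps)            # decaying learning rate
            sigma = sigma0 * (1 - t / steps) + 0.5  # shrinking neighborhood
            # best-matching unit: grid cell whose weight is closest to x
            dists = np.linalg.norm(weights - x, axis=2)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            # Gaussian neighborhood on the grid, centered at the BMU
            g = np.exp(-np.sum((coords - np.array(bmu)) ** 2, axis=2)
                       / (2 * sigma ** 2))
            weights += lr * g[..., None] * (x - weights)
            t += 1
    return weights
```

The proposed parallel SOM replaces the inner per-sample loop with a single update over the whole input set, organizing the map over time rather than over the input sequence.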
ISBN:
(Print) 0889865078
The self-organizing map (SOM) is a common methodology used to capture and represent data patterns, and it plays an increasingly significant role in the development of neural networks. The primary objective of an SOM is to determine an approximate representation of data with an unknown probability distribution, from a multi-dimensional input space, using a lower-dimensional neural network. The approximation by the network corresponds to the topological structure inherent in the data distribution. The classical SOM, and many of its variations such as the growing grid, construct the network based on randomly selected pieces of the input space, where the number of pieces increases over time. We give an overview of a parallel algorithm for the SOM (ParaSOM), which alternatively examines the entire input in each step, leading to a more accurate representation of input patterns after only a fraction of the iterations, albeit requiring significantly more time. Both growing grid and ParaSOM, unlike the classical SOM, do not maintain a fixed number of neurons. Instead, their networks may grow and increase in density to match the input space. We present a comparison of results generated by implementations of ParaSOM and growing grid, making apparent their considerable performance differences despite having the growth feature in common.
Efficient parallel learning algorithms are proposed for training a powerful modular neural network, the hierarchical mixture of experts (HME). The parallelizations are based on the concept of modular parallelism, i.e. parallel execution of network modules. From modeling the speed-up as a function of the number of processors and the number of training examples, several improvements are derived, such as pipelining the training examples by packets. The theoretical models are accurate when compared to experimental measurements. For regular topologies, an analysis of the models shows that the parallel algorithms are highly scalable when the size of the experts grows from linear units to multi-layer perceptrons (MLPs). These results are confirmed experimentally, achieving near-linear speedups for HME-MLP. Although this work can be viewed as a case study in the parallelization of HME neural networks, both the algorithms and the theoretical models can be extended to different learning rules or less regular tree architectures. (C) 2002 Elsevier Science B.V. All rights reserved.
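The modular parallelism the abstract describes can be illustrated with a minimal sketch, assuming a one-level mixture of experts with linear experts standing in for MLPs: each expert module runs on its own worker, and a softmax gate combines the outputs. All names and the linear-expert choice are hypothetical, not the paper's implementation:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def hme_forward(x, experts, gate_w):
    """Sketch of modular parallelism: each expert (a weight matrix here,
    standing in for an MLP module) is evaluated on its own worker, then
    the results are blended by a softmax gating network."""
    with ThreadPoolExecutor(max_workers=len(experts)) as pool:
        outs = list(pool.map(lambda W: W @ x, experts))  # parallel expert execution
    g = gate_w @ x
    g = np.exp(g - g.max())
    g /= g.sum()  # softmax gating weights, one per expert
    return sum(gi * oi for gi, oi in zip(g, outs))
```

In a full HME this structure nests: each expert may itself be a gated subtree, and the packet-pipelining improvement amounts to streaming batches of training examples through the modules so workers stay busy.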