ISBN (print): 9780769536422
Graphics processing units (GPUs) are powerful computational devices tailored to the needs of the 3-D gaming industry for high-performance, real-time graphics engines. Nvidia Corporation released a new generation of GPUs designed for general-purpose computing in 2006, and it released a GPU programming language called CUDA in 2007. DNA microarray technology is a high-throughput tool for assaying mRNA abundance in cell samples. In data analysis, scientists often apply hierarchical clustering of the genes, where a fundamental operation is to calculate all pairwise distances. If there are n genes, this takes O(n^2) time. In this work, GPUs and the CUDA language are used to calculate pairwise distances. For Manhattan distance, GPU/CUDA achieves a 40 to 90 times speed-up compared to the central processing unit implementation; for the Pearson correlation coefficient, the speed-up is 28 to 38 times.
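As a rough illustration of the core operation this abstract describes (not the paper's CUDA kernels), the all-pairs computation over n gene-expression rows can be sketched in plain Python; the nested pairwise loop is the O(n^2) work the GPU parallelizes. The data and function names here are illustrative only.

```python
import math

def manhattan(a, b):
    """Manhattan (L1) distance between two expression profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))

def pearson(a, b):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def pairwise(rows, measure):
    """All-pairs matrix: the O(n^2) step that the paper moves onto the GPU."""
    n = len(rows)
    return [[measure(rows[i], rows[j]) for j in range(n)] for i in range(n)]

genes = [[1.0, 2.0, 3.0],   # toy expression profiles, one row per gene
         [2.0, 4.0, 6.0],
         [3.0, 2.0, 1.0]]
D = pairwise(genes, manhattan)  # D[0][1] == |1-2| + |2-4| + |3-6| == 6
R = pairwise(genes, pearson)    # R[0][1] == 1.0, R[0][2] == -1.0
```

On a GPU each (i, j) cell of the matrix is an independent task, which is why the operation parallelizes so well.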
ISBN (print): 9783319780542; 9783319780535
Computations based on graphs are very common problems, but their complexity, the increasing size of the analyzed graphs, and the large amount of communication involved make this analysis a challenging task. In this paper, we present a comparison of two parallel BFS (Breadth-First Search) implementations: MapReduce run on Hadoop infrastructure, and the PGAS (Partitioned Global Address Space) model. The latter implementation has been developed with the help of PCJ (Parallel Computations in Java), a library for parallel and distributed computations in Java. Both implementations realize the level-synchronous strategy: the Hadoop algorithm assumes iterative MapReduce jobs, whereas PCJ uses explicit synchronization after each level. The scalability of both solutions is similar. However, the PCJ implementation is much faster (about 100 times) than the MapReduce Hadoop solution.
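A minimal serial sketch of the level-synchronous strategy both implementations share may help: the whole frontier is expanded, then everyone synchronizes before the next level begins. This is not the Hadoop or PCJ code, just the shared idea; the toy graph is illustrative.

```python
def level_synchronous_bfs(adj, source):
    """Level-synchronous BFS: expand the entire frontier, then synchronize.

    Each iteration of the while-loop corresponds to one MapReduce job in the
    Hadoop version, or one explicit barrier in the PCJ version.
    """
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:        # in the parallel versions this loop is
            for v in adj[u]:      # partitioned across workers
                if v not in level:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier  # "barrier": all workers finished this level
    return level

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
levels = level_synchronous_bfs(adj, 0)  # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```

The barrier after each level is exactly where the two systems differ in cost: Hadoop pays a full job startup per level, while PCJ pays only a lightweight synchronization, which is consistent with the reported speed difference.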
ISBN (print): 9781424451081
Motivated by a peer-to-peer estimation algorithm in which adaptive weights are optimized to minimize the estimation error variance, we formulate and solve a novel non-convex Lipschitz optimization problem that guarantees global stability of a large class of peer-to-peer consensus-based algorithms for wireless sensor networks. Because of packet losses, the solution of this optimization problem cannot be achieved efficiently with either traditional centralized methods or distributed Lagrangian message passing. We prove that the optimal solution can be obtained by solving a set of nonlinear equations. A fast distributed algorithm, which requires only local computations, is presented for solving these equations. Analysis and computer simulations illustrate the algorithm and its application to various network topologies.
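For orientation, a generic consensus iteration of the kind this class of algorithms builds on can be sketched as below. This toy version uses a single fixed step size and models neither packet losses nor the optimized adaptive weights that are the paper's actual contribution; the graph and names are illustrative.

```python
def consensus_step(x, neighbors, eps):
    """One synchronous consensus iteration: x_i += eps * sum_j (x_j - x_i).

    Converges to the average of the initial values when eps is smaller than
    1 / (maximum node degree); choosing such weights optimally under packet
    loss is the (much harder) problem the paper solves.
    """
    return [xi + eps * sum(x[j] - xi for j in neighbors[i])
            for i, xi in enumerate(x)]

neighbors = {0: [1], 1: [0, 2], 2: [1]}  # a 3-node path graph
x = [0.0, 3.0, 6.0]                      # initial local estimates
for _ in range(200):
    x = consensus_step(x, neighbors, eps=0.3)
# every node converges to the average of the initial values, 3.0
```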
This paper introduces a new PARAFAC algorithm for a class of third-order tensors. Particularly, the proposed algorithm is based on subspace estimation and solving a non-symmetrical joint diagonalization problem. To de...
We propose a new and low per-iteration complexity first-order primal-dual optimization framework for a convex optimization template with broad applications. Our analysis relies on a novel combination of three classic ideas applied to the primal-dual gap function: smoothing, acceleration, and homotopy. The algorithms due to the new approach achieve the best-known convergence rate results, in particular when the template consists only of nonsmooth functions. We also outline a restart strategy for the acceleration to significantly enhance the practical performance. We demonstrate relations with the augmented Lagrangian method and show how to exploit strongly convex objectives with rigorous convergence rate guarantees. We provide representative examples to illustrate that the new methods can outperform the state of the art, including the Chambolle-Pock and alternating direction method of multipliers algorithms. We also compare our algorithms with the well-known Nesterov smoothing method.
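Two of the three ingredients, smoothing and homotopy, can be shown in a one-dimensional cartoon: replace a nonsmooth function by its Nesterov/Moreau smoothing (here the Huber function for |x|), run gradient descent on the smooth surrogate, and shrink the smoothing parameter over time. This is only an illustration of those generic ideas, not the paper's primal-dual framework.

```python
def smoothed_abs_grad(x, mu):
    """Gradient of the Nesterov/Moreau smoothing of |x| (the Huber function):
    quadratic of curvature 1/mu near zero, slope +/-1 elsewhere."""
    return x / mu if abs(x) <= mu else (1.0 if x > 0 else -1.0)

# Minimize the nonsmooth f(x) = |x - 2| by gradient descent on its smoothing.
# The smoothed gradient is (1/mu)-Lipschitz, so step size mu is safe;
# decreasing mu across the outer loop is the "homotopy" ingredient.
x = 10.0
for mu in (1.0, 0.1, 0.01):
    for _ in range(100):
        x -= mu * smoothed_abs_grad(x - 2.0, mu)
# x ends at the true minimizer, 2.0
```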
We present a simulation system which meets the requirements for practical application of inverse modeling in a professional environment. A tool interface for the integration of arbitrary simulation tools at the user level is introduced and a methodology for the formation of simulation networks is described. A Levenberg-Marquardt optimizer automates the inverse modeling procedure. Strategies for the efficient execution of simulation tools are discussed. An example demonstrates the extraction of doping profile information on the basis of electrical measurements.
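The damped Gauss-Newton iteration at the heart of a Levenberg-Marquardt optimizer can be sketched in a few lines for a one-parameter model. This is a generic textbook sketch, not the paper's optimizer; the exponential model and all names are illustrative.

```python
import math

def levenberg_marquardt_1d(residuals, jacobian, b0, iters=50):
    """Minimal one-parameter Levenberg-Marquardt loop: a damped Gauss-Newton
    step, with the damping lam adapted by accept/reject."""
    b, lam = b0, 1e-3
    def cost(b):
        return sum(r * r for r in residuals(b))
    for _ in range(iters):
        r = residuals(b)
        J = jacobian(b)
        JtJ = sum(j * j for j in J)
        Jtr = sum(j * ri for j, ri in zip(J, r))
        step = -Jtr / (JtJ + lam)          # damped Gauss-Newton step
        if cost(b + step) < cost(b):
            b, lam = b + step, lam / 10.0  # accept: trust the quadratic model
        else:
            lam *= 10.0                    # reject: fall back toward gradient descent
    return b

# Fit y = exp(b * t) to data generated with the true value b = 0.5.
t = [0.0, 1.0, 2.0, 3.0]
y = [math.exp(0.5 * ti) for ti in t]
residuals = lambda b: [math.exp(b * ti) - yi for ti, yi in zip(t, y)]
jacobian = lambda b: [ti * math.exp(b * ti) for ti in t]
b_hat = levenberg_marquardt_1d(residuals, jacobian, b0=0.0)
```

In the inverse-modeling setting described above, evaluating `residuals` means running the external simulation tools, which is why efficient execution strategies matter.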
This paper, the third one in a three-paper sequence, presents the results of TOSSIM simulation of a Hopfield neural network acting as a static optimizer, configured to solve the maximum independent set (MIS) problem using a wireless sensor network as a fully parallel and distributed computing hardware platform. TinyOS with its default protocol stack, along with nesC, was used to develop the simulation model. Simulations were realized for mote counts of 10, 50, 100, and 182; messaging complexity, memory, and simulation time costs were measured. The most prominent finding was that the neural optimization algorithm was able to compute solutions to the MIS problem. The memory footprint of the TOSSIM process in a Windows XP environment was about 20 MB for the range of sensor networks considered. The messaging complexity, as measured by the total number of messages transmitted, and the simulation time both increased rather quickly, indicating a need to optimize and tune certain aspects of the simulation environment if wireless sensor networks with higher mote counts are to be simulated.
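A standard Hopfield-style formulation of MIS, sketched serially below, may clarify what each mote computes. The energy is E(v) = -sum_i v_i + penalty * sum over edges of v_i * v_j; each asynchronous update flips a neuron to whatever lowers E, and with penalty > 1 the fixed points are maximal independent sets. This is a generic textbook formulation run serially, not the paper's TinyOS/nesC implementation, where each neuron runs on its own mote.

```python
def hopfield_mis(adj, penalty=2.0, sweeps=20):
    """Discrete Hopfield-style search for an independent set: each neuron
    turns on (gain 1) unless an active neighbor makes the penalty exceed it."""
    n = len(adj)
    v = [0] * n
    for _ in range(sweeps):
        changed = False
        for i in range(n):
            # local energy test: activating i lowers E iff penalty * (number
            # of active neighbors) < 1
            new_vi = 1 if penalty * sum(v[j] for j in adj[i]) < 1 else 0
            if new_vi != v[i]:
                v[i], changed = new_vi, True
        if not changed:
            break  # fixed point reached: a maximal independent set
    return [i for i in range(n) if v[i]]

# 5-cycle: the maximum independent set has size 2
adj = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [0, 3]}
mis = hopfield_mis(adj)  # [0, 2] for this sweep order
```

In the sensor-network setting, the "active neighbor" sum is exactly the information exchanged in the radio messages whose count the paper measures.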