ISBN:
(Print) 9781479941162
Reconfigurable models have been shown to be very powerful, solving many problems faster than non-reconfigurable models. WECPAR W(M, N, k) is an M × N reconfigurable model with point-to-point reconfigurable interconnection and k wires between neighboring processors. This paper studies several aspects of WECPAR. We first solve the list ranking problem on WECPAR. Among the results obtained, ranking one element in a list of N elements can be done on a W(N, N, N) WECPAR in O(1) time. Also, on W(N, N, k), ranking a list L(N) of N elements can be done in O((log N)⌈log_{k+1} N⌉) time. To transfer a large body of algorithms to WECPAR and to assess its relative computational power, several simulation algorithms between WECPAR and well-known models such as PRAM and RMBM are introduced. These simulation algorithms show that a PRIORITY CRCW PRAM with N processors and S shared memory locations can be simulated by a W(S, N, k) WECPAR in O(⌈log_{k+1} N⌉ + ⌈log_{k+1} S⌉) time. Also, we show that a PRIORITY CRCW Basic-RMBM(P, B), with P processors and B buses, can be simulated by a W(B, P+B, k) WECPAR in O(⌈log_{k+1}(P + B)⌉) time. This has the effect of migrating a large number of algorithms to run directly on WECPAR, subject only to the simulation overhead.
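List ranking at this scale is conventionally built on pointer jumping, which doubles each node's known distance to the tail every round. The following is a minimal sequential sketch of that classic PRAM-style technique, not the WECPAR algorithm itself (wire routing and the W(M, N, k) layout are not modeled; all names are illustrative):

```python
def list_rank(succ):
    """succ[i] is the successor of node i; the tail points to itself.
    Returns rank[i] = number of hops from i to the tail."""
    n = len(succ)
    # distance 0 for the tail, 1 for every other node
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    nxt = list(succ)
    # O(log n) pointer-jumping rounds; on a PRAM each round is one
    # parallel step across all processors
    for _ in range(n.bit_length()):
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank
```

For the chain 0 → 1 → 2 → 3 (tail 3), `list_rank([1, 2, 3, 3])` yields `[3, 2, 1, 0]`.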
This paper considers the parallel implementation of a novel variable-memory quasi-Newton neural network training algorithm recently developed by the author. Unlike existing training methods, this new technique is able to optimize performance in relation to available memory. Numerically, it has properties equivalent to Full Memory BFGS optimization (FM) when there are no restrictions on memory, and to FM with periodic reset when memory is limited. Parallel implementations of both the Full and Variable Memory BFGS algorithms are outlined, and performance results are presented for a PVM target architecture.
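The full-memory variant rests on the standard BFGS inverse-Hessian update, which can be sketched as follows (this is the textbook formula, not the author's parallel variable-memory algorithm):

```python
import numpy as np

def bfgs_update(H, s, y):
    """Standard BFGS update of the inverse Hessian approximation H,
    given step s = x_new - x_old and gradient difference
    y = g_new - g_old (both 1-D arrays). Requires y @ s > 0."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    # H_new = V H V^T + rho s s^T; satisfies the secant condition
    # H_new @ y == s by construction
    return V @ H @ V.T + rho * np.outer(s, s)
```

A quick sanity check on the secant condition: for any valid (s, y) pair, the updated matrix maps y back to s exactly.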
The authors show how synchronization time, which greatly affects the time taken by a communication step, can be reduced by increasing contention. Their experience indicates that, despite improvements in interprocessor communication hardware, parallel algorithm designers still need to take topology into account to obtain high performance.
We present a novel parallel programming model called Cluster-M. This model facilitates the efficient design of highly parallel, portable software. The two main components of this model are Cluster-M Specifications and Cluster-M Representations. A Cluster-M Specification consists of a number of clustering levels emphasizing the computation and communication requirements of a parallel solution to a given problem. A Cluster-M Representation, on the other hand, represents a multi-layered partitioning of a system graph corresponding to the topology of the target architecture. An algorithm for generating Cluster-M Representations is given. A set of basic constructs essential for writing Cluster-M Specifications using PCN is presented in this paper. Cluster-M Specifications are mapped onto the Representations using a proposed mapping methodology. Using Cluster-M, a single piece of software can be ported among various parallel computing systems.
Matrix multiplication is a computation- and communication-intensive problem. Six parallel algorithms for matrix multiplication on the Connection Machine are presented and compared with respect to their performance and processor usage. For n × n matrices, the algorithms have theoretical running times of O(n² log n), O(n log n), O(n), and O(log n), and require n, n², n², and n³ processors, respectively. With careful attention to communication patterns, the theoretically predicted runtimes can indeed be achieved in practice. The parallel algorithms illustrate the tradeoffs between performance, communication cost, and processor usage.
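The O(log n)-time, n³-processor end of the tradeoff can be illustrated in a few lines: every scalar product a[i,k]·b[k,j] is conceptually formed by its own processor in one step, and the n² sums are then combined by a logarithmic-depth pairwise reduction. A vectorized sketch (assuming n is a power of two; not the Connection Machine implementation):

```python
import numpy as np

def matmul_log_depth(a, b):
    """Multiply n x n matrices by forming all n^3 products at once,
    then reducing over k in log2(n) halving steps (n a power of two)."""
    # prod[i, j, k] = a[i, k] * b[k, j] -- one "processor" per triple
    prod = a[:, None, :] * b.T[None, :, :]
    # pairwise tree reduction: each iteration is one parallel step
    while prod.shape[2] > 1:
        half = prod.shape[2] // 2
        prod = prod[:, :, :half] + prod[:, :, half:]
    return prod[:, :, 0]
```

The loop body models a single parallel time step, so the depth of the computation is 1 + log₂ n, matching the O(log n) bound at the cost of n³-fold parallelism.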
A new parallel version of Friedman's Multivariate Adaptive Regression Splines (MARS) algorithm is discussed. By partitioning the data over the processors of a parallel computational system, good parallel efficiency is achieved. Instead of using the truncated power basis functions of the original MARS, the new method (BMARS) utilises B-splines, which improves numerical stability and reduces the computational cost of the procedure. In addition, the coefficients of the basis functions of a BMARS model provide quickly accessible information about the local behaviour of the function. The algorithm has a time complexity proportional to the number of data records. The method provides a new means for detecting areas in the feature space characterised by "interesting" patterns of response values. This is applied to searching for classes of incorrect tax returns using multiple predictor variables, or features. The parallel algorithm makes it feasible to investigate very large databases, such as the taxation database.
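The B-spline basis functions BMARS builds on are defined by the standard Cox–de Boor recursion; a small illustrative evaluator (not BMARS itself, and not optimized):

```python
def bspline_basis(i, p, t, knots):
    """Cox-de Boor recursion: value of the i-th degree-p B-spline
    basis function at point t, over the given knot sequence."""
    if p == 0:
        # degree-0 basis: indicator of the half-open knot interval
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = 0.0
    if knots[i + p] != knots[i]:
        left = ((t - knots[i]) / (knots[i + p] - knots[i])
                * bspline_basis(i, p - 1, t, knots))
    right = 0.0
    if knots[i + p + 1] != knots[i + 1]:
        right = ((knots[i + p + 1] - t) / (knots[i + p + 1] - knots[i + 1])
                 * bspline_basis(i + 1, p - 1, t, knots))
    return left + right
```

Because each basis function is supported on only p + 1 knot spans, a coefficient in a B-spline expansion carries purely local information about the fitted function, which is the "quickly accessible local behaviour" property mentioned above.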
With the upgrade of current gravitational wave detectors, the first detection of gravitational wave signals is expected to occur in the next decade. Low-latency gravitational wave triggers will be necessary to make fast follow-up electromagnetic observations of events related to their source, e.g., prompt optical emission associated with short gamma-ray bursts. In this paper we present a new time-domain low-latency algorithm for identifying the presence of gravitational waves produced by compact binary coalescence events in noisy detector data. Our method calculates the signal-to-noise ratio from the summation of a bank of parallel infinite impulse response filters. We show that our summed parallel infinite impulse response method can retrieve a signal-to-noise ratio greater than 99% of that produced by the optimal matched filter.
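The core structure, a bank of first-order IIR filters whose delayed outputs are summed, can be sketched as follows. The coefficients and delays here are illustrative placeholders, not a tuned template bank:

```python
import numpy as np

def spiir_output(data, a1_list, b0_list, delays):
    """Sum the outputs of a bank of first-order IIR filters
    y[t] = a1 * y[t-1] + b0 * x[t - d], one (a1, b0, d) per filter.
    In the actual method a1 and b0 are complex and chosen to track
    the chirp waveform; here they are arbitrary test values."""
    n = len(data)
    total = np.zeros(n, dtype=complex)
    for a1, b0, d in zip(a1_list, b0_list, delays):
        y = 0j
        for t in range(n):
            x = data[t - d] if t >= d else 0.0  # delayed input
            y = a1 * y + b0 * x                 # one-pole recursion
            total[t] += y
    return total
```

Each filter costs O(1) work per new sample, which is what makes the approach attractive for low-latency, streaming operation compared to a full frequency-domain matched filter.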
Automated parametric design is becoming an important issue in engineering, especially in analog integrated circuit design, where this trend is more and more obvious. Parametric optimization is a key element of automated parametric design. Because evaluating the cost function (which is usually simulation-based) is computationally expensive, optimization can take a long time even for small design problems. One possible way of attacking the computational complexity of typical optimization problems is the use of parallel optimization algorithms. Two large families of such algorithms exist: synchronous and asynchronous. Due to their flexibility and efficiency, asynchronous algorithms are becoming a viable option for parallel optimization. An asynchronous parallel simplex algorithm for unconstrained minimization is presented, based on the convergent variant of the Nelder-Mead simplex algorithm. Several different approaches to information exchange between workers in a parallel asynchronous optimization system are discussed and evaluated on a set of testbench unconstrained optimization problems. Speedup results are presented for the best-performing information exchange algorithm, and show a significant speedup for moderately dimensional optimization problems. The deficiencies of the algorithm are discussed and guidelines for future work are given.
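For reference, the sequential Nelder-Mead iteration that such parallel variants build on looks roughly like this (a bare-bones synchronous sketch; the asynchronous worker coordination and the convergent modifications of the paper are not modeled):

```python
import numpy as np

def nelder_mead(f, x0, iters=200, step=0.5):
    """Minimal Nelder-Mead: reflection, expansion, inside
    contraction, and shrink, applied to an (n+1)-point simplex."""
    n = len(x0)
    simplex = [np.array(x0, float)]
    for i in range(n):
        v = np.array(x0, float)
        v[i] += step
        simplex.append(v)
    for _ in range(iters):
        simplex.sort(key=f)                      # best first, worst last
        best, worst = simplex[0], simplex[-1]
        centroid = np.mean(simplex[:-1], axis=0) # centroid excl. worst
        refl = centroid + (centroid - worst)     # reflection
        if f(refl) < f(best):
            exp = centroid + 2 * (centroid - worst)   # expansion
            simplex[-1] = exp if f(exp) < f(refl) else refl
        elif f(refl) < f(simplex[-2]):
            simplex[-1] = refl
        else:
            contr = centroid + 0.5 * (worst - centroid)  # contraction
            if f(contr) < f(worst):
                simplex[-1] = contr
            else:
                # shrink the whole simplex toward the best vertex
                simplex = [best + 0.5 * (v - best) for v in simplex]
    return min(simplex, key=f)
```

In an asynchronous parallel setting, the expensive calls to `f` are what get farmed out to workers, with the simplex updated as results arrive rather than in lockstep.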
This paper shows a simple algorithm for solving the single-function coarsest partition problem on the CRCW PRAM model of parallel computation using O(n) processors in O(log n) time with O(n^(1+ε)) space.
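The problem itself: given a function f and an initial partition of its domain, find the coarsest refinement in which any two elements of a block also have their f-images in a common block. A naive sequential refinement sketch (not the CRCW PRAM algorithm, which achieves the O(log n) bound above):

```python
def coarsest_partition(f, blocks):
    """f: function on {0..n-1} given as a list; blocks: initial
    partition as a list of sets. Repeatedly split elements whose
    images fall in different blocks until stable."""
    n = len(f)
    label = {}
    for b, s in enumerate(blocks):
        for x in s:
            label[x] = b
    changed = True
    while changed:
        # signature = (own block, block of image); equal signatures
        # stay together, different ones split
        sig = {x: (label[x], label[f[x]]) for x in range(n)}
        new = {}
        for x in range(n):
            new.setdefault(sig[x], len(new))
        changed = len(new) != len(set(label.values()))
        label = {x: new[sig[x]] for x in range(n)}
    out = {}
    for x in range(n):
        out.setdefault(label[x], set()).add(x)
    return sorted(out.values(), key=min)
```

Each pass only splits blocks, so the loop terminates after at most n - 1 refinements; the parallel algorithm's contribution is doing this within the stated processor and time bounds.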
Parallel Givens sequences for solving the General Linear Model (GLM) are developed and analyzed. The block-updating GLM estimation problem is also considered. The solution of the GLM employs as its main computational device the Generalized QR Decomposition, where one of the two matrices is initially upper triangular. The proposed Givens sequences efficiently exploit the initial triangular structure of the matrix and special properties of the solution method. The complexity analysis of the sequences is based on an Exclusive Read-Exclusive Write (EREW) Parallel Random Access Machine (PRAM) model with limited parallelism. Furthermore, the number of operations performed by a Givens rotation is determined by the size of the vectors used in the rotation. Under these assumptions, one conclusion drawn is that a sequence which applies the smallest number of compound disjoint Givens rotations to solve the GLM estimation problem does not necessarily have the lowest computational complexity. The various Givens sequences and their computational complexity analyses will be useful when addressing the solution of other, similar factorization problems.
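The building block of all such sequences is a single Givens rotation that zeroes one matrix entry by rotating two rows; disjoint row pairs can be rotated concurrently, which is what the parallel sequences schedule. An illustrative sketch (not the GLM-specific code):

```python
import numpy as np

def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    if b == 0:
        return 1.0, 0.0
    r = np.hypot(a, b)   # numerically safe sqrt(a^2 + b^2)
    return a / r, b / r

def apply_givens(A, i, j, col):
    """Rotate rows i and j of A (in place) to zero A[j, col].
    Rotations on disjoint row pairs are independent and could be
    applied in the same parallel step."""
    c, s = givens(A[i, col], A[j, col])
    Ai, Aj = A[i].copy(), A[j].copy()
    A[i] = c * Ai + s * Aj
    A[j] = -s * Ai + c * Aj
```

The cost of one rotation is proportional to the row length it touches, which is the operation-count assumption used in the complexity analysis above.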