ISBN: 9781479961238 (print)
We propose a novel framework that improves locality-aware parallel programming models by defining a hierarchical data locality model extension. We also propose a hierarchical thread partitioning algorithm, which synthesizes hierarchical thread placement layouts that aim to minimize the program's overall communication costs. We demonstrate the effectiveness of our approach using the NAS Parallel Benchmarks implemented in Unified Parallel C (UPC), with a modified Berkeley UPC compiler and runtime system. Applying the placement layout suggested by our algorithm improved performance by up to 85%.
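To make the idea of a communication-minimizing placement concrete, here is a minimal sketch, assuming the program's behavior is summarized by a symmetric thread-to-thread communication matrix and the machine by fixed group sizes per level (for example threads per node, then threads per socket). The greedy grouping heuristic and the names `greedy_groups`, `hierarchical_layout` and `cross_cost` are hypothetical illustrations, not the authors' partitioning algorithm.

```python
def greedy_groups(threads, comm, group_size):
    """Greedily pack threads into groups of `group_size`, pulling in the
    unassigned thread that communicates most with the current group."""
    remaining = list(threads)
    groups = []
    while remaining:
        group = [remaining.pop(0)]  # seed a new group
        while len(group) < group_size and remaining:
            best = max(remaining, key=lambda t: sum(comm[t][g] for g in group))
            remaining.remove(best)
            group.append(best)
        groups.append(group)
    return groups


def hierarchical_layout(threads, comm, sizes):
    """Apply the grouping recursively, one machine level per entry in `sizes`
    (e.g. [threads_per_node, threads_per_socket])."""
    if not sizes:
        return list(threads)
    return [hierarchical_layout(g, comm, sizes[1:])
            for g in greedy_groups(threads, comm, sizes[0])]


def cross_cost(groups, comm):
    """Total communication volume between threads placed in different groups."""
    return sum(comm[a][b]
               for i, g in enumerate(groups)
               for h in groups[i + 1:]
               for a in g for b in h)


# Threads 0-2 and 1-3 exchange heavy traffic (10); all other pairs are light (1).
comm = [[0, 1, 10, 1], [1, 0, 1, 10], [10, 1, 0, 1], [1, 10, 1, 0]]
print(greedy_groups(range(4), comm, 2))           # [[0, 2], [1, 3]]
print(cross_cost([[0, 2], [1, 3]], comm))         # 4, versus 22 for [[0, 1], [2, 3]]
```

Placing heavily communicating pairs in the same group cuts the inter-group traffic from 22 to 4 in this toy case; a real layout would also weight each level by its actual link cost.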
ISBN: 354043786X (print)
Clusters of SMPs connected by fast networks, such as Myricom's Myrinet, have emerged as important platforms for high-performance computing. Although their advertised peak performance is very high, their real performance may be much lower for many applications. To achieve high performance, we need to take advantage of both the SMP and the cluster architecture. Based on the HPM model for parallel computing, we analyze the performance of clusters of SMP systems and propose principles for optimizing parallel algorithms from both the parallelism and the locality points of view. The influence of memory hierarchies on performance is particularly emphasized. Practical examples on the commercial SMP clusters Dawning D2000-2 and D3000 are also given.
ISBN: 354043786X (print)
In this paper, we propose a blocking algorithm for parallel one-dimensional fast Fourier transforms (FFTs) on shared-memory parallel computers. Our FFT algorithm is based on the six-step FFT algorithm. The blocked six-step FFT algorithm improves performance by making effective use of the cache memory. Performance results of one-dimensional FFTs on the SGI Onyx 3400 and the Sun Enterprise 6000 are reported. We achieved about 1929 MFLOPS on the SGI Onyx 3400 (MIPS R12000 400 MHz, 16 CPUs) and about 520 MFLOPS on the Sun Enterprise 6000 (UltraSPARC 168 MHz, 16 CPUs).
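For reference, the plain (unblocked) six-step FFT factors a length-$N = n_1 n_2$ transform into transposes, short FFTs and a twiddle-factor multiplication. The sketch below is a readable Python illustration under that textbook formulation, not the authors' blocked implementation: the helper name `six_step_fft` is hypothetical, the short transforms use a naive `dft` for clarity, and the cache blocking that the paper contributes is deliberately omitted.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT, used here for the short sub-transforms and as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def six_step_fft(x, n1, n2):
    """Six-step FFT of len(x) == n1 * n2, with input index t = j1 + n1*j2
    and output index k = k2 + n2*k1."""
    n = n1 * n2
    assert len(x) == n
    # Step 1: transpose so that a[j1][j2] = x[j1 + n1*j2].
    a = transpose([x[j2 * n1:(j2 + 1) * n1] for j2 in range(n2)])
    # Step 2: n1 independent n2-point FFTs along j2.
    b = [dft(row) for row in a]
    # Step 3: multiply by the twiddle factors w_N^(j1*k2).
    c = [[b[j1][k2] * cmath.exp(-2j * cmath.pi * j1 * k2 / n) for k2 in range(n2)]
         for j1 in range(n1)]
    # Step 4: transpose so that each row is indexed by k2.
    d = transpose(c)
    # Step 5: n2 independent n1-point FFTs along j1, giving e[k2][k1].
    e = [dft(row) for row in d]
    # Step 6: transpose and flatten so that X[k2 + n2*k1] = f[k1][k2].
    f = transpose(e)
    return [v for row in f for v in row]


x = [complex(i % 5, (i * 3) % 7) for i in range(12)]
print(max(abs(p - q) for p, q in zip(six_step_fft(x, 3, 4), dft(x))))  # tiny rounding error
```

Each row FFT in steps 2 and 5 touches a contiguous vector, which is what makes the formulation attractive for caches and for distributing rows across CPUs; blocking the transposes is the refinement the abstract describes.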
ISBN: 0818676140 (print)
Distributed computing systems are widely used in mission-critical real-time applications such as missile defense systems, aircraft control and sonar applications. Designing a low-cost distributed computing system that satisfies all the stringent requirements of a given application is a difficult problem. This problem can be alleviated using Computer-Aided Synthesis (CAS) tools. Because of the large number of design alternatives, CAS tools are compute-intensive and can take a considerably long time even for medium-sized real-time applications. In this paper, we describe a set of parallel synthesis algorithms that dynamically adapt to the number of available processors in a parallel computer system to substantially reduce the total turnaround time of the synthesis process.
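As a rough illustration of "adapting to the number of available processors", the sketch below splits an exhaustively enumerated design space into as many chunks as there are workers (defaulting to `os.cpu_count()`) and keeps the cheapest feasible design. The design model is an assumption made for illustration only: each task is mapped to one hypothetical processor type `(price, speed)`, latency is the sum of per-task execution times, and a design is feasible if that latency meets the deadline. `parallel_synthesis` and `evaluate_chunk` are hypothetical names; the paper's actual synthesis algorithms are not shown.

```python
import itertools
import os
from concurrent.futures import ThreadPoolExecutor


def evaluate_chunk(designs, task_work, types, deadline):
    """Return (cost, design) for the cheapest feasible design in this chunk, or None."""
    best = None
    for design in designs:
        latency = sum(w / types[t][1] for w, t in zip(task_work, design))
        cost = sum(types[t][0] for t in design)
        if latency <= deadline and (best is None or (cost, design) < best):
            best = (cost, design)
    return best


def parallel_synthesis(task_work, types, deadline, workers=None):
    """Partition the design space by the number of available workers and search it in parallel."""
    workers = workers or os.cpu_count() or 1
    designs = list(itertools.product(range(len(types)), repeat=len(task_work)))
    chunks = [designs[i::workers] for i in range(workers)]  # one chunk per worker
    # Threads keep the sketch portable; CPU-bound search would use a process pool.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda c: evaluate_chunk(c, task_work, types, deadline), chunks)
        feasible = [r for r in results if r is not None]
    return min(feasible) if feasible else None


# Two tasks; processor type 0 is cheap and slow, type 1 is expensive and fast.
print(parallel_synthesis([4, 2], [(1, 1.0), (3, 2.0)], deadline=4.0))  # (4, (1, 0))
```

Because the chunk count follows the worker count, the same code runs unchanged on one processor or many; only the granularity of the split changes.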
In this paper we present an experimental comparison of several numerical tools for computing the numerical rank of dense matrices. The study includes the well-known SVD, the URV decomposition, and several rank-revealing QR factorizations: the QR factorization with column pivoting and two QR factorizations with restricted column pivoting. Two different parallel programming methodologies are analyzed. First, we present block-partitioned algorithms for the URV decomposition and rank-revealing QR factorizations that provide efficient implementations on shared-memory environments. Second, we present parallel distributed algorithms, based on the message-passing paradigm, for computing rank-revealing QR factorizations on multicomputers. (C) 1998 Elsevier Science Inc. All rights reserved.
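The simplest of the rank-revealing tools compared here, QR with column pivoting, estimates numerical rank from the diagonal of $R$: at each step the remaining column with the largest norm is moved forward, so $|R_{kk}|$ is non-increasing, and the rank is the first $k$ at which $|R_{kk}|$ drops below a tolerance relative to $|R_{11}|$. The sketch below implements that idea with modified Gram-Schmidt in pure Python; the function name `numerical_rank_qrcp` and the relative tolerance `tol` are assumptions for illustration, and it is unblocked and sequential, unlike the paper's algorithms.

```python
import math


def numerical_rank_qrcp(a, tol=1e-10):
    """Numerical rank of a dense m x n matrix (list of rows) via QR with column pivoting."""
    m, n = len(a), len(a[0])
    cols = [[a[i][j] for i in range(m)] for j in range(n)]  # work column by column
    r11 = None
    for k in range(min(m, n)):
        # Column pivoting: move the remaining column with the largest 2-norm to position k.
        norms = [math.sqrt(sum(v * v for v in cols[j])) for j in range(k, n)]
        p = k + max(range(len(norms)), key=norms.__getitem__)
        cols[k], cols[p] = cols[p], cols[k]
        rkk = norms[p - k]  # |R_kk|
        if r11 is None:
            r11 = rkk
        if rkk <= tol * r11:  # remaining columns are numerically dependent
            return k
        q = [v / rkk for v in cols[k]]
        # Orthogonalize the remaining columns against q (modified Gram-Schmidt).
        for j in range(k + 1, n):
            proj = sum(qi * vi for qi, vi in zip(q, cols[j]))
            cols[j] = [vi - proj * qi for qi, vi in zip(q, cols[j])]
    return min(m, n)


# Third column is the sum of the first two, so the numerical rank is 2.
print(numerical_rank_qrcp([[1, 2, 3], [4, 5, 9], [7, 8, 15]]))  # 2
```

Plain column pivoting can occasionally underestimate the gap that the SVD would reveal, which is why the paper also considers restricted pivoting variants and the URV decomposition.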
This paper studies knowledge discovery in high-dimensional and temporal data based on parallel computing and GIS technology, and provides a parallel spatial-temporal knowledge discovery model that includes mo...
ISBN: 9781538629963 (print)
Today, with the rapid development of information technology in all areas of life, a large amount of information is created every second. In recent years, many network topologies have been proposed to find the most effective communication method. The folded hypercube is one such network: it is a variant of the hypercube, which is one of the most popular topologies for interconnection networks. In this paper, we propose a link-fault-tolerant routing algorithm for the folded hypercube based on directed routing probabilities. These probabilities represent the ability of a vertex to route to an arbitrary vertex at a specific distance. Each vertex forwards a message to one of its neighbors by considering the directed routing probabilities.
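Some background helps here: in the folded hypercube FQ_n, each n-bit vertex is linked to its n hypercube neighbors (one bit flipped) plus its bitwise complement, and the distance between two vertices with Hamming distance h is min(h, n + 1 - h). The sketch below shows a simple greedy router that only moves to neighbors strictly closer to the destination over non-faulty links. It is a baseline assumed for illustration, not the paper's probability-based scheme, which is what lets a vertex choose well when several fault-free options exist or the greedy choice is blocked. The names `fq_distance`, `fq_neighbors` and `greedy_route` are hypothetical.

```python
def fq_distance(u, v, n):
    """Distance in the folded hypercube FQ_n: min(h, n + 1 - h) with h the Hamming distance."""
    h = bin(u ^ v).count("1")
    return min(h, n + 1 - h)


def fq_neighbors(u, n):
    """The n hypercube neighbors (one bit flipped) plus the complementary vertex."""
    return [u ^ (1 << i) for i in range(n)] + [u ^ ((1 << n) - 1)]


def greedy_route(src, dst, n, faulty):
    """Forward along fault-free links that reduce the distance; return None if stuck.
    `faulty` is a set of frozenset({a, b}) links."""
    path, cur = [src], src
    while cur != dst:
        d = fq_distance(cur, dst, n)
        candidates = [w for w in fq_neighbors(cur, n)
                      if frozenset((cur, w)) not in faulty
                      and fq_distance(w, dst, n) < d]
        if not candidates:
            return None  # a probability-guided detour would be needed here
        cur = candidates[0]
        path.append(cur)
    return path


print(greedy_route(0b000, 0b111, 3, set()))                   # [0, 7] via the complement link
print(greedy_route(0b000, 0b011, 3, {frozenset((0, 1))}))     # [0, 2, 3] avoiding link 0-1
```

Without faults, some neighbor always reduces the distance by one, so the greedy route is a shortest path; link faults can leave no such neighbor, which is the case the directed routing probabilities are meant to handle.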
ISBN: 9789897581823 (print)
Window functions are a sub-class of analytical operators that operate on a derived view of a given relation while taking each tuple's neighboring tuples into account. We propose a technique for the parallel execution of this operator when data is naturally partitioned. The proposed method benefits cases where the required partitioning differs from the natural partitioning in use. Preliminary evaluation shows that we can limit data transfer among parallel workers to 14% of the transfer registered with a naive approach.
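One general way to avoid reshuffling rows when the natural partitioning does not match the window's `PARTITION BY` can be sketched for a running sum, `SUM(value) OVER (PARTITION BY key ORDER BY ts)`. Assume, purely for illustration, that the data is naturally range-partitioned on `ts` across workers (worker i holds earlier timestamps than worker i+1) and that timestamps are unique. Instead of shipping every row to the worker that owns its key, each worker sends only its per-key totals; prefix sums of those totals give every worker the offsets it needs to finish locally. This is a hypothetical example of reducing transfer, not the paper's technique, and `running_sum_by_key` is an illustrative name.

```python
from collections import defaultdict


def local_totals(rows):
    """Per-key sum of values on one worker: the only data that is exchanged."""
    totals = defaultdict(int)
    for key, _, value in rows:
        totals[key] += value
    return dict(totals)


def running_sum_by_key(workers):
    """Running SUM per key ordered by ts, for rows range-partitioned on ts across workers."""
    # Coordination step: prefix the small per-key summaries in worker order.
    offsets, acc = [], defaultdict(int)
    for summary in (local_totals(rows) for rows in workers):
        offsets.append(dict(acc))
        for key, total in summary.items():
            acc[key] += total
    # Each worker then finishes independently, starting from its offsets.
    out = []
    for rows, off in zip(workers, offsets):
        running = dict(off)
        for key, ts, value in sorted(rows, key=lambda r: r[1]):
            running[key] = running.get(key, 0) + value
            out.append((key, ts, running[key]))
    return out


workers = [[("a", 1, 5), ("b", 2, 1), ("a", 3, 2)],   # worker 0: ts 1..3
           [("b", 4, 3), ("a", 5, 1)]]                # worker 1: ts 4..5
print(running_sum_by_key(workers))
# [('a', 1, 5), ('b', 2, 1), ('a', 3, 7), ('b', 4, 4), ('a', 5, 8)]
```

The amount exchanged grows with the number of distinct keys per worker rather than with the number of rows, which is the kind of saving the abstract's 14% figure refers to, although the paper's method and workloads may differ.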
ISBN: 9781467327435; 9781467327428 (print)
Learning Automata (LA) and Genetic Algorithms (GA) have long been used to solve problems in different domains. However, LA has been criticized for its slow rate of convergence, and both LA and GA can get stuck in local optima. In this paper, we solve multi-objective problems using LA in batch mode to make learning faster and more accurate. We used a decentralized pursuit learning automaton as the LA and NSGA2 as the GA. For problems where evaluating the fitness function is a bottleneck, such as SWAT, evaluating individuals in parallel can give a considerable speed-up. In the multi-objective LA, different weight pairs and individual designs can be evaluated independently. We therefore created parallel versions of both algorithms to make learning and computation faster in practice, and extended the parallelization concept with batch-mode learning.
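The pursuit update at the heart of the LA approach can be sketched as follows: the automaton keeps running reward estimates per action and, after each step, moves its action-probability vector a fraction `lam` toward the action with the best estimate, p ← (1 − λ)p + λe_best. In the batch-mode sketch below, a whole batch of sampled actions is evaluated in parallel before a single probability update. This is a single (not decentralized) automaton, and the names `pursuit_batch` and `evaluate` as well as the parameter values are assumptions for illustration; in a decentralized setting, one such automaton per decision variable would share a common reward.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def pursuit_batch(evaluate, n_actions, batch_size=10, lam=0.1,
                  iterations=100, init_samples=5, seed=0):
    """Batch-mode pursuit learning automaton; returns the final action probabilities."""
    rng = random.Random(seed)
    probs = [1.0 / n_actions] * n_actions
    reward_sum, counts = [0.0] * n_actions, [0] * n_actions
    # Initialize the reward estimates by trying every action a few times.
    for a in range(n_actions):
        for _ in range(init_samples):
            reward_sum[a] += evaluate(a)
            counts[a] += 1
    with ThreadPoolExecutor() as pool:
        for _ in range(iterations):
            batch = rng.choices(range(n_actions), weights=probs, k=batch_size)
            rewards = list(pool.map(evaluate, batch))  # evaluate the batch in parallel
            for a, r in zip(batch, rewards):
                reward_sum[a] += r
                counts[a] += 1
            best = max(range(n_actions), key=lambda i: reward_sum[i] / counts[i])
            # Pursuit step: move probability mass toward the current best estimate.
            probs = [(1 - lam) * p + (lam if i == best else 0.0)
                     for i, p in enumerate(probs)]
    return probs


fitness = [0.2, 0.5, 0.8]  # hypothetical normalized fitness per action
print(pursuit_batch(lambda a: fitness[a], 3))  # probability of action 2 approaches 1
```

Updating once per batch means the expensive fitness evaluations, the bottleneck the abstract mentions for SWAT, can run concurrently while the learning rule itself stays sequential and cheap.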
A new parallel and multiscale computational strategy for the analysis of heterogeneous structures has been proposed recently. This strategy includes automatic homogenization in space and time, with no periodicity condi...