ISBN (Print): 0769522831
This article presents the design of a High-Level Parallel Composition, or CPAN (its Spanish acronym), that implements a parallelization of the Branch & Bound algorithmic design technique and uses it to solve the Travelling Salesman Problem (TSP), within a methodological infrastructure made up of an environment of parallel objects, an approach to structured parallel programming, and the object-orientation paradigm. A CPAN is defined as the composition of a set of parallel objects of three types: a manager object, the stages, and a collector object. Following this idea, we show the Branch & Bound design technique implemented as a parallel algorithmic pattern of communication among processes, based on the CPAN model. The CPAN Branch & Bound is thus added as a new pattern to the class library already proposed in [9], which initially consisted of the CPANs Farm, Pipe and TreeDV; these represent, respectively, the Farm, Pipeline and Binary Tree communication patterns, the last of which implements the Divide and Conquer design technique. As the programming environment for deriving the proposed CPANs, we use C++ and the POSIX standard for thread programming.
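To make the manager/stages/collector composition concrete, the following is a minimal sketch, not the authors' implementation, of how a manager object handing out subproblems, stage workers that branch and prune, and a collector holding the best tour might cooperate on a tiny TSP instance. It uses C++11 std::thread rather than raw POSIX threads, and all names (Manager, Collector, stage) are illustrative assumptions.

#include <atomic>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Node { std::vector<int> path; double cost; };      // a partial tour

struct Collector {                        // keeps the best complete tour found
    std::mutex m;
    std::atomic<double> best{1e18};
    std::vector<int> tour;
    void offer(const Node& n) {
        std::lock_guard<std::mutex> g(m);
        if (n.cost < best.load()) { best.store(n.cost); tour = n.path; }
    }
};

struct Manager {                          // hands out subproblems to the stages
    std::mutex m;
    std::queue<Node> work;
    std::atomic<int> pending{0};          // nodes pushed but not yet fully processed
    void push(Node n) { std::lock_guard<std::mutex> g(m); work.push(std::move(n)); }
    bool pop(Node& n) {
        std::lock_guard<std::mutex> g(m);
        if (work.empty()) return false;
        n = std::move(work.front()); work.pop(); return true;
    }
};

// One "stage": take a subproblem, branch on the next unvisited city, and prune
// any branch whose partial cost already exceeds the collector's best tour.
void stage(Manager& mgr, Collector& col, const std::vector<std::vector<double>>& d) {
    const int n = static_cast<int>(d.size());
    Node cur;
    while (mgr.pending > 0) {
        if (!mgr.pop(cur)) { std::this_thread::yield(); continue; }
        if (static_cast<int>(cur.path.size()) == n) {      // complete tour: close it
            cur.cost += d[cur.path.back()][cur.path.front()];
            col.offer(cur);
        } else {
            for (int c = 0; c < n; ++c) {
                bool used = false;
                for (int v : cur.path) used = used || (v == c);
                if (used) continue;
                Node child{cur.path, cur.cost + d[cur.path.back()][c]};
                child.path.push_back(c);
                if (child.cost < col.best.load()) { ++mgr.pending; mgr.push(std::move(child)); }
            }
        }
        --mgr.pending;
    }
}

int main() {
    std::vector<std::vector<double>> d = {{0,2,9,10},{1,0,6,4},{15,7,0,8},{6,3,12,0}};
    Manager mgr; Collector col;
    mgr.pending = 1;
    mgr.push(Node{{0}, 0.0});             // root subproblem: tour starting at city 0
    std::vector<std::thread> stages;
    for (int i = 0; i < 4; ++i)
        stages.emplace_back(stage, std::ref(mgr), std::ref(col), std::cref(d));
    for (auto& t : stages) t.join();
    std::cout << "best tour cost: " << col.best.load() << "\n";
}

Here the bound is simply the cost of the partial tour, which is enough to show the pruning role of the collector; a real CPAN Branch & Bound would plug in a tighter bound and the pattern's own communication structure.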
ISBN (Print): 3540297707
The longest common subsequence (LCS) problem is a fundamental problem in sequence alignment. In this paper, we first present fast parallel algorithms for sequence similarity based on the LCS. For two sequences of lengths m and n (m <= n), the algorithm uses n processors and requires O(m) computation time. The time-area cost of the algorithm is O(mn), which is optimal. Based on this algorithm, we also give a fast parallel algorithm that computes the length of the LCS in O(log m) time. To the best of our knowledge, this is the fastest of the parallel LCS algorithms on array architectures.
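As background, the sketch below shows the anti-diagonal (wavefront) dependence structure of the standard LCS dynamic program: every cell of one anti-diagonal depends only on the two previous diagonals, so the cells of a diagonal are mutually independent, which is the property processor-array LCS algorithms exploit. The code is sequential and illustrative only; it is not the authors' array algorithm.

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

int lcs_length(const std::string& a, const std::string& b) {
    const int m = static_cast<int>(a.size()), n = static_cast<int>(b.size());
    std::vector<std::vector<int>> L(m + 1, std::vector<int>(n + 1, 0));
    // Sweep anti-diagonals d = i + j; all cells (i, j) with i + j == d are independent.
    for (int d = 2; d <= m + n; ++d) {
        for (int i = std::max(1, d - n); i <= std::min(m, d - 1); ++i) {
            int j = d - i;
            L[i][j] = (a[i - 1] == b[j - 1]) ? L[i - 1][j - 1] + 1
                                             : std::max(L[i - 1][j], L[i][j - 1]);
        }
    }
    return L[m][n];
}

int main() {
    std::cout << lcs_length("ABCBDAB", "BDCABA") << "\n";   // prints 4
}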
ISBN (Digital): 9783540321118
ISBN (Print): 3540260323
The development of efficient parallel algorithms for large-scale wildfire simulations is a challenging research problem because the factors that determine wildfire behavior are complex. These factors make static parallel algorithms inefficient, especially when a large number of processors is used, because the propagation of the fire and its computational requirements cannot be predicted accurately at runtime. In this paper, we propose an Autonomic Runtime Manager (ARM) that dynamically exploits the physical properties of the fire simulation and uses them as the basis of our self-optimization algorithm. At each step of the wildfire simulation, the ARM decomposes the computational domain into several natural regions (e.g., burning, unburned, burned), where each region has the same temporal and spatial characteristics. The numbers of burning, unburned and burned cells determine the current state of the fire simulation and can then be used to accurately predict the computational power required for each region. By regularly monitoring and analyzing the state of the simulation, and using that to drive the runtime optimization, we can achieve significant performance gains because the computational load can be balanced efficiently across processors. Our experimental results show that the performance of the fire simulation is improved by 45% when compared with a static partitioning algorithm.
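A simplified, hypothetical sketch of the load-prediction step described above: cell counts per region are mapped to a predicted cost and processors are assigned roughly in proportion to it. The cost weights, region data and processor count are assumptions for illustration, not values from the paper.

#include <array>
#include <iostream>
#include <vector>

// Assumed relative per-cell simulation costs: burning cells dominate,
// unburned cells are cheap, already-burned cells nearly free.
double predicted_load(int burning, int unburned, int burned) {
    return 10.0 * burning + 1.0 * unburned + 0.1 * burned;
}

int main() {
    // Hypothetical regions as (burning, unburned, burned) cell counts.
    std::vector<std::array<int, 3>> regions = {
        {1200, 300, 50}, {40, 9000, 10}, {5, 200, 7000}};
    const int total_procs = 32;

    std::vector<double> load;
    double total = 0.0;
    for (const auto& r : regions) {
        load.push_back(predicted_load(r[0], r[1], r[2]));
        total += load.back();
    }
    // Proportional assignment (rounding left uncorrected for brevity).
    for (size_t i = 0; i < regions.size(); ++i) {
        int procs = static_cast<int>(total_procs * load[i] / total + 0.5);
        std::cout << "region " << i << ": ~" << procs << " processors\n";
    }
}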
ISBN (Print): 0780394925
Large latencies over WANs will remain an obstacle to running communication-intensive parallel applications in Grid environments. This paper takes one such application, Gaussian elimination of dense matrices, and describes a parallel algorithm that is highly tolerant to latencies. The key technique is a pivoting strategy called batched pivoting, which requires much less frequent synchronization than other methods. Although it is a relaxed pivoting method that may select pivots other than the 'best' ones, we show that it achieves good numerical accuracy. In experiments with random matrices of sizes from 64 to 49,152, batched pivoting achieves numerical accuracy comparable to that of partial pivoting. We also evaluate the parallel execution speed of our implementation and show that it is much more tolerant to latencies than partial pivoting.
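For context, the sketch below is plain sequential Gaussian elimination with partial pivoting; the per-column pivot search marked in the code is the step that forces a synchronization per column in parallel settings, and it is this step that batched pivoting relaxes by choosing pivots for a batch of columns at a time. The authors' batched variant itself is not reproduced here.

#include <cmath>
#include <iostream>
#include <vector>

void eliminate(std::vector<std::vector<double>>& A) {
    const int n = static_cast<int>(A.size());
    for (int k = 0; k < n; ++k) {
        // Pivot search over column k: in distributed codes this is the
        // per-column synchronization point.
        int p = k;
        for (int i = k + 1; i < n; ++i)
            if (std::fabs(A[i][k]) > std::fabs(A[p][k])) p = i;
        std::swap(A[k], A[p]);
        // Eliminate column k below the pivot row.
        for (int i = k + 1; i < n; ++i) {
            double f = A[i][k] / A[k][k];
            for (int j = k; j < n; ++j) A[i][j] -= f * A[k][j];
        }
    }
}

int main() {
    std::vector<std::vector<double>> A = {{2, 1, 1}, {4, 3, 3}, {8, 7, 9}};
    eliminate(A);
    std::cout << "U diagonal: " << A[0][0] << " " << A[1][1] << " " << A[2][2] << "\n";
}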
ISBN (Print): 0769524869
In this paper, we present a refinement of the BSP (Bulk Synchronous Parallel) cost model that allows a more exact prediction of the communication cost of parallel algorithms. Our approach is based on two points: (i) deeper benchmarking that takes into account all factors influencing the per-word sending cost of a communication, and (ii) a more elaborate prediction method that carefully identifies the communication context of the algorithms to be predicted.
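For reference, the classical BSP cost of a single superstep, which such a refinement builds on, is max_i w_i + h*g + l, where w_i is the local work of processor i, h is the size of the h-relation (maximum number of words sent or received by any processor), g is the per-word communication cost, and l is the barrier synchronization latency. The small sketch below evaluates this standard formula with made-up numbers; the paper's refined prediction is not reproduced.

#include <algorithm>
#include <iostream>
#include <vector>

double superstep_cost(const std::vector<double>& work,   // w_i per processor
                      double h,                          // max words sent/received
                      double g,                          // per-word cost
                      double l) {                        // barrier latency
    double w_max = *std::max_element(work.begin(), work.end());
    return w_max + h * g + l;
}

int main() {
    // Hypothetical numbers: 4 processors, a 1000-word h-relation.
    std::cout << superstep_cost({5e4, 4.8e4, 5.2e4, 4.9e4}, 1000, 4.1, 2e4) << "\n";
}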
ISBN (Print): 0780393635
This paper presents an investigation into exploiting the population-based nature of Learning Classifier Systems for use within highly parallel systems. In particular, the use of simple accuracy-based Learning Classifier Systems within the ensemble machine approach is examined. Results indicate that including a rule-migration mechanism inspired by parallel genetic algorithms is an effective way to improve learning speed.
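A minimal sketch of what an island-model-style rule-migration step might look like: each node periodically copies a few of its fittest rules over a neighbour's weakest rules. The Rule structure, the replacement policy and the numbers below are assumptions for illustration, not details taken from the paper.

#include <algorithm>
#include <iostream>
#include <vector>

struct Rule { int id; double fitness; };         // condition/action parts omitted
using Population = std::vector<Rule>;

// Replace the `count` weakest rules of `dst` with the `count` fittest of `src`.
void migrate(const Population& src, Population& dst, size_t count) {
    Population donors = src;
    auto by_fitness = [](const Rule& a, const Rule& b) { return a.fitness > b.fitness; };
    std::sort(donors.begin(), donors.end(), by_fitness);
    std::sort(dst.begin(), dst.end(), by_fitness);
    for (size_t i = 0; i < count && i < donors.size() && i < dst.size(); ++i)
        dst[dst.size() - 1 - i] = donors[i];
}

int main() {
    Population a = {{1, 0.9}, {2, 0.4}, {3, 0.7}};
    Population b = {{4, 0.2}, {5, 0.3}, {6, 0.1}};
    migrate(a, b, 1);                             // b's weakest rule is replaced by rule 1
    for (const Rule& r : b) std::cout << r.id << " ";
    std::cout << "\n";
}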
ISBN (Digital): 9783540320715
ISBN (Print): 3540292357
Large-scale distributed computing infrastructures are characterized by a high number of nodes, poor communication performance, and continuously varying resources that are not available at all times. In this paper, we focus on the tools available for mining traces of the activity of such an architecture. We propose new techniques for the fast management of a parallel frequent itemset mining algorithm. These techniques allow us to exhibit statistical results about the activity of more than one hundred PCs connected to the web.
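For illustration of the underlying mining primitive only (not the authors' parallel algorithm or its trace-management techniques), the sketch below shows the basic support-counting step of frequent itemset mining applied to hypothetical activity-trace records.

#include <iostream>
#include <map>
#include <set>
#include <string>
#include <vector>

int main() {
    // Each "transaction" is the set of events observed in one trace record.
    std::vector<std::set<std::string>> traces = {
        {"cpu_spike", "net_send"}, {"cpu_spike", "disk_io"},
        {"cpu_spike", "net_send", "disk_io"}, {"net_send"}};
    const std::size_t min_support = 2;

    // Count single items; frequent 1-itemsets seed longer candidates (Apriori style).
    std::map<std::string, std::size_t> count;
    for (const auto& t : traces)
        for (const auto& item : t) ++count[item];

    for (const auto& [item, c] : count)
        if (c >= min_support)
            std::cout << "{" << item << "} support=" << c << "\n";
}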
ISBN (Print): 0780391950
This paper focuses on a parallel version of the particle swarm optimization (PSO) algorithm that can significantly reduce execution time when solving complex large-scale optimization problems. The paper gives an overview of the PSO algorithm and then proposes a design and an implementation of parallel PSO. The proposed algorithm eliminates redundant synchronizations and optimizes message transfer to overlap communication with computation. The experimental results show that a 13.2-times speedup was obtained by the proposed parallel PSO algorithm on 14 processors.
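A minimal sketch of the standard, serial PSO update that a parallel PSO distributes: each particle's velocity and position update is independent, so particles can be partitioned among processors, with the global best being the value that needs to be exchanged. The constants, the sphere objective and the swarm size are illustrative assumptions; the paper's synchronization elimination and communication/computation overlap are not shown.

#include <iostream>
#include <random>
#include <vector>

double sphere(const std::vector<double>& x) {              // objective to minimize
    double s = 0.0;
    for (double v : x) s += v * v;
    return s;
}

int main() {
    const int particles = 32, dims = 4, iters = 200;
    const double w = 0.7, c1 = 1.5, c2 = 1.5;               // inertia, cognitive, social
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> u01(0.0, 1.0), init(-5.0, 5.0);

    std::vector<std::vector<double>> x(particles, std::vector<double>(dims)),
        v(particles, std::vector<double>(dims, 0.0)), pbest;
    std::vector<double> pbest_val(particles), gbest(dims);
    double gbest_val = 1e300;

    for (auto& p : x) for (auto& d : p) d = init(rng);      // random initial positions
    pbest = x;
    for (int i = 0; i < particles; ++i) {
        pbest_val[i] = sphere(x[i]);
        if (pbest_val[i] < gbest_val) { gbest_val = pbest_val[i]; gbest = x[i]; }
    }

    for (int t = 0; t < iters; ++t) {
        for (int i = 0; i < particles; ++i) {               // independent per particle
            for (int d = 0; d < dims; ++d) {
                v[i][d] = w * v[i][d] + c1 * u01(rng) * (pbest[i][d] - x[i][d])
                                      + c2 * u01(rng) * (gbest[d] - x[i][d]);
                x[i][d] += v[i][d];
            }
            double f = sphere(x[i]);
            if (f < pbest_val[i]) { pbest_val[i] = f; pbest[i] = x[i]; }
            if (f < gbest_val)    { gbest_val = f; gbest = x[i]; }   // shared-state update
        }
    }
    std::cout << "best value found: " << gbest_val << "\n";
}

In a distributed version, only the global-best update needs coordination, which is why reducing or overlapping that exchange is the main lever for speedup.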
ISBN (Print): 0769524451
In recent years, reconfigurable technology has emerged as a popular choice for implementing various types of cryptographic functions. Nevertheless, insufficient effort has been placed into fully exploiting the tremendous amount of parallelism intrinsic to FPGAs for this class of algorithms. In this paper, we focus on block cipher architectures and explore design decisions that leverage the multi-grained parallelism inherent in many of these algorithms. We demonstrate the usefulness of this approach with a highly parallel FPGA implementation of the AES standard, and present results detailing the area/delay tradeoffs resulting from our design decisions.
ISBN (Print): 0769524052
Increasing the number of instructions executing in parallel has helped improve processor performance, but the technique is limited. Executing code on parallel threads and processors has fewer limitations, but most computer programs tend to be serial in nature. This paper presents a compiler optimisation that parallelises code at run-time inside a JVM and thereby increases the number of threads. We show SPEC JVM benchmark results for this optimisation. The performance on a current desktop processor is slower than without parallel threads, owing to thread-creation costs, but with these costs removed the performance is better than that of the serial code. We measure the threading costs and discuss how a future computer architecture would make this optimisation feasible for exploiting thread-level rather than instruction-level and/or vector parallelism.
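A small, hypothetical experiment in the spirit of the measurement described above: splitting a trivially parallel loop across freshly created threads and timing it against the serial version makes the thread-creation overhead visible when the per-thread work is small. This uses C++ threads rather than the paper's JVM setting and is only meant to illustrate the kind of cost being measured.

#include <chrono>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

double sum_range(const std::vector<double>& a, std::size_t lo, std::size_t hi) {
    return std::accumulate(a.begin() + lo, a.begin() + hi, 0.0);
}

int main() {
    std::vector<double> a(1 << 16, 1.0);                 // deliberately small work
    using clock = std::chrono::steady_clock;

    auto t0 = clock::now();
    volatile double serial = sum_range(a, 0, a.size());  // serial baseline
    auto t1 = clock::now();

    const int nthreads = 4;
    std::vector<double> partial(nthreads, 0.0);
    std::vector<std::thread> pool;
    std::size_t chunk = a.size() / nthreads;
    for (int i = 0; i < nthreads; ++i)                   // thread creation counted in the timing
        pool.emplace_back([&, i] { partial[i] = sum_range(a, i * chunk, (i + 1) * chunk); });
    for (auto& t : pool) t.join();
    volatile double parallel = std::accumulate(partial.begin(), partial.end(), 0.0);
    auto t2 = clock::now();

    std::cout << "serial:   " << std::chrono::duration<double, std::micro>(t1 - t0).count() << " us\n"
              << "threaded: " << std::chrono::duration<double, std::micro>(t2 - t1).count() << " us\n";
    (void)serial; (void)parallel;
}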