ISBN:
(print) 9780769531106
Important classes of algorithms that can benefit from the advantages of C-to-VHDL compiling are window operations. These execute a number of instructions on a large amount of array data. Since arrays are usually translated into FPGA block memory structures, it is important to minimize the required number of block memory accesses. Recently, a smart buffer has been introduced, in which a number of past and present array elements can be temporarily stored and reused over a number of different loop nest iterations. In this paper, the smart buffer approach is analysed for use in the stream-oriented Impulse-C compiler. Experimental automatic generation of VHDL code for this buffer is described. The smart buffer is then linked with the VHDL code generated by the Impulse-C compiler to obtain data-efficient designs.
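The reuse idea behind the smart buffer can be sketched as a software analogue (this is not the paper's Impulse-C/VHDL implementation; the function name `window_sum` and the 3-tap window are assumptions for illustration): each array element is fetched from "memory" exactly once and then reused across every overlapping window it belongs to.

```python
from collections import deque

def window_sum(stream, width=3):
    """Software analogue of a 'smart buffer': each input element is read
    exactly once and reused across the overlapping windows it belongs to."""
    buf = deque(maxlen=width)   # holds the most recent `width` elements
    out = []
    for x in stream:            # one memory access per element
        buf.append(x)
        if len(buf) == width:   # a full window is available; reuse buffered values
            out.append(sum(buf))
    return out

print(window_sum([1, 2, 3, 4, 5]))  # [6, 9, 12]
```

A naive implementation would re-read `width` elements per output; the buffered version performs one read per input element, which is the access-minimization goal described above.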
Parallel computing is notoriously challenging due to the difficulty of developing correct and efficient programs. With the arrival of multi-core processors for desktop systems, desktop applications must now be parallelised. However, achieving task parallelism for such object-oriented programs has traditionally been, and still remains, difficult. This paper presents a powerful task concept for parallel object-oriented programming and reports results from a source-to-source compiler and runtime system. With the addition of a single keyword, the sequential code does not require restructuring and asynchronous task management is performed on behalf of the programmer; the parallel code required to realise task parallelism looks very much like its sequential counterpart. An intuitive solution is provided for handling task dependencies as well as for integrating different task concepts into one model.
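The flavour of "sequential-looking parallel code" can be illustrated with standard Python futures (a stand-in, not the paper's compiler or keyword): only the way calls are issued changes, while task scheduling is delegated to the runtime.

```python
from concurrent.futures import ThreadPoolExecutor

def fib(n):
    """A deliberately simple workload."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# Sequential version:  results = [fib(n) for n in (20, 22, 24)]
# The parallel version below differs only in how the calls are issued;
# asynchronous task management is handled by the runtime, much as the
# paper's single keyword delegates it to the compiler/runtime system.
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(fib, n) for n in (20, 22, 24)]
    results = [f.result() for f in futures]

print(results)  # [6765, 17711, 46368]
```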
Optimal multiple sequence alignment by dynamic programming, like many highly dimensional scientific computing problems, has failed to benefit from the performance improvements brought about by multi-processor systems, due to the lack of a suitable scheme to manage partitioning and dependencies. A scheme for the parallel implementation of dynamic programming multiple sequence alignment is presented, based on a peer-to-peer design and a multidimensional array indexing method. This design yields up to a 5-fold improvement over a previously described master/slave design and scales favourably with the number of processors used. This study demonstrates an approach for parallelising multi-dimensional dynamic programming and similar algorithms on multi-processor architectures.
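The partitioning opportunity in alignment-style dynamic programming comes from anti-diagonal independence. A minimal two-dimensional (pairwise) sketch, assuming an edit-distance recurrence rather than the paper's multidimensional indexing scheme:

```python
def edit_distance_wavefront(a, b):
    """Fill the DP matrix anti-diagonal by anti-diagonal: every cell on one
    diagonal depends only on the two preceding diagonals, so all cells of a
    diagonal are independent and could be computed in parallel."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for d in range(n + m + 1):                      # one anti-diagonal at a time
        for i in range(max(0, d - m), min(n, d) + 1):
            j = d - i                               # cell (i, j) lies on diagonal d
            if i == 0:
                D[i][j] = j
            elif j == 0:
                D[i][j] = i
            else:
                cost = 0 if a[i - 1] == b[j - 1] else 1
                D[i][j] = min(D[i - 1][j] + 1,      # deletion
                              D[i][j - 1] + 1,      # insertion
                              D[i - 1][j - 1] + cost)  # match/substitution
    return D[n][m]

print(edit_distance_wavefront("kitten", "sitting"))  # 3
```

In a k-sequence alignment the same idea generalises: cells on a hyperplane of constant index sum are mutually independent, which is what makes a distributed (peer-to-peer) partitioning feasible.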
Nowadays, important research efforts on Transition P Systems have focused on simulating/implementing the massively parallel character of the model. Distributed implementations of P Systems have met with the communications-bottleneck problem. In this sense, several research works analyse distributed architectures that are technology independent and avoid communication collisions. Recent works analyse the suitability of implementing these architectures using a universal membrane hardware/software component based on microcontrollers. These implementations are programmed directly on the hardware components, and the resulting data are analysed by a specific program that translates the execution bit code into data intelligible to researchers. The aim of this work is to present an operating environment that automates the manual tasks performed during the loading, execution and interpretation of a P System on distributed architectures based on microcontrollers, including the interpretation of the binary data representation inside the microcontrollers.
ISBN:
(print) 9781595936677
Dynamic programming is an efficient technique for solving combinatorial search and optimization problems, and many parallel dynamic programming algorithms exist. The purpose of this paper is to study a family of dynamic programming algorithms in which data dependences appear between non-consecutive stages; in other words, the data dependence is non-uniform. This kind of dynamic programming is typically called nonserial polyadic dynamic programming. Owing to the non-uniform data dependence, it is harder to optimize this problem for parallelism and locality on parallel architectures. In this paper, we address the challenge of exploiting fine-grain parallelism and locality of nonserial polyadic dynamic programming on a multi-core architecture. We present a programming and execution model for multi-core architectures with a memory hierarchy. In the framework of the new model, parallelism and locality benefit from a data dependence transformation. We propose a parallel pipelined algorithm for filling the dynamic programming matrix by decomposing the computation operators. The new parallel algorithm tolerates memory access latency through multi-threading and is easily improved with the technique. We formulate and analytically solve the optimization problem of determining the size that minimizes the total execution time. Experiments on a simulator validate the proposed model and show that the fine-grain parallel algorithm achieves sub-linear speedup and potentially high scalability on multi-core architectures.
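A classic instance of nonserial polyadic dynamic programming is matrix-chain ordering, sketched below as a minimal illustration (the paper's transformed, pipelined formulation is not reproduced here): cell (i, j) depends on all splits (i, k) and (k+1, j), i.e. on non-consecutive stages, while cells on the same diagonal are mutually independent.

```python
def matrix_chain_cost(dims):
    """Nonserial polyadic DP: C[i][j] depends on every pair C[i][k], C[k+1][j]
    with i <= k < j (non-uniform dependence). Cells on the same diagonal
    j - i are independent, which enables pipelined/diagonal-parallel filling."""
    n = len(dims) - 1                       # number of matrices in the chain
    C = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):          # diagonal index = chain length
        for i in range(n - length + 1):     # cells on this diagonal are independent
            j = i + length - 1
            C[i][j] = min(
                C[i][k] + C[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j))       # dependence reaches back to ALL splits
    return C[0][n - 1]

print(matrix_chain_cost([10, 30, 5, 60]))  # 4500
```

Because each diagonal reads from all earlier diagonals rather than just the previous one, blocking for locality is harder than in serial (nearest-neighbour) DP, which is the difficulty the paper's dependence transformation targets.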
ISBN:
(print) 9783540768364
Dynamic programming is a widely applied algorithm design technique in many areas such as computational biology and scientific computing. Typical applications using this technique are compute-intensive and suffer from long runtimes on sequential architectures. Therefore, several parallel algorithms for both fine-grained and coarse-grained architectures have been introduced. However, the commonly used data partitioning scheme cannot be efficiently applied to irregular dynamic programming algorithms, i.e. dynamic programming algorithms with an uneven load density pattern. In this paper we present a tunable Bulk Synchronous Parallel (BSP) algorithm for this kind of application. The new algorithm can balance the workload among processors using a tunable block-cyclic data partitioning method and is thus capable of achieving almost linear performance gains. We present a theoretical analysis and experimentally show that the algorithm leads to significant runtime savings for pairwise sequence alignment with general gap penalties, using BSPonMPI on a PC cluster.
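The core of block-cyclic partitioning can be shown in a few lines (an illustrative sketch, not the paper's BSP algorithm; the function name `block_cyclic` is assumed): blocks of consecutive work items are dealt out round-robin, so a region of the matrix with dense load is spread over all processors rather than landing on one of them.

```python
def block_cyclic(n_items, n_procs, block):
    """Assign item indices to processors block-cyclically: consecutive runs
    of `block` items go to processors 0, 1, ..., n_procs-1, then wrap around.
    The tunable block size trades communication cost against load balance
    under an uneven load density pattern."""
    return [(i // block) % n_procs for i in range(n_items)]

print(block_cyclic(8, 2, 2))  # [0, 0, 1, 1, 0, 0, 1, 1]
```

With `block = n_items // n_procs` this degenerates to plain block partitioning (poor balance for irregular load); with `block = 1` it is pure cyclic (best balance, most communication). Tuning `block` between these extremes is what the abstract refers to.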
Due to fundamental physical limitations and power constraints, we are witnessing a radical change in commodity microprocessor architectures toward multicore designs. Continued performance on multicore processors now requires the exploitation of concurrency at the algorithmic level. In this paper, we identify key issues in algorithm design for multicore processors and propose a computational model for these systems. We introduce SWARM (software and algorithms for running on multi-core), a portable open-source parallel library of basic primitives that fully exploit multicore processors. Using this framework, we have implemented efficient parallel algorithms for important primitive operations such as prefix-sums, pointer-jumping, symmetry breaking, and list ranking; for combinatorial problems such as sorting and selection; for parallel graph-theoretic algorithms such as spanning tree, minimum spanning tree, graph decomposition, and tree contraction; and for computational genomics applications such as maximum parsimony. The main contributions of this paper are the design of the SWARM multicore framework, the presentation of a multicore algorithmic model, and validation results for this model. SWARM is freely available as open-source from http://***/.
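The prefix-sums primitive mentioned above is the canonical multicore building block. A minimal two-phase Python sketch of the idea (SWARM itself is a compiled library; this is only an illustration, and `parallel_prefix_sums` is an assumed name): each thread scans its own chunk independently, then chunk offsets are combined.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate

def parallel_prefix_sums(data, n_chunks=4):
    """Two-phase parallel prefix-sum: (1) scan each chunk independently in a
    thread pool, (2) add each chunk's running offset sequentially. Phase 1 is
    the parallel part; phase 2 touches only one value per chunk."""
    size = -(-len(data) // n_chunks)               # ceil division for chunk size
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor() as pool:             # phase 1: independent scans
        local = list(pool.map(lambda c: list(accumulate(c)), chunks))
    out, offset = [], 0
    for scan in local:                             # phase 2: apply offsets
        out.extend(x + offset for x in scan)
        offset += scan[-1]
    return out

print(parallel_prefix_sums([1, 2, 3, 4, 5, 6, 7, 8]))
# [1, 3, 6, 10, 15, 21, 28, 36]
```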
This paper describes the development and testing of control of the OmniTread OT-4 robot by the seventh-generation (7G) control system. Control of OT-4 was developed in the Yobotics 3D simulator by an iterative process combining genetic algorithms, learning, and analytic programming techniques. The control system developed in simulation was tested by controlling the real OT-4 robot in the laboratory. The performance of the real OT-4 robot under 7G control on stairs, parallel bars, a slalom course, and stairs with obstacles corresponded well to the simulated performance on which development of the control system was based.
This paper presents a new approach for the execution of coarse-grain (tiled) parallel SPMD code for applications derived from the explicit discretization of 1-dimensional PDE problems with finite-differencing schemes. Tiling is an efficient loop transformation for achieving coarse-grain parallelism in such algorithms, and rectangular tile shapes are the only feasible shapes that can be manually applied by program developers. However, rectangular tiling transformations are not always valid due to data dependencies, thus requiring the application of an appropriate skewing transformation prior to tiling in order to enable rectangular tile shapes. We employ cyclic mapping of tiles to processes and propose a method to determine an efficient rectangular tiling transformation for a fixed number of processes for 2-dimensional, skewed PDE problems. Our experimental results confirm the merit of coarse-grain execution in this family of applications and indicate that the proposed method leads to the selection of highly efficient tiling transformations.
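The skew-then-tile idea can be demonstrated on a 3-point averaging stencil (a generic finite-difference example, not the paper's tile-selection method; all function names and tile sizes here are illustrative assumptions). Skewing the space index by time makes every dependence point "backwards" in the skewed coordinate, so rectangular tiles executed in lexicographic order are valid.

```python
def stencil_reference(u0, T):
    """Naive time-stepped 3-point stencil with fixed boundary values."""
    A = [list(u0)]
    n = len(u0)
    for t in range(1, T + 1):
        prev = A[-1]
        A.append([prev[0]] + [(prev[i - 1] + prev[i + 1]) / 2
                              for i in range(1, n - 1)] + [prev[-1]])
    return A[T]

def stencil_skewed_tiled(u0, T, tile_t=2, tile_j=4):
    """Skew the iteration space (t, i) -> (t, j = i + t): the stencil's
    dependences become (t-1, j), (t-1, j-1), (t-1, j-2), all non-increasing
    in j, so rectangular tiles over (t, j) are legal in lexicographic order."""
    n = len(u0)
    A = [list(u0)] + [[None] * n for _ in range(T)]
    for tt in range(1, T + 1, tile_t):           # tile bands in time
        for jj in range(0, n + T, tile_j):       # tile bands in skewed space
            for t in range(tt, min(tt + tile_t, T + 1)):
                for j in range(jj, min(jj + tile_j, n + T)):
                    i = j - t                    # un-skew to the original index
                    if 0 <= i < n:
                        if i == 0 or i == n - 1:
                            A[t][i] = A[t - 1][i]
                        else:
                            A[t][i] = (A[t - 1][i - 1] + A[t - 1][i + 1]) / 2
    return A[T]

u0 = [0.0, 0.0, 4.0, 0.0, 0.0]
print(stencil_skewed_tiled(u0, 3) == stencil_reference(u0, 3))  # True
```

Without the skew, a rectangular tile covering several time steps would need values from the tile to its right before that tile has run; the skew removes exactly those forward dependences, which is the legality condition discussed above.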