In recent years, reconfigurable computing has grown from a niche application into an important R&D field. Even today, however, most architectures lack essential features for convenient use as a co-processing unit. E.g. em...
ISBN: 3540649522 (print)
This paper deals with the problem of scheduling a specific precedence task graph, namely the Fork graph, under the LogP model. LogP is a computational model, more sophisticated than the usual ones, that was introduced to be closer to actual machines. We present a scheduling algorithm for this kind of graph. Our algorithm is optimal under some assumptions, in particular when the messages have the same size and when the gap is equal to the overhead.
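As a rough illustration of how LogP parameters enter the cost of a fork schedule, the sketch below evaluates the makespan of one simple schedule. It is not the algorithm of the paper. It assumes the usual LogP parameters (latency `L`, overhead `o`, gap `g`), that every child task runs on its own remote processor, and that the root sends one message per child in list order. The function names are hypothetical.

```python
def fork_makespan(root_time, child_times, L, o, g):
    """Makespan of a fork when each child runs on a separate remote processor.

    Assumption: the root sends one message per child, in list order.
    Consecutive sends are spaced by max(g, o); a message costs o at the
    sender, L in the network, and o at the receiver.
    """
    send_interval = max(g, o)
    finish = root_time
    for k, work in enumerate(child_times):
        send_start = root_time + k * send_interval
        child_start = send_start + o + L + o
        finish = max(finish, child_start + work)
    return finish


def longest_first(child_times):
    """Heuristic ordering (an assumption, not a proven rule): longest child first."""
    return sorted(child_times, reverse=True)
```

For example, with `root_time=2`, `L=4`, `o=1`, `g=1`, sending to the children in the order `[3, 1]` finishes at time 11, while the order `[1, 3]` finishes at time 12.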
The structural specification and modeling of time-critical real-time systems has become a major area of recent research. This is particularly relevant for computer music when sound computation is realized invo...
ISBN: 3540649522 (print)
In parallel computing, performance is related both to algorithmic design choices at the application level and to the scheduling strategy. Concerning dynamic scheduling, general classifications have been proposed. They outline two fundamental units, related to control and information. In this paper, we propose a generic modular specification based not on two but on four components. The components and the interactions between them are precisely described. This specification has been used to implement various scheduling algorithms in two different parallel programming environments: PM2 (Espace) and Athapascan (Apache).
Authors: Nash, J.
Univ Leeds, Sch Comp Studies, Scalable Syst & Algorithms Grp, Leeds LS2 9JT, W Yorkshire, England
ISBN: 3540649522 (print)
The Bulk Synchronous Parallelism (BSP) model provides a simple and elegant cost model, as a result of using supersteps to develop parallel software. This paper demonstrates how the cost model can be preserved when developing software for irregular problems, which typically require dynamic load balancing and introduce runtime task dependencies. The solution introduces shared data types within a superstep, which support weakened forms of shared data consistency for scalable performance. An example of a priority queue to support a solution of the travelling salesman problem is given, with predicted and observed performance results provided for 256 processors of a Cray T3D MPP.
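The cost model referred to here is commonly written per superstep as w + h·g + l, where w is the maximum local work on any processor, h is the maximum number of messages any processor sends or receives, g is the per-message communication cost and l is the barrier synchronisation cost. The sketch below simply evaluates that standard formula. The function names and the sample numbers are illustrative assumptions, not values from the paper.

```python
def superstep_cost(w, h, g, l):
    """Standard BSP cost of one superstep: local work + h-relation + barrier."""
    return w + h * g + l


def program_cost(supersteps, g, l):
    """Total predicted cost for a list of (w, h) pairs, one per superstep."""
    return sum(superstep_cost(w, h, g, l) for w, h in supersteps)


# Illustrative numbers only: two supersteps on a machine with g=4, l=20.
predicted = program_cost([(100, 10), (50, 0)], g=4, l=20)
```

Because every superstep pays the barrier cost l even when it communicates nothing, a prediction of this kind favours fewer, larger supersteps.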
ISBN: 3540649522 (print)
Dynamic loop scheduling algorithms can suffer from overheads due to synchronisation, loss of locality and small iteration counts. We observe that timing information from previous executions of the loop can be utilised to reduce these overheads. We introduce two new algorithms for dynamic loop scheduling which implement this type of feedback guidance, and report experimental results on a distributed shared memory architecture. Under appropriate circumstances, these algorithms are observed to give significant performance gains over existing loop scheduling techniques.
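The general idea of feedback guidance can be sketched as follows: timings recorded during the previous execution of the loop are used to choose contiguous blocks of iterations with roughly equal predicted time. This is a minimal sketch and not either of the two algorithms from the paper. It assumes a per-iteration timing list is available, and the name `feedback_partition` is hypothetical.

```python
from itertools import accumulate


def feedback_partition(prev_times, p):
    """Split iterations into p contiguous blocks of similar predicted time.

    prev_times[i] is the measured time of iteration i in the previous run
    (an assumption of this sketch). Returns p + 1 boundaries; block k
    covers iterations [bounds[k], bounds[k + 1]).
    """
    n = len(prev_times)
    prefix = list(accumulate(prev_times))
    total = prefix[-1] if prefix else 0.0
    bounds = [0]
    i = 0
    for k in range(1, p):
        target = total * k / p
        # Advance to the first iteration whose cumulative time reaches the target.
        while i < n and prefix[i] < target:
            i += 1
        bounds.append(min(i + 1, n))
    bounds.append(n)
    return bounds
```

With previous timings `[4, 4, 1, 1, 1, 1, 2, 2]` and two processors, the boundaries are `[0, 2, 8]`, so each block carries 8 time units, whereas an even split by iteration count would carry 10 and 6.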
The nano-threads programming model was proposed to effectively integrate multiprogramming on shared-memory multiprocessors with the exploitation of fine-grain parallelism from standard applications. A prerequisite fo...
This paper experiments with a methodology for mapping the 8×8 row-column Inverse Discrete Cosine Transform on general-purpose Very Long Instruction Word architectures. By exploiting the parallelism inherent in the algorithm, the results obtained indicate that such processors, using sufficiently advanced compilers, can provide satisfactory performance at low cost without need to resort to special-purpose hardware or time-consuming hand-tuning of codes.
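For reference, the row-column decomposition computes the 2-D transform as eight 1-D IDCTs over the rows followed by eight 1-D IDCTs over the columns. The sketch below is a plain, unoptimised reference version using the usual orthonormal 8-point IDCT definition. It says nothing about the VLIW mapping studied in the paper, and the function names are assumptions.

```python
import math

N = 8


def idct_1d(coeffs):
    """Direct 8-point 1-D IDCT (orthonormal scaling, no fast algorithm)."""
    out = []
    for x in range(N):
        s = 0.0
        for u in range(N):
            cu = math.sqrt(0.5) if u == 0 else 1.0
            s += cu * coeffs[u] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
        out.append(0.5 * s)
    return out


def idct_2d(block):
    """Row-column 8x8 IDCT: transform every row, then every column."""
    rows = [idct_1d(r) for r in block]
    cols = [idct_1d([rows[r][c] for r in range(N)]) for c in range(N)]
    # cols[c][r] holds the sample at row r, column c; transpose back.
    return [[cols[c][r] for c in range(N)] for r in range(N)]
```

The sixteen 1-D transforms within each pass are independent of one another, which is the kind of parallelism a wide-issue machine can exploit.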
Authors: Semé, D.; Myoupo, J.-F.
LaRIA, Univ. de Picardie Jules Verne, CURI, 9 rue du Moulin Neuf, 80000 Amiens, France
The use of a BSR solution of the LIS problem to solve the longest common subsequence (LCS) problem in constant time is discussed. Constant-time BSR solutions for the LCS problem are obtained with a bounded number of selections. To solve the LIS problem, the algorithm needs N processors, where N is the length of the input sequence. The algorithm solving the LCS problem needs N*M processors, where N and M are the lengths of the two input sequences.
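One well-known way to solve LCS through an LIS computation is sketched below in sequential form. It is not a BSR algorithm and may differ from the construction used in the paper. It assumes LIS stands for the longest (strictly) increasing subsequence, and the function names are hypothetical. Every matching pair of positions is listed, with positions in the second sequence in decreasing order for each element of the first, and the LCS length is the LIS length of that list.

```python
from bisect import bisect_left
from collections import defaultdict


def lis_length(seq):
    """Length of the longest strictly increasing subsequence (patience method)."""
    tails = []
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)


def lcs_length_via_lis(a, b):
    """LCS length via LIS over matching positions of b, taken in order of a."""
    positions = defaultdict(list)
    for j, item in enumerate(b):
        positions[item].append(j)
    matches = []
    for item in a:
        # Decreasing order stops one element of a from matching b twice.
        matches.extend(reversed(positions.get(item, [])))
    return lis_length(matches)
```

The list of matches can contain up to N*M entries for inputs of lengths N and M.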
Efficient algorithms for engineering problems can be achieved by a combination of various optimization techniques. This paper presents an application of such a combined approach to engineering problems involving large sparse matrices, using the example of digital filter analysis. The processor implementation of a pipeline sparse matrix algorithm demonstrates the optimization results achieved by: efficient modeling of a sequential algorithm, algorithm parallelization, parallel architecture process mapping, high processor utilization, specific processor hardware modeling and hardware optimization. First, Crout's sequential algorithm for sparse matrix solution is modified and optimized into the CR algorithm. The two major processes, LUP and REP, are identified and parallelized. An algorithm for optimal pipeline mapping and module distribution is developed to achieve balanced processor load and high efficiency. The LUP and REP process computation structures are generalized in order to enable efficient processor implementation, optimizing processor hardware and program length.
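As background for the starting point named here, the sketch below shows the textbook dense form of Crout's LU factorisation, in which L carries the pivots and U has a unit diagonal. It is not the paper's CR algorithm, it ignores sparsity and pivoting, and the function name is an assumption.

```python
def crout_lu(a):
    """Crout factorisation A = L U for a square matrix given as a list of lists.

    L is lower triangular, U is upper triangular with ones on the diagonal.
    No pivoting is done, so a zero pivot raises an error in this sketch.
    """
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for j in range(n):
        upper[j][j] = 1.0
        # Column j of L.
        for i in range(j, n):
            lower[i][j] = a[i][j] - sum(lower[i][k] * upper[k][j] for k in range(j))
        if lower[j][j] == 0.0:
            raise ZeroDivisionError("zero pivot; pivoting is not implemented here")
        # Row j of U, to the right of the diagonal.
        for i in range(j + 1, n):
            upper[j][i] = (a[j][i] - sum(lower[j][k] * upper[k][i] for k in range(j))) / lower[j][j]
    return lower, upper
```

For `[[4, 3], [6, 3]]` this yields `L = [[4, 0], [6, -1.5]]` and `U = [[1, 0.75], [0, 1]]`.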