Radix sort is an efficient method to sort integer keys on parallel computers. It is easy to parallelize and simple to implement. the main drawbacks of existing algorithms are load balancing problems and communication ...
详细信息
ISBN:
(纸本)354040788X
Radix sort is an efficient method to sort integer keys on parallel computers. It is easy to parallelize and simple to implement. the main drawbacks of existing algorithms are load balancing problems and communication overhead. these problems are caused in data characteristics like data-skew and duplicates. there are several approaches how to parallelize the radix sort algorithm, which yield to reduce communication operations or to improve the load balance. If an algorithm has its focus on the optimization of the load balance then, its communication is inefficient. Otherwise, if the focus is on the communication minimization, then the algorithms are only efficient for well-distributed data. For the latter case, we will present an efficient improvement which helps to overcome the problems with unbalanced data characteristics. the suggested improvements are tested practically on a Linux-based SMP cluster.
Distributed Video-on-Demand (DVoD) systems are proposed as a solution to the limited streaming capacity and null scalability of centralized systems. In a previous work, we proposed a fully distributed large-scale VoD ...
详细信息
ISBN:
(纸本)354040788X
Distributed Video-on-Demand (DVoD) systems are proposed as a solution to the limited streaming capacity and null scalability of centralized systems. In a previous work, we proposed a fully distributed large-scale VoD architecture, called Double P-Tree, which has shown itself to be a good approach to the design of flexible and scalable DVoD systems. In this paper, we present relevant design aspects related to video mapping and traffic balancing in order to improve Double P-Tree architecture performance. Our simulation results demonstrate that these techniques yield a more efficient system and considerably increase its streaming capacity. the results also show the crucial importance of topology connectivity in improving multicasting performance in DVoD systems. Finally, a comparison among several DVoD architectures was performed using simulation, and the results show that the Double P-Tree architecture incorporating mapping and load balancing policies outperforms similar DVoD architectures.
this paper considers the Heterogeneous Earliest Finish Time (HEFT) algorithm for scheduling the tasks of an application, represented by a directed acyclic graph, onto a bounded number of heterogeneous machines. We foc...
详细信息
ISBN:
(纸本)354040788X
this paper considers the Heterogeneous Earliest Finish Time (HEFT) algorithm for scheduling the tasks of an application, represented by a directed acyclic graph, onto a bounded number of heterogeneous machines. We focus on the appropriate selection of the weight for the nodes and edges of the graph, and experiment with a number of different schemes for computing these weights. Our findings indicate that the length of the schedule produced may be affected significantly by the scheme used, and suggest that the mean value based approach used by HEFT may not be a particularly good choice.
this paper describes two different approaches to optimize the performance of SoC architectures in the architecture exploration phase. Both solve the problem to map and schedule a task graph on a target architecture un...
详细信息
ISBN:
(纸本)0769518680
this paper describes two different approaches to optimize the performance of SoC architectures in the architecture exploration phase. Both solve the problem to map and schedule a task graph on a target architecture under special consideration of on-chip communications. A constructive algorithm is presented that extends previous work by taking into account potential data transfers in the future. the second approach is a recursive procedure that is based on local search techniques in a specially defined neighborhood of the critical path. Simulated annealing and tabu search are used as search algorithms. Both approaches find solutions with better performance than established methodologies. the recursive technique leads to superior results than the constructive approach, however is limited to small and mid-sized problems, whereas the constructive algorithm is not limited by this issue.
the proceedings contain 13 papers. the special focus in this conference is on Job Scheduling Strategies for parallelprocessing. the topics include: Scheduling in HPC resource management systems;a system for structure...
ISBN:
(纸本)9783540397274
the proceedings contain 13 papers. the special focus in this conference is on Job Scheduling Strategies for parallelprocessing. the topics include: Scheduling in HPC resource management systems;a system for structured dag scheduling;simple Linux utility for resource management;an approach to easily assemble grids with equitable resource sharing;scheduling of parallel jobs in a heterogeneous multi-site environment;a measurement-based simulation study of processor co-allocation in multicluster systems;performance estimation for scheduling on shared networks;parallel job scheduling under dynamic workloads;backfilling with lookahead to optimize the performance of parallel job scheduling and a QOS based scheme for parallel job scheduling.
An implementation of a parallel ScaLAPACK-style solver for the general Sylvester equation, op(A)X - Xop(B) = C, where op(A) denotes A or its transpose AT, is presented. the parallel algorithm is based on explicit bloc...
详细信息
ISBN:
(纸本)354040788X
An implementation of a parallel ScaLAPACK-style solver for the general Sylvester equation, op(A)X - Xop(B) = C, where op(A) denotes A or its transpose AT, is presented. the parallel algorithm is based on explicit blocking of the Bartels-Stewart method. An initial transformation of the coefficient matrices A and B to Schur form leads to a reduced triangular matrix equation. We use different matrix traversing strategies to handle the transposes in the problem to solve, leading to different new parallel wave-front algorithms. We also present a strategy to handle the problem when 2 x 2 diagonal blocks of the matrices in Schur form, corresponding to complex conjugate pairs of eigenvalues, are split between several blocks in the block partitioned matrices. Finally, the solution of the reduced matrix equation is transformed back to the originally coordinate system. the implementation acts in a ScaLAPACK environment using 2-dimensional block cyclic mapping of the matrices onto a rectangular grid of processes. Real performance results are presented which verify that our parallelalgorithms are reliable and scalable.
A program system for manipulating and translating of multimedia program skeletons ("films") into parallel programs is considered. the main goal of this system is to make easier to create parallel programs fo...
详细信息
ISBN:
(纸本)354040788X
A program system for manipulating and translating of multimedia program skeletons ("films") into parallel programs is considered. the main goal of this system is to make easier to create parallel programs for various parallel computing systems.
暂无评论