We present algorithms for high performance communication between message-passing parallel programs, and evaluate the algorithms as implemented in InterComm. InterComm is a framework to couple parallel programs in the ...
详细信息
ISBN:
(纸本)0769523129
We present algorithms for high performance communication between message-passing parallel programs, and evaluate the algorithms as implemented in InterComm. InterComm is a framework to couple parallel programs in the presence of complex data distributions within a coupled application. Multiple parallel libraries and languages may be used in the different programs of a single coupled application. The ability to couple such programs is required in many emerging application areas, such as complex simulations that model physical phenomena at multiple scales and resolutions, and image data analysis applications. We describe the new algorithms we have developed for computing inter-program communication patterns. We present experimental results showing the performance of various algorithmic tradeoffs, and also compare performance against an earlier system.
Ant Colony Optimization algorithms are intrinsically distributed algorithms where independent agents are in charge of building solutions collaboratively. Stigmergy or indirect communication is the way in which each ag...
详细信息
ISBN:
(纸本)0769523129
Ant Colony Optimization algorithms are intrinsically distributed algorithms where independent agents are in charge of building solutions collaboratively. Stigmergy or indirect communication is the way in which each agent learns from the experience of the whole colony. In this sense, explicit communication models of ACO can be defined directly giving birth to parallel algorithms of high numerical and real time efficiency. We do so in this work, and apply the resulting algorithms to the Minimum Tardy Task Problem (MTTP), a scheduling problem that has been faced with other meta-heuristics in the past. The aim of this article is to report experimental results on the behavior of three types of parallel ACO algorithms on large instances of the mentioned problems with the goal of improving existing solutions significantly.
parallel and distributed heterogeneous computing systems may operate in an environment that undergoes unpredictable changes causing certain system performance features to degrade. Such systems need robustness to guara...
详细信息
ISBN:
(纸本)0769523129
parallel and distributed heterogeneous computing systems may operate in an environment that undergoes unpredictable changes causing certain system performance features to degrade. Such systems need robustness to guarantee limited degradation despite fluctuations in the behavior of its component parts or environment. Our previous work in this area presented a method for generating a measure of robustness for a given system. However, the focus of that approach was on a scenario where all perturbations were of the same kind, e.g., all perturbations were in message sizes or computation times, but not both message sizes and computation times. This paper gives an extended discussion of the case where perturbations could be of different kinds, and presents some new insights.
作者:
Banino, Cyril
Sem Sælandsvei 7-9 NO-7491 Trondheim Norway
The problem of scheduling independent tasks on heterogeneous trees is considered. The nodes of the tree may have different processing times, and links different communication times. The single-port, full overlap model...
详细信息
ISBN:
(纸本)0769523129
The problem of scheduling independent tasks on heterogeneous trees is considered. The nodes of the tree may have different processing times, and links different communication times. The single-port, full overlap model is used for modeling the activities of the nodes. A distributed method for determining the maximum steady-state throughput of a tree is presented. Then, we show how each node can build up its own local schedule independently of the rest of the platform. In addition, the final schedule is asynchronous and event-driven, meaning that each node (except the root) acts without any time-related information. A local scheduling strategy which aims at minimizing the amount of tasks buffered at node locations during steady-state is introduced. As a consequence, the lengths of the start-up and winddown phases are considerably reduced.
The MEAD system that we are developing employs a synergistic combination of a reactive and a proactive fault-tolerance approach in order to address unanticipated events and hazards in real-time, fault-tolerant distrib...
详细信息
ISBN:
(纸本)0769523129
The MEAD system that we are developing employs a synergistic combination of a reactive and a proactive fault-tolerance approach in order to address unanticipated events and hazards in real-time, fault-tolerant distributed systems. The reactive fault-tolerance approach involves active monitoring of the system to adapt the provided QoS and to allocate resources based on current conditions in the system. The proactive approach involves monitoring both the distributed applications and the network to seek pre-cursors to imminent failures, and then to trigger fault-recovery mechanisms in advance of the occurrence of the failure. The underlying ideas of the MEAD system have demonstrated initial promise through our enhanced capabilities to handle failures and unanticipated events, and to reduce jitter under faulty conditions.
This paper describes the design and implementation of an address generator for stream-based computation. The unit can generate addresses by a 1,2 or 3-dimensional mapping from a linear data string in memory. A process...
详细信息
In this paper, a parallel algorithm for computing the roots of a given polynomial of degree n on a ring of processors is proposed. The algorithm implements Durand-Kerner's method and consists of two phases: initia...
详细信息
ISBN:
(纸本)0769523129
In this paper, a parallel algorithm for computing the roots of a given polynomial of degree n on a ring of processors is proposed. The algorithm implements Durand-Kerner's method and consists of two phases: initialization, and iteration. In the initialization phase all the necessary preparation steps are realized to start the parallel computation. It includes register initialization and initial approximation of roots requiring 3n-2 communications, 2 exponentiation, one multiplications, 6 divisions, and 4n-3 additions. In the iteration phase, these initial approximated roots are corrected repeatedly and converge to their accurate values. The iteration phase is composed of some iteration steps, each consisting of 3n communications, 4n+3 additions, 3n+1 multiplications, and one division.
A large number of bioinformatics analysis tools available today are processor intensive. Keeping in mind that the amount of biological data to be analyzed is growing steadily, and these tools are not easily deployable...
详细信息
The scheduler is a key component in determining the overall performance of a parallel computer, and as we show here, the schedulers in wide use today exhibit large unexplained gaps in performance during their operatio...
详细信息
ISBN:
(纸本)0769523129
The scheduler is a key component in determining the overall performance of a parallel computer, and as we show here, the schedulers in wide use today exhibit large unexplained gaps in performance during their operation. Also, different scheduling algorithms often vary in the gaps they show, suggesting that choosing the correct scheduler for each time frame can improve overall performance. We present two adaptive algorithms that achieve this: One chooses by recent past performance, and the other by the recent average degree of parallelism, which is shown to be correlated to algorithmic superiority. Simulation results for the algorithms on production workloads are analyzed, and illustrate unique features of the chaotic temporal structure of parallel workloads. We provide best parameter configurations for each algorithm, which both achieve average improvements of 10% in performance and 35% in stability for the tested workloads.
The multiple genome sequence alignment problem falls in the domain of problems that can be parallelized to address large sequence lengths. Although there is communication required for the computation of the aligned se...
详细信息
ISBN:
(纸本)0769523129
The multiple genome sequence alignment problem falls in the domain of problems that can be parallelized to address large sequence lengths. Although there is communication required for the computation of the aligned sequences, the proper distribution can reduce the overall problem to a set of tasks to be solved independently and then merged. A parallel algorithm for the alignment of multiple genome sequences is described. The algorithm is experimentally evaluated in a distributed Grid environment that provides very scalable and low cost computation performance. The Grid environment is evaluated with respect to a traditional cluster environment and results are compared to evaluate the effectiveness of a Grid environment for large computational biology.
暂无评论