High-performance clusters (HPCs) based on commodity hardware are becoming increasingly popular in the parallel computing community. These platforms offer hardware capable of very low latency and very high throughput at an unbeatable cost, making them attractive for a wide variety of parallel and distributed applications. With adequate communication software, HPCs have the potential to achieve a level of performance similar to that of massively parallel computers. However, for parallel applications with a high communication/computation ratio, it is still essential to provide the lowest possible latency in order to minimize communication overhead. In this paper, we investigate message aggregation techniques to improve parallel simulations of fine-grain ATM communication network models. Although message aggregation is a well-known solution for improving the communication performance of high-latency interconnection networks, the complex interaction between message aggregation and the underlying communication software is often ignored. We show that, to be efficient on an HPC, message aggregation must carefully take into account the characteristics of the communication software. This methodology can be applied as a preliminary step to tune a message aggregation algorithm for a given combination of hardware architecture and communication software layer.
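To make the latency argument concrete, here is a minimal Python sketch of sender-side message aggregation under a simple linear cost model (a fixed per-packet latency plus payload size divided by bandwidth). The `Aggregator` class, the byte `threshold`, the 2-byte length prefix, and `cost_us` are hypothetical illustrations, not the paper's algorithm.

```python
from typing import Callable, List


class Aggregator:
    """Hypothetical sender-side aggregator: buffers small messages and sends
    them as one packet once the buffered size reaches `threshold` bytes."""

    def __init__(self, send_packet: Callable[[bytes], None], threshold: int = 1024):
        self.send_packet = send_packet  # assumed transport hook (e.g. a socket send)
        self.threshold = threshold      # assumed flush size in bytes
        self._buf: List[bytes] = []
        self._size = 0

    def send(self, msg: bytes) -> None:
        # 2-byte length prefix (messages must be < 64 KiB) so the receiver can split the packet.
        self._buf.append(len(msg).to_bytes(2, "big") + msg)
        self._size += 2 + len(msg)
        if self._size >= self.threshold:
            self.flush()

    def flush(self) -> None:
        # Also called explicitly, e.g. at the end of a simulation step.
        if self._buf:
            self.send_packet(b"".join(self._buf))
            self._buf, self._size = [], 0


def unpack(packet: bytes) -> List[bytes]:
    """Receiver side: split an aggregated packet back into messages."""
    msgs, i = [], 0
    while i < len(packet):
        n = int.from_bytes(packet[i:i + 2], "big")
        msgs.append(packet[i + 2:i + 2 + n])
        i += 2 + n
    return msgs


def cost_us(n_msgs: int, msg_bytes: int, per_packet: int,
            latency_us: float, bw_bytes_per_us: float) -> float:
    """Linear cost model: every packet pays a fixed latency; payload bytes pay 1/bandwidth.
    Length-prefix overhead is ignored."""
    packets = -(-n_msgs // per_packet)  # ceiling division
    return packets * latency_us + n_msgs * msg_bytes / bw_bytes_per_us
```

Under this model, 100 messages of 32 bytes with 10 µs latency and 100 bytes/µs cost 1032 µs when sent one per packet and 132 µs when sent ten per packet. The abstract's point is that the interaction with the real communication software matters, so in practice a threshold like this would be tuned by measurement on the target stack rather than derived from a model this simple.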
Building dependable distributed systems using ad hoc methods is a challenging task. Without proper support, an application programmer must face the daunting requirement of having to provide fault tolerance at the appl...
In this paper, we present the design and implementation of Dodo, an efficient user-level system for harvesting idle memory in off-the-shelf clusters of workstations. Dodo enables data-intensive applications to use remote memory in a cluster as an intermediate cache between local memory and disk. It requires no modifications to the operating system or processor firmware and is hence portable to multiple platforms. Further, the memory recruitment policy used by Dodo is designed to minimize any delays experienced by the owners of desktop machines whose memory is harvested by Dodo. Our implementation of Dodo is operational and currently runs on Linux 2.0.35. For communication, Dodo can use either UDP/IP or U-Net, the low-latency user-level network architecture developed by von Eicken et al. We evaluated the performance improvements that can be achieved by using Dodo for two real applications and three synthetic benchmarks. Our results show that the speedup obtained for an application is highly dependent on its I/O access pattern and data set size. Significant speedups (between 2 and 3) were obtained for applications whose working sets are larger than the local memory on a workstation but smaller than the aggregate memory available on the cluster, and for applications that can benefit from the zero-seek nature of remote memory.
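The idea of remote memory as an intermediate cache can be sketched as a three-level read path: local memory, then idle memory on other workstations, then disk. The Python sketch below is hypothetical: `TieredCache`, the LRU policy for local memory, demotion of evicted local blocks to the remote tier, and the read-only workload are assumptions made for illustration. The remote tier is simulated with an in-process dictionary instead of UDP/IP or U-Net, and Dodo's memory recruitment policy is not modeled.

```python
from collections import OrderedDict
from typing import Callable, Dict


class TieredCache:
    """Hypothetical three-tier read path: local memory -> remote memory -> disk.

    Local memory is an LRU cache. Blocks evicted from it are demoted to the
    remote tier, which keeps them in demotion order and drops the oldest when
    full (assumes read-only data, so dropped blocks can be re-read from disk)."""

    def __init__(self, local_cap: int, remote_cap: int,
                 read_disk: Callable[[int], bytes]) -> None:
        self.local: "OrderedDict[int, bytes]" = OrderedDict()
        # Stands in for idle memory on other workstations (no real network here).
        self.remote: "OrderedDict[int, bytes]" = OrderedDict()
        self.local_cap, self.remote_cap = local_cap, remote_cap
        self.read_disk = read_disk
        self.stats: Dict[str, int] = {"local": 0, "remote": 0, "disk": 0}

    def _put_remote(self, block: int, data: bytes) -> None:
        self.remote[block] = data
        if len(self.remote) > self.remote_cap:
            self.remote.popitem(last=False)  # drop the oldest demoted block

    def _put_local(self, block: int, data: bytes) -> None:
        self.local[block] = data
        if len(self.local) > self.local_cap:
            old_block, old_data = self.local.popitem(last=False)  # LRU victim
            self._put_remote(old_block, old_data)  # demote rather than discard

    def read(self, block: int) -> bytes:
        if block in self.local:
            self.local.move_to_end(block)  # mark as most recently used
            self.stats["local"] += 1
            return self.local[block]
        if block in self.remote:
            data = self.remote.pop(block)  # move the block back into local memory
            self.stats["remote"] += 1
        else:
            data = self.read_disk(block)
            self.stats["disk"] += 1
        self._put_local(block, data)
        return data
```

With two local slots and two remote slots, reading blocks 1, 2, 3 and then block 1 again serves the second read of block 1 from the remote tier instead of the disk. This mirrors the case where the abstract reports the largest gains: working sets larger than local memory but smaller than the aggregate memory of the cluster.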
This paper addresses the design ideas behind the MorphoSys reconfigurable processor developed by researchers at UC Irvine. With the demand to perform multimedia operations efficiently, it is one of the directi...
Multi-FPGA systems (MFSs) are used as custom computing machines, logic emulators and rapid prototyping vehicles. A key aspect of these systems is their programmable routing architecture, which is the manner in which wi...
This paper presents new techniques for architecture- and performance-driven compilation of SW programs into RW (reconfigurable HW). These new techniques effectively improve on the complex resource-sharing approaches ty...
The next decade of computing will be dominated by embedded systems, information appliances and application-specific computers. In order to build these systems, designers will need high-level compilation and CAD tools that generate architectures that effectively meet the needs of each application. In this paper we present a novel compilation system that allows sequential programs, written in C or FORTRAN, to be compiled directly into custom silicon or reconfigurable architectures. This capability is also interesting because trends in computer architecture are moving towards more reconfigurable, hardware-like substrates, such as FPGA-based systems. Our system works by combining two resource-efficient computing disciplines: Small Memories and Virtual Wires. For a given application, the compiler first analyzes the memory access patterns of pointers and arrays in the program and constructs a partitioned memory system made up of many small memories. The computation is implemented by active computing elements that are spatially distributed within the memory array. A space-time scheduler assigns instructions to the computing elements in a way that maximizes locality and minimizes physical communication distance. It also generates an efficient static schedule for the interconnect. Finally, specialized hardware for the resulting schedule of memory accesses, wires, and computation is generated as a multi-process state machine in synthesizable Verilog. With this system, implemented as a set of SUIF compiler passes, we have successfully compiled programs into hardware and achieved specialization performance enhancements of up to an order of magnitude over a single general-purpose processor. We also achieve additional parallelization speedups similar to those obtainable using a tightly interconnected multiprocessor.
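The spatial half of the space-time scheduling step can be illustrated with a greedy, locality-first placement of a dataflow graph onto a 2D grid of computing elements. This Python sketch is a simplified stand-in, not the compiler's actual algorithm: the function names, the Manhattan-distance cost, the per-tile capacity limit, and the tie-breaking rule are assumptions, and the static interconnect schedule is not modeled.

```python
from typing import Dict, List, Tuple

Tile = Tuple[int, int]  # (row, column) of a computing element in the grid


def place(dag: Dict[str, List[str]], order: List[str],
          rows: int, cols: int, per_tile: int) -> Dict[str, Tile]:
    """Greedy locality-first placement (hypothetical sketch).

    `dag` maps each instruction to the instructions whose results it consumes,
    and `order` must be a topological order. Each instruction goes to the tile
    with the smallest total Manhattan distance to its operands' tiles, with at
    most `per_tile` instructions per tile as a crude load-balance limit."""
    load: Dict[Tile, int] = {(r, c): 0 for r in range(rows) for c in range(cols)}
    where: Dict[str, Tile] = {}
    for instr in order:
        srcs = [where[p] for p in dag.get(instr, [])]

        def dist(t: Tile) -> int:
            return sum(abs(t[0] - s[0]) + abs(t[1] - s[1]) for s in srcs)

        free = [t for t, n in load.items() if n < per_tile]
        if not free:
            raise ValueError("not enough tile capacity for this graph")
        # Prefer short wires, then lightly loaded tiles, then a fixed tile order.
        best = min(free, key=lambda t: (dist(t), load[t], t))
        where[instr] = best
        load[best] += 1
    return where


def comm_distance(dag: Dict[str, List[str]], where: Dict[str, Tile]) -> int:
    """Total Manhattan hops over all producer -> consumer edges."""
    return sum(abs(where[i][0] - where[p][0]) + abs(where[i][1] - where[p][1])
               for i, preds in dag.items() for p in preds)
```

For a four-instruction graph in which `c` consumes `a` and `b` and `d` consumes `c`, a 2x2 grid with one instruction per tile places `a`, `b`, `c`, `d` on tiles (0,0), (0,1), (1,0), (1,1), for a total communication distance of 4 hops.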
ISBN (print): 0769500870
The proceedings contain 35 papers. The topics discussed include: scalability analysis of multidimensional wavefront algorithms on large-scale SMP clusters; a system for evaluating performance and cost of SIMD array designs; design trade-offs of low-cost multicomputer network switches; the Cactus computational collaboratory: enabling technologies for relativistic astrophysics, and a toolkit for solving PDEs by communities in science and engineering; the PETSc library for scientific software; distributed applet-based certifiable processing in client/server environments; large-scale distributed computational fluid dynamics on the Information Power Grid using Globus; a framework for generating task parallel programs; HPF implementation of ARC3D; efficient VLSI layouts of hypercubic networks; adapting to load on workstation clusters; parallel simulation of two-phase flow problems using the finite element method; a data-parallel algorithm for iterative tomographic image reconstruction; parallel rendering of 3D AMR data on the SGI/Cray T3E; a recursive PVM implementation of an image segmentation algorithm with performance results comparing the HIVE and the Cray T3E; Delphi: an integrated, language-directed performance prediction, measurement and analysis environment; POEMS: end-to-end performance models for dynamic parallel and distributed systems; MPI: the only programming model for managing memory; distributed control parallelism for multidisciplinary design of a high speed civil transport; implementing MM5 on NASA Goddard Space Flight Center computing systems: a performance study; and material science electronic structure calculations on massively parallel systems: an algorithmic and computational challenge.
ISBN (print): 076950356X
Reconfigurable computing is emerging as a viable design alternative for implementing a wide range of computationally intensive applications. Scheduling becomes a critical issue in achieving the high performance that these kinds of applications demand. This paper describes the different aspects of the scheduling problem in a reconfigurable architecture. We also propose a general strategy for performing, at compile time, a scheduling that includes all possible optimizations regarding context (configuration) and data transfers. In particular, we focus on the methodology and mechanisms to solve context scheduling. Some experimental results are presented to validate our assumptions. Finally, the problem of data transfers is only formulated here and will be addressed in future work.
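Because the whole sequence of kernel executions is known at compile time, one way to illustrate context scheduling is offline furthest-next-use (Belady) eviction, which minimizes the number of context loads for a fixed-capacity context memory when every load costs the same. This is a swapped-in textbook technique, not necessarily the strategy proposed in the paper; `schedule_contexts`, the kernel-to-context sequence, and the `slots` capacity are hypothetical, and data transfers are not modeled.

```python
from typing import List, Set


def schedule_contexts(seq: List[str], slots: int) -> List[str]:
    """Return the context loads needed to run the kernels in `seq` in order.

    seq[i] names the context (configuration) kernel i needs; at most `slots`
    contexts fit in context memory at once (hypothetical model). On a miss with
    a full memory, evict the resident context whose next use is furthest away."""
    if slots < 1:
        raise ValueError("need at least one context slot")
    resident: Set[str] = set()
    loads: List[str] = []
    for i, ctx in enumerate(seq):
        if ctx in resident:
            continue  # already loaded: no reconfiguration needed
        if len(resident) >= slots:
            def next_use(c: str) -> int:
                try:
                    return seq.index(c, i + 1)
                except ValueError:
                    return len(seq)  # never needed again: ideal victim
            # Tie-break on the name so the result is deterministic.
            victim = max(resident, key=lambda c: (next_use(c), c))
            resident.remove(victim)
        resident.add(ctx)
        loads.append(ctx)
    return loads
```

For the sequence A, B, C, A, B, D, A with two context slots, this schedule needs five loads (A, B, C, B, D); an LRU policy on the same sequence would need seven.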
This paper reports on the architecture and design of Starfish, an environment for executing dynamic (and static) MPI-2 programs on a cluster of workstations. Starfish is unique in being efficient, fault-tolerant, highly available, and dynamic as a system internally, and in supporting fault tolerance and dynamicity for its application programs as well. Starfish achieves these goals by combining group communication technology with checkpoint/restart, and uses a novel architecture that is both flexible and portable and keeps group communication outside the critical data path, for maximum performance.