Heterogeneity in computing environments is becoming increasingly common. Some consider this a problem, while others (including ourselves) prefer to think of it as a benefit. By exploiting the different features and ca...
The use of software-implemented distributed shared memory (SDSM) to provide shared memory programming environments on networks of workstations and message-passing parallel computers has become quite popular. However, the memory reference patterns of many shared memory programs lead to poor performance on such systems. The authors propose hardware assistance to improve the performance of SDSM systems faced with problematic reference patterns, and describe an example of such a system. In the example system, operating system software in Mach provides internode sharing, assisted by hardware support for maintaining update-based coherence of replicated pages. Simulations driven by hardware-collected parallel reference traces are used to estimate the expected performance of the system.
ISBN (print): 9780818652806
Compile-time code transformations that expose instruction-level parallelism (ILP) typically take into account the constraints imposed by all execution scenarios in the program. However, there are additional opportunities to increase ILP along some execution sequences if the constraints from alternative execution sequences can be ignored. Traditionally, profile information has been used to identify important execution sequences for aggressive compiler optimization and scheduling. The paper presents a set of static program analysis heuristics used in the IMPACT compiler to identify execution sequences for aggressive optimization. The authors show that these heuristics identify execution sequences free of the hazardous conditions that tend to prohibit compiler optimizations. As a result, the static approach often achieves optimization results comparable to those obtained with profile information, despite its lower branch prediction accuracy. This observation makes a strong case for using static program analysis, with or without profile information, to facilitate aggressive compiler optimization and scheduling.
We have located five zero‐angular‐momentum resonance states in the funnel associated with the lowest conical intersection of the Na (3p) · H2 exciplex, and we have characterized the four narrowest of these in t...
In order to provide shared memory in large-scale multiprocessors, techniques to hide the latency of shared memory accesses must be developed. In this paper, we describe the latency-hiding mechanisms employed by the Galactica Net scalable distributed shared memory architecture being developed at the Center for High Performance Computing. We introduce a novel technique for maintaining the coherence of shared data caches, based on a flexible hardware-supported but software-controlled mechanism that supports both update-based and invalidate-based protocols. We also consider the use of alternative memory consistency models and find that weaker consistency models are an effective way to hide memory reference latency in the Galactica Net architecture. Preliminary performance evaluations indicate that together these mechanisms hide a significant amount of the memory reference latency, thus increasing the scalability of the architecture.
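The abstract does not describe how Galactica Net implements its coherence mechanism. However, the general idea of a software-selected, per-page choice between update and invalidate protocols can be sketched in a few lines. The Python model below is a minimal sketch under our own assumptions. The names Policy, Page, Directory, and Node are hypothetical. A single directory object stands in for the home node. Messages, latency, and consistency models are ignored.

```python
from dataclasses import dataclass, field
from enum import Enum


class Policy(Enum):
    UPDATE = "update"          # push each written word to every cached copy
    INVALIDATE = "invalidate"  # discard remote copies; they re-fetch on next read


@dataclass
class Page:
    """Home copy of one shared page (hypothetical model, not Galactica Net's format)."""
    data: dict = field(default_factory=dict)
    policy: Policy = Policy.UPDATE
    sharers: set = field(default_factory=set)  # ids of nodes caching this page


class Directory:
    """Stands in for the home node: it sees every write and enforces the page policy."""

    def __init__(self):
        self.pages = {}
        self.nodes = {}

    def page(self, page_id):
        return self.pages.setdefault(page_id, Page())

    def add_node(self, node_id):
        node = Node(node_id, self)
        self.nodes[node_id] = node
        return node

    def set_policy(self, page_id, policy):
        # Software selects the protocol per page; the write path below applies it.
        self.page(page_id).policy = policy

    def write(self, writer, page_id, addr, value):
        page = self.page(page_id)
        page.data[addr] = value
        for sharer in list(page.sharers):
            cache = self.nodes[sharer].cache
            if sharer == writer or page.policy is Policy.UPDATE:
                cache[page_id][addr] = value   # keep the cached copy current
            else:
                del cache[page_id]             # invalidate the remote copy
                page.sharers.discard(sharer)


class Node:
    def __init__(self, node_id, directory):
        self.node_id = node_id
        self.directory = directory
        self.cache = {}  # page_id -> locally cached copy of the page's words

    def read(self, page_id, addr):
        if page_id not in self.cache:
            # Miss: copy the page from the home and register as a sharer.
            page = self.directory.page(page_id)
            self.cache[page_id] = dict(page.data)
            page.sharers.add(self.node_id)
        return self.cache[page_id].get(addr, 0)

    def write(self, page_id, addr, value):
        self.directory.write(self.node_id, page_id, addr, value)


if __name__ == "__main__":
    home = Directory()
    a, b = home.add_node(0), home.add_node(1)
    home.set_policy("counters", Policy.UPDATE)
    home.set_policy("buffer", Policy.INVALIDATE)
    for pid in ("counters", "buffer"):
        b.read(pid, 0)        # node 1 caches the page
        a.write(pid, 0, 42)   # node 0 writes it
    print("counters" in b.cache, "buffer" in b.cache)  # True False
```

Under UPDATE, a sharer keeps its copy and sees new values without re-fetching. Under INVALIDATE, its copy is dropped and the next read goes back to the home. Keeping the choice per page lets software match the protocol to each region's sharing pattern.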
The class of NUMA (Non-Uniform Memory Access time) shared memory architectures is becoming increasingly important with the desire for larger-scale multiprocessors. In such machines, the placement and movement of code and...
This paper looks at the problem of ensuring the performance of real-time applications hosted on Galactica Net, a mesh-based distributed cache-coherent shared memory multiprocessing system. A method for determining strict...
Mixing of particles by chaotic flow fields was simulated on the Connection Machine. We assigned each cell to a processor and kept the coordinates of particles residing in the cell in the local memory of the processo...
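The cell-per-processor decomposition described above can be sketched serially. In this model, each cell owns the particles inside it and advects them. Any particle that crosses a cell boundary is then handed to the cell that now contains it. The abstract does not say which chaotic flow was used, so this Python sketch assumes a sine-flow-style map on the periodic unit square. The function names and the amplitude parameter are hypothetical. A dictionary of cells stands in for the Connection Machine's per-processor local memories.

```python
import math
from collections import defaultdict


def cell_of(x, y, nx, ny):
    """Index (i, j) of the cell containing (x, y) in an nx-by-ny grid on [0, 1]^2."""
    # min() guards against x or y landing exactly on 1.0 after the modulo wrap.
    return min(int(x * nx), nx - 1), min(int(y * ny), ny - 1)


def bin_particles(particles, nx, ny):
    """Assign each particle to the cell (i.e. the 'processor') that contains it."""
    cells = defaultdict(list)
    for x, y in particles:
        cells[cell_of(x, y, nx, ny)].append((x, y))
    return cells


def advect(x, y, amplitude):
    """Assumed chaotic map: alternating sine shears, periodic on the unit square."""
    x = (x + amplitude * math.sin(2 * math.pi * y)) % 1.0
    y = (y + amplitude * math.sin(2 * math.pi * x)) % 1.0
    return x, y


def step(cells, nx, ny, amplitude):
    """One data-parallel step: each cell advects only its own particles, then every
    particle is delivered to the cell that now contains it."""
    new_cells = defaultdict(list)
    for particles in cells.values():  # conceptually, all cells run in parallel
        for x, y in particles:
            px, py = advect(x, y, amplitude)
            new_cells[cell_of(px, py, nx, ny)].append((px, py))
    return new_cells


if __name__ == "__main__":
    cells = bin_particles([(0.1, 0.1), (0.15, 0.05), (0.9, 0.9)], nx=4, ny=4)
    for _ in range(10):
        cells = step(cells, 4, 4, amplitude=0.3)
    print({cell: len(pts) for cell, pts in cells.items()})
```

On a real data-parallel machine, the regrouping inside step would correspond to interprocessor communication. Here it is just a dictionary rebuild.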
Steady-state and time-dependent two-dimensional thermal convection in a Boussinesq, infinite-Prandtl-number fluid with stress-free boundaries has been investigated. Two independent numerical methods have been employed to calculate the evolution of convective flows in a rectangular box with aspect ratio λ = 1.8 over a Rayleigh-number (Ra) range of 10^6 to 10^9. As Ra increases beyond 10^7, the flow reveals disconnected thermals, rather than connected plumes, driven by a persistent large-scale circulation. Such features have also been reported from laboratory convection experiments in the regime of hard turbulence. Extensive calculations were performed (up to 140 overturns) in order to reach the statistically stationary regime for strongly chaotic flows. A Gaussian distribution with mean value Nu_t was derived from the time history of the Nusselt number (Nu). The value of Nu_t can be obtained directly by solving the steady-state equations via an iteration procedure. Thus the stationary flow obtained from the steady-state method resembles the turbulent flow in a statistical sense. Since the iteration procedure is about 10^4 times faster than calculating the full time-dependent evolution, it allows for the systematic investigation of the heat-transfer Nu-Ra relationship and other scaling laws. The steady-state and time-dependent experiments indicate that a power-law exponent of β = 0.315 holds for the Nu-Ra relation (Nu ∝ Ra^β) for stress-free boundaries over the entire range of Ra. No indication of a jump in the exponent was found at the transition to hard turbulence.
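An exponent β in a relation Nu ∝ Ra^β is commonly estimated by a least-squares fit of log Nu against log Ra. The Python function below is a minimal sketch of that fit. The abstract does not state the authors' fitting procedure. The example data are synthetic, generated with an assumed prefactor of 0.2, and are not results from the paper.

```python
import math


def fit_power_law(ra, nu):
    """Fit Nu = a * Ra**beta by linear least squares in log-log space.

    Returns (a, beta). Requires at least two distinct positive Ra values.
    """
    if len(ra) != len(nu) or len(ra) < 2:
        raise ValueError("need at least two (Ra, Nu) pairs of equal length")
    xs = [math.log(r) for r in ra]
    ys = [math.log(n) for n in nu]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("Ra values must not all be equal")
    beta = sxy / sxx                      # slope of log Nu vs log Ra
    a = math.exp(mean_y - beta * mean_x)  # intercept, back-transformed
    return a, beta


if __name__ == "__main__":
    # Synthetic data only: Nu = 0.2 * Ra**0.315 (the 0.2 prefactor is assumed).
    ra = [10.0 ** k for k in (6, 7, 8, 9)]
    nu = [0.2 * r ** 0.315 for r in ra]
    a, beta = fit_power_law(ra, nu)
    print(f"a = {a:.3f}, beta = {beta:.3f}")  # a = 0.200, beta = 0.315
```

For scale, β = 0.315 means that raising Ra by a factor of 10^3, from 10^6 to 10^9, multiplies Nu by 10^(3 × 0.315) = 10^0.945 ≈ 8.8.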