Search Results - Inner Mongolia University Library

Heterogeneous by design: An environment for exploiting heterogeneity

1993 Workshop on Heterogeneous Processing, WHP 1993, Held in conjunction with the 7th International Parallel Processing Symposium

Authors: LaRowe, Richard P.; Probert, Thomas H. Affiliation: Center for High Performance Computing, Worcester Polytechnic Institute, United States

ISBN: (print) 0818635312

Heterogeneity in computing environments is becoming increasingly common. Some consider this a problem, while others (including ourselves) prefer to think of it as a benefit. By exploiting the different features and capabilities of computer nodes in a heterogeneous computing network, higher levels of performance can be attained than is possible using any single type of computer found in the network. In this paper, we present the preliminary design of a heterogeneous computing environment being developed at CHPC. The environment includes a high-speed interconnection architecture capable of supporting shared memory, as well as a new programming environment for developing heterogeneous applications that exploit the available hardware. © 1993 IEEE.

Keywords: Computer programming

Hardware assist for distributed shared memory

International Conference on Distributed Computing Systems

Authors: A.W. Wilson, R.P. LaRowe, M.J. Teller. Affiliation: Center for High Performance Computing, Worcester Polytechnic Institute, USA

The use of software-implemented distributed shared memory (SDSM) to provide shared memory programming environments on networks of workstations and message-passing parallel computers has become quite popular. However, the memory reference patterns of many shared memory programs lead to poor performance on such systems. The authors propose hardware assist to improve the performance of SDSM systems faced with problematic reference patterns. An example of such a system is described. Operating system software in Mach is used to provide internode sharing in the example system, but it is assisted by hardware support for maintaining update-based coherence of replicated pages. Simulations driven by hardware-collected parallel reference traces are used to provide an indication of the expected performance of the system.

Keywords: Hardware; Costs; Protocols; High performance computing; Operating systems; Coherence; Parallel programming; Workstations; Computer networks; Concurrent computing

Heterogeneous by Design: An Environment for Exploiting Heterogeneity

Workshop on Heterogeneous Processing

Authors: R.P. LaRowe, T.H. Probert. Affiliation: Center for High Performance Computing, Worcester Polytechnic Institute, USA

GPMB—software pipelining branch-intensive loops

IEEE/ACM International Symposium on Microarchitecture (MICRO)

Authors: Zhihong Tang, Gang Chen, Chihong Zhang, Yingwei Zhang, Bogong Su, Stanley Habib. Affiliations: Center for Reliable and High-Performance Computing, University of Illinois, Urbana, IL, USA; Department of Electrical Engineering, Huazhong University of Science and Technology, Wuhan, Hubei, China; Department of Electrical Engineering, University of Calgary, Calgary, AB, Canada

ISBN: (print) 9780818652806

Compile-time code transformations that expose instruction-level parallelism (ILP) typically take into account the constraints imposed by all execution scenarios in the program. However, there are additional opportunities to increase ILP along some execution sequences if the constraints from alternative execution sequences can be ignored. Traditionally, profile information has been used to identify important execution sequences for aggressive compiler optimization and scheduling. The paper presents a set of static program analysis heuristics used in the IMPACT compiler to identify execution sequences for aggressive optimization. The authors show that the static program analysis heuristics identify execution sequences without hazardous conditions that tend to prohibit compiler optimizations. As a result, the static program analysis approach often achieves optimization results comparable to those obtained with profile information, despite its lower branch prediction accuracy. This observation makes a strong case for using static program analysis, with or without profile information, to facilitate aggressive compiler optimization and scheduling.

Keywords: Processor scheduling; Program processors; Optimizing compilers; Information analysis; Accuracy; VLIW; Runtime; Computer aided instruction; Concurrent computing; Parallel processing

Partial widths of Feshbach funnel resonances in the Na(3p) · H₂ exciplex

International Journal of Quantum Chemistry, 1993, Vol. 48, Issue S27, pp. 621-632

Authors: Mielke, Steven L.; Tawa, Gregory J.; Truhlar, Donald G.; Schwenke, David W. Affiliations: Department of Chemistry, Supercomputer Institute, United States; NASA Ames Research Center, Mail Stop 230-3, Moffett Field, California 94035-1000, United States; Army High Performance Computing Research Center, University of Minnesota, Minneapolis, Minnesota 55455-0431, United States

We have located five zero‐angular‐momentum resonance states in the funnel associated with the lowest conical intersection of the Na (3p) · H₂ exciplex, and we have characterized the four narrowest of these in terms of total and partial widths. The resonant contributions of the metastable states to state‐to‐state energy transfer greatly exceed the background contributions. © 1993 John Wiley & Sons, Inc.

Hiding Shared Memory Reference Latency on the Galactica Net Distributed Shared Memory Architecture

Journal of Parallel and Distributed Computing, 1992, Vol. 15, No. 4, pp. 351-367

Authors: Wilson, A.W.; LaRowe, R.P. Affiliation: Center for High Performance Computing, Worcester Polytechnic Institute, Marlborough, Massachusetts 01752, USA

In order to provide shared memory in large-scale multiprocessors, techniques to hide the latency of shared memory accesses must be developed. In this paper, we describe the latency-hiding mechanisms employed by the Galactica Net scalable distributed shared memory architecture being developed at the Center for High Performance Computing. We introduce our novel technique for maintaining the coherence of shared data caches, based on a flexible hardware-supported but software-controlled mechanism supporting both update- and invalidate-based protocols. We also consider the use of alternative memory consistency models, and find that weaker consistency models are an effective way to hide memory reference latency in the Galactica Net architecture. Preliminary performance evaluations indicate that together these mechanisms are able to hide a significant amount of the memory reference latency, thus increasing the scalability of the architecture.

An analysis of dynamic page placement on a NUMA multiprocessor

1992 ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS/Performance 1992

Authors: LaRowe, Richard P.; Holliday, Mark A.; Ellis, Carla Schlatter. Affiliations: Center for High Performance Computing, Worcester Polytechnic Institute, Marlboro, MA 01752, United States; Department of Computer Science, Duke University, Durham, NC 27706, United States

ISBN: (print) 0897915070

The class of NUMA (Non-Uniform Memory Access time) shared memory architectures is becoming increasingly important with the desire for larger-scale multiprocessors. In such machines, the placement and movement of code and data are crucial to performance. The operating system can play a role in managing placement through the policies and mechanisms of the virtual memory subsystem. In this paper, we develop an analytic model of memory system performance of a Local/Remote NUMA architecture based on approximate mean-value analysis techniques. The model assumes that a simple workload model based on a few parameters can often provide insight into the general behavior of real applications. The model is validated against experimental data obtained with the DUnX operating system kernel for the BBN GP1000 while running a synthetic workload. The results of this validation show that, in general, model predictions are quite good, though in some cases the model fails to include the effect of unexpected behaviors in the implementation. Experiments investigate the effectiveness of dynamic multiple-copy page placement. We investigate the cost of incorrect policy decisions by introducing different percentages of [...] © ACM 1992.

Keywords: Memory architecture

Determining update latency bounds in Galactica Net

1st International Symposium on High-Performance Distributed Computing, HPDC 1992

Authors: Clayton, S.; Duckworth, R.J.; Michalson, W.; Wilson, A. Affiliations: Department of Electrical and Computer Engineering, Worcester Polytechnic Institute, Worcester, MA 01609, United States; Center for High Performance Computing, Worcester Polytechnic Institute, Worcester, MA 01609, United States

This paper looks at the problem of ensuring the performance of real-time applications hosted on Galactica Net, a mesh-based, distributed, cache-coherent shared memory multiprocessing system. A method for determining strict upper bounds on worst-case latencies in wormhole-routed meshes is presented, and it is shown that the update latency of Galactica Net is deterministic. A tool for determining strict upper bounds on shared memory update latencies has been developed so that different real-time process placements in a Galactica Net system can be compared to minimize update latency bounds. © 1992 IEEE.

Keywords: Memory architecture

Simulation of Particle Mixing by Turbulent Convective Flows on the Connection Machine

Supercomputing '92 Conference

Authors: Malevsky, A.V.; Yuen, D.A.; Jordan, K.E. Affiliations: Army High Performance Computing and Research Center, Minnesota Supercomputer Institute, University of Minnesota, 1200 Washington Ave. S., Minneapolis, MN 55415, United States; Thinking Machines Corporation, 245 First Street, Cambridge, MA 02142-1264, United States

ISBN: (print) 0818626305

Mixing of particles by chaotic flow fields was simulated on the Connection Machine. We assigned each cell to a processor and kept the coordinates of the particles residing in that cell in the local memory of the processor. This approach requires an exchange between local memories whenever a particle moves from one cell to another. Approximately 10⁵ particles were injected into a time-dependent flow field obtained by solving the nonlinear system of PDEs describing turbulent thermal convection. The flow field was calculated on a CRAY, and the data were transferred to the CM-200 through a high-speed HIPPI channel. © 1992 IEEE.

Keywords: Flow fields

Comparison of steady-state and strongly chaotic thermal convection at high Rayleigh number

Physical Review A, 1992, Vol. 46, No. 8, p. 4742

Authors: U. Hansen, D. A. Yuen, A. V. Malevsky. Affiliation: Minnesota Supercomputer Institute, Army High Performance Computing Research Center, and Department of Geology and Geophysics, University of Minnesota, Minneapolis, Minnesota 55415

Steady-state and time-dependent two-dimensional thermal convection in a Boussinesq, infinite-Prandtl-number fluid with stress-free boundaries has been investigated. Two independent numerical methods have been employed to calculate the evolution of convective flows in a rectangular box with aspect ratio λ=1.8 over a Rayleigh-number (Ra) range of 10⁶ to 10⁹. For Ra greater than 10⁷, the flow reveals the presence of disconnected thermals, rather than connected plumes, driven by a persistent large-scale circulation. Such features have also been reported from laboratory convection experiments in the regime of hard turbulence. Extensive calculations were performed (up to 140 overturns) in order to reach the statistically stationary regime for strongly chaotic flows. A Gaussian distribution with a mean value Nuₜ was derived from the time history of the Nusselt (Nu) numbers. The value of Nuₜ can be obtained directly by solving the steady-state equations via an iteration procedure. Thus, the stationary flow obtained from the steady-state method resembles the turbulent flow in a statistical sense. Since the iteration procedure is about 10⁴ times faster than calculating the full time-dependent evolution, it allows for a systematic investigation of the heat-transfer Nu-Ra relationship and other scaling laws. The steady-state and time-dependent experiments indicate that a power-law exponent of β=0.315 holds for the Nu-Ra relation for stress-free boundaries over the entire Ra range. No indication of a jump in the exponent was found in the transition to hard turbulence.
