this work is based on our philosophy of providing interlayer system-level power awareness in computing systems. Here, we couple this approach with our vision of multi-partitioned memory systems, where memory accesses ...
In this paper, we present a parallel algorithm for Gaussian elimination: in both a shared memory environment using OpenMP, and in a distributed memory environment using MPI. Parallel LU and Gaussian algorithms for lin...
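As a point of reference for the algorithm named above, here is a minimal serial sketch of Gaussian elimination with partial pivoting. It is not the paper's OpenMP or MPI code, and the function name `gaussian_solve` is hypothetical. The comment marks the row-update loop, whose iterations are independent of one another for a fixed pivot; this is the loop that a parallel version would typically distribute across threads or processes.

```python
def gaussian_solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (serial sketch)."""
    n = len(a)
    # Work on an augmented copy [a | b] so the caller's data is left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]

    for k in range(n):
        # Partial pivoting: pick the row with the largest entry in column k.
        pivot = max(range(k, n), key=lambda r: abs(m[r][k]))
        if abs(m[pivot][k]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        m[k], m[pivot] = m[pivot], m[k]

        # For a fixed k, each row i below the pivot is updated independently.
        # This is the natural loop to split among workers in a parallel version.
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]

    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


solution = gaussian_solve([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], [8, -11, -3])
```

Pivot selection and back substitution are sequential steps in this sketch. Only the elimination of rows below the pivot offers straightforward row-level parallelism.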
Grid or mesh techniques are frequently used to approximate continuous entities that behave in a wave-like or fluid-like fashion. Partial differential equations (PDEs) are usually involved in describing such entities or processes. Distributed parallel computation was used in various computer cluster configurations to calculate PDE solutions for an electrostatic field. The study was intended to assess the efficacy of the selected architecture using mesh techniques. The match between the algorithm and the architecture in achieving maximum computational performance was also investigated. The developed architectures, algorithms, and findings are presented in the paper.
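To make the grid approach concrete, the following is a minimal serial sketch of Jacobi relaxation for Laplace's equation, which governs the electrostatic potential in a charge-free region. The abstract does not say which solver or discretization the authors used, so the method, the function name `jacobi_laplace`, and the 5x5 example grid are all assumptions chosen for illustration.

```python
def jacobi_laplace(grid, tol=1e-6, max_iters=10_000):
    """Relax interior grid points toward the mean of their four neighbours.

    grid is a 2D list of floats; its outer border holds the fixed
    (Dirichlet) boundary potentials and is never modified.
    Returns the relaxed grid and the number of sweeps performed.
    """
    rows, cols = len(grid), len(grid[0])
    current = [row[:] for row in grid]

    for sweep in range(1, max_iters + 1):
        updated = [row[:] for row in current]
        max_change = 0.0
        # Every interior update reads only the previous sweep's values, so
        # the points can be computed in any order, or concurrently.
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                updated[i][j] = 0.25 * (
                    current[i - 1][j] + current[i + 1][j]
                    + current[i][j - 1] + current[i][j + 1]
                )
                max_change = max(max_change, abs(updated[i][j] - current[i][j]))
        current = updated
        if max_change < tol:
            return current, sweep
    return current, max_iters


# Hypothetical example: top edge held at 1.0, the other three edges at 0.0.
initial = [[1.0] * 5] + [[0.0] * 5 for _ in range(4)]
potential, sweeps = jacobi_laplace(initial)
```

Because each sweep depends only on the previous one, a cluster implementation could assign a band of rows to each node and exchange only the boundary rows between neighbours after every sweep. That partitioning is a common pattern and is not taken from the paper.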
Resource management constitutes an important infrastructural component of a computational grid environment. The aim of grid resource management is to efficiently schedule applications over the available resources prov...
ISBN: 0769518591 (print)
Recently, graphics hardware architectures have begun to emphasize versatility, offering rich new ways to programmatically reconfigure the graphics pipeline. In this paper we explore whether current graphics architectures can be applied to problems where general-purpose vector processors might traditionally be used. We develop a programming framework and apply it to a variety of problems, including matrix multiplication and 3-SAT. Comparing the speed of our graphics card implementations to standard CPU implementations, we demonstrate startling performance improvements in many cases, as well as room for improvement in others. We analyze the bottlenecks and propose minor extensions to current graphics architectures which would improve their effectiveness for solving general-purpose problems. Based on our results and current trends in microarchitecture, we believe that efficient use of graphics hardware will become increasingly important to high-performance computing on commodity hardware.
ISBN: 0769515258 (print)
With speculative thread-level parallelization, codes that cannot be fully compiler-analyzed are aggressively executed in parallel. If the hardware detects a cross-thread dependence violation, it squashes offending threads and resumes execution. Unfortunately, frequent squashing cripples performance. This paper proposes a new framework of hardware mechanisms to eliminate most squashes due to data dependences in multiprocessors. The framework works by learning and predicting violations, and applying delayed disambiguation, value prediction, and stall and release. The framework is suited for directory-based multiprocessors that track memory accesses at the system level with the coarse granularity of memory lines. Simulations of a 16-processor machine show that the framework is very effective. By adding our framework to a speculative CC-NUMA with 64-byte memory lines, we speed up applications by an average of 4.3 times. Moreover, the resulting system is even 23% faster than a machine that tracks memory accesses at the fine granularity of words - a sophisticated system that is not compatible with mainstream cache coherence protocols.
Future high-end computers will offer great performance improvements over today's machines, enabling applications of far greater complexity. However, designers must solve the challenge of exploiting massive paralle...
The CoStore cluster architecture is proposed to construct a reliable and highly available storage system. A prototype CoStore was implemented and its performance was measured with the cluster being mirrored in various network environments. The preliminary results demonstrate that there is little impact on performance if the cluster is mirrored in efficient campus network environments with high bandwidth and low latency. As a result, the CoStore architecture considerably reinforces a storage system's preparedness for disaster recovery without sacrificing performance.