ISBN: (print) 0769523129
Achieving good scalability for large simulations based on structured adaptive mesh refinement is non-trivial. Performance is limited by the partitioner's ability to efficiently use the underlying parallel computer's resources. Domain-based partitioners serve as a foundation for techniques designed to improve scalability, and they have traditionally been designed on the assumption that the computational flow among grid patches at different refinement levels is independent. This assumption does not hold in practice, so the effectiveness of these techniques is significantly impaired. This paper introduces a partitioning method that does not rely on this independence assumption. The method is tested on four applications exhibiting different behaviors. The results show that synchronization costs can on average be reduced by 75 percent. The conclusion is that the method is suitable as a foundation for general hierarchical methods designed to improve the scalability of structured adaptive mesh refinement applications.
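As an illustration of the underlying idea (not the paper's actual algorithm), the following Python sketch partitions a hypothetical one-dimensional SAMR hierarchy so that fine patches stay on the same processor as the coarse cells they refine, which is the kind of co-location that cuts inter-level synchronization; the workload model and names are invented.

# Illustrative sketch only: a domain-based partition of a 1-D SAMR hierarchy
# that keeps fine patches on the same processor as the coarse cells they
# refine, so coarse/fine data exchange stays local. The workload model and
# refinement factor are hypothetical, not taken from the paper.

def partition(num_coarse_cells, fine_patches, num_procs):
    """fine_patches: list of (start, end) coarse-cell intervals refined by a patch."""
    # Per-coarse-cell workload: 1 unit for the coarse cell itself plus
    # extra units for every fine patch covering it (assume refinement factor 4).
    work = [1.0] * num_coarse_cells
    for start, end in fine_patches:
        for c in range(start, end):
            work[c] += 4.0

    # Cut the coarse domain into contiguous blocks of roughly equal total work.
    target = sum(work) / num_procs
    owner = [0] * num_coarse_cells
    proc, acc = 0, 0.0
    for c in range(num_coarse_cells):
        if acc >= target and proc < num_procs - 1:
            proc, acc = proc + 1, 0.0
        owner[c] = proc
        acc += work[c]

    # A fine patch follows the owner of the coarse region it overlaps most,
    # so prolongation/restriction between levels needs no communication.
    patch_owner = []
    for start, end in fine_patches:
        counts = {}
        for c in range(start, end):
            counts[owner[c]] = counts.get(owner[c], 0) + 1
        patch_owner.append(max(counts, key=counts.get))
    return owner, patch_owner


if __name__ == "__main__":
    cell_owner, patch_owner = partition(32, [(4, 10), (10, 14), (20, 28)], 4)
    print("coarse cell owners:", cell_owner)
    print("fine patch owners: ", patch_owner)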
One of the attractive features of Grid computing is that resources in geographically distant places can be mobilized to meet computational needs as they arise. A particularly challenging issue is that of executing a s...
ISBN: (print) 0769523129
ASC (Associative Computing Model) and MASC (Multiple Associative Computing Model) have long been studied in the Department of Computer Science at Kent State University. While the previous studies provide the background and the basic definition of the model, the description of the interactions between the instruction streams (ISs) is brief, high-level, and incomplete. One change here is that we specify the interaction between ISs and assume that all of the ISs operate on the same clock in order to support predictable worst-case computation times, whereas earlier the ISs were assumed to interact in a MIMD fashion. This paper provides a detailed explanation of how these interactions can be supported in the case where only a few ISs are supported.
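To make the synchronous flavor concrete, here is a toy lock-step simulation in Python in which several instruction streams each broadcast one operation to their partition of cells on every shared clock tick; the Cell/InstructionStream classes and operations are hypothetical illustrations, not the MASC definition.

# Toy lock-step model of multiple instruction streams (ISs) sharing one clock,
# purely to illustrate the synchronous style described in the abstract.

class Cell:
    def __init__(self, value):
        self.value = value
        self.responder = False   # set by an associative search

class InstructionStream:
    def __init__(self, program):
        self.program = program   # list of (op, operand) pairs

    def step(self, cells, tick):
        if tick >= len(self.program):
            return
        op, arg = self.program[tick]
        if op == "search":          # associative search: mark matching cells
            for c in cells:
                c.responder = (c.value == arg)
        elif op == "add":           # apply only to current responders
            for c in cells:
                if c.responder:
                    c.value += arg

def run(iss, partitions, ticks):
    # All ISs advance on the same clock tick, so the worst-case time per tick
    # is bounded by the slowest broadcast, keeping execution time predictable.
    for tick in range(ticks):
        for stream, cells in zip(iss, partitions):
            stream.step(cells, tick)

if __name__ == "__main__":
    cells = [Cell(v) for v in (1, 2, 2, 3, 1, 2)]
    is0 = InstructionStream([("search", 2), ("add", 10)])
    is1 = InstructionStream([("search", 1), ("add", 100)])
    run([is0, is1], [cells[:3], cells[3:]], ticks=2)
    print([c.value for c in cells])   # -> [1, 12, 12, 3, 101, 2]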
ISBN: (print) 1424403073
Most parallel computing resources are controlled by batch schedulers that place requests for computation in a queue until access to compute nodes is granted. Queue waiting times are notoriously hard to predict, making it difficult for users not only to estimate when their applications may start, but also to pick, among multiple batch-scheduled resources, the one that will produce the shortest turnaround time. As a result, an increasing number of users resort to "redundant requests": several requests are simultaneously submitted to multiple batch schedulers on behalf of a single job; once one of these requests is granted access to compute nodes, the others are canceled. Using simulation as well as experiments with a production batch scheduler, we investigate whether redundant requests are harmful in terms of (i) schedule performance and fairness, (ii) system load, and (iii) system predictability. We find that the two main issues with redundant requests are load on the middleware and unfairness towards users who do not use redundant requests, both of which depend on the number of users who submit redundant requests and on the amount of request redundancy they employ.
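The client-side pattern being studied can be sketched as follows: submit the same job to several (simulated) queues and cancel the rest once one starts. The scheduler model below is a stand-in written for illustration, not a real batch-system API.

# Client-side sketch of the "redundant request" pattern: submit the same job
# to several simulated batch queues and cancel the remaining requests as soon
# as one is granted.

import heapq
import random

def simulate_queue_wait(seed):
    """Pretend queue: returns a random wait time in minutes."""
    return random.Random(seed).expovariate(1 / 60.0)

def redundant_submit(job_name, schedulers):
    # Submit one request per scheduler and remember the simulated start time.
    pending = []
    for name in schedulers:
        wait = simulate_queue_wait(hash((job_name, name)) & 0xffff)
        heapq.heappush(pending, (wait, name))

    # The first request to start "wins"; the others are cancelled, which is
    # exactly the extra middleware load and cancellation traffic the paper
    # measures.
    wait, winner = heapq.heappop(pending)
    cancelled = [name for _, name in pending]
    return winner, wait, cancelled

if __name__ == "__main__":
    winner, wait, cancelled = redundant_submit(
        "cfd_run_17", ["clusterA", "clusterB", "clusterC"])
    print(f"started on {winner} after {wait:.1f} min; cancelled: {cancelled}")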
ISBN: (print) 0769523129
We introduce a set of techniques to both measure and optimize the memory access locality of Java applications running on cc-NUMA servers. These techniques work at the object level and use information gathered from embedded hardware performance monitors. We propose a new NUMA-aware Java heap layout. In addition, we propose using dynamic object migration during garbage collection to move objects local to the processors that access them most. Our optimization technique reduced the number of non-local memory accesses in Java workloads generated from actual runs of the SPECjbb2000 benchmark by up to 41%, and also resulted in a 40% reduction in workload execution time.
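A toy model of the migration decision, assuming per-node access counts per object are available (in a real system they would come from the hardware performance monitors): pick the node that touches an object most and plan to move it there at the next collection. The data structures below are hypothetical, not the paper's JVM internals.

# Toy model of the object-migration decision: given per-node access counts for
# each object, migrate an object to the node that clearly dominates its
# accesses, unless it already lives there.

from collections import Counter

def migration_plan(object_home, access_samples, threshold=0.6):
    """
    object_home: {obj_id: current NUMA node}
    access_samples: iterable of (obj_id, accessing_node)
    Returns {obj_id: new_node} for objects worth moving.
    """
    counts = {}
    for obj, node in access_samples:
        counts.setdefault(obj, Counter())[node] += 1

    plan = {}
    for obj, per_node in counts.items():
        best_node, best = per_node.most_common(1)[0]
        total = sum(per_node.values())
        # Only migrate when one node clearly dominates and the object
        # does not already live there.
        if best / total >= threshold and best_node != object_home.get(obj):
            plan[obj] = best_node
    return plan

if __name__ == "__main__":
    homes = {"order#1": 0, "cache#7": 1}
    samples = [("order#1", 1)] * 8 + [("order#1", 0)] * 2 + [("cache#7", 1)] * 5
    print(migration_plan(homes, samples))   # -> {'order#1': 1}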
ISBN: (print) 0769523129
Most distributed garbage collection (DGC) algorithms are not complete, as they fail to reclaim distributed cycles of garbage. Those that achieve such completeness are very costly, as they require either some kind of synchronization or consensus between processes. Others use mechanisms such as backtracking, global counters, a central server, or distributed tracing phases, and/or impose additional load and restrictions on local garbage collection. All these approaches hinder scalability and/or performance significantly. We propose a solution to this problem: a DGC algorithm capable of reclaiming distributed cycles of garbage asynchronously and efficiently. Our algorithm does not require any particular coordination between processes and it tolerates message loss. We have implemented the algorithm both on Rotor (a free source version of .NET) and on OBIWAN (a platform supporting mobile agents, object replication, and remote invocation); we observed that applications are not disrupted.
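The following tiny reachability model only illustrates the problem the paper solves, not its algorithm: a two-object cycle spread over two processes is globally unreachable, yet each local collector must treat the incoming remote reference as a root and therefore keeps its half alive. Process and object names are invented.

# Tiny reachability model showing why distributed cycles of garbage escape
# purely local collection.

def reachable(roots, edges):
    """Objects reachable from `roots` following `edges` {obj: [obj, ...]}."""
    seen, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(edges.get(obj, []))
    return seen

if __name__ == "__main__":
    # Two processes, each holding one object of a two-object cycle.
    edges = {
        "P1.root": ["P1.x"],        # live local data on process 1
        "P1.a": ["P2.b"],           # cycle: a -> b -> a, spread over P1 and P2
        "P2.b": ["P1.a"],
    }
    roots = ["P1.root", "P2.root"]

    globally_live = reachable(roots, edges)
    print("garbage:", {"P1.a", "P2.b"} - globally_live)   # the whole cycle

    # A local collector on P1 must treat the incoming remote reference to
    # P1.a (from P2.b) as a root, so P1 alone can never reclaim the cycle.
    p1_live = reachable(["P1.root", "P1.a"], edges)
    print("P1.a kept alive locally:", "P1.a" in p1_live)  # True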
ISBN: (print) 0769523129
This paper proposes a novel common subgraph extraction algorithm which aims to minimize the total number of gates (reconfiguration area overhead) involved in implementing compute-intensive scientific and media applications using reconfigurable architectures. The motivation behind the proposed research is illustrated using an example from the Biochemical Algorithms Library (BALL). The design of novel context-adaptable architectures to implement common subgraphs is also proposed, with an example from the video warping functions of the MPEG-4 standard. Three different models of mapping such architectures onto hybrid/pure FPGA systems are proposed. Estimates obtained by applying these techniques and architectures to various media and scientific functions are presented.
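A simplified flavor of common subgraph extraction can be sketched as follows: operator-to-operator edges that appear in two dataflow graphs are candidates for a shared datapath that needs to be instantiated only once. The graph encoding and examples are hypothetical; the paper's actual algorithm and cost model are more involved.

# Simplified flavour of common-subgraph extraction between two dataflow
# graphs: shared (producer op, consumer op) edges are candidates for reuse.

def op_edges(dfg):
    """dfg: {node: (operator, [predecessor nodes])} -> multiset of op edges."""
    edges = {}
    for node, (op, preds) in dfg.items():
        for p in preds:
            key = (dfg[p][0], op)            # (producer op, consumer op)
            edges[key] = edges.get(key, 0) + 1
    return edges

def common_subgraph_edges(dfg_a, dfg_b):
    ea, eb = op_edges(dfg_a), op_edges(dfg_b)
    # Each shared (producer, consumer) pair can be implemented once and
    # reused, which is the gate-count saving the paper is after.
    return {k: min(ea[k], eb[k]) for k in ea if k in eb}

if __name__ == "__main__":
    # y = (a*b) + c ;  z = (a*b) + (c*d)
    g1 = {"a": ("in", []), "b": ("in", []), "c": ("in", []),
          "m": ("mul", ["a", "b"]), "y": ("add", ["m", "c"])}
    g2 = {"a": ("in", []), "b": ("in", []), "c": ("in", []), "d": ("in", []),
          "m1": ("mul", ["a", "b"]), "m2": ("mul", ["c", "d"]),
          "z": ("add", ["m1", "m2"])}
    print(common_subgraph_edges(g1, g2))   # {('in','mul'): 2, ('mul','add'): 1}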
ISBN: (print) 0769523129
Next Generation Grid applications will demand that Grid middleware provide a flexible negotiation mechanism supporting various forms of Quality-of-Service (QoS) guarantees. In this context, a QoS guarantee covers simultaneous allocations of various kinds of resources, such as processor runtime, storage capacity, or network bandwidth, which are specified in the form of Service Level Agreements (SLAs). Currently, a gap exists between the capabilities of Grid middleware and the underlying resource management systems concerning their support for QoS and SLA negotiation. In this paper, we present an approach which closes this gap. Introducing the architecture of the Virtual Resource Manager, we highlight its main QoS management features, such as run-time responsibility, co-allocation, and fault tolerance.
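The co-allocation idea behind such SLA negotiation can be sketched as an all-or-nothing check across several resource managers for the same time window. The classes, capacities, and numbers below are hypothetical illustrations, not the Virtual Resource Manager's actual interface.

# Sketch of SLA co-allocation: an SLA bundles requirements on several resource
# types and is admitted only if every manager can reserve its share for the
# same time window.

from dataclasses import dataclass

@dataclass
class SLARequest:
    cpus: int                 # processor slots
    storage_gb: int           # scratch storage
    bandwidth_mbps: int       # network bandwidth
    start: int                # start of window (hour)
    end: int                  # end of window (hour)

class ResourceManager:
    def __init__(self, capacity):
        self.capacity = capacity
        self.reserved = {}    # hour -> amount already promised

    def can_reserve(self, amount, start, end):
        return all(self.reserved.get(h, 0) + amount <= self.capacity
                   for h in range(start, end))

    def reserve(self, amount, start, end):
        for h in range(start, end):
            self.reserved[h] = self.reserved.get(h, 0) + amount

def negotiate(sla, cpu_rm, disk_rm, net_rm):
    demands = [(cpu_rm, sla.cpus), (disk_rm, sla.storage_gb),
               (net_rm, sla.bandwidth_mbps)]
    # All-or-nothing: only commit when every manager can honour the window.
    if all(rm.can_reserve(amount, sla.start, sla.end) for rm, amount in demands):
        for rm, amount in demands:
            rm.reserve(amount, sla.start, sla.end)
        return True
    return False

if __name__ == "__main__":
    cpu, disk, net = ResourceManager(64), ResourceManager(500), ResourceManager(1000)
    print(negotiate(SLARequest(32, 200, 400, start=8, end=12), cpu, disk, net))   # True
    print(negotiate(SLARequest(48, 100, 300, start=10, end=14), cpu, disk, net))  # False: CPUs over-committed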
ISBN: (print) 0769523129
In this paper, we describe a prototype software framework that implements a formalized methodology for partitioning computationally intensive applications between reconfigurable hardware blocks of different granularity. A generic hybrid-granularity reconfigurable architecture is considered, so that the methodology is applicable to a large variety of hybrid reconfigurable architectures. Although the proposed framework is parametric with respect to the mapping procedures for the fine- and coarse-grain reconfigurable units, we provide mapping algorithms for both types of hardware. The experimental results show the effectiveness of the functionality partitioning framework. We have validated the framework using two real-world applications, an OFDM transmitter and a JPEG encoder. For the OFDM transmitter, a maximum clock-cycle reduction of 82% relative to an all-fine-grain mapping solution is achieved; the performance improvement for the JPEG encoder is 44%.
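The kind of decision such a partitioner makes can be sketched as a cost-model comparison: each kernel goes to the fabric with the lower estimated cycle count, subject to a coarse-grain area budget. The cost figures and kernel names below are invented for illustration; the paper's cost model is more detailed.

# Sketch of fine/coarse-grain kernel partitioning driven by a simple cycle
# estimate and a coarse-grain area budget.

def partition_kernels(kernels, coarse_area_budget):
    """
    kernels: list of dicts with estimated cycles on each fabric and the
    coarse-grain area (number of ALUs) a kernel would occupy.
    """
    # Consider first the kernels that gain the most from coarse-grain mapping.
    by_gain = sorted(kernels,
                     key=lambda k: k["fine_cycles"] - k["coarse_cycles"],
                     reverse=True)
    mapping, used = {}, 0
    for k in by_gain:
        wins = k["coarse_cycles"] < k["fine_cycles"]
        fits = used + k["coarse_area"] <= coarse_area_budget
        if wins and fits:
            mapping[k["name"]] = "coarse"
            used += k["coarse_area"]
        else:
            mapping[k["name"]] = "fine"
    return mapping

if __name__ == "__main__":
    kernels = [
        {"name": "fft_butterfly", "fine_cycles": 900, "coarse_cycles": 200, "coarse_area": 8},
        {"name": "bit_interleave", "fine_cycles": 120, "coarse_cycles": 300, "coarse_area": 4},
        {"name": "dct_8x8", "fine_cycles": 700, "coarse_cycles": 250, "coarse_area": 10},
    ]
    print(partition_kernels(kernels, coarse_area_budget=16))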
ISBN: (print) 0769523129
The next generation of mobile systems with multimedia processing capabilities and wireless connectivity will be increasingly deployed in highly dynamic and distributed environments for multimedia playback and delivery (e.g. video streaming, multimedia conferencing). The challenge is to meet the heavy resource demands of multimedia applications under the stringent energy, computational, and bandwidth constraints of mobile systems, while constantly adapting to global state changes in the distributed environment. In this paper, we present our initiatives under the FORGE framework to address the issue of delivering high-quality multimedia content in mobile environments. In order to cope with the resource-intensive nature of multimedia applications and the dynamically changing global state (e.g. node mobility, network congestion), an end-to-end approach to QoS-aware power optimization is required. We present a framework for coordinating energy-optimizing strategies across the various layers of system implementation and functionality, and discuss techniques that can be employed to achieve energy gains for mobile multimedia systems.
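One way to picture cross-layer coordination is a simple adaptation policy that picks a video quality level from both an energy budget and the currently measured bandwidth. The quality levels, cost numbers, and function below are invented for illustration; this is not the FORGE framework's actual policy.

# Minimal sketch of a cross-layer adaptation policy: choose the highest video
# quality whose decode power and bitrate fit the remaining battery budget and
# the available bandwidth.

QUALITY_LEVELS = [
    # (name, bitrate kbit/s, decode power mW)
    ("high",   1500, 900),
    ("medium",  700, 600),
    ("low",     300, 400),
]

def choose_quality(bandwidth_kbps, battery_mwh, playback_minutes):
    # Power the battery can sustain over the intended playback duration.
    energy_budget_mw = battery_mwh * 60.0 / playback_minutes
    for name, bitrate, power in QUALITY_LEVELS:    # highest quality first
        if bitrate <= bandwidth_kbps and power <= energy_budget_mw:
            return name
    return "low"   # degrade gracefully rather than stop playback

if __name__ == "__main__":
    # Plenty of bandwidth but a nearly empty battery forces a lower level.
    print(choose_quality(bandwidth_kbps=2000, battery_mwh=300, playback_minutes=45))
    # A healthy battery but a congested link limits quality to what the network carries.
    print(choose_quality(bandwidth_kbps=800, battery_mwh=3000, playback_minutes=45))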