ISBN: 0769521320 (print)
Predictive performance models of e-Commerce applications will allow Grid workload managers to provide e-Commerce clients with qualities of service (QoS) whilst making efficient use of resources. This paper demonstrates the use of two 'coarse-grained' modelling approaches (based on layered queuing modelling and historical performance data analysis) for predicting the performance of dynamic e-Commerce systems on heterogeneous servers. Results for a popular e-Commerce benchmark show how request response times and server throughputs can be predicted on servers with heterogeneous CPUs at different background loads. The two approaches are compared and their usefulness to Grid workload management is considered.
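As a rough illustration of the kind of coarse-grained prediction involved (this is a generic M/M/1-style sketch in Python with invented service demands, speed factors and arrival rates, not the paper's layered queuing or historical models), response times on heterogeneous CPUs under different background loads could be estimated as follows:

    # Minimal sketch: coarse-grained response-time prediction on heterogeneous CPUs.
    # All numbers are assumed; this is not the layered queuing model from the paper.

    def predict_response_time(service_demand_s, cpu_speed_factor, background_util, arrival_rate):
        """M/M/1-style estimate of mean request response time on one server."""
        demand = service_demand_s / cpu_speed_factor    # faster CPU => lower demand
        util = background_util + arrival_rate * demand  # total CPU utilisation
        if util >= 1.0:
            return float("inf")                         # server saturated
        return demand / (1.0 - util)

    for speed, load in [(1.0, 0.2), (1.5, 0.2), (1.0, 0.6)]:
        r = predict_response_time(0.05, speed, load, arrival_rate=5.0)
        print(f"speed factor {speed}, background load {load}: ~{r * 1000:.1f} ms per request")

A historical-data approach would instead interpolate measured response times over observed (CPU speed, background load) points rather than assume a queuing formula.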
ISBN: 0769521320 (print)
Distributed computing environments, such as the Grid, promise enormous raw computational power, but involve high communication overheads. It is therefore believed that they are primarily suited for "embarrassingly parallel" applications, such as Monte Carlo simulations, and for certain applications where the loosely coupled nature of the science involved in the simulations leads to a coarse-grained computation. In a typical application, however, such a coarse-grained computation is not feasible. We discuss our solution strategy, based on scalable functional decomposition, which can be used to keep the computation coarse grained even on a large number of processors. Such a decomposition can be attempted through a variety of means; we discuss the use of time parallelization to achieve this. We demonstrate results with a model problem, and then discuss its implementation for an important problem in nanomaterials simulation. We also show that this technique can be extended to make it inherently fault-tolerant.
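For illustration only, the following Python sketch shows a generic parareal-style time-parallel iteration on the model ODE dy/dt = -y; it is not the functional-decomposition scheme of the paper, but it conveys how the time domain can be split into slices whose expensive propagations run concurrently:

    # Generic parareal-style time parallelization for dy/dt = -y (illustration only;
    # not the functional decomposition described in the paper).
    import math

    def fine(y, t0, t1, steps=100):            # accurate propagator (run in parallel in practice)
        dt = (t1 - t0) / steps
        for _ in range(steps):
            y += dt * (-y)
        return y

    def coarse(y, t0, t1):                     # cheap predictor: one explicit Euler step
        return y + (t1 - t0) * (-y)

    def time_parallel(y0, T=2.0, slices=8, iters=4):
        ts = [T * n / slices for n in range(slices + 1)]
        u = [y0] + [None] * slices
        for n in range(slices):                # initial coarse sweep over the time slices
            u[n + 1] = coarse(u[n], ts[n], ts[n + 1])
        for _ in range(iters):
            f = [fine(u[n], ts[n], ts[n + 1]) for n in range(slices)]    # the parallel part
            g = [coarse(u[n], ts[n], ts[n + 1]) for n in range(slices)]
            new = [y0] + [None] * slices
            for n in range(slices):            # cheap sequential correction sweep
                new[n + 1] = coarse(new[n], ts[n], ts[n + 1]) + f[n] - g[n]
            u = new
        return u[-1]

    print(time_parallel(1.0), "vs exact", math.exp(-2.0))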
ISBN: 0769521320 (print)
Since the introduction of the Java language less than a decade ago, there have been several attempts to create a runtime system for distributed execution of multithreaded Java applications. The goal of these attempts was to gain increased computational power while preserving Java's convenient parallel programming paradigm. This paper gives a detailed overview of the existing distributed runtime systems for Java and presents a new approach, implemented in a system called JavaSplit. Unlike previous works, which either forfeit Java's portability or introduce unconventional programming constructs, JavaSplit is able to execute standard multithreaded Java while preserving portability. JavaSplit works by rewriting the bytecodes of a given parallel application, transforming it into a distributed application that incorporates all the runtime logic. Each runtime node carries out its part of the resulting distributed computation using nothing but its local standard (unmodified) Java Virtual Machine (JVM).
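The following Python fragment is only a conceptual analogue of this idea (JavaSplit itself rewrites JVM bytecodes, which is not attempted here): the application keeps its ordinary thread-style API, while an interposed stand-in class ships the work to other workers that run unmodified runtimes.

    # Conceptual analogue only: a rewriting step could substitute a class like this
    # for local thread creation, leaving the application code itself untouched.
    from concurrent.futures import ProcessPoolExecutor

    class DistributedThread:
        """Drop-in stand-in used where the original code spawned a local thread."""
        def __init__(self, pool, target, args=()):
            self._pool, self._target, self._args = pool, target, args
            self._future = None
        def start(self):
            self._future = self._pool.submit(self._target, *self._args)
        def join(self):
            return self._future.result()

    def square(x):                  # unmodified "application" code
        return x * x

    if __name__ == "__main__":
        with ProcessPoolExecutor(max_workers=4) as pool:   # stands in for remote nodes
            threads = [DistributedThread(pool, square, (i,)) for i in range(4)]
            for t in threads:
                t.start()
            print([t.join() for t in threads])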
ISBN: 0769521320 (print)
A superscalar microprocessor with a variable number of execution units, which are dynamically configured during program execution, has been modeled. The runtime behaviour of an executed application is determined using a Trace Cache, and the most suitable hardware configuration is loaded dynamically. This paper discusses major design aspects of the ongoing implementation process based on a partial reconfiguration design flow. Thus, some microarchitectural components are put together to form a fixed module, while different sets of execution units build up reconfigurable ones. The communication between fixed and reconfigurable modules is assured by Xilinx Bus Macros.
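As a software-level caricature of the decision being made (the actual design is an FPGA partial-reconfiguration flow, and the configurations below are invented), one could choose among precomputed sets of execution units by matching the instruction mix observed in a Trace Cache window:

    # Illustrative sketch only: select an execution-unit configuration from a trace window.
    from collections import Counter

    CONFIGS = {                                    # hypothetical reconfigurable-module variants
        "int_heavy": {"int": 4, "fp": 1, "mem": 1},
        "fp_heavy":  {"int": 2, "fp": 3, "mem": 1},
        "balanced":  {"int": 2, "fp": 2, "mem": 2},
    }

    def choose_config(trace_window):
        """Pick the configuration whose unit mix best matches the instruction mix."""
        mix = Counter(trace_window)
        total = sum(mix.values()) or 1
        def mismatch(units):
            unit_total = sum(units.values())
            return sum(abs(mix[t] / total - units[t] / unit_total)
                       for t in ("int", "fp", "mem"))
        return min(CONFIGS, key=lambda name: mismatch(CONFIGS[name]))

    print(choose_config(["int"] * 12 + ["mem"] * 3 + ["fp"]))  # -> int_heavy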
ISBN: 0769521320 (print)
Dynamic structured adaptive mesh refinement (SAMR) techniques, along with the emergence of the computational Grid, offer the potential for realistic scientific and engineering simulations of complex physical phenomena. However, the inherent dynamic nature of SAMR applications, coupled with the heterogeneity and dynamism of the underlying Grid environment, presents significant research challenges. This paper presents proactive runtime partitioning strategies based on performance prediction functions that are experimentally formulated in terms of system parameters such as CPU load and available memory. These proactive partitioning strategies form a part of the GridARM autonomic framework, which enables self-managing, self-adapting, and self-optimizing SAMR applications on the Grid. Experimental evaluation of the proactive schemes using the 3-D Richtmyer-Meshkov compressible fluid dynamics kernel for different system configurations and workloads demonstrates the improvement in overall runtime performance.
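A minimal sketch of the idea, assuming a toy prediction function with invented coefficients (GridARM formulates its prediction functions experimentally), might distribute grid points in proportion to each node's predicted capacity:

    # Toy load-sensitive partitioning sketch; the weights are assumed, not measured.

    def predicted_capacity(cpu_load, free_mem_mb, a=1.0, b=0.001):
        """Toy performance-prediction function in terms of system parameters."""
        return max(a * (1.0 - cpu_load) + b * free_mem_mb, 0.01)

    def partition(total_grid_points, nodes):
        """Assign grid points to nodes in proportion to predicted capacity."""
        caps = [predicted_capacity(n["cpu_load"], n["free_mem_mb"]) for n in nodes]
        total_cap = sum(caps)
        return [round(total_grid_points * c / total_cap) for c in caps]

    nodes = [{"cpu_load": 0.1, "free_mem_mb": 900},
             {"cpu_load": 0.7, "free_mem_mb": 400},
             {"cpu_load": 0.4, "free_mem_mb": 600}]
    print(partition(100_000, nodes))   # busier, memory-poor nodes receive fewer points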
ISBN: 0769521320 (print)
Performance prediction is necessary and crucial in order to deal with multi-dimensional performance effects on parallel systems. The increasing use of parallel supercomputers and cluster systems to solve large-scale scientific problems has generated a need for tools that can predict scalability trends of applications written for these machines. In this paper, we describe a compiler tool that automates the prediction of parallel program execution times by means of closed-form runtime formulas. For an arbitrary parallel MPI source program, the tool generates a corresponding runtime function modeling the CPU execution time and the message passing overhead. The environment is proposed to support the development process and the performance engineering activities that accompany the whole software life cycle. The performance prediction tool is shown to be effective in analyzing a representative application for varying problem sizes on several platforms using different numbers of processors.
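A closed-form runtime function of the kind such a tool might emit could look like the sketch below; the O(n^2) compute term, the log(p) communication term and all coefficients are invented for illustration and are not output of the described tool:

    # Illustrative closed-form runtime formula: compute time plus message passing overhead.
    import math

    def predicted_runtime(n, p, t_flop=2e-9, t_startup=5e-5, t_byte=1e-8):
        """CPU time for an O(n^2)/p computation plus a log(p)-stage reduction overhead."""
        compute = (n * n / p) * t_flop
        communicate = math.ceil(math.log2(p)) * (t_startup + 8 * n * t_byte) if p > 1 else 0.0
        return compute + communicate

    for p in (1, 4, 16, 64):                      # scalability trend over processor counts
        print(f"p={p:3d}  predicted runtime = {predicted_runtime(20000, p):.3f} s")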
ISBN: 0769521320 (print)
Reducing the effect of hot spots is increasingly important to gain performance out of modern processor clusters. Traditionally, compiler techniques have been used for static analysis of hot spot patterns in parallel applications. The operating system then performs the optimization to reduce the overhead of hot spots. However, hot spots cannot be avoided due to the dynamic nature of applications. We propose a new hot spot optimization scheme based on a broadcast-based optical interconnection network, the SOME-Bus, in which each node has a dedicated broadcast channel to connect with other nodes without any contention. The scheme introduces additional hardware to considerably reduce the latency of hot spot requests and acknowledgements. Hot spots are assumed to be identifiable either through static analysis or by a run-time profiler. Our scheme then provides a way to cache these hot spot blocks much closer to the network channel, thereby providing a very low latency path between the input and the output queues in the network. The technique has been implemented in a SOME-Bus simulator and verified with popular parallel algorithms such as matrix-matrix multiplication. Preliminary results show that the scheme reduces application completion times by up to 24% over a system without channel caching.
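A back-of-the-envelope sketch of why such channel caching pays off is given below; the latencies, hot-spot fraction and hit ratio are assumed values, not measurements from the SOME-Bus study:

    # Rough latency model: requests to hot-spot blocks may be served from a cache
    # close to the channel instead of traversing the full memory path.

    def avg_request_latency(hot_fraction, cache_hit_ratio, t_channel_cache, t_memory_path):
        """Mean latency when a fraction of requests target hot-spot blocks."""
        hot = hot_fraction * (cache_hit_ratio * t_channel_cache +
                              (1 - cache_hit_ratio) * t_memory_path)
        cold = (1 - hot_fraction) * t_memory_path
        return hot + cold

    base = avg_request_latency(0.25, 0.0, 50, 500)   # no channel caching
    opt = avg_request_latency(0.25, 0.8, 50, 500)    # most hot requests hit the channel cache
    print(f"latency reduction with channel caching: {100 * (base - opt) / base:.1f}%")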
ISBN: 0769521320 (print)
The complexity of real-time distributed applications is steadily increasing. A well-known technique used to manage such complexity consists in decomposing the whole system into different quasi-independent distributed subsystems. Inter-subsystem communication, when necessary, is performed via gateway nodes that filter incoming and outgoing traffic. For real-time systems, this architecture poses additional design challenges, since it becomes necessary to consider both intra- and inter-network message exchanges with real-time constraints. In the work carried out so far, the FTT communication paradigm has been provided with tools for supporting flexible real-time communication on isolated networks. This work presents a first approach to incorporating multi-segment support into the FTT protocol family. In particular, two approaches are presented, analyzed and compared, which allow breaking end-to-end deadlines into parameters that are local to each one of the interconnected networks.
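One simple way to derive such local parameters, shown here purely as an illustration and not as the analysis carried out for the FTT protocols, is to split the end-to-end deadline in proportion to each segment's worst-case transit time:

    # Illustrative deadline decomposition across interconnected network segments.

    def split_deadline(end_to_end_deadline_ms, segment_wc_transit_ms):
        """Give each segment a local deadline proportional to its worst-case cost."""
        total = sum(segment_wc_transit_ms)
        assert total <= end_to_end_deadline_ms, "end-to-end deadline infeasible"
        return [end_to_end_deadline_ms * c / total for c in segment_wc_transit_ms]

    # Message crosses the source segment, a gateway, and the destination segment.
    print(split_deadline(20.0, [2.0, 1.0, 5.0]))   # -> [5.0, 2.5, 12.5]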
ISBN: 0769521320 (print)
This paper builds upon our previous work, in which we proposed an architecture and a general optimization framework for dynamic, distributed real-time systems. Interesting features of this model include the consideration of adaptive applications and utility functions. We extend our earlier work by formalizing the corresponding multi-criteria optimization problem. As the most difficult part of this problem, we identified the evaluation and comparison of the quality of single allocations and of sets of allocations, respectively. To this end, we propose and examine metrics for measuring the goodness of solutions within our general resource management framework. These metrics lay the basis for further work on developing both online and offline algorithms to tackle the general optimization problem and provide an efficient adaptive resource manager for dynamic, distributed real-time systems.
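For a flavour of what such metrics can look like (the utility functions and numbers below are invented and are not the metrics proposed in the paper), a single allocation can be scored by the utility it accrues, and a set of candidate allocations by its best and average members:

    # Hedged sketch: utility-based goodness of allocations and of allocation sets.

    def allocation_utility(allocation, utility_fns):
        """allocation maps app -> delivered quality level; utility_fns map levels to value."""
        return sum(utility_fns[app](level) for app, level in allocation.items())

    def set_goodness(allocations, utility_fns):
        scores = [allocation_utility(a, utility_fns) for a in allocations]
        return {"best": max(scores), "average": sum(scores) / len(scores)}

    utility_fns = {"tracker": lambda q: min(q, 5) * 2.0,   # saturating utility
                   "display": lambda q: q * 0.5}
    candidates = [{"tracker": 5, "display": 2}, {"tracker": 3, "display": 8}]
    print(set_goodness(candidates, utility_fns))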
ISBN: 0769521320 (print)
Many distributed applications in the real world now require real-time services in which aggregate queries need to be computed over a set of values. These applications can often tolerate varying degrees of inaccuracy in the results. System designers, on the other hand, would like to provide services with low inaccuracy and minimal management overhead. In this paper, we focus on addressing the tradeoffs between timeliness, accuracy and cost for data aggregation in distributed environments. Specifically, we address the problem of time-sensitive computation of aggregate queries (count, sum and min) over a set of values represented by intervals with lower and upper bounds. These intervals are approximations based on the most recent values reported by distributed sources. In order to meet the precision constraints given by users, a subset of sources needs to be probed for exact values. We first propose algorithms for batch selection of the probing set, where the selection is done before probing, without knowledge of the actual values. In addition, we propose an iterative selection approach in which the selection of the next probing source depends on the previously returned value.
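One plausible greedy batch-selection rule for the sum aggregate, given only as an illustration of the problem setting rather than the algorithms proposed in the paper: since the uncertainty of the sum is the total interval width, probe the widest intervals first until the residual width meets the user's precision bound.

    # Greedy batch selection for SUM over interval approximations (illustrative only).

    def select_probes_for_sum(intervals, precision_bound):
        """intervals: list of (low, high); returns indices of sources to probe."""
        widths = sorted(((hi - lo, i) for i, (lo, hi) in enumerate(intervals)), reverse=True)
        remaining = sum(w for w, _ in widths)
        probes = []
        for width, idx in widths:
            if remaining <= precision_bound:
                break
            probes.append(idx)        # probing a source collapses its interval to a point
            remaining -= width
        return probes

    intervals = [(10, 14), (3, 3.5), (20, 29), (7, 8)]
    print(select_probes_for_sum(intervals, precision_bound=2.0))   # -> [2, 0]

An iterative variant would instead re-rank the remaining sources after each probe returns its exact value.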