ISBN: 0769523129 (print)
The Common Language Infrastructure, or CLI, is a standardized virtual machine that is becoming increasingly popular on a wide range of platforms. In this paper, we develop three I/O-intensive benchmarks for the CLI using various techniques. The first benchmark is designed in accordance with an application behavioural model that reproduces the behavior of real-world I/O-intensive applications. The second benchmark is a trace-driven simulator that simulates five I/O-intensive applications. The third benchmark is an I/O-intensive micro-benchmark used to emulate a simple web server. In addition, the performance of the benchmarks is evaluated on the SSCLI. The results suggest that the CLI is a promising virtual machine for I/O-intensive computing.
As Grid architectures provide execution environments that are distributed, parallel and dynamic, applications need to be not only parallel and distributed, but also able to adapt themselves to their execution envir...
Robust systems are designed to deal with some anticipated, though probably unusual, situations such as equipment failure and overloads. They must also be designed to deal with unanticipated situations: no matter how much careful planning is carried out, unexpected events may occur. This paper describes a system, EPL 5.0, for sensing unusual situations and responding appropriately. It also describes an event processing language that helps business users specify both the conditions that require responses and the appropriate responses to these conditions. Unanticipated situations are dealt with by specifying normal conditions and defining an anomaly as a significant deviation from normality. The system also facilitates machine learning of normal and anomalous conditions from examples.
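The idea of defining an anomaly as a significant deviation from learned normal conditions can be sketched as follows. This is only an illustration under stated assumptions, not the method used by EPL 5.0: the abstract does not say how normality is modeled, so the sketch assumes a single numeric measurement, a mean and standard deviation model, and a hypothetical threshold of three standard deviations. The names fit_normal and is_anomaly and the sample values are invented for the example.

```python
from statistics import mean, stdev


def fit_normal(samples):
    """Summarize 'normal' behavior as (mean, sample standard deviation).

    Assumption: normality is modeled by these two numbers only.
    """
    return mean(samples), stdev(samples)


def is_anomaly(value, model, threshold=3.0):
    """Return True if value lies more than `threshold` standard deviations
    from the learned mean (hypothetical rule, not taken from the paper)."""
    mu, sigma = model
    if sigma == 0:
        # All training examples were identical: any other value deviates.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical examples of normal load readings, for illustration only.
normal_load = [48, 52, 50, 49, 51, 50, 47, 53]
model = fit_normal(normal_load)

for reading in (51, 90):
    label = "anomaly" if is_anomaly(reading, model) else "normal"
    print(reading, label)
```

A response would then be attached to the anomalous case only, which mirrors the split the abstract describes between specifying conditions and specifying responses.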
As high-performance computing increases in popularity and performance, the demand for similarly capable input and output systems rises. Parallel I/O takes advantage of many data server machines to provide linearly scaling performance to parallel applications that access storage over the system area network. The demands placed on the network by a parallel storage system are considerably different from those imposed by message-passing algorithms or data-center operations; moreover, many popular and varied networks are in use in modern parallel machines. These considerations lead us to develop a network abstraction layer for parallel I/O which is efficient and thread-safe, provides operations specifically required for I/O processing, and supports multiple networks. The Buffered Message Interface (BMI) has low processor overhead and minimal impact on latency, and can improve throughput for parallel file system workloads by as much as 40% compared to other, more generic network abstractions.
Optimizing the performance of dynamic load balancing toolkits and applications requires the adjustment of several runtime parameters; however, determining sufficiently good values for these parameters through repeated experimentation can be a prohibitively expensive process. We describe an analytic modeling method which allows developers to study and optimize adaptive application performance in the presence of dynamic load balancing. To aid tractability, we first derive a "bi-modal" step function which simplifies and approximates task execution behavior. This allows for the creation of an analytic modeling function which captures the dynamic behavior of adaptive and asynchronous applications, enabling accurate predictions of runtime performance. We validate our technique using synthetic micro-benchmarks and a parallel mesh generation application, and demonstrate that this technique, when used in conjunction with the PREMA runtime toolkit, can offer users significant performance improvements over several well-known load balancing tools used in practice today.
Grid applications have to cope with dynamically changing computing resources, as machines may crash or be claimed by other, higher-priority applications. In this paper, we propose a mechanism that enables fault-tolerance, malleability (e.g. the ability to cope with a dynamically changing number of processors) and migration for divide-and-conquer applications on the Grid. The novelty of our approach lies in restructuring the computation tree, which eliminates redundant computation and salvages partial results computed by processors that leave the computation. This enables applications to adapt to dynamically changing numbers of processors and to migrate the computation without loss of work. Our mechanism is easy to implement and deploy in a grid environment. The overhead it incurs is close to zero. We have implemented our mechanism in the Satin system. We have evaluated the performance of our system on the DAS-2 wide-area system and on the testbed of the European GridLab project.
We describe the design and implementation of MOCCA, a distributed CCA framework implemented using the H2O metacomputing system. Motivated by the quest for appropriate metasystem programming models for large-scale scientific applications, MOCCA combines the advantages of component orientation with the flexible and reconfigurable H2O middleware. By exploiting unique capabilities in H2O, including client-provider separation, security, and negotiable transport protocols, enhancements to both functionality and performance could be attained. The design and implementation of MOCCA highlight the natural match between CCA components and H2O pluglets, both in structure and invocation methodology. An outline of how native CCA modules can be supported in the MOCCA framework describes the potential for future deployment of legacy codes on metacomputing systems. We also report on preliminary experiences with test applications and sample performance measurements that favorably compare MOCCA to alternative component frameworks for tightly- and loosely-coupled metacomputing systems.
A novel bitstream generation algorithm and its software implementation are introduced. Although this tool was developed for the configuration of the AMDREL FPGA reconfigurable platform [13], it could be used to program any other compatible device. This tool is the only known academic implementation for FPGA configuration with such features. These include run-time, partial and dynamic reconfiguration, memory management, bitstream compression and encryption, the read-back technique, bitstream reallocation, the low-power techniques used, and the Graphical User Interface.
Power consumption is a troublesome design constraint for emergent systems such as IBM's BlueGene/L. If current trends continue, future petaflop systems will require 100 megawatts of power to maintain high performance. To address this problem, the power and energy characteristics of high-performance systems must be characterized. To date, power-performance profiles for distributed systems have been limited to interactive commercial workloads. However, scientific workloads are typically non-interactive (batched) processes riddled with interprocess dependences and communication. We present a framework for direct, automatic profiling of power consumption for non-interactive, parallel scientific applications on high-performance distributed systems. Though our approach is general, we use our framework to study the power-performance efficiency of the NAS Parallel Benchmarks on a 32-node Beowulf cluster. We provide profiles by component (CPU, memory, disk, and NIC), by node (for each of 32 nodes), and by system scale (2, 4, 8, 16, and 32 nodes). Our results indicate that power profiles are often regular, corresponding to application characteristics, and that, for a fixed problem size, increasing the number of nodes always increases energy consumption but does not always improve performance. This finding suggests smart schedulers could be used to optimize for energy while maintaining performance.
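The relationship behind the last finding can be made concrete with a small calculation: energy is average power multiplied by run time, so adding nodes raises total energy unless run time falls proportionally. The sketch below is illustrative only. The per-node power figure and the run times are invented placeholder values, not measurements from the paper, and the function name energy_joules and the assumption of equal power draw on every node are likewise hypothetical.

```python
def energy_joules(avg_power_watts_per_node, nodes, runtime_seconds):
    """Total energy, assuming every node draws the same average power."""
    return avg_power_watts_per_node * nodes * runtime_seconds


# Hypothetical run times (seconds) for one fixed problem size.
# These are NOT results from the paper; they only illustrate the trend.
ASSUMED_WATTS_PER_NODE = 30
runs = {2: 100, 4: 60, 8: 45, 16: 46}

energy = {
    n: energy_joules(ASSUMED_WATTS_PER_NODE, n, t) for n, t in runs.items()
}

for n in sorted(runs):
    print(f"{n:2d} nodes: {runs[n]:4d} s, {energy[n]:6d} J")

# A scheduler that optimizes for energy while maintaining performance could
# pick the smallest node count whose run time is within some tolerance of
# the best one (here 10%, another assumption made for this example).
best_time = min(runs.values())
choice = min(n for n, t in runs.items() if t <= 1.10 * best_time)
```

With these placeholder numbers, going from 8 to 16 nodes roughly doubles the energy without shortening the run, which is the kind of case a power-aware scheduler would avoid.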
GRAPE (Graph Processing Environment) is an industrial distributed computer vision system currently in use in Orbotech's Automated Optical Inspection (AOI) machines. These machines are designed for the automatic detection of defects in Flat Panel Displays (FPD), Printed Circuit Boards (PCB) and Ball Grid Arrays (BGA). The GRAPE system is designed to be easy to use for algorithm and systems engineers with little or no special training in parallel or distributed systems. Algorithms are written in standard C++ and joined together in a visual dataflow graph. The user then partitions the graph into "contexts", which are used by the system to automatically parallelize the computation. The underlying execution model of GRAPE is based on a large-grained dynamic dataflow paradigm. In contrast to traditional dataflow engines, GRAPE algorithms can hold "state" over multiple executions while also making use of data parallelism. This is useful for computer vision applications, which typically need to assemble and process data collected over many execution cycles. In this paper, we present an overview of the GRAPE system with its context-oriented parallelism and synchronization.
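The notion of a dataflow node that holds state over multiple executions can be illustrated with a toy example. This is a sketch under several assumptions and does not reflect GRAPE's actual API: GRAPE algorithms are written in C++ and wired together visually, whereas this example uses Python, a simple linear chain instead of a general graph, and no parallelism or contexts. The names Accumulate, threshold, count_defects and run_pipeline, and the pixel values, are all hypothetical.

```python
class Accumulate:
    """Stateful node: remembers every row it has seen across executions."""

    def __init__(self):
        self.rows = []

    def __call__(self, row):
        self.rows.append(row)
        # Emit a copy of everything assembled so far.
        return list(self.rows)


def threshold(row, level=128):
    """Stateless node: mark pixels brighter than `level` as 1, others as 0."""
    return [1 if px > level else 0 for px in row]


def count_defects(rows):
    """Stateless node: count marked pixels in all rows assembled so far."""
    return sum(sum(r) for r in rows)


def run_pipeline(nodes, item):
    """Push one data item through the chain of nodes, in order."""
    for node in nodes:
        item = node(item)
    return item


pipeline = [threshold, Accumulate(), count_defects]

# Two execution cycles, each delivering one (made-up) scan line.
scan_lines = ([10, 200, 30], [250, 240, 5])
results = [run_pipeline(pipeline, line) for line in scan_lines]
print(results)
```

The second result depends on data from the first cycle because the Accumulate node kept its rows between executions, which a purely stateless dataflow node could not do.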