ISBN: 0769523129 (print)
UPC is a parallel programming language based on the concept of partitioned shared memory. There are now several UPC compilers available and several different parallel architectures that support one or more of these compilers. This paper is the first to compare the performance of most of the currently available UPC implementations on several commonly used parallel platforms. These compilers are the GASNet UPC compiler from UC Berkeley, the v1.1 MuPC compiler from Michigan Tech, the Hewlett-Packard v2.2 compiler, and the Intrepid UPC compiler. The parallel architectures used in this study are a 16-node x86 Myrinet cluster, a 32-processor AlphaServer SC-40, and a 48-processor Cray T3E. A STREAM-like microbenchmark was developed to measure fine- and coarse-grained shared memory accesses. Also measured are five NPB kernels using existing UPC implementations. These measurements and associated observations provide a snapshot of the relative performance of current UPC platforms.
ISBN: 0769525091 (print)
Wireless ad-hoc sensor networks are composed of a large number of tiny sensors that have limited resources and yet must form a connected network. A group of sensors is said to cover a certain region when the union of the sensing disks of these sensors completely covers this region. Given a query over a sensor network, the minimum connected sensor cover problem is to select a minimum, or nearly minimum, set of sensors such that the selected sensors cover the query region and form a connected network. One proposed solution for the minimum connected sensor cover problem is to first form a minimum connected dominating set (MCDS), and then to include all sensors in the MCDS in the final cover set. In this paper, we use this approach to present a fully distributed, strictly localized, scalable, self-* solution to the minimum connected sensor cover problem.
We introduce a new clustering algorithm called WINP for very large databases. Two different sizes of handling objects were used in WINP to acquire high accuracy and efficiency. WINP creates a window to detect approxim...
ISBN: 0769523129 (print)
When distributing data across several nodes, two different approaches exist. The first distributes the data object itself, e.g. by striping. The second aggregates local storage, whereby each data object is assigned to a home storage node. From the viewpoint of fault-tolerant data layouts, these schemes seem to be similar. In both cases the addition of parity, e.g. RAID level 3, level 5 or Reed-Solomon codes, provides tolerance against node failures. A closer look shows differences in reachable access rates, the number of messages needed, and recovery cost. In this paper we compare both approaches and provide a method for self-reconfiguration. The transformation from a parity grouping layout to a striping layout is shown to be feasible for stepwise and concurrent operation during data access.
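The parity idea in the abstract above can be illustrated with a minimal sketch. It assumes single XOR parity over equal-sized blocks, one block per node, which is the simplest of the schemes mentioned; the function names `xor_blocks` and `recover` and the sample data are hypothetical and not taken from the paper.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def recover(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and the parity block."""
    return xor_blocks(list(surviving_blocks) + [parity])

# Illustrative data: one block per storage node, parity kept on an extra node.
data = [b"node", b"fail", b"safe"]
parity = xor_blocks(data)

lost = 1  # pretend the node holding data[1] has failed
survivors = [block for i, block in enumerate(data) if i != lost]
rebuilt = recover(survivors, parity)
```

Because XOR is its own inverse, the same routine serves for both computing parity and rebuilding a lost block; tolerating more than one simultaneous failure would need a stronger code such as Reed-Solomon.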
ISBN: 1424403073 (print)
Consider a workload in which massively parallel tasks that require large resource pools are interleaved with short tasks that require fast response but consume fewer resources. We aim at achieving high throughput and short response time when scheduling such a workload over a set of uncoordinated grids of varying sizes and performance characteristics. We propose the concept of a grid execution hierarchy, where available grids are sorted according to their size, and the execution overheads increase with the size of the grids. We devise a scheduling algorithm for this execution hierarchy of grids by adapting the multilevel feedback queue approach to a multi-grid environment. The algorithm finds a grid of the size, availability, and overhead that best matches a task's resource requirements and expected turnaround time. Our approach is inspired by the Shortest Processing Time First (SPTF) policy, in the sense that the task's processing demands are constantly reevaluated during its run, so that a task is migrated to a more suitable level of the execution hierarchy when appropriate. We evaluate our approach in the context of the Superlink-online system for processing genetic linkage analysis tasks, a production system consisting of several grids and utilizing tens of thousands of CPU hours a month [32]. With our approach the system provides nearly interactive response time for shorter tasks, while simultaneously serving throughput-oriented massively parallel tasks in an efficient manner.
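The multilevel feedback queue idea described above can be sketched roughly as follows. This is not the paper's algorithm: it assumes each level of the hierarchy has a fixed run-time budget (`quantum`, a hypothetical parameter), that a task starts at the smallest grid, and that work already done is kept when the task migrates upward. The level names are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Level:
    name: str
    quantum: float  # hypothetical: run time a task may consume before migrating

def place(task_demand, levels):
    """Return (level name, migrations) for the level at which the task completes."""
    remaining = task_demand
    for migrations, level in enumerate(levels):
        is_last = migrations == len(levels) - 1
        if remaining <= level.quantum or is_last:
            return level.name, migrations
        # Assumption: progress is retained, only the remainder moves up a level.
        remaining -= level.quantum
    raise ValueError("levels must be non-empty")

# Levels ordered from smallest (lowest overhead) to largest grid.
hierarchy = [
    Level("small cluster", 10.0),
    Level("campus grid", 100.0),
    Level("large grid", float("inf")),
]

short_task = place(5.0, hierarchy)
medium_task = place(50.0, hierarchy)
long_task = place(500.0, hierarchy)
```

Short tasks finish on the low-overhead level without ever being migrated, while long tasks drift toward the largest grid, which is the behaviour the abstract attributes to the execution hierarchy.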
ISBN: 0769523129 (print)
The Common Language Infrastructure, or CLI, is a standardized virtual machine that is becoming increasingly popular on a wide range of platforms. In this paper we develop three I/O-intensive benchmarks for the CLI using various techniques. The first benchmark is designed in accordance with an application behavioral model that reproduces the behavior of real-world I/O-intensive applications. The second benchmark is a trace-driven simulator that simulates five I/O-intensive applications. The third benchmark is a micro I/O-intensive benchmark used to emulate a simple web server. In addition, the performance of the benchmarks is evaluated on the SSCLI. The results suggest that the CLI is a potential virtual machine for I/O-intensive computing.
ISBN: 0769523129 (print)
In this paper we propose a framework and algorithm for dynamic resource management in a distributed real-time system. Our assumptions are as follows. First, multiple real-time (RT) and non-real-time processes are active throughout the system. Those processes in the critical path for a given task, e.g., autopilot, fire control (as in firing weapons), surveillance, collaborative planning, are RT for the duration of the task and may or may not be party to multiple tasks in either critical or ancillary capacities. For instance, the radar may be part of the critical path during surveillance, but have uses other than that, say to take a snapshot during a collaborative planning session that may serve an ancillary use (as a supplementary illustration for discussion, e.g., "this is the depot we will go after tomorrow during a flyover"). But then, if you can fly over it, why not go after it then? Another example: during a coordinated maneuver, plane-to-plane communications are in the critical path but during fire control they are not. Second, the operating system or run-time environment has task migration capabilities. Third, storage is cheap: images of multiple processes in different states can be stored on each computing device for the purpose of instantiating one or more in any combination on that device and across devices for reconfigurable distributed computing. This paper presents a software architecture and an algorithm for resource management in such systems.
ISBN: 0769523129 (print)
More and more pieces of hardware are being connected to the Internet every day. Technologies such as Bluetooth or Wi-Fi make this evolution even faster. To make these devices cooperate and communicate with each other, several paradigms such as mobile code, mobile agents and remote procedure calls are particularly well adapted. These paradigms make it possible to execute code that either comes from somewhere over the network or is local but managed remotely. Security is then one of the main concerns that has to be dealt with. We believe that smart cards, and more precisely Java Cards, can help to cope with this challenge. This is a position paper where we present the first results obtained on a Java Card based platform that we have set up for experimentation purposes. These experiments raise many questions we are currently working on.
ISBN: 0769523129 (print)
Divisible load applications consist of a load, that is input data and associated computation, that can be divided arbitrarily into independent pieces. Such applications arise in many fields and are ideally suited to a master-worker execution, but they pose several scheduling challenges. While the "Divisible Load Scheduling" (DLS) problem has been studied extensively from a theoretical standpoint, in this paper we focus on practical issues: we extend a production Grid application execution environment, APST, to support divisible load applications; we implement previously proposed DLS algorithms as part of APST; we evaluate and compare these algorithms on a real-world two-cluster platform; we show in a case study how a user can easily and effectively run a real-world divisible load application; and we uncover several issues that are critical for using DLS theory in practice. To the best of our knowledge the software resulting from this work, APST-DV, is the first usable and generic tool for deploying divisible load applications on distributed computing platforms.
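To make the notion of a divisible load concrete, here is a minimal sketch of the simplest possible division: a single round in which each worker receives a chunk proportional to its speed, so that all workers finish at the same time. It deliberately ignores communication cost and latency, which the DLS algorithms referred to above do account for, and it is not the APST-DV interface; `split_load` and the sample speeds are hypothetical.

```python
def split_load(total_units, speeds):
    """Split a divisible load so all workers finish together (communication ignored)."""
    if not speeds or any(s <= 0 for s in speeds):
        raise ValueError("speeds must be a non-empty list of positive numbers")
    total_speed = sum(speeds)
    # Each worker's share is proportional to its processing speed (units per second).
    return [total_units * s / total_speed for s in speeds]

worker_speeds = [1.0, 3.0]  # illustrative: the second worker is three times faster
chunks = split_load(100.0, worker_speeds)
finish_times = [chunk / speed for chunk, speed in zip(chunks, worker_speeds)]
```

With these inputs the slower worker gets a quarter of the load and both workers finish at the same moment; once transfer times matter, the faster or better-connected workers typically need to be served in a chosen order and over multiple rounds.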
ISBN: 0769523129 (print)
An architecture for a reconfigurable superscalar processor is described in which some of its execution units are implemented in reconfigurable hardware. The overall configuration of the processor is defined according to how its reconfigurable execution units are configured. An efficient micro-architectural solution to configuration management is presented that effectively steers the current processor configuration toward a configuration that is well matched with the execution unit requirements of instructions being scheduled for execution. The approach first selects the best matched among four steering configurations based on the number and type of execution units required by the instructions. One of the steering configurations is dynamically defined as the current configuration; the other three are statically predefined. Once a steering configuration is selected, portions of it begin loading on corresponding reconfigurable execution units that are not busy. The active configuration of the processor is generally the overlap of two or more steering configurations.
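The selection step described above, choosing the best matched of four steering configurations, might be sketched in software as follows. The abstract does not give the matching metric, so this sketch assumes one: a configuration's score is the number of required execution units it can supply. The unit type names, the configuration contents, and the functions `match_score` and `select_steering` are all hypothetical.

```python
def match_score(config, demand):
    """Count how many of the required execution units this configuration supplies."""
    return sum(min(config.get(unit, 0), need) for unit, need in demand.items())

def select_steering(configs, demand):
    """Pick the configuration name with the highest score (first one wins ties)."""
    return max(configs, key=lambda name: match_score(configs[name], demand))

# One dynamically defined configuration (the current one) plus three predefined ones.
steering_configs = {
    "current":   {"int_alu": 2, "fp_alu": 1, "load_store": 1},
    "int_heavy": {"int_alu": 4, "fp_alu": 0, "load_store": 1},
    "fp_heavy":  {"int_alu": 1, "fp_alu": 3, "load_store": 0},
    "mem_heavy": {"int_alu": 1, "fp_alu": 1, "load_store": 3},
}

# Execution units required by the instructions waiting to be scheduled.
pending_demand = {"fp_alu": 3, "int_alu": 1}
chosen = select_steering(steering_configs, pending_demand)
```

Listing the current configuration first means a tie keeps the processor as it is, which avoids reconfiguration when a predefined alternative offers no better match; whether the hardware breaks ties this way is an assumption of the sketch.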