An important class of parallel processing jobs on clusters today is workflow-based applications that process large amounts of data in parallel. Traditional cluster performance tools are designed for tightly coupled p...
ISBN (print): 0769523129
UPC is a parallel programming language based on the concept of partitioned shared memory. There are now several UPC compilers available and several different parallel architectures that support one or more of these compilers. This paper is the first to compare the performance of most of the currently available UPC implementations on several commonly used parallel platforms. These compilers are the GASNet UPC compiler from UC Berkeley, the v1.1 MuPC compiler from Michigan Tech, the Hewlett-Packard v2.2 compiler, and the Intrepid UPC compiler. The parallel architectures used in this study are a 16-node x86 Myrinet cluster, a 32-processor AlphaServer SC-40, and a 48-processor Cray T3E. A STREAM-like microbenchmark was developed to measure fine- and coarse-grained shared memory accesses. Five NPB kernels were also measured using existing UPC implementations. These measurements and the associated observations provide a snapshot of the relative performance of current UPC platforms.
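The fine- versus coarse-grained distinction measured by a STREAM-like benchmark can be illustrated with a single-process analogue. The sketch below is plain Python, not UPC: "fine-grained" means one element per access in an explicit loop, "coarse-grained" means one bulk copy (comparable in spirit to a single `upc_memcpy`), and bandwidth is counted the way STREAM counts its Copy kernel (bytes read plus bytes written). The function names and array size are hypothetical.

```python
import time
from array import array

def fine_grained_copy(src: array, dst: array) -> float:
    """Copy one element per access (like many small remote reads); return elapsed seconds."""
    start = time.perf_counter()
    for i in range(len(src)):
        dst[i] = src[i]
    return time.perf_counter() - start

def coarse_grained_copy(src: array, dst: array) -> float:
    """Copy the whole block in one bulk operation; return elapsed seconds."""
    start = time.perf_counter()
    dst[:] = src
    return time.perf_counter() - start

def copy_bandwidth_mb_s(n_elements: int, seconds: float, elem_bytes: int = 8) -> float:
    """STREAM-style Copy bandwidth in MB/s: bytes read plus bytes written per second."""
    if seconds <= 0:
        return float("inf")
    return 2 * n_elements * elem_bytes / seconds / 1e6

def run(n: int = 1_000_000) -> dict:
    """Time both kernels on n doubles, check the copies, and report MB/s for each."""
    src = array("d", range(n))
    results = {}
    for label, kernel in (("fine", fine_grained_copy), ("coarse", coarse_grained_copy)):
        dst = array("d", [0.0]) * n
        elapsed = kernel(src, dst)
        if dst != src:
            raise RuntimeError(f"{label}-grained kernel produced wrong data")
        results[label] = copy_bandwidth_mb_s(n, elapsed)
    return results

if __name__ == "__main__":
    for label, mb_s in run().items():
        print(f"{label:>6}-grained copy: {mb_s:10.1f} MB/s")
```

In a real UPC run the gap between the two kernels is dominated by per-access remote overhead rather than interpreter cost, so only the shape of the measurement carries over.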
ISBN (print): 0769523129
In this paper we study the development of parallel algorithms to solve advection-diffusion equations. Both synchronous and asynchronous algorithmic contexts are considered. The solver we present is based on the multisplitting Newton method, which provides a coarse-grained scheme. Experiments are carried out in a heterogeneous grid environment, in which both parallel algorithms are analyzed. These experiments allow us to draw conclusions about the use of parallel iterative algorithms in a grid computing environment.
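The coarse-grained structure of a multisplitting solver can be seen in the linear case. The sketch below assumes a 1D steady advection-diffusion problem, -nu*u'' + a*u' = 1 on (0, 1) with zero boundary values, discretised by central differences, and uses the simplest multisplitting: non-overlapping blocks updated synchronously (block Jacobi). Each block solves its own tridiagonal subsystem exactly, using the previous iterate's values from its neighbours; in the paper's nonlinear setting each block would run Newton steps instead, and an asynchronous variant would not wait for all blocks each iteration. The coefficients, block count, and function names are hypothetical.

```python
def tridiagonal_system(n, nu, a):
    """Central-difference coefficients for -nu*u'' + a*u' = 1 on (0, 1), u(0) = u(1) = 0."""
    h = 1.0 / (n + 1)
    lower = -nu / h**2 - a / (2 * h)   # coefficient of u[i-1]
    diag = 2 * nu / h**2               # coefficient of u[i]
    upper = -nu / h**2 + a / (2 * h)   # coefficient of u[i+1]
    rhs = [1.0] * n                    # constant source term (assumption)
    return lower, diag, upper, rhs

def thomas(lower, diag, upper, rhs):
    """Solve a constant-coefficient tridiagonal system with the Thomas algorithm."""
    n = len(rhs)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper / diag
    d[0] = rhs[0] / diag
    for i in range(1, n):
        denom = diag - lower * c[i - 1]
        c[i] = upper / denom
        d[i] = (rhs[i] - lower * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def multisplitting_solve(n=60, n_blocks=4, nu=1.0, a=10.0, tol=1e-10, max_iter=10_000):
    """Synchronous block-Jacobi multisplitting; returns (solution, iterations used)."""
    lower, diag, upper, f = tridiagonal_system(n, nu, a)
    bounds = [(k * n // n_blocks, (k + 1) * n // n_blocks) for k in range(n_blocks)]
    u = [0.0] * n
    for it in range(1, max_iter + 1):
        new_u = u[:]
        for s, e in bounds:                  # each block could run on its own processor
            rhs = f[s:e]                     # slice is a copy, so f is not modified
            if s > 0:
                rhs[0] -= lower * u[s - 1]   # old value from the left neighbour block
            if e < n:
                rhs[-1] -= upper * u[e]      # old value from the right neighbour block
            new_u[s:e] = thomas(lower, diag, upper, rhs)
        change = max(abs(x - y) for x, y in zip(new_u, u))
        u = new_u
        if change < tol:
            return u, it
    return u, max_iter

def direct_solve(n=60, nu=1.0, a=10.0):
    """Reference solution of the whole system in one tridiagonal solve."""
    lower, diag, upper, f = tridiagonal_system(n, nu, a)
    return thomas(lower, diag, upper, f)
```

With these parameters the cell Peclet number a*h/nu is well below 2, so the matrix is an M-matrix and the block iteration converges to the same solution as the direct solve.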
ISBN (print): 0769523129
A programmable Java distributed system, which utilises the free resources of a heterogeneous set of computers linked together by a network, has been developed. The system has been successfully deployed on over 200 computers distributed across a number of locations, and has been successfully used to process bioinformatics, biomedical engineering, and cryptography applications. We present two bioinformatics applications: DSEARCH, which performs sensitive database searching, and DPRml, which performs distributed phylogeny reconstruction by maximum likelihood.
ISBN (print): 0769523129
Large-scale scientific computing applications frequently make use of closely coupled distributed parallel components. The performance of such applications therefore depends on the component parts and their interaction at run time. This paper describes a methodology for predictive performance modelling of parallel applications composed of multiple interacting components. The fundamental steps and required operations of the modelling process are identified, including inter-component dataflow analysis, MxN communication performance evaluation, and composite performance model evaluation. A case study illustrates the modelling process, and the methodology is verified through experimental analysis.
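The MxN communication step can be illustrated with a hypothetical case: a 1D array of `n` elements, block-distributed over M producer processes, is redistributed to N consumer processes. The sketch computes which (source, destination) pairs must exchange data and how many elements each pair sends, then prices the exchange with a simple latency-bandwidth (Hockney) model. The block distribution, the assumption that one sender's messages are serialised while different senders run in parallel, and all parameter values are illustrative, not the paper's model.

```python
def block_range(rank, n_procs, n):
    """Half-open index range owned by `rank` under a standard block distribution."""
    return rank * n // n_procs, (rank + 1) * n // n_procs

def mxn_schedule(n, m_procs, n_procs):
    """Return {(src, dst): element_count} for an M-to-N block redistribution."""
    schedule = {}
    for src in range(m_procs):
        s0, s1 = block_range(src, m_procs, n)
        for dst in range(n_procs):
            d0, d1 = block_range(dst, n_procs, n)
            overlap = min(s1, d1) - max(s0, d0)   # shared index range, if any
            if overlap > 0:
                schedule[(src, dst)] = overlap
    return schedule

def predicted_time(schedule, latency_s, bandwidth_bytes_s, elem_bytes=8):
    """Hockney cost per message; a sender's messages add up, senders overlap in time."""
    per_sender = {}
    for (src, _dst), count in schedule.items():
        t = latency_s + count * elem_bytes / bandwidth_bytes_s
        per_sender[src] = per_sender.get(src, 0.0) + t
    return max(per_sender.values())
```

For example, redistributing 12 elements from 3 processes to 4 yields six messages: rank 0 sends 3 elements to rank 0 and 1 to rank 1, rank 1 sends 2 each to ranks 1 and 2, and rank 2 sends 1 to rank 2 and 3 to rank 3.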
ISBN (print): 1424403073
Consider a workload in which massively parallel tasks that require large resource pools are interleaved with short tasks that require fast response but consume fewer resources. We aim to achieve high throughput and short response time when scheduling such a workload over a set of uncoordinated grids of varying sizes and performance characteristics. We propose the concept of a grid execution hierarchy, in which the available grids are sorted by size and execution overheads increase with grid size. We devise a scheduling algorithm for this hierarchy of grids by adapting the multilevel feedback queue approach to a multi-grid environment. The algorithm finds a grid whose size, availability, and overhead best match a task's resource requirements and expected turnaround time. Our approach is inspired by the Shortest Processing Time First (SPTF) policy, in the sense that a task's processing demands are continually re-evaluated while it runs, so that the task is migrated to a more suitable level of the execution hierarchy when appropriate. We evaluate our approach in the context of the Superlink-online system for processing genetic linkage analysis tasks, a production system consisting of several grids and utilizing tens of thousands of CPU hours a month [32]. With our approach, the system provides nearly interactive response times for shorter tasks while simultaneously serving throughput-oriented massively parallel tasks efficiently.
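The multilevel feedback queue idea behind the execution hierarchy can be sketched from a single task's point of view. Levels are ordered from the smallest, lowest-overhead grid to the largest; each level has a work budget, and a task that exhausts the budget at one level is migrated to the next, so short tasks finish on fast levels without the scheduler knowing their demand in advance. The level names, budgets, and the assumption that progress survives migration (for example through checkpointing) are hypothetical, and queueing among concurrent tasks is ignored.

```python
from dataclasses import dataclass, field

@dataclass
class Level:
    name: str
    budget_cpu_hours: float   # work a task may consume here before being migrated

@dataclass
class Task:
    task_id: str
    total_work: float         # true demand in CPU hours, used only to simulate completion
    done: float = 0.0
    history: list = field(default_factory=list)

def run_through_hierarchy(task: Task, levels: list) -> str:
    """Start at the lowest-overhead level and migrate the task up whenever it outgrows one.

    Assumes progress is preserved across migrations.
    Returns the name of the level on which the task finished.
    """
    for level in levels:
        remaining = task.total_work - task.done
        task.done += min(remaining, level.budget_cpu_hours)
        task.history.append(level.name)
        if task.done >= task.total_work:
            return level.name
    raise RuntimeError(f"task {task.task_id} exceeded every level's budget")

# Hypothetical three-level hierarchy; the top level has no budget limit.
HIERARCHY = [
    Level("local cluster", 0.5),
    Level("campus grid", 50.0),
    Level("large-scale grid", float("inf")),
]
```

A 0.2 CPU-hour task never leaves the local cluster, while a 100 CPU-hour task passes through all three levels, which mirrors the interactive-versus-throughput split described in the abstract.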
ISBN (print): 0769523129
In this paper we study the problem of multimedia streaming and transcoding in P2P systems. We propose a multimedia streaming architecture in which transcoding services coordinate to transform the streaming data into different formats and adapt both to the QoS requirements of the applications and to the availability of system resources. Our techniques are entirely distributed, use only local knowledge, and scale well with the size of the system. Extensive simulation results validate the performance of our approach.
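One way to picture transcoding services coordinating is as a path search: each service converts one format into another, and a stream reaches a client by chaining services from the source format to the requested format. The breadth-first search below finds the shortest such chain. The service names, formats, and function name are hypothetical; unlike the paper's fully distributed techniques, this sketch assumes a single node with global knowledge of all services and ignores QoS and resource availability.

```python
from collections import deque

def find_transcoding_chain(services, source_fmt, target_fmt):
    """Shortest list of (service, from_fmt, to_fmt) hops from source to target, or None.

    `services` is an iterable of (name, input_format, output_format) triples.
    """
    graph = {}
    for name, src, dst in services:
        graph.setdefault(src, []).append((name, dst))
    queue = deque([source_fmt])
    parent = {source_fmt: None}          # format -> (previous format, service used)
    while queue:
        fmt = queue.popleft()
        if fmt == target_fmt:
            chain = []
            while parent[fmt] is not None:   # walk back to the source format
                prev_fmt, name = parent[fmt]
                chain.append((name, prev_fmt, fmt))
                fmt = prev_fmt
            return list(reversed(chain))
        for name, nxt in graph.get(fmt, []):
            if nxt not in parent:
                parent[nxt] = (fmt, name)
                queue.append(nxt)
    return None
```

Breadth-first search minimises the number of transcoding hops; weighting hops by cost or QoS would call for a shortest-path algorithm such as Dijkstra's instead.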
ISBN (print): 0769523129
In this paper we propose a framework and algorithm for dynamic resource management in a distributed real-time system. Our assumptions are as follows. First, multiple real-time and non-real-time processes are active throughout the system. Processes on the critical path for a given task, such as autopilot, fire control (as in firing weapons), surveillance, or collaborative planning, are real-time for the duration of the task, and may or may not take part in multiple tasks in either critical or ancillary capacities. For instance, the radar may be on the critical path during surveillance but have other uses, say taking a snapshot during a collaborative planning session that serves an ancillary purpose (as a supplementary illustration for discussion, e.g., "this is the depot we will go after tomorrow during a flyover"). But then, if you can fly over it, why not go after it then? Another example: during a coordinated maneuver, plane-to-plane communications are on the critical path, but during fire control they are not. Second, the operating system or run-time environment has task migration capabilities. Third, storage is cheap: images of multiple processes in different states can be stored on each computing device, so that one or more of them can be instantiated in any combination on that device and across devices for reconfigurable distributed computing. This paper presents a software architecture and an algorithm for resource management in such systems.
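The first assumption, that a process is real-time only while it lies on the critical path of an active task, amounts to a lookup from active tasks to processes. The sketch below follows the abstract's examples (radar during surveillance, plane-to-plane communications during a coordinated maneuver but not during fire control); the mapping itself and the process names `weapons_control`, `flight_control`, and `planning_session` are hypothetical.

```python
# Hypothetical mapping from each task to the processes on its critical path.
CRITICAL_PATHS = {
    "surveillance": {"radar"},
    "coordinated_maneuver": {"plane_to_plane_comms"},
    "fire_control": {"weapons_control"},
    "autopilot": {"flight_control"},
    "collaborative_planning": {"planning_session"},  # a radar snapshot here is ancillary
}

def real_time_processes(active_tasks, critical_paths=CRITICAL_PATHS):
    """Union of the critical-path processes of all currently active tasks."""
    rt = set()
    for task in active_tasks:
        rt |= critical_paths.get(task, set())
    return rt

def classify(all_processes, active_tasks, critical_paths=CRITICAL_PATHS):
    """Label each process 'RT' or 'non-RT' for the current set of active tasks."""
    rt = real_time_processes(active_tasks, critical_paths)
    return {p: ("RT" if p in rt else "non-RT") for p in all_processes}
```

Re-running `classify` whenever the set of active tasks changes gives the resource manager the up-to-date set of processes whose deadlines it must protect, for instance when deciding which process images to migrate.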
ISBN (print): 0769523129
One of the first steps in starting a program on a cluster is to fetch the executable, which generally resides on a network file server. This not only creates contention on the network but also places unnecessary strain on the network file system, which is busy serving other requests at the same time. This approach does not scale as clusters grow larger. We present a new approach that uses a high-speed interconnect, novel network features, and a scalable design, providing a fast, efficient, and scalable solution to the distribution of executable files on production parallel machines.
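The scalability argument can be made concrete with a binomial-tree broadcast: only the root reads the executable from the file server, and in every round each node that already holds a copy forwards it to one new node over the interconnect, so N nodes are covered in ceil(log2 N) rounds instead of N separate file-server reads. This is a standard software technique swapped in for illustration; the abstract's design relies on specific interconnect features that this sketch does not model, and the function name is hypothetical.

```python
def binomial_broadcast_schedule(n_nodes, root=0):
    """Return a list of rounds; each round is a list of (sender, receiver) node pairs."""
    rounds = []
    have = 1  # nodes holding the file, counted in ranks relative to the root
    while have < n_nodes:
        pairs = []
        for rel_sender in range(have):
            rel_receiver = rel_sender + have
            if rel_receiver < n_nodes:
                # Map root-relative ranks back to real node ids.
                pairs.append(((rel_sender + root) % n_nodes, (rel_receiver + root) % n_nodes))
        rounds.append(pairs)
        have *= 2  # every holder sent one copy, so the number of holders doubles
    return rounds
```

For five nodes the schedule is `[(0, 1)]`, then `[(0, 2), (1, 3)]`, then `[(0, 4)]`: three rounds, with the file server touched only once.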
ISBN (print): 0769523129
We introduce a continuous convergence protocol for handling locally committed and possibly conflicting updates to replicated data. The protocol supports local consistency and predictability while allowing replicas to deterministically diverge and converge as updates are committed and replicated. We discuss how applications may exploit the protocol characteristics and describe an implementation where conflicting updates are detected, qualified by a partial update order, and resolved using application-specific forward conflict resolution.
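A common way to obtain a partial order on replicated updates is with vector clocks: two updates conflict exactly when neither clock dominates the other. The sketch below detects such concurrent updates and passes them to an application-supplied resolver, then merges the clocks so both replicas converge. Vector clocks, the replica names, and the `resolve` callback are assumptions for illustration, since the abstract does not say how its partial update order is built; convergence also assumes the resolver is deterministic and order-insensitive.

```python
def compare(vc_a, vc_b):
    """Order two vector clocks (dicts replica -> counter): 'before', 'after', 'equal', 'concurrent'."""
    keys = set(vc_a) | set(vc_b)
    a_le_b = all(vc_a.get(k, 0) <= vc_b.get(k, 0) for k in keys)
    b_le_a = all(vc_b.get(k, 0) <= vc_a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"

def merge_clocks(vc_a, vc_b):
    """Element-wise maximum of two vector clocks."""
    return {k: max(vc_a.get(k, 0), vc_b.get(k, 0)) for k in set(vc_a) | set(vc_b)}

def apply_remote(local, remote, resolve):
    """local and remote are (value, vector_clock) pairs; return the converged pair."""
    order = compare(local[1], remote[1])
    if order in ("after", "equal"):
        return local                      # remote update is already reflected locally
    if order == "before":
        return remote                     # remote update supersedes the local one
    # Concurrent updates: a conflict, resolved forward with application logic.
    return resolve(local[0], remote[0]), merge_clocks(local[1], remote[1])
```

With a set-union resolver, for example, two replicas that concurrently add different items each end up holding both items under the same merged clock, regardless of which update they receive first.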