Functions that invoke operations on multiple objects atomically are a useful extension of object-based parallel languages, such as Orca. This paper introduces atomic functions and shows how compile-time information can drive run-time optimizations of such functions.
Traditional parallel programming forces the programmer not only to design the application but also to analyse the performance of the newly built application. This difficult task of testing the program's behaviour can be avoided by using an automatic performance analysis tool. Users are relieved of having to understand the enormous amount of performance information obtained from executing a program. The automatic analysis is based on a predefined list of production rules that describe performance problems. These rules form the "knowledge base" of the tool. When the tool analyses an application, it looks for occurrences of the performance problems recorded in the knowledge base. When one of the problems is found (a "match" in the list), the tool analyses the cause of the performance problem and builds a recommendation that guides the user toward possible modifications to the application's code.
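The matching step described above can be pictured with a minimal sketch. The rule names, metric keys, thresholds, and recommendation texts below are all hypothetical and chosen only for illustration; the abstract does not list the tool's actual rules.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """One entry of a hypothetical knowledge base."""
    name: str
    matches: Callable[[dict], bool]  # predicate over measured metrics
    recommendation: str


# Hypothetical rules; thresholds are arbitrary illustration values.
KNOWLEDGE_BASE = [
    Rule(
        "load imbalance",
        lambda m: m["max_busy_time"] > 1.5 * m["mean_busy_time"],
        "Redistribute work more evenly across processes.",
    ),
    Rule(
        "communication bound",
        lambda m: m["comm_time"] > m["compute_time"],
        "Aggregate small messages or overlap communication with computation.",
    ),
]


def analyse(metrics: dict) -> list:
    """Return (problem, recommendation) pairs for every rule that matches."""
    return [(r.name, r.recommendation) for r in KNOWLEDGE_BASE if r.matches(metrics)]


findings = analyse(
    {"max_busy_time": 10.0, "mean_busy_time": 5.0, "comm_time": 1.0, "compute_time": 8.0}
)
```

With these sample metrics only the "load imbalance" rule fires, so `findings` holds a single recommendation.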
The Caravela platform has been designed to develop a parallel and distributed stream-based computing paradigm, supported by a pipeline processing approach referred to here as the meta-pipeline. This paper focuses on the design and implementation of a modeling tool for the meta-pipeline, in particular to tackle the deadlock problem caused by uninitialized input data streams in a pipeline model. A new efficient algorithm is proposed to prevent deadlock situations by detecting uninitialized edges in a pipeline graph. The algorithm identifies the cyclic paths in a pipeline graph and builds a reduced list containing only the true cyclic paths that actually have to be initialized. Further optimization techniques are also proposed to reduce the computation time and the required amount of memory. Moreover, this paper presents a Graphical User Interface (GUI) for easily programming meta-pipeline applications, which provides an automatic validation procedure based on the proposed algorithm. Experimental results presented in this paper show the effectiveness of both the proposed algorithm and the developed GUI.
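As a rough illustration of the idea, the sketch below uses a textbook depth-first search rather than the paper's algorithm, whose details the abstract does not give. It assumes that an edge carrying initial data can never be the cause of a deadlock, so a deadlock is possible exactly when a cycle consists only of uninitialized edges. The function name and the graph encoding are hypothetical.

```python
def deadlock_edges(graph, initialised=frozenset()):
    """Return edges that close a cycle made only of uninitialised edges.

    graph: dict mapping a node to a list of successor nodes.
    initialised: set of (source, target) edges that carry initial data.
    An empty result means no all-uninitialised cycle exists.
    """
    # Assumption: initialised edges cannot block, so drop them first.
    pruned = {
        u: [v for v in succs if (u, v) not in initialised]
        for u, succs in graph.items()
    }
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    colour = {u: WHITE for u in pruned}
    found = []

    def visit(u):
        colour[u] = GREY
        for v in pruned.get(u, []):
            state = colour.get(v, WHITE)
            if state == GREY:       # v is an ancestor: (u, v) closes a cycle
                found.append((u, v))
            elif state == WHITE:
                visit(v)
        colour[u] = BLACK

    for node in pruned:
        if colour[node] == WHITE:
            visit(node)
    return found


# Hypothetical pipeline with a feedback loop B -> C -> B.
pipeline = {"A": ["B"], "B": ["C"], "C": ["B", "D"], "D": []}
```

Here `deadlock_edges(pipeline)` reports `[("C", "B")]`, while initializing either edge of the loop, for example `deadlock_edges(pipeline, {("C", "B")})`, yields an empty list.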
ISBN: (print) 9781849196031
In this paper, we developed a parallel implementation of an airborne radar simulation using gcc version 3.0. Various results were computed by changing the radar parameters (azimuth and elevation beam width, range resolution, duty cycle, size of the range bins). Results showed that the execution time can be reduced to 30% of its original calculation time.
Grid computing has great potential but to enter the mainstream it must be simplified. Tools and libraries must make it easier to solve problems by being simpler and at the same time more sophisticated. We describe how grid computing can be achieved through spreadsheets. No parallel programming or complex tools need to be used. So long as dependencies allow it, formulae in a spreadsheet can be evaluated concurrently on the grid. Thus, grid computing becomes accessible to all those who can use a spreadsheet. The story is completed with a sophisticated backend system, NetSolve, which can solve complex linear algebra systems with minimal intervention from the user. We present the architecture of the system for performing such simple yet sophisticated grid computing and a case study which performs a large singular value decomposition.
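The dependency-driven evaluation mentioned above can be sketched as follows. This is only an illustration: local threads stand in for grid nodes, NetSolve is not involved, and the cell encoding (a tuple of dependency names plus a function) is a hypothetical choice. It requires Python 3.9 or later for `graphlib`.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def evaluate_sheet(cells):
    """Evaluate cells concurrently wherever dependencies allow it.

    cells maps a cell name to (dependencies, function); the function
    receives the dependency values in the order the names are listed.
    """
    sorter = TopologicalSorter({name: deps for name, (deps, _) in cells.items()})
    sorter.prepare()  # raises graphlib.CycleError on circular references
    values = {}
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            # Every cell returned here has all of its inputs available,
            # so the whole batch can be submitted at once.
            ready = sorter.get_ready()
            futures = {
                name: pool.submit(cells[name][1], *(values[d] for d in cells[name][0]))
                for name in ready
            }
            for name, future in futures.items():
                values[name] = future.result()
                sorter.done(name)
    return values


# Hypothetical sheet: B1 = A1 * A2, C1 = B1 + A1.
sheet = {
    "A1": ((), lambda: 2),
    "A2": ((), lambda: 3),
    "B1": (("A1", "A2"), lambda a, b: a * b),
    "C1": (("B1", "A1"), lambda b, a: b + a),
}
result = evaluate_sheet(sheet)
```

In this sheet `A1` and `A2` are independent and are submitted together, `B1` follows once both are known, and `C1` comes last, giving `B1 = 6` and `C1 = 8`.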
ISBN: (digital) 9798350350678
ISBN: (print) 9798350350685
Game designers commonly use paper prototyping to evaluate educational effectiveness, enjoyment, flow, and usability, while also reducing costs and exploring alternative implementations. However, creating a paper prototype that yields actionable feedback can be challenging due to the wide range of methods available, from low-fidelity sketches to high-fidelity mockups. Most paper prototypes are static and progress discretely, making it difficult to prototype physics-based games effectively, unless focusing on interfaces, narrative, or underlying systems. This paper details the creation of three prototypes of varying fidelity and metaphor for a physics-based educational game on concurrency and parallel programming. Each prototype undergoes playtesting to assess construction methods and their effectiveness in gathering player feedback.
ISBN: (print) 9780818685910
In this paper, we present efficient methods for multidimensional array redistribution. Building on previous work, the basic-cycle calculation technique, we present a basic-block calculation (BBC) technique and a complete-dimension calculation (CDC) technique. We have developed a theoretical model to analyze the computation costs of these two techniques. The theoretical model shows that the BBC method has smaller indexing costs and performs well for redistribution with small array sizes. The CDC method has smaller packing/unpacking costs and performs well when the array size is large. We have also implemented these two techniques, along with the PITFALLS method and Prylli's method, on an IBM SP2 parallel machine. The experimental results show that the BBC method has the smallest execution time of these four algorithms when the array size is small. The CDC method has the smallest execution time of these four algorithms when the array size is large. Furthermore, the BBC method outperforms the PITFALLS method and Prylli's method for all test samples.
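To make the redistribution problem concrete, here is a deliberately naive one-dimensional sketch. It does not implement BBC, CDC, PITFALLS, or Prylli's method; it simply enumerates every global index and assumes a block-cyclic layout in which index `i` with block size `b` on `P` processors belongs to processor `(i // b) % P`. The function names are hypothetical.

```python
from collections import defaultdict


def owner(index, block, procs):
    """Processor owning a global index under a block-cyclic(block) layout."""
    return (index // block) % procs


def send_sets(n, src_block, dst_block, procs):
    """Map (source, destination) processor pairs to the indices to move.

    Naive O(n) enumeration; elements that stay on the same processor
    are omitted from the plan.
    """
    plan = defaultdict(list)
    for i in range(n):
        src = owner(i, src_block, procs)
        dst = owner(i, dst_block, procs)
        if src != dst:
            plan[(src, dst)].append(i)
    return dict(plan)


# Redistribute 8 elements from cyclic(2) to cyclic(1) on 2 processors.
plan = send_sets(8, src_block=2, dst_block=1, procs=2)
```

For this small case processor 0 sends indices 1 and 5 to processor 1, and processor 1 sends indices 2 and 6 to processor 0. Grouping the indices per processor pair corresponds to what a real implementation would pack into one message.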
This paper presents a concurrent visual programming language based on Petri nets. Most concurrent visual programming languages address concurrency by extending a nonconcurrent paradigm and representation with additional control and synchronisation mechanisms and notation. It is argued here that clearer and more concise concurrent program representations are possible if the concurrency is inherent in the paradigm. The language described demonstrates that Petri nets provide such a paradigm.
One of the fundamental goals of parallel computing is to develop a framework that will support portable and efficient application programs. The Bulk-Synchronous Parallel (BSP) model was proposed to help achieve this goal. The BSP model is intended to be a "unifying model": it addresses both software and hardware issues by allowing theoretical analysis to coexist with practical physical implementations. For several years the BSP model has been supported mainly by theoretical results. Recent experiments, however, have begun to demonstrate the practicality of the model for real architectures running real applications. The goal of this paper is to describe the methodology used to construct an efficient BSP library on the BBN Butterfly GP1000. Our results are relevant for BSP library implementations on shared-memory systems in general and for NUMA (nonuniform memory access) machines in particular.
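For readers unfamiliar with BSP, the following sketch shows the superstep structure (local computation, communication, barrier synchronization) on shared memory using Python threads. It is not the library described in the abstract; `bsp_run`, the `program` callback signature, and the example `all_sum` are hypothetical names introduced for illustration.

```python
import threading


def bsp_run(num_procs, supersteps, program):
    """Run `program` on num_procs threads for a fixed number of supersteps.

    program(pid, step, state, received) returns (new_state, outgoing),
    where outgoing is a list of (destination, message) pairs. Messages
    sent in one superstep become visible only in the next one.
    """
    barrier = threading.Barrier(num_procs)
    lock = threading.Lock()
    inboxes = [[] for _ in range(num_procs)]  # readable in this superstep
    pending = [[] for _ in range(num_procs)]  # delivered at the barrier
    results = [None] * num_procs

    def worker(pid):
        state = None
        for step in range(supersteps):
            state, outgoing = program(pid, step, state, inboxes[pid])
            with lock:
                for dest, message in outgoing:
                    pending[dest].append(message)
            barrier.wait()  # every send of this superstep has been issued
            inboxes[pid], pending[pid] = pending[pid], []
            barrier.wait()  # every inbox swapped before new sends begin
        results[pid] = state

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(num_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def all_sum(pid, step, state, received):
    """Example program: every process learns the sum of all contributions."""
    if step == 0:
        value = pid + 1
        return value, [(dest, value) for dest in range(4)]
    return sum(received), []


totals = bsp_run(4, 2, all_sum)
```

With four processes contributing 1 through 4, every entry of `totals` is 10. The second barrier is a design choice of this sketch: it prevents a fast process from posting messages for the next superstep before a slower one has collected its current inbox.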
Although deadlock is not completely avoidable in distributed and parallel programming, we here describe theory and practice of a system that allows us to limit deadlock to situations in which there are true circular data dependences or failure of processes that compute data needed at other processes. This allows us to guarantee absence of deadlock in SPMD computations absent process failure. Our system guarantees optimal ordering of communication statements. We gratefully acknowledge the support of the US National Science Foundation under Award CISE EIA 9810708 without which this work would not have been possible.