We present a novel state management mechanism that can be used to capture the complete execution state of distributed Python applications. This mechanism can serve as the foundation for a variety of dependability strategies, including checkpointing, replication, and migration. Python is increasingly used for rapid prototyping of parallel programs and, in some cases, for high-performance application development using libraries such as NumPy. Building on Stackless Python and the River parallel and distributed programming environment, we have developed mechanisms for state capture at the language level. Our approach allows for migration and checkpointing of applications in heterogeneous environments. In addition, we support preemptive state capture, so programmers need not introduce explicit snapshot requests. Our mechanism can be extended to support application- or domain-specific state capture. To our knowledge, this is the first general checkpointing scheme for Python. We describe our system and its implementation, and give some initial performance figures.
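Stackless Python can pickle running tasklets, but standard CPython cannot pickle live frames or generators. The sketch below therefore illustrates only the general idea of language-level checkpoint and restart. It assumes the application keeps its entire state in a picklable object. The class `SumOfSquares` and the helpers `run` and `resume` are hypothetical, and this is not the River or Stackless mechanism described above.

```python
import pickle

class SumOfSquares:
    """Hypothetical long-running computation whose full state lives in attributes."""

    def __init__(self, n):
        self.n = n        # total number of steps
        self.i = 0        # next index to process
        self.total = 0    # running result

    def step(self):
        self.total += self.i * self.i
        self.i += 1

    def done(self):
        return self.i >= self.n

def run(task, checkpoint_every, store):
    """Run the task to completion, appending a pickled snapshot every few steps."""
    while not task.done():
        task.step()
        if task.i % checkpoint_every == 0:
            store.append(pickle.dumps(task))   # explicit (non-preemptive) snapshot
    return task.total

def resume(snapshot):
    """Rebuild a task from a snapshot, possibly on another machine, and finish it."""
    task = pickle.loads(snapshot)
    while not task.done():
        task.step()
    return task.total
```

Restarting from any stored snapshot produces the same final result as the uninterrupted run. That equivalence is the property that checkpointing and migration both rely on.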
Within the GridRPC framework, we implemented a new function that allows direct data transfer between RPC servers, enabling efficient execution of Task Sequencing jobs in a grid environment. In Task Sequencing, the input and output parameters of successive RPCs are dependent: the output of one RPC becomes the input of the next. In this study, direct data transfer is implemented using the grid filesystem, without breaking the GridRPC programming model and with only limited changes to the existing Ninf-G implementation. After tasks are submitted, our Task Sequencing API library analyzes the RPC arguments to detect intermediate data and reports this information to the GridRPC servers, so that the intermediate data is created on the grid filesystem. Performance evaluations on a LAN and on the Japan-US grid environment verified that the function improves the performance of distributed Task Sequencing.
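The dependency analysis can be sketched as follows. An argument counts as intermediate data when an earlier call in the same sequence produced it. It can therefore stay on the grid filesystem and go directly from the producing server to the consuming servers instead of travelling back through the client. This is not the Ninf-G API. `Call` and `plan_transfers` are hypothetical names, and the sketch assumes that each datum is written only once.

```python
from dataclasses import dataclass, field

@dataclass
class Call:
    """One RPC in a hypothetical task sequence, with named inputs and outputs."""
    server: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

def plan_transfers(calls):
    """Map each intermediate datum to (producing server, [consuming servers]).

    Data supplied by the client (never produced by an earlier call) and final
    results (never consumed later) are not intermediate and are left out.
    """
    producer = {}   # datum name -> server that wrote it
    plan = {}
    for call in calls:
        for name in call.inputs:
            if name in producer:   # produced earlier in the sequence
                plan.setdefault(name, (producer[name], []))[1].append(call.server)
        for name in call.outputs:
            producer[name] = call.server
    return plan
```

For a chain A → B → C, for example, `y` produced on A and consumed by B and C is reported as intermediate, while the client input `x` is not.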
In this paper we present first experiences with integrating MPI-based numerical software into an advanced programming environment for building parallel and distributed high-performance applications, which is under development in the context of Italian national research projects. This programming environment, named ASSIST, combines the concepts of structured parallel programming and component-based programming. Some activities within the projects are devoted to the definition, implementation and testing of a methodology for integrating a parallel numerical library into ASSIST. The goal is to provide a set of efficient, accurate and reliable tools that can easily be used as building blocks for high-performance scientific applications. We focus on the integration of existing and widely used MPI-based numerical library modules. To this end, we propose a general approach to embedding MPI computations into the ASSIST basic programming unit. This approach has been tested using the MPICH implementation of MPI for networks of workstations. The MPICH process startup procedure was modified to make it compliant with the ASSIST environment. Results of experiments integrating routines from a well-known FFT package are discussed.
Due to the rapid growth of multicore and GPU-based computing devices, teaching parallel computing in the CS/CE curriculum has become almost mandatory. A course on Parallel Computing Systems (PCS) was designed to provide an understanding of the fundamental principles and engineering trade-offs involved in designing modern parallel computing systems, and to teach the parallel programming techniques needed to use these machines effectively. The course adopted an activity-based learning approach and covered several parallel programming paradigms and technologies, such as OpenMP, MPI, and CUDA. It was offered as a required course for graduate students. This paper describes the implementation of the course at Thiagarajar College of Engineering. Evaluation of the course reveals that, for students with no prior exposure to parallel and distributed computing, i) activity-based learning results in better knowledge gain than the traditional approach, ii) OpenMP was much easier to learn than MPI or CUDA, iii) some Parallel and Distributed Computing (PDC) concepts, such as false sharing, were harder to grasp than basic concepts, and iv) it is essential to introduce parallel computing in the undergraduate curriculum.
In distributed Java environments, the locality of objects and threads is crucial for the performance of parallel applications. We introduce dynamic locality optimizations in the context of JavaParty, a programming and runtime environment for parallel Java applications. Until now, an optimal distribution of an application's individual objects had to be found manually, which has several drawbacks. Building on an earlier static approach, we develop a dynamic methodology for automatic locality optimization. By measuring the processing and communication times of remote method calls at runtime, a placement strategy can be computed that maps each object of the distributed system to its optimal virtual machine. Objects are then migrated between processing nodes to realize this placement strategy. We evaluate our approach by comparing the performance of two benchmark applications against manually distributed versions. The results show that our approach is particularly suitable for dynamic applications in which the optimal object distribution varies at runtime.
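The placement step can be illustrated with a generic greedy local search. This is a swapped-in technique, not necessarily the JavaParty algorithm. Given measured communication times between object pairs, the search moves one object at a time to another virtual machine whenever the move lowers the total time spent in cross-VM calls. It assumes a hypothetical per-VM object capacity so that the trivial "everything on one VM" answer is excluded. All names are illustrative, and the actual migration step is not shown.

```python
from collections import defaultdict

def remote_cost(placement, comm):
    """Total measured communication time between objects on different VMs."""
    return sum(t for (a, b), t in comm.items() if placement[a] != placement[b])

def improve_placement(placement, comm, capacity, max_rounds=10):
    """Greedy local search: move single objects while that reduces remote cost.

    placement: {object: vm}; comm: {(obj_a, obj_b): measured time};
    capacity: {vm: max number of objects} (a hypothetical constraint).
    """
    placement = dict(placement)
    for _ in range(max_rounds):
        moved = False
        for obj in sorted(placement):
            load = defaultdict(int)
            for vm in placement.values():
                load[vm] += 1
            best_vm, best_cost = placement[obj], remote_cost(placement, comm)
            for vm in sorted(capacity):
                if vm == placement[obj] or load[vm] >= capacity[vm]:
                    continue
                trial = dict(placement)
                trial[obj] = vm
                cost = remote_cost(trial, comm)
                if cost < best_cost:
                    best_vm, best_cost = vm, cost
            if best_vm != placement[obj]:
                placement[obj] = best_vm   # in a real system: migrate the object
                moved = True
        if not moved:
            break
    return placement
```

Re-running the search as new measurements arrive is what makes the approach dynamic. A local search like this can stop in a local optimum, which is one reason a real system may prefer a more global partitioning method.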
The authors describe the motivation, design, and performance of Midway, a programming system for a distributed shared memory multicomputer (DSM) such as an ATM-based cluster, a CM-5, or a Paragon. Midway supports a novel memory consistency model called entry consistency (EC). EC guarantees that shared data become consistent at a processor when the processor acquires a synchronization object known to guard the data. EC is weaker than other models described in the literature, such as processor consistency and release consistency, but it permits higher-performance implementations of the underlying consistency protocols. Midway programs are written in C, and the association between synchronization objects and data must be made with explicit annotations. As a result, purely entry-consistent programs can require more annotations than programs written for other models. Midway also supports the stronger release-consistent and processor-consistent models at the granularity of individual data items.
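The defining property of entry consistency is that acquiring a lock makes consistent only the data that lock guards. A single-address-space simulation can show this. Each `Lock` carries the annotation that binds it to a set of variables, and `acquire` refreshes only those variables in the processor's private cache. This is a hypothetical model for illustration, not Midway's C implementation or its update protocol. It handles only exclusive, uncontended acquires.

```python
class Lock:
    """Synchronization object annotated as guarding a set of shared variables."""

    def __init__(self, guarded):
        self.guarded = frozenset(guarded)
        self.home = {}       # most recent values released under this lock
        self.holder = None

class Processor:
    """A processor with a private, possibly stale cache of shared data."""

    def __init__(self, name):
        self.name = name
        self.cache = {}

    def acquire(self, lock):
        if lock.holder is not None:
            raise RuntimeError("sketch models exclusive, uncontended acquires only")
        lock.holder = self
        # Entry consistency: only the data guarded by this lock is brought up to date.
        for var in lock.guarded:
            if var in lock.home:
                self.cache[var] = lock.home[var]

    def write(self, lock, var, value):
        if lock.holder is not self or var not in lock.guarded:
            raise RuntimeError(f"{var} must be written while holding its guarding lock")
        self.cache[var] = value

    def release(self, lock):
        if lock.holder is not self:
            raise RuntimeError("release without acquire")
        for var in lock.guarded:
            if var in self.cache:
                lock.home[var] = self.cache[var]   # publish updates for the next acquirer
        lock.holder = None
```

A processor that acquires only the lock for `x` sees the latest `x` but not an update to `y` made under a different lock. That narrower guarantee is what lets EC implementations transfer less data than release consistency.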
This paper illustrates that several of the new features specified in the revised Ada standard facilitate programming real-time distributed/parallel applications. In particular, the Ada Distributed Systems Annex supports both statically bound and the more object-oriented dynamically bound remote procedure calls. These features are used to implement a paradigm for composing asynchronous remote procedure calls when both input and output parameters are required. The paradigm is based upon the notion of a distributed object through which the output parameters may be returned without blocking the execution of the caller. Such paradigms, when combined with the enhanced features for concurrency and data synchronization, suggest that Ada will contribute towards understanding some of the issues relevant to developing efficient implementations of distributed objects to support the next generation of real time systems.
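The paradigm can be sketched outside Ada. A caller starts a remote procedure that has both in- and out-parameters, receives a result object immediately, and keeps running. The out-parameters are delivered into that object later. In this Python sketch a background thread stands in for the remote partition, and a `threading.Event` marks delivery. `ResultHolder`, `remote_divmod` and `async_call` are hypothetical names, and nothing here reflects the Ada Distributed Systems Annex API.

```python
import threading

class ResultHolder:
    """Stands in for the distributed object that receives the output parameters."""

    def __init__(self):
        self._done = threading.Event()
        self._outputs = None

    def deliver(self, **outputs):
        self._outputs = outputs
        self._done.set()          # wake any caller waiting in get()

    def ready(self):
        return self._done.is_set()

    def get(self, timeout=None):
        if not self._done.wait(timeout):
            raise TimeoutError("outputs not delivered yet")
        return self._outputs

def remote_divmod(a, b, holder):
    """Hypothetical remote procedure: in-params a, b; out-params quotient, remainder."""
    q, r = divmod(a, b)
    holder.deliver(quotient=q, remainder=r)

def async_call(proc, *args):
    """Start proc in the background and return its result holder without blocking."""
    holder = ResultHolder()
    threading.Thread(target=proc, args=(*args, holder), daemon=True).start()
    return holder
```

The caller can poll `ready()` between other work, or block in `get()` only at the point where the outputs are actually needed.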
A scheme for adding speculative evaluation to the distributed implementation of a lazy functional language is presented. The scheme assigns reduced scheduling priorities to speculative computations to prevent them from overwhelming processing resources or altering the program's semantics. Scheduling priorities are adjusted dynamically during execution as speculative computations are found to be needed. Because computations associated with reclaimed pieces of graph are terminated, a distributed reference counting algorithm can be used both to reclaim garbage nodes and to detect and terminate computations that are not required. A scheduling scheme and a load-balancing scheme that operate in the presence of prioritised computations are also briefly presented.
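The priority mechanism can be sketched on a single node with two priority levels, which is an assumption. Mandatory tasks run before speculative ones. A speculative task is promoted when it is demanded, and a task is cancelled when its reference count drops to zero, meaning its result has become garbage. The sketch uses `heapq` with lazy deletion, so stale heap entries are skipped when popped. It is not the paper's distributed algorithm, and `Scheduler` and its methods are hypothetical.

```python
import heapq
import itertools

MANDATORY, SPECULATIVE = 0, 1   # lower number = higher priority

class Scheduler:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # FIFO tie-breaker within a priority
        self.priority = {}                  # live task -> current priority
        self.refs = {}                      # live task -> reference count

    def spawn(self, task, speculative):
        prio = SPECULATIVE if speculative else MANDATORY
        self.priority[task] = prio
        self.refs[task] = 1                 # assume one reference at creation
        heapq.heappush(self._heap, (prio, next(self._counter), task))

    def demand(self, task):
        """A speculative task turned out to be needed: raise its priority."""
        if self.priority.get(task) == SPECULATIVE:
            self.priority[task] = MANDATORY
            heapq.heappush(self._heap, (MANDATORY, next(self._counter), task))

    def drop_ref(self, task):
        """When the count reaches zero the result is garbage, so cancel the task."""
        if task in self.refs:
            self.refs[task] -= 1
            if self.refs[task] == 0:
                del self.refs[task]
                del self.priority[task]

    def next_task(self):
        """Pop the highest-priority live task, skipping stale or cancelled entries."""
        while self._heap:
            prio, _, task = heapq.heappop(self._heap)
            if self.priority.get(task) == prio:
                del self.priority[task]
                self.refs.pop(task, None)
                return task
        return None
```

Lazy deletion avoids searching the heap when a task is promoted or cancelled. The cost is a few stale entries that are discarded when they surface.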
An efficient execution model for tree-structured computations is presented. A general framework for analyzing the performance of this type of computation on any given topology is discussed. The framework is used to derive models for two widely used parallel programming strategies: processor farms and divide and conquer. The models were validated on a large multicomputer and shown to be accurate enough to predict the performance of applications that use these strategies. The use of these models to evaluate performance and to restructure applications for better performance is discussed.
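For intuition, here are two textbook-style first-order models of the two strategies. These are assumptions for illustration, not the models derived in the paper. In the farm model, the master serializes task dispatch at `t_comm` per task while each worker spends `t_comp` per task, so the slower stage bounds throughput. In the divide-and-conquer model, a binary tree runs on `2**depth` processors, the leaves share the work evenly, and each level pays `t_level` once to split and once to combine. Startup and pipeline-fill effects are ignored.

```python
import math

def farm_time(n_tasks, workers, t_comm, t_comp):
    """Steady-state estimate for a processor farm: the slower stage dominates."""
    return n_tasks * max(t_comm, t_comp / workers)

def saturation_workers(t_comm, t_comp):
    """Worker count beyond which adding workers no longer reduces farm_time."""
    return math.ceil(t_comp / t_comm)

def dc_time(work, depth, t_level):
    """Binary divide and conquer on 2**depth processors.

    Leaves share `work` evenly; each of the `depth` levels costs t_level on the
    way down (split) and again on the way up (combine).
    """
    return work / 2 ** depth + 2 * depth * t_level
```

Even a crude model like this supports the kind of restructuring decision mentioned above. With `t_comp / t_comm = 8`, for example, a farm gains nothing from more than 8 workers unless tasks are made coarser.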
Spatial structures are particularly well suited to the definition of parallel programs because of their homogeneity. The Movie-based programming Framework allows the specification of computations on regular networks of processors and the visualization of computation progress as processors are activated. Computations over spatial structures are specified by composing independent views of the propagation of control flows and of formulae defining local computations. A shape pattern indicates which processors must be active during a specific phase of the computation. A visit pattern defines the law of propagation for actual processor activation. Combining these types of patterns yields sophisticated forms of specification. In particular, visitors can be specified that implement collective communication schemes widely used in parallel programming: broadcast, gather, scatter and reduction. As a result, visit algorithms adapted to different network configurations can be generated automatically, which facilitates experimentation with different propagation laws and their visualization.
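The interplay of the two patterns can be illustrated for a broadcast on a 2-D mesh. A set of allowed processors stands in for a shape pattern, and "forward to the four mesh neighbours" stands in for a visit law. A breadth-first traversal then gives the phase at which each processor becomes active. This is a hypothetical sketch, not the Movie framework's notation, and `broadcast_phases` and `active_at` are illustrative names.

```python
from collections import deque

def broadcast_phases(rows, cols, root, shape=None):
    """Return {processor: phase} for a broadcast flooding a rows x cols mesh.

    shape: optional set of (r, c) processors allowed to take part (stand-in for a
    shape pattern). Visit law: each newly active processor forwards to its
    4 mesh neighbours.
    """
    allowed = shape if shape is not None else {(r, c) for r in range(rows) for c in range(cols)}
    if root not in allowed:
        raise ValueError("root must belong to the shape")
    phase = {root: 0}
    queue = deque([root])
    while queue:
        r, c = queue.popleft()
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if nxt in allowed and nxt not in phase:
                phase[nxt] = phase[(r, c)] + 1
                queue.append(nxt)
    return phase

def active_at(phase):
    """Group processors by the phase at which they are activated."""
    steps = {}
    for proc, k in phase.items():
        steps.setdefault(k, []).append(proc)
    return {k: sorted(v) for k, v in sorted(steps.items())}
```

Changing only the shape (for example, restricting it to one row) or only the visit law yields a different algorithm from the same specification. Processing the phases in reverse order gives a valid schedule for a gather or reduction toward the root.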