Due to the rapid growth of multicore and GPU-based computing devices, teaching parallel computing in the CS/CE curriculum has become almost mandatory nowadays. A course on Parallel Computing Systems (PCS) has been designed to provide an understanding of the fundamental principles and engineering trade-offs involved in designing modern parallel computing systems, as well as to teach the parallel programming techniques necessary to use these machines effectively. An activity-based learning approach was adopted for teaching the course, and several parallel programming paradigms and technologies, such as OpenMP, MPI, and CUDA, were covered. The course was offered as a required course for graduate students. This paper describes the implementation of the course at Thiagarajar College of Engineering. Evaluation of the implementation reveals that, for students who have not been exposed to parallel and distributed computing, i) activity-based learning results in better knowledge gain than the traditional approach, ii) learning OpenMP was much easier than learning MPI or CUDA, iii) some Parallel and Distributed Computing (PDC) concepts, such as false sharing, were harder to grasp than basic concepts, and iv) it is essential to introduce parallel computing in the undergraduate curriculum.
In the framework of GridRPC, a new function that allows direct data transfer between RPC servers is implemented for efficient execution of a Task Sequencing job in a grid environment. In Task Sequencing, the input and output parameters of successive RPCs depend on each other: the output of a previous RPC becomes the input of the next RPC. In this study, the direct transfer of data is implemented using the grid filesystem without breaking the GridRPC programming model and with only limited changes to the existing Ninf-G implementation. Our Task Sequencing API library analyzes RPC arguments to detect intermediate data after task submissions, and reports this information to the GridRPC servers so that the intermediate data is created on the grid filesystem. Performance evaluations on a LAN and on a Japan-US grid environment verified that the function improves the performance of distributed Task Sequencing.
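The argument analysis that detects intermediate data can be pictured as a simple dataflow scan over the submitted calls. The Python sketch below assumes a hypothetical `Call` record with named inputs and outputs; it is not the Ninf-G or GridRPC API, and it leaves out how the result would be reported to the servers.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """One RPC in a task sequence (hypothetical representation, not Ninf-G's)."""
    name: str
    inputs: list
    outputs: list


def find_intermediate(calls):
    """Return, sorted, the data names that one call produces and a later call
    consumes. Such data could stay on the server side (e.g. on a shared grid
    filesystem) instead of travelling back to the client."""
    produced = set()
    intermediate = set()
    for call in calls:
        # Check inputs before recording outputs, so a call that reads and
        # rewrites the same name is not counted as feeding itself.
        for arg in call.inputs:
            if arg in produced:
                intermediate.add(arg)
        produced.update(call.outputs)
    return sorted(intermediate)


# Usage: a three-stage pipeline x -> y -> z -> w.
pipeline = [
    Call("fft", ["x"], ["y"]),
    Call("filter", ["y"], ["z"]),
    Call("ifft", ["z"], ["w"]),
]
print(find_intermediate(pipeline))  # ['y', 'z']
```

Only `y` and `z` are flagged: `x` comes from the client and `w` is a final result, so both still have to cross the client-server boundary.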
GPUs (Graphics Processing Units) have evolved into extremely powerful and flexible processors, allowing them to be used to process many kinds of data. This advantage can be used in game development to optimize the game loop. Most GPGPU work deals with only some steps of the game loop, leaving the CPU to process most of the game logic. This work differs from the traditional approach by presenting and implementing practically the entire game loop inside the GPU. This is a big breakthrough in game development, since CPUs are evolving toward multi-core designs, and future games will need parallelism similar to that of GPU programs.
In this paper we present first experiences concerning the integration of MPI-based numerical software into an advanced programming environment for building parallel and distributed high-performance applications, which is under development in the context of Italian national research projects. This programming environment, named ASSIST, is based on a combination of the concepts of structured parallel programming and component-based programming. Some activities within the projects are devoted to the definition, implementation and testing of a methodology for integrating a parallel numerical library into ASSIST. The goal is to provide a set of efficient, accurate and reliable tools that can easily be used as building blocks for high-performance scientific applications. We focus on the integration of existing and widely used MPI-based numerical library modules. To this end, we propose a general approach to embed MPI computations into the ASSIST basic programming unit. This approach has been tested using the MPICH implementation of MPI for networks of workstations. Some modifications have been applied to the MPICH process startup procedure in order to make it compliant with the ASSIST environment. Results of experiments concerning the integration of routines from a well-known FFT package are discussed.
We present a novel state management mechanism that can be used to capture the complete execution state of distributed Python applications. This mechanism can serve as the foundation for a variety of dependability strategies, including checkpointing, replication, and migration. Python is increasingly used for rapid prototyping of parallel programs and, in some cases, for high-performance application development using libraries such as NumPy. Building on Stackless Python and the River parallel and distributed programming environment, we have developed mechanisms for state capture at the language level. Our approach allows for migration and checkpointing of applications in heterogeneous environments. In addition, we allow for preemptive state capture, so that programmers need not introduce explicit snapshot requests. Our mechanism can be extended to support application- or domain-specific state capture. To our knowledge, this is the first general checkpointing scheme for Python. We describe our system and its implementation, and give some initial performance figures.
With the progress of semiconductor technologies and the advent of multi-core processors, parallel programming models are evolving, and education is needed to help sequential programmers adapt to the requirements of these new technologies and architectures. Multi-core-related content has now been adopted into the curricula of more than 100 universities in China, but how this content should be organized and delivered to students remains a big challenge. In this paper, we present the current status of multi-core education in China and divide the related content into several parts. We also introduce the "contracted Problem/Project Based Learning (cP²BL)" strategy, which has been adopted in the course "Multi-core Architecture and Multithreaded Programming Technologies" and runs well at Wuhan University.
ISBN: (print) 9781509035403
Peterson's solution is a classical algorithm for the mutual exclusion problem. However, rigorous analyses of its safety and liveness properties have so far been rare. In the theorem prover Isabelle/HOL, we formally modelled Peterson's solution for two processes and proved that it satisfies the mutual exclusion property. Following Paulson's inductive approach, the algorithm is inductively defined as the set of all possible event lists of two concurrent processes, where an event is an atomic action of a concurrent process. All of the proof code has been checked by Isabelle/HOL. Compared with work based on model checking, our approach can easily be generalized to the analysis of Peterson's solution for n (n>2) processes. The model we defined for Peterson's solution could also be extended to analyze its liveness property. The proving process also yields some useful advice on how to program Peterson's solution.
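For contrast with the inductive proof, the same two-process model can be checked by brute force: enumerate every interleaving of atomic events and confirm that no reachable state has both processes in the critical section. The Python sketch below does exactly that. It is explicit-state exploration (the model-checking style the abstract compares against), not the Isabelle/HOL proof. The program counters 0 to 3 and the `swap_order` switch are illustrative assumptions. Writing `turn` before `flag` is a known way to break the algorithm, and the checker does find that violation.

```python
from collections import deque

# Program counter per process (illustrative encoding):
# 0/1 = the two entry writes, 2 = waiting at the guard, 3 = in the critical section.


def successors(state, swap_order=False):
    """Yield every state reachable by one atomic step of either process."""
    pcs, flags, turn = state
    for i in (0, 1):
        j = 1 - i
        pc = pcs[i]
        new_pcs, new_flags, new_turn = list(pcs), list(flags), turn
        if pc in (0, 1):
            # Correct order: flag[i] = True, then turn = j. swap_order reverses it.
            if (pc == 0) != swap_order:
                new_flags[i] = True
            else:
                new_turn = j
            new_pcs[i] = pc + 1
        elif pc == 2:
            if flags[j] and turn == j:
                continue  # guard false: this process cannot move
            new_pcs[i] = 3
        else:
            # Leave the critical section.
            new_flags[i] = False
            new_pcs[i] = 0
        yield (tuple(new_pcs), tuple(new_flags), new_turn)


def reachable(swap_order=False):
    """Breadth-first search over all interleavings from the initial state."""
    start = ((0, 0), (False, False), 0)
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in successors(queue.popleft(), swap_order):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def mutual_exclusion_holds(swap_order=False):
    return all(pcs != (3, 3) for pcs, _, _ in reachable(swap_order))


print(mutual_exclusion_holds())                 # True
print(mutual_exclusion_holds(swap_order=True))  # False
```

Exhaustive enumeration works here because the two-process state space is tiny. Its growth with the number of processes is what makes the inductive, proof-based approach attractive for the n-process case.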
Spatial structures are particularly suited to the definition of parallel programs, due to their homogeneity. The Movie-based programming Framework allows specification of computations on regular networks of processors, and the visualization of the computation progress as processors are activated. Computations over spatial structures are specified by composing independent views on propagation of control flows and formulae defining local computations. A shape pattern indicates which processors have to be active during a specific phase of the computation. A visit pattern defines the law of propagation for actual processor activation. By combining these types of patterns, we achieve sophisticated forms of specification. In particular, one specifies visitors implementing collective communication schemas widely used in parallel programming: broadcast, gather, scatter and reduction. As a result, automatic generation of visit algorithms adapted to different network configurations is made possible, thus facilitating experimentation with different laws and their visualization.
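One way to picture a visit pattern is as waves of activation spreading over a regular network. The Python sketch below is an assumption for illustration, not the framework's notation. It computes a broadcast on a `rows x cols` mesh as breadth-first phases from a root, so phase k contains the processors activated k hops after the root. It then reuses the same visit tree in reverse to perform a sum reduction. The names `visit_tree` and `reduce_sum` are hypothetical.

```python
def visit_tree(rows, cols, root):
    """Broadcast visit on a 2D mesh: return the activation phases and each
    processor's parent (the neighbour that activated it)."""
    parent = {root: None}
    phases = [[root]]
    while True:
        nxt = []
        for r, c in phases[-1]:
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                n = (r + dr, c + dc)
                if 0 <= n[0] < rows and 0 <= n[1] < cols and n not in parent:
                    parent[n] = (r, c)
                    nxt.append(n)
        if not nxt:
            return phases, parent
        phases.append(nxt)


def reduce_sum(values, rows, cols, root):
    """Reduction: run the broadcast phases backwards, each processor adding
    its partial sum into its parent's, so the root ends with the total."""
    phases, parent = visit_tree(rows, cols, root)
    partial = dict(values)
    for phase in reversed(phases[1:]):
        for node in phase:
            partial[parent[node]] += partial[node]
    return partial[root]


# Usage on a 3 x 3 mesh rooted at the corner (0, 0).
phases, _ = visit_tree(3, 3, (0, 0))
print(len(phases))  # 5 phases: Manhattan distances 0..4 from the corner
values = {(r, c): 3 * r + c for r in range(3) for c in range(3)}
print(reduce_sum(values, 3, 3, (0, 0)))  # 36
```

Because the reduction is the broadcast visit reversed, changing the network shape or the root changes both collectives together. This mirrors, in a small way, the idea of generating visit algorithms for different network configurations.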
The authors describe the motivation, design, and performance of Midway, a programming system for a distributed shared memory multicomputer (DSM) such as an ATM-based cluster, a CM-5, or a Paragon. Midway supports a novel memory consistency model called entry consistency (EC). EC guarantees that shared data become consistent at a processor when the processor acquires a synchronization object known to guard the data. EC is weaker than other models described in the literature, such as processor consistency and release consistency, but it makes possible higher performance implementations of the underlying consistency protocols. Midway programs are written in C, and the association between synchronization objects and data must be made with explicit annotations. As a result, pure entry consistent programs can require more annotations than programs written to other models. Midway also supports the stronger release consistent and processor consistent models at the granularity of individual data items.
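The core guarantee of entry consistency can be shown with a toy, single-threaded simulation. Each lock is annotated with the data it guards. Acquiring the lock brings only those items up to date in the acquirer's local copy, and releasing it publishes them. The Python sketch below assumes hypothetical `Lock` and `Processor` classes. It is not Midway's C interface or its actual update protocol, and it ignores shared (read-only) acquisition.

```python
class Lock:
    """A synchronization object annotated with the data items it guards."""

    def __init__(self, guarded):
        self.guarded = set(guarded)
        self.latest = {}    # guarded values published by the last releaser
        self.holder = None


class Processor:
    def __init__(self, name):
        self.name = name
        self.cache = {}     # this processor's local, possibly stale, copy

    def acquire(self, lock):
        if lock.holder is not None:
            raise RuntimeError("lock already held")
        lock.holder = self
        # Entry consistency: only the data guarded by this lock become consistent.
        self.cache.update(lock.latest)

    def release(self, lock):
        if lock.holder is not self:
            raise RuntimeError("releasing a lock this processor does not hold")
        lock.latest.update({k: self.cache[k] for k in lock.guarded if k in self.cache})
        lock.holder = None

    def write(self, key, value):
        self.cache[key] = value

    def read(self, key):
        return self.cache.get(key)


# Usage: p updates x (guarded) and y (unguarded); q then acquires the lock.
lock_x = Lock({"x"})
p, q = Processor("p"), Processor("q")
p.acquire(lock_x)
p.write("x", 1)
p.write("y", 2)
p.release(lock_x)
q.acquire(lock_x)
print(q.read("x"))  # 1
print(q.read("y"))  # None: y is not associated with lock_x, so it is never propagated
q.release(lock_x)
```

The missing `y` is the price of the weaker model: the program must annotate every shared item with its guarding lock, which is why entry consistent programs can need more annotations than release consistent ones.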
A scheme for adding speculative evaluation to the distributed implementation of a lazy functional language is presented. The scheme assigns reduced scheduling priorities to speculative computations to prevent them from overwhelming processing resources or altering the program's semantics. Scheduling priorities are dynamically adjusted during execution as speculative computations are found to be needed. By terminating computations associated with reclaimed pieces of graph, a distributed reference counting algorithm can be used both to reclaim garbage nodes and to detect and terminate computations that are not required. A scheduling scheme and a load-balancing scheme that operate in the presence of prioritised computations are briefly presented.
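The priority mechanism can be sketched on a single node with a priority queue. Speculative tasks enter at a lower priority, a task is promoted when its result turns out to be needed, and tasks whose results became garbage are dropped. The Python sketch below uses `heapq` with lazy deletion of stale entries. The two priority levels and the `demand`/`cancel` methods are assumptions for illustration. It does not model distribution, load balancing, or the reference counting the abstract relies on to detect unneeded work. Here `cancel` simply stands in for that detection.

```python
import heapq
import itertools

MANDATORY, SPECULATIVE = 0, 1   # lower number = scheduled first (hypothetical levels)


class Scheduler:
    def __init__(self):
        self._heap = []
        self._priority = {}                 # current priority of each pending task
        self._counter = itertools.count()   # FIFO tie-break among equal priorities
        self._cancelled = set()

    def spawn(self, task, speculative=False):
        prio = SPECULATIVE if speculative else MANDATORY
        self._priority[task] = prio
        heapq.heappush(self._heap, (prio, next(self._counter), task))

    def demand(self, task):
        """Promote a pending speculative task once its result is known to be needed."""
        if self._priority.get(task) == SPECULATIVE:
            self.spawn(task)   # re-push at mandatory priority; the old entry goes stale

    def cancel(self, task):
        """Drop a task whose result is no longer reachable."""
        self._cancelled.add(task)

    def next_task(self):
        while self._heap:
            prio, _, task = heapq.heappop(self._heap)
            if task in self._cancelled or self._priority.get(task) != prio:
                continue   # cancelled, already run, or superseded by demand()
            del self._priority[task]
            return task
        return None


# Usage
s = Scheduler()
s.spawn("a")
s.spawn("spec1", speculative=True)
s.spawn("spec2", speculative=True)
s.spawn("b")
s.demand("spec2")   # spec2's value turned out to be needed
s.cancel("spec1")   # spec1's graph node was reclaimed
order = []
task = s.next_task()
while task is not None:
    order.append(task)
    task = s.next_task()
print(order)  # ['a', 'b', 'spec2']
```

Leaving stale heap entries in place and skipping them on pop avoids an expensive search-and-remove whenever a priority changes.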