ISBN (Print): 9780769533087
As a network middleware, Tuplespace provides a powerful paradigm for distributed computing. Building on the TSpaces implementation, we describe the design and implementation of TSPI, a Tuplespace-based parallel programming library that can be called from C. In particular, TSPI supports important features such as reliability, high performance, and computers joining and leaving the computation dynamically. Furthermore, the corresponding parallel program structure is proposed. Compared with MPI, TSPI is simpler and supports dynamic environments and load balancing.
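The abstract does not show TSPI's actual API, so the following is only a rough sketch of the Linda-style coordination model that Tuplespace libraries build on: a minimal, single-process tuple space in C with hypothetical ts_out (deposit) and ts_in (withdraw) operations. In a real library such as TSPI, these calls would go over the network to a shared server.

```c
/* Minimal single-process sketch of the Tuplespace coordination model
 * (Linda-style out/in primitives). The names ts_out/ts_in are
 * hypothetical; TSPI's real API is not shown in the abstract. */
#include <stdio.h>
#include <string.h>

#define MAX_TUPLES 64

struct tuple { char key[32]; int value; int used; };
static struct tuple space[MAX_TUPLES];

/* out: deposit a tuple into the space */
static int ts_out(const char *key, int value) {
    for (int i = 0; i < MAX_TUPLES; i++)
        if (!space[i].used) {
            snprintf(space[i].key, sizeof space[i].key, "%s", key);
            space[i].value = value;
            space[i].used = 1;
            return 0;
        }
    return -1; /* space full */
}

/* in: withdraw a tuple matching key (destructive read) */
static int ts_in(const char *key, int *value) {
    for (int i = 0; i < MAX_TUPLES; i++)
        if (space[i].used && strcmp(space[i].key, key) == 0) {
            *value = space[i].value;
            space[i].used = 0;
            return 0;
        }
    return -1; /* no matching tuple */
}

int main(void) {
    int result;
    ts_out("task", 42);               /* a producer deposits work */
    if (ts_in("task", &result) == 0)  /* a worker withdraws it */
        printf("took tuple: task = %d\n", result);
    return 0;
}
```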
ISBN (Print): 9781728101903
In recent decades, the continuous proliferation of High-Performance Computing (HPC) systems and data centers has increased the demand for expert HPC system designers, administrators, and programmers. For this reason, most universities have introduced courses on HPC systems and parallel programming in their degrees. However, the laboratory assignments of these courses generally use clusters that are owned, managed, and administered by the university. This methodology has been shown to be effective for teaching parallel programming, but using a remote cluster prevents the students from experimenting with the design, setup, and administration of such systems. This paper presents a methodology and framework for teaching HPC systems and parallel programming using a small-scale cluster of single-board computers. These boards are very cheap, their processors are fundamentally very similar to the ones found in HPC systems, and they run Linux out of the box, so they represent a perfect laboratory playground for students to experience assembling a cluster, setting it up, and configuring its system software. We also show that these small-scale clusters can be used as evaluation platforms for both introductory and advanced parallel programming assignments.
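As an example of the introductory assignments such a cluster supports, a standard MPI "hello world" (generic, not taken from the paper) reports each process's rank and host; it assumes an MPI implementation such as MPICH or Open MPI is installed on the boards.

```c
/* Classic first MPI assignment: each process reports its rank and
 * the board it runs on. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);
    printf("rank %d of %d on %s\n", rank, size, host);
    MPI_Finalize();
    return 0;
}
```

On a typical setup this would be built with mpicc hello.c -o hello and launched with mpirun -np 4 -hostfile hosts ./hello, where the hosts file lists the boards of the cluster.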
This paper proposes a parallel programming scheme for the cross-point array with resistive random access memory (RRAM). Synaptic plasticity in unsupervised learning is realized by tuning the conductance of each RRAM cell. Inspired by spike-timing-dependent plasticity (STDP), the programming strength is encoded in the spike firing rate (i.e., pulse frequency) and the overlap time (i.e., duty cycle) of the pre-synaptic and post-synaptic nodes, and is applied simultaneously to all RRAM cells in the cross-point array. Such an approach achieves parallel programming of the entire RRAM array, requiring only local information from the pre-synaptic and post-synaptic nodes of each RRAM cell. As demonstrated by digital peripheral circuits implemented in 65 nm CMOS, the programming time of a 40 kb RRAM array is 84 ns, a 900X speedup compared with the state-of-the-art software approach to sparse coding in image feature extraction.
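Circuit details aside, the core idea (programming dose as the overlap of pre- and post-synaptic pulse trains) can be illustrated with a small simulation; all periods and duty cycles below are invented for illustration and are not the paper's values.

```c
/* Simplified model (not the paper's circuit): a cell receives a
 * programming pulse whenever its pre- and post-synaptic pulse
 * trains are simultaneously high, so frequency and duty cycle
 * together set the effective programming strength. */
#include <stdio.h>

/* a pulse train is high while the phase lies in its duty window */
static int pulse(int t, int period, double duty) {
    return (t % period) < (int)(duty * period);
}

int main(void) {
    int pre_period = 10, post_period = 8;    /* arbitrary time units */
    double pre_duty = 0.5, post_duty = 0.25; /* encode the strength */
    int overlap = 0, T = 1000;

    for (int t = 0; t < T; t++)
        if (pulse(t, pre_period, pre_duty) && pulse(t, post_period, post_duty))
            overlap++; /* both high: the cell is being programmed */

    printf("overlap fraction = %.3f\n", (double)overlap / T);
    return 0;
}
```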
ISBN (Print): 9781479927289
Many parallel and distributed message-passing programs are written in a parametric way over available resources, in particular the number of nodes and their topologies, so that a single parallel program can scale over different environments. This paper presents a parameterised protocol description language, Pabble, which can guarantee safety and progress in a large class of practical, complex parameterised message-passing programs through static checking. Pabble can describe an overall interaction topology, using a concise and expressive notation, designed for a variable number of participants arranged in multiple dimensions. These parameterised protocols in turn automatically generate local protocols for type checking parameterised MPI programs for communication safety and deadlock freedom. Despite the undecidability of endpoint projection and type checking in the underlying parameterised session type theory, our method guarantees the termination of both endpoint projection and type checking.
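Pabble's own protocol syntax is not reproduced in the abstract; the sketch below instead shows, in plain MPI C, the kind of parametric program such a protocol would describe and check: a ring whose neighbour structure is computed from the number of processes, so one program scales to any size.

```c
/* A ring token-passing program that is parametric in the number of
 * processes: neighbours are derived from rank and size, the shape a
 * parameterised protocol could capture once for all sizes.
 * Run with at least two processes. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, token;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;        /* parametric topology: */
    int left  = (rank - 1 + size) % size; /* neighbours depend on size */

    if (rank == 0) {  /* rank 0 injects the token, then awaits its return */
        token = 0;
        MPI_Send(&token, 1, MPI_INT, right, 0, MPI_COMM_WORLD);
        MPI_Recv(&token, 1, MPI_INT, left, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("token back at 0 after %d hops\n", token);
    } else {          /* everyone else forwards it, incremented */
        MPI_Recv(&token, 1, MPI_INT, left, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        token++;
        MPI_Send(&token, 1, MPI_INT, right, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
```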
ISBN (Print): 9781509036820
This paper presents an experience of problem-based learning in a parallel programming course. The course covers the basics of parallel programming, from methodological and technological aspects to the analysis and design of parallel algorithms. The students work on an optimization problem in the field of parallel computing: the execution time and the energy consumption of a simplified master-slave scheme on a simplified heterogeneous system are optimized, treating it as a bi-objective optimization problem, which is addressed with sequential, shared-memory, message-passing, and hybrid parallel programming. In this way, the students follow the various parts of the syllabus of the course by working on a problem that combines topics studied in previous courses (green computing, computational systems architecture, optimization, heuristics), which contributes to a deeper understanding of those topics and motivates the introduction of new concepts.
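A minimal sketch of the two objectives such students optimize, under a deliberately simplified model with invented numbers (not the course's actual data): each worker receives a share of tasks, the makespan is set by the slowest worker, and energy is the sum of busy power over time.

```c
/* Evaluating one candidate task assignment under a toy model of a
 * heterogeneous master-slave system: worker i gets n[i] tasks, takes
 * t[i] seconds per task, and draws p[i] watts while busy. */
#include <stdio.h>

#define W 3 /* number of heterogeneous workers */

int main(void) {
    double t[W] = {1.0, 1.5, 2.5};  /* seconds per task (invented) */
    double p[W] = {9.0, 6.0, 3.5};  /* watts while computing (invented) */
    int    n[W] = {50, 33, 20};     /* one candidate assignment */

    double makespan = 0.0, energy = 0.0;
    for (int i = 0; i < W; i++) {
        double busy = n[i] * t[i];
        if (busy > makespan) makespan = busy;  /* time: slowest worker */
        energy += p[i] * busy;                 /* energy: busy power summed */
    }
    printf("time = %.1f s, energy = %.1f J\n", makespan, energy);
    return 0;
}
```

Different assignments trade makespan against energy, which is what makes the problem bi-objective: no single assignment generally minimizes both.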
ISBN (Print): 9781581134094
Explicit multithreading (XMT) is a parallel programming approach for exploiting on-chip parallelism. XMT introduces a computational framework with (1) a simple programming style that relies on fine-grained PRAM-style algorithms, and (2) hardware support for low-overhead parallel threads, scalable load balancing, and efficient synchronization. The missing link between the algorithmic-programming level and the architecture level is provided by the first prototype XMT compiler. This paper also takes this new opportunity to evaluate the overall effectiveness of the interaction between the programming model and the hardware, and to enhance its performance where needed by incorporating new optimizations into the XMT compiler. We present a wide range of applications which, written in XMT, obtain significant speedups relative to the best serial programs. We show that XMT is especially useful for more advanced applications with dynamic, irregular access patterns, while for regular computations we demonstrate performance gains that scale up to much higher levels than have been demonstrated before for on-chip systems.
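XMT's own XMTC syntax is not shown in the abstract; as a rough portable analogy, the C/OpenMP sketch below performs a fine-grained, PRAM-style array compaction in which each thread atomically claims an output slot, similar in spirit to how XMT's prefix-sum primitive supports lock-free coordination among many short threads.

```c
/* Fine-grained PRAM-style compaction: threads claim output slots via
 * an atomic fetch-and-increment, an analogy (not XMTC itself) to
 * XMT's prefix-sum based thread coordination. */
#include <omp.h>
#include <stdio.h>

int main(void) {
    int in[8] = {3, 0, 7, 0, 0, 4, 9, 0};
    int out[8], count = 0;

    #pragma omp parallel for
    for (int i = 0; i < 8; i++) {
        if (in[i] != 0) {
            int slot;
            /* atomically claim the next free output index */
            #pragma omp atomic capture
            slot = count++;
            out[slot] = in[i];
        }
    }
    printf("kept %d nonzero elements\n", count);
    return 0;
}
```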
ISBN (Print): 9783642132162
The process-thread hybrid programming paradigm is commonly employed in SMP clusters. XPFortran, a parallel programming language that specifies a set of compiler directives and library routines, can be used to realize process-level parallelism in distributed memory systems. In this paper, we introduce hybrid parallel programming with XPFortran for SMP clusters, in which thread-level parallelism is realized by OpenMP. We present the language support and compiler implementation of OpenMP directives in XPFortran, and share some of our experiences in XPFortran-OpenMP hybrid programming. For nested loops parallelized by process-thread hybrid programming, it is common practice to use process parallelization for outer loops and thread parallelization for inner ones. However, we have found that in some cases it is possible to write an XPFortran-OpenMP hybrid program the reverse way, i.e., OpenMP outside, XPFortran inside. Our evaluation results show that this programming style sometimes delivers better performance than the traditional one. We therefore recommend applying hybrid parallelization flexibly.
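Since the XPFortran directives themselves are not shown here, the following MPI+OpenMP sketch in C illustrates the traditional arrangement the abstract describes: process parallelism on the outer loop and thread parallelism on the inner one (the paper's reversed style would swap the two levels).

```c
/* Traditional hybrid nesting: the outer loop is block-distributed
 * across MPI processes, the inner loop is split across OpenMP
 * threads. An analogy in C, not XPFortran syntax. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N 1024

int main(int argc, char **argv) {
    static double a[N][N];
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* outer loop: one contiguous block of rows per process ... */
    for (int i = rank * N / size; i < (rank + 1) * N / size; i++) {
        /* ... inner loop: columns shared among the process's threads
         * (the reversed style would thread the outer level instead) */
        #pragma omp parallel for
        for (int j = 0; j < N; j++)
            a[i][j] = (double)i * j;
    }

    if (rank == 0) printf("done: a[0][%d] = %f\n", N - 1, a[0][N - 1]);
    MPI_Finalize();
    return 0;
}
```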
ISBN (Print): 9781538683194
Parallel programming can be extremely challenging. Programming models have been proposed to simplify this task, but wide acceptance of these remains elusive for many reasons, including the demand for greater accessibility and productivity. In this paper, we introduce a parallel programming model and framework called CharmPy, based on the Python language. CharmPy builds on Charm++ and runs on top of its C++ runtime. It presents several unique features in the form of a simplified model and API, increased flexibility, and the ability to write everything in Python. CharmPy is a high-level model based on the paradigm of distributed migratable objects. It retains the benefits of the Charm++ runtime, including dynamic load balancing, an asynchronous execution model with automatic overlap of communication and computation, high performance, and scalability from laptops to supercomputers. By being Python-based, CharmPy also benefits from modern language features, access to popular scientific computing and data science software, and interoperability with existing technologies like C, Fortran, and OpenMP. To illustrate the simplicity of the model, we show how to implement a distributed parallel map function based on the Master-Worker pattern using CharmPy, with support for asynchronous concurrent jobs. We also present performance results from running stencil code and molecular dynamics mini-apps written fully in Python on the Blue Waters and Cori supercomputers. For stencil3d, we show performance similar to an equivalent MPI-based program, and significantly better performance for imbalanced computations. Using Numba to JIT-compile the critical parts of the code, we obtain performance for both mini-apps similar to the equivalent C++ code.
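The paper's example is written in Python with CharmPy; as a language-neutral sketch of the same Master-Worker map pattern (not CharmPy's API), the MPI C program below has rank 0 hand out items and gather results while the workers apply the mapped function.

```c
/* Master-Worker map, the pattern the paper's distributed parallel
 * map is built on: rank 0 distributes items and collects results. */
#include <mpi.h>
#include <stdio.h>

static int f(int x) { return x * x; } /* function being mapped */

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) { /* master: send item w to worker w, gather replies */
        for (int w = 1; w < size; w++)
            MPI_Send(&w, 1, MPI_INT, w, 0, MPI_COMM_WORLD);
        for (int w = 1; w < size; w++) {
            int r;
            MPI_Recv(&r, 1, MPI_INT, w, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("f(%d) = %d\n", w, r);
        }
    } else {         /* worker: receive one item, apply f, reply */
        int x, r;
        MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        r = f(x);
        MPI_Send(&r, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
```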
ISBN (Print): 9780769551173
This paper introduces an aspect-oriented library aimed at supporting efficient execution of Java applications on multi-core systems. The library is coded in AspectJ and provides a set of parallel programming abstractions that mimics the OpenMP standard. The library supports the migration of sequential Java codes to multi-core machines with minor changes to the base code, intrinsically supports the sequential semantics of OpenMP, and provides improved integration with object-oriented mechanisms. The aspect-oriented nature of the library enables the encapsulation of parallelism-related code into well-defined modules, making the parallelisation and the maintenance of large-scale Java applications more manageable. Furthermore, the library can be used with plain Java annotations and can be easily extended with application-specific mechanisms to tune application performance. The library delivers competitive performance in comparison with traditional parallel programming in Java, and enhances programmability, since it allows independent development of parallelism-related code.
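For reference, the kind of OpenMP construct such a library mimics looks like the following in C: a worksharing parallel loop with a reduction, which the AspectJ library would express through Java annotations and aspects rather than pragmas (the example is generic, not taken from the paper).

```c
/* The C/OpenMP construct being mimicked: a worksharing loop whose
 * iterations are divided among threads, with a reduction combining
 * their partial sums. */
#include <omp.h>
#include <stdio.h>

int main(void) {
    const int n = 1000000;
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += 1.0 / (i + 1.0);

    printf("harmonic(%d) = %f\n", n, sum);
    return 0;
}
```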