ISBN (print): 9781905088416
This contribution presents a computational framework for simulation and gradient-based structural optimization of geometrically nonlinear and large-scale structural finite element models. CAGD-free optimization methods have been developed to integrate shape optimization in an early stage of design and to reduce the related modelling effort. To overcome the problem of an increasing numerical cost due to the large design space, the design sensitivities for objectives and constraints are evaluated via adjoint formulations. A new parallel computation strategy for sensitivity evaluation is presented which takes advantage of a completely parallelized simulation and optimization environment. Two application examples illustrate the method and demonstrate the high parallel efficiency.
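The cost argument for the adjoint approach can be made concrete with a toy example. The sketch below is plain Python with a diagonal "stiffness matrix" K(s) = diag(s) standing in for a real FE system; all names and the compliance objective are illustrative assumptions, not the authors' code. It shows how one state solve plus one adjoint solve yield the gradient of the objective with respect to every design variable at once:

```python
# Toy sketch of adjoint sensitivity evaluation (illustrative only).
# K(s) = diag(s) stands in for a real FE stiffness matrix, with
# compliance J = f . u as the objective.

def solve_diag(K_diag, f):
    # state solve: K u = f (trivial for a diagonal K)
    return [fi / ki for ki, fi in zip(K_diag, f)]

def compliance(f, u):
    # objective J(u) = f . u
    return sum(fi * ui for fi, ui in zip(f, u))

def adjoint_sensitivities(K_diag, f):
    # one state solve plus one adjoint solve give dJ/ds_i for ALL
    # design variables; the cost is independent of the design-space size
    u = solve_diag(K_diag, f)
    lam = solve_diag(K_diag, f)  # adjoint: K^T lam = dJ/du = f (K symmetric)
    # dJ/ds_i = -lam^T (dK/ds_i) u; here dK/ds_i has a single diagonal 1
    return [-li * ui for li, ui in zip(lam, u)]
```

For this diagonal case the adjoint result -u_i^2 agrees with the analytic derivative of J = sum(f_i^2 / s_i), namely -f_i^2 / s_i^2, and the number of solves stays at two no matter how large the design space grows.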
ISBN (print): 9783642132162
The process-thread hybrid programming paradigm is commonly employed in SMP clusters. XPFortran, a parallel programming language that specifies a set of compiler directives and library routines, can be used to realize process-level parallelism in distributed memory systems. In this paper, we introduce hybrid parallel programming with XPFortran to SMP clusters, in which thread-level parallelism is realized by OpenMP. We present the language support and compiler implementation of OpenMP directives in XPFortran, and share some of our experiences in XPFortran-OpenMP hybrid programming. For nested loops parallelized by process-thread hybrid programming, it is common practice to use process parallelization for outer loops and thread parallelization for inner ones. However, we have found that in some cases it is possible to write an XPFortran-OpenMP hybrid program the other way around, i.e., OpenMP outside, XPFortran inside. Our evaluation results show that this programming style sometimes delivers better performance than the traditional one. We therefore recommend applying hybrid parallelization flexibly.
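The two nesting orders can be sketched with a toy nested loop. The following plain-Python analogue is not XPFortran: both levels use threads here so the sketch stays self-contained, whereas in the paper one level is process-parallel. It only contrasts the "outer parallel" structure with the reversed "inner parallel" one:

```python
from concurrent.futures import ThreadPoolExecutor

def work(i, j):
    # stand-in loop body
    return i * 10 + j

def outer_parallel(n, m):
    # conventional order: parallelize the outer loop,
    # run the inner loop serially inside each task
    with ThreadPoolExecutor() as pool:
        rows = pool.map(lambda i: [work(i, j) for j in range(m)], range(n))
        return list(rows)

def inner_parallel(n, m):
    # reversed order: serial outer loop, parallel inner loop
    with ThreadPoolExecutor() as pool:
        return [list(pool.map(lambda j, i=i: work(i, j), range(m)))
                for i in range(n)]
```

Both orders compute the same result; which one performs better depends on loop extents, granularity, and the runtime, which is exactly why the paper argues for choosing flexibly.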
ISBN (print): 9781538683194
Parallel programming can be extremely challenging. Programming models have been proposed to simplify this task, but wide acceptance of these remains elusive for many reasons, including the demand for greater accessibility and productivity. In this paper, we introduce a parallel programming model and framework called CharmPy, based on the Python language. CharmPy builds on Charm++, and runs on top of its C++ runtime. It presents several unique features in the form of a simplified model and API, increased flexibility, and the ability to write everything in Python. CharmPy is a high-level model based on the paradigm of distributed migratable objects. It retains the benefits of the Charm++ runtime, including dynamic load balancing, an asynchronous execution model with automatic overlap of communication and computation, high performance, and scalability from laptops to supercomputers. By being Python-based, CharmPy also benefits from modern language features, access to popular scientific computing and data science software, and interoperability with existing technologies like C, Fortran and OpenMP. To illustrate the simplicity of the model, we show how to implement a distributed parallel map function based on the Master-Worker pattern using CharmPy, with support for asynchronous concurrent jobs. We also present performance results from running stencil code and molecular dynamics mini-apps fully written in Python on the Blue Waters and Cori supercomputers. For stencil3d, we show performance similar to an equivalent MPI-based program, and significantly improved performance for imbalanced computations. Using Numba to JIT-compile the critical parts of the code, we show performance for both mini-apps similar to the equivalent C++ code.
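The Master-Worker map pattern the authors describe can be sketched, structurally, with Python's standard library. This is an analogue of the pattern only, not the CharmPy API: a thread-backed pool stands in for distributed migratable chares, and all names are illustrative.

```python
from multiprocessing.dummy import Pool  # thread-backed pool keeps the sketch portable

def parallel_map(func, items, workers=4):
    """Master-worker map: the master hands out items, workers compute
    them concurrently, and ordered results come back to the master.
    CharmPy would distribute the workers across nodes; threads stand in here."""
    with Pool(workers) as pool:
        async_result = pool.map_async(func, items)  # asynchronous job submission
        return async_result.get()                   # master collects the results
```

The asynchronous submission (`map_async`) mirrors the abstract's "support for asynchronous concurrent jobs": the master could launch several maps before collecting any of them.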
ISBN (print): 9789819735556; 9789819735563
The use of modern browsers reveals itself as more and more essential to the world. Features like Web Workers are being adopted across the most used browsers on the Internet, enabling performance enhancements in web applications and, as a consequence, the execution of tasks with higher computational demand inside the browser. This work presents a technique for task parallelization using Web Workers, with an algorithm for crossword generation executed in a browser context as a case study. The results show even superlinear speedups for a parallel version of the algorithm.
ISBN (print): 9781467313513
Parallel programming is an important issue for current multi-core processors and necessary for new generations of many-core architectures. This includes processors, computers, and clusters. However, the introduction of parallel programming in undergraduate courses demands new efforts to prepare students for this new reality. This paper describes an experiment on a traditional Computer Science course over a two-year period. The main focus is the question of when to introduce parallel programming models in order to improve the quality of learning. The goal is to propose a method of introducing parallel programming based on OpenMP (a shared-variable model) and MPI (a message-passing model). Results show that the best learning outcomes are achieved when the OpenMP model is introduced before the MPI model. The main contribution of this paper is the proposed method, which correlates several concepts such as concurrency, parallelism, speedup, and scalability to improve student motivation and learning.
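Two of the concepts the method correlates, speedup and scalability, can be made concrete for students with Amdahl's law. A minimal sketch, where the serial fraction s and processor count p are the standard textbook symbols, not notation from the paper:

```python
def speedup(serial_fraction, processors):
    # Amdahl's law: S(p) = 1 / (s + (1 - s) / p)
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

def efficiency(serial_fraction, processors):
    # efficiency = S(p) / p; falling efficiency signals limited scalability
    return speedup(serial_fraction, processors) / processors
```

A perfectly parallel code (s = 0) gives linear speedup, while even s = 0.5 caps the speedup at 2 no matter how many processors are added, which is the usual classroom route from "speedup" to "scalability".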
ISBN (print): 0818684275
The paper discusses the relationships between hierarchically composite MPP architectures and the software technology derived from the structured parallel programming methodology, in particular the architectural support for successive modular refinements of parallel applications, and the architectural support for the parallel programming paradigms and their combinations. The structured parallel programming methodology referred to here is an application of the Skeletons model. The considered hierarchically composite architectures are MPP machine models for PetaFlops computing, composed of proper combinations of current architectural models of different granularities, where the Processors-In-Memory model is adopted at the finest granularity level. The methodologies are discussed with reference to the current PQE2000 Project on MPP general-purpose systems.
ISBN (print): 9780769533087
As a network middleware, Tuplespace provides a powerful way to do distributed computing. Building on the TSpaces implementation, we describe the design and implementation of TSPI, a Tuplespace-based parallel programming library that can be called from C. In particular, TSPI supports important functions such as reliability, high performance, and computers joining and leaving dynamically. Furthermore, the corresponding parallel program structure is proposed. Compared with MPI, TSPI is simple, and it supports dynamic environments and load balancing.
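The tuplespace operations underlying TSPI can be illustrated with a minimal in-memory sketch. Python is used here for brevity, and the operation names follow the classic Linda model (out/rd/in), not TSPI's actual C API; a real tuplespace such as TSpaces adds networking, blocking semantics, and fault tolerance on top of this.

```python
class TupleSpace:
    """Minimal in-memory sketch of Linda-style tuplespace operations."""

    def __init__(self):
        self._tuples = []

    def out(self, tup):
        # write a tuple into the space
        self._tuples.append(tup)

    def rd(self, pattern):
        # non-destructive read; None acts as a wildcard field
        for t in self._tuples:
            if self._match(t, pattern):
                return t
        return None

    def inp(self, pattern):
        # destructive take ("in" in Linda; renamed, as `in` is a Python keyword)
        t = self.rd(pattern)
        if t is not None:
            self._tuples.remove(t)
        return t

    @staticmethod
    def _match(t, pattern):
        return len(t) == len(pattern) and all(
            p is None or p == v for v, p in zip(t, pattern))
```

Workers coordinating through such a space never address each other directly, which is what makes dynamic joining and leaving natural in this model.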
ISBN (print): 9781728101903
In recent decades, the continuous proliferation of High-Performance Computing (HPC) systems and data centers has augmented the demand for expert HPC system designers, administrators, and programmers. For this reason, most universities have introduced courses on HPC systems and parallel programming in their degrees. However, the laboratory assignments of these courses generally use clusters that are owned, managed, and administrated by the university. This methodology has been shown to be effective for teaching parallel programming, but using a remote cluster prevents the students from experimenting with the design, setup, and administration of such systems. This paper presents a methodology and framework for teaching HPC systems and parallel programming using a small-scale cluster of single-board computers. These boards are very cheap, their processors are fundamentally very similar to the ones found in HPC systems, and they are ready to execute Linux out of the box, so they represent a perfect laboratory playground for students to experience assembling a cluster, setting it up, and configuring its system software. Also, we show that these small-scale clusters can be used as evaluation platforms for both introductory and advanced parallel programming assignments.
This paper proposes a parallel programming scheme for the cross-point array with resistive random access memory (RRAM). Synaptic plasticity in unsupervised learning is realized by tuning the conductance of each RRAM cell. Inspired by spike-timing-dependent plasticity (STDP), the programming strength is encoded into the spike firing rate (i.e., pulse frequency) and the overlap time (i.e., duty cycle) of the pre-synaptic node and post-synaptic node, and is simultaneously applied to all RRAM cells in the cross-point array. Such an approach achieves parallel programming of the entire RRAM array, requiring only local information from the pre-synaptic and post-synaptic nodes of each RRAM cell. As demonstrated by digital peripheral circuits implemented in 65 nm CMOS, the programming time of a 40 kb RRAM array is 84 ns, indicating a 900X speedup compared to the state-of-the-art software approach of sparse coding in image feature extraction.
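A toy behavioural model can clarify why the scheme parallelizes. Everything below (the function names, the multiplicative update rule, the learning rate) is an illustrative assumption, not the paper's circuit; the only point carried over is that each cell's update depends solely on its own pre- and post-synaptic signals, so every cell can be programmed at the same time:

```python
def overlap_update(pre_rate, post_rate, duty_cycle, g,
                   g_min=0.0, g_max=1.0, eta=0.01):
    # programming strength grows with the spike rates and the pulse overlap
    # (duty cycle); the cell conductance g is nudged and clipped to its range
    strength = pre_rate * post_rate * duty_cycle * eta
    return min(g_max, max(g_min, g + strength))

def program_array(pre_rates, post_rates, duty, G, eta=0.01):
    # cell (i, j) uses only its pre-synaptic row rate and post-synaptic
    # column rate -> all updates are independent, hence fully parallel
    return [[overlap_update(pre_rates[i], post_rates[j], duty,
                            G[i][j], eta=eta)
             for j in range(len(post_rates))]
            for i in range(len(pre_rates))]
```

Because no cell reads any other cell's state, the loop nest could be applied in one shot across the whole cross-point array, which is the property the hardware exploits.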
ISBN (print): 9781479927289
Many parallel and distributed message-passing programs are written in a parametric way over available resources, in particular the number of nodes and their topologies, so that a single parallel program can scale over different environments. This paper presents a parameterised protocol description language, Pabble, which can guarantee safety and progress in a large class of practical, complex parameterised message-passing programs through static checking. Pabble can describe an overall interaction topology, using a concise and expressive notation, designed for a variable number of participants arranged in multiple dimensions. These parameterised protocols in turn automatically generate local protocols for type checking parameterised MPI programs for communication safety and deadlock freedom. In spite of the undecidability of endpoint projection and type checking in the underlying parameterised session type theory, our method guarantees the termination of both endpoint projection and type checking.