ISBN: 9781509035267 (print)
The map-reduce parallel programming model has become extremely popular in the big data community. Many big data workloads can benefit from the enhanced performance offered by supercomputers. LLMapReduce provides the familiar map-reduce parallel programming model to big data users running on a supercomputer. LLMapReduce dramatically simplifies map-reduce programming by providing simple parallel programming capability in one line of code. LLMapReduce supports all programming languages and many schedulers. LLMapReduce can work with any application without the need to modify the application. Furthermore, LLMapReduce can overcome scaling limits in the map-reduce parallel programming model via options that allow the user to switch to the more efficient single-program-multiple-data (SPMD) parallel programming model. These features allow users to reduce the computational overhead by more than 10x compared to standard map-reduce for certain applications. LLMapReduce is widely used by hundreds of users at MIT. Currently LLMapReduce works with several schedulers such as SLURM, Grid Engine and LSF.
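The difference between the two launch models mentioned above can be illustrated with a minimal sketch. It is not LLMapReduce itself and uses none of its options: the function names (`mapper`, `reducer`, `run_map_reduce`, `run_spmd`) are hypothetical, the word-count workload is an assumed example, and the "launches" are simulated sequentially in one process. The point is only that a map-reduce style run starts the application once per input, while an SPMD style run starts it once per worker and lets each worker loop over its own share of the inputs.

```python
from collections import Counter
from functools import reduce


def mapper(text):
    """Hypothetical map step: count the words in one input."""
    return Counter(text.split())


def reducer(left, right):
    """Hypothetical reduce step: merge two partial word counts."""
    return left + right


def run_map_reduce(inputs):
    """Map-reduce style: one application launch per input."""
    launches = 0
    partials = []
    for text in inputs:
        launches += 1  # each input pays the start-up cost once
        partials.append(mapper(text))
    return reduce(reducer, partials, Counter()), launches


def run_spmd(inputs, num_workers):
    """SPMD style: one launch per worker; each worker handles many inputs."""
    launches = 0
    partials = []
    for rank in range(num_workers):
        launches += 1  # start-up cost is paid once per worker
        local = Counter()
        # Cyclic distribution: worker `rank` takes every num_workers-th input.
        for text in inputs[rank::num_workers]:
            local += mapper(text)
        partials.append(local)
    return reduce(reducer, partials, Counter()), launches
```

Both functions return the same merged result; only the number of launches differs, which is where the start-up overhead of an application would be saved.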
ISBN: 9781509006700 (print)
This paper presents an experiment on mapping algorithmic structure patterns to implementation patterns. The selection of implementation patterns and data structures must take into account the parallel platform for which they are developed, and it also affects the performance of the program. The experimental results support the need for adaptive patterns in parallel programming, so that software can be developed to run on different parallel environments.
High-Performance Computing (HPC) is increasingly required by scientists in all branches in order to achieve their desired research results. However, carrying out research in an HPC center can be difficult for users who are new to parallel programming. These users need support in parallelizing and optimizing their codes, in order to obtain reliable results as well as make efficient use of the available resources. For this purpose, a novel code analyzer for automatic parallelization of sequential codes is presented, focused on the resource management of a supercomputing center, where efficient scheduling decisions and energy saving become key challenges. This paper introduces the analyzer and demonstrates the importance of using it, especially in terms of efficiency, when running parallel codes in HPC centers.
According to recent studies, the current state of Science, Technology, Engineering, and Mathematics (STEM) education in the U.S. has not been impressive. In this paper, we introduce an interdisciplinary learner-centered computational experience in nanotechnology for undergraduate STEM students. Three important tasks associated with this work are applying power-aware data-regrouping based parallel computation to analyze nanoscale materials; updating and/or developing “hands-on computational experience in nanotechnology” courses; and assessing students' learning experience and interest in high performance computing (HPC) simulation for nanotechnology. The proposed activities have the potential to improve the motivation, engagement, and learning of STEM students, enhancing the Engaged Student Learning environment. The tasks described in this work incorporate many-core computing, nanomanufacturing, and energy savings, and are aimed at advancing HPC with a fundamental understanding of nanostructured fiber behavior, which in turn will allow the use of effective materials for renewable energy conversion. Activities that address industry-oriented real-world problems will attract new students to STEM education, as the job market in related fields is growing.
ISBN: 9781509040940 (print)
In this article, an efficient parallel algorithm for a hybrid CPU-GPU platform is proposed to enable large-scale molecular dynamics (MD) simulations of the metal solidification process. The program that implements the parallel algorithm on the hybrid CPU-GPU platform performs better than the program based on previous algorithms running on the CPU cluster platform, and its total execution time is markedly lower. In particular, because of the modified load balancing method, the neighbor list update time is approximately zero. The parallel program based on the CUDA+OpenMP model shows a speedup of a factor of 6 over the 16-core calculation of the parallel program based on the MPI+OpenMP model, and the optimal computational efficiency is achieved for a simulation system of 10,000,000 aluminum atoms. Finally, a comparison of the theoretical and experimental results shows good consistency between them, which verifies the correctness of the algorithm.
This paper extends the existing theory on maximally permissive liveness-enforcing supervision of resource allocation systems (RAS) so that it can handle RAS with reader / writer (R/W-) locks. A key challenge that is posed by this new RAS class stems from the fact that the underlying state space is not necessarily finite. We effectively address this obstacle by taking advantage of special structure that exists in the set of inadmissible states and enables a finite representation of this set through its minimal elements.
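The idea of representing an infinite set through its minimal elements can be sketched as follows. This is only an illustration under stated assumptions, not the paper's construction: states are assumed to be tuples of non-negative integers ordered componentwise, the inadmissible set is assumed to be upward-closed under that order, and the helper names (`dominates`, `minimal_elements`, `is_inadmissible`) are hypothetical. Under those assumptions, a state is inadmissible exactly when it dominates at least one minimal element, so a finite list of minimal elements suffices for the membership test.

```python
def dominates(state, other):
    """True if `state` is componentwise greater than or equal to `other`."""
    return all(s >= o for s, o in zip(state, other))


def minimal_elements(states):
    """Keep only the states that dominate no other state in the collection."""
    states = set(states)
    return {
        s for s in states
        if not any(t != s and dominates(s, t) for t in states)
    }


def is_inadmissible(state, minimal):
    """Membership test for the (assumed upward-closed) inadmissible set."""
    return any(dominates(state, m) for m in minimal)
```

For example, from the sample states `(2, 1)`, `(3, 1)`, `(1, 3)` and `(2, 2)`, only `(2, 1)` and `(1, 3)` are kept, and any larger state such as `(5, 4)` is then recognized as inadmissible without being stored explicitly.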
ISBN: 9781509036837 (print)
Suzaku is a pattern programming framework that enables programmers to create pattern-based parallel MPI programs without writing the MPI message-passing code implicit in the patterns. The purpose of this framework is to simplify message-passing programming and create better structured programs based upon established parallel design patterns. The focus for developing Suzaku is on teaching parallel programming. This paper covers the main features of Suzaku and describes our experiences using it in parallel programming classes.
With multi-core processors, parallel programming has taken on greater importance. Traditional parallel programming techniques based on critical sections controlled by locking have several well-known drawbacks. To allow for more efficient parallel programming with higher performance, the IBM POWER8 (TM) processor implements a hardware transactional memory facility. Transactional memory allows groups of load and store operations to execute and commit as a single atomic unit without the use of traditional locks, thereby improving performance and simplifying the parallel programming model. The POWER8 transactional memory facility provides a robust capability to execute transactions that can survive interrupts. It also allows non-speculative accesses within transactions, which facilitates debugging and thread-level speculation. Unique challenges caused by implementing transactional memory on top of the Power ISA (Instruction Set Architecture) weakly consistent memory model are addressed. We detail the Power ISA transactional memory architecture, the POWER8 implementation of this architecture, and two practical uses of this architecture, Transactional Lock Elision (TLE) and Thread-Level Speculation (TLS), and provide performance results for these uses.
Networks are commonly used to model traffic patterns, social interactions, or web pages. The vertices in a network do not possess the same characteristics: some vertices are naturally more connected and some vertices can be more important. Closeness centrality (CC) is a global metric that quantifies how important a given vertex is in the network. When the network is dynamic and keeps changing, the relative importance of the vertices also changes. Even with the best known algorithm for computing the CC scores, recomputing them from scratch after each modification is impractical. In this paper, we propose STREAMER, a distributed memory framework for incrementally maintaining the closeness centrality scores of a network upon changes. It leverages pipelined, replicated parallelism and SpMM-based BFSs, and it takes NUMA effects into account. It makes maintaining the closeness centrality values of real-life networks with millions of interactions significantly faster and obtains almost linear speedups on a cluster of 64 nodes with 8 threads per node. (C) 2015 Elsevier B.V. All rights reserved.
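To make the cost of recomputation concrete, the following minimal sketch computes closeness centrality from scratch with one breadth-first search per vertex. It is not STREAMER and does not implement its incremental, pipelined, or SpMM-based techniques. It assumes an unweighted, undirected graph given as an adjacency dictionary, and it assumes the common definition in which the score of a vertex is the reciprocal of the sum of its shortest-path distances to the vertices it can reach; the abstract does not state which variant the paper uses. The function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(adj):
    """Closeness of each vertex: 1 / (sum of distances to reachable vertices)."""
    scores = {}
    for v in adj:
        farness = sum(bfs_distances(adj, v).values())
        # An isolated vertex has farness 0; its score is defined as 0 here.
        scores[v] = 1.0 / farness if farness > 0 else 0.0
    return scores
```

Each call to `closeness` runs a full BFS from every vertex, so repeating it after every edge insertion or deletion costs a complete pass over the graph per vertex; this is the from-scratch cost that incremental maintenance aims to avoid.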
ISBN: 9781509053087 (print)
Graph analytics is an important and computationally demanding class of data analytics. It is essential to balance scalability, ease-of-use and high performance in large scale graph analytics. As such, it is necessary to hide the complexity of parallelism, data distribution and memory locality behind an abstract interface [2].