ISBN: 9781510801011 (print)
CHARM++ is a general-purpose framework for developing high-performance parallel applications [1]. Applications written using CHARM++ run at scales spanning mobile devices [2], multi-core processors, multi-processor NUMA workstations and servers, networked clusters, and the world's largest supercomputers. A selection of CHARM++ applications in production scientific usage is shown in Table 1. These applications consume the second most execution cycles on NSF computing resources, after MPI.
Data-flow is a natural approach to parallelism. However, describing dependencies and control between fine-grained data-flow tasks can be complex and can introduce unwanted overheads. TALM (TALM is an Architecture and Language for Multi-threading) introduces a user-defined coarse-grained parallel data-flow model, where programmers identify code blocks, called super-instructions, to be run in parallel and connect them in a data-flow graph. TALM has been implemented as a hybrid Von Neumann/data-flow execution system: the Trebuchet. We have observed that TALM's usefulness largely depends on how programmers specify and connect super-instructions. Thus, we present Couillard, a full compiler that creates, from an annotated C program, a data-flow graph and the C code corresponding to each super-instruction. We show that our toolchain allows one to benefit from data-flow execution and explore sophisticated parallel programming techniques with little effort. To evaluate our system, we have executed a set of real applications on a large multi-core machine. Comparison with popular parallel programming methods shows competitive speedups, while providing an easier parallel programming approach. More specifically, for an application that follows the wavefront method, running with big inputs, Trebuchet achieved up to 4.7% speedup over Intel(R) TBB's novel flow-graph approach and up to 44% over OpenMP. (C) 2014 Elsevier B.V. All rights reserved.
ISBN: 9781665438513 (print)
This Innovative Practice Full Paper presents BlocklyPar, a set of three tutorial games to move from sequential to parallel programming using a block-based visual language. Block-based tutorial games are attractive tools for introducing programming to novices. A few existing tools can express multiple tasks running at the same time, but none of them addresses the parallel programming concepts and terms used in the field of parallel computing. Our tutorial games are targeted at first-year Computer Science students, as a resource to anticipate parallel computing using a self-taught approach with engaging challenges. The challenges involve university students' day-to-day tasks to make the games more meaningful for the audience, thus supporting the idea that everyday tasks can benefit from parallel approaches. The first game introduces the programming environment and the sequential blocks; the second introduces the concepts of tasks, resource allocation, and parallel task execution; and the third presents the concepts of computational load distribution and performance metrics for evaluating improvements in a parallel solution. The concepts are expressed through animation components and three new programming blocks. We have conducted preliminary tests with Computer Science students to evaluate the platform usage and the parallel programming concepts assessed. The results suggest that the games contribute to students' learning of parallelism as an extension of practicing sequential programming. They can also motivate students to design parallel solutions that exploit today's multi-core and multiprocessor computers.
Over the years, several Parallel Programming Models (PPMs) have supported the abstraction of programming complexity for parallel computer systems. However, few studies aim to evaluate the productivity reached by such abstractions, since this is a complex task that involves human beings. Several studies have sought to develop predictive methods to estimate the effort required to develop software applications. In order to evaluate the reliability of such metrics, it is necessary to assess their accuracy across different programming paradigms. In this work, we used the data of an experiment conducted with beginners in parallel programming to determine the effort required for implementing stream parallelism using FastFlow, SPar, and TBB. Our results show that some traditional software effort estimation models, such as COCOMO II, fall short. In contrast, Planning Poker could contribute toward a parallel-aware effort model.
ISBN: 9781479941162 (print)
This paper presents our experience using a research-infused teaching approach in an undergraduate parallel programming course. The research-teaching nexus is applied at various levels, first through research-led teaching of core parallel programming concepts, as well as teaching the latest developments from the affiliated research group. The bulk of the course, however, focuses on the student-driven research-based and research-tutored teaching approaches, where students actively participate in groups on research projects; students are fully immersed in the learning activity of their respective project, while at the same time participating in discussions of wider parallel programming topics across other groups. This intimate affiliation between the undergraduate course and the research group results in a wide range of benefits for all those involved.
This paper proposes a parallel programming scheme for the cross-point array with resistive random access memory (RRAM). Synaptic plasticity in unsupervised learning is realized by tuning the conductance of each RRAM cell. Inspired by spike-timing-dependent plasticity (STDP), the programming strength is encoded into the spike firing rate (i.e., pulse frequency) and the overlap time (i.e., duty cycle) of the pre-synaptic node and post-synaptic node, and simultaneously applied to all RRAM cells in the cross-point array. Such an approach achieves parallel programming of the entire RRAM array, requiring only local information from the pre-synaptic and post-synaptic nodes of each RRAM cell. As demonstrated by digital peripheral circuits implemented in 65 nm CMOS, the programming time of a 40 kb RRAM array is 84 ns, indicating a 900X speedup compared to the state-of-the-art software approach to sparse coding in image feature extraction.
ISBN: 9781479927289 (print)
Many parallel and distributed message-passing programs are written in a parametric way over available resources, in particular the number of nodes and their topologies, so that a single parallel program can scale over different environments. This paper presents a parameterised protocol description language, Pabble, which can guarantee safety and progress in a large class of practical, complex parameterised message-passing programs through static checking. Pabble can describe an overall interaction topology, using a concise and expressive notation, designed for a variable number of participants arranged in multiple dimensions. These parameterised protocols in turn automatically generate local protocols for type checking parameterised MPI programs for communication safety and deadlock freedom. In spite of undecidability of endpoint projection and type checking in the underlying parameterised session type theory, our method guarantees the termination of endpoint projection and type checking.
As the Pawsey Centre project continues, in 2013 iVEC was tasked with deciding which accelerator technology to use in the petascale supercomputer to be delivered in mid-2014. While accelerators provide impressive performance and efficiency, an important factor in this decision is the usability of the technologies. To assist in the assessment of technologies, iVEC conducted a code sprint where iVEC staff and advanced users were paired to make use of a range of tools to port their codes to two architectures. Results of the sprint indicate that certain subtasks could benefit from using the tools in the code-acceleration process; however, users will face many hurdles in migrating to either of the platforms explored.
Married women who are also studying face many challenges in managing their time. Since they have multiple and even conflicting roles, their academic achievement or their family life may be at *** parallel planning: can a total time management model made by the authors improve their academic achievement or not? The model tries first to improve certain skills and second to bring together all important tasks. The main goal of this study was to determine the effectiveness of instructing and employing this model on academic achievement in the case of married women. To do so, a single-case, multiple-baseline (across subjects) design was selected. The sample included 5 married female subjects, selected by purposive sampling from among Payame Noor University students in 2013. The subjects' average age was 24.2 years. Each subject had at least 11 instructional, practical, and monitoring sessions over 18 weeks. The study had two phases: baseline and treatment (instruction). Subjects entered the instruction phase in the 4th, 5th, 6th, 7th, and 8th sessions, respectively. Across the baseline and instruction phases, each subject took a total of 18 short-answer exams (with 20 questions each) based on her thermic lesson design. The scores were reported on a 100-point scale, and graphs and visual analysis were then prepared on the basis of the data. Comparison of the scores in the baseline and instruction phases demonstrated a clear improvement in each subject's scores. Based on the findings, parallel programming instruction was effective on academic achievement.
ISBN: 9781479974863 (print)
General purpose graphics processing units (GPGPUs) suitable for general purpose programming have become sufficiently affordable in the last three years to be used in personal workstations. In this paper we assess the usefulness of such hardware in the statistical analysis of simulation input and output data. In particular, we consider the fitting of complex parametric statistical metamodels to large data samples, where optimization of a statistical function of the data is needed, and investigate whether use of a GPGPU in such a problem would be worthwhile. We give an example, involving loss-given-default data obtained in a real credit risk study, where Nelder-Mead optimization can be efficiently implemented using parallel processing methods. Our results show that significant improvements in computational speed, of well over an order of magnitude, are possible. With increasing interest in "big data" samples, the use of GPGPUs is therefore likely to become very important.