ISBN: 9783319987026; 9783319987019 (print)
"In silico" experimentation allows us to simulate the effect of different therapies by adjusting model parameters. Although the computational simulation of tumors is a well-established technique, it can still be improved by parallelizing simulations on multi-core and many-core computer systems. This work presents a proposal to parallelize a tumor growth simulation based on cellular automata by partitioning the data domain and applying dynamic load balancing. Initial results show that this approach can successfully accelerate the calculations of a known tumor growth algorithm.
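As a rough illustration of data-domain partitioning for a cellular automaton, the sketch below splits a grid into row strips and lets a worker pool pick up strips as workers become free, which is one simple form of dynamic load balancing. The growth rule (an empty cell becomes tumor if any of its four neighbours is tumor), the function names `step_rows` and `step_parallel`, and the strip size are all assumptions made for illustration; they are not the algorithm or parameters used in the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def step_rows(grid, start, stop):
    """Compute the next state of rows [start, stop) from the read-only grid.

    Hypothetical rule: a tumor cell (1) stays a tumor cell; an empty cell (0)
    becomes a tumor cell if any von Neumann neighbour is a tumor cell.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    out = []
    for r in range(start, stop):
        new_row = []
        for c in range(n_cols):
            if grid[r][c] == 1:
                new_row.append(1)
                continue
            neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            occupied = any(
                0 <= i < n_rows and 0 <= j < n_cols and grid[i][j] == 1
                for i, j in neighbours
            )
            new_row.append(1 if occupied else 0)
        out.append(new_row)
    return out


def step_parallel(grid, strip_height=2, workers=4):
    """Advance the whole grid one step by processing row strips in a pool.

    Using more strips than workers means idle workers pick up the remaining
    strips, so uneven strips do not stall the whole step.
    """
    n_rows = len(grid)
    strips = [
        (s, min(s + strip_height, n_rows))
        for s in range(0, n_rows, strip_height)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each task only reads the old grid, so neighbouring rows of a strip
        # (its "halo") can be read without any locking.
        parts = pool.map(lambda bounds: step_rows(grid, *bounds), strips)
        return [row for part in parts for row in part]
```

Because every strip reads the previous generation and writes only its own rows, the partitioned step produces the same grid as a serial step. In CPython, threads will not speed up this pure-Python loop; the sketch shows only the partitioning structure, and a process pool or a compiled kernel would be needed for real acceleration.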
ISBN: 9783030294007; 9783030293994 (print)
As the parallelism in high-performance supercomputers continues to grow, new programming models become necessary to maintain programmer productivity at today's levels. Dataflow is a promising execution model because it can represent parallelism at different granularity levels and dynamically adapt for efficient execution. The downside is the low-level programming interface inherent to dataflow. We present a strategy to translate programs written in Hierarchically Tiled Arrays (HTA) to the dataflow API of the Open Community Runtime (OCR) system. The goal is to enable program development in a convenient notation while taking advantage of the benefits of a dataflow runtime system. Using HTA produces more comprehensible code than code written directly against the dataflow runtime programming interface. Moreover, the experiments show that, for applications with high asynchrony and sparse data dependences, our implementation outperforms OpenMP using parallel for loops.
ISBN: 9783030166816 (digital)
ISBN: 9783030166816; 9783030166809 (print)
This paper presents a new model-based Gaussian clustering method and defines new optimization criteria for model-based clustering, which are used as fitness functions in a genetic algorithm. These optimization criteria are based on different properties of covariance matrices. The proposed method is compared with the well-known K-Means method solved by a genetic algorithm or by Particle Swarm Optimization. Our method achieves higher similarity between the real classification and the computed clustering results on all six real-world datasets presented. Because of the high computational requirements of these methods, we have focused on their parallelization. Given the chosen parallel computer architecture, we combine the MPI and OpenMP programming interfaces. We show that parallelization of the proposed method is very effective and scales to many execution units.
ISBN: 9780983567899 (print)
Task parallel programming models such as Habanero Java help developers write idiomatic parallel programs and avoid common errors. Data race freedom is a desirable property for task parallel programs but is difficult to prove because every possible execution of the program must be considered. A partial order over the events of an observed program execution induces an equivalence class of executions that the program may also produce. The Does-not-Commute (DC) relation is an efficiently computable partial order used for data race detection. As a relatively weak partial order, the DC relation can represent relatively large equivalence classes of program executions. However, some of these executions may be infeasible, leading to false data race reports. The contribution of this paper is a mechanized proof that the DC relation is sound for commonly used task parallel programming models. Sound means that the first data race identified by the DC relation is guaranteed to be a real data race. A prototype analysis in the Java Pathfinder model checker shows that the DC relation can significantly reduce the number of explored states required to prove data race freedom in Habanero Java programs. In this application, the search for data races using the DC relation is both sound and complete.
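To make the idea of detecting races from a partial order over observed events concrete, the sketch below uses the classic happens-before relation encoded with vector clocks. This is deliberately not the DC relation from the paper, which is a weaker order with its own construction; the access format, the clock values, and the names `happens_before` and `find_races` are assumptions chosen for illustration.

```python
def happens_before(a, b):
    """Return True if vector clock a is strictly before vector clock b.

    Clocks are dicts mapping task names to counters; missing entries are 0.
    """
    tasks = set(a) | set(b)
    no_later = all(a.get(t, 0) <= b.get(t, 0) for t in tasks)
    strictly = any(a.get(t, 0) < b.get(t, 0) for t in tasks)
    return no_later and strictly


def find_races(accesses):
    """Report pairs of conflicting accesses that the partial order leaves unordered.

    accesses: list of (task, variable, kind, clock), where kind is "r" or "w".
    Two accesses conflict if they touch the same variable from different
    tasks and at least one of them is a write.
    """
    races = []
    for i, (t1, v1, k1, c1) in enumerate(accesses):
        for t2, v2, k2, c2 in accesses[i + 1:]:
            conflicting = v1 == v2 and t1 != t2 and "w" in (k1, k2)
            unordered = not happens_before(c1, c2) and not happens_before(c2, c1)
            if conflicting and unordered:
                races.append(((t1, k1), (t2, k2), v1))
    return races
```

For instance, if a parent task forks a child and both write `x` before the join, their clocks are incomparable and the pair is reported; a read by the parent after the join carries a clock that dominates the child's write and is not reported.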
ISBN: 9781728148946 (print)
Introductory-level courses on parallel programming typically do not cover the topic of code correctness. Students often learn about logical errors in parallel programs and troubleshoot them through trial and error, spending a significant amount of time and effort in the process. A systematic pedagogical approach to teaching parallel code correctness is therefore needed to enhance the productivity of students and instructors. In this paper, we describe some theoretical and practical approaches that can be adopted for assessing and teaching parallel code correctness. The theoretical approaches include using formal methods (e.g., Petri nets and Hoare Logic). We apply these approaches to the test cases discussed in this paper. The practical approach involves teaching code correctness through demonstrations. To enable this, we have curated a repository of parallel programs with commonly made logical errors and added a high-level interface on top of it for quickly comparing fixed and incorrect versions of the sample code, viewing the explanatory text on the errors, and searching the repository by the causes and symptoms of logical errors. The work presented in this paper can potentially motivate instructors to include content on code correctness in their parallel programming courses and training.
ISBN: 9783981926323 (print)
Parallel ultra-low-power computing is emerging as an enabler to meet the growing performance and energy efficiency demands in deeply embedded systems such as the end-nodes of the internet-of-things (IoT). The parallel nature of these systems, however, adds a significant degree of complexity, as processing elements (PEs) need to communicate in various ways to organize and synchronize execution. Naive implementations of these central and non-trivial mechanisms can quickly jeopardize overall system performance and limit the achievable speedup and energy efficiency. To avoid this bottleneck, we present an event-based solution centered around a technology-independent, light-weight and scalable (up to 16 cores) synchronization and communication unit (SCU) and its integration into a shared-memory multicore cluster. Careful design and tight coupling of the SCU to the data interfaces of the cores allow common synchronization procedures to be executed with a single instruction. Furthermore, we present hardware support for the common barrier and lock synchronization primitives with a barrier latency of only eleven cycles, independent of the number of involved cores. We demonstrate the efficiency of the solution based on experiments with a post-layout implementation of the multicore cluster in a 22 nm CMOS process where the SCU constitutes less than 2% of area overhead. Our solution supports parallel sections as small as 100 or 72 cycles with a synchronization overhead of just 10%, an improvement of up to 14x or 30x with respect to cycle count or energy, respectively, compared to a test-and-set based implementation.
With the decline of Moore's law and the ever increasing availability of cheap massively parallel hardware, it becomes more and more important to embrace parallel programming methods to implement Agent-Based Simulations (ABS). This was acknowledged in the field some time ago, and much research on distributed parallel ABS exists, focusing primarily on parallel Discrete Event Simulation as the underlying mechanism. However, these concepts and tools are inherently difficult to master and apply, and are often excessive when implementers simply want to parallelise their own custom agent-based model implementation. Moreover, with the established programming languages in the field, Python, Java and C++, it is not easy to address the complexities of parallel programming due to unrestricted side effects and the intricacies of low-level locking semantics. Therefore, in this paper we propose a lock-free approach to parallel ABS using Software Transactional Memory (STM) in conjunction with the pure functional programming language Haskell, which in combination remove some of the problems and complexities of parallel implementations in imperative approaches. We present two case studies in which we compare the performance of lock-based and lock-free STM implementations in two different well-known Agent-Based Models, investigating the scaling performance under both an increasing number of CPU cores and an increasing number of agents. We show that the lock-free STM implementations consistently outperform the lock-based ones and scale much better to an increasing number of CPU cores both on local hardware and on Amazon EC. Further, by utilizing the pure functional language Haskell we gain the benefits of immutable data and lack of unrestricted side effects guaranteed at compile-time, making validation easier and leading to increased confidence in the correctness of an implementation, something of fundamental importance and benefit in paral
ISBN: 9781728148946 (print)
We review the HPC training avenues for master's and Ph.D. students, postdocs and young faculty in Indian universities and research institutes. Their interest in HPC arises from the need to use it for research in their scientific domain, after taking some background courses in programming with Fortran/C/C++ and numerical methods. Very few Indian educational institutes offer courses in parallel programming, and those that do are mostly high-ranking engineering institutes. Even in the Computer Science stream, the course, if offered at all, is an elective. Users from outside Computer Science, who make up the majority of HPC users in India, do not possess sufficient background in Computer Science and gear up as HPC users mostly through self-study or short-term training workshops that address their requirements. However, such training programmes are not a regular activity in India, and they cater to a broad audience with varied levels of aptitude in computing, programming, and simulation via mathematical modelling. In this paper, we discuss the current HPC education scenario and propose an education strategy to broaden the base of HPC users and to help researchers at all levels use HPC effectively for their academic research and development work.
ISBN: 9781538655559 (print)
In today's environment, where every computer, including cell phones, is multicore, it is essential that students develop parallel programming skills. It remains a challenge to develop effective techniques for teaching parallel programming skills. Another challenge is finding time within already packed lectures to cover additional material. To that end, we investigate the effectiveness of using Project Based Learning (PBL) to teach parallel programming skills early in the curriculum by developing and incorporating a PBL module into CSc 3210 (Computer Organization and Programming). This is a core course taken by all computer science majors and is a prerequisite to many of our senior-level classes. In our case study, 124 students are organized into 26 diverse groups, with four or five students per group, and assigned five project assignments, each of two weeks' duration. Given a Raspberry Pi, students explore its multicore architecture and create programs for shared memory parallelism using OpenMP and the C language. Our results show that incorporating this PBL module has a significant and direct effect on students' growth in parallel programming skills. As a side benefit, we also show that there is a direct improvement in students' personal growth in terms of soft skills, which are essential to professional development and success in the workplace. By having students experience PBL in an early class, close to the midpoint of the academic program, it can serve as a mini-capstone project. Furthermore, students can learn collaboratively by themselves (through teamwork) and apply the fundamentals of parallel programming without the need for separate lectures, labs, or workshops.
ISBN: 9781728122496 (print)
Collaborative robots are being applied in a growing number of usage scenarios, but their adoption is slowed down by the high complexity of robot programming. As previous prototype studies have shown, block-based programming environments can enable novice or end users to program industrial single-armed robots. Some existing block-based tools support parallel programming and therefore show potential to be used for multi-armed robot programming as well. We analyze their designs and argue how improved abstractions and visualizations could make multi-armed parallelism accessible to novice users. Based on this analysis, we then extract a list of features that a block-based environment designed for multi-armed robot programming should provide. Finally, we present our design vision for a novel programming environment for two-armed robots, show how it provides these features and discuss how it can enable both novices and experienced intermediate users to perform parallelized programming tasks.