ISBN: 9780738110806 (print)
Exceptions and errors caused by hardware failures in mission-critical applications are costly. With emerging Next Generation Platforms (NGPs), the rate of hardware failures will likely increase. Designing applications to be resilient is therefore critical for retaining the reliability of results while meeting power-budget constraints. In this paper, we discuss software resilience in asynchronous many-task (AMT) runtimes at both local and distributed scale. We choose HPX to prototype our resiliency designs. We implement two resiliency APIs that we expose to application developers: task replication and task replay. Task replication launches n copies of a task and executes them asynchronously. Task replay reschedules a task up to n times until a valid output is returned. Furthermore, we expose algorithm-based fault tolerance (ABFT) through user-provided predicates (e.g., checksums) that validate the returned results. We benchmark the resiliency scheme on both synthetic and real-world applications at local and distributed scale and show that most of the added execution time arises from the replay, replication, or data movement of the tasks, not from the boilerplate code added to achieve resilience.
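The replay and replication semantics described in this abstract can be illustrated with a minimal Python sketch. This is not the HPX API: the names `replay`, `replicate`, `make_flaky_task`, and `checksum_ok` are hypothetical, a thread pool stands in for the asynchronous runtime, and the "hardware fault" is simulated by corrupting the output of the first few calls.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def replay(task, n, validate=lambda result: True):
    """Run task up to n times; return the first result that passes validate."""
    last_error = None
    for _ in range(n):
        try:
            result = task()
        except Exception as exc:  # a raised exception uses up one attempt
            last_error = exc
            continue
        if validate(result):
            return result
    raise RuntimeError(f"no valid result after {n} attempts") from last_error


def replicate(task, n, validate=lambda result: True):
    """Launch n copies of task concurrently; return the first valid result."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(task) for _ in range(n)]
        for future in futures:  # inspect replicas in submission order
            try:
                result = future.result()
            except Exception:
                continue
            if validate(result):
                return result
    raise RuntimeError(f"none of the {n} replicas produced a valid result")


def make_flaky_task(failures):
    """Hypothetical task whose output is corrupted on its first `failures` calls."""
    counter = itertools.count(1)

    def task():
        call_number = next(counter)
        data = [1, 2, 3, 4]
        checksum = sum(data)  # checksum computed before the simulated fault
        if call_number <= failures:
            data[0] = 99  # simulate silent data corruption
        return data, checksum

    return task


def checksum_ok(result):
    """User-provided predicate: accept a result only if its checksum matches."""
    data, checksum = result
    return sum(data) == checksum


if __name__ == "__main__":
    print(replay(make_flaky_task(2), 3, checksum_ok))
    print(replicate(make_flaky_task(2), 3, checksum_ok))
```

In this sketch replay pays for extra executions only when a result is rejected, while replication always pays for all n copies but runs them concurrently; both rely on the predicate to detect a bad result.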
ISBN: 9783030450151 (print)
The proceedings contain 17 papers. The special focus in this conference is on Artificial Life and Evolutionary Computation. The topics include: An analysis of cooperative coevolutionary differential evolution as neural networks optimizer; design and evaluation of a heuristic optimization tool based on evolutionary grammars using PSoCs; how word choice affects cognitive impairment detection by handwriting analysis: a preliminary study; modeling the coordination of a multiple robots using nature inspired approaches; nestedness temperature in the agent-artifact space: emergence of hierarchical order in the 2000–2014 photonics techno-economic complex system; towards programmable chemistries; studying and simulating the three-dimensional arrangement of droplets; investigating three-dimensional arrangements of droplets; selecting for positive responses to knock outs in Boolean networks; avalanches of perturbations in modular gene regulatory networks; the effects of a simplified model of chromatin dynamics on attractors robustness in random Boolean networks with self-loops: an experimental study; a memetic approach for the orienteering problem; the detection of dynamical organization in cancer evolution models; the simulation of noise impact on the dynamics of a discrete chaotic map; exploiting distributed discrete-event simulation techniques for parallel execution of cellular automata.
ISBN: 9781728159751 (print)
Teaching topics related to high performance computing and parallel and distributed computing in a hands-on manner is challenging, especially at introductory, undergraduate levels. There is a participation challenge: students need access to a platform on which they can learn via hands-on activities, and securing that access is not always possible. There are also pedagogic challenges. For instance, any particular platform provided to students imposes constraints on which learning objectives can be achieved. These challenges become steeper as the topics being taught target more heterogeneous, more distributed, and/or larger platforms, as needed to prepare students for using and developing Cyberinfrastructure. To address these challenges, we have developed a set of pedagogic activities that can be integrated piecemeal into university courses, starting at the freshman level. These activities use simulation so that students can gain hands-on experience with any relevant application and platform scenario. This is achieved by capitalizing on the capabilities of the WRENCH and SimGrid simulation frameworks. After describing our approach and the pedagogic activities currently available, we present results from an evaluation performed in an undergraduate university course.
Feature-driven in situ data reduction can overcome the I/O bottleneck that large simulations face in modern supercomputer architectures in a semantically meaningful way. In this work, we make use of pattern detection ...
ISBN: 9781728159751 (print)
The proceedings contain 10 papers. The topics discussed include: teaching parallel and distributed computing concepts in simulation with WRENCH; assessing the integration of parallel and distributed computing in early undergraduate computer science curriculum using unplugged activities; successful systems in production graduate teaching; teaching concurrent and distributed programming with concepts over mathematical proofs; toward improving collaborative behaviour during competitive programming assignments; and teaching on demand: an HPC experience.
ISBN: 9781728159751 (print)
Peachy parallel assignments are high-quality assignments for teaching parallel and distributed computing. They have been successfully used in class and are selected on the basis of their suitability for adoption and for being cool and inspirational for students. Here we present a fire fighting simulation, thread-to-core mapping on NUMA nodes, introductory cloud computing, interesting variations on prefix-sum, searching for a lost PIN, and Big Data analytics.
The proceedings contain 10 papers. The topics discussed include: research on the resilience of the intelligent integrated energy systems;intellectual technology for computation control in the package of applied micros...
The paper demonstrates how the automatic theorem proving technique of the PCF calculus is applied to construct parallel composition of automata. Parallel composition plays an essential role in the supervisory control ...
ISBN: 9783030209360 (print)
The proceedings contain 10 papers. The special focus in this conference is on Massively Multi-agent Systems. The topics include: Injecting (Micro)Intelligence in the IoT: Logic-Based Approaches for (M)MAS; Integrating Internet of Services and Internet of Things from a Multiagent Perspective; Two-Layer Architecture for Distributed Massively Multi-agent Systems; Multi-agent Social Simulation for Social Service Design; Inverse Reinforcement Learning for Agents Behavior in a Crowd Simulator; FARM: Architecture for Distributed Agent-Based Social Simulations; Diversity in Massively Multi-agent Systems: Concepts, Implementations, and Normal Accidents; CARAVAN: A Framework for Comprehensive Simulations on Massive Parallel Machines.
ISBN: 9781728159799 (print)
To minimize data movement, many parallel applications statically distribute computational tasks among the processes. However, modern simulations often encounter irregular computational tasks whose loads change dynamically at runtime or are data dependent. As a result, load imbalance among the processes at each simulation step is a natural situation that must be dealt with at the programming level. The de facto parallel programming approach, flat MPI (one process per core), is poorly suited to managing this imbalance: it imposes significant idle time on the simulation because processes have to wait for the slowest process at each step. One critical application for many domains is the LU factorization of a large dense matrix stored in the Block Low-Rank (BLR) format. Using the low-rank format can significantly reduce the cost of factorization in many scientific applications, including the boundary element analysis of electrostatic fields. However, partitioning the matrix based on the underlying geometry produces matrix blocks of different sizes whose numerical ranks change at each step of the factorization, which leads to load imbalance among the processes at each step. We use BLR LU factorization as a test case to study the programmability and performance of five different programming approaches: (1) flat MPI, (2) Adaptive MPI (Charm++), (3) MPI + OpenMP, (4) parameterized task graph (PTG), and (5) dynamic task discovery (DTD). The last two versions use a task-based paradigm to express the algorithm; we rely on the PaRSEC runtime system to execute the tasks. We first point out the programming features needed to efficiently solve this category of problems, hinting at possible alternatives to the MPI+X programming paradigm. We then evaluate the programmability of the different approaches, detailing our experience implementing the algorithm using each of the models. Finally, we show the performance result on the Intel Haswell-
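The idle time that static distribution causes under irregular task costs, as described in the abstract above, can be illustrated with a small Python sketch. It is not the paper's BLR LU code and uses none of the MPI, Charm++, or PaRSEC APIs: the function names and the task costs are hypothetical, and "dynamic" here simply means that each task goes to whichever process becomes idle first.

```python
import heapq


def static_makespan(costs, num_procs):
    """Step time when task i is fixed to process i % num_procs ahead of time."""
    loads = [0] * num_procs
    for i, cost in enumerate(costs):
        loads[i % num_procs] += cost
    return max(loads)  # every process waits for the slowest one


def dynamic_makespan(costs, num_procs):
    """Step time when each task, in order, goes to the first idle process."""
    finish_times = [0] * num_procs
    heapq.heapify(finish_times)
    for cost in costs:
        earliest = heapq.heappop(finish_times)  # process that frees up first
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)


def idle_fraction(costs, num_procs, makespan):
    """Share of total process time spent waiting during one step."""
    return 1 - sum(costs) / (num_procs * makespan)


if __name__ == "__main__":
    # Hypothetical irregular costs: two heavy blocks among light ones.
    costs = [8, 1, 1, 1, 8, 1, 1, 1]
    for name, fn in (("static", static_makespan), ("dynamic", dynamic_makespan)):
        span = fn(costs, 4)
        print(name, span, round(idle_fraction(costs, 4, span), 3))
```

With these assumed costs, the static mapping places both heavy tasks on the same process, so the other three processes sit idle for most of the step; the dynamic assignment spreads them out and shortens the step.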