ISBN: 9780738110806 (print)
Exceptions and errors occurring within mission-critical applications due to hardware failures have a high cost. With the emerging Next Generation Platforms (NGPs), the rate of hardware failures will likely increase. Therefore, designing our applications to be resilient is a critical concern in order to retain the reliability of results while meeting the constraints on power budgets. In this paper, we discuss software resilience in asynchronous many-task runtimes (AMTs) at both local and distributed scale. We choose HPX to prototype our resiliency designs. We implement two resiliency APIs that we expose to application developers, namely task replication and task replay. Task replication launches n copies of a task and executes them asynchronously. Task replay reschedules a task up to n times until a valid output is returned. Furthermore, we expose algorithm-based fault tolerance (ABFT) using user-provided predicates (e.g., checksums) to validate the returned results. We benchmark the resiliency scheme for both synthetic and real-world applications at local and distributed scale and show that most of the added execution time arises from the replay, replication or data movement of the tasks and not from the boilerplate code added to achieve resilience.
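The replay and replication semantics described in this abstract can be illustrated with a minimal Python sketch. This is not the HPX API: the names `replay` and `replicate`, their signatures, and the use of a thread pool are assumptions made purely for illustration, and the `validate` argument stands in for the user-provided predicate (for example, a checksum test).

```python
from concurrent.futures import ThreadPoolExecutor


def replay(n, task, *args, validate=lambda result: True):
    """Run `task` up to n times; return the first result that validates.

    Hypothetical helper: an attempt counts as failed if it raises or if
    the user-provided predicate rejects its result.
    """
    last_error = None
    for _ in range(n):
        try:
            result = task(*args)
        except Exception as exc:  # simulated hardware or software fault
            last_error = exc
            continue
        if validate(result):
            return result
    raise RuntimeError(f"no valid result after {n} attempts") from last_error


def replicate(n, task, *args, validate=lambda result: True):
    """Launch n copies of `task` concurrently; return the first valid result.

    Hypothetical helper: replicas are inspected in submission order, and
    replicas that raise or fail validation are skipped.
    """
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(task, *args) for _ in range(n)]
        for future in futures:
            try:
                result = future.result()
            except Exception:
                continue
            if validate(result):
                return result
    raise RuntimeError(f"none of the {n} replicas produced a valid result")
```

In this sketch replay costs extra time only when an attempt fails, whereas replication always pays for n executions but hides the latency of a retry. That trade-off is consistent with the abstract's observation that the overhead comes mainly from re-executing tasks rather than from the wrapper code.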
ISBN: 9781728199245 (print)
In heterogeneous computing systems, general purpose CPUs are coupled with co-processors of different architectures, like GPUs and FPGAs. Applications may take advantage of this heterogeneous device ensemble to accelerate execution. However, developing heterogeneous applications requires specific programming models, under which applications unfold into code components targeting different computing devices. OpenCL is one of the main programming models for heterogeneous applications, set apart from others by its openness, vendor independence and support for different co-processors. In the original OpenCL application model, a heterogeneous application starts on a certain host node and then resorts to the local co-processors attached to that host. Therefore, co-processors at other nodes, networked with the host node, are inaccessible and cannot be used to accelerate the application. rOpenCL (remote OpenCL) overcomes this limitation for a significant subset of the OpenCL 1.2 API, offering OpenCL applications transparent access to remote devices through a TCP/IP based network. This paper presents the architecture and the most relevant implementation details of rOpenCL, together with the results of a preliminary set of reference benchmarks. These demonstrate the stability of the current prototype and show that, in many scenarios, the network overhead is smaller than expected.
A large class of traditional graph and data mining algorithms can be concisely expressed in Datalog, and other Logic-based languages, once aggregates are allowed in recursion. In fact, for most BigData algorithms, the difficult semantic issues raised by the use of non-monotonic aggregates in recursion are solved by Pre-Mappability (PreM), a property that assures that for a program with aggregates in recursion there is an equivalent aggregate-stratified program. In this paper we show that, by bringing together the formal abstract semantics of stratified programs with the efficient operational one of unstratified programs, PreM can also facilitate and improve their parallel execution. We prove that PreM-optimized lock-free and decomposable parallel semi-naive evaluations produce the same results as the single executor programs. Therefore, PreM can be assimilated into the data-parallel computation plans of different distributed systems, irrespective of whether these follow bulk synchronous parallel (BSP) or asynchronous computing models. In addition, we show that non-linear recursive queries can be evaluated using a hybrid stale synchronous parallel (SSP) model on distributed environments. After providing a formal correctness proof for the recursive query evaluation with PreM under this relaxed synchronization model, we present experimental evidence of its benefits.
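To make the idea of an aggregate inside recursion concrete, here is a minimal single-executor sketch in Python of semi-naive evaluation for all-pairs shortest paths, where the min aggregate is applied during the recursion instead of after it. The choice of shortest paths as the example, the function name `shortest_paths`, and the edge representation are assumptions for illustration; this is not the paper's parallel evaluator.

```python
def shortest_paths(edges):
    """Semi-naive evaluation of shortest paths with min applied in recursion.

    edges: iterable of (src, dst, weight) with non-negative weights.
    Returns a dict mapping (src, dst) to the minimum path cost.
    """
    inf = float("inf")

    # Adjacency list used to extend a known path by one edge.
    out = {}
    for src, dst, w in edges:
        out.setdefault(src, []).append((dst, w))

    # Base case: the cheapest direct edge between each pair of nodes.
    best = {}
    for src, dst, w in edges:
        if w < best.get((src, dst), inf):
            best[(src, dst)] = w

    # delta holds only the facts that improved in the previous round.
    delta = dict(best)
    while delta:
        new_delta = {}
        for (x, z), cost in delta.items():
            for y, w in out.get(z, []):
                cand = cost + w
                # min inside the recursion: keep a path only if it improves
                # on the best known cost, so dominated paths are never derived.
                if cand < best.get((x, y), inf):
                    best[(x, y)] = cand
                    new_delta[(x, y)] = cand
        delta = new_delta
    return best
```

The result is the same as computing every path cost first and taking the minimum afterwards, which is the stratified reading. Pruning dominated paths early is what keeps the evaluation finite on cyclic graphs.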
We discuss a general approach to deriving the g-good-neighbor conditional diagnosability of interconnection networks. As demonstrative examples, we derive the 1- and 2-good-neighbor conditional diagnosabilities of the arrangement graphs under both the commonly adopted PMC and MM* models. We also derive the general g-good-neighbor conditional diagnosability of the (n, k)-star graphs under the PMC model for g ∈ [1, n - k], and under the MM* model for g ∈ [2, n - k], as well as that of related graphs, such as the star graph, the alternating group graph, and the alternating group network. (C) 2018 Elsevier B.V. All rights reserved.
The genetic algorithm (GA), one of the best-known metaheuristic algorithms, has been extensively utilized in various fields of management science, operational research, and industrial engineering. The efficiency of GAs in solving large-scale optimization problems could be enhanced if the iterative processes required by the genetic operators were implemented in a parallel and distributed computing architecture. Apache Hadoop has recently been one of the most popular systems for distributed storage and parallel processing of big data. By tightly integrating the GA into Apache Hadoop, this study proposes an advanced parallel and distributed GA computing architecture that achieves effective and efficient GA evolution. Characterized by a sophisticated mechanism for dispatching the GA core operators into Apache Hadoop, the developed computing framework fits well with the cloud computing model. The presented GA parallelization architecture outperforms the state-of-the-art reference architectures in computational experiments on test instances of the traveling salesman problem. Our numerical experiments also demonstrate that the proposed architecture can readily be extended to Apache Spark. (C) 2020 Elsevier B.V. All rights reserved.
Daily precipitation has an enormous impact on human activity, and the study of how it varies over time and space, and what global indicators influence it, is of paramount importance to Australian agriculture. We analyze over 294 million daily rainfall measurements since 1876, spanning 17,606 sites across continental Australia. The data are not only large but also complex, and the topic would benefit from a common and publicly available statistical framework. We propose a Bayesian hierarchical mixture model that accommodates mixed discrete-continuous data. The observational level describes site-specific temporal and climatic variation via a mixture-of-experts model. At the next level of the hierarchy, spatial variability of the mixture weights' parameters is modeled by a spatial Gaussian process prior. A parallel and distributed Markov chain Monte Carlo sampler is developed which scales the model to large data sets. We present examples of posterior inference on the mixture weights, monthly intensity levels, daily temporal dependence, offsite prediction of the effects of climate drivers and long-term rainfall trends across the entire continent. Computer code implementing the methods proposed in this paper is available as an R package.
Because of the importance of Delaunay Triangulation in science and engineering, researchers have devoted extensive attention to parallelizing this fundamental algorithm. However, generating unstructured meshes for extremely large point sets remains a barrier for scientists working with large scale or high resolution datasets. In our previous paper, we introduced a novel algorithm - Triangulation of Independent Partitions in parallel (TIPP) which divides the domain into many independent partitions that can be triangulated in parallel. However, using only a single master process introduced a performance bottleneck and inhibited scalability. In this paper, we refine our description of the original TIPP algorithm, and also extend TIPP to employ multiple master processes, distributing computational load across several machines. This new design improves both performance and scalability, and can produce 20 billion triangles using only 10 commodity nodes in under 30 minutes.
The paper describes the use of the high-level spatial grasp model and technology, which was invented, developed, and tested in different countries and is capable of solving important problems in large social systems, which may be represented as dynamic, self-evolving and distributed social *** approach allows us to find important solutions on a holistic level by spatial navigation and parallel pattern matching of social networks with active self-propagating scenarios represented in a special recursive *** approach effectively hides traditional system management routines inside the distributed and networked language implementation, often providing hundreds of times shorter and simpler high-level solution *** paper highlights the demands of efficient simulation of social systems, briefly describes the technology used, and provides some programming examples for solutions of practical problems.
This paper proposes dependable parallel multi-swarm canonical differential evolutionary particle swarm optimization with migration (DPMS-CDEEPSOw/M) for voltage and reactive power control (Volt/Var Control: VVC). The proposed DPMS-CDEEPSOw/M is a general evolutionary computation technique for dependable and fast optimization applications. So far, evolutionary computation methods such as the Genetic Algorithm (GA), advanced PSOs, and Differential Evolution (DE) have been applied to VVC because VVC is a mixed integer nonlinear programming (MINLP) problem. Considering the recent progress of deregulation and the large penetration of renewable energy in power systems, fast VVC is strongly needed. Utilization of parallel and distributed computing may meet this challenge. However, since the power system is a social infrastructure, VVC requires both fast computation and sustainable (dependable) control. A multi-swarm evolutionary computation technique has been verified to improve solution quality. Therefore, its application to VVC may increase dependability. The simulation results indicate that the proposed DPMS-CDEEPSOw/M based method can speed up computation and improve dependability in comparison with the conventional dependable parallel C-DEEPSO based method. (C) 2019, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
ISBN: 9781728159751 (print)
Heterogeneous architectures have emerged as a dominant platform, not only in high-performance computing but also in mobile processing, cloud computing and the Internet of Things (IoT). Because the undergraduate computer science curriculum is overcrowded in its current state, it is difficult to include a new course as a required part of the curriculum without increasing the number of hours to graduation. Integration of heterogeneous computing content requires a module-based approach, such as those undertaken for introducing parallel and distributed computing. In this paper, we present a teaching module that introduces CS1 students to some of the fundamental concepts in heterogeneous computing. The goal of this module is not to teach students how to program heterogeneous systems but rather to expose them to this emerging trend and prepare them for material they are expected to see in future classes. Although concepts are covered at a high level, the module emphasizes active learning and includes a lab assignment that provides students with hands-on experience in task mapping and performance evaluation of a heterogeneous system. The module was implemented at our home institution in Fall 2018. Initial evaluation results are quite encouraging in terms of learning outcomes, student engagement and interest.