ISBN: 9781728174457 (print)
Data science applications, many of them written in Python, represent a growing fraction of the scientific computing workload. The goal of this paper is to compare two popular parallel programming models, MPI and Apache Spark, for Python-based data science applications. The paper presents communication and file I/O microbenchmarks to evaluate the MPI support for Python applications, and uses two application use cases from natural language processing to compare the performance of the MPI and Spark versions. Our results indicate that the MPI version shows better scalability and performance than the PySpark version of the code. On the other hand, the MPI applications are significantly larger than their PySpark counterparts and took significantly longer to develop, because some of the built-in functionality provided by Spark had to be implemented by hand.
ISBN: 9781538637906 (print)
With the explosive growth of energy consumption in current heterogeneous distributed systems, the energy consumption constraint has become one of the primary design issues. Minimizing the schedule length while satisfying the energy consumption constraint of parallel applications is one of the most important problems studied recently. Previous studies, building on the dynamic voltage and frequency scaling (DVFS) technique, have proposed a preassignment approach that presupposes the minimum-energy assignment for each unassigned task. However, preassigning unassigned tasks with the minimum energy consumption does not necessarily minimize the schedule length. In this study, we propose an efficient scheduling algorithm that uses a relative average assignment for tasks. Experimental results on two real parallel applications validate that the proposed algorithm obtains shorter schedule lengths than state-of-the-art methods while satisfying the energy consumption constraint in various situations.
ISBN: 9781509036820 (print)
Distributed embedded systems are increasingly prevalent in numerous applications, and with pervasive network access within these systems, security is also a critical design concern. In this paper, we present a modeling and optimization framework for distributed reconfigurable embedded systems, which maps tasks onto a distributed embedded system with the goal of optimizing latency, energy, and/or security across all computing and communication levels. The proposed modeling framework for dataflow applications integrates models for computational latency, security levels for inter-task and intra-task communication, communication latency, and power consumption. We evaluate the proposed methodology using a video-based object detection and tracking application.
ISBN: 9781424437511 (print)
This paper presents many typical problems that are encountered when executing large-scale scientific applications over distributed architectures. The causes and effects of these problems are explained, and a solution for some classes of scientific applications is proposed. This solution combines the asynchronous iteration model with JACEP2P-V2, a fully decentralized and fault-tolerant platform dedicated to executing parallel asynchronous applications over volatile distributed architectures. We explain in detail how our approach deals with each of these problems. We then present two large-scale numerical experiments that demonstrate the efficiency and robustness of our approach.
ISBN: 9781424437511 (print)
Parallel applications typically run in batch mode, sometimes after long waits in a scheduler queue. In some situations, it would be desirable to interactively add new functionality to the running application, without having to recompile and rerun it. For example, a debugger could upload code to perform consistency checks, or a data analyst could upload code to perform new statistical tests. This paper presents a scalable technique to dynamically insert code into running parallel applications. We describe and evaluate an implementation of this idea that allows a user to upload Python code into running parallel applications. This uploaded code runs in concert with the main code. We demonstrate the effectiveness of this technique in two case studies: parallel debugging to support introspection, and data analysis of large cosmological datasets.
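The idea of uploading Python code into a running application can be illustrated with a minimal single-process sketch. This is not the implementation described in the paper: the names `run_simulation`, `uploads`, and `state` are hypothetical, the queue stands in for whatever transport a real system would use, and nothing here addresses scalability across parallel ranks. The main loop simply drains a queue of source strings between computation steps and executes each one with access to the application's state.

```python
import queue


def run_simulation(steps, uploads):
    """Toy main loop that executes uploaded Python snippets between steps.

    `uploads` is a queue.Queue of source strings (a hypothetical stand-in
    for a real upload channel). Each snippet sees the application state
    through the name `state`.
    """
    state = {"step": 0, "values": [], "reports": []}
    for step in range(steps):
        state["step"] = step
        state["values"].append(step * step)  # stand-in for the real computation

        # Drain any snippets that arrived since the previous step.
        while True:
            try:
                source = uploads.get_nowait()
            except queue.Empty:
                break
            try:
                exec(compile(source, "<uploaded>", "exec"), {"state": state})
            except Exception as exc:
                # A faulty upload must not bring down the main code.
                state["reports"].append(f"upload failed: {exc!r}")
    return state


uploads = queue.Queue()
# A "data analyst" snippet: record a statistic of the current data.
uploads.put("state['reports'].append(sum(state['values']))")
# A "debugger" snippet: a consistency check on the running state.
uploads.put("assert all(v >= 0 for v in state['values']), 'negative value'")

result = run_simulation(3, uploads)
```

Both snippets are picked up during the first step, so the recorded statistic reflects the state at that moment. Catching exceptions around `exec` is a design choice of this sketch so that a broken upload is reported rather than crashing the application.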
ISBN: 9781665435772 (print)
We investigate cryptanalytic applications composed of many independent tasks that exhibit a stochastic runtime distribution. We compare four algorithms for executing such applications on GPUs. We demonstrate that the best strategy varies across distributions, problem sizes, and platforms. We support our analytic results with extensive experiments on two GPUs from opposite ends of the performance spectrum: a high-performance GPU (Nvidia Volta) and an energy-saving system on chip (Jetson Nano).
ISBN: 9781424437511 (print)
In recent years, IEEE 802.11 wireless networks have become one of the most important components of wireless networking, since IEEE 802.11 devices are inexpensive and easier to configure than other wireless technologies. To provide seamless roaming in IEEE 802.11 wireless networks, MAC layer handoff latency should be minimized to support real-time applications. This paper proposes a novel MAC layer handoff protocol for IEEE 802.11 wireless networks that uses an advertisement message. The experimental results show that our solution can reduce MAC layer handoff latency to less than 50 ms, as required by real-time applications.
ISBN: 9781728174457 (print)
Building large scientific applications by composing multiple smaller applications is one of the current research directions. If the individual component applications are executed together on one compute node, resources must be allocated to the components. While the operating system is already capable of doing this, it might be possible to get higher efficiency with a more specialized solution. Our goal is to describe the opportunities and challenges faced by anyone designing such a system. We look into the specific case where a dynamic runtime system (or multiple such runtime systems) is used by the applications, since we believe the fundamental design of these runtime systems makes them especially suitable for the role. The ideas described in this paper are based on our prior experience in building such a runtime system and our early experiments with cooperating applications. We focus on CPU core allocation and point out the importance of making non-uniform memory access architectures a prime consideration in such work.
ISBN: 9781509036820 (print)
The processing of graphs is of increasing importance in many applications, with the size of such graphs growing rapidly. As with scientific computing, there is a growing need to understand the relationship between system architectures and graph algorithms, especially as both the scale of the system and the size of the graph increase. To date there is one such graph benchmark that has several hundred comparative reports available, namely Breadth First Search, which over the last few years has fueled new algorithms that have improved typical performance very significantly. This paper suggests an additional benchmark, based on the computation of neighborhoods and Jaccard coefficients, that has a different intrinsic complexity and can be recast in multiple ways that may suit different classes of real-world applications.
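For readers unfamiliar with the kernel behind the proposed benchmark, the following minimal sketch computes vertex neighborhoods and the Jaccard coefficient, |N(u) ∩ N(v)| / |N(u) ∪ N(v)|, for every vertex pair of a small undirected graph. It illustrates the standard definition only; the function names and the example edge list are assumptions made here, and the benchmark's actual specification, input format, and scale are not given in the abstract.

```python
from itertools import combinations


def neighborhoods(edges):
    """Build the neighbor set of every vertex of an undirected graph."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs


def jaccard(nbrs, u, v):
    """Jaccard coefficient of the neighborhoods of u and v.

    Returns 0.0 when both neighborhoods are empty to avoid dividing by zero.
    """
    union = nbrs[u] | nbrs[v]
    if not union:
        return 0.0
    return len(nbrs[u] & nbrs[v]) / len(union)


def all_pairs_jaccard(edges):
    """Jaccard coefficient for every unordered vertex pair (brute force)."""
    nbrs = neighborhoods(edges)
    return {
        (u, v): jaccard(nbrs, u, v)
        for u, v in combinations(sorted(nbrs), 2)
    }


# Hypothetical four-vertex example graph.
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
coeffs = all_pairs_jaccard(edges)
```

The all-pairs loop is quadratic in the number of vertices, which is acceptable for illustration but not for large graphs. A scalable variant would typically consider only vertex pairs that share at least one neighbor.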
ISBN: 9781479942930 (print)
As processors and systems on chip in the embedded world increasingly become multicore, parallel programming remains a difficult, time-consuming, and complicated task. End users who are not parallel programming experts need to exploit such processors and architectures using high-level programming languages such as Scilab or MATLAB. The ALMA toolset addresses this problem: it takes Scilab code as input and produces parallel code for embedded multiprocessor systems on chip, using platform quasi-agnostic optimizations. The platform information is provided by an architecture description language designed for flexible system description as well as simulation. A hierarchical system description, combined with a parameterizable simulation environment, allows fine-grained trade-offs between simulation performance and simulation accuracy.