Search Results - Inner Mongolia University Library

Workshop on Exascale MPI at Supercomputing Conference (ExaMPI)

Authors: Sri Raj Paul, Akihiro Hayashi, Matthew Whitlock, Seonmyeong Bak, Keita Teranishi, Jackson Mayo, Max Grossman, Vivek Sarkar (Georgia Institute of Technology, Atlanta, USA; Sandia National Laboratories, Livermore, USA)

ISBN: (electronic) 9781665415613

ISBN: (print) 9781665415620

Achieving fault tolerance is one of the significant challenges of exascale computing because soft/transient failures are projected to increase. While past work on software-based resilience techniques typically focused on traditional bulk-synchronous parallel programming models, we believe that Asynchronous Many-Task (AMT) programming models are better suited to enabling resiliency, since they provide explicit abstractions of data and tasks that contribute to increased asynchrony and latency tolerance. In this paper, we extend our past work on enabling application-level resilience in single-node AMT programs by integrating the capability to perform asynchronous MPI communication, thereby enabling resiliency across multiple nodes. We also enable resilience against fail-stop errors: our runtime manages all re-execution of tasks and communication without user intervention. Our results show that we can add communication operations to resilient programs with low overhead by offloading communication to dedicated communication workers, and that we can recover from fail-stop errors transparently, thereby enhancing productivity.

Keywords: Task analysis; Resilience; Runtime; Computational modeling; Fault tolerant systems; Fault tolerance; Parallel programming
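The runtime-managed re-execution described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' runtime. It assumes that a failed task attempt surfaces as an exception and that tasks are side-effect free, so replaying them is safe. `run_resilient`, `TaskFailed`, and `make_flaky` are hypothetical names. MPI communication and the dedicated communication workers are not modeled.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TaskFailed(Exception):
    """Raised when a task still fails after every allowed re-execution."""


def run_resilient(task: Callable[[], T], max_attempts: int = 3) -> T:
    """Run a side-effect-free task, re-executing it when an attempt fails.

    Assumptions (hypothetical model): a failed attempt surfaces as an
    exception, and the task has no side effects, so replaying it is safe.
    """
    last_error: Optional[BaseException] = None
    for _ in range(max_attempts):
        try:
            return task()
        except Exception as err:  # stands in for a detected task failure
            last_error = err
    raise TaskFailed(f"task failed {max_attempts} times") from last_error


def make_flaky(fail_times: int) -> Callable[[], int]:
    """Hypothetical task that fails on its first `fail_times` calls, then succeeds."""
    calls = 0

    def task() -> int:
        nonlocal calls
        calls += 1
        if calls <= fail_times:
            raise RuntimeError("simulated worker failure")
        return 42

    return task


print(run_resilient(make_flaky(2)))  # two failed attempts, the third succeeds
```

A real AMT runtime would also have to decide which dependent tasks and pending communications to replay after a node-level fail-stop error. The sketch only retries a single task.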

Performance Analysis of Parallel Image Processing Operations

International Conference on Communications and Signal Processing

Authors: Saurabh Zade, Rushikesh Korde, Rahul Sonone, Medha Shah (Department of Computer Science and Engineering, Walchand College of Engineering, Sangli, India)

ISBN: (electronic) 9781728149882

ISBN: (print) 9781728149899

Image processing underpins many of today's technological advances. A key concern when performing image processing operations is the time taken to apply different routines to the images, so execution time is an important criterion of system efficiency. In this setting, images are handed to the processors and, depending on the code, either all cores work together on one image or the images are distributed so that each core processes its own. This relies on parallel programming, i.e. using all available computing resources, which here are the CPU cores. The paper implements several image-enhancement techniques in a system that can execute them on a single core or on multiple cores. The operations implemented both sequentially and in parallel are image blurring, edge detection, contrast stretching, and image negation; the average speedups obtained on multiple cores are 9.94, 9.54, 11.12, and 11.21, respectively.

Keywords: Image edge detection; Parallel processing; Libraries; Parallel programming; Program processors; Linux
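The first strategy above, in which all cores cooperate on one image, can be sketched by splitting a grayscale image into horizontal strips and giving each strip to a worker. This is a hedged illustration, not the paper's implementation. It assumes images are nested lists of 8-bit values, and the function names are hypothetical. Contrast stretching needs the global minimum and maximum, so it runs in two parallel phases.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Image = List[List[int]]  # grayscale image: rows of 8-bit intensities (0..255)


def split_rows(img: Image, parts: int) -> List[Image]:
    """Split an image into at most `parts` contiguous horizontal strips."""
    step = max(1, -(-len(img) // parts))  # ceiling division, at least one row
    return [img[i:i + step] for i in range(0, len(img), step)]


def negate_rows(rows: Image) -> Image:
    """Image negation: each intensity p becomes 255 - p."""
    return [[255 - p for p in row] for row in rows]


def strip_bounds(rows: Image) -> Tuple[int, int]:
    """Minimum and maximum intensity within one strip."""
    return min(min(row) for row in rows), max(max(row) for row in rows)


def stretch_rows(rows: Image, lo: int, hi: int) -> Image:
    """Linear contrast stretch mapping [lo, hi] onto [0, 255]."""
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in rows]
    scale = 255 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in rows]


def parallel_negate(img: Image, workers: int = 4) -> Image:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        strips = list(pool.map(negate_rows, split_rows(img, workers)))
    return [row for strip in strips for row in strip]  # map() keeps strip order


def parallel_stretch(img: Image, workers: int = 4) -> Image:
    if not img:
        return []
    parts = split_rows(img, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Phase 1: every worker reports its strip's local min and max.
        bounds = list(pool.map(strip_bounds, parts))
        lo = min(b[0] for b in bounds)
        hi = max(b[1] for b in bounds)
        # Phase 2: every worker rescales its strip using the global bounds.
        strips = list(pool.map(lambda s: stretch_rows(s, lo, hi), parts))
    return [row for strip in strips for row in strip]
```

Because of CPython's global interpreter lock, `ThreadPoolExecutor` shows the decomposition but will not speed up these pure-Python loops. For real multi-core speedup one would use `ProcessPoolExecutor` with module-level functions in place of the lambda, or a vectorized image library.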

An OpenMP-Based Algorithm for Multi-Nodes Computational of Super-Resolution

2019 High Performance Computing and Computational Intelligence Conference, HPCCI 2019

Authors: Jun Ma, Xiaoyong Wang, Feng Li, Xue Yang (Spatial Data Processing Technology Laboratory of Henan University, Kaifeng, China; Qian Xuesen Laboratory of Space Technology, Beijing, China)

A new general-purpose, OpenMP-based parallel programming method, MPMC (multi-node paralleling model based on multiprocessor devices), is proposed and implemented to accelerate super-resolution (SR) tasks through data separation. PanguOS, a general parallel programming system built on MPMC, is deployed on Ubuntu 16.04, using Secure Shell (SSH) to control the devices and Secure Copy (SCP) to transmit the data stream, and it performs well on SR tasks with remote sensing images. In experiments with images from the geostationary-orbit Earth observation satellite GaoFen-4 (GF-4), the proposed method running on PanguOS deployed across three Jetson TX2 devices achieves a speedup of almost 2.95 over a single Jetson TX2. © Published under licence by IOP Publishing Ltd.

Keywords: Parallel programming
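As a derived figure (not stated in the abstract), the reported speedup $S \approx 2.95$ on $p = 3$ devices corresponds to a parallel efficiency close to ideal, using the standard definition $E = S/p$:

```latex
E = \frac{S}{p} \approx \frac{2.95}{3} \approx 0.983
```

That is, roughly 98% of linear scaling across the three Jetson TX2 boards.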

Deadlock detection for concurrent programs using resource footprints

12th IEEE/ACM International Conference on Utility and Cloud Computing, UCC Companion 2019

Authors: Sonam Sherpa, Abdi Vicenciodelmoral, Xinghui Zhao (Washington State University, Vancouver, WA, United States)

ISBN: (print) 9781450370448

Concurrency bugs are difficult to diagnose and fix because of their nature and the way they manifest during execution. Traditional approaches to diagnosing concurrency bugs attempt to reproduce the exact execution schedule that reveals the bug, which incurs high runtime overhead. In this paper, we present our work on identifying concurrency bugs using resource consumption footprints. It is based on the observation that resource access and consumption patterns are critical indicators of the run-time behavior of concurrent software and can serve as a powerful mechanism to guide the debugging process. We demonstrate that monitoring resource footprints at runtime can effectively help detect software bugs. Specifically, for MPI programs, a simple SVM classifier can detect deadlocks with high accuracy using only CPU usage patterns. © 2019 Copyright held by the owner/author(s).

Keywords: Parallel programming
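To make the classification step concrete, the following sketch trains a linear SVM on two features of a CPU-usage trace. It uses hinge loss with L2 regularization, minimized by subgradient descent with iterate averaging. Everything beyond "an SVM on CPU usage" is an assumption: the features (mean and standard deviation of utilization), the invented toy traces, and the helper names are hypothetical, and the paper's actual features and training setup may differ. In practice a library implementation such as scikit-learn's `SVC(kernel="linear")` would replace the hand-written trainer.

```python
import statistics
from typing import List, Sequence, Tuple

Vector = List[float]


def footprint_features(cpu_samples: Sequence[float]) -> Vector:
    """Hypothetical features of a CPU-usage trace (utilisation in 0..1)."""
    return [statistics.mean(cpu_samples), statistics.pstdev(cpu_samples)]


def train_linear_svm(X: List[Vector], y: List[int], lam: float = 0.01,
                     lr: float = 0.1, epochs: int = 20000) -> Tuple[Vector, float]:
    """Minimise lam/2*||w||^2 + mean(max(0, 1 - y*(w.x + b))) by subgradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    w_sum, b_sum = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # regulariser gradient (bias is not regularised)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                # Hinge loss is active for this sample: add its subgradient.
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
        w_sum = [s + wj for s, wj in zip(w_sum, w)]
        b_sum += b
    # The averaged iterate is more stable than the last one for constant steps.
    return [s / epochs for s in w_sum], b_sum / epochs


def predict(w: Vector, b: float, x: Vector) -> int:
    """+1 = flagged as deadlock, -1 = normal run."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Invented toy traces: "deadlocked" runs are flat (as busy-polling waits might look),
# normal runs fluctuate between compute and communication phases.
traces = [
    ([1.0, 1.0, 1.0, 1.0], 1),
    ([0.98, 1.0, 0.98, 1.0], 1),
    ([0.2, 0.9, 0.3, 0.8], -1),
    ([0.5, 0.1, 0.6, 0.2], -1),
]
X = [footprint_features(t) for t, _ in traces]
y = [label for _, label in traces]
w, b = train_linear_svm(X, y)
```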

Towards Automatically Optimizing PySke Programs

2019 International Conference on High Performance Computing and Simulation, HPCS 2019

Authors: Jolan Philippe, Frederic Loulergue (School of Informatics, Computing and Cyber Systems, Northern Arizona University, Flagstaff, United States)

ISBN: (print) 9781728144849

Explicit parallel programming for shared- and distributed-memory architectures is an efficient way to handle data-intensive computations. However, approaches such as explicit threads or MPI remain difficult for most programmers, who must deal with constraints such as explicit inter-processor communication and data distribution. © 2019 IEEE.

Keywords: Parallel programming

Modelling of parallel threads synchronization in hybrid MPI + threads programs

22nd International Conference on Soft Computing and Measurements, SCM 2019

Authors: Andrey V. Tabakov, Alexey A. Paznikov (Saint-Petersburg Electrotechnical University 'LETI', St. Petersburg, Russia)

ISBN: (print) 9781728136028

Parallel computing is one of the top priorities in computer science. The main means of parallel information processing is the distributed computing system (CS): a composition of elementary machines that interact through a communication medium. Modern distributed CSs implement thread-level parallelism (TLP) within a single computing node (a multi-core CS with shared memory) as well as process-level parallelism (PLP) across the entire distributed CS. The main tool for developing parallel programs for such systems is the MPI standard. The need for scalable parallel programs that make effective use of shared-memory compute nodes has driven the evolution of the MPI standard, which today supports hybrid multi-threaded MPI programs that combine the computational capabilities of processes and threads. The standard defines four threading levels: Single (only one thread executes); Funneled (the program is multi-threaded, but only the main thread makes MPI calls); Serialized (multiple threads may make MPI calls, but only one at a time); and Multiple (any thread may call MPI functions at any time). The main challenge in the Multiple mode is synchronizing the communicating threads within each process. This paper reviews work that addresses synchronizing processes running on remote machines and synchronizing threads within a program, and proposes a thread synchronization method based on queues with relaxed operation semantics. © 2019 IEEE.

Keywords: Parallel programming
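The abstract does not describe the proposed queue, so the following is only a generic illustration of relaxed ("weakened") queue semantics, modeled on the MultiQueue idea, not the authors' method. Items are spread over several independently locked sub-queues, and a pop samples two sub-queues and takes the older head. Pops are therefore only approximately FIFO, in exchange for less lock contention among threads, such as those of a process using the MPI Multiple threading level. Class and method names are hypothetical, and items are assumed not to be `None`.

```python
import random
import threading
import time
from collections import deque
from typing import Any, Deque, List, Optional, Tuple


class RelaxedQueue:
    """MultiQueue-style queue with relaxed (approximately FIFO) semantics."""

    def __init__(self, k: int = 4) -> None:
        # k independent sub-queues, each guarded by its own lock.
        self._queues: List[Deque[Tuple[int, Any]]] = [deque() for _ in range(k)]
        self._locks = [threading.Lock() for _ in range(k)]

    def push(self, item: Any) -> None:
        """Append the item, tagged with a timestamp, to a random sub-queue."""
        i = random.randrange(len(self._queues))
        with self._locks[i]:
            self._queues[i].append((time.monotonic_ns(), item))

    def _head_stamp(self, i: int) -> Optional[int]:
        """Timestamp of sub-queue i's oldest item, or None if it is empty."""
        with self._locks[i]:
            q = self._queues[i]
            return q[0][0] if q else None

    def _pop_from(self, i: int) -> Tuple[bool, Any]:
        """Pop from sub-queue i; the flag says whether an item was found."""
        with self._locks[i]:
            if self._queues[i]:
                return True, self._queues[i].popleft()[1]
            return False, None

    def pop(self) -> Optional[Any]:
        """Remove an approximately oldest item, or return None if the queue looks empty."""
        k = len(self._queues)
        while True:
            i, j = random.randrange(k), random.randrange(k)
            si, sj = self._head_stamp(i), self._head_stamp(j)
            if si is None and sj is None:
                # Both samples empty: scan every sub-queue before giving up.
                for idx in range(k):
                    ok, item = self._pop_from(idx)
                    if ok:
                        return item
                return None
            best = i if sj is None or (si is not None and si <= sj) else j
            ok, item = self._pop_from(best)
            if ok:
                return item
            # Another thread emptied that sub-queue meanwhile; sample again.
```

When both sampled sub-queues look empty, `pop` scans all of them before returning `None`. Under concurrent pushes that answer is itself only approximate, which is part of the relaxed contract.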

An OpenMP translator for the GAP8 MPSoC

arXiv, 2020

Authors: Reinaldo Agostinho de Souza Filho, Diego V. Cirilo do Nascimento, Samuel Xavier-de-Souza (Universidade Federal do Rio Grande do Norte, Natal-RN, Brazil; Instituto Federal do Rio Grande do Norte, Natal-RN, Brazil)

One of the barriers to the adoption of parallel computing is the inherent complexity of programming it. The Open Multi-Processing (OpenMP) Application Programming Interface (API) facilitates such implementations by providing high-level directives. On another front, new architectures aimed at low energy consumption have been developed, such as the Greenwaves Technologies GAP8, a Multi-Processor System-on-Chip (MPSoC) based on the Parallel Ultra Low Power (PULP) Platform. The GAP8 has an 8-core cluster and a Fabric Controller (FC) master core. Parallel programming on the GAP8 is very promising in terms of efficiency, but the platform's recent development and the lack of a robust OS to handle threads and core scheduling complicate a straightforward implementation of the OpenMP API. This project implements a source-to-source translator that interprets a limited set of OpenMP directives and generates parallel microcontroller code that manipulates the cores directly. The preliminary results show a reduction in code size compared with the base implementation, indicating that the translator eases programming of the GAP8. Further work is needed to support more OpenMP directives. Copyright © 2020, The Authors. All rights reserved.

Keywords: Parallel programming
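A translator for a cluster without an OS scheduler has to emit the work split that an OpenMP runtime would otherwise compute. The paper's generated code is not shown in the abstract, so this is a hedged sketch of one way to give each of the GAP8's 8 cluster cores a contiguous block of iterations for a `#pragma omp parallel for` with static scheduling. The OpenMP specification only requires approximately equal chunks, at most one per thread, so this particular split is an assumption. The per-core dispatch is simulated with a loop, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple


def static_block(core_id: int, num_cores: int, n: int) -> Tuple[int, int]:
    """Half-open iteration range [start, end) assigned to `core_id`.

    Iterations are split into contiguous blocks whose sizes differ by at most
    one; the first `n % num_cores` cores take one extra iteration.
    """
    base, extra = divmod(n, num_cores)
    start = core_id * base + min(core_id, extra)
    end = start + base + (1 if core_id < extra else 0)
    return start, end


def simulated_parallel_sum(values: Sequence[int], num_cores: int = 8) -> int:
    """What translated code for a summing `parallel for` might compute, simulated serially.

    On the device each cluster core would run the loop body below with its own
    core id; here a Python loop stands in for the hardware dispatch.
    """
    partial: List[int] = []
    for core in range(num_cores):
        start, end = static_block(core, num_cores, len(values))
        partial.append(sum(values[start:end]))  # this core's private partial sum
    return sum(partial)  # reduction performed after all cores finish


print(simulated_parallel_sum(list(range(100))))  # 4950
```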

Automatic code parallelization for data-intensive computing in multicore systems

8th International Conference on Engineering, Mathematics and Physics, ICEMP 2019

Authors: Ranjini Subramanian, Hui Zhang (Computer Science Department, University of Louisville, Louisville, KY, United States)

A major driving force behind the increasing popularity of data science is the growing need for data-driven analytics fuelled by massive amounts of complex data. Parallel processing has become an increasingly cost-effective method for computationally large and data-intensive problems. Many existing applications are sequential; when ported to multi-processor systems, they use only one core, so the available cores are not fully exploited. Knowledge of parallel programming is necessary to use the processing power of multi-processor systems and achieve better performance. However, many users lack the skills and knowledge required to convert existing sequential code into parallel code that achieves speedup and scalability. In this paper, we introduce a framework that automatically transforms existing sequential code into parallel code using the divide-and-conquer paradigm while ensuring functional correctness, so that the benefits of multi-core systems can be maximized. The paper outlines the implementation of the framework and demonstrates its usage with practical use cases. © Published under licence by IOP Publishing Ltd.

Keywords: Parallel programming
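The abstract names divide and conquer as the transformation pattern but not the generated code. As a hedged sketch with hypothetical names, a sequential reduction loop (for example, a sum) can be rewritten as a recursive split-and-combine whose top levels run concurrently. The rewrite preserves the result only if `combine` is associative and `leaf` computes the original loop over a sub-range, which is the kind of condition an automatic tool must check to ensure functional correctness.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def dc_reduce(data: Sequence[T], leaf: Callable[[Sequence[T]], R],
              combine: Callable[[R, R], R], threshold: int = 1024) -> R:
    """Sequential divide and conquer: split in half until pieces are small."""
    if len(data) <= threshold:
        return leaf(data)  # base case: run the original sequential loop
    mid = len(data) // 2
    return combine(dc_reduce(data[:mid], leaf, combine, threshold),
                   dc_reduce(data[mid:], leaf, combine, threshold))


def parallel_dc_reduce(data: Sequence[T], leaf: Callable[[Sequence[T]], R],
                       combine: Callable[[R, R], R], threshold: int = 1024,
                       depth: int = 2) -> R:
    """Same recursion, but the top `depth` levels evaluate both halves concurrently."""
    if depth == 0 or len(data) <= threshold:
        return dc_reduce(data, leaf, combine, threshold)
    mid = len(data) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(parallel_dc_reduce, data[:mid], leaf, combine, threshold, depth - 1)
        right = pool.submit(parallel_dc_reduce, data[mid:], leaf, combine, threshold, depth - 1)
        return combine(left.result(), right.result())


# Example: the sequential loop `total = 0; for v in values: total += v`.
values = list(range(100_000))
print(parallel_dc_reduce(values, sum, lambda a, b: a + b))  # 4999950000
```

Threads are used here to show the structure; CPU-bound pure-Python leaves would need processes to run on several cores at once.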

Practical Parallelization of Scientific Applications

Euromicro Conference on Parallel, Distributed and Network-Based Processing

Authors: Valentina Cesare, Iacopo Colonnelli, Marco Aldinucci (Physics and Computer Science Departments, University of Turin, Turin, Italy)

ISBN: (electronic) 9781728165820

ISBN: (print) 9781728165837

This work aims to distill a systematic methodology for modernizing existing sequential scientific codes with limited redesign effort, turning an old codebase into modern code, i.e., parallel and robust code. We propose an automatable methodology to parallelize scientific applications designed with a purely sequential programming mindset, which may therefore use global variables, aliasing, random number generators, and stateful functions. We demonstrate the methodology on an astrophysical application in which we simultaneously model the kinematic profiles of 30 disk galaxies with Markov chain Monte Carlo (MCMC), a method that is sequential by definition. The parallel code achieves a 12-fold speedup on a 48-core platform.

Keywords: Parallel processing; Data models; Parallel programming; Tools; Software; Task analysis
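One reading of the astrophysical case is that each galaxy's chain is sequential, but chains for different galaxies are independent and can run concurrently, provided each owns its random number generator rather than drawing from shared global state (one of the hazards the methodology lists). The sketch below assumes a toy one-dimensional Gaussian target per galaxy and hypothetical function names; it is not the paper's kinematic model or code.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def metropolis(log_density: Callable[[float], float], x0: float, steps: int,
               step_size: float, seed: int) -> List[float]:
    """Random-walk Metropolis chain whose randomness comes only from a private RNG."""
    rng = random.Random(seed)  # per-chain generator: no shared global RNG state
    x, lp = x0, log_density(x0)
    samples: List[float] = []
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, step_size)
        lp_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = proposal, lp_new
        samples.append(x)
    return samples


def fit_galaxy(galaxy_id: int, steps: int = 2000, burn_in: int = 500) -> float:
    """Hypothetical stand-in for one galaxy's fit: posterior mean of a toy target."""
    mu = float(galaxy_id)  # toy Gaussian target centred on the galaxy index
    chain = metropolis(lambda x: -0.5 * (x - mu) ** 2, x0=0.0, steps=steps,
                       step_size=1.0, seed=1000 + galaxy_id)
    kept = chain[burn_in:]
    return sum(kept) / len(kept)


def fit_all(n_galaxies: int = 30, workers: int = 8) -> List[float]:
    """Run the independent per-galaxy chains concurrently; map() keeps galaxy order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fit_galaxy, range(n_galaxies)))
```

Because every chain is seeded from its galaxy index, the results are identical however the executor schedules the chains. For reference, the reported 12-fold speedup on 48 cores corresponds to a parallel efficiency of 12/48 = 25%.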

AMCilk: A Framework for Multiprogrammed Parallel Workloads

International Conference on High Performance Computing

Authors: Zhe Wang, Chen Xu, Kunal Agrawal, Jing Li (Washington University in St. Louis; New Jersey Institute of Technology)

Modern parallel platforms, such as clouds or servers, are often shared among many different jobs. However, existing parallel programming runtime systems are designed and optimized for running a single parallel job, so it is generally hard to use them directly to schedule multiple parallel jobs without incurring high overhead and inefficiency. In this work, we develop AMCilk (Adaptive Multiprogrammed Cilk), a novel runtime system framework designed to support multiprogrammed parallel workloads. AMCilk has a client-server architecture in which users can dynamically submit parallel jobs to the system. A single AMCilk runtime system runs these jobs while dynamically reallocating cores, last-level cache, and memory bandwidth among them according to the scheduling policy. AMCilk exposes an interface to the system designer that makes it easy to build different scheduling policies meeting the requirements of various application scenarios and performance metrics, while AMCilk enforces the chosen policy transparently to the designer. The primary feature of AMCilk is its low-overhead, responsive preemption mechanism, which allows fast reallocation of cores between jobs. Our empirical evaluation indicates that AMCilk incurs small overheads and, thanks to its fast, low-overhead core reallocation mechanism, provides significant benefits on application-specific criteria for a set of four practical applications.

Keywords: Performance evaluation; Schedules; Runtime; Parallel programming; High performance computing; Conferences; Bandwidth
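AMCilk lets a system designer plug in a scheduling policy that the runtime enforces. The abstract does not describe that interface, so the following is only a hypothetical example of the kind of policy a designer might write: a weighted share of cores, recomputed whenever jobs arrive or finish. `Job`, `weight`, and `weighted_share` are illustrative names, not AMCilk's API, and cache and memory-bandwidth allocation are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    name: str      # unique job identifier
    weight: float  # hypothetical priority weight chosen by the policy designer (> 0)


def weighted_share(jobs: List[Job], total_cores: int) -> Dict[str, int]:
    """Split cores in proportion to job weights, rounding by largest remainder."""
    if not jobs:
        return {}
    total_weight = sum(job.weight for job in jobs)
    exact = {job.name: total_cores * job.weight / total_weight for job in jobs}
    alloc = {name: int(share) for name, share in exact.items()}  # floor of each share
    leftover = total_cores - sum(alloc.values())
    # Hand the remaining cores to the jobs with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


# Recompute the allocation whenever the job set changes (arrival or completion).
running = [Job("a", 1), Job("b", 1)]
print(weighted_share(running, 10))  # {'a': 5, 'b': 5}
running.append(Job("c", 2))
print(weighted_share(running, 10))  # {'a': 3, 'b': 2, 'c': 5}
```

With weights 1, 1, and 2 on 10 cores, the exact shares are 2.5, 2.5, and 5. Flooring gives 2, 2, and 5, and the one leftover core goes to the first job with the largest fractional part.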
