Patterns provide a mechanism to express parallelism at a high level of abstraction and to simplify the transformation of existing legacy applications to target parallel frameworks. They also open a path for writing new parallel applications. In this paper we introduce the REPARA approach for expressing parallel patterns and transforming the source code to parallel programming frameworks. We take advantage of C++11 attributes as a mechanism to introduce annotations and enrich valid source code with semantic information. We also present a methodology for source code transformation that allows multiple parallel programming models to be targeted. Another contribution is a rule-based mechanism for transforming annotated code to those specific programming models. The REPARA approach requires programmer intervention only for the initial code annotation, while providing speedups comparable to those obtained by manual parallelization.
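For illustration only (not taken from the paper), the fragment below shows what such an annotation-based approach can look like: a standard C++11 attribute marks a sequential loop as a parallelizable kernel. The attribute names rpr::kernel and rpr::out follow the REPARA naming style but are assumptions here; compilers that do not know the attribute namespace typically diagnose and ignore it, so the annotated code remains valid sequential C++.

```cpp
// A minimal sketch of attribute-based annotation (assumed names, not the paper's
// exact ones): a C++11 attribute marks a loop as a parallelizable kernel while the
// loop body stays unchanged legacy code.
#include <cstddef>
#include <vector>

void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    // Hypothetical REPARA-style annotation; unknown attributes are typically
    // diagnosed and ignored, leaving the sequential semantics untouched.
    [[rpr::kernel, rpr::out(y)]]
    for (std::size_t i = 0; i < y.size(); ++i)
        y[i] = a * x[i] + y[i];
}
```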
With each technology improvement, parallel systems get larger, and the impact of interconnection networks becomes more prominent. Random topologies and their variants have recently received increasing attention due to their low diameter, low average shortest path length and high scalability. However, existing supercomputers still prefer torus and fat-tree topologies, because a number of existing parallel algorithms are optimized for them and the interconnect implementation is more straightforward in terms of floor layout. In this paper, we investigate the performance of traditional and emerging parallel workloads on these network topologies, using the discrete-event simulator SimGrid. We observe that the random topology performs better for the Fourier Transform (FT), Graph500 and Himeno benchmarks, with an improvement of 18 percent on average over the counterpart torus. Based on this study, our recommendation is to use random topologies in current and future supercomputers for these scientific and big-data analysis parallel applications.
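As a side note (not part of the paper), the two topology metrics cited above are straightforward to compute for any candidate interconnect. The plain-C++ sketch below derives the diameter and average shortest-path length (ASPL) of a topology given as an adjacency list by running a breadth-first search from every node; random topologies typically score lower on both metrics than a torus of comparable size and degree.

```cpp
#include <algorithm>
#include <cstdio>
#include <queue>
#include <vector>

struct Metrics { int diameter; double aspl; };

// BFS from every node yields all-pairs hop counts for an unweighted topology,
// from which the diameter and the ASPL follow directly.
Metrics topology_metrics(const std::vector<std::vector<int>>& adj) {
    const int n = static_cast<int>(adj.size());
    int diameter = 0;
    long long pairs = 0, hops = 0;
    for (int src = 0; src < n; ++src) {
        std::vector<int> dist(n, -1);
        std::queue<int> q;
        dist[src] = 0;
        q.push(src);
        while (!q.empty()) {
            int u = q.front(); q.pop();
            for (int v : adj[u])
                if (dist[v] < 0) { dist[v] = dist[u] + 1; q.push(v); }
        }
        for (int v = 0; v < n; ++v)
            if (v != src && dist[v] > 0) {
                diameter = std::max(diameter, dist[v]);
                hops += dist[v];
                ++pairs;
            }
    }
    return { diameter, pairs ? static_cast<double>(hops) / pairs : 0.0 };
}

int main() {
    // 4-node ring: diameter 2, ASPL 4/3.
    std::vector<std::vector<int>> ring = {{1, 3}, {0, 2}, {1, 3}, {0, 2}};
    Metrics m = topology_metrics(ring);
    std::printf("diameter=%d aspl=%.3f\n", m.diameter, m.aspl);
}
```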
ISBN (Print): 9781509028269
In 1994, the OSI (Open System Interconnection)/IEC 7498-1:1994 Model was published. The two-sided point-to-point communication TCP/IP protocol became widely accepted as the standard building block for all distributed applications. The OSI standard states: "Transport Layer: Reliable transmission of data segments between points on a network, including segmentation, acknowledgement and multiplexing." In the same year, "The Network is Reliable" was named the top fallacy in distributed computing by Peter Deutsch and his colleagues. Understanding these two conflicting descriptions of the seemingly same object has been a lasting myth in distributed and parallel processing. Investigating this myth under the lens of extreme-scale computing revealed that it is impossible to implement reliable communication in the face of crashes of either the sender or the receiver in every point-to-point communication channel. The impossibility result exposes two unacceptable risks: a) arbitrary data loss, and b) an increasing probability of data loss as the infrastructure expands in size. Therefore, the direct use of point-to-point protocols in distributed and parallel applications is the cause of the notorious scalability dilemma. The top-fallacy allegation in distributed computing is supported by theory and practice. The scalability dilemma is responsible for the growing planned and unplanned downtimes in existing infrastructures. It is also responsible for the underreported data and service losses in mission-critical applications. It makes the reproducibility of large-scale computations increasingly difficult. The growing instability is also linked to our inability to quantify a parallel application's scalability. This paper reports a statistic multiplexed computing (SMC) paradigm designed to solve the scalability and reproducibility problems. Preliminary computational results are reported in support of the proposed solution.
Sentinel-1 is the first of a family of satellites designed to provide a data stream for the European environmental monitoring program known as Copernicus. The Sentinel-1 constellation has been specifically designed to perform, over land, advanced Differential Interferometric Synthetic Aperture Radar (DInSAR) analyses for the investigation of Earth's surface displacements. In particular, owing to its 6-day revisit time and its innovative acquisition mode, which is referred to as Terrain Observation by Progressive Scans (TOPS) and is fundamental for guaranteeing global spatial coverage, the Sentinel-1 constellation is contributing to the creation of a framework for the exploitation of "Big Data" for Earth Observation (EO) applications. In this paper, we present an efficient and automatic implementation of the Parallel Small BAseline Subset (P-SBAS) DInSAR algorithm, specifically intended for the processing of Sentinel-1 SAR data. The algorithm is able to run on distributed computing infrastructures by effectively exploiting a large number of resources, and allows the generation of ground displacement time series. The aim of this paper is to show that it is possible to automatically and continuously process, in a short time frame, very large sequences of Sentinel-1 data, thus allowing us to perform advanced interferometric analyses at an unprecedentedly large scale. In addition, the proposed Sentinel-1 P-SBAS algorithm has also been tested on commercial public cloud computing platforms, such as those provided by Amazon Web Services. The presented Sentinel-1 P-SBAS processing chain is well suited to building up operational services for the easy and rapid generation of advanced interferometric products, which can be very useful not only for scientific purposes but also for risk management and natural hazard monitoring. (C) 2016 The Authors. Published by Elsevier B.V.
ISBN (Print): 9781509036776
Online Public Opinion Systems (OPOS) aim to collect, analyze, summarize and monitor massive public opinions on the Internet in real time. Meanwhile, an OPOS is often able to identify key or sudden events and thus notify the related people immediately for rapid responses to these events. As part of this endeavor, this paper introduces the architecture and techniques of an OPOS that has been used by several large enterprises. This self-designed OPOS consists of a data layer, a computation layer and an application layer, from bottom to top. We first introduce the basic functions and key techniques of each layer, and then present several typical yet important algorithms of the computation layer. Experimental results on real-world data validate the effectiveness of the algorithms embedded in our system. Last but not least, a system demonstration in a shipbuilding company is provided to illustrate the value of our OPOS for real enterprises.
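To make the layer split concrete, the sketch below gives a hypothetical C++ rendering (not from the paper; all names are illustrative) of the bottom-to-top flow: the data layer collects raw posts, the computation layer turns them into detected events, and the application layer notifies stakeholders.

```cpp
#include <string>
#include <vector>

struct Post  { std::string text; long timestamp; };
struct Event { std::string summary; double urgency; };

class DataLayer {                 // bottom: collection and storage of raw opinions
public:
    std::vector<Post> fetch_recent() { return {}; }   // stub for crawling/ingestion
};

class ComputationLayer {          // middle: analysis, summarization, event detection
public:
    std::vector<Event> detect_events(const std::vector<Post>& posts) {
        std::vector<Event> events;
        if (!posts.empty()) events.push_back({"possible sudden event", 0.9});
        return events;
    }
};

class ApplicationLayer {          // top: monitoring views and notifications
public:
    void notify(const std::vector<Event>& events) { (void)events; /* push alerts */ }
};

int main() {
    DataLayer data; ComputationLayer compute; ApplicationLayer app;
    app.notify(compute.detect_events(data.fetch_recent()));  // bottom-to-top flow
}
```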
Many applications regularly generate large graph data. Many of these graphs change dynamically, and analysis techniques for static graphs are not suitable in these cases. This thesis proposes an architecture to proces...
In this paper we present a novel approach for functional-style programming of distributed-memory clusters, targeting data-centric applications. The proposed programming model is purely sequential, SPMD-free and based on the high-level functional features introduced since the C++11 specification. Additionally, we propose a novel cluster-as-accelerator design principle. In this scheme, cluster nodes act as general interpreters of user-defined functional tasks over node-local portions of distributed data structures. We envision coupling a simple yet powerful programming model with a lightweight, locality-aware distributed runtime as a promising step along the road towards high-performance data analytics, in particular in the perspective of the upcoming exascale era. We implemented the proposed approach in SkeDaTo, a prototype C++ library of data-parallel skeletons that exploits the cluster-as-accelerator principle at the bottom layer of the runtime software stack.
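The following sketch conveys the flavor of such a purely sequential, SPMD-free style using hypothetical names (dist_vector, map) rather than the actual SkeDaTo API: the user composes C++11 lambdas over a distributed container, and a cluster runtime would be free to apply each function to the node-local partition.

```cpp
#include <cstddef>
#include <vector>

namespace sketch {
// Stand-in for a distributed container; a real runtime would hold only the
// node-local partition and ship the user function to every node.
template <typename T>
class dist_vector {
public:
    explicit dist_vector(std::size_t n, T init = T()) : local_(n, init) {}
    template <typename F>
    dist_vector& map(F f) {                 // data-parallel "map" skeleton
        for (auto& x : local_) x = f(x);    // here: a plain sequential loop
        return *this;
    }
private:
    std::vector<T> local_;                  // node-local portion of the data
};
} // namespace sketch

int main() {
    sketch::dist_vector<double> v(1000000, 1.0);
    v.map([](double x) { return 2.0 * x; })      // sequential-looking, SPMD-free code
     .map([](double x) { return x + 1.0; });
}
```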
This paper describes the implementation of a preconditioned CG (Conjugate Gradient) method on GPUs and evaluates its performance compared with CPUs. Our CG method uses an SP (Splitting-Up) preconditioner, which is well suited to parallel processing because all dimensions except one are independent. In order to enhance the memory bandwidth to the global memory of GPUs, our implementation uses a pseudo matrix transposition before and after the tridiagonal matrix solver, which results in coalesced memory accesses. In addition, the number of pseudo matrix transpositions can be reduced to only one by using a rotation configuration technique. With these techniques, the speedup of our approach can be enhanced by up to 102.2%.
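For reference, the preconditioned CG iteration being accelerated is sketched below in plain C++, with the coefficient matrix and the preconditioner passed in as callables. The paper's SP preconditioner and the GPU-side pseudo transposition for coalesced access are not reproduced here; apply_Minv can be any stand-in, e.g. the identity.

```cpp
#include <cmath>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using Op  = std::function<void(const Vec&, Vec&)>;   // out = Op(in)

static double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Solves A x = b with preconditioner M; returns the iteration count.
int pcg(const Op& apply_A, const Op& apply_Minv, const Vec& b, Vec& x,
        double tol = 1e-8, int max_iter = 1000) {
    const std::size_t n = b.size();
    Vec r(n), z(n), p(n), q(n);
    apply_A(x, q);
    for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - q[i];    // r = b - A x
    apply_Minv(r, z);                                          // z = M^{-1} r
    p = z;
    double rz = dot(r, z);
    for (int k = 1; k <= max_iter; ++k) {
        apply_A(p, q);
        const double alpha = rz / dot(p, q);
        for (std::size_t i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * q[i]; }
        if (std::sqrt(dot(r, r)) < tol) return k;
        apply_Minv(r, z);
        const double rz_new = dot(r, z);
        const double beta = rz_new / rz;
        for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
        rz = rz_new;
    }
    return max_iter;
}

int main() {
    // Tiny smoke test: diag(2, 4) x = (2, 8)  ->  x = (1, 2).
    Op A    = [](const Vec& v, Vec& out) { out[0] = 2 * v[0]; out[1] = 4 * v[1]; };
    Op Minv = [](const Vec& v, Vec& out) { out = v; };   // identity "preconditioner"
    Vec b = {2.0, 8.0}, x = {0.0, 0.0};
    pcg(A, Minv, b, x);
}
```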
SkePU is a state-of-the-art skeleton programming library for high-level portable programming and efficient execution on heterogeneous parallel computer systems, with a publicly available implementation for general-purpose multicore CPU and multi-GPU systems. This paper presents the design, implementation and evaluation of a new back-end of the SkePU skeleton programming library for the new low-power multicore processor Myriad2 by Movidius Ltd. This enables seamless code portability of SkePU applications across both HPC and embedded (Myriad2) parallel computing systems, with decent performance on these architecturally very diverse types of execution platforms.
Globalization and cloud computing have allowed major strides forward in terms of communication possibilities, but it is also illuminating how many different resource options and formats exist access to which would dra...