ISBN: (print) 9783030483401; 9783030483395
This paper investigates the feasibility of using ParaView as visualization software for analyzing and optimizing the performance of parallel CFD codes. The currently available software tools for reading profiling data do not map the generated measurements onto the simulation's original mesh, and they aggregate them in some way rather than showing them on a time-step basis. A plugin for the open-source performance tool Score-P has been developed, which intercepts an arbitrary number of manually selected code regions (mostly functions) and sends their respective measurements (number of executions and cumulative time spent) to ParaView through its in situ library, Catalyst, as if they were any other flow-related variable. Results show that (i) the impact of mesh partitioning algorithms on code performance and (ii) load imbalances (and their possible relationship to mesh size or simulation physics) become easier to investigate.
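The idea of keeping per-region measurements separate for each time step, instead of aggregating them over the run, can be illustrated with a minimal Python sketch. It does not use the Score-P or Catalyst APIs; `RegionProfiler`, `next_step`, `region`, and `as_cell_field` are hypothetical names introduced only for this illustration, and the broadcast of one process's measurements to all of its mesh cells is an assumption about how such data might be laid out.

```python
import time
from collections import defaultdict


class RegionProfiler:
    """Hypothetical helper (not Score-P): per-time-step region measurements."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self.step = 0
        # metrics[step][region] -> [number of executions, cumulative seconds]
        self.metrics = defaultdict(lambda: defaultdict(lambda: [0, 0.0]))

    def next_step(self):
        """Advance to the next simulation time step."""
        self.step += 1

    def region(self, name):
        """Decorator that intercepts a manually selected function."""
        def wrap(func):
            def timed(*args, **kwargs):
                start = self._clock()
                try:
                    return func(*args, **kwargs)
                finally:
                    entry = self.metrics[self.step][name]
                    entry[0] += 1                       # executions
                    entry[1] += self._clock() - start   # cumulative time
            return timed
        return wrap

    def as_cell_field(self, step, name, n_cells):
        """Assumed layout: repeat this process's values for each local cell,
        so they can be shown on the mesh like any other variable."""
        count, seconds = self.metrics[step][name]
        return [(count, seconds)] * n_cells
```

In a real in situ setup, the per-cell values would be handed to the visualization pipeline at the end of each time step; here they are simply returned as a list.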
ISBN: (print) 9783030576752; 9783030576745
Programming models for task-based parallelization based on compile-time directives are very effective at uncovering the parallelism available in HPC applications. Despite that, the process of correctly annotating complex applications is error-prone and may hinder the general adoption of these models. In this paper, we target the OmpSs-2 programming model and present a novel toolchain able to detect parallelization errors coming from non-compliant OmpSs-2 applications. Our toolchain verifies the compliance with the OmpSs-2 programming model using local task analysis to deal with each task separately, and structural induction to extend the analysis to the whole program. To improve the effectiveness of our tools, we also introduce some ad-hoc verification annotations, which can be used manually or automatically to disable the analysis of specific code regions. Experiments run on a sample of representative kernels and applications show that our toolchain can be successfully used to verify the parallelization of complex real-world applications.
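As a rough illustration of checking each task locally and then extending the check over the whole program by recursing through nested tasks, consider the toy Python sketch below. It is not the authors' toolchain and does not implement the OmpSs-2 rules: the `Task` record, the `check_task` and `check_program` functions, and the simplified rule (reads must be covered by a declared input or output, writes by a declared output) are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task record: declared dependencies vs. observed accesses."""
    name: str
    declared_in: set = field(default_factory=set)
    declared_out: set = field(default_factory=set)
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)
    children: list = field(default_factory=list)


def check_task(task):
    """Local analysis: inspect one task in isolation."""
    errors = []
    readable = task.declared_in | task.declared_out
    for var in sorted(task.reads - readable):
        errors.append(f"{task.name}: reads '{var}' without declaring it")
    for var in sorted(task.writes - task.declared_out):
        errors.append(f"{task.name}: writes '{var}' without an output dependency")
    return errors


def check_program(task):
    """Extend the local check to the whole task tree by structural recursion."""
    errors = check_task(task)
    for child in task.children:
        errors.extend(check_program(child))
    return errors
```

A program counts as compliant in this toy model when `check_program` returns an empty list for its root task.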
ISBN: (print) 9781665440738; 9781665411639
Blockchain technologies have been very effective in processing distributed transactions securely. They have many applications, including handling cryptocurrencies such as Bitcoin and smart contracts. More recently, the use of blockchain has been explored for data science applications. This paper examines blockchain technologies and discusses their applications in data science and cyber security.
Thread-level speculation (TLS) has gained substantial international recognition for its distinctive approach to parallel program execution. It exploits the potential parallelism of programs to improve the utilization of multi-core resources. However, the TACLeBench kernel benchmark has not been effectively analyzed under TLS parallelization. To address this, we select 7 programs from the TACLeBench kernel benchmark and analyze their loop-level and procedure-level speculative execution to measure their maximum potential parallelism. Furthermore, we discuss their runtime characteristics (thread size, speculative parallelism coverage, dependency features) and the influence of program source code on speedup. Our experimental results show that: 1) Most applications achieve impressive results. Bsort reaches a 20.79x speedup in loop-level speculation, and lms reaches a 9.51x speedup in procedure-level speculation; 2) When TLS is used to accelerate the TACLeBench kernel benchmark, most applications effectively utilize computing resources from 4 to 16 cores; 3) The kernel benchmark is better suited to exploiting parallelism through loop-level speculation.
ISBN: (print) 9783030389918; 9783030389901
Skeletal parallelism is a model of parallelism in which parallel constructs are provided to the programmer as common patterns of parallel algorithms. High-level skeleton libraries often offer a global view of programs instead of the Single Program Multiple Data view common in parallel programming. A program is written as a sequential program but operates on parallel data structures. Most of the time, skeletons on a parallel data structure have counterparts on a sequential data structure. For example, the map function that applies a given function to all the elements of a sequential collection (e.g., a list) has a map skeleton counterpart that applies a sequential function to all the elements of a distributed collection. Two of the challenges a programmer faces when using a skeleton library that provides a wide variety of skeletons are which skeletons to use and how to compose them. These design decisions may have a large impact on the performance of the parallel programs. However, skeletons, especially when they do not mutate the data structure they operate on but are instead implemented as pure functions, possess algebraic properties that allow compositions of skeletons to be transformed into more efficient compositions of skeletons. In this paper, we present such an automatic transformation framework for the Python skeleton library PySke and evaluate it on several example applications.
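One well-known algebraic property of this kind is map fusion: mapping `g` and then mapping `f` gives the same result as a single map of the composition of `f` and `g`, provided both functions are pure. The following sketch shows such a rewrite on plain Python lists. It does not use the PySke API; the pipeline representation as `("map", fn)` and `("reduce", fn)` tuples and the names `fuse_maps` and `run` are assumptions made for this illustration.

```python
from functools import reduce


def compose(f, g):
    """Return the function x -> f(g(x))."""
    return lambda x: f(g(x))


def fuse_maps(pipeline):
    """Rewrite adjacent map stages into one map (stages apply left to right)."""
    fused = []
    for stage in pipeline:
        if fused and stage[0] == "map" and fused[-1][0] == "map":
            previous = fused.pop()[1]
            fused.append(("map", compose(stage[1], previous)))
        else:
            fused.append(stage)
    return fused


def run(pipeline, data):
    """Sequential reference interpreter; a reduce stage must come last."""
    for kind, fn in pipeline:
        if kind == "map":
            data = [fn(x) for x in data]
        elif kind == "reduce":
            data = reduce(fn, data)
        else:
            raise ValueError(f"unknown stage kind: {kind}")
    return data


pipeline = [
    ("map", lambda x: x + 1),
    ("map", lambda x: x * 2),
    ("reduce", lambda a, b: a + b),
]
optimized = fuse_maps(pipeline)
```

The fused pipeline traverses the collection once for the two maps instead of twice, which on a distributed collection would also avoid building an intermediate structure.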
ISBN: (print) 9783030602383
The proceedings contain 145 papers. The special focus in this conference is on Algorithms and Architectures for Parallel Processing. The topics include: An Improved Heterogeneous Dynamic List Schedule Algorithm; fastThetaJoin: An Optimization on Multi-way Data Stream θ-join with Range Constraints; A Distributed Framework for Online Stream Data Clustering; End-System Aware Large File Transfer Solution for Rich Media Applications over 5G Mobile Networks; Broad Learning System with Proportional-Integral-Differential Gradient Descent; Accelerating De Novo Assembler WTDBG2 on Commodity Servers; Typing Everywhere with an EMG Keyboard: A Novel Myo Armband-Based HCI Tool; Accelerating Pattern Matching on Intel Xeon Phi Processors; Redistributing and Optimizing High-Resolution Ocean Model POP2 to Million Sunway Cores; Efficient Sorting and Join on NVM-Based Hybrid Memory; Performance Optimization for Feature Extraction Section of DeepChem; Principal Component Analysis for Fingerprint Positioning; Priority Based Service Placement Strategy in Heterogeneous Mobile Edge Computing; VTC: A Scheduling Framework Between Soft Real-Time and Hard Real-Time on Multimedia OS; A BSP Based Approach for NFAs Intersection; Tight Bound of Parallel Request Latency for Erasure-Coded Distributed Storage System; High-Performance Simulations on GPUs Using Adaptive Time Steps; Performance Modeling of Stencil Computation on SW26010 Processors; Optimizing B+-Tree Searches on Coupled CPU-GPU Architectures; OCVM: Optimizing the Isolation of Virtual Machines with Open-Channel SSDs; Parallel SCC Detection Based on Reusing Warps and Coloring Partitions on GPUs; CANRT: A Client-Active NVM-Based Radix Tree for Fast Remote Access; Distributed and Parallel Ensemble Classification for Big Data Based on Kullback-Leibler Random Sample Partition; SWAF: A Distributed Solar WSN Adaptive Framework.
ISBN: (print) 9783030602444
The proceedings contain 145 papers. The special focus in this conference is on Algorithms and Architectures for Parallel Processing. The topics include: An Improved Heterogeneous Dynamic List Schedule Algorithm; fastThetaJoin: An Optimization on Multi-way Data Stream θ-join with Range Constraints; A Distributed Framework for Online Stream Data Clustering; End-System Aware Large File Transfer Solution for Rich Media Applications over 5G Mobile Networks; Broad Learning System with Proportional-Integral-Differential Gradient Descent; Accelerating De Novo Assembler WTDBG2 on Commodity Servers; Typing Everywhere with an EMG Keyboard: A Novel Myo Armband-Based HCI Tool; Accelerating Pattern Matching on Intel Xeon Phi Processors; Redistributing and Optimizing High-Resolution Ocean Model POP2 to Million Sunway Cores; Efficient Sorting and Join on NVM-Based Hybrid Memory; Performance Optimization for Feature Extraction Section of DeepChem; Principal Component Analysis for Fingerprint Positioning; Priority Based Service Placement Strategy in Heterogeneous Mobile Edge Computing; VTC: A Scheduling Framework Between Soft Real-Time and Hard Real-Time on Multimedia OS; A BSP Based Approach for NFAs Intersection; Tight Bound of Parallel Request Latency for Erasure-Coded Distributed Storage System; High-Performance Simulations on GPUs Using Adaptive Time Steps; Performance Modeling of Stencil Computation on SW26010 Processors; Optimizing B+-Tree Searches on Coupled CPU-GPU Architectures; OCVM: Optimizing the Isolation of Virtual Machines with Open-Channel SSDs; Parallel SCC Detection Based on Reusing Warps and Coloring Partitions on GPUs; CANRT: A Client-Active NVM-Based Radix Tree for Fast Remote Access; Distributed and Parallel Ensemble Classification for Big Data Based on Kullback-Leibler Random Sample Partition; SWAF: A Distributed Solar WSN Adaptive Framework.
ISBN: (print) 9783030602475
The proceedings contain 145 papers. The special focus in this conference is on Algorithms and Architectures for Parallel Processing. The topics include: An Improved Heterogeneous Dynamic List Schedule Algorithm; fastThetaJoin: An Optimization on Multi-way Data Stream θ-join with Range Constraints; A Distributed Framework for Online Stream Data Clustering; End-System Aware Large File Transfer Solution for Rich Media Applications over 5G Mobile Networks; Broad Learning System with Proportional-Integral-Differential Gradient Descent; Accelerating De Novo Assembler WTDBG2 on Commodity Servers; Typing Everywhere with an EMG Keyboard: A Novel Myo Armband-Based HCI Tool; Accelerating Pattern Matching on Intel Xeon Phi Processors; Redistributing and Optimizing High-Resolution Ocean Model POP2 to Million Sunway Cores; Efficient Sorting and Join on NVM-Based Hybrid Memory; Performance Optimization for Feature Extraction Section of DeepChem; Principal Component Analysis for Fingerprint Positioning; Priority Based Service Placement Strategy in Heterogeneous Mobile Edge Computing; VTC: A Scheduling Framework Between Soft Real-Time and Hard Real-Time on Multimedia OS; A BSP Based Approach for NFAs Intersection; Tight Bound of Parallel Request Latency for Erasure-Coded Distributed Storage System; High-Performance Simulations on GPUs Using Adaptive Time Steps; Performance Modeling of Stencil Computation on SW26010 Processors; Optimizing B+-Tree Searches on Coupled CPU-GPU Architectures; OCVM: Optimizing the Isolation of Virtual Machines with Open-Channel SSDs; Parallel SCC Detection Based on Reusing Warps and Coloring Partitions on GPUs; CANRT: A Client-Active NVM-Based Radix Tree for Fast Remote Access; Distributed and Parallel Ensemble Classification for Big Data Based on Kullback-Leibler Random Sample Partition; SWAF: A Distributed Solar WSN Adaptive Framework.
ISBN: (print) 9783030527938
The proceedings contain 18 papers. The special focus in this conference is on Architecture of Computing Systems. The topics include: Engineering an optimized instruction set architecture for AMIDAR processors; Scaling logic locking schemes to multi-module hardware designs; Exploration of power domain partitioning with concurrent task mapping and scheduling for application-specific multi-core SoCs; Scalable, decentralized battery management system based on self-organizing nodes; Security improvements by separating the cryptographic protocol from the network stack onto a multi-MCU architecture; Equally distributed bus-communication access rights for inter-MCU communication using multimaster SPI; On the evaluation of SEU effects on AXI interconnect within AP-SoCs; Satellite onboard data reduction using a RISC-V core inside an RTG4-based data processing pipeline; Accelerating real-time applications with predictable work-stealing; Evaluating dynamic task scheduling with priorities and adaptive aging in a task-based runtime system; An architecture for solving the eigenvalue problem on embedded FPGAs; ECC memory for fault tolerant RISC-V processors; 3D optimisation of software application mappings on heterogeneous MPSoCs; Towards a priority-based task distribution strategy for an artificial hormone system; *** db: A concept for parallel data processing on heterogeneous hardware; Investigating transactional memory for high performance embedded systems.
MapReduce has been widely used to process large-scale data in the past decade. Among the quantity of such cloud computing applications, we pay special attention to distributed mosaic methods based on numerous drone im...