Right on time for several large-scale experimental upgrades, the widely used and long-standing air shower simulation toolkit CORSIKA 7 will be updated to a "state of the art" C++ simulation framework. To meet...
ISBN: 9781728136134 (print)
In this paper, we propose a distributed, unordered, label-correcting distance-1 Grundy (vertex) coloring algorithm, namely, Distributed Control (DC) coloring algorithm. Our algorithm eliminates the need for vertex-centric barriers and global synchronization for color refinement, relying only on atomic operations and local termination detection to update vertex color. DC proceeds optimistically, correcting the colors asynchronously as the algorithm progresses and depends on local ordering of tasks to minimize the execution of sub-optimal work. We implement our DC coloring algorithm and the well-known Jones-Plassmann algorithm and compare their performance with 4 different types of standard RMAT graphs and real-world graphs. We show that the elimination of waiting time of global and vertex-centric barriers and investing this time for local ordering leads to improved scaling for graphs with prominent power-law characteristics and densely interconnected local subgraphs.
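As a rough illustration of the baseline named in the abstract, the following is a minimal sequential simulation of Jones-Plassmann coloring. It is not the DC algorithm and not the authors' implementation. The function name, the dictionary-of-sets graph representation, and the random priorities are assumptions made for this sketch, and the graph is assumed to have no self-loops. Each round corresponds to one synchronization step, which is the cost DC is designed to avoid.

```python
import random


def jones_plassmann(adj, seed=0):
    """Sequentially simulate Jones-Plassmann rounds on an undirected graph.

    adj maps each vertex to the set of its neighbors (no self-loops assumed).
    Returns a dict mapping each vertex to a non-negative integer color.
    """
    rng = random.Random(seed)
    # Random priorities; the vertex id only breaks (very unlikely) ties.
    priority = {v: (rng.random(), v) for v in adj}
    color = {}
    uncolored = set(adj)
    while uncolored:
        # Vertices that outrank all still-uncolored neighbors form an
        # independent set, so a parallel version could color them together.
        ready = [
            v for v in uncolored
            if all(priority[v] > priority[u] for u in adj[v] if u in uncolored)
        ]
        for v in ready:
            used = {color[u] for u in adj[v] if u in color}
            c = 0
            while c in used:  # first fit: smallest color unused by neighbors
                c += 1
            color[v] = c
        uncolored.difference_update(ready)
    return color
```

Because every vertex takes the smallest color not used by its already colored neighbors and colors are never revised, the result is both a proper coloring and a Grundy coloring. A label-correcting scheme such as DC would instead color optimistically and repair conflicts afterwards.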
ISBN: 9789897583582 (print)
In general, model-based design flows start from hardware-agnostic models and finally generate code based on the used model of computation (MoC). The generated code is then manually mapped onto the chosen target architecture in an additional, non-trivial deployment step. This manual step can break all correctness-by-construction guarantees of the model-based design, in particular if the chosen architecture employs a different MoC than the one used in the model. To automatically bridge this gap, we envisage a holistic model-based design framework for heterogeneous synthesis. First, it allows the modeling of a system using a combination of different MoCs. Second, it integrates standard hardware abstractions using the Open Computing Language (OpenCL) to promote the use of vendor-neutral heterogeneous architectures. Altogether, we envision an automatic synthesis that maps models using a combination of different MoCs onto heterogeneous hardware architectures. This paper evaluates the feasibility of incorporating OpenCL as a standard hardware abstraction for such a framework. The evaluation is presented as a case study that maps a synchronous application onto different target architectures using the OpenCL specification.
ISBN: 9781665414890 (digital)
ISBN: 9781665430531 (print)
The remaining useful lifetime (RUL) of assets plays a critical role in machine prognostics and health management (PHM). Accurate RUL predictions can reduce losses caused by equipment faults. Most existing data-driven PHM methods rely on long short-term memory (LSTM) networks to model the relationship between time series data and RUL. However, because of its sequential nature, LSTM is not conducive to parallel computing. Herein, we propose the Deep & Attention Network, which uses a combination of convolutional neural networks and Attention methodologies instead of LSTM. In the proposed Deep & Attention Network, the Attention component models the temporal property, while the Deep component learns the effect of noise data. Experiments on NASA's Commercial Modular Aero-Propulsion System Simulation datasets demonstrate that the proposed network achieves a level of performance similar to that of other state-of-the-art RUL prediction models. Moreover, compared with LSTM-based methods, our Self-Attention-based method is conducive to parallel computing.
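To illustrate why self-attention parallelizes more easily than an LSTM recurrence, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is not the Deep & Attention Network from the abstract. As a simplifying assumption, the input sequence itself serves as queries, keys, and values, with no learned projections, and the function name is hypothetical.

```python
import math


def self_attention(x):
    """Scaled dot-product self-attention over a sequence of float vectors.

    x is a list of equal-length vectors, one per time step. It is used
    directly as queries, keys, and values (no learned projections).
    """
    d = len(x[0])
    out = []
    for q in x:
        # Similarity of this time step to every time step in the window.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        # Numerically stable softmax over the scores.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        weights = [w / z for w in weights]
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out
```

Each output row depends only on the input sequence and never on a previously computed output row, so all rows can be evaluated concurrently. An LSTM, by contrast, needs the hidden state of step t-1 before it can compute step t.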
ISBN: 9783319654812 (print)
The proceedings contain 66 papers. The special focus in this conference is on algorithms and architectures for parallel processing. The topics include: pipelining computation and optimization strategies for scaling GROMACS on the Sunway many-core processor; exploring FPGA-GPU heterogeneous architecture for ADAS; new huge page allocator with main memory compression; an FPGA-based real-time moving object tracking approach; automatic acceleration of stencil codes in Android devices; optimizing concurrent evacuation transfers for geo-distributed datacenters in SDN; energy-balanced and depth-controlled routing protocol for underwater wireless sensor networks; on the energy efficiency of sleeping and rate adaptation for network devices; private and efficient set intersection protocol for big data analytics; a topology-aware framework for graph traversals; adaptive traffic signal control with network-wide coordination; a novel parallel dual-character string matching algorithm on graphical processing units; distributed nonnegative matrix factorization with HALS algorithm on MapReduce; GPU-accelerated block-max query processing; KD-tree and HEALPix-based distributed cone search indexing system for multi-band astronomical catalogs; an out-of-core branch and bound method for solving the 0-1 knapsack problem on a GPU; the curve boundary design and performance analysis for DGM based on OpenFOAM; leakage-resilient password-based authenticated key exchange; secure encrypted data deduplication with ownership proof and user revocation; optimally selecting the timing of zero-day attack via spatial evolutionary game; and performance analysis of a ternary optical computer based on M/M/1 queueing system.
ISBN: 9781728132990 (print)
Most existing airport detection methods for remote sensing images make insufficient use of the linear features of airport runways, and their computational complexity is high because of multi-scale anchor matching and global search over the full image. To solve this problem, this paper presents an airport detection method based on saliency fusion of parallel lines and regions of interest. First, the parallelism feature of the airport runway is extracted based on prior knowledge of airports, and regions of interest (ROI) are then obtained using an improved graph-based visual saliency (GBVS) method. The airport is then located through the saliency fusion of parallel lines and regions of interest. Finally, airport detection is achieved by transfer learning. The experimental results demonstrate that the proposed method outperforms the compared algorithms in terms of detection accuracy, processing speed, and false alarm rate. Moreover, the method requires only a small number of samples for model training.
ISBN: 9789897583582 (print)
The complexity of automotive systems engineering has been increasing over the last decade. In particular, new comfort functions as well as functions towards autonomous driving are reasons for this complexity. A new dimension is introduced by the usage of multi-core processors, since there is a shift from sequential to parallel thinking in the different development phases. Therefore, in this paper we present an approach for supporting the development process of distributed systems, aligned with the EAST-ADL approach, by using partitioning. We present an extension to EAST-ADL for partitioning and show how automatic partitioning can be achieved on different levels of abstraction. These partitions can support system designers during the design of functional architectures by giving a first insight into how well the functional components can be distributed on hardware in later stages of the development process.
Taylor Models present the polynomial generalization of the simple interval approach for rigorous computations of differential equations suggested by Martin Berz. These models are used to obtain better estimates for gu...
ISBN: 9781450362955 (print)
Triangle counting is a fundamental graph analytic operation that is used extensively in network science and graph mining. As the size of the graphs that need to be analyzed continues to grow, there is a need to develop scalable algorithms for distributed-memory parallel systems. To this end, we present a distributed-memory triangle counting algorithm, which uses a 2D cyclic decomposition to balance the computations and reduce the communication overheads. The algorithm structures its communication and computational steps such that it reduces its memory overhead, and it includes key optimizations that leverage the sparsity of the graph and the way the computations are structured. Experiments on synthetic and real-world graphs show that our algorithm obtains an average relative speedup between 3.24 and 7.22 out of 10.56 across the datasets when using 169 MPI ranks, relative to the performance achieved by 16 MPI ranks. Moreover, we obtain an average speedup of 10.2 times in comparison with previously developed distributed-memory parallel algorithms.
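The following single-process sketch illustrates the two ideas mentioned in the abstract: counting each triangle once via neighbor-set intersection, and spreading work over a square process grid. It is not the authors' algorithm. The ownership rule (edge (u, v) belongs to grid position (u mod q, v mod q)) is an assumed reading of "2D cyclic decomposition", the function names are hypothetical, and vertex ids are assumed to be integers. The figure 10.56 presumably refers to the ideal relative speedup 169/16, which the helper below reproduces.

```python
from collections import Counter


def count_triangles_2d(edges, grid):
    """Count triangles and tally the work per rank of a grid x grid layout."""
    # Orient every edge from the lower to the higher vertex id so that each
    # triangle u < v < w is discovered exactly once, at edge (u, v).
    out = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        a, b = (u, v) if u < v else (v, u)
        out.setdefault(a, set()).add(b)
        out.setdefault(b, set())
    work = Counter()
    total = 0
    for u, nbrs in out.items():
        for v in nbrs:
            # Assumed 2D cyclic ownership of the task for edge (u, v).
            rank = (u % grid, v % grid)
            found = len(nbrs & out[v])  # common higher-numbered neighbors
            work[rank] += found
            total += found
    return total, work


def ideal_relative_speedup(ranks, base_ranks):
    """Speedup expected under perfect scaling from base_ranks to ranks."""
    return ranks / base_ranks
```

For example, `count_triangles_2d(edges, 13)` would model the 169-rank configuration as a 13 x 13 grid, and the `work` counter gives a first impression of how evenly the cyclic mapping spreads the intersections.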
ISBN: 9781728154756 (digital)
ISBN: 9781728154763 (print)
High efficiency video coding (HEVC) handles the ever-increasing global video content with better compression efficiency. Complex partitioning and an increased number of angular modes in intra prediction are among the factors responsible for this, but they come at the expense of complex computations. In this work, we propose two hardware architectures, Parallel Pipelined Architecture (PPA) and Parallel Datapath Architecture (PDA), for the planar and direct current (DC) modes of intra prediction in HEVC. PPA combines pipelining and parallel schemes and reuses the multipliers to reduce hardware resources. PDA includes datapath0 for planar mode and datapath1 for DC mode, which function in parallel. They support all the block sizes and are implemented on an Artix-7 field programmable gate array (FPGA). The implementation results show that PDA uses 20% fewer resources for block size 4, while PPA uses 20%, 46%, and 62% fewer resources for block sizes 8, 16, and 32, respectively. Detailed synthesis results show that PPA and PDA achieve a throughput of 8 pixels/clock cycle and hence can support 4K videos at 30 frames per second.