ISBN: 9781450383318 (print)
Advances in deep neural networks have significantly improved accuracy and speed across a wide range of Computer Vision (CV) applications. However, our ability to perform real-time CV on edge devices is severely restricted by their limited computing capabilities. In this paper, we employ Vega, a parallel graph-based framework, to study the performance limitations of four heterogeneous edge-computing platforms while running 12 popular deep learning CV applications. We extend the framework's capabilities with two new performance enhancements: 1) an adaptive stage instance controller (ASI-C) that can improve performance by dynamically selecting the number of instances for a given stage of the pipeline; and 2) an adaptive input resolution controller (AIR-C) that improves responsiveness and enables real-time performance. These two solutions are integrated to provide a robust real-time solution. Our experimental results show that ASI-C improves run-time performance by 1.4x on average across all heterogeneous platforms, achieving a maximum speedup of 4.3x when running face detection on a high-end edge device. We demonstrate that our integrated optimization framework improves application performance and is robust to changing execution patterns.
ISBN: 9783540770459 (print)
The previously proposed cunning ant system (cAS), a variant of the ant colony optimization (ACO) algorithm, performed well on the traveling salesman problem (TSP), and the results showed that cAS could be one of the most promising ACO algorithms. In this paper, we apply cAS to the quadratic assignment problem (QAP). We focus mainly on the effects of applying local search and of parallelizing cAS. The results show that cAS performs promisingly on the QAP.
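To make the QAP objective and the role of local search concrete, here is a minimal sketch, assuming the standard Koopmans-Beckmann form (facility `i` is placed at location `perm[i]`) and a generic pairwise-swap (2-exchange) local search. The function names are illustrative, and this is not necessarily the local search used with cAS in the paper.

```python
from itertools import combinations
from typing import List

Matrix = List[List[int]]


def qap_cost(flow: Matrix, dist: Matrix, perm: List[int]) -> int:
    """Koopmans-Beckmann QAP objective: sum of flow(i, j) * dist(perm[i], perm[j])."""
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]] for i in range(n) for j in range(n))


def swap_local_search(flow: Matrix, dist: Matrix, perm: List[int]) -> List[int]:
    """Apply improving pairwise swaps until no swap lowers the cost (a local optimum)."""
    perm = list(perm)
    best = qap_cost(flow, dist, perm)
    improved = True
    while improved:
        improved = False
        for r, s in combinations(range(len(perm)), 2):
            perm[r], perm[s] = perm[s], perm[r]
            cost = qap_cost(flow, dist, perm)  # full recomputation keeps the sketch simple
            if cost < best:
                best, improved = cost, True
            else:
                perm[r], perm[s] = perm[s], perm[r]  # undo a non-improving swap
    return perm
```

In an ACO setting such as cAS, a local search like this would typically be applied to each ant's constructed solution; practical implementations replace the full cost recomputation with an O(n) swap delta.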
ISBN: 9781424401703 (print)
We analyze the spatial smoothing algorithm of Solis, Borkar and Kumar [1] for clock synchronization over multi-hop wireless networks. In particular, for a model of a random wireless network, we show that with high probability the error variance is O(1) as the number of nodes in the network increases. This supports the feasibility of time-based computing in large wireless networks. We also provide bounds on the settling time of a distributed algorithm.
ES-ParHuff, a work-efficient CREW PRAM algorithm for constructing Huffman codes, is presented. The algorithm is both work-efficient and simple, features that could lead to very fast implementations attractive for practical purposes.
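For reference, the output a parallel Huffman construction must reproduce can be sketched with the classic sequential, heap-based O(n log n) algorithm. This is the textbook sequential method, not ES-ParHuff itself; `huffman_codes` is a hypothetical helper name.

```python
import heapq
from itertools import count
from typing import Dict


def huffman_codes(freqs: Dict[str, int]) -> Dict[str, str]:
    """Sequential Huffman construction: repeatedly merge the two lightest subtrees."""
    if not freqs:
        return {}
    tiebreak = count()  # unique counter so the heap never compares the dict payloads
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single symbol still needs a one-bit code
        (_, _, only), = heap
        return {sym: "0" for sym in only}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Prefix '0' onto codes in the lighter subtree and '1' onto the other.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]
```

The sequential loop performs n - 1 dependent merges, which is the serial bottleneck that a work-efficient parallel algorithm has to restructure.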
The Minimum Spanning Tree (MST) problem with the added constraint that no node in the spanning tree has degree greater than a specified integer d is known as the Degree-Constrained MST (d-MST) problem. Since computing the d-MST is NP-hard for every d in the range 2 ≤ d ≤ n − 2, where n denotes the total number of nodes, several approximate algorithms have been proposed in the literature. We have previously proposed two approximate algorithms, TC-RNN and IR, for the d-MST problem. Our experimental results show that while the IR algorithm is faster, the TC-RNN algorithm consistently produces spanning trees with smaller weight. In this paper, we propose a new algorithm, TC-NNC, which is an improved version of TC-RNN. Our experiments using randomly generated weighted graphs as input demonstrate that the execution time of TC-NNC is smaller than that of TC-RNN and very close to that of IR. Further, the solution quality of TC-NNC is better than that of IR and very close to that of TC-RNN.
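To illustrate how the degree bound changes the MST problem, here is a simple greedy baseline: Kruskal's algorithm that skips any edge whose endpoint already has degree d. It is a generic heuristic, not TC-RNN, IR, or TC-NNC, and `degree_constrained_kruskal` is a hypothetical name. Because the problem is NP-hard, this greedy rule can return a heavier tree than the optimum or fail to find any tree at all.

```python
from typing import List, Optional, Tuple

Edge = Tuple[float, int, int]  # (weight, u, v)


def degree_constrained_kruskal(n: int, edges: List[Edge], d: int) -> Optional[List[Edge]]:
    """Greedy d-MST heuristic; returns None if the greedy choice cannot span all nodes."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    degree = [0] * n
    tree: List[Edge] = []
    for w, u, v in sorted(edges):
        if degree[u] >= d or degree[v] >= d:
            continue  # adding this edge would violate the degree bound
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # adding this edge would create a cycle
        parent[ru] = rv
        degree[u] += 1
        degree[v] += 1
        tree.append((w, u, v))
        if len(tree) == n - 1:
            break
    return tree if len(tree) == n - 1 else None
```

On a star-shaped graph, the unconstrained MST uses only the hub edges, while d = 2 forces the heuristic to replace one hub edge with a heavier rim edge.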
In this study, a fast and accurate method for predicting the radar cross-section (RCS) of large-scale targets with complicated shapes is proposed, based on a high-performance parallel finite-difference time-domain (FDTD) numerical method. To this end, several of the most popular parallel computation methods [including OpenMP, graphics processing units (GPUs), and the message-passing interface (MPI)] are first discussed. Based on this discussion, a novel MPI-OpenMP-GPU hybrid parallel computation scheme for FDTD is developed, and the corresponding load-balanced parallel configuration is also discussed. Because this hybrid scheme combines the merits of existing parallel technologies, computational performance is remarkably improved. The results show that the computation time of the RCS simulation of a large-scale target can be reduced from 3 days to 0.8 h, a time saving of approximately 98.9%.
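The reported percentage follows from the two run times, converting 3 days to hours first:

```latex
3\ \text{days} = 72\ \text{h}, \qquad
\frac{72\ \text{h} - 0.8\ \text{h}}{72\ \text{h}} = \frac{71.2}{72} \approx 0.989 = 98.9\%
```

Equivalently, this corresponds to a speedup of roughly 72 / 0.8 = 90x.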
Over the last decade, digital signal processing technology has advanced rapidly together with the digital computer, large-scale integrated circuits, and other advanced technologies, and it has shown strong technical and scientific vitality. Because of its many advantages, it has effectively promoted the transformation of engineering technology and the development of related disciplines; its fields of application have become broader and deeper, and it attracts growing attention. The fast Fourier transform (FFT) is the most basic computation in digital signal processing, so this article first describes the definition of the FFT and its most widely used variants. It then introduces the definition of parallel algorithms, parallel algorithms for matrix operations and matrix multiplication, and performance metrics for parallel algorithms. Finally, a practical application illustrates the use of the parallel FFT algorithm; by studying the main parallel FFT algorithms, the article aims to give a clear understanding of the parallel fast Fourier transform.
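As a concrete reference point, here is a minimal sketch of the recursive radix-2 Cooley-Tukey FFT, assuming the input length is a power of two, with a direct O(n^2) DFT for comparison. It is a sequential illustration only; the function names are illustrative and it is not the specific parallel FFT studied in the article.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Direct O(n^2) discrete Fourier transform, used here as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # the two half-size transforms are independent,
    odd = fft(x[1::2])   # which is the parallelism parallel FFTs exploit
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd term
        out[k] = even[k] + t           # butterfly: upper output
        out[k + n // 2] = even[k] - t  # butterfly: lower output
    return out
```

Each recursion level performs n/2 independent butterflies, so a parallel FFT can distribute either the subproblems or the butterflies of a level across processors.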
We present a new parallel matrix multiplication algorithm for distributed-memory concurrent computers that is fast and scalable and whose performance is independent of the data distribution on the processors; we call it DIMMA (Distribution-Independent Matrix Multiplication Algorithm). The algorithm is based on two new ideas: it uses a modified pipelined communication scheme to overlap computation and communication effectively, and it exploits the LCM block concept to obtain the maximum performance of the sequential BLAS routine on each processor even when the block size is very small or very large. The algorithm is implemented and compared with SUMMA on the Intel Paragon computer.
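The panel structure shared by SUMMA-style algorithms can be sketched serially: C is accumulated as a sum of products of column panels of A and row panels of B, with panel width `nb`. This is a hypothetical single-process illustration with communication elided; it does not model DIMMA's pipelined communication or its LCM block scheduling.

```python
from typing import List

Matrix = List[List[float]]


def panel_matmul(a: Matrix, b: Matrix, nb: int) -> Matrix:
    """Compute C = A * B as a sum of panel products, each panel nb columns/rows wide."""
    assert nb > 0 and len(a[0]) == len(b), "panel width must be positive; inner dims must match"
    m, k_dim, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for k0 in range(0, k_dim, nb):
        k1 = min(k0 + nb, k_dim)
        # In a distributed SUMMA-style algorithm, the panel a[:, k0:k1] would be
        # broadcast along process rows and b[k0:k1, :] along process columns here;
        # each process then performs a local (BLAS) update of its block of C.
        for i in range(m):
            for j in range(n):
                c[i][j] += sum(a[i][k] * b[k][j] for k in range(k0, k1))
    return c
```

The result is independent of `nb`; the panel width only affects how work and communication are grouped, which is why block-size choice matters for sequential BLAS efficiency.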
The evolution simulation of dust particles provides an important way to analyze the impact of dust on the environment. A KMC-based parallel algorithm is proposed to simulate the evolution of dust particles. In this parallel simulation algorithm, a data distribution scheme and a communication optimization strategy are proposed to balance the load across processes and reduce the communication overhead between them. The experimental results show that the simulation of diffusion, sedimentation, and resuspension of dust particles in a virtual campus is realized and that the parallel algorithm shortens the simulation time, which overcomes the limitations of serial computing and makes simulation of large-scale virtual environments possible.
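Assuming KMC here refers to kinetic Monte Carlo, the core serial step can be sketched as the standard rejection-free event selection: pick an event with probability proportional to its rate, then advance simulated time by an exponentially distributed increment. The event list (for example diffusion, sedimentation, resuspension) and the name `kmc_step` are hypothetical; the paper's parallel data distribution and communication scheme are not modeled.

```python
import math
import random
from typing import Sequence, Tuple


def kmc_step(rates: Sequence[float], rng: random.Random) -> Tuple[int, float]:
    """Rejection-free KMC step: returns (chosen event index, time increment)."""
    if any(r < 0 for r in rates):
        raise ValueError("rates must be non-negative")
    total = sum(rates)
    if total <= 0:
        raise ValueError("at least one event must have a positive rate")
    target = rng.random() * total
    # Fallback to the last positive-rate event guards against floating-point round-off.
    chosen = max(i for i, r in enumerate(rates) if r > 0)
    acc = 0.0
    for i, r in enumerate(rates):
        acc += r
        if target < acc:
            chosen = i
            break
    # Exponential waiting time with mean 1 / total; 1 - random() lies in (0, 1].
    dt = -math.log(1.0 - rng.random()) / total
    return chosen, dt
```

For example, with hypothetical rates `[diffusion, sedimentation, resuspension] = [1.0, 3.0, 0.0]`, the second event is chosen about 75% of the time and the third never.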
Point multiplication is an important computation in elliptic curve cryptography. Various methods, such as the binary method and the window method, have been implemented in the past for performing efficient elliptic curve point multiplication. However, all these implementations rely on serial computations performed on single-core architectures. This paper proposes a new multi-core approach: a new parallel algorithm has been designed and implemented on machines with up to 8 cores. Experimental studies have then been performed with different window sizes and degrees of parallelism.
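The serial binary method mentioned above can be sketched as left-to-right double-and-add. The sketch assumes a toy textbook curve y^2 = x^3 + 2x + 2 over F_17 with base point (5, 1), whose group has prime order 19; these parameters are for illustration only and are far too small to be secure. The paper's parallel multi-core algorithm is not shown.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]  # None represents the point at infinity

P_MOD, A = 17, 2  # toy curve y^2 = x^3 + 2x + 2 over F_17 (b is not needed for addition)


def point_add(p: Point, q: Point) -> Point:
    """Affine point addition and doubling on a short Weierstrass curve."""
    if p is None:
        return q
    if q is None:
        return p
    x1, y1 = p
    x2, y2 = q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p + (-p) = point at infinity (also covers doubling when y = 0)
    if p == q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD  # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD  # chord slope
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k: int, p: Point) -> Point:
    """Left-to-right binary method: double for every bit, add p for each 1 bit."""
    result: Point = None
    for bit in bin(k)[2:]:
        result = point_add(result, result)
        if bit == "1":
            result = point_add(result, p)
    return result
```

A window method would instead precompute small multiples of p and process several bits per step, trading memory for fewer additions. Note that `pow(x, -1, m)` requires Python 3.8 or later.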