Search Results - Inner Mongolia University Library

ACM/SPEC International Conference on Performance Engineering (ICPE)

Authors: Gutierrez, Julian; Agostini, Nicolas Bohm; Kaeli, David (Northeastern Univ, Boston, MA 02115, USA)

ISBN: (print) 9781450383318

Advances in deep neural networks have provided a significant improvement in accuracy and speed across a large range of Computer Vision (CV) applications. However, our ability to perform real-time CV on edge devices is severely restricted by their limited computing capabilities. In this paper we employ Vega, a parallel graph-based framework, to study the performance limitations of four heterogeneous edge-computing platforms while running 12 popular deep learning CV applications. We expand the framework's capabilities by introducing two new performance enhancements: 1) an adaptive stage instance controller (ASI-C) that can improve performance by dynamically selecting the number of instances for a given stage of the pipeline; and 2) an adaptive input resolution controller (AIR-C) that improves responsiveness and enables real-time performance. These two solutions are integrated to provide a robust real-time solution. Our experimental results show that ASI-C improves run-time performance by 1.4x on average across all heterogeneous platforms, achieving a maximum speedup of 4.3x when running face detection on a high-end edge device. We demonstrate that our integrated optimization framework improves application performance and is robust to changing execution patterns.

Keywords: pipeline framework; heterogeneous systems; parallel algorithms; run-time performance management; SoC; machine learning

Cunning ant system for quadratic assignment problem with local search and parallelization

2nd International Conference on Pattern Recognition and Machine Intelligence

Author: Tsutsui, Shigeyoshi (Hannan Univ, Osaka 5808502, Japan)

ISBN: (print) 9783540770459

The previously proposed cunning ant system (cAS), a variant of the ACO algorithm, worked well on the TSP, and the results showed that cAS could be one of the most promising ACO algorithms. In this paper, we apply cAS to the quadratic assignment problem (QAP), focusing mainly on the effects of adding local search and of parallelizing cAS. The results show promising performance of cAS on the QAP.

Keywords: parallel algorithms
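
For reference, the QAP asks for a permutation π assigning n facilities to n locations that minimizes Σ_i Σ_j f_ij · d_π(i)π(j), where f is the flow matrix and d the distance matrix. The sketch below evaluates that objective and runs a simple pairwise-swap (2-opt) local search. It is an illustration under assumptions, not the paper's cAS: the abstract does not say which local search was used, and the function names and the 3×3 matrices are hypothetical.

```python
from itertools import permutations


def qap_cost(flow, dist, perm):
    """QAP objective: sum over i, j of flow[i][j] * dist[perm[i]][perm[j]].

    perm[i] is the location assigned to facility i.
    """
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]]
               for i in range(n) for j in range(n))


def swap_local_search(flow, dist, perm):
    """Pairwise-swap (2-opt) local search with first-improvement acceptance.

    Repeats full passes over all facility pairs until no swap lowers the cost.
    """
    perm = list(perm)
    best = qap_cost(flow, dist, perm)
    improved = True
    while improved:
        improved = False
        for a in range(len(perm)):
            for b in range(a + 1, len(perm)):
                perm[a], perm[b] = perm[b], perm[a]      # try the swap
                cost = qap_cost(flow, dist, perm)
                if cost < best:
                    best, improved = cost, True          # keep it
                else:
                    perm[a], perm[b] = perm[b], perm[a]  # undo it
    return perm, best


if __name__ == "__main__":
    flow = [[0, 3, 1], [3, 0, 2], [1, 2, 0]]   # hypothetical flows
    dist = [[0, 5, 2], [5, 0, 3], [2, 3, 0]]   # hypothetical distances
    perm, cost = swap_local_search(flow, dist, [0, 1, 2])
    # Brute force is feasible only for tiny n; it serves as a reference here.
    optimum = min(qap_cost(flow, dist, p) for p in permutations(range(3)))
    print(perm, cost, optimum)
```

On realistic instances a swap search like this usually stops at a local optimum, which is why it is typically paired with a global search such as ACO.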

Distributed clock synchronization over wireless networks: algorithms and analysis

45th IEEE Conference on Decision and Control

Authors: Giridhar, Arvind; Kumar, P. R. (Univ Illinois, CSL, 1308 W Main St, Urbana, IL 61801, USA)

ISBN: (print) 9781424401703

We analyze the spatial smoothing algorithm of Solis, Borkar and Kumar [1] for clock synchronization over multi-hop wireless networks. In particular, for a model of a random wireless network, we show that with high probability the error variance is O(1) as the number of nodes in the network increases. This supports the feasibility of time-based computing in large wireless networks. We also provide bounds on the settling time of a distributed algorithm.

Keywords: parallel algorithms

Work efficient parallel algorithm for constructing Huffman codes

Data Compression Conference Proceedings, 1999, pp. 277-286

Authors: Milidiu, Ruy Luiz; Laber, Eduardo Sany; Pessoa, Artur Alves (PUC-Rio, Brazil)

ES-ParHuff, a work-efficient PRAM CREW algorithm for constructing Huffman codes, is presented. The algorithm is both work efficient and simple. These features could lead to very fast implementations that could be...

Keywords: parallel algorithms
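
ES-ParHuff itself is not described beyond the abstract. As a point of comparison, the sketch below is the classic sequential greedy Huffman construction (repeatedly merge the two lightest subtrees), the kind of sequential baseline that "work efficient" is measured against. It returns codeword lengths only; the function name is hypothetical, and the list concatenation makes it quadratic in the worst case, so it is meant for illustration rather than speed.

```python
import heapq


def huffman_code_lengths(weights):
    """Codeword length per symbol via the greedy (sequential) Huffman method."""
    if len(weights) == 1:
        return [1]  # a lone symbol still needs a 1-bit codeword
    # Heap entries: (subtree weight, unique tie-breaker, symbols in the subtree).
    # The tie-breaker keeps heapq from ever comparing the symbol lists.
    heap = [(w, i, [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    next_id = len(weights)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)  # lightest subtree
        w2, _, s2 = heapq.heappop(heap)  # second lightest
        # Merging adds one level above every symbol in both subtrees.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, next_id, s1 + s2))
        next_id += 1
    return lengths


if __name__ == "__main__":
    print(huffman_code_lengths([45, 13, 12, 16, 9, 5]))
```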

Parallel algorithm for the degree-constrained minimum spanning tree problem using nearest-neighbor chains

Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks, I-SPAN, 1999, pp. 184-189

Authors: Mao, Li-Jen; Deo, Narsingh; Lang, Sheau-Dong (Univ of Central Florida, Orlando, United States)

The Minimum Spanning Tree (MST) problem with the added constraint that no node in the spanning tree has degree greater than a specified integer d is known as the Degree-Constrained MST (d-MST) problem. Since computing the d-MST is NP-hard for every d in the range 2 ≤ d ≤ (n − 2), where n denotes the total number of nodes, several approximate algorithms have been proposed in the literature. We have previously proposed two approximate algorithms, TC-RNN and IR, for the d-MST problem. Our experimental results show that while the IR algorithm is faster, the TC-RNN algorithm consistently produces spanning trees with smaller weight. In this paper, we propose a new algorithm, TC-NNC, which is an improved version of TC-RNN. Our experiments using randomly generated weighted graphs as input demonstrate that the execution time of TC-NNC is smaller than that of TC-RNN and very close to that of IR. Further, the solution quality of TC-NNC is better than that of IR and very close to that of TC-RNN.

Keywords: parallel algorithms
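
The abstract does not describe TC-RNN, TC-NNC, or IR. To make the problem concrete, here is a simple greedy baseline, assumed for illustration only and not one of the paper's algorithms: Kruskal's algorithm modified to skip any edge whose endpoint has already reached degree d. It yields an approximate answer, not a guaranteed d-MST. Edges are (weight, u, v) tuples, and the function name and example graph are hypothetical.

```python
def degree_constrained_kruskal(n, edges, d):
    """Greedy heuristic for the degree-constrained MST.

    edges: iterable of (weight, u, v) with nodes numbered 0..n-1.
    Runs Kruskal's algorithm but skips any edge that would push an endpoint's
    degree above d. Returns (total_weight, tree_edges), or None if the greedy
    pass could not connect all nodes.
    """
    if n <= 1:
        return 0, []
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    degree = [0] * n
    tree, total = [], 0
    for w, u, v in sorted(edges):
        if degree[u] >= d or degree[v] >= d:
            continue                  # an endpoint is already at the degree bound
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                  # edge would close a cycle
        parent[ru] = rv
        degree[u] += 1
        degree[v] += 1
        tree.append((u, v, w))
        total += w
        if len(tree) == n - 1:
            return total, tree
    return None


if __name__ == "__main__":
    # Complete graph on 4 nodes whose unconstrained MST is a star at node 0.
    edges = [(1, 0, 1), (1, 0, 2), (1, 0, 3), (5, 1, 2), (5, 2, 3), (6, 1, 3)]
    print(degree_constrained_kruskal(4, edges, 3))
    print(degree_constrained_kruskal(4, edges, 2))
```

With d = 3 the star (weight 3) is returned unchanged; with d = 2 the third hub edge is skipped and the heuristic settles for the path 1-0-2-3 of weight 7.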

Fast and accurate RCS evaluation via high-performance parallel FDTD simulation

JOURNAL OF ENGINEERING-JOE, 2019, Vol. 2019, No. 21, pp. 7322-7325

Authors: Zhou, Xiao Long; Wang, Xin Yu; Zhang, Jian Feng; You, Jian Wei (China Ship Dev & Design Ctr, Wuhan 430064, Hubei, Peoples R China; Southeast Univ, Sch Informat & Sci Engn, Nanjing 210096, Jiangsu, Peoples R China)

In this study, a fast and accurate method for predicting the radar cross-section (RCS) of large-scale targets with complicated shapes is proposed, based on a high-performance parallel finite-difference time-domain (FDTD) numerical method. To this end, several of the most popular parallel computation methods [including OpenMP, graphics processing units (GPU), and the message-passing interface (MPI)] are first discussed. Based on this discussion, a novel MPI-OpenMP-GPU hybrid parallel computation scheme for FDTD is developed, and the corresponding load-balanced parallel configuration is also discussed. Because this hybrid scheme combines the merits of existing parallel technologies, computation performance is markedly improved. The results show that the computation time of the RCS simulation of a large-scale target can be reduced from 3 days to 0.8 h, a time saving of approximately 98.9%.

Keywords: application program interfaces; radar cross-sections; parallel algorithms; finite difference time-domain analysis; message passing; parallel processing; radar computing; MPI; high-performance parallel FDTD simulation; parallel computation methods; large-scale target RCS simulation; computation time; computation performance; parallel technologies; hybrid parallel scheme; corresponding load-balance parallel configuration; novel MPI-OpenMP-GPU hybrid parallel computation scheme; message-passing interface; high-performance parallel finite difference time-domain numerical method; complicated shape targets; radar cross-section; time 0.8 hour to 3.0 d
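
The 98.9% figure in the abstract follows directly from the two reported run times (3 days before, 0.8 h after):

```latex
\begin{aligned}
t_{\text{before}} &= 3\ \text{d} = 72\ \text{h}, \qquad t_{\text{after}} = 0.8\ \text{h},\\
\text{time saving} &= 1 - \frac{t_{\text{after}}}{t_{\text{before}}} = 1 - \frac{0.8}{72} \approx 0.989 = 98.9\%,\\
\text{speedup} &= \frac{t_{\text{before}}}{t_{\text{after}}} = \frac{72}{0.8} = 90.
\end{aligned}
```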

Parallel algorithm based on Fast Fourier transforms

Journal of Chemical and Pharmaceutical Research, 2014, Vol. 6, No. 3, pp. 188-195

Authors: Cui, Yuhuan; Qu, Jingguo; Zhou, Guanchen (Qinggong College, Hebei United University, Tangshan, China)

Over the last decade, digital signal processing technology has advanced rapidly alongside digital computers, large-scale integrated circuits, and other advanced technologies, and it has shown strong technical and scientific vitality. Because of its many advantages, it has effectively promoted technological transformation and disciplinary development in engineering; its fields of application have become broader and deeper, and it attracts increasing attention. The fast Fourier transform (FFT) is the most basic computation in digital signal processing, so this article first describes the definition of the FFT and its most widely used variants. It then introduces the definition of parallel algorithms, parallel algorithms for matrix operations and matrix multiplication, and performance metrics for parallel algorithms. Finally, a practical application illustrates how the FFT algorithm is used in parallel, and the main parallel FFT algorithms are studied, with the aim of giving a clear understanding of the parallel fast Fourier transform algorithm.

Keywords: parallel algorithms
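
The abstract does not give the article's parallel FFT. The sketch below is the standard recursive radix-2 Cooley-Tukey FFT, which shows where the parallelism comes from: the transforms of the even- and odd-indexed samples are independent and could be computed concurrently before the butterfly step combines them. It assumes the input length is a power of two; the direct DFT is included only as a correctness reference, and both function names are hypothetical.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    if n == 1:
        return [complex(x[0])]
    # The two half-size transforms are independent: in a parallel version
    # they could run on different processors.
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor * odd term
        out[k] = even[k] + t                            # butterfly, upper half
        out[k + n // 2] = even[k] - t                   # butterfly, lower half
    return out


def dft(x):
    """Direct O(n^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]
```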

New parallel matrix multiplication algorithm on distributed-memory concurrent computers

Proceedings of the Conference on High Performance Computing on the Information Superhighway, HPC Asia'97, 1997, pp. 224-229

Author: Choi, Jaeyoung (Soongsil Univ, Seoul, Korea Republic of)

We present a new parallel matrix multiplication algorithm for distributed-memory concurrent computers, called DIMMA (Distribution-Independent Matrix Multiplication Algorithm), which is fast and scalable and whose performance is independent of the data distribution on processors. The algorithm is based on two new ideas: it uses a modified pipelined communication scheme to overlap computation and communication effectively, and it exploits the LCM block concept to obtain the maximum performance of the sequential BLAS routine in each processor whether the block size is very small or very large. The algorithm is implemented on the Intel Paragon computer and compared with SUMMA.

Keywords: parallel algorithms
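
SUMMA, the baseline named in the abstract, computes C = AB as a sequence of outer-product updates: at each step a column panel of A is broadcast along process rows, a row panel of B along process columns, and every process updates its local block of C. The serial sketch below reproduces only that update order; it omits the process grid, the pipelined communication, and DIMMA's LCM-based block scheduling, and the panel width nb and the function name are illustrative assumptions.

```python
def panel_matmul(A, B, nb):
    """C = A @ B computed as rank-nb outer-product updates (SUMMA update order).

    A is m x k and B is k x n, both lists of lists; nb is the panel width.
    """
    m, k, n = len(A), len(B), len(B[0])
    C = [[0] * n for _ in range(m)]
    for k0 in range(0, k, nb):
        k1 = min(k0 + nb, k)
        # In a distributed run, columns k0:k1 of A and rows k0:k1 of B
        # would be broadcast at this point; here they are simply read.
        for i in range(m):
            for j in range(n):
                C[i][j] += sum(A[i][p] * B[p][j] for p in range(k0, k1))
    return C


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6]]
    B = [[7, 8], [9, 10], [11, 12]]
    print(panel_matmul(A, B, nb=2))
```

The result does not depend on nb; in a real distributed code the panel width trades BLAS efficiency (wider panels) against pipelining granularity (narrower panels).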

Kinetic-Monte-Carlo-Based Parallel Evolution Simulation Algorithm of Dust Particles

JOURNAL OF APPLIED MATHEMATICS, 2014, pp. 839726-1 to 839726-11

Authors: Hu, Xiaomei; Xu, Zhifeng; Cai, Hongxia; Hu, Junjun (Shanghai Univ, Sch Mech Engn & Automat, Shanghai Key Lab Mech Automat & Robot, Shanghai 200072, Peoples R China)

The evolution simulation of dust particles provides an important way to analyze the impact of dust on the environment. A kinetic Monte Carlo (KMC)-based parallel algorithm is proposed to simulate the evolution of dust particles. In this parallel evolution simulation algorithm, a data distribution scheme and a communication optimization strategy are proposed to balance the load across processes and to reduce the communication overhead among them. The experimental results show that the diffusion, sedimentation, and resuspension of dust particles in a virtual campus are successfully simulated and that the parallel algorithm shortens the simulation time, overcoming the limitations of serial computing and making the simulation of large-scale virtual environments possible.

Keywords: Particles parallel Kinetic-Monte-Carlo-Based parallel algorithms Dust particles particle resuspension simulating parallel evolution parallel Lines
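
The abstract does not detail the KMC scheme or its parallel data distribution. A standard rejection-free KMC step (the residence-time, or BKL, selection rule), assumed here for illustration, picks the next event with probability proportional to its rate and advances the clock by an exponentially distributed waiting time with mean 1/R, where R is the total rate. The rates are hypothetical stand-ins for processes such as diffusion, sedimentation, and resuspension, the function name is hypothetical, and the decomposition across parallel processes is not shown.

```python
import math
import random


def kmc_step(rates, rng):
    """One rejection-free kinetic Monte Carlo step.

    rates: list of non-negative event rates with a positive sum.
    Returns (index of the chosen event, time increment).
    """
    total = sum(rates)
    if total <= 0:
        raise ValueError("total rate must be positive")
    # Select event i with probability rates[i] / total.
    target = rng.random() * total
    acc = 0.0
    chosen = len(rates) - 1  # fallback in case round-off leaves target >= acc
    for i, r in enumerate(rates):
        acc += r
        if target < acc:
            chosen = i
            break
    # Exponential waiting time; 1 - u lies in (0, 1], so the log is finite.
    dt = -math.log(1.0 - rng.random()) / total
    return chosen, dt


if __name__ == "__main__":
    rng = random.Random(0)
    rates = [1.0, 0.5, 0.2]  # hypothetical: diffusion, sedimentation, resuspension
    t = 0.0
    for _ in range(5):
        event, dt = kmc_step(rates, rng)
        t += dt
        print(f"t = {t:.3f}: event {event}")
```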

A new parallel window-based implementation of the elliptic curve point multiplication in multi-core architectures

International Journal of Network Security, 2012, Vol. 14, No. 2, pp. 101-108

Author: Basu, Saikat (Department of Computer Science and Engineering, National Institute of Technology, M. G. Avenue, Durgapur - 713209, India)

Point multiplication is an important computation in elliptic curve cryptography. Various methods, such as the binary method and the window method, have been implemented in the past for efficient elliptic curve point multiplication. However, all these implementations rely on serial computation performed on uni-core architectures. This paper proposes a new multi-core approach: a new parallel algorithm has been designed and implemented on machines with up to 8 cores. Experimental studies were then performed with different window sizes and degrees of parallelism.

Keywords: parallel algorithms
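
The abstract names the window method but not the paper's parallel scheme. The serial sketch below shows fixed-window point multiplication on the toy curve y² = x³ + 2x + 2 over GF(17), chosen only because it is small enough to check by hand and far too small for real cryptography. It precomputes 0·G through (2^w − 1)·G, then processes the scalar w bits at a time: w doublings followed by one table addition per window. The curve parameters, the base point (5, 1), and the function names are illustrative assumptions, and how the paper distributes this work across cores is not stated in the abstract. Requires Python 3.8+ for `pow(x, -1, p)`.

```python
# Toy curve y^2 = x^3 + A*x + B over GF(P); illustrative only, not secure.
P, A, B = 17, 2, 2
INF = None  # point at infinity (the group identity)


def ec_add(p1, p2):
    """Add two affine points on the toy curve."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return INF  # p2 == -p1 (also covers doubling a point with y == 0)
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P  # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P         # chord slope
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P


def window_mult(k, point, w=3):
    """Fixed-window scalar multiplication k * point with window width w."""
    table = [INF]                      # table[i] = i * point, i < 2**w
    for _ in range(2 ** w - 1):
        table.append(ec_add(table[-1], point))
    digits = []                        # base-2**w digits of k, least significant first
    while k > 0:
        digits.append(k & (2 ** w - 1))
        k >>= w
    result = INF
    for digit in reversed(digits):     # most significant window first
        for _ in range(w):
            result = ec_add(result, result)
        result = ec_add(result, table[digit])
    return result


if __name__ == "__main__":
    G = (5, 1)
    print([window_mult(k, G) for k in range(1, 6)])
```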
