Search Results - Inner Mongolia University Library

2013 2nd International Conference on Advances in Computer Science and Engineering (CSE 2013)

Authors: Eid Albalaw; Parimala Thulasiraman; Ruppa Thulasiram. Affiliation: Department of Computer Science, University of Manitoba

OpenMP is a standard parallel programming API for developing parallel applications on shared-memory machines. It is well suited to designing parallel algorithms for regular applications, where the amount of work is known a priori and the work can therefore be distributed among the threads at compile time. In irregular applications, the load changes dynamically at runtime, and work can be distributed among the threads only at runtime. The literature has shown that OpenMP performs poorly for irregular applications. In 2008, OpenMP 3.0 introduced new features such as "tasks" to handle irregular computations. Little work has gone into studying irregular algorithms in OpenMP 3.0. In this paper, we consider one graph problem, the all-pairs shortest path problem, and its implementation in OpenMP 3.0. We show that, for a large number of vertices, the algorithm running on OpenMP 3.0 outperforms the one on OpenMP 2.5 by a factor of 1.6.

Keywords: OpenMP 3.0; All Pair Shortest Path; task parallelization
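
The task-based decomposition mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the authors' implementation, so everything here is an assumption: the Floyd-Warshall algorithm is used as a stand-in all-pairs shortest path method, Python's `ThreadPoolExecutor` stands in for OpenMP tasks, and the names `relax_rows`, `all_pairs_shortest_paths`, `block_size`, and `workers` are hypothetical. It shows only the structure (spawn one task per block of rows, then wait for all of them); Python threads will not reproduce the reported speedup.

```python
from concurrent.futures import ThreadPoolExecutor

INF = float("inf")


def relax_rows(dist, k, row_start, row_end):
    """One task: relax rows [row_start, row_end) through intermediate vertex k."""
    row_k = dist[k]
    for i in range(row_start, row_end):
        row_i = dist[i]
        d_ik = row_i[k]
        if d_ik == INF:
            continue  # no path i -> k, so nothing can improve via k
        for j in range(len(row_i)):
            candidate = d_ik + row_k[j]
            if candidate < row_i[j]:
                row_i[j] = candidate


def all_pairs_shortest_paths(weights, block_size=2, workers=4):
    """Floyd-Warshall with each k-step split into independent row-block tasks.

    Assumes a square matrix with a zero diagonal and no negative cycles, so
    row k and column k do not change during step k and tasks do not conflict.
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # work on a copy
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for k in range(n):
            # Spawn one task per block of rows (hypothetical granularity).
            futures = [
                pool.submit(relax_rows, dist, k, start, min(start + block_size, n))
                for start in range(0, n, block_size)
            ]
            # Wait for every task before moving to the next k,
            # playing the role of a taskwait barrier.
            for future in futures:
                future.result()
    return dist


example = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
result = all_pairs_shortest_paths(example)
```

The block size controls task granularity: smaller blocks give the scheduler more freedom to balance load, at the cost of more scheduling overhead.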


Acceleration and energy consumption optimization in cascading classifiers for face detection on low-cost ARM big.LITTLE asymmetric architectures

INTERNATIONAL JOURNAL OF CIRCUIT THEORY AND APPLICATIONS, 2018, Vol. 46, No. 9, pp. 1756-1776

Authors: Corpas, Alberto; Costero, Luis; Botella, Guillermo; Igual, Francisco D.; Garcia, Carlos; Rodriguez, Manuel. Affiliations: Univ Granada, ETSIIT, Dept Architecture & Comp Technol, E-18071 Granada, Spain; Univ Complutense Madrid, Dept Comp Architecture & Automat, Madrid 28040, Spain

This paper proposes a mechanism to accelerate and optimize the energy consumption of face detection software based on Haar-like cascading classifiers, taking advantage of the features of low-cost asymmetric multicore processors (AMPs) with a limited power budget. A modelling and task scheduling/allocation scheme is proposed to make efficient use of the existing features of big.LITTLE ARM processors, including (1) source-code adaptation for parallel computing, which enables code acceleration by applying the OmpSs programming model, a task-based programming model that handles data dependencies between tasks in a transparent fashion; and (2) different OmpSs task allocation policies that take the processor asymmetry into account and can dynamically assign processing resources more efficiently based on their particular features. The proposed mechanism can be applied efficiently to exploit the processing elements of low-cost, low-energy multicore embedded devices that execute object detection algorithms based on cascading classifiers. Although these classifiers yield the best results for detection algorithms in the field of computer vision, their high computational requirements prevent them from being used on these devices under real-time requirements. Finally, we compare the energy efficiency of a heterogeneous architecture based on AMPs with suitable task scheduling against that of a homogeneous symmetric architecture.

Keywords: AMP; big.LITTLE ARM; asymmetric architecture; energy efficiency; face detection; Odroid XU4; OmpSs; OpenMP; Raspberry Pi; task parallelization; Viola-Jones algorithm


Parallel Algorithm For Constructing a Cubic Spline on Multi-Core Processors in a Cluster

14th IEEE International Conference on Application of Information and Communication Technologies (AICT)

Authors: Zaynidinov, Hakimjon; Mallayev, Oybek; Nurmurodov, Javohir. Affiliation: TUIT, Comp Engn, Tashkent, Uzbekistan

ISBN: (print) 9781728173863

The article explores the possibility of parallel data compression using cubic splines. As an example, ways to parallelize the digital processing of seismic signals are considered. The main performance indicators of the parallel algorithms are compared with those of sequential algorithms. Spline methods are a versatile signal processing tool. They are more accurate than other mathematical methods, faster at processing information, and much cheaper to maintain. On the other hand, the equipment used in such systems must also meet high performance requirements. To achieve high speeds, parallel algorithms were developed using OpenMP and MPI technologies and implemented on the architecture of multi-core processors. A mathematical method for the parallel calculation of the coefficients of a cubic spline has been developed, and a parallel signal processing algorithm has been developed on its basis. As an example, the computation performed during seismic signal processing is parallelized. The main indicators of efficiency and speedup of the parallel algorithm were compared with those of the sequential algorithm. The article explains the relevance of parallel numerical systems, describes the main approaches to the distribution of processes and the methods of data processing, describes the principles of parallel programming technology, and studies the basic parameters of parallel algorithms for the initial calculation of the numerical value of a cubic spline. The parallel algorithm considered for constructing the cubic spline of defect 1 as p -> n leads to the construction of a local cubic spline on each grid interval omega.

Keywords: Parallel computing; UMA; NUMA; SMP; data parallelization; task parallelization; data processing; MPI


Stannis: Low-Power Acceleration of DNN Training Using Computational Storage Devices

57th ACM/IEEE Design Automation Conference (DAC)

Authors: HeydariGorji, Ali; Torabzadehkashi, Mahdi; Rezaei, Siavash; Bobarshad, Hossein; Alves, Vladimir; Chou, Pai H. Affiliations: UC Irvine, Irvine, CA 92697, USA; NGD Syst Inc, Irvine, CA, USA

ISBN: (digital) 9781728110851

ISBN: (print) 9781728110851

Computational storage devices enable in-storage processing of data in place. These devices contain 64-bit application processors and hardware accelerators that can help improve performance and save power by reducing or eliminating data movement between host computers and storage units. This paper proposes a framework, named Stannis, for distributed in-storage training of deep neural networks on clusters of computational storage devices. This in-storage style of training ensures that private data never leaves the storage, while giving full control over the public sharing of data. The Stannis framework distributes the workload according to the processing power of each worker by determining the proper batch size for each node. Stannis also ensures that input data is available to all nodes, avoiding rank stall while maximizing utilization and overall processing speed. Experimental results show up to 2.7x speedup and a 69% reduction in energy consumption with no significant loss in accuracy.

Keywords: distributed deep neural network training; training distribution; task parallelization; computational storage; near data processing; privacy
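
The idea of sizing each node's batch according to its processing power can be sketched as a simple proportional split. This is an illustrative assumption only: the abstract does not say how Stannis determines batch sizes, and the function name `proportional_batch_sizes`, the throughput figures, and the largest-remainder rounding are all hypothetical.

```python
def proportional_batch_sizes(throughputs, global_batch):
    """Split a global batch across workers in proportion to measured throughput.

    throughputs: samples/second per worker (hypothetical measurements).
    global_batch: total number of samples per training step.
    Returns a list of integer batch sizes that sums to global_batch.
    """
    total = sum(throughputs)
    if total <= 0:
        raise ValueError("at least one worker must have positive throughput")

    # Ideal (fractional) share for each worker.
    raw = [global_batch * t / total for t in throughputs]
    sizes = [int(share) for share in raw]

    # Hand out the samples lost to rounding, largest fractional part first.
    leftover = global_batch - sum(sizes)
    by_remainder = sorted(
        range(len(raw)), key=lambda i: raw[i] - sizes[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    return sizes


# Hypothetical cluster: one fast host and two slower storage-side workers.
batch_sizes = proportional_batch_sizes([100, 10, 10], 120)
```

With shares proportional to throughput, each worker should take roughly the same time per step, which is one way to avoid faster nodes stalling while they wait for slower ones.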

