OpenMP is a standard API for developing parallel applications on shared-memory machines. It is well suited to parallel algorithms for regular applications, where the amount of work is known a priori and can therefore be distributed among the threads at compile time. In irregular applications, the load changes dynamically, and work can be distributed among the threads only at runtime. The literature has shown that OpenMP performs poorly for irregular applications. In 2008, OpenMP 3.0 introduced new features such as "tasks" to handle irregular computations. Little work has gone into studying irregular algorithms in OpenMP 3.0. In this paper, we consider one graph problem, the all-pairs shortest path problem, and its implementation in OpenMP 3.0. We show that for a large number of vertices, the algorithm running on OpenMP 3.0 outperforms the one on OpenMP 2.5 by a factor of 1.6.
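The task idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which shortest-path algorithm they use, so this example assumes one Dijkstra run per source vertex, and it uses a Python thread pool as a stand-in for the OpenMP 3.0 task construct. The graph and the names `dijkstra` and `all_pairs_shortest_paths` are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def dijkstra(graph, source):
    """Shortest distances from source; graph maps vertex -> {neighbor: weight}."""
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry, a shorter path was already found
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def all_pairs_shortest_paths(graph, workers=4):
    """Submit one task per source vertex; an idle worker picks up the next task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {s: pool.submit(dijkstra, graph, s) for s in graph}
        return {s: f.result() for s, f in futures.items()}


# Hypothetical directed graph used only for illustration.
graph = {
    "a": {"b": 1, "c": 4},
    "b": {"c": 2, "d": 6},
    "c": {"d": 3},
    "d": {},
}
apsp = all_pairs_shortest_paths(graph)
```

Because each task takes a different amount of time depending on the source vertex, work is handed out at runtime rather than split evenly in advance. In CPython the global interpreter lock prevents a real speedup for this CPU-bound work, so the sketch shows only the work-distribution pattern.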
This paper proposes a mechanism to accelerate face detection software based on Haar-like cascading classifiers and to optimize its energy consumption, taking advantage of the features of low-cost asymmetric multicore processors (AMPs) with a limited power budget. A modelling and task scheduling/allocation approach is proposed to make efficient use of the features of ARM big.LITTLE processors. It includes (1) source-code adaptation for parallel computing, which enables code acceleration by applying OmpSs, a task-based programming model that handles data dependencies between tasks transparently; and (2) different OmpSs task allocation policies that take processor asymmetry into account and can dynamically assign processing resources more efficiently based on their particular features. The proposed mechanism can be applied to exploit the processing elements of low-cost, low-energy multicore embedded devices that execute object detection algorithms based on cascading classifiers. Although these classifiers yield the best results for detection algorithms in computer vision, their high computational requirements prevent their use on such devices under real-time constraints. Finally, we compare the energy efficiency of a heterogeneous AMP-based architecture with suitable task scheduling against that of a homogeneous symmetric architecture.
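To make the idea of an asymmetry-aware allocation policy concrete, here is a minimal sketch. It is not one of the OmpSs policies from the paper, which the abstract does not describe. It assumes a simple greedy rule (give each task to the core that would finish it earliest), and the task costs, core speeds, and the function name `schedule` are all hypothetical.

```python
def schedule(task_costs, core_speeds):
    """Greedy list scheduling: each task goes to the core that finishes it earliest.

    task_costs: work units per task (hypothetical values).
    core_speeds: work units per second for each core (hypothetical values).
    Returns (assignment, makespan); assignment[i] is the core index of task i.
    """
    cores = range(len(core_speeds))
    free_at = [0.0] * len(core_speeds)  # time at which each core becomes idle
    assignment = [None] * len(task_costs)
    # Place the largest tasks first so that small tasks can fill the gaps.
    order = sorted(range(len(task_costs)), key=lambda i: -task_costs[i])
    for i in order:
        finish = [free_at[c] + task_costs[i] / core_speeds[c] for c in cores]
        best = min(cores, key=lambda c: finish[c])
        assignment[i] = best
        free_at[best] = finish[best]
    return assignment, max(free_at)


# Two fast "big" cores and two slower "LITTLE" cores; the 2:1 ratio is assumed.
speeds = [2.0, 2.0, 1.0, 1.0]
costs = [8, 6, 4, 4, 2, 2]
assignment, makespan = schedule(costs, speeds)
```

A policy that ignored core speed would treat all four cores alike and could leave a large task on a slow core, which is the situation an asymmetry-aware policy tries to avoid.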
ISBN (print): 9781728173863
The article explores parallel data compression using cubic splines. As an example, ways of parallelizing the digital processing of seismic signals are considered, and the main performance indicators of the parallel algorithms are compared with those of the sequential algorithms. Spline methods are a versatile signal processing tool: they are more accurate than other mathematical methods, process information faster, and have much lower maintenance costs. On the other hand, the equipment used in such systems must also meet high performance requirements. To achieve high speeds, parallel algorithms were developed using OpenMP and MPI and implemented on multicore processor architectures. A mathematical method for the parallel calculation of the coefficients of a cubic spline has been developed, and a parallel signal processing algorithm has been built on it. As an example, the computation in seismic signal processing is parallelized, and the efficiency and speedup of the parallel algorithm are compared with the sequential algorithm. The article explains the relevance of parallel numerical systems, describes the main approaches to distributing processes and the methods of data processing, describes the principles of parallel programming technology, and studies the basic parameters of parallel algorithms for the initial calculation of the numerical values of a cubic spline. The parallel algorithm considered for constructing the cubic spline of defect 1 as p → n leads to the construction of a local cubic spline on each grid interval ω.
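For readers unfamiliar with the coefficient computation, the following is a minimal sequential sketch of a cubic spline of defect 1 (a twice continuously differentiable cubic spline). It is not the article's parallel method: it assumes natural boundary conditions and solves the usual tridiagonal system with the Thomas algorithm. The function names and the sample data are hypothetical.

```python
from bisect import bisect_right


def spline_second_derivatives(xs, ys):
    """Second derivatives m[i] at the nodes, assuming m[0] = m[n] = 0."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)
    if n < 2:
        return m
    # Row k of the tridiagonal system belongs to interior node k + 1.
    diag = [2.0 * (h[k] + h[k + 1]) for k in range(n - 1)]
    rhs = [6.0 * ((ys[k + 2] - ys[k + 1]) / h[k + 1] - (ys[k + 1] - ys[k]) / h[k])
           for k in range(n - 1)]
    # Forward elimination (Thomas algorithm); the off-diagonal entry is h[k].
    for k in range(1, n - 1):
        w = h[k] / diag[k - 1]
        diag[k] -= w * h[k]
        rhs[k] -= w * rhs[k - 1]
    # Back substitution.
    m[n - 1] = rhs[n - 2] / diag[n - 2]
    for k in range(n - 3, -1, -1):
        m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]
    return m


def spline_eval(xs, ys, m, x):
    """Evaluate the spline at x using only the data of the interval containing x."""
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    h = xs[i + 1] - xs[i]
    a, b = xs[i + 1] - x, x - xs[i]
    return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h)
            + (ys[i] / h - m[i] * h / 6.0) * a
            + (ys[i + 1] / h - m[i + 1] * h / 6.0) * b)


# Hypothetical samples of a signal at four nodes.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 8.0, 27.0]
m = spline_second_derivatives(xs, ys)
mid_value = spline_eval(xs, ys, m, 1.5)
```

Once the node values `m` are known, each interval can be evaluated independently of the others, which is the part of the computation that lends itself most directly to distribution across cores.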
ISBN (digital): 9781728110851
ISBN (print): 9781728110851
Computational storage devices enable in-storage processing of data in place. These devices contain 64-bit application processors and hardware accelerators that can help improve performance and save power by reducing or eliminating data movement between host computers and storage units. This paper proposes a framework, named Stannis, for distributed in-storage training of deep neural networks on clusters of computational storage devices. This in-storage style of training ensures that private data never leaves the storage device, while giving full control over which data is shared publicly. The Stannis framework distributes the workload according to the processing power of each worker by determining the proper batch size for each node. Stannis also ensures the availability of input data for all nodes to avoid rank stall, while maximizing utilization and overall processing speed. Experimental results show up to 2.7x speedup and a 69% reduction in energy consumption with no significant loss in accuracy.
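The batch-size idea can be sketched as follows. This is not the Stannis tuning procedure, which the abstract does not detail. It assumes that each node's processing power is known as a measured throughput in samples per second, and it splits a global batch in proportion to that throughput so that all nodes finish a step at about the same time. The function name and the throughput figures are hypothetical.

```python
def assign_batch_sizes(throughputs, global_batch):
    """Split global_batch across nodes in proportion to their throughput."""
    total = sum(throughputs)
    shares = [global_batch * t / total for t in throughputs]
    sizes = [int(s) for s in shares]  # round down first
    # Give the remaining samples to the nodes with the largest fractional share.
    leftover = global_batch - sum(sizes)
    by_fraction = sorted(range(len(shares)),
                         key=lambda i: shares[i] - sizes[i], reverse=True)
    for i in by_fraction[:leftover]:
        sizes[i] += 1
    return sizes


# One host-class worker and four slower in-storage workers (assumed figures).
throughputs = [100.0, 25.0, 25.0, 25.0, 25.0]
batch_sizes = assign_batch_sizes(throughputs, 256)
# Estimated time per step on each node, in seconds.
step_times = [b / t for b, t in zip(batch_sizes, throughputs)]
```

With equal batch sizes, the fast worker would sit idle while waiting for the slow ones at every synchronization point. Proportional sizing keeps the estimated step times close to one another.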