ISBN: (print) 9783031521850; 9783031521867
SVD factorization is a fundamental operation for solving problems in chemistry, biology, physics, and engineering, with applications in image processing, data mining, and big data, among others. There are several methods to compute the SVD factorization. One of them uses Householder transformations, which makes it possible to parallelize the task. Furthermore, novel computer architectures are oriented toward heterogeneous computing, combining CPUs and GPUs to increase performance and reduce energy consumption. In this work, a heterogeneous parallel implementation of SVD based on Householder transformations is presented. Several matrix-partitioning strategies are presented so that the program scales across multiple GPU cards. The speedup increases when several GPU cards are used.
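As a rough illustration of the Householder approach, the sketch below performs the first phase of a Householder-based SVD, Golub-Kahan bidiagonalization, on a small dense matrix in pure Python. It alternately applies reflectors from the left (zeroing a column below the diagonal) and from the right (zeroing a row beyond the superdiagonal), reducing A to an upper bidiagonal B with the same singular values. The function names `householder_vector`, `apply_householder`, and `bidiagonalize` are hypothetical, and the code assumes m >= n. It does not reproduce the authors' matrix-partitioning or multi-GPU strategy, and it omits the second phase that computes the singular values of B.

```python
import math


def householder_vector(x):
    """Return (v, beta) so that H = I - beta * v v^T maps x onto a multiple of e1."""
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    if norm_x == 0.0:
        # Nothing to eliminate: beta = 0 gives H = I.
        return [0.0] * len(x), 0.0
    # Choose alpha with the sign opposite to x[0] to avoid cancellation in v[0].
    alpha = -math.copysign(norm_x, x[0])
    v = list(x)
    v[0] -= alpha
    beta = 2.0 / sum(vi * vi for vi in v)
    return v, beta


def apply_householder(v, beta, y):
    """Return H y = y - beta * (v^T y) v without forming H explicitly."""
    scale = beta * sum(vi * yi for vi, yi in zip(v, y))
    return [yi - scale * vi for vi, yi in zip(v, y)]


def bidiagonalize(A):
    """Reduce an m x n matrix (m >= n) to upper bidiagonal form B = U^T A V.

    U and V are products of Householder reflectors; they are not accumulated here.
    """
    m, n = len(A), len(A[0])
    B = [list(map(float, row)) for row in A]
    for k in range(n):
        # Left reflector: zero the entries below B[k][k] in column k.
        if k < m - 1:
            v, beta = householder_vector([B[i][k] for i in range(k, m)])
            for j in range(k, n):
                col = apply_householder(v, beta, [B[i][j] for i in range(k, m)])
                for i, val in zip(range(k, m), col):
                    B[i][j] = val
        # Right reflector: zero the entries to the right of B[k][k+1] in row k.
        if k < n - 2:
            v, beta = householder_vector(B[k][k + 1:])
            for i in range(k, m):
                # H is symmetric, so (row H) equals (H row^T)^T.
                B[i][k + 1:] = apply_householder(v, beta, B[i][k + 1:])
    return B


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6], [7, 8, 10], [1, 1, 1]]
    for row in bidiagonalize(A):
        print(["%.4f" % x for x in row])
```

Each reflector update touches only a trailing submatrix and consists of independent dot products and vector updates, which is the kind of work that can be split across CPU threads or GPU devices. How the cited work partitions that work is not shown here.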
Graphics processors offer an accessible solution for high-performance computing, addressing challenges across various fields. The Compute Unified Device Architecture (CUDA) programming model emerged to improve the performance of general-purpose applications on graphics processors. However, developing CUDA programs is far from straightforward, and developers' lack of experience in parallel programming has led to numerous issues. This article presents a structural testing model and criteria to improve the quality of CUDA programs. These criteria facilitate the selection of test cases and aid in identifying faults. The ValiCUDA tool was developed to implement and validate this testing model and criteria. The tool instruments and analyzes programs, generating the required elements for each testing criterion. It also supports program execution and the evaluation of criterion coverage. A statistical validation experiment assessed the effectiveness, cost, and strength of these criteria. The results demonstrate that the criteria can identify nontrivial faults in CUDA programs and assist testers in testing such applications.
PSO (particle swarm optimization) is an intelligent search method that finds the best solution according to the state of the population. Various parallel implementations of this algorithm have been presented for compute-intensive applications. The ALC-PSO algorithm (PSO with an aging leader and challengers) is an improved population-based procedure that converges faster than traditional PSO. In this paper, we propose a low-power heterogeneous parallel implementation of the ALC-PSO algorithm using OmpSs and CUDA, for execution on both CPU and GPU cores. This is the first heterogeneous parallel implementation of ALC-PSO that combines OmpSs and CUDA. This hybrid parallel programming approach increases the performance and efficiency of compute-intensive applications. The proposed approach is also applicable to the heterogeneous parallel execution of other improved versions of PSO on both CPUs and GPUs. The results demonstrate that the proposed approach outperforms existing implementations of ALC-PSO in terms of delay and power consumption.
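To make the underlying search procedure concrete, the following is a minimal sketch of standard global-best PSO with an inertia weight, minimizing a sum-of-squares test function. It is plain PSO, not ALC-PSO: the aging-leader and challenger mechanisms are omitted, and nothing here models the OmpSs/CUDA work split. The function names and parameter values (`w`, `c1`, `c2`, swarm size, iteration count, bounds, seed) are illustrative assumptions.

```python
import random


def sphere(x):
    """Benchmark objective: sum of squares, with minimum 0 at the origin."""
    return sum(xi * xi for xi in x)


def pso(objective, dim, n_particles=20, iters=200, w=0.7, c1=1.5, c2=1.5,
        bounds=(-5.0, 5.0), seed=0):
    """Global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)  # fixed seed for reproducible runs
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    # The swarm's best personal best plays the role of the leader.
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + cognitive pull (own best) + social pull (leader).
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp the particle to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    best, best_val = pso(sphere, dim=3)
    print("best position:", best)
    print("best value:", best_val)
```

The velocity, position, and fitness updates in the inner loop are largely independent across particles, which makes them natural candidates for parallel tasks or GPU kernels. The leader bookkeeping is the part that needs synchronization in a parallel version.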