We present an algorithm for general sparse matrix-matrix multiplication (SpGEMM) on many-core architectures, such as GPUs. SpGEMM is implemented by iterative row merging, similar to merge sort, except that elements with duplicate column indices are aggregated on the fly. The main kernel merges small numbers of sparse rows at once using subwarps of threads to realize an early compression effect, which reduces the overhead of global memory accesses. The performance is compared with a parallel CPU implementation as well as with three GPU-based implementations. Measurements performed for computing the matrix square of 21 sparse matrices show that the proposed method consistently outperforms the other methods. Analysis shows that this performance is achieved by exploiting the compression effect and the GPU caching architecture. Improved performance was also found for computing the Galerkin products required by algebraic multigrid solvers. The performance was particularly good for seven-point stencil matrices arising in the context of diffuse optical imaging, and the improvement allows image reconstruction to be performed at higher resolution using the same computational resources.
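The core row-merging idea can be illustrated with a short serial sketch, assuming each sparse row is stored as (column, value) pairs sorted by column index; the function name merge_rows is illustrative, and the subwarp-parallel execution of the actual kernel is not modeled here.

```cpp
#include <cstddef>
#include <cstdio>
#include <utility>
#include <vector>

// A sparse row as (column index, value) pairs, sorted by column index.
using SparseRow = std::vector<std::pair<int, double>>;

// Merge two sorted sparse rows, summing entries that share a column index.
// This is the operation the paper's kernel applies to small groups of rows
// at once using subwarps of threads.
SparseRow merge_rows(const SparseRow& a, const SparseRow& b) {
    SparseRow out;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i].first < b[j].first) {
            out.push_back(a[i++]);
        } else if (b[j].first < a[i].first) {
            out.push_back(b[j++]);
        } else {  // duplicate column index: aggregate on the fly
            out.push_back({a[i].first, a[i].second + b[j].second});
            ++i; ++j;
        }
    }
    while (i < a.size()) out.push_back(a[i++]);
    while (j < b.size()) out.push_back(b[j++]);
    return out;
}

int main() {
    // Two scaled rows of B contributing to one output row of C = A * B.
    SparseRow r1 = {{0, 1.0}, {3, 2.0}, {7, 1.5}};
    SparseRow r2 = {{3, 4.0}, {5, 1.0}};
    for (const auto& [col, val] : merge_rows(r1, r2))
        std::printf("col %d: %g\n", col, val);
    return 0;
}
```

Repeatedly merging the scaled rows of B selected by one row of A yields the corresponding row of the product, with duplicates compressed early in the process.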
In this paper, we present a new technique for displaying High Dynamic Range (HDR) images on Low Dynamic Range (LDR) displays efficiently on the GPU. The described process has three stages. First, the input image is segmented into luminance zones. Second, the tone-mapping operator (TMO) that performs best in each zone is automatically selected. Finally, the resulting tone mapping (TM) outputs for each zone are merged, generating the final LDR output image. To establish which TMO performs best in each luminance zone, we conducted a preliminary psychophysical experiment using a set of HDR images and six different TMOs. We validated our composite technique on several (new) HDR images and conducted a further psychophysical experiment, using an HDR display as the reference, which establishes the advantages of our hybrid three-stage approach over a traditional individual TMO. Finally, we present a GPU version, which is perceptually equivalent to the standard version but with much improved computational performance. (C) 2016 Published by Elsevier B.V.
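A minimal sketch of the zone-based selection step, assuming a single luminance channel, a hypothetical two-zone split, and placeholder operators (a log curve and the Reinhard global curve); the actual TMO assignment comes from the paper's psychophysical experiment, and the blending of per-zone outputs at zone boundaries is omitted.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative per-zone operators; the paper selects real TMOs per zone
// based on a psychophysical experiment, so these are placeholders.
double tmo_log(double L)      { return std::log(1.0 + L) / std::log(1.0 + 100.0); }
double tmo_reinhard(double L) { return L / (1.0 + L); }

// Map an HDR luminance channel to LDR by segmenting into luminance zones
// and applying the operator chosen for each zone.
std::vector<double> zone_tone_map(const std::vector<double>& lum) {
    std::vector<double> out(lum.size());
    for (std::size_t i = 0; i < lum.size(); ++i) {
        double L = lum[i];
        // Hypothetical two-zone split at L = 1.0 (dark zone vs bright zone).
        double v = (L < 1.0) ? tmo_log(L) : tmo_reinhard(L);
        out[i] = std::clamp(v, 0.0, 1.0);  // LDR range [0, 1]
    }
    return out;
}
```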
This work presents a method to automatically detect and remove shadows in urban aerial images and its application in an aerospace remote monitoring system requiring near real-time processing. Our detection method generates shadow masks and is accelerated by GPU programming. To obtain the shadow masks, we convert images from RGB to the CIELCh color model, compute a modified Specthem ratio, and apply multilevel thresholding. Morphological operations are used to reduce noise in the shadow masks. The shadow masks are then used to remove shadows from the original images using the illumination ratio of the shadow/non-shadow regions. We obtain a shadow detection accuracy of around 93% and shadow removal results comparable to the state of the art while maintaining execution time under real-time constraints. (C) 2017 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.
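The removal step can be sketched as follows, assuming the detection stage (CIELCh conversion, modified Specthem ratio, multilevel thresholding, and morphology) has already produced a boolean shadow mask; the single-channel intensity representation and the function name are illustrative.

```cpp
#include <cstddef>
#include <vector>

// Relight shadowed pixels using the ratio of mean non-shadow to mean shadow
// intensity. This sketches the removal step only; the paper's detection
// stage is assumed to have produced `mask` already (true = shadow).
std::vector<double> remove_shadows(const std::vector<double>& intensity,
                                   const std::vector<bool>& mask) {
    double sum_s = 0.0, sum_n = 0.0;
    std::size_t cnt_s = 0, cnt_n = 0;
    for (std::size_t i = 0; i < intensity.size(); ++i) {
        if (mask[i]) { sum_s += intensity[i]; ++cnt_s; }
        else         { sum_n += intensity[i]; ++cnt_n; }
    }
    if (cnt_s == 0 || cnt_n == 0) return intensity;    // nothing to do
    double ratio = (sum_n / cnt_n) / (sum_s / cnt_s);  // illumination ratio

    std::vector<double> out(intensity);
    for (std::size_t i = 0; i < out.size(); ++i)
        if (mask[i]) out[i] *= ratio;                  // brighten shadow pixels
    return out;
}
```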
Graphs are common data structures for many applications, and efficient graph processing is a must for application performance. Recently, the graphics processing unit (GPU) has been adopted to accelerate various graph processing algorithms such as BFS and shortest paths. However, it is difficult to write correct and efficient GPU programs, and even more so for graph processing due to the irregularities of graph structures. To simplify graph processing on GPUs, we propose a programming framework called Medusa that enables developers to leverage the capabilities of GPUs by writing sequential C/C++ code. Medusa offers a small set of user-defined APIs and embraces a runtime system that automatically executes those APIs in parallel on the GPU. We develop a series of graph-centric optimizations based on the architectural features of GPUs for efficiency. Additionally, Medusa is extended to execute on multiple GPUs within a machine. Our experiments show that 1) Medusa greatly simplifies the implementation of GPGPU programs for graph processing, requiring far fewer lines of developer-written source code, and 2) the optimization techniques significantly improve the performance of the runtime system, making it comparable with or better than manually tuned GPU graph operations.
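A toy CPU analogue of the programming model described above: the developer supplies sequential per-vertex code and a tiny runtime applies it to every vertex. The names (Graph, for_each_vertex) and the BFS example are illustrative and do not reflect Medusa's actual API, which launches such user-defined operators as GPU kernels.

```cpp
#include <cstdio>
#include <vector>

// Per-vertex state and adjacency; the "runtime" applies a user function to
// every vertex, which is the part a GPU framework would parallelize.
struct Graph {
    std::vector<std::vector<int>> adj;   // adjacency lists
    std::vector<int> level;              // per-vertex state (e.g. BFS level)
};

template <typename F>
void for_each_vertex(Graph& g, F fn) {
    for (int v = 0; v < static_cast<int>(g.adj.size()); ++v) fn(g, v);
}

int main() {
    Graph g{{{1, 2}, {2}, {0, 3}, {}}, {0, -1, -1, -1}};  // vertex 0 is the source
    // BFS relaxation sweeps written as sequential user code.
    for (int iter = 0; iter < 3; ++iter) {
        for_each_vertex(g, [iter](Graph& gr, int v) {
            if (gr.level[v] != iter) return;
            for (int u : gr.adj[v])
                if (gr.level[u] < 0) gr.level[u] = iter + 1;
        });
    }
    for (std::size_t v = 0; v < g.level.size(); ++v)
        std::printf("vertex %zu: level %d\n", v, g.level[v]);
    return 0;
}
```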
The computing capabilities of current multi-core and many-core architectures have been used in crowd simulations both for enhancing crowd rendering and for simulating continuum crowds. However, improving the scalability of crowd simulation systems by exploiting the inherent parallelism of these architectures is still an open issue. In this paper, we propose different parallelization strategies for the collision check procedure that takes place in agent-based simulations. These strategies are designed to exploit the parallelism of both multi-core and many-core architectures such as graphics processing units (GPUs). For the many-core implementations, we analyse the bottlenecks of a previous GPU version of the collision check algorithm and propose a new GPU version that removes the detected bottlenecks. In order to fairly compare the GPU with the multi-core implementations, we propose a parallel CPU version that uses read-copy update (RCU), a new synchronization method that significantly improves performance. We perform a comparison study of these different implementations. On the one hand, the comparison study provides the first performance evaluation of RCU in a real user-space application with complex data structures. On the other hand, the comparison shows that the GPU greatly accelerates the collision test with respect to any other implementation optimized for multi-core CPUs. In addition, we analyse the efficiency of the different implementations taking into account the theoretical performance and power consumption of each platform. The evaluation results show that the GPU-based implementation consumes less energy and provides a minimum speedup of 45x with respect to any of the CPU-based implementations. Since interactivity is a hard constraint in crowd simulations, this acceleration of the collision check process represents a significant improvement in the overall system throughput and response time. Therefore, the simulations are significantly accelerated.
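A serial sketch of the collision check being parallelized, assuming circular agents and an all-pairs test with a coarse distance reject; the paper's implementations distribute this work across CPU threads (with RCU-protected data structures) or GPU threads, which is not shown here.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Agent { double x, y, radius; };

// Serial all-pairs collision check with a cheap bounding-box reject.
// In the parallel versions, each agent (or spatial region) is assigned to a
// different CPU or GPU thread; `reject_dist` is an illustrative cutoff.
std::vector<std::pair<int, int>> collision_check(const std::vector<Agent>& agents,
                                                 double reject_dist = 2.0) {
    std::vector<std::pair<int, int>> hits;
    for (std::size_t i = 0; i < agents.size(); ++i) {
        for (std::size_t j = i + 1; j < agents.size(); ++j) {
            if (std::abs(agents[i].x - agents[j].x) > reject_dist ||
                std::abs(agents[i].y - agents[j].y) > reject_dist) continue;
            double dx = agents[i].x - agents[j].x;
            double dy = agents[i].y - agents[j].y;
            double r  = agents[i].radius + agents[j].radius;
            if (dx * dx + dy * dy < r * r)
                hits.push_back({static_cast<int>(i), static_cast<int>(j)});
        }
    }
    return hits;
}
```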
Graph algorithms are challenging to implement due to their varying topology and irregular access patterns. Real-world graphs are dynamic in nature and routinely undergo edge and vertex additions as well as deletions. Typical examples of dynamic graphs are social networks, collaboration networks, and road networks. Applying static algorithms repeatedly to dynamic graphs is inefficient. Further, due to the rapid growth of unstructured and semi-structured data, graph algorithms demand efficient parallel processing. Unfortunately, little is known about how to efficiently process dynamic graphs on massively parallel architectures such as GPUs. Existing approaches to representing and processing dynamic graphs are either not general or inefficient. In this work, we propose a graph library for dynamic graph algorithms built over a GPU-tailored graph representation that exploits the warp-cooperative work-sharing execution model. The library, named Meerkat, builds upon a recently proposed dynamic graph representation on GPUs. This representation uses a hashtable-based mechanism to store a vertex's neighborhood. Meerkat also enables fast iteration through a group of vertices, a pattern that is common and crucial for achieving performance in graph applications. Our framework supports dynamic edge additions and edge deletions, along with their batched versions. Based on the efficient iterative patterns encoded in Meerkat, we implement dynamic versions of popular graph algorithms such as breadth-first search, single-source shortest paths, triangle counting, PageRank, and weakly connected components. We evaluated our implementations against those in other publicly available dynamic graph data structures and frameworks: GPMA, Hornet, and faimGraph. Using a variety of real-world graphs, we observe that Meerkat significantly improves the efficiency of the underlying dynamic graph algorithms, outperforming these frameworks.
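A CPU sketch of the representation style the abstract describes, assuming hash-based per-vertex neighborhoods with batched edge insertions and deletions; Meerkat's actual GPU data structure and its warp-cooperative iteration are not modeled.

```cpp
#include <cstdio>
#include <unordered_set>
#include <utility>
#include <vector>

// Each vertex keeps its neighborhood in a hash-based set, so single and
// batched edge updates are cheap and neighborhoods can be iterated directly.
struct DynGraph {
    std::vector<std::unordered_set<int>> nbrs;

    explicit DynGraph(int n) : nbrs(n) {}

    void add_edges(const std::vector<std::pair<int, int>>& batch) {
        for (auto [u, v] : batch) { nbrs[u].insert(v); nbrs[v].insert(u); }
    }
    void delete_edges(const std::vector<std::pair<int, int>>& batch) {
        for (auto [u, v] : batch) { nbrs[u].erase(v); nbrs[v].erase(u); }
    }
    // Neighborhood intersection, the kernel of triangle counting.
    int common_neighbors(int u, int v) const {
        int c = 0;
        for (int w : nbrs[u]) if (nbrs[v].count(w)) ++c;
        return c;
    }
};

int main() {
    DynGraph g(4);
    g.add_edges({{0, 1}, {1, 2}, {0, 2}, {2, 3}});
    std::printf("common neighbors of (0,1): %d\n", g.common_neighbors(0, 1));  // 1
    g.delete_edges({{0, 2}});
    std::printf("after deletion: %d\n", g.common_neighbors(0, 1));             // 0
    return 0;
}
```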
Use of high dynamic range (HDR) images and video in image processing and computer graphics applications is rapidly gaining popularity. However, creating and displaying high-resolution HDR content on CPUs is a time-consuming task. Although some previous work focused on real-time tone mapping, the implementation of a full HDR imaging (HDRI) pipeline on the GPU has not been detailed. In this article we aim to fill this gap by providing a detailed description of how the HDRI pipeline, from HDR image assembly to tone mapping, can be implemented exclusively on the GPU. We also explain the trade-offs that need to be made to improve efficiency and show timing comparisons of CPU versus GPU implementations of the HDRI pipeline.
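A sketch of the two endpoints of that pipeline, assuming a linear camera response, a simple hat weighting function, and a global Reinhard-style tone curve as placeholders for the operators discussed in the article.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hat weight that favors well-exposed pixel values (z in [0, 1]).
double weight(double z) { return 1.0 - std::fabs(2.0 * z - 1.0); }

// HDR assembly: combine samples of one pixel taken at different exposure
// times into a single radiance estimate (linear response assumed).
double assemble_hdr(const std::vector<double>& z, const std::vector<double>& t) {
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < z.size(); ++i) {
        double w = weight(z[i]);
        num += w * (z[i] / t[i]);   // radiance estimate from exposure i
        den += w;
    }
    return den > 0.0 ? num / den : 0.0;
}

// Tone mapping: simple global curve taking radiance to the displayable [0, 1] range.
double tone_map(double L) { return std::clamp(L / (1.0 + L), 0.0, 1.0); }
```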
In this work, we describe a new algorithm for rendering polygons defined by cubic Bezier curve segments on current GPUs. Unlike other approaches, our algorithm has a simple preprocessing step that does not require computing tessellations, and it can be implemented on the GPU as a geometry shader. The polygon is decomposed into a set of simplices that are individually rasterized into the stencil buffer to recreate the shape, which is finally rendered into the frame buffer. Each simplex is rasterized using a fragment shader that evaluates the implicit equation of the Bezier curve to discard the pixels that fall outside it. The proposed method is simple, fast, robust, and general, as it can handle curved polygons with holes, multiple components, or self-intersections. (C) 2008 Elsevier Ltd. All rights reserved.
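The fragment-side curve test can be sketched for the simpler quadratic case (the paper handles cubic segments, whose implicit form is more involved): the vertices of a curve triangle carry the attributes (0,0), (0.5,0), (1,1), and the interpolated (u, v) is plugged into u^2 - v to decide on which side of the curve a pixel lies. The struct and function names are illustrative.

```cpp
#include <array>

struct Vec2 { double x, y; };

// Interpolate per-vertex (u, v) attributes with barycentric weights
// (w0, w1, w2), mimicking what the rasterizer does for each fragment.
Vec2 interpolate_uv(const std::array<Vec2, 3>& uv, double w0, double w1, double w2) {
    return {w0 * uv[0].x + w1 * uv[1].x + w2 * uv[2].x,
            w0 * uv[0].y + w1 * uv[1].y + w2 * uv[2].y};
}

// Fragment test: keep the pixel only if it lies on the filled side of the
// curve, i.e. the implicit equation u^2 - v is non-positive; otherwise the
// fragment would be discarded before the stencil update.
bool inside_curve(const Vec2& uv) { return uv.x * uv.x - uv.y <= 0.0; }
```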
Real-time vehicle detection is one of the challenging problems for automotive and autonomous driving applications. Object detection using the Deformable Parts Model (DPM) has proved to be a promising approach providing high detection accuracy. However, the baseline DPM scheme spends 98% of its execution time in loop processing, highlighting its high computational cost for real-time applications. In this paper, we propose a real-time vehicle detection scheme for a low-powered embedded Graphics Processing Unit (GPU). The proposed scheme is based on the DPM approach and uses CUDA programming with different parallelization and loop unrolling schemes to reduce the computational cost of DPM. Three loop unrolling schemes, i.e. loosely unrolled, tightly unrolled, and hybrid unrolled, are proposed and implemented on two different datasets. Finally, we provide an optimal solution for vehicle detection with minimum execution time without any impact on vehicle detection accuracy. We achieve a speedup of 3x to 5x compared to a state-of-the-art GPU implementation and 30x compared to the baseline CPU implementation of DPM on a low-powered automotive-grade embedded computing platform featuring a Tegra K1 System on Chip (SoC), thus benefiting from the improved efficiency of parallel computation with CUDA.
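The effect of loop unrolling can be illustrated in plain C++ on a single filter-response dot product; the unroll factor of 4 and the function names are placeholders for the paper's loosely/tightly/hybrid unrolled CUDA loops.

```cpp
#include <cstddef>

// Baseline: straightforward loop over the filter window.
double response_rolled(const float* window, const float* filt, std::size_t n) {
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) acc += window[i] * filt[i];
    return acc;
}

// Manually unrolled by 4: fewer loop-control operations and more independent
// accumulators, the same kind of transformation applied to the DPM hot loops.
double response_unrolled4(const float* window, const float* filt, std::size_t n) {
    double a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += window[i]     * filt[i];
        a1 += window[i + 1] * filt[i + 1];
        a2 += window[i + 2] * filt[i + 2];
        a3 += window[i + 3] * filt[i + 3];
    }
    for (; i < n; ++i) a0 += window[i] * filt[i];   // remainder iterations
    return a0 + a1 + a2 + a3;
}
```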
Natural user interfaces (NUIs) provide human-computer interaction (HCI) with natural and intuitive operation interfaces, such as human gestures and voice. We have developed a real-time NUI engine architecture that uses a web camera as a means of implementing NUI applications. The system captures video via the web camera and performs real-time image processing using graphics processing unit (GPU) programming. This paper describes the architecture of the engine and its real-virtual environment interaction methods, such as foreground segmentation and hand gesture recognition. These methods are implemented using GPU programming in order to achieve real-time image processing for HCI. To verify the efficacy of the proposed NUI engine, we used it, together with the DirectX SDK, to develop several mixed reality games and touch-less operation applications. Our results confirm that the methods implemented by the engine operate in real time and that the interactive operations are intuitive.
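A sketch of the kind of per-pixel work such an engine offloads to the GPU: foreground segmentation by differencing the current frame against an adaptive background model. The threshold, learning rate, and single-channel representation are illustrative; the real engine performs this per pixel in GPU code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Mark a pixel as foreground if it differs enough from the background model,
// and slowly adapt the background where nothing moved.
std::vector<bool> segment_foreground(const std::vector<double>& frame,
                                     std::vector<double>& background,
                                     double threshold = 0.1,
                                     double alpha = 0.05) {
    std::vector<bool> fg(frame.size());
    for (std::size_t i = 0; i < frame.size(); ++i) {
        fg[i] = std::fabs(frame[i] - background[i]) > threshold;
        if (!fg[i])
            background[i] = (1.0 - alpha) * background[i] + alpha * frame[i];
    }
    return fg;
}
```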