ISBN (digital): 9781728176284
ISBN (print): 9781728176291
A common pattern in high-performance scientific computing is the structured grid pattern, in which one or more elements of a matrix are computed as a stencil operation over neighbouring matrix elements. Since there are multiple options for implementing this pattern efficiently on modern computing architectures, we compare the performance of several parallel implementations on a multi-core system with GPU capabilities and on an FPGA embedded inside a SoC. The application used for this case study models the propagation of wireless signals in a two-dimensional environment, accounting for reflections and signal attenuation. The parallel programming paradigms examined in this paper include CUDA, TBB, Rust, and OpenMP, with HLS as the hardware description paradigm; CUDA proved to be the fastest implementation.
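The structured grid pattern described in this abstract can be illustrated with a minimal serial sketch. This is not the paper's signal propagation model: it is a generic four-neighbour stencil sweep, and the function name `stencil_step` and the `attenuation` weight are hypothetical names chosen for illustration.

```python
def stencil_step(grid, attenuation=0.25):
    """One sweep: each interior cell becomes a weighted sum of its four neighbours.

    `attenuation` is an illustrative weight, not a value taken from the paper.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # boundary cells are copied unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Read only from the old grid so every update is independent.
            out[i][j] = attenuation * (
                grid[i - 1][j] + grid[i + 1][j] + grid[i][j - 1] + grid[i][j + 1]
            )
    return out


# A single source in the centre of a 5x5 grid spreads to its four neighbours.
grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 1.0
result = stencil_step(grid)
```

Because each output cell depends only on the previous grid, the two loops carry no dependencies between iterations, which is what makes the pattern a natural fit for the parallel paradigms the abstract compares.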
ISBN (digital): 9781728190907
ISBN (print): 9781728190914
Computer technology, although it continues to advance, often struggles to meet the demands of signal and image processing software. As technology develops, software requires larger memory and faster processors. Parallel programming methods have been developed to address processor speed limitations. In this study, OpenCL-based image enhancement applications that run in parallel on the graphics processing unit were implemented. The OpenCL architecture was optimized to maximize the acceleration. Suitable image enhancement applications were tested to verify that the designed algorithm and architecture succeed in both simple and complex operations. To quantify the speed gain, the same applications were also developed with serial programming techniques, and their results were compared with those of the parallel versions. The comparison supports the conclusion that parallel programming performs better. On the hardware used, parallel programming reduced calculation times by factors ranging from 1.58 to 561.
ISBN (print): 9781728180397
Road recognition provides essential information for determining the movement of an autonomous vehicle. Recent research has shown that machine learning can be used to obtain this information from images. Nevertheless, such systems could be improved in both effectiveness and efficiency. This research proposes finding better feature combinations and using an Artificial Neural Network algorithm to build a more accurate road detection model, for better effectiveness. A Region of Interest module based on a heuristic method is also applied to reduce computation, for better efficiency. These three new modules are implemented and combined with a road recognition module to form a road recognition system. The performance of the proposed method is then tested and compared with the latest research. The experimental results show that the Artificial Neural Network does not increase the system's effectiveness. Nonetheless, with the right features and the region of interest module, the proposed system achieves better performance. The prototype's F1-score increased from 0.94 to 0.95, and its speed increased from 99 to 112 frames processed per second.
Recent years have seen rapid growth in data-driven distributed systems, such as Hadoop MapReduce, Spark, and Dryad. However, the counterparts for high-performance or compute-intensive applications including large-scal...
OpenMP allows developers to harness the power of shared memory multiprocessing in C and C++ applications, but the performance gained with OpenMP is highly sensitive to the underlying hardware, making performance porta...
The emergence of heterogeneous processors such as GPUs provides massively parallel computing power but also exacerbates the difficulties of parallel programming. Although low-level programming methods such as CUDA and O...
This work describes how to find 3D objects in 2D images. The images may have varying illumination conditions and backgrounds. Furthermore, the distance and rotation of the camera with respect to the object can be arbitrary. The method described in this work reduces the computation time of the 3D object localization problem by searching only those regions of the image that include a combination of the object's most common colors. The accuracy and speed of the implementation are tested on images taken under various illuminations and backgrounds.
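The colour-based pre-filtering idea in this abstract can be sketched roughly as follows. The abstract does not specify how colours are represented or combined, so everything here is an assumption made for illustration: colours are plain string labels rather than quantised pixel values, a region is a fixed-size tile, and a tile qualifies only if it contains all of the object's `k` most frequent colours. The names `dominant_colors` and `candidate_regions` are hypothetical.

```python
from collections import Counter


def dominant_colors(pixels, k=2):
    """Return the set of the k most frequent colour labels in a flat pixel list."""
    return {color for color, _ in Counter(pixels).most_common(k)}


def candidate_regions(image, targets, tile=2):
    """Return (row, col) origins of tiles whose pixels include every target colour."""
    hits = []
    for r in range(0, len(image) - tile + 1, tile):
        for c in range(0, len(image[0]) - tile + 1, tile):
            seen = {image[r + dr][c + dc] for dr in range(tile) for dc in range(tile)}
            if targets <= seen:  # all target colours are present in this tile
                hits.append((r, c))
    return hits


# Toy data: the object is mostly red and blue (assumed labels, not real pixels).
object_pixels = ["red", "red", "red", "blue", "blue", "white"]
image = [
    ["green", "green", "red", "blue"],
    ["green", "green", "red", "red"],
    ["red", "green", "gray", "gray"],
    ["green", "blue", "gray", "gray"],
]
targets = dominant_colors(object_pixels)
regions = candidate_regions(image, targets)
```

Only the tiles returned in `regions` would then be passed to the more expensive localization step, which is where the reduction in computation time would come from.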
AI-powered edge devices currently lack the ability to adapt their embedded inference models to the ever-changing environment. To tackle this issue, Continual Learning (CL) strategies aim at incrementally improving the...
OCaml is an industrial-strength, multi-paradigm programming language, widely used in industry and academia. OCaml is also one of the few modern managed system programming languages to lack support for shared memory pa...
The GPU usually handles homogeneous data-parallel work by taking advantage of its massive number of cores. In most applications, we use CUDA programming to exploit the power of the GPU. In data intensive hig...