We describe two parallel analog VLSI architectures that integrate optical flow data obtained from arrays of elementary velocity sensors to estimate heading direction and time-to-contact. For heading direction computat...
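The sketch below recovers both quantities from a sampled flow field in software, as a rough illustration of what they mean. It assumes pure translation toward a fronto-parallel surface. Under that assumption the flow at image point (x, y) is (u, v) = ((x - x0)/tau, (y - y0)/tau), where (x0, y0) is the focus of expansion (the heading direction in the image) and tau is the time-to-contact in frames. The function name `estimate_foe_ttc` and the least-squares formulation are hypothetical. This is a digital stand-in, not the analog VLSI circuits the abstract describes.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def estimate_foe_ttc(points: List[Point], flows: List[Point]) -> Tuple[float, float, float]:
    """Least-squares fit of x = x0 + tau*u and y = y0 + tau*v over all samples.

    Returns (x0, y0, tau): focus of expansion and time-to-contact in frames.
    Assumes pure translation toward a fronto-parallel surface.
    """
    n = float(len(points))
    su = sum(u for u, _ in flows)
    sv = sum(v for _, v in flows)
    sww = sum(u * u + v * v for u, v in flows)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    swx = sum(u * x + v * y for (x, y), (u, v) in zip(points, flows))
    # Normal equations (A^T A) p = A^T b for the unknowns p = (x0, y0, tau).
    a = [[n, 0.0, su], [0.0, n, sv], [su, sv, sww]]
    b = [sx, sy, swx]
    d = det3(a)

    def with_column(col: int) -> List[List[float]]:
        # Cramer's rule: replace column `col` of A^T A with A^T b.
        return [[b[r] if c == col else a[r][c] for c in range(3)] for r in range(3)]

    x0, y0, tau = (det3(with_column(c)) / d for c in range(3))
    return x0, y0, tau
```

Rearranging the flow model as x = x0 + tau*u and y = y0 + tau*v makes it linear in (x0, y0, tau), which is why a single 3x3 solve suffices.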
ISBN (print): 9781479989379
ADAS (Advanced Driver Assistance Systems) algorithms increasingly rely on heavy image-processing operations. To host such algorithms, semiconductor companies offer many heterogeneous architectures. These SoCs (Systems on Chip) combine different processing units with different capabilities, often including a massively parallel computing unit. Because of the complexity of these SoCs, predicting whether a given algorithm can run in real time on a given architecture is not trivial. It is therefore hard for automotive industry actors to choose the heterogeneous SoC best suited to a given application. Moreover, embedding complex algorithms on these systems remains difficult: because of the heterogeneity, it is not easy to decide how to allocate the parts of an algorithm to the different computing units of an SoC. To help the automotive industry embed algorithms on heterogeneous architectures, we propose a novel approach to predicting the performance of image-processing algorithms on different types of computing units. Using only a high-level description of the algorithm and a few characteristics of the computing units, our methodology predicts an execution-time interval, whose width may vary, together with a degree of confidence.
A social robot has to recognize a person's social intentions in order to interact fully with him or her. People's intentions can be inferred by processing verbal and non-verbal communicative signs. In this work we describe an action classification module embedded in a robot's cognitive architecture, which contributes to the interpretation of users' behavior. © 2018 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://***/licenses/by-nc-nd/3.0/). Peer review under responsibility of the scientific committee of the 8th Annual International Conference on Biologically Inspired Cognitive Architectures.
ISBN (print): 9781538669792
Signal, image, and Synthetic Aperture Radar (SAR) imagery algorithms are now used routinely. Because of the huge data volumes and their computational complexity, processing them in real time is almost impossible. Image processing algorithms are often inherently parallel, so they map well onto parallel architectures such as multicore Central Processing Units (CPUs) and Graphics Processing Units (GPUs). In this paper, we evaluate image processing algorithms that can execute in parallel on several CPU and GPU platforms. All algorithms were tested in TensorFlow, a recent framework intended for deep learning that can also be used for image processing. Speedups relative to the CPU are reported for all algorithms. The TensorFlow GPU implementation can outperform multicore CPUs on the tested algorithms, with speedups ranging from 3.6× to 15×.
ISBN (print): 9780791855454
GPUs (Graphics Processing Units), traditionally used for 3D graphics calculations, have recently gained the ability to perform general-purpose calculations through GPGPU (General-Purpose GPU) technology. Moreover, GPUs can be much faster than CPUs (Central Processing Units) because they execute hundreds or even thousands of commands concurrently. This parallel processing allows the GPU to achieve extremely high performance. However, it also requires highly parallel algorithms that can supply enough commands on each clock cycle. This work formulates a methodology for selecting a geometry representation and a data structure suited to parallel processing on the GPU. The methodology is then used to design a 3-axis CNC milling simulation algorithm accelerated with GPGPU technology. The developed algorithm is validated by an experimental machining simulation and an evaluation of the performance results. The experimental simulation shows the importance of optimization and of using algorithms that provide enough work to the GPU. The test configuration also demonstrates almost an order-of-magnitude difference between CPU and GPU performance.
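The abstract does not say which geometry representation the methodology selects. One representation commonly used for 3-axis milling is a height map (Z-map), and it illustrates the property the abstract asks for. Each grid cell stores one material height and is updated independently of the others, so on a GPU every cell could be assigned its own thread. The sketch below is a serial Python illustration under that assumption, using a flat-end tool. All names (`make_zmap`, `cut_flat_end`, `simulate_path`) are hypothetical.

```python
import math
from typing import List, Tuple

ZMap = List[List[float]]


def make_zmap(nx: int, ny: int, stock_top: float) -> ZMap:
    """Hypothetical Z-map: one material-top height per grid cell."""
    return [[stock_top] * nx for _ in range(ny)]


def cut_flat_end(zmap: ZMap, cell: float, cx: float, cy: float,
                 cz: float, radius: float) -> None:
    """Lower every cell under a flat-end mill centred at (cx, cy) to the tip height cz.

    Each (i, j) update reads and writes only its own cell, which is what makes
    the representation data-parallel; here the cells are visited serially.
    """
    for j, row in enumerate(zmap):
        y = (j + 0.5) * cell  # cell-centre y coordinate
        for i in range(len(row)):
            x = (i + 0.5) * cell  # cell-centre x coordinate
            if math.hypot(x - cx, y - cy) <= radius:
                row[i] = min(row[i], cz)


def simulate_path(zmap: ZMap, cell: float,
                  path: List[Tuple[float, float, float]], radius: float) -> ZMap:
    """Apply the tool at each sampled (x, y, z) position along a tool path."""
    for cx, cy, cz in path:
        cut_flat_end(zmap, cell, cx, cy, cz, radius)
    return zmap
```

Consider a 10 × 10 stock of height 10 with unit cells, and a tool of radius 1.5 swept along y = 5 at depth z = 7. The sweep lowers the cells near that line to 7 and leaves distant cells at 10.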
ISBN (print): 9781450384414
In this paper we present our experience implementing domain decomposition preconditioners on vector architectures. In particular, we focus on solving the unstructured network equations arising from electrical power systems by preconditioning iterative algorithms with the Additive Schwarz Method (ASM). The implementation is carried out in the Julia programming language, whose multiple-dispatch features allow easy prototyping and interfacing with GPU architectures. In our experiments, we show the trade-off between device throughput and convergence of the iterative algorithm as the size of the domain varies, and we determine optimal fronts of computational performance.
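The sketch below makes the Additive Schwarz Method concrete by applying the preconditioner M^{-1} r = sum_i R_i^T A_i^{-1} R_i r inside preconditioned conjugate gradient. The residual is restricted to each overlapping subdomain and solved exactly there, and the local corrections are summed. The sketch makes several assumptions:

- It is written in Python rather than the Julia used in the paper.
- A 1D Laplacian stands in for the power-network matrices.
- All function names and the subdomain layout are illustrative.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def laplacian_1d(n: int) -> Matrix:
    """Dense tridiag(-1, 2, -1): a stand-in SPD test matrix."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 2.0
        if i > 0:
            a[i][i - 1] = -1.0
        if i < n - 1:
            a[i][i + 1] = -1.0
    return a


def dense_solve(a: Matrix, b: Vector) -> Vector:
    """Gaussian elimination with partial pivoting (small local systems only)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (m[k][n] - sum(m[k][c] * x[c] for c in range(k + 1, n))) / m[k][k]
    return x


def make_asm(a: Matrix, subdomains: Sequence[Sequence[int]]) -> Callable[[Vector], Vector]:
    """Return r -> sum_i R_i^T A_i^{-1} R_i r for overlapping index sets."""
    blocks = [(list(idx), [[a[i][j] for j in idx] for i in idx]) for idx in subdomains]

    def apply(r: Vector) -> Vector:
        z = [0.0] * len(r)
        for idx, local_a in blocks:  # the local solves are mutually independent
            for i, v in zip(idx, dense_solve(local_a, [r[i] for i in idx])):
                z[i] += v  # entries in overlaps receive several corrections
        return z

    return apply


def pcg(a: Matrix, b: Vector, precond: Callable[[Vector], Vector],
        tol: float = 1e-10, max_iter: int = 500) -> Tuple[Vector, int]:
    """Preconditioned conjugate gradient; returns (solution, iterations used)."""
    def dot(u: Vector, v: Vector) -> float:
        return sum(x * y for x, y in zip(u, v))

    x, r = [0.0] * len(b), b[:]
    z = precond(r)
    p, rz = z[:], dot(r, z)
    for it in range(1, max_iter + 1):
        ap = [dot(row, p) for row in a]
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, ap)]
        if dot(r, r) ** 0.5 < tol:
            return x, it
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

Because the local solves are independent, ASM suits parallel devices. Larger subdomains make each solve more expensive but usually cut the number of outer iterations, which is the kind of trade-off the abstract measures.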
ISBN (print): 0769511538
We present parallel algorithms to find cut vertices, bridges, and a Hamiltonian path in bounded interval tolerance graphs. For a graph with n vertices, the algorithms require O(log n) time and use O(n) processors on the Concurrent Read Exclusive Write Parallel RAM (CREW PRAM) model of computation. Our approach transforms the original graph problem into a problem in computational geometry. The total work done by the parallel algorithms is comparable to the work done by the best known sequential algorithms for the more restricted classes of interval graphs and permutation graphs. In this sense, our algorithms are optimal.
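The quantities these parallel algorithms compute can be checked against the classic sequential method. A single depth-first search that tracks discovery times and low-link values (Tarjan's technique) finds all cut vertices and bridges of any undirected graph in O(n + m) time. The sketch below is that sequential baseline. It is not the paper's O(log n)-time CREW PRAM algorithm or its computational-geometry transformation. The function name is illustrative.

```python
from typing import List, Optional, Set, Tuple

Edge = Tuple[int, int]


def cut_vertices_and_bridges(n: int, edges: List[Edge]) -> Tuple[List[int], List[Edge]]:
    """Sequential low-link DFS on an undirected graph with vertices 0..n-1.

    Recursive, so intended for small illustrative graphs only.
    """
    adj: List[List[Tuple[int, int]]] = [[] for _ in range(n)]
    for eid, (u, v) in enumerate(edges):
        adj[u].append((v, eid))  # keep edge ids so parallel edges are handled
        adj[v].append((u, eid))
    disc = [-1] * n  # discovery time; -1 means unvisited
    low = [0] * n    # lowest discovery time reachable from the subtree via one back edge
    cut: Set[int] = set()
    bridges: List[Edge] = []
    timer = 0

    def dfs(u: int, parent_edge: Optional[int]) -> None:
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        children = 0
        for v, eid in adj[u]:
            if eid == parent_edge:
                continue  # do not walk back along the tree edge we arrived on
            if disc[v] == -1:
                children += 1
                dfs(v, eid)
                low[u] = min(low[u], low[v])
                if parent_edge is not None and low[v] >= disc[u]:
                    cut.add(u)  # v's subtree cannot reach above u
                if low[v] > disc[u]:
                    bridges.append(edges[eid])  # no back edge spans (u, v)
            else:
                low[u] = min(low[u], disc[v])
        if parent_edge is None and children > 1:
            cut.add(u)  # a DFS root is a cut vertex iff it has 2+ tree children

    for s in range(n):
        if disc[s] == -1:
            dfs(s, None)
    return sorted(cut), sorted(bridges)
```

Take two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3). For that graph the function returns cut vertices [2, 3] and the single bridge (2, 3).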
This paper proposes a novel approach to program development for highly parallel architectures, primarily as far as debugging is concerned. The visual nature of the debugging stage, when dealing with image-processing a...
ISBN (print): 9781450384414
Graph analysis now permeates society, with applications ranging from advertising and transportation to medical research. Graphs are becoming larger, and their structure more complex, every day. The increasing size of graph networks has made many classical algorithms considerably slower. Fortunately, CPU architectures have evolved to handle new and more complex problems through core-level parallelism and vector-level (SIMD) parallelism. In this paper, we explore how the modern vector architecture of CPUs can help with community detection, partitioning, and coloring kernels by studying two representative algorithms. We consider the Intel Skylake-X and Cascade Lake architectures, which support gather and scatter instructions on 512-bit vectors. Existing vectorized algorithms for classic graph problems, such as BFS and PageRank, do not carry over well to community detection. We show that support for gather and scatter is necessary, in particular for implementing reduce-scatter patterns. We evaluate the performance achieved on the two architectures and conclude that good hardware support for scatter instructions is necessary to fully leverage vector processing for graph partitioning problems.
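A small emulation shows the reduce-scatter hazard. Suppose a community-detection kernel accumulates, for one vertex, the edge weight toward each neighboring community (`acc[label] += w`). It processes `LANES` neighbors per vector: gather the current sums, add the weights, and scatter the results back. If two lanes target the same label, a plain scatter keeps only one of the writes, so updates are lost. Combining duplicate lanes first restores the correct result; this is the job that conflict-detection instructions such as AVX-512CD are designed to support.

This is a scalar Python emulation with hypothetical names. The lane count of 8 assumes 64-bit values in a 512-bit vector. The last-lane-wins behavior is meant to mirror how overlapping AVX-512 scatter writes are ordered.

```python
from typing import Dict, List

LANES = 8  # assumed: 512-bit vectors holding 64-bit values


def gather(array: List[float], idx: List[int]) -> List[float]:
    return [array[i] for i in idx]


def scatter(array: List[float], idx: List[int], values: List[float]) -> None:
    # Lanes are written in order, so when two lanes share an index the later
    # lane overwrites the earlier one (the earlier update is lost).
    for i, v in zip(idx, values):
        array[i] = v


def naive_vector_accumulate(acc: List[float], idx: List[int], w: List[float]) -> List[float]:
    """acc[idx[k]] += w[k], vectorized without handling duplicate indices."""
    for s in range(0, len(idx), LANES):
        lane_idx, lane_w = idx[s:s + LANES], w[s:s + LANES]
        summed = [a + b for a, b in zip(gather(acc, lane_idx), lane_w)]
        scatter(acc, lane_idx, summed)
    return acc


def conflict_free_accumulate(acc: List[float], idx: List[int], w: List[float]) -> List[float]:
    """Same reduction, but lanes with equal indices are combined before the scatter."""
    for s in range(0, len(idx), LANES):
        lane_idx, lane_w = idx[s:s + LANES], w[s:s + LANES]
        combined: Dict[int, float] = {}
        for i, v in zip(lane_idx, lane_w):
            combined[i] = combined.get(i, 0.0) + v
        keys = list(combined)  # now unique, so the scatter cannot collide
        summed = [a + combined[k] for a, k in zip(gather(acc, keys), keys)]
        scatter(acc, keys, summed)
    return acc
```

With neighbor labels `[0, 1, 0, 2, 0, 1, 3, 3]` and unit weights, the naive version yields `[1.0, 1.0, 1.0, 1.0]` while the conflict-free version yields the correct `[3.0, 2.0, 1.0, 2.0]`.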
ISBN (print): 0769515126
A type of incomplete decomposition preconditioner based on local block factorization is considered for matrices derived from discretizing 2-D or 3-D elliptic partial differential equations. We prove that the condition numbers of the preconditioned matrices are small, which means that the constructed preconditioners are effective. We further consider an efficient parallel version of the preconditioner that depends on only a single integer parameter. When this parameter is small, many more iterations are needed to converge on multiple processors than on a single processor. As its value increases, the difference decreases step by step. Finally, experiments on a cluster of six PCs running at 1.8 GHz show two things. The constructed local block factorizations are efficient in serial implementation compared with some well-known effective preconditioners, and the parallel versions are also efficient.
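To see where block factorizations come from, consider a model case: the 5-point discretization of the 2-D Poisson equation on an n × n uniform grid with natural ordering. This is an assumption, since the abstract does not fix the operator or the grid. The matrix is block tridiagonal, and its exact block factorization follows a simple recurrence. Incomplete variants keep the diagonal blocks sparse by approximating the inverse that appears in that recurrence. This is the standard block-incomplete construction, shown for orientation, and not necessarily the paper's local variant.

```latex
A = \begin{pmatrix}
 T      & -I     &        &    \\
 -I     & T      & \ddots &    \\
        & \ddots & \ddots & -I \\
        &        & -I     & T
\end{pmatrix},
\qquad
T = \operatorname{tridiag}(-1,\, 4,\, -1) \in \mathbb{R}^{n \times n}.

% Write A = D - L - L^{T}, with D = \operatorname{diag}(T, \dots, T)
% and L the block sub-diagonal part (L_{i,i-1} = I). Then
A = (\Delta - L)\,\Delta^{-1}\,(\Delta - L^{T}),
\qquad \Delta = \operatorname{diag}(\Delta_1, \dots, \Delta_n),
\qquad \Delta_1 = T, \quad \Delta_i = T - \Delta_{i-1}^{-1} \ (i \ge 2).

% Incomplete version: replace \Delta_{i-1}^{-1} by a sparse (e.g. tridiagonal)
% approximation so that every \Delta_i stays sparse.
```

The recurrence shows why exact block factorization is costly: each \Delta_i becomes dense. It also shows why a sparse approximation of \Delta_{i-1}^{-1} yields a cheap preconditioner.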