ISBN: 9781479989379 (print)
ADAS (Advanced Driver Assistance Systems) algorithms increasingly use heavy image processing operations. To embed this type of algorithm, semiconductor companies offer many heterogeneous architectures. These SoCs (Systems on Chip) are composed of different processing units with different capabilities, often including a massively parallel computing unit. Due to the complexity of these SoCs, predicting whether a given algorithm can be executed in real time on a given architecture is not trivial. In fact, it is not a simple task for automotive industry actors to choose the most suitable heterogeneous SoC for a given application. Moreover, embedding complex algorithms on these systems remains difficult because of their heterogeneity: it is not easy to decide how to allocate parts of a given algorithm to the different computing units of a given SoC. To help the automotive industry embed algorithms on heterogeneous architectures, we propose a novel approach to predicting the performance of image processing algorithms that is applicable to different types of computing units. Our methodology predicts an execution-time interval, of varying width, with an associated degree of confidence, using only a high-level description of the algorithm and a few characteristics of the computing units.
ISBN: 9783642131356 (print)
A real-time emotional architecture (RTEA) for building parallel robotic applications is presented. RTEA allows the application developer to focus on the design and implementation of the agent processes, because the architecture itself decides autonomously how much attention to pay to each of these processes. From the functional point of view, an RTEA selects and adapts its objectives depending on its physical (actuators) and mental (processing) capabilities. This characteristic makes the architecture a useful solution for applications that have to deal with several simultaneous tasks, that have real-time constraints, and whose objectives are defined in a flexible way. From the viewpoint of application design and development, RTEA defines its different entities as independent modules. This modularity makes it easier for the programmer to develop each part of the project. To control the processing capacity of the agent and to guarantee that the temporal constraints of the processes are met, RTEA has been implemented on a real-time kernel (rt-linux). Mobile robot experiments have been carried out to show how the emotional system influences the mental organisation of the robot when it performs navigational tasks under different environmental conditions.
A message-passing multicomputer is presented, and its application to image processing and reconstruction is outlined. The multicomputer may be seen as a one-dimensional array of computing nodes with bidirectional shuffle and shift connections. The resulting shuffle-shift machine is well suited for tasks like image processing and image reconstruction. A sample shuffle-shift machine has been built using transputers. The hardware and software aspects of this implementation are described, and benchmark results obtained with a number of image-oriented algorithms are included. The shuffle-shift machine is compared with related parallel computer architectures, and a two-dimensional generalization is indicated.
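As a rough illustration of the topology described in this abstract, the sketch below lists the neighbours of one node in a one-dimensional array of 2^k nodes. It assumes that the shuffle connection is the classical perfect shuffle (a one-bit left rotation of the k-bit node index) and that the shift connections link node i to nodes i + 1 and i - 1 with wrap-around. The abstract states neither detail, and all function names are hypothetical.

```python
def shuffle(i, k):
    """Perfect shuffle: rotate the k-bit node index i left by one bit."""
    mask = (1 << k) - 1
    return ((i << 1) & mask) | (i >> (k - 1))


def unshuffle(i, k):
    """Inverse shuffle: rotate the k-bit node index i right by one bit."""
    return (i >> 1) | ((i & 1) << (k - 1))


def neighbours(i, k):
    """Neighbours of node i in an assumed shuffle-shift array of 2**k nodes."""
    n = 1 << k
    return {
        "shift_right": (i + 1) % n,   # assumed wrap-around at the array ends
        "shift_left": (i - 1) % n,
        "shuffle": shuffle(i, k),
        "unshuffle": unshuffle(i, k),
    }
```

Because the shuffle is a rotation of the index bits, applying it k times returns every node to its starting position, which is one reason such links pair naturally with butterfly-style data exchanges.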
ISBN: 9781538669792 (print)
Signal, image and Synthetic Aperture Radar imagery algorithms are now in routine daily use. Because of the large data volumes and the complexity involved, processing them in real time is almost impossible. Image processing algorithms are often inherently parallel, so they fit well onto parallel architectures such as multicore Central Processing Units (CPUs) and Graphics Processing Units (GPUs). In this paper, image processing algorithms capable of executing in parallel on several CPU and GPU platforms are evaluated. All algorithms were tested in TensorFlow, which is a novel framework for deep learning but can also be used for image processing. Speedups relative to the CPU are given for all algorithms. The TensorFlow GPU implementation can outperform multi-core CPUs for the tested algorithms, with speedups ranging from 3.6 to 15 times.
A system for dynamic intelligent scheduling and control (DISC) of reconfigurable parallel processors is presented. The purpose of the system is to provide a rapid prototyping capability for computer vision/image processing tasks. The scheduler particularly addresses the problems of algorithms with execution times that depend on the image data and of processing scenarios that vary dynamically based on the input image. Since conventional scheduling methods cannot propose schedules for most tasks of this type, a dynamic controller is used to schedule the task and reconfigure the machine on the fly. This dynamic scheduling system attempts to balance the overall processing scenario with the needs of the individual routines that make up the task. The implementation of this system is discussed, with emphasis on the scheduling heuristics and the use of the system for prototyping computer vision/image processing tasks. Testing was done on a number of tasks that exercised different aspects of the scheduling strategy. The schedules determined by DISC have an average tiling percentage of 77% and an average scheduling overhead of only 0.1% of the total task execution time.
ISBN: 9781450384414 (print)
In this paper we present our experience implementing domain decomposition preconditioners on vector architectures. In particular, we focus on the solution of unstructured network equations arising from electrical power systems by preconditioning iterative algorithms with the Additive Schwarz Method (ASM). The implementation is carried out in the Julia programming language, which allows for easy prototyping and interfacing with GPU architectures thanks to its multiple dispatch features. In our experiments, we show the trade-off between device throughput and convergence of the iterative algorithm as the size of the domain varies, and determine optimal fronts of computational performance.
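For orientation, the one-level Additive Schwarz preconditioner is commonly written as below. The notation is an assumption, since the abstract defines no symbols: A is the system matrix, N the number of (possibly overlapping) subdomains, and R_i the restriction operator that selects the unknowns of subdomain i.

```latex
M_{\mathrm{ASM}}^{-1} = \sum_{i=1}^{N} R_i^{\top} \left( R_i A R_i^{\top} \right)^{-1} R_i
```

Each term is an independent local solve, which is what makes the method attractive on throughput-oriented devices. Larger subdomains typically reduce the iteration count but make each local solve more expensive, which is one plausible reading of the trade-off the abstract mentions.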
Scalability has been used extensively as a de facto performance criterion for evaluating parallel algorithms and architectures. In this paper, the relation between scalability and execution time is carefully studied. Results show that isospeed scalability characterizes the variation of execution time well. Three algorithms from scientific computing are implemented on an Intel Paragon and an IBM SP2 parallel computer. Experimental and theoretical results show that scalability is an important, distinct metric for parallel and distributed systems, and may be as important as execution time in a scalable parallel and distributed environment.
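The abstract does not restate how isospeed scalability is computed. A minimal sketch, assuming the usual definition from the literature: if W is the work solved on p processors and W' is the work needed on p' processors to keep the average speed per processor unchanged, the scalability is (p' * W) / (p * W'). The function and parameter names below are hypothetical.

```python
def average_speed(work, time, p):
    """Average unit speed: work completed per unit time, per processor."""
    return work / (time * p)


def isospeed_scalability(p, work, p_scaled, work_scaled):
    """Assumed isospeed scalability psi(p, p') = (p' * W) / (p * W').

    work_scaled is the problem size that keeps average_speed constant when
    moving from p to p_scaled processors. It has to be measured or derived
    separately; this function only forms the ratio.
    """
    return (p_scaled * work) / (p * work_scaled)
```

Under this definition a value of 1.0 means the work only has to grow in proportion to the processor count, for example `isospeed_scalability(4, 100, 8, 200)`, while a value such as 0.8 for `isospeed_scalability(4, 100, 8, 250)` indicates that the problem had to grow faster than the machine to hold the same per-processor speed.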
This paper proposes a novel approach to program development for highly parallel architectures, primarily as far as debugging is concerned. The visual nature of the debugging stage, when dealing with image-processing a...
ISBN: 9780738110905 (print)
The proceedings contain 8 papers. The topics discussed include: accelerating domain propagation: an efficient GPU-parallel algorithm over sparse matrices; parallelizing irregular computations for molecular docking; reducing queuing impact in irregular data streaming applications; supporting irregularity in throughput-oriented computing by SIMT-SIMD integration; DistDGL: distributed graph neural network training for billion-scale graphs; labeled triangle indexing for efficiency gains in distributed interactive subgraph search; distributed memory graph coloring algorithms for multiple GPU; and performance evaluation of the vectorizable binary search algorithms on an FPGA platform.
ISBN: 9781450384414 (print)
Graph analysis now permeates society, with applications ranging from advertising and transportation to medical research. Graphs are becoming both larger and structurally more complex every day. The increasing size of graph networks has made many of the classical algorithms slow. Fortunately, CPU architectures have evolved to handle new and more complex problems through core-level parallelism and vector-level (SIMD) parallelism. In this paper, we explore how the modern vector architecture of CPUs can help with community detection, partitioning, and coloring kernels by studying two representative algorithms. We consider the Intel SkylakeX and Cascade Lake architectures, which support gather and scatter instructions on 512-bit vectors. Existing vectorized algorithms for classic graph problems, such as BFS and PageRank, do not apply well to community detection; we show that support for gather and scatter is necessary, in particular for implementing reduce-scatter patterns. We evaluate the performance achieved on the two architectures and conclude that good hardware support for scatter instructions is necessary to fully leverage vector processing for graph partitioning problems.
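To make the reduce-scatter pattern mentioned in this abstract concrete, the sketch below emulates it in plain Python. It is not the authors' kernel: the chunk width of 16 assumes 32-bit elements in a 512-bit vector, the way duplicate indices are merged before the write is only one possible strategy, and all names are hypothetical.

```python
VLEN = 16  # assumed lane count: 512-bit vector / 32-bit elements


def reduce_scatter(indices, values, size):
    """Accumulate values[j] into acc[indices[j]], one vector-sized chunk at a time."""
    acc = [0] * size
    for start in range(0, len(indices), VLEN):
        idx = indices[start:start + VLEN]
        val = values[start:start + VLEN]
        # Merge lanes that target the same slot so that the scatter below
        # writes each slot at most once per chunk (no lost updates).
        partial = {}
        for i, v in zip(idx, val):
            partial[i] = partial.get(i, 0) + v
        # Gather the current totals, add the per-chunk sums, scatter back.
        gathered = [acc[i] for i in partial]
        for i, g in zip(partial, gathered):
            acc[i] = g + partial[i]
    return acc
```

In community detection the indices would typically be the community labels of a vertex's neighbours and the values the corresponding edge weights. Labels repeat often, so the merge step cannot be skipped.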