Robotics combines many domains with sophisticated challenges, such as computer vision, motion control, and search algorithms. Search algorithms can be applied to calculate movements. The A* algorithm is a well-known and proven search algorithm for finding a path within a graph. This paper presents an extended A* algorithm optimized for robot navigation, using as a map a bird's-eye view that is dynamically generated by image stitching. The scenario is a robot that moves to a target in an environment containing obstacles. The robot is controlled by a Xilinx Zynq platform that contains an ARM processor and an FPGA. To exploit the flexibility of such an architecture, the FPGA executes the most compute-intensive task of the extended A* algorithm: sorting the accessible nodes in the graph. Several environments with different complexity levels are used to evaluate the extended A* algorithm. The environment is captured by a Kinect sensor mounted directly on the robot. To dewarp the robot's view, the frames are transformed into a bird's-eye view; in addition, a wider viewing range is achieved by image stitching. The evaluation of the extended A* algorithm shows a significant improvement in memory utilization. Accordingly, the algorithm is especially practical for embedded systems, which often have only limited memory resources. Moreover, the overall execution time for several use cases is reduced, with speed-ups of up to 2.88x.
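As context for the node-sorting task the abstract offloads to the FPGA, the following is a minimal textbook A* on an occupancy grid, with the open set kept sorted by a binary heap. This is an illustrative sketch only, not the paper's extended algorithm; the grid, heuristic, and 4-connectivity are assumptions.

```python
import heapq

def a_star(grid, start, goal):
    """Plain A* on a 4-connected occupancy grid (1 = obstacle).

    The heap keeps the accessible (open) nodes sorted by f = g + h,
    which is the compute-intensive step the paper maps to the FPGA.
    """
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    open_heap = [(h(start), 0, start)]
    g = {start: 0}
    parent = {}
    while open_heap:
        _, cost, node = heapq.heappop(open_heap)
        if node == goal:
            path = [node]                      # reconstruct path backwards
            while node in parent:
                node = parent[node]
                path.append(node)
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dr, node[1] + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and not grid[nxt[0]][nxt[1]]
                    and cost + 1 < g.get(nxt, float("inf"))):
                g[nxt] = cost + 1
                parent[nxt] = node
                heapq.heappush(open_heap, (cost + 1 + h(nxt), cost + 1, nxt))
    return None  # goal unreachable

# Toy environment: a wall forces a detour around row 1.
grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
path = a_star(grid, (0, 0), (2, 0))
```

With this grid the planner must route around the wall, yielding a 7-cell path from (0, 0) to (2, 0).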
Modern-day streaming digital signal processing (DSP) applications are often accompanied by real-time requirements. In addition, they expose increasing levels of dynamic behavior. Dynamic dataflow models of computation (MoCs) have been introduced to model and analyze such applications. Parametrized dataflow MoCs are an important subclass of dynamic dataflow MoCs because they integrate dynamic parameters and run-time adaptation of parameters in a structured way. However, these MoCs have primarily been analyzed for functional behavior and correctness, while the analysis of their temporal behavior has received little attention. In this work, we present a new analysis approach for the worst-case latency of dynamic streaming DSP applications that can be captured using parametrized dataflow MoCs based on synchronous dataflow (SDF). We show that, in the presence of parameter inter-dependencies, our technique can yield tighter worst-case latency estimates than existing techniques that operate on SDF structures abstracting the worst-case behavior of the initial parametrized specifications. We base the approach on the (max,+) algebraic semantics of timed SDF and on its non-parametric generalization known as FSM-based scenario-aware dataflow (FSM-SADF). We evaluate the approach on a realistic case study from the multimedia domain.
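To illustrate the (max,+) semantics the abstract builds on: in this algebra, addition is replaced by max and multiplication by +, so one iteration of a timed SDF graph advances token availability times by a (max,+) matrix-vector product. The matrix entries below are invented, purely to show the mechanics.

```python
NEG_INF = float("-inf")  # (max,+) "zero": no dependency along this edge

def maxplus_matvec(M, x):
    """(max,+) matrix-vector product: y[i] = max_j (M[i][j] + x[j]).

    x holds the availability times of initial tokens; M encodes the
    firing delays of one graph iteration. Repeated application traces
    the timed evolution of the dataflow graph.
    """
    return [max(m + xj for m, xj in zip(row, x)) for row in M]

# Hypothetical 2-token timing matrix for one iteration.
M = [[2, NEG_INF],
     [3, 1]]
x0 = [0, 0]                    # both tokens available at time 0
x1 = maxplus_matvec(M, x0)     # token times after one iteration
x2 = maxplus_matvec(M, x1)     # ... and after two
```

Worst-case latency analysis in this setting amounts to bounding entries of such vectors (or the spectral properties of M) over all parameter valuations.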
The usage of locating systems in sports (e.g. soccer) elevates match and training analysis to a new level. By tracking players and balls during matches or training, the performance of players can be analyzed, the training can be adapted, and new strategies can be developed. The radio-based RedFIR system equips players and the ball with miniaturized transmitters, while antennas distributed around the playing field receive the transmitted radio signals. A cluster computer processes these signals at the back end to determine exact positions based on the signals' Time Of Arrival (TOA). While such a system works well, it is neither scalable nor inexpensive due to the required computing cluster. The relatively high power consumption of the GPU-based cluster is also suboptimal. Moreover, the high-speed interconnects between the antennas and the cluster computers introduce additional costs and increase the installation effort. However, a significant portion of the computing performance is required not for the synthesis of the received data, but for calculating the individual TOA values of every receiver line. Therefore, in this paper we propose a smart sensor approach: by integrating some intelligence into the antenna (smart antenna), each antenna can correlate the received signal independently of the remaining system, and only TOA values are sent to the back end. While the idea is quite simple, the question of a well-suited computer architecture to fulfill this task inside the smart antenna is more complex. Therefore, we evaluate embedded architectures such as FPGAs, ARM cores, and a many-core CPU (Epiphany) for this approach. With these, we are able to achieve 50,000 correlations per second in each smart antenna. As a result, the back end becomes lightweight, cheaper interconnects suffice thanks to data reduction, and the system becomes more scalable, since most of the processing power is already included in the antenna.
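The per-antenna correlation step can be sketched as a sliding-lag correlation of the received sample stream against the known transmit sequence, with the peak lag giving the arrival sample. This is a toy real-valued version; actual receivers correlate complex baseband signals with sub-sample interpolation, and the sequences here are invented.

```python
def toa_by_correlation(rx, ref):
    """Return the lag at which a known reference burst best matches
    the received stream, i.e. a coarse sample-level TOA estimate."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(rx) - len(ref) + 1):
        # Dot product of the reference against the stream at this lag.
        score = sum(r * s for r, s in zip(rx[lag:], ref))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

ref = [1, -1, 1, 1, -1]           # hypothetical transmit sequence
rx = [0, 0, 0] + ref + [0, 0]     # burst arrives at sample index 3
lag = toa_by_correlation(rx, ref)
```

Only this single lag value, rather than the full sample stream, would be sent to the back end, which is what makes the cheaper interconnects possible.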
We investigate the problem of line detection in digital image processing, and in particular how state-of-the-art algorithms behave in the presence of noise and whether CPU efficiency can be improved by combining Monte Carlo Tree Search, hierarchical space decomposition, and parallel computing. The starting point of the investigation is the method introduced in 1962 by Paul Hough for detecting lines in binary images. Extended in the 1970s to the detection of spatial forms, what came to be known as the Hough Transform (HT) has been proposed, for example, in the context of track fitting in the LHC ATLAS and CMS projects. The Hough Transform turns the problem of line detection into one of finding the peak in a vote-counting process over cells that contain the possible points of candidate lines. The detection algorithm can be computationally expensive in both its processor and memory demands, and its effectiveness can be reduced in the presence of noise. Our first contribution is an evaluation of a variation of the Radon Transform as a way of improving the effectiveness of line detection in the presence of noise. We then introduce parallel algorithms for variations of the Hough Transform and the Radon Transform for line detection, as well as an algorithm for parallel Monte Carlo search applied to line detection, and discuss their algorithmic complexities. Finally, implementations on multi-GPU and multicore architectures are discussed.
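The voting process the abstract refers to can be shown in a few lines: each point votes for every (theta, rho) cell it could lie on under the normal form rho = x cos(theta) + y sin(theta), and the peak cell identifies the line. This minimal sketch omits the resolution, noise handling, and Radon/Monte-Carlo variants the paper studies.

```python
import math

def hough_line_peak(points, n_theta=180):
    """Minimal Hough voting for lines; returns the accumulator peak
    as (theta index, rounded rho)."""
    acc = {}
    for x, y in points:
        for i in range(n_theta):
            theta = math.pi * i / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            acc[(i, rho)] = acc.get((i, rho), 0) + 1
    return max(acc, key=acc.get)   # cell with the most votes

# Ten collinear points on the vertical line x = 5 (theta = 0, rho = 5).
pts = [(5, y) for y in range(10)]
peak = hough_line_peak(pts)
```

All ten points vote into the cell for theta = 0, rho = 5, so the peak recovers the line; with noisy points the peak flattens, which is the failure mode motivating the Radon-based variant.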
ISBN:
(Print) 9791092279016
The proceedings contain 53 papers. The topics discussed include: high performance multi-standard architecture for DCT computation in H.264/AVC high profile and HEVC codecs; architecture and programming model support for efficient heterogeneous computing on tightly-coupled shared-memory clusters; a neural model for hardware plasticity in artificial vision systems; system-level PMC-driven energy estimation models in RVC-CAL video codec specifications; a resource-aware nearest neighbor search algorithm for k-dimensional trees; accuracy and performance analysis of Harris corner computation on tightly-coupled processor arrays; a linear state model for PDR+WLAN positioning; SiPM based smart pixel for photon counting integrated streak camera; a coarse-grained reconfigurable wavelet denoiser exploiting the multi-dataflow composer tool; foreground object features extraction with GLCM texture descriptor in FPGA; and noise-agnostic adaptive image filtering without training references on an evolvable hardware platform.
ISBN:
(Print) 9791092279061
Optimizing connected component labeling is currently a very active research field. The most effective current algorithms, although close in their design, are based on different memory/computation trade-offs. This paper presents a review of these algorithms and a detailed benchmark on several Intel and ARM embedded processors, highlighting their respective advantages and drawbacks and how the processor architecture impacts them.
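As a baseline for the family of algorithms being benchmarked, here is the classic two-pass labeling scheme with a union-find equivalence table (4-connectivity). It is a reference-style sketch; the surveyed algorithms differ precisely in how they restructure these passes to trade memory against computation.

```python
def label_components(img):
    """Two-pass connected component labeling (4-connectivity).

    Pass 1 assigns provisional labels and records equivalences in a
    union-find forest; pass 2 resolves every pixel to its root label.
    """
    rows, cols = len(img), len(img[0])
    parent = [0]                 # label 0 = background, its own root

    def find(i):                 # root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    labels = [[0] * cols for _ in range(rows)]
    next_label = 1
    for r in range(rows):
        for c in range(cols):
            if not img[r][c]:
                continue
            up = labels[r - 1][c] if r else 0
            left = labels[r][c - 1] if c else 0
            if up and left:      # both neighbors labeled: merge classes
                a, b = find(up), find(left)
                labels[r][c] = min(a, b)
                parent[max(a, b)] = min(a, b)
            elif up or left:     # copy the single labeled neighbor
                labels[r][c] = up or left
            else:                # new provisional label
                parent.append(next_label)
                labels[r][c] = next_label
                next_label += 1
    for r in range(rows):        # pass 2: resolve equivalences
        for c in range(cols):
            labels[r][c] = find(labels[r][c])
    return labels

img = [[1, 1, 0, 1],
       [0, 1, 0, 1],
       [0, 0, 0, 1]]
labels = label_components(img)
```

On this image the two foreground blobs receive two distinct final labels.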
We present in this paper Video++, a new framework targeting image and video applications running on multi-core processors. While offering a high expressive power, we show that it generates code running up to 32 times ...
详细信息
ISBN:
(Print) 9791092279061
Chirp-sequence-based Frequency-Modulated Continuous-Wave (FMCW) radar is effective at detecting the range and velocity of a target. However, the target detection algorithm is based on a two-dimensional Fast Fourier Transform, which uses a great deal of data accumulated over several PRIs (Pulse Repetition Intervals). In particular, if multiple receive channels are employed to estimate the angular position of a target, even more computational complexity is required. In this paper, we report on how a newly developed signal processing module is implemented in the FPGA, and on its performance measured under test conditions. Moreover, we present results from an analysis of hardware resource usage and processing times.
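The two-dimensional transform in question can be sketched as a range FFT along each chirp (fast time) followed by a Doppler FFT across chirps (slow time). The sketch below uses a naive DFT in place of a real FFT and entirely synthetic data; sizes and the target's bins are assumptions for illustration.

```python
import cmath

def dft(x):
    """Naive O(n^2) DFT; a stand-in for the radix-2 FFT used in practice."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * i * k / n) for k in range(n))
            for i in range(n)]

def range_doppler_map(frames):
    """2D FFT of a chirp-sequence data matrix (rows = chirps/PRIs,
    columns = fast-time samples); returns magnitudes indexed
    [doppler bin][range bin]."""
    range_fft = [dft(chirp) for chirp in frames]        # fast-time transform
    doppler_fft = [dft(list(col)) for col in zip(*range_fft)]  # slow time
    return [[abs(v) for v in row] for row in zip(*doppler_fft)]

# Synthetic point target at range bin 2 and Doppler bin 1:
# 4 chirps of 8 complex samples each.
N, M = 8, 4
frames = [[cmath.exp(2j * cmath.pi * (2 * n / N + m / M)) for n in range(N)]
          for m in range(M)]
rd = range_doppler_map(frames)
```

The energy concentrates in a single range-Doppler cell, from which range and velocity are read off; repeating this per receive channel is what drives the extra cost of angle estimation.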
ISBN:
(Print) 9791092279061
This paper presents the Parallel Heterogeneous Architecture Technology (PHAT), a scalable design methodology for prototyping and evaluating heterogeneous arrays of software-programmable VLIW processors together with both manually designed and automatically compiled custom hardware accelerators, using a shared-memory architecture for communication. We discuss the trade-offs and break-even point for switching from bus-based to network-on-chip interconnects, the interface and protocols for connecting distributed on-chip caches and multi-bank out-of-order off-chip memories, as well as the impact of floorplanning on the quality of results for implementation on Xilinx Virtex 6 LX 760 devices. The capabilities are evaluated at the system level on the multi-FPGA Convey HC-1ex hybrid-core computer, accessing its high-performance memory system and integrating r-VEX processor cores with IP blocks for SHA and FFT computations.