FPGA-based prototyping is nowadays common practice in the functional verification of hardware components since it allows to cover a large number of test cases in a shorter time compared to HDL simulation. In addition,...
详细信息
ISBN:
(纸本)9781467322973;9781467322966
FPGA-based prototyping is nowadays common practice in the functional verification of hardware components since it allows to cover a large number of test cases in a shorter time compared to HDL simulation. In addition, an FPGA-based emulator significantly accelerates the simulation with respect to bit-true software models. This speed-up is crucial when the statistical properties of a system have to be analyzed by Monte Carlo techniques. In this paper we consider a multiple-input multiple-output (MIMO) wireless communication system and show how integrating an FPGA accelerator in the software simulation framework is key to enable the development of complex hardware components in the receiver, from algorithm all the way to chip testing. In particular, we focus on a MIMO detector implementation based on the depth-first sphere decoding algorithm. The speed-up of up to 3 orders of magnitude achieved by hardware-accelerated simulation compared to a pure software testbed enables an extensive fixed-point exploration. Furthermore, it allows a unique characterization of the system communication performance and the MIMO detector run-time characteristics, which vary for different configuration parameters and operating scenarios and hence require a thorough investigation.
Parallel computing systems are becoming widespread and grow in sophistication. Besides simulation, rapid system prototyping becomes important in designing and evaluating their architecture. We present an efficient FPG...
详细信息
ISBN:
(纸本)9781424410583
Parallel computing systems are becoming widespread and grow in sophistication. Besides simulation, rapid system prototyping becomes important in designing and evaluating their architecture. We present an efficient FPGA-based platform that we developed and use for research and experimentation on high speed interprocessor communication, network interfaces and interconnects. Our platform supports advanced communication capabilities such as Remote DMA, Remote Queues, zero-copy data delivery and flexible notification mechanisms, as well as link bundling for increased performance. We report on the platform architecture, its design cost, complexity and performance (latency and throughput). We also report our experiences from implementing benchmarking kernels and a user-level benchmark application, and show how software can take advantage of the provided features, but also expose the weaknesses of the system.
The development of complex algorithms for advanced driver assistance systems is a challenging task, due to the high innovation rate and processing demands of applications in this field. The development is usually supp...
详细信息
ISBN:
(纸本)9781479937707
The development of complex algorithms for advanced driver assistance systems is a challenging task, due to the high innovation rate and processing demands of applications in this field. The development is usually supported by a software development framework that provides an infrastructure (e.g., access to sensor data) that simulates and evaluates the algorithms. One problem, especially with computationally intensive algorithms, is the slow simulation speed. This paper presents a prototyping environment that connects a software development framework with a FPGA-based hardware platform. This allows implementing computationally intensive tasks in hardware. The proposed rapid prototyping system not only reduces the simulation time, thereby allowing the software designer to evaluate algorithmic parameters with quicker feedback, but also allows verifying and evaluating hardware modules for rapid prototyping. A case study is presented in which a traffic sign detection algorithm is implemented on a soft-core processor. By using the hardware implementation the simulation was accelerated by a factor of 65, compared to the pure software implementation.
Heterogeneous multicore systems have gained momentum, specially for embedded applications, thanks to the performance and energy consumption trade-offs provided by in-order and out-of-order cores. Micro-architectural s...
详细信息
ISBN:
(纸本)9781479937707
Heterogeneous multicore systems have gained momentum, specially for embedded applications, thanks to the performance and energy consumption trade-offs provided by in-order and out-of-order cores. Micro-architectural simulation models the behavior of pipeline structures and caches with configurable parameters. This level of abstraction is well known for being flexible enough to quickly evaluate the performance of new hardware implementations, such as future heterogeneous multicore platforms. However, currently, there is no open-source micro-architectural simulator supporting both in-order and out-of-order ARM cores. This article describes the implementation and accuracy evaluation of a micro-architectural simulator of Cortex-A cores, supporting in-order and out-of-order pipelines and based on the open-source gem5 simulator. We explain how to simulate Cortex-A8 and Cortex-A9 cores in gem5, and compare the execution time of ten benchmarks with real hardware. Both models, with average absolute errors of only 7 %, are more accurate than similar micro-architectural simulators, which show average absolute errors greater than 15 %.
Flexible constraint length channel decoders are required for software defined radios. This paper presents a novel scalable scheme for realizing flexible constraint length Viterbi decoders on a de Bruijn interconnectio...
详细信息
ISBN:
(纸本)9781424445011
Flexible constraint length channel decoders are required for software defined radios. This paper presents a novel scalable scheme for realizing flexible constraint length Viterbi decoders on a de Bruijn interconnection network. architectures for flexible decoders using the flattened butterfly and shuffle-exchange networks are also described. It is shown that these networks provide favourable substrates for realizing flexible convolutional decoders. Synthesis results for the three networks are provided and a comparison is performed. An architecture based on a 2D-mesh, which is a topology having a nominally lesser silicon area requirement, is also considered as a fourth point for comparison. It is found that of all the networks considered, the de Bruijn network offers the best tradeoff in terms of area versus throughput.
Energy efficiency is one of the most important aspects in designing embedded processors. The use of a wide SIMD processor architecture is a promising approach to build energy-efficient high performance embedded proces...
详细信息
ISBN:
(纸本)9781479901036
Energy efficiency is one of the most important aspects in designing embedded processors. The use of a wide SIMD processor architecture is a promising approach to build energy-efficient high performance embedded processors. In this paper, we propose a configurable wide SIMD architecture that utilizes explicit datapath to achieve high energy efficiency. To efficiently program the proposed architecture with a standard parallel programming language, we introduce a toolflow that can compile and map OpenCL programs onto it. The compiler in the proposed toolflow is able to analyze the static access patterns in OpenCL kernels and generate efficient mapping and code that utilizes the explicit datapath. Experimental results show that the proposed architecture is efficient. In a 128-PE processor, the proposed architecture is able to achieve over 200 times speed-up and reduce the energy consumption of register file and memory by over 90% compared to a RISC processor.
The multimedia capabilities in battery powered mobile communication devices should be provided at high energy efficiency. Consequently, the hardware is usually implemented using low-power technology and the hardware a...
详细信息
ISBN:
(纸本)3540364102
The multimedia capabilities in battery powered mobile communication devices should be provided at high energy efficiency. Consequently, the hardware is usually implemented using low-power technology and the hardware architectures are optimized for embedded computing. Software architectures, on the other hand, are not embedded system specific, but closely resemble each other for any computing device. The popular architectural principle, software layering, is responsible for much of the overheads, and explains the stagnation of active usage times of mobile devices. In this paper, we consider the observed developments against the needs of multimedia applications in mobile communication devices and quantify the overheads in reference implementations.
Multi-processors are increasingly being used in modern embeddedsystems for reasons of power and speed. These systems have to support a large number of applications and standards, in different combinations, called use...
详细信息
ISBN:
(纸本)9781424445011
Multi-processors are increasingly being used in modern embeddedsystems for reasons of power and speed. These systems have to support a large number of applications and standards, in different combinations, called use-cases. The key challenges are designing efficient systems handling all these use-cases;this requires fast exploration of software and hardware alternatives with accurate performance evaluation. In this paper, we present a system-level FPGA-based simulation methodology for performance evaluation of applications on multiprocessor platforms. We observe that for multiple applications sharing an MPSoC platform, dynamic arbitration can cause deadlock in simulation. We use conservative Parallel Discrete Event simulation (PDES) for simulation of these use-cases. We further note that conservative PDES is inefficient so we present a new PDES methodology that avoids causality errors by detecting them in advance. We call our new approach as smart conservative PDES. It is scalable in the number of use-cases and number of simulated processors and is 15% faster than conservative PDES. We further present results of a case-study of two real life applications. We used our simulation technique to do a design space exploration for optimal buffer space for JPEG and H263 decoders.
embedded multimedia and wireless applications require a model-based design approach in order to satisfy stringent quality and cost constraints. The Model-of-Computation (MoC) should appropriately capture system dynami...
详细信息
Wireless Sensor Networks (WSN) are seen as attractive solutions for various monitoring and controlling applications, a large part of which require cryptographic protection. Due to the strict cost and power consumption...
详细信息
ISBN:
(纸本)9783540736226
Wireless Sensor Networks (WSN) are seen as attractive solutions for various monitoring and controlling applications, a large part of which require cryptographic protection. Due to the strict cost and power consumption requirements, their cryptographic implementations should be compact and energy-efficient. In this paper, we survey hardware architectures proposed for Advanced Encryption Standard (AES) implementations in low-cost and low-power devices. The survey considers both dedicated hardware and specialized processor designs. According to our review, currently 8-bit dedicated hardware designs seem to be the most feasible solutions for embedded, low-power WSN nodes. Alternatively, compact special functional units can be used for extending the instruction sets of WSN node processors for efficient AES execution.
暂无评论