ISBN: 9781479937707 (print)
The development of complex algorithms for advanced driver assistance systems is a challenging task, due to the high innovation rate and processing demands of applications in this field. The development is usually supported by a software development framework that provides an infrastructure (e.g., access to sensor data) that simulates and evaluates the algorithms. One problem, especially with computationally intensive algorithms, is the slow simulation speed. This paper presents a prototyping environment that connects a software development framework with an FPGA-based hardware platform. This allows implementing computationally intensive tasks in hardware. The proposed rapid prototyping system not only reduces the simulation time, thereby allowing the software designer to evaluate algorithmic parameters with quicker feedback, but also allows verifying and evaluating hardware modules for rapid prototyping. A case study is presented in which a traffic sign detection algorithm is implemented on a soft-core processor. By using the hardware implementation, the simulation was accelerated by a factor of 65 compared to the pure software implementation.
ISBN: 9781479937707 (print)
Heterogeneous multicore systems have gained momentum, especially for embedded applications, thanks to the performance and energy consumption trade-offs provided by in-order and out-of-order cores. Micro-architectural simulation models the behavior of pipeline structures and caches with configurable parameters. This level of abstraction is well known for being flexible enough to quickly evaluate the performance of new hardware implementations, such as future heterogeneous multicore platforms. However, there is currently no open-source micro-architectural simulator supporting both in-order and out-of-order ARM cores. This article describes the implementation and accuracy evaluation of a micro-architectural simulator of Cortex-A cores, supporting in-order and out-of-order pipelines and based on the open-source gem5 simulator. We explain how to simulate Cortex-A8 and Cortex-A9 cores in gem5, and compare the execution time of ten benchmarks with real hardware. Both models, with average absolute errors of only 7 %, are more accurate than similar micro-architectural simulators, which show average absolute errors greater than 15 %.
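The accuracy figure quoted above is an average absolute error between simulated and measured execution times. As a minimal sketch of how such a figure is typically computed, assuming a per-benchmark relative error averaged over all benchmarks (the function name and the sample values in the usage note are hypothetical and not taken from the paper):

```python
def average_absolute_error_percent(simulated, measured):
    """Mean of |simulated - measured| / measured over all benchmarks, in percent.

    Both arguments are sequences of execution times in the same unit.
    This is an assumed definition; the paper may normalize differently.
    """
    if len(simulated) != len(measured) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    relative_errors = [abs(s - m) / m for s, m in zip(simulated, measured)]
    return 100.0 * sum(relative_errors) / len(relative_errors)
```

For example, with made-up times, `average_absolute_error_percent([110.0, 90.0], [100.0, 100.0])` averages two 10 % deviations. Taking the absolute value prevents over- and under-estimates from cancelling each other out.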
ISBN: 3540364102 (print)
The multimedia capabilities in battery-powered mobile communication devices should be provided at high energy efficiency. Consequently, the hardware is usually implemented using low-power technology and the hardware architectures are optimized for embedded computing. Software architectures, on the other hand, are not specific to embedded systems, but closely resemble each other for any computing device. The popular architectural principle, software layering, is responsible for much of the overhead, and explains the stagnation of active usage times of mobile devices. In this paper, we consider the observed developments against the needs of multimedia applications in mobile communication devices and quantify the overheads in reference implementations.
ISBN: 9781424445011 (print)
Multi-processors are increasingly being used in modern embedded systems for reasons of power and speed. These systems have to support a large number of applications and standards, in different combinations, called use-cases. The key challenge is designing efficient systems that handle all these use-cases; this requires fast exploration of software and hardware alternatives with accurate performance evaluation. In this paper, we present a system-level FPGA-based simulation methodology for performance evaluation of applications on multiprocessor platforms. We observe that for multiple applications sharing an MPSoC platform, dynamic arbitration can cause deadlock in simulation. We use conservative Parallel Discrete Event Simulation (PDES) for simulation of these use-cases. We further note that conservative PDES is inefficient, so we present a new PDES methodology that avoids causality errors by detecting them in advance. We call our new approach smart conservative PDES. It is scalable in the number of use-cases and number of simulated processors and is 15% faster than conservative PDES. We further present results of a case study of two real-life applications. We used our simulation technique to perform a design space exploration for optimal buffer space for JPEG and H263 decoders.
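To illustrate the baseline that the abstract compares against, the following is a minimal sketch of the textbook conservative PDES rule: a logical process handles an event only when no input channel can still deliver an earlier one. It is not the paper's smart conservative variant, and the class name, method names, and channel layout are assumptions made for illustration.

```python
from collections import deque


class LogicalProcess:
    """One simulated component with timestamp-ordered input channels (hypothetical layout)."""

    def __init__(self, input_channels):
        self.queues = {ch: deque() for ch in input_channels}
        # Channel clock: timestamp of the last message received on that channel.
        self.clocks = {ch: 0 for ch in input_channels}
        self.local_time = 0

    def receive(self, channel, timestamp, payload):
        if timestamp < self.clocks[channel]:
            raise ValueError("timestamps on a channel must be non-decreasing")
        self.queues[channel].append((timestamp, payload))
        self.clocks[channel] = timestamp

    def step(self):
        """Process the earliest pending event if it is safe; return it, or None if blocked."""
        pending = [(q[0][0], ch) for ch, q in self.queues.items() if q]
        if not pending:
            return None
        timestamp, channel = min(pending)
        # An empty channel may still deliver any timestamp >= its clock, so the
        # event is only safe if no empty channel's clock lies before it.
        for ch, q in self.queues.items():
            if not q and self.clocks[ch] < timestamp:
                return None  # blocked: processing now could cause a causality error
        self.local_time = timestamp
        return self.queues[channel].popleft()
```

A process that returns `None` simply waits for more input. If every process in a cycle waits in this way, the simulation deadlocks, which is why conservative schemes usually add a mechanism such as null messages to advance channel clocks.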
Embedded multimedia and wireless applications require a model-based design approach in order to satisfy stringent quality and cost constraints. The Model-of-Computation (MoC) should appropriately capture system dynami...
ISBN: 9781467322973; 9781467322966 (print)
This paper introduces a novel methodology to adapt the microarchitecture of a processor at run-time. The goal is to tailor the internal architecture to the requirements of an application and the data to be processed. The latter parameter is normally not known at design time. This leads to the development of more general-purpose processors, which are capable of handling the data to be processed in any case. With the novel approach, which keeps the microarchitecture of a processor flexible, the processor can start as a general-purpose device and end up with a specific parameterization, comparable with application-specific processor architectures. Furthermore, the increased degree of freedom that the approach enables for a novel quality of processors is described.
ISBN: 1424401550 (print)
Energetic-particle-induced soft errors in on-chip cache memories have become a major challenge in designing new-generation reliable microprocessors. Uniformly applying conventional protection schemes such as error correcting codes (ECC) to SRAM caches may not be practical where performance, power, and die area are highly constrained, especially for embedded systems. In this paper, we propose to analyze the lifetime behavior of the data cache to identify its temporal vulnerability. For this vulnerability analysis, we develop a new lifetime model. Based on the new lifetime model, we evaluate the effectiveness of several existing schemes in reducing the vulnerability of the data cache. Furthermore, we propose to periodically invalidate clean cache lines to reduce the probability of errors being read in by the CPU. Combined with previously proposed early writeback strategies [1], our schemes achieve a substantially low vulnerability in the data cache, which indicates the necessity of different protection schemes for data items during various phases in their lifetime.
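The effect of periodically invalidating clean lines can be illustrated with a deliberately simplified exposure model. This sketch is not the paper's lifetime model: it assumes that a clean line is vulnerable from the moment it is fetched until its last read before the next invalidation, that invalidations occur at integer multiples of a fixed period, and that a read after an invalidation refetches a fresh copy. The function and parameter names are hypothetical.

```python
def clean_line_vulnerable_time(fill_time, read_times, period=None):
    """Cycles during which a soft error in a clean line could reach the CPU.

    fill_time:  cycle at which the line was fetched into the cache.
    read_times: cycles at which the CPU reads the line (all >= fill_time).
    period:     invalidate clean lines every `period` cycles, or None to disable.
    """
    vulnerable = 0
    refresh = fill_time      # time of the most recent fetch from the next level
    last_read = fill_time    # last read served by the copy fetched at `refresh`
    for t in sorted(read_times):
        if period is not None and t // period > refresh // period:
            # An invalidation fell between the last fetch and this read:
            # close the old exposure window and refetch a fresh copy.
            vulnerable += last_read - refresh
            refresh = t
        last_read = t
    vulnerable += last_read - refresh
    return vulnerable
```

Under these assumptions, a line filled at cycle 0 and read at cycles 10 and 250 is exposed for 250 cycles without invalidation, but only for 10 cycles with a period of 100, because the read at cycle 250 is served by a freshly fetched copy. The cost, which this sketch does not model, is the extra miss caused by each invalidation.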
Software synthesis from an initial specification model becomes a critical issue in the ESL design methodology as hardware platforms are often reused and more processors are involved in the target platform. Since embed...
ISBN: 9781479901036 (print)
Multiprocessor systems-on-chip (MPSoC) are now considered first-class citizens both in the embedded systems and in the high-performance computing arenas, in the form of specialized or general-purpose accelerators. Programming models for such systems are currently a hot research topic, and as a general rule require deep programmer knowledge of the underlying hardware architecture. In this paper we present the implementation of OpenMP, one of the most intuitive and productive programming models, on the STHORM accelerator. This particular platform provides a shared-memory substrate, which OpenMP requires. An innovative feature of our design is the seamless deployment of the OpenMP model on both the host and the fabric sides, which provides the programmer with a simple but effective interface for offloading and executing OpenMP kernels on the MPSoC. The optimized runtime environment provides full OpenMP support despite its small footprint (less than 10KB for a 16-core cluster) and can sustain close-to-ideal speedups in computationally intensive applications. We detail the design issues we faced, along with their solutions, given the limited available resources.
ISBN: 9783031045806; 9783031045790 (print)
Major challenges for system-level Design Space Exploration (DSE) include (a) tremendous search-space sizes for modern many-core architectures and networked systems and (b) the preponderance of infeasible solutions in the search space from which no actual implementations can be derived. Since current DSE approaches are not equipped to handle these developments, we propose the integration of deep generative models into DSE to automatically compress large-scale search spaces, thus (I) reducing problem complexity faced by the optimizer while (II) learning a model of feasible solutions to focus the optimization on. The proposed approach is seamlessly integrated into state-of-the-art DSE flows, is complementary to existing search-space pruning techniques, and its potential to improve optimization quality by up to approximately 66% is demonstrated for a variety of DSE problems.