ISBN: 9783540736226 (print)
This paper presents a configurable base architecture that can be tailored to different applications. It offers a simple and rapid way to evaluate and prototype large Multi-Processor System-on-Chip architectures on multiple FPGAs, with support for the Globally Asynchronous Locally Synchronous scheme, and it enables early hardware/software co-verification and optimization. The architecture abstracts the underlying hardware details from the processors, so that knowledge of the exact locations of individual components is not required for communication. The implemented example architecture contains 58 IP blocks, including 35 Nios II soft processors. As a proof of concept, an MPEG-4 video encoder is run on the example architecture.
ISBN: 9781424419852 (print)
Modern networked embedded system design has to cope with multiple design objectives. One major challenge is the determination of optimal routings with respect to these objectives. Existing automatic optimization approaches carry out a two-step optimization: first, they perform a multi-objective topology optimization of the networked embedded system; then, they perform a multi-objective routing optimization for a subset of the Pareto-optimal solutions obtained in the first step. In general, this may exclude several globally optimal solutions from the optimization process. To overcome this drawback, a unified approach based on Multi-Objective Evolutionary Algorithms is presented that ensures a combined optimization of topology and routing. Since the system topology is varied within the optimization, the main contribution of this paper is a novel routing technique that always samples feasible paths using a topology-independent genetic encoding. This encoding preserves optimized routing information when the underlying topology changes. An experimental evaluation shows the effectiveness of the presented approach.
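As a reminder of what "Pareto-optimal" means in this abstract, the following is a minimal sketch of a Pareto filter for minimization objectives. It is not the paper's algorithm. The function names and the (cost, latency) sample values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b in every objective
    and strictly better in at least one (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (cost, latency) values for five candidate designs.
candidates = [(4.0, 9.0), (5.0, 5.0), (7.0, 4.0), (6.0, 6.0), (8.0, 8.0)]
front = pareto_front(candidates)
print(front)
```

In a two-step flow, only designs that survive such a filter on the topology objectives would be passed on to routing optimization, which is why a topology that looks dominated before routing is never revisited.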
ISBN: 9781479937707 (print)
The use of driver models within advanced driver assistance systems (ADAS) allows anticipating the driving behavior of the vehicle and of all traffic participants in the close vicinity. This valuable information could considerably improve the performance as well as the acceptance of ADAS. Consequently, complex driver models need to be integrated in embedded systems. This work first summarizes important driver models described in the literature. Based on this, a suitable approach to implementing a driver model on an embedded system is derived. The model used focuses on the longitudinal driving and lane change behavior of drivers. The system architecture is derived and optimized for real-time execution. The driver model is analyzed in detailed simulations, and test drives in a small-scale naturalistic driving study are used to validate it. This paper defines a standard driver model to be implemented as part of the DESERVE platform within the Artemis project "DESERVE". The dSPACE MicroAutoBox II is used as the embedded automotive hardware. The paper also summarizes approaches and examples for using the generated prediction data in ADAS such as ACC.
ISBN: 1424401550 (print)
Embedded systems in Field-Programmable Gate Arrays can be customised and adaptive if assembled from modular components at run time. This paper describes techniques for modelling inter-module channel behaviour based on statistical time division multiplexing. Where modules communicate over shared media, the proposed techniques enable systematic development of on-chip communication infrastructure to support run-time instantiation of components. The techniques also allow system designers to guarantee that the logical communication requirements between the adjunct modules can be satisfied by the infrastructure. An in-depth analysis is presented and then verified with cycle-accurate simulations for the Sonic-on-chip reconfigurable platform for real-time video applications.
ISBN: 9781424410583 (print)
This paper describes the design of a programmable coprocessor for Public Key Cryptography (PKC) on an FPGA. The implementation provides a very broad range of functions together with countermeasures against Side-Channel Analysis (SCA) attacks. The functions are implemented in a hierarchical manner, where all levels are accessible to the user. This makes the coprocessor very flexible and particularly suitable for embedded environments where the border between hardware and software needs to be decided depending on the application. Especially for RSA, the resulting implementation on an XC3S5000 FPGA, from Xilinx's low-cost Spartan series, shows performance figures comparable to the state of the art in PKC coprocessors.
ISBN: 9781467373111 (print)
The added encoding efficiency and visual quality offered by the latest HEVC standard are mostly attained at the cost of a significant increase in computational complexity at both the encoder and the decoder. This added complexity greatly compromises the implementation of the standard in computationally and energy-constrained devices, including embedded systems and mobile, battery-supplied devices. To circumvent this limitation, this paper proposes exploiting the embedded GPU devices that already equip many state-of-the-art SoCs to accelerate the HEVC in-loop filters (i.e., the deblocking filter and sample adaptive offset). The presented approaches comprehensively exploit both fine-grained and coarse-grained parallelization opportunities of these filters on an NVIDIA Tegra GPU. According to the experimental evaluation, the proposed approach proved to be an effective strategy for satisfying the real-time requirements of the HEVC decoder, being able to filter each Ultra HD 4K intra frame in less than 20 ms (about 50 fps).
ISBN: 9781479937707 (print)
The constantly increasing computational power of embedded systems is based on the integration of a large number of cores on a single chip. In such complex platforms, synchronizing accesses to shared memory data is becoming a major issue, since it affects the performance of the whole system. This problem, which is currently a challenge in embedded systems, has been studied in the High Performance Computing domain, where several message passing algorithms have been designed to efficiently avoid the limitations of locking. In this work, inspired by message passing synchronization algorithms from the High Performance Computing domain, we design and evaluate a set of synchronization algorithms for multi-core embedded platforms. We compare them with the corresponding lock-based implementations and show that message passing synchronization algorithms can be used efficiently in multi-core embedded systems. By using message passing synchronization instead of lock-based synchronization, we reduced the execution time of our benchmark by up to 29.6%.
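The structural difference between the two synchronization styles can be sketched as follows. This is an illustrative toy, not one of the paper's algorithms: the function names are hypothetical, and Python's `queue.Queue` is itself protected by an internal lock, so the sketch shows only the ownership pattern (a single server thread owns the shared data) and says nothing about performance.

```python
import queue
import threading


def run_lock_based(n_workers: int, n_increments: int) -> int:
    """Every worker updates the shared counter directly, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(n_increments):
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


def run_message_passing(n_workers: int, n_increments: int) -> int:
    """Only the server thread touches the counter; clients send it requests."""
    inbox: queue.Queue = queue.Queue()
    result = []

    def server() -> None:
        counter = 0
        while True:
            msg = inbox.get()
            if msg is None:  # sentinel: all clients have finished
                break
            counter += msg
        result.append(counter)

    def client() -> None:
        for _ in range(n_increments):
            inbox.put(1)

    server_thread = threading.Thread(target=server)
    server_thread.start()
    clients = [threading.Thread(target=client) for _ in range(n_workers)]
    for t in clients:
        t.start()
    for t in clients:
        t.join()
    inbox.put(None)  # sent only after every client request is enqueued
    server_thread.join()
    return result[0]


print(run_lock_based(4, 1000), run_message_passing(4, 1000))
```

Both variants compute the same total. In the message passing variant the counter is never shared, so no lock is needed around the update itself.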
ISBN: 9781467322973; 9781467322966 (print)
With the increasing complexity of digital systems, which are becoming more and more parallel, a better abstraction for describing such systems has become a necessity. This paper shows how, by using the powerful mechanism of transactions as a concurrency model and by taking advantage of .NET introspection and attribute programming capabilities, we were able to develop a system-level modeling and parallel simulation environment. We kept the same concepts for describing the architecture of high-level models, such as modules and communication channels. However, unlike in SystemC, the behaviour is no longer described as processes and events but as transactions. We implemented scheduling algorithms that enable transactional models to be simulated in parallel on a multicore machine. These algorithms take into account the dependencies between transactions and the number of cores of the simulation machine. We studied two synchronisation strategies: one using locking and the other using partitioning. An experiment on a WiFi 802.11a transmitter achieved a speedup of about 1.9 using two threads. With 8 threads, although the workload of individual transactions was not significant, we could reach a speedup of 5.1. When the workload is significant, the speedup can reach 6.3.
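The reported speedups can be turned into parallel efficiency (speedup divided by thread count) with a short calculation. This is a sketch for reading the numbers, not part of the paper. The function name is hypothetical, and the 6.3 figure is assumed here to refer to 8 threads, which the abstract implies but does not state explicitly.

```python
def parallel_efficiency(speedup: float, threads: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / threads


# (label, speedup, threads); the thread count for 6.3 is an assumption.
reported = [
    ("2 threads", 1.9, 2),
    ("8 threads, light transactions", 5.1, 8),
    ("8 threads, heavy transactions (assumed)", 6.3, 8),
]

for label, speedup, threads in reported:
    print(f"{label}: efficiency = {parallel_efficiency(speedup, threads):.2%}")
```

Under that assumption, efficiency drops from 95% at two threads to roughly 64% at eight threads with light transactions, and recovers to roughly 79% when each transaction carries more work.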
ISBN: 9781538634370 (print)
Transport Triggered Architecture (TTA) processors allow unique low-level compiler optimizations such as software bypassing and operand sharing. Previously, these optimizations have mostly been performed inside single basic blocks, leaving much of their potential unused. In this work, software bypassing and operand sharing are integrated with loop scheduling, allowing optimizations across loop iteration boundaries. This further reduces register file accesses and immediate value transfers in tight loops considerably, in some cases even eliminating all register file accesses from the loop body. In the 12 small loops benchmarked, on average 63% of register file reads and 77% of register file writes could be eliminated compared to traditional VLIW-style processors. Compared to a compiler that performs these optimizations only inside a basic block, on average 58% of register file reads and 28% of register file writes could be eliminated. The additional register access reductions allow both direct energy savings from fewer register accesses and indirect energy savings by allowing the use of simpler register files with fewer read and write ports and a simpler interconnect network with fewer transport buses.
This paper introduces a Y-chart methodology for performance estimation based on high-level models for both application and architecture. As embedded devices are more and more complex, the choice of the best suited ar...