ISBN: (print) 9781479901036
In this paper we evaluate the promise held by low-power GPUs for non-graphics workloads that arise in embedded systems. To this end, we map and implement five benchmarks, drawn from very different application domains, on an embedded GPU. Our results show that, apart from accelerated performance, embedded GPUs are also promising because of their energy efficiency, which is an important design goal for battery-driven mobile devices. We show that adopting the same optimization strategies as those used for programming high-end GPUs might lead to worse performance on embedded GPUs. This is due to restricted features of embedded GPUs, such as limited or no user-defined memory, a small instruction set, and a limited number of registers, among others. We propose techniques to overcome such challenges, e.g., by distributing the workload between GPUs and multi-core CPUs, in the spirit of heterogeneous computing.
ISBN: (print) 9781467322973; 9781467322966
This paper introduces a methodology for prototyping forward error correction (FEC) architectures, oriented to system verification and characterization. A complete design flow is described, which satisfies the requirements for error-free hardware design and acceleration of FEC simulations. FPGA devices give the designer the ability to observe rare events, owing to the tremendous speed-up of FEC operations. A Matlab-based system assists in investigating the impact of very rare decoding failure events on FEC system performance and in finding solutions aimed at parameter optimization and BER performance improvement of LDPC codes in the error floor region. Furthermore, the development of an embedded system, which offers remote access to the system under test and automation of the verification process, is explored. The prototyping approach presented here exploits the high processing speed of FPGA-based emulators and the observability and usability of software-based models.
ISBN: (print) 9783031150746; 9783031150739
Attacks on embedded devices using the electromagnetic (EM) side channel have proliferated. Predicting software vulnerability to such attacks requires an ability to simulate EM fields during software development rather than relying on expensive lab-based measurements. We propose a modeling approach capable of synthesizing instruction-level EM traces for arbitrary software, using a one-time pre-characterization of a processor. Reducing the cost of dictionary construction is a major contribution of this paper. Results on a set of benchmarks show that synthesized traces are accurate in estimating EM emanations with less than 5% mean absolute percentage error (MAPE) compared to measurements. Furthermore, synthesized traces predict control flow leakage with an accuracy of 87% or more based on the side-channel vulnerability factor (SVF) metric.
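For readers unfamiliar with the accuracy metric quoted above, the following is a minimal sketch of the standard MAPE definition. The function name `mape` and the representation of traces as equal-length lists of samples are assumptions made for illustration; the abstract does not state how the authors align or normalize their traces.

```python
def mape(measured, synthesized):
    """Mean absolute percentage error of synthesized samples against measurements."""
    if len(measured) != len(synthesized) or not measured:
        raise ValueError("traces must be non-empty and of equal length")
    # Each term is the absolute error relative to the measured sample.
    # Assumption: no measured sample is exactly zero.
    errors = [abs((m - s) / m) for m, s in zip(measured, synthesized)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical sample values, not data from the paper.
print(mape([10.0, 20.0], [9.0, 22.0]))
```

Under this definition, a result below 5 would correspond to the "less than 5% MAPE" figure reported in the abstract.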
ISBN: (print) 9781509030767
A hybrid approach for mapping applications represented as Directed Acyclic Graphs (DAGs) is introduced in this work. It combines the Benders decomposition principle, which integrates Integer Linear Programming (ILP) and Constraint Programming (CP) methods, with a pure ILP model to find optimal solutions. The cuts generated during the iterative Benders solution process are later exploited by the ILP solver to prune the remaining search space. The proposed model succeeds in providing the optimal solution in cases where either method alone fails to do so, while also reducing the total solution time.
In this article we implement a stochastic modeling technique for simulating the communication between processors and the arbitration among buses in an embedded SoC. The stochastic models, implemented with queues, are used to estimate power consumption and delays through simulation of different arbitration policies, and to estimate average-case or worst-case scenarios that could occur with different architectures and arbitration policies. This idea could then be extended to writing probabilistic test benches that analyze the performance of different architectures, and to devising and testing arbitration policies that attempt to optimize power consumption and buffer lengths under constraints on the average delay. (c) 2006 Elsevier B.V. All rights reserved.
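The kind of queue-based model described above can be sketched as a small discrete-time simulation. This is an illustrative toy, not the authors' model: the Bernoulli arrival process, the single bus serving one request per cycle, the two policies, and all names (`simulate_bus`, `arrival_probs`, `policy`) are assumptions, and power consumption is not modeled.

```python
import random
from collections import deque


def simulate_bus(arrival_probs, policy, cycles=10_000, seed=0):
    """Simulate one shared bus that serves at most one request per cycle.

    arrival_probs[i] is the assumed per-cycle probability that processor i
    issues a request. policy is "fixed" (lowest index wins) or "round_robin".
    Returns (average wait in cycles, longest queue length observed).
    """
    if policy not in ("fixed", "round_robin"):
        raise ValueError("unknown policy: " + policy)
    rng = random.Random(seed)
    n = len(arrival_probs)
    queues = [deque() for _ in range(n)]
    total_wait = 0
    served = 0
    max_len = 0
    next_rr = 0  # processor that gets first pick under round robin
    for t in range(cycles):
        # Arrivals: each processor enqueues the cycle at which it asked.
        for i, p in enumerate(arrival_probs):
            if rng.random() < p:
                queues[i].append(t)
        max_len = max(max_len, max(len(q) for q in queues))
        # Arbitration: pick the first non-empty queue in policy order.
        if policy == "fixed":
            order = range(n)
        else:
            order = [(next_rr + k) % n for k in range(n)]
        for i in order:
            if queues[i]:
                total_wait += t - queues[i].popleft()
                served += 1
                next_rr = (i + 1) % n
                break
    avg_wait = total_wait / served if served else 0.0
    return avg_wait, max_len


for policy in ("fixed", "round_robin"):
    print(policy, simulate_bus([0.3, 0.3, 0.3], policy))
```

Running both policies on the same seed gives a rough comparison of average delay and the buffer depth each policy would need, which mirrors the delay and buffer-length trade-off mentioned in the abstract.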
ISBN: (print) 9781467322973; 9781467322966
Due to the energy efficiency requirements of modern embedded systems, chip vendors are inclined towards multicore architectures with different types of processing engines and non-uniform interconnect fabrics. At the same time, multiple applications are intended to run concurrently on devices with such heterogeneous architectures. This rapid growth in the complexity of the hardware and its use cases imposes new challenges on software development tools. To overcome this complexity, approaches based on models of computation are becoming increasingly promising. Synchronous Data Flow (SDF) is a popular specification formalism for streaming applications, which are inherently concurrent. However, the parallelism expressed in the original representation is often not sufficient to fully exploit the potential of multicore platforms. In this paper we present a holistic methodology for improving the throughput of streaming applications while mapping them onto heterogeneous architectures. The approach uses transformations that adapt the parallelism in SDF according to the available platform resources. We use a genetic algorithm to explore SDF instances with the objective of maximizing throughput on a target platform. Our model supports architecture heterogeneity and multi-application scenarios. The experiments indicate that our approach outperforms other techniques for exploiting parallelism on a single application in most of the test cases and enables the optimization of concurrent applications.
ISBN: (print) 9783540736226
The computational demand of signal processing algorithms is rising continuously. Heterogeneous embedded multiprocessor systems-on-chip (MP-SoCs) are one solution to tackle this demand. But to take advantage of the benefits of these systems, new strategies are required for mapping applications to such a system and for evaluating the system's performance at a very early design stage. We present a static, analytical, bottom-up methodology for temporal and spatial mapping of applications to MP-SoCs based on packing. Furthermore, we demonstrate how the result can be used for performance evaluation and system improvement without the need for simulations.
ISBN: (print) 9783031045806; 9783031045790
In this paper, we present a new approach for mapping LLVM IR to binary machine code, which overcomes the limitations that compiler optimizations currently impose on host-based simulation of performance-critical embedded software. Our novel, fully automated mapping approach copes even with aggressive compiler optimizations, without requiring any modification to the compiler or any expert supervision. Experimental results show that accurate mappings are produced even when compiling with the highest level of optimization (average error below 2%). The proposed simulation methodology provides a speedup of at least 26 compared to the widely used gem5 simulator.
ISBN: (print) 9783540736226
This paper presents a configurable base architecture that can be tailored to different applications. It provides a simple and rapid way to evaluate and prototype large Multi-Processor System-on-Chip architectures on multiple FPGAs, with support for the Globally Asynchronous Locally Synchronous scheme. It allows early hardware/software co-verification and optimization. The architecture abstracts the underlying hardware details from the processors, so that knowledge of the exact locations of individual components is not required for communication. The implemented example architecture contains 58 IP blocks, including 35 Nios II soft processors. As a proof of concept, an MPEG-4 video encoder is run on the example architecture.
ISBN: (print) 9781424419852
Modern networked embedded system design has to cope with multiple design objectives. One major challenge is the determination of optimal routings with respect to these objectives. Existing automatic optimization approaches carry out a two-step optimization: first, they perform a multi-objective topology optimization of the networked embedded system; then, they perform a multi-objective routing optimization for a subset of the Pareto-optimal solutions obtained from the first step. In general, this may exclude several globally optimal solutions from the optimization process. To overcome this drawback, a unified approach based on Multi-Objective Evolutionary Algorithms is presented that ensures a combined optimization of topology and routing. Since the system topology is varied within the optimization, the main contribution of this paper is a novel routing technique that always samples feasible paths using a topology-independent genetic encoding. This encoding preserves optimized routing information when the underlying topology changes. An experimental evaluation shows the effectiveness of the presented approach.
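As background for the term "Pareto-optimal" used above, here is a minimal sketch of non-dominated filtering. It assumes every objective is minimized and that each candidate design is summarized as a tuple of objective values; the function name `pareto_front` and the sample values are hypothetical, and the code does not reproduce the evolutionary algorithm or the genetic encoding from the paper.

```python
def pareto_front(points):
    """Return the non-dominated points, assuming all objectives are minimized."""

    def dominates(a, b):
        # a dominates b if it is no worse in every objective
        # and strictly better in at least one.
        no_worse = all(x <= y for x, y in zip(a, b))
        better = any(x < y for x, y in zip(a, b))
        return no_worse and better

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (cost, latency) pairs for four candidate designs.
candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
print(pareto_front(candidates))
```

In a two-step flow, a filter like this is applied to topologies before any routing is chosen, so a topology discarded at that point cannot reappear later even if a good routing would have made it competitive. That is the drawback the combined optimization is meant to avoid.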