ISBN: (print) 9781424445011
The power consumption of battery-powered and energy-scavenging devices has become a major design metric for embedded systems. Increasingly complex software applications, together with rising demands on operating time under restricted power budgets, make power-aware system design indispensable. In this paper we present an emulation-based power profiling approach that allows real-time power analysis of embedded systems. Power-saving potential as well as power-critical events can be identified in much less time than with power simulations. Hence, the designer can take countermeasures in early design stages, which enhances development efficiency and decreases time-to-market. Accuracies achieved for a deep sub-micron smart-card controller are greater than 90% compared to gate-level simulations.
ISBN: (print) 9781538634370
Spark is one of the most widely used frameworks for data analytics, offering fast development of applications such as machine learning and graph computations in distributed systems. In this paper, we present SPynq, a framework for the efficient utilization of hardware accelerators over the Spark framework on heterogeneous MPSoC FPGAs, such as Zynq. Spark has been mapped to the Pynq platform, and the proposed framework allows seamless utilization of the programmable logic for the hardware acceleration of computationally intensive Spark kernels. We have also developed the required libraries in Spark that hide the accelerator's details, minimizing the design effort needed to utilize the accelerators. A cluster of 4 nodes (workers) based on the all-programmable MPSoCs has been implemented, and the proposed platform is evaluated in a typical machine learning application based on logistic regression. The logistic regression kernel has been developed as an accelerator and incorporated into Spark. The developed system is compared to a high-performance Xeon cluster of the kind typically used in cloud computing. The performance evaluation shows that the heterogeneous accelerator-based MPSoC can achieve up to 2.3x system speedup compared with a Xeon system (with 90% accuracy) and 20x better energy efficiency. For embedded applications, the proposed system can achieve up to 40x speedup compared to the software-only implementation on low-power embedded processors and 30x lower energy consumption.
ISBN: (print) 9781538634370
Virtual prototyping is an alternative way to simulate a real product without the physical object. The product is created in a simulator before a physical prototype is made. In this research, a vehicle controller is the target system for prototyping, and a vehicle navigation system is the target program to execute on it. One of the critical factors in a vehicle controller is response time. Thus, this paper aims to create a virtual prototype of a vehicle controller with real-time response. Hardware accelerators are used to improve system performance. They are implemented on two platforms: a GPU and programmable logic (PL). In addition, overclocking is applied to increase the operating speed of the GPU hardware accelerator. The improvement is evaluated in terms of execution time per frame, which is the metric for real-time response capability. To achieve real-time response, the execution time per frame must be below 33.33 ms. The GPU-based version achieves real-time response by running at 14.43 ms per frame. The PL-based version runs at 16.332 ms per frame, which also achieves real-time response.
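The real-time criterion in this abstract can be checked with a few lines of arithmetic. The sketch below assumes that the 33.33 ms limit corresponds to a 30 frames-per-second target (1000 ms / 30); the abstract states only the limit, so the frame rate and the function names are illustrative assumptions.

```python
# Frame-time budget check for a real-time target.
# ASSUMPTION: the 33.33 ms limit corresponds to 30 frames per second;
# the abstract gives only the limit, not the frame rate.

def frame_budget_ms(fps: float) -> float:
    """Return the time available per frame, in milliseconds."""
    return 1000.0 / fps


def meets_real_time(frame_time_ms: float, fps: float = 30.0) -> bool:
    """True if one frame is processed within the per-frame budget."""
    return frame_time_ms < frame_budget_ms(fps)


# Per-frame execution times reported in the abstract.
reported_ms = {"GPU": 14.43, "PL": 16.332}

for name, t in reported_ms.items():
    print(f"{name}: {t} ms per frame, within budget: {meets_real_time(t)}")
```

Under that assumption, both reported times leave more than half of the per-frame budget unused.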
ISBN: (print) 9781424410583
Dynamically reconfigurable systems demand complicated run-time management. Due to resource constraints and reconfiguration latencies, efficient reconfiguration strategies that can reduce the overhead cost of dynamic reconfiguration need to be studied. In this paper, we i) propose a reconfigurable task model which extends the classical real-time task model to support the additional states and latencies needed to capture dynamically reconfigurable behavior, ii) propose a coprocessor-coupled reconfigurable architecture which has hardware run-time support for task execution, task reallocation and resource management, and iii) present a SystemC based framework to model and simulate coprocessor-coupled reconfigurable systems. We illustrate how COSMOS may be used to capture the dynamic behavior of such systems and emphasize the need for capturing the system aspects of such systems in order to deal with future design challenges of dynamically reconfigurable systems.
ISBN: (print) 9781509030767
Deep neural network algorithms show very high performance; however, their increased arithmetic and memory access requirements hinder their adoption in embedded systems. This paper explores a programmable neural network processing architecture that can efficiently execute feed-forward, recurrent, and convolutional deep neural networks. The neural network algorithms are transformed into matrix-vector multiplication operations, which are then executed using a very wide SIMD (Single Instruction Multiple Data) functional unit. In particular, functional and data-level parallelism are compared in this architecture exploration, and auxiliary hardware support for data rearrangement is added. The simulation results show that the architecture with a 128-wide SIMD functional unit can execute deep neural network algorithms for voice command, gesture, and handwritten digit recognition in real time.
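To illustrate the idea of reducing a network layer to matrix-vector multiplication on a wide SIMD unit, here is a minimal pure-Python sketch. It is not the architecture from the paper: the function name, the choice to spread each row's dot product across the lanes, and the omission of an activation function are all assumptions made for illustration. Only the lane count of 128 comes from the abstract.

```python
SIMD_WIDTH = 128  # lane count reported in the abstract


def dense_layer(weights, bias, x, simd_width=SIMD_WIDTH):
    """Compute y = W x + b, one chunk of `simd_width` elements at a time.

    Each inner chunk stands in for one wide multiply followed by a
    reduction. How the real hardware assigns work to lanes is not
    described in the abstract; this mapping is hypothetical.
    """
    out = []
    for row, b in zip(weights, bias):
        acc = b
        for start in range(0, len(x), simd_width):
            lanes = zip(row[start:start + simd_width],
                        x[start:start + simd_width])
            acc += sum(w * v for w, v in lanes)
        out.append(acc)
    return out


# Tiny usage example: a 2x2 weight matrix applied to a 2-element input.
y = dense_layer([[1, 2], [3, 4]], [0, 1], [1, 1])
print(y)
```

The result does not depend on the chunk size; `simd_width` only changes how many multiply-accumulate steps are grouped together.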
ISBN: (print) 9783030275624; 9783030275617
Dynamic management of modern Multi-Processor Systems on Chip (MPSoCs) has become mandatory for optimization purposes. Evaluating these managers early in the design process is essential to guarantee a reduced design cycle. However, most existing system-level simulation-based frameworks consider static application mapping and do not account for the effects of run-time management. In this work, we present a modeling and simulation approach that allows run-time management strategies to be integrated into MPSoC system simulation. We have integrated the proposed approach into an industrial modeling and simulation framework. A case study with seven applications running on a heterogeneous multicore platform is considered, and different management strategies are evaluated according to latency and power consumption criteria.
The size of the program code has become a critical design constraint in embedded systems, especially in handheld, battery-operated devices. Large program codes require large memories, which increase the size and cost of the chip. In addition, power consumption increases due to higher memory I/O bandwidth. Program compression is one of the most often used methods to reduce the size of the program code. In this paper, two compression approaches, dictionary-based compression and instruction template-based compression, were evaluated on a customizable processor architecture with parallel resources. The effects on area and power consumption were measured. Dictionary-based compression reduced the area at best by 77% and power consumption by 73%. Instruction template-based compression resulted in an increase in both area and power consumption and hence turned out to be impractical. (C) 2007 Elsevier B.V. All rights reserved.
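The general principle behind dictionary-based program compression can be sketched as follows. This is a generic illustration, not the scheme evaluated in the paper: the function names, the frequency-based dictionary selection, and the tagged-tuple encoding are assumptions.

```python
from collections import Counter


def build_dictionary(program, max_entries):
    """Pick the most frequent repeated instruction words as dictionary entries."""
    counts = Counter(program)
    return [word for word, n in counts.most_common(max_entries) if n > 1]


def compress(program, dictionary):
    """Replace dictionary words with short indices; keep the rest as raw words."""
    index = {word: i for i, word in enumerate(dictionary)}
    return [("idx", index[w]) if w in index else ("raw", w) for w in program]


def decompress(stream, dictionary):
    """Expand indices back into full instruction words."""
    return [dictionary[v] if kind == "idx" else v for kind, v in stream]


# Hypothetical instruction words, for illustration only.
program = [0x10, 0x20, 0x10, 0x30, 0x10, 0x20]
dictionary = build_dictionary(program, max_entries=2)
stream = compress(program, dictionary)
print(decompress(stream, dictionary) == program)
```

The saving comes from storing each frequent word once in the dictionary and referring to it by an index that needs fewer bits than the word itself; the decompression lookup is what must be added in hardware.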
ISBN: (print) 3540364102
Digital information technology has revolutionized the world within less than four decades. It has taken the step from mainframe computers, mainly operated as hosts in computing centres, to desktops and laptops, connected by networks and found on nearly all office desks and tables today. Computers have become everyday tools deeply integrated into all kinds of activities in our lives.
ISBN: (print) 9781424445011
The programming complexity of increasingly parallel processors calls for new tools to assist programmers in utilising the parallel hardware resources. In this paper we present a set of models that we have developed to form part of a tool which is intended for iteratively tuning the mapping of dataflow graphs onto manycores. One of the models is used for capturing the essentials of manycores that are identified as suitable for signal processing and which we use as target architectures. Another model is the intermediate representation in the form of a timed configuration graph, describing the mapping of a dataflow graph onto a machine model. Moreover, this IR can be used for performance evaluation using abstract interpretation. We demonstrate how the models can be configured and applied in order to map applications on the Raw processor. Furthermore, we report promising results on the accuracy of performance predictions produced by our tool. It is also demonstrated that the tool can be used to rank different mappings with respect to optimisation on throughput and end-to-end latency.
ISBN: (print) 1424401550
The recent spectacular progress in modern microelectronics created a big stimulus towards the development of mobile, autonomous, embedded and re-configurable systems, but also resulted in many difficult-to-solve issues, such as the power and energy crisis or increased leakage power, that are especially serious for this sort of system. What prevents (re)configurable heterogeneous embedded systems from becoming one of the main practically used paradigms is mainly the inadequate support from development methodologies and EDA tools for efficiently mapping applications and producing power-, energy-, and speed-optimized systems. As part of our research, which aims at developing effective methods and EDA tools for heterogeneous (re-)configurable embedded system synthesis, we performed a comparative analysis of several representative commercial and academic synthesis methods and tools for FPGA-targeted controller synthesis. In this paper, automatic hardware synthesis for heterogeneous embedded systems is considered, with a focus on efficient multi-objective controller synthesis. In particular, some of the results and conclusions from our analysis, and effective solutions to some of the problems observed, are discussed.