ISBN (digital): 9783030609399
ISBN (print): 9783030609399; 9783030609382
Python has become the de facto programming language in machine learning and scientific computing, but high-performance implementations are challenging to create, especially for embedded systems with limited resources. We address the challenge of compiling and optimizing Python source code for a low-level target by introducing Rust as an intermediate source code step. We show that pre-existing Python implementations that depend on optimized libraries, such as NumPy, can be transpiled to Rust semi-automatically, with potential for further automation. We use two representative test cases: Black-Scholes for financial options pricing and robot trajectory optimization. The results show up to 12x speedup and 1.5x less memory use on a PC, and the same performance but 4x less memory use on an ARM processor on a PYNQ SoC FPGA. We also present a comprehensive list of factors for the process, to show the potential for fully automated transpilation. Our findings are generally applicable and can improve the performance of many Python applications while keeping their easy programmability.
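For readers unfamiliar with the Black-Scholes test case named in this abstract, the following is a minimal sketch of the closed-form European option prices. It is not the authors' benchmark code (which, per the abstract, depends on NumPy); it uses only the Python standard library, and all function and parameter names are illustrative assumptions.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _d1_d2(spot: float, strike: float, rate: float, vol: float, maturity: float):
    """Shared d1/d2 terms of the Black-Scholes formula."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * maturity) / (vol * sqrt_t)
    return d1, d1 - vol * sqrt_t


def black_scholes_call(spot, strike, rate, vol, maturity):
    """Price of a European call (no dividends)."""
    d1, d2 = _d1_d2(spot, strike, rate, vol, maturity)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def black_scholes_put(spot, strike, rate, vol, maturity):
    """Price of a European put (no dividends)."""
    d1, d2 = _d1_d2(spot, strike, rate, vol, maturity)
    return strike * math.exp(-rate * maturity) * norm_cdf(-d2) - spot * norm_cdf(-d1)


if __name__ == "__main__":
    # Illustrative inputs only: spot 100, strike 100, 5% rate, 20% volatility, 1 year.
    print(black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0))
    print(black_scholes_put(100.0, 100.0, 0.05, 0.2, 1.0))
```

A kernel of this shape is dominated by elementwise floating-point math, which is presumably why it makes a useful test case for comparing an interpreted implementation with a compiled one.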
ISBN (print): 9783030275624; 9783030275617
Current integrated circuits exhibit an impressive and increasing power density. In this scenario, thermal modelling plays a key role in the design of next-generation cooling and thermal management solutions. However, extending existing thermal models, or designing new ones to account for new cooling solutions, requires parameter identification as well as a validation phase to ensure correctness of the results. In this paper, we propose a flexible solution to the validation issue, in the form of a hardware platform based on a Thermal Test Chip (TTC). The proposed platform allows testing a heat dissipation solution under realistic conditions, including fast spatial and temporal power gradients as well as hot spots, while collecting a temperature map of the active silicon layer. The combined power/temperature map is the key input to validate a thermal model, in both the steady-state and transient cases. This paper presents the current development of the platform, and provides a first validation dataset for the case of a commercial heat sink.
ISBN (print): 9783030275624; 9783030275617
Markov Decision Processes (MDPs) provide a powerful decision-making framework, which is increasingly being used in the design of embedded computing systems (ECSs). This paper presents a detailed accounting of the use of MDPs in this context across research groups, including reference implementations, common datasets, file formats and platforms. Inspired by recent results showing the promising outlook of using embedded GPUs to solve MDPs on ECSs, we detail the many challenges that designers currently face and present GEMBench (the Gpu accelerated embedded Mdp testBench) in order to facilitate experimental research in this area. GEMBench is targeted to a specific embedded GPU platform, the NVIDIA Jetson platform, and is designed for future retargetability to other platforms. GEMBench is a novel open-source software package that is intended to run on the target platform. The package contains libraries of MDP solvers, parsers, datasets and reference solutions, which provide a comprehensive infrastructure for understanding trade-offs among existing embedded MDP techniques, and experimenting with novel techniques.
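To make "solving an MDP" concrete, here is a minimal sketch of value iteration, one standard MDP solution method. It is not taken from GEMBench and does not use its API or file formats; the data layout (`transitions[s][a]` as a list of `(probability, next_state)` pairs, `rewards[s][a]` as a float) and the toy two-state problem are assumptions made purely for illustration.

```python
def q_value(transitions, rewards, values, gamma, state, action):
    """Expected return of taking `action` in `state`, then following `values`."""
    expected_next = sum(p * values[nxt] for p, nxt in transitions[state][action])
    return rewards[state][action] + gamma * expected_next


def value_iteration(transitions, rewards, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Return (state values, greedy policy) for a finite MDP."""
    n_states = len(transitions)
    values = [0.0] * n_states
    for _ in range(max_iters):
        # Bellman optimality backup for every state.
        new_values = [
            max(
                q_value(transitions, rewards, values, gamma, s, a)
                for a in range(len(transitions[s]))
            )
            for s in range(n_states)
        ]
        delta = max(abs(new - old) for new, old in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = [
        max(
            range(len(transitions[s])),
            key=lambda a, s=s: q_value(transitions, rewards, values, gamma, s, a),
        )
        for s in range(n_states)
    ]
    return values, policy


if __name__ == "__main__":
    # Hypothetical toy MDP: state 0 can stay (reward 0) or move to state 1
    # (reward 1); state 1 can only stay (reward 2).
    toy_transitions = [[[(1.0, 0)], [(1.0, 1)]], [[(1.0, 1)]]]
    toy_rewards = [[0.0, 1.0], [2.0]]
    print(value_iteration(toy_transitions, toy_rewards, gamma=0.5))
```

Each sweep applies the same backup independently to every state, which is the kind of data parallelism that makes such solvers candidates for GPU acceleration.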
ISBN (print): 9781509030767
A new instruction scheduling algorithm for Transport Triggered Architecture (TTA) is introduced. The proposed algorithm is based on operation-based two-level list scheduling. It aggressively bypasses data moves before scheduling them, and resolves deadlocks by backtracking and bypassing less aggressively those moves that cause deadlocks. Compared to two earlier list schedulers for TTA processors, the proposed scheduler creates code that is on average 2.0% and 2.2% faster, and in the best case 15.2% and 16.3% faster. It also reduces register file reads by 9.7% and 6.9% on average (31.0% and 19.0% in the best cases) and register file writes by 18.0% and 18.9% on average (48.1% and 36.8% in the best cases). The scheduling time with the proposed scheduler is short enough for the algorithm to be used in design space exploration, unlike some instruction schedulers based on mathematical models. The scheduler also introduces a framework that makes it easy to extend with new optimizations.
Traditional design techniques for embedded systems apply transformations on the source code to optimize hardware-related cost factors. Unfortunately, such transformations cannot adequately deal with the highly dynamic nature of today's multimedia applications. Therefore, we go one step back in the design process. Starting from a conceptual UML model, we first transform the model before refining it into executable code. This paper presents various model transformations, an estimation technique for the steering cost parameters, and three case studies that show how our model transformations improve memory footprint and performance by factors with respect to the initial implementation. (c) 2006 Elsevier B.V. All rights reserved.
3D stacking and integration can provide significant system advantages. Following a brief technology review, this abstract explores application drivers, design and CAD for 3D ICs. The main 3D exploitation explored in d...
ISBN (digital): 9783031045806
ISBN (print): 9783031045790
This book constitutes the proceedings of the 21st International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, SAMOS 2021, which took place in July 2021. Due to the COVID-19 pandemic, the conference was held virtually.
ISBN (print): 9781479937707
Application Specific Instruction Set Processors (ASIPs) seek an optimal performance/area/energy trade-off for a given algorithm. In all current design methodologies, an architectural model must first be created manually, based on the designer's experience. These models are increasingly refined until the design constraints are met, through several time-consuming algorithmic/architecture co-exploration iterations. This paper presents a novel performance estimation approach that shortens the design cycle of existing methodologies by providing an early assessment of the impact of customizations on the achievable performance. The approach does so by eliminating the need for a completely specified architecture, without limiting the designer's freedom and without simulating the application repeatedly. Overall, our approach reduces the number of necessary co-exploration iterations, thus increasing design productivity. We validate our approach via two different case studies: a butterfly-enabled ASIP for Fast Fourier Transform computation and a Connected Components Labeling ASIP for computer vision.
ISBN (print): 9783031150746; 9783031150739
Computing systems have shifted towards highly parallel and heterogeneous architectures to tackle the challenges imposed by limited power budgets. These architectures must be supported by novel power management paradigms addressing the increasing design size, parallelism, and heterogeneity while ensuring high accuracy and low overhead. In this work, we propose a systematic, automated, and architecture-agnostic approach to accurate and lightweight DVFS-aware statistical power modeling of the CPU and GPU sub-systems of a heterogeneous platform, driven by the sub-systems' local performance monitoring counters (PMCs). Counter selection is guided by a generally applicable statistical method that identifies the minimal subsets of counters robustly correlating to power dissipation. Based on the selected counters, we train a set of lightweight, linear models characterizing each sub-system over a range of frequencies. Such models compose a lookup-table-based system-level model that efficiently captures the non-linearity of power consumption, showing desirable responsiveness and decomposability. We validate the system-level model on real hardware by measuring the total energy consumption of an NVIDIA Jetson AGX Xavier platform over a set of benchmarks. The resulting average estimation error is 1.3%, with a maximum of 3.1%. Furthermore, the model shows a maximum evaluation runtime of 500 ns, thus implying a negligible impact on system utilization and applicability to online dynamic power management (DPM).
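The structure described in this abstract (one lightweight linear model per sub-system and frequency, composed through a lookup table) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the class and function names, the frequencies, the coefficients and the counter values are all hypothetical placeholders, and real coefficients would come from training on measured data.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class LinearPowerModel:
    """Power = intercept + sum(weight_i * counter_i) at one fixed frequency."""
    intercept: float
    weights: Sequence[float]

    def predict(self, counters: Sequence[float]) -> float:
        if len(counters) != len(self.weights):
            raise ValueError("expected one counter value per weight")
        return self.intercept + sum(w * c for w, c in zip(self.weights, counters))


def subsystem_power(table: Mapping[int, LinearPowerModel],
                    freq_mhz: int,
                    counters: Sequence[float]) -> float:
    """Look up the linear model for the current frequency and evaluate it."""
    return table[freq_mhz].predict(counters)


def system_power(cpu_table, cpu_freq_mhz, cpu_counters,
                 gpu_table, gpu_freq_mhz, gpu_counters) -> float:
    """System-level estimate as the sum of the per-sub-system estimates."""
    return (subsystem_power(cpu_table, cpu_freq_mhz, cpu_counters)
            + subsystem_power(gpu_table, gpu_freq_mhz, gpu_counters))


if __name__ == "__main__":
    # Placeholder tables: frequencies (MHz) and coefficients are made up.
    cpu_table = {
        1000: LinearPowerModel(intercept=0.5, weights=(0.25, 0.125)),
        2000: LinearPowerModel(intercept=1.0, weights=(0.5, 0.25)),
    }
    gpu_table = {900: LinearPowerModel(intercept=2.0, weights=(1.0,))}
    print(system_power(cpu_table, 2000, (2.0, 4.0), gpu_table, 900, (1.5,)))
```

Because each table entry is linear, the non-linear dependence of power on frequency is carried by the choice of entry rather than by the model itself, and evaluation costs only a lookup and a few multiply-adds.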