ISBN: 9781424419852 (print)
A systematic approach to design space exploration of customisable options for multi-processor architectures is presented. This approach is used to explore a parameterisable system model as part of a novel exploration tool. Architecture trends are analysed by varying the pre-fabrication choices of the number of processing elements (PEs) and the cache size. Of note is the relationship between multi-threading and off-chip memory access, which is shown to reduce performance by up to five times for a decimation case study. From the analysis of architecture trends, a post-fabrication choice of processing pattern is shown to provide up to a three-times improvement for a negligible area cost. In verification, the system model mimics the performance of sample graphics processors. This is achieved with a run time of only five minutes for a decimation case study with a model setup of eight processing elements and a 16 KB cache.
In today's design of embedded systems the software part is increasingly important. Over the last years we have observed a shift from hardware to software added value. With the rise of multi- and many-core platform...
Current tools for embedded system design have limited support for modelling the interaction of the system with its physical environment. Furthermore, the natural representation of (streaming, real-time) applications w...
ISBN: 354026969X (print)
A swapping algorithm for NAND flash memory based embedded systems is developed by combining data compression and an improved page update method. The developed method allows efficient execution of a memory-demanding application, or of multiple applications, without requiring a large main memory. It also helps enhance the stability of a NAND flash file system by reducing the number of writes. The update algorithm is based on the CFLRU (Clean First LRU) method and employs additional features such as selective compression and delayed swapping. The WKdm compression algorithm is used for software-based compression, while LZO is used for the hardware-based implementation. The proposed method is implemented on an ARM9 CPU based Linux system, and its performance in executing MPEG2 decoder, MPEG2 encoder, and gcc programs is measured and interpreted.
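To make the CFLRU idea mentioned in this abstract concrete, the following is a minimal sketch of a plain Clean-First LRU page cache. It is not the paper's implementation: the class name `CFLRUCache`, the `window` parameter, and the `flash_writes` counter are hypothetical names chosen for illustration, and the selective compression and delayed swapping described above are not modelled. The sketch only shows why preferring clean victims near the LRU end can reduce flash writes.

```python
from collections import OrderedDict


class CFLRUCache:
    """Illustrative CFLRU page cache (assumed structure, not the paper's code)."""

    def __init__(self, capacity, window):
        if not 0 < window <= capacity:
            raise ValueError("window must be between 1 and capacity")
        self.capacity = capacity
        self.window = window          # size of the clean-first region at the LRU end
        self.pages = OrderedDict()    # page id -> dirty flag; first entry is the LRU page
        self.flash_writes = 0         # write-backs caused by evicting dirty pages

    def access(self, page, write=False):
        """Touch a page; a write marks it dirty until it is evicted."""
        if page in self.pages:
            dirty = self.pages.pop(page) or write
        else:
            if len(self.pages) >= self.capacity:
                self._evict()
            dirty = write
        self.pages[page] = dirty      # re-insert at the MRU end

    def _evict(self):
        """Evict the least recently used clean page in the window, else the LRU page."""
        victim = None
        for index, (page, dirty) in enumerate(self.pages.items()):
            if index >= self.window:
                break
            if not dirty:
                victim = page
                break
        if victim is None:
            # Every page in the clean-first region is dirty: fall back to LRU
            # and count the write-back to flash.
            victim, _ = self.pages.popitem(last=False)
            self.flash_writes += 1
        else:
            del self.pages[victim]
        return victim
```

With `capacity=3` and `window=2`, a dirty page at the LRU position survives as long as a clean page sits next to it in the window, so no flash write occurs until the window holds only dirty pages.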
ISBN: 9781424445011 (print)
Energy-efficient scalable soft-output signal detectors are of significant interest in emerging Multiple-Input Multiple-Output (MIMO) wireless communication systems. However, traditional high-performance MIMO detectors consume a rather high amount of power, are typically constrained to one modulation scheme, and are not scalable with the number of antennas. Hence, they are not well suited for future energy-efficient Software Defined Radio (SDR) platforms. This paper presents two energy-efficient scalable MIMO detector architectures: one optimized for high throughput and one for low area. Both architectures support 16-QAM as well as 64-QAM while offering soft output and near-ML performance. The 2 x 2 high-throughput architecture was implemented in 65 nm CMOS technology and subsequently scaled to 4 x 4 and 8 x 8. The 4 x 4 instance provides up to 300 Mbps throughput while consuming only 0.3 mm² area and 28 mW power. The 8 x 8 instance offers a throughput 10x better than the state of the art while consuming two-thirds less power. Thus, the proposed near-ML Selective Spanning with Fast Enumeration (SSFE) based detector architectures are not only multi-standard capable and scalable, they are also highly efficient.
ISBN: 9783030275624; 9783030275617 (print)
Low-level sensory data processing in many Internet-of-Things (IoT) devices pursues energy efficiency by utilizing sleep modes or slowing the clock to the minimum. To curb the share of stand-by power dissipation in those designs, ultra-low-leakage processes are employed in fabrication. These processes limit the clock rates significantly, reducing the computing throughput of individual cores. In this contribution we explore compensating for the substantial computing power needs of a vision application using massive parallelism. The Processing Elements (PEs) of the design are based on the Transport Triggered Architecture. The fine-grained programmable parallel solution allows for fast and efficient computation of learnable low-level features (e.g. local binary descriptors and convolutions). Other operations, including max-pooling, have also been implemented. The programmable design achieves excellent energy efficiency for Local Binary Patterns computations.
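For readers unfamiliar with the Local Binary Patterns operator named in this abstract, here is a minimal scalar sketch of the basic 3 x 3 variant. It says nothing about the paper's parallel Transport Triggered Architecture implementation: the function name `lbp_3x3`, the clockwise neighbour order, and the `>=` comparison are assumptions for illustration, and other bit orderings are equally valid conventions.

```python
def lbp_3x3(image):
    """Return 8-bit LBP codes for the interior pixels of a 2-D list of integers."""
    # Neighbour offsets, clockwise from the top-left pixel. The bit assigned to
    # each neighbour is a convention assumed here, not taken from the paper.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            centre = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the centre.
                if image[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes
```

Each output pixel depends only on its own 3 x 3 neighbourhood, which is what makes the operator a natural fit for the kind of massive parallelism the abstract describes.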
ISBN: 9783031150746; 9783031150739 (print)
Multi-sensor embedded systems often consist of a central unit responsible for managing a heterogeneous set of attached sensors. Particularly when such systems are deployed in areas without access to a static power supply, they have to be powered using energy harvesting to operate autonomously. Objectives such as availability and data loss rate depend on the set of attached sensors, the system configuration (e.g., the photovoltaic (PV) module, batteries, and data storage used), as well as environmental factors such as the location of the deployed system. Moreover, the employed energy management strategy and its parametrization severely influence the system characteristics. In fact, different strategies can lead to different tradeoffs in terms of the above objectives. In this paper we propose a design methodology to automatically explore the design space of configurations of multi-sensor embedded systems and to determine and configure the best energy management strategy for a given sensor configuration and location. Our methodology includes a real-time analysis and a simulation-based DSE to explore the design space. We investigate a case study from a biomonitoring project and demonstrate the benefits of the proposed design methodology: a system (including its configuration and energy management strategy) has to be tailored to the characteristics of the set of attached sensors and the location where it operates. Otherwise, designs exhibit suboptimal characteristics when operating at sites or for sensor sets for which they were not optimized.
VMODEX is an interactive visualization tool to support system-level Design Space Exploration (DSE) of MPSoC architectures. It was initially developed to help designers to get insight into the search process of Multi-O...
ISBN: 3540364102 (print)
This paper presents automated distribution of embedded real-time applications modeled in Unified Modeling Language version 2.0 (UML 2.0). The automated distribution requires methods and tools for design automation, as well as a run-time environment for distributed execution on the target platform. Executable application code is generated from UML models, and UML with a custom profile is used to abstract the hardware architecture and configure the application mapping. For experimentation, a full-featured WLAN terminal was designed in UML and implemented as a distributed multiprocessor system-on-chip (SoC) on an FPGA prototype platform. Measurements show that a 50-70% reduction in protocol delays is achieved with distribution, and delay variations are reduced by 45-85%.
ISBN: 9781424402717 (print)
The growth in embedded systems applications and sophistication has increased the need for rapid development and modeling of embedded processors. Embedded processors are usually application specific. This creates a strong need for modeling environments that can be used for rapid generation of detailed micro-architecture processor simulators. However, existing simulation tools in this category are far less mature and mostly commercial. This paper presents a generic cycle-accurate micro-architecture simulation framework for embedded processors. The framework is designed to generate an RTL (Register Transfer Level) cycle-accurate simulator. The framework is built in Java to provide features such as extensibility, ease of modification, and platform independence. It provides these features while being as fast as most known available frameworks. The paper uses the ARM1022E as an example embedded processor because of its wide range of applications, such as modems, cellular phones, and automobiles. It simulates both of the processor's instruction set architectures (ISAs): ARM (32-bit ISA) and THUMB (16-bit ISA). The paper verifies the framework by comparing the ARM simulator with ARMulator (from ARM Ltd.). It also compares the current simulation speed with that of available known frameworks. Lastly, the paper provides a study of ADPCM (Adaptive Differential Pulse Code Modulation) decode performance on the ARM1022E processor using the framework.