Contemporary embedded systems are often designed as Multiprocessor Systems-on-Chip (MPSoCs), which include multiple processors and other peripherals on a single chip. In contrast to general purpose multiprocessors, the ...
ISBN: 9781467322973 (print)
The work presented here concerns a methodology for designing hard real-time embedded control software for robots, i.e. mechatronic products. The behavior of the total robot system (machine, control, software and I/O) is relevant, because the dynamics of the machine influence the robot software. Therefore, we use two appropriate Models of Computation: continuous-time equations for the machine / robot part, and discrete-event / discrete-time equations for the control software part. Such combined models are computed (simulated) by means of co-simulation. The design work can be done as a stepwise refinement process, in which each step is verified via co-simulation. This generally yields a shorter design time and a better-quality product. The tools pass model-specific information to each other via parametrized tokens in the generated high-level code, which gives a better separation of design steps. This allows for better-quality models and more reuse, thus enhancing the efficiency of model-driven design for the (industrial) end user. The method is illustrated with a case study using the tools, some of which are at the prototype level. In particular, structuring the models and regularly running simulations (some of which can be 'repeated' as real experiments) proves beneficial: it shortens the development time and produces better models. Future work is to test the method on more complex cases, and to extend the method by detailing the electronics and mechanics sub-design flows.
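The pairing of a continuous-time machine model with a discrete-time controller can be illustrated with a very small co-simulation loop. The following is only a sketch under stated assumptions: the first-order plant, the proportional controller, and every parameter value are hypothetical and are not taken from the paper or its tools.

```python
# Hypothetical co-simulation sketch: a continuous-time plant is integrated
# with a small step, while a discrete-time controller runs only at its
# sample instants. All names and values are illustrative assumptions.


def plant_derivative(velocity, force, mass=1.0, damping=0.5):
    """Continuous-time model: mass * dv/dt = force - damping * v."""
    return (force - damping * velocity) / mass


def controller(setpoint, measured, gain=4.0):
    """Discrete-time proportional controller, evaluated once per sample."""
    return gain * (setpoint - measured)


def cosimulate(setpoint=1.0, t_end=5.0, dt_plant=0.001, steps_per_sample=10):
    velocity = 0.0
    force = 0.0
    n_steps = int(round(t_end / dt_plant))
    trace = []
    for step in range(n_steps):
        # Synchronisation point: the controller samples the plant output and
        # holds its command until the next sample (zero-order hold).
        if step % steps_per_sample == 0:
            force = controller(setpoint, velocity)
        # Forward-Euler integration of the continuous-time part.
        velocity += dt_plant * plant_derivative(velocity, force)
        trace.append((step * dt_plant, velocity))
    return trace


trace = cosimulate()
final_velocity = trace[-1][1]
```

With these assumed values the loop settles where the controller force balances the damping, i.e. at a velocity of gain / (gain + damping) = 4 / 4.5 times the setpoint.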
Recently, three-dimensional integration technology has allowed researchers and designers to explore novel architectures for computing systems. Due to the memory-intensive nature of signal processing systems, DSPs can ...
ISBN: 3540364102 (print)
In this paper, we consider flux caches, prefetching, and a media application. We analyze the MPEG4 encoder workload with a realistic data set in a scenario representative of the embedded systems domain. Our study shows that different well-known data prefetch mechanisms achieve little reduction in the cache miss ratios when applied to the complete MPEG4 application. Furthermore, we investigate the potential improvement when dedicated prefetching strategies are applied to the sum of absolute differences (SAD) kernels in MPEG4. We propose a flux cache mechanism that dynamically invokes cache designs with dedicated prefetching engines that can fully utilize the available memory bandwidth. We show that our proposal improves the cache miss ratios by a factor close to 3x.
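For readers unfamiliar with the SAD kernel mentioned above, the following minimal sketch shows the computation itself: the sum of absolute pixel differences between two equally sized blocks, used to pick the best-matching candidate block. It is an illustration only, not the encoder's implementation; the function names and the tiny 2x2 blocks are assumptions made for the example.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of rows")
    total = 0
    for row_a, row_b in zip(block_a, block_b):
        if len(row_a) != len(row_b):
            raise ValueError("rows must have the same length")
        total += sum(abs(a - b) for a, b in zip(row_a, row_b))
    return total


def best_match(current, candidates):
    """Return (index, sad) of the candidate block with the lowest SAD."""
    scores = [sad(current, candidate) for candidate in candidates]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Illustrative 2x2 blocks (real encoders typically use larger blocks).
current = [[10, 20], [30, 40]]
candidates = [
    [[12, 18], [33, 41]],  # SAD = 2 + 2 + 3 + 1 = 8
    [[10, 20], [30, 44]],  # SAD = 0 + 0 + 0 + 4 = 4
]
result = best_match(current, candidates)  # (1, 4)
```

Each candidate requires reading a full block of reference pixels, which gives a sense of why such kernels are sensitive to memory and cache behavior.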
ISBN: 9783031460777 (digital); 9783031460760 (print)
This book constitutes the proceedings of the 22nd International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, SAMOS 2021, which took place in July 2022 in Samos, Greece. The 11 full papers and 7 short papers presented in this volume were carefully reviewed and selected from 45 submissions. The conference covers a wide range of embedded systems design aspects, including machine learning accelerators as well as power management and programmable dataflow systems.
ISBN: 9783540736226 (print)
Wireless sensor nodes are increasingly being tasked with computation- and communication-intensive functions while still subject to constraints related to energy availability. On these embedded platforms, once all low-power design techniques have been explored, duty-cycling the various subsystems remains the primary option to meet the energy and power constraints. This requires the ability to provide spurts of high MIPS and high-bandwidth connections. However, due to the large overheads associated with duty-cycling the computation and communication subsystems, existing high-performance sensor platforms are not efficient in supporting such an option. In this paper, we present the design and optimizations of a wireless gateway node (WGN) that bridges data from wireless sensor networks to Wi-Fi networks on an on-demand basis. We discuss our strategies to reduce duty-cycling-related costs by partitioning the system and by reducing the amount of time required to activate or deactivate the high-powered components. We compare the design choices and performance parameters with those made in the Intel Stargate platform to show the effectiveness of duty-cycling on our platform. We have built a working prototype, and the experimental results with two different power management schemes show significant reductions in latency and average power consumption compared to the Stargate.
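Why activation and deactivation overheads matter for duty-cycling can be seen from a simple energy-per-period calculation. The sketch below is a generic back-of-the-envelope model, not the paper's model; the function name, its parameters, and all numbers are assumptions chosen for illustration and are not measurements of the WGN or the Stargate.

```python
def average_power(p_active_w, p_sleep_w, t_active_s, period_s,
                  e_transition_j=0.0, t_transition_s=0.0):
    """Average power over one duty cycle.

    e_transition_j and t_transition_s are the total energy and time spent
    waking up and shutting down per period (assumed values).
    """
    t_sleep_s = period_s - t_active_s - t_transition_s
    if t_sleep_s < 0:
        raise ValueError("active and transition time exceed the period")
    energy_j = (p_active_w * t_active_s
                + e_transition_j
                + p_sleep_w * t_sleep_s)
    return energy_j / period_s


# Illustrative numbers only: 2 W active for 0.1 s every 10 s, 10 mW asleep.
ideal = average_power(2.0, 0.01, 0.1, 10.0)
# Same workload, but each cycle costs 1 J and 1 s of wake-up/shutdown.
with_overhead = average_power(2.0, 0.01, 0.1, 10.0,
                              e_transition_j=1.0, t_transition_s=1.0)
```

In this made-up example the transition cost raises the average power from about 30 mW to about 129 mW, so shortening the transitions can matter more than lowering the active power.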
ISBN: 1424401550 (print)
In embedded multiprocessors, cache partitioning is a known technique to eliminate inter-task cache conflicts and thus to increase predictability. On such systems, the partitioning ratio is a parameter that should be tuned to optimize performance. In this paper we propose a Simulated Annealing (SA) based heuristic to determine the cache partitioning ratio that maximizes an application's throughput. At its core, the SA method iterates many times over many partitioning ratios, checking the resulting throughput. Hence the throughput of the system has to be estimated very quickly, so we utilize a light simulation strategy. The light simulation derives the throughput from the tasks' timings gathered off-line. This is possible because, in an environment where tasks do not interfere with each other, their performance figures can be used in any possible combination. An application of industrial relevance (an H.264 decoder) running on a parallel homogeneous platform is used to demonstrate the proposed method. For the H.264 application, a 9% throughput improvement is achieved compared to the throughput obtained using methods that partition for the least number of misses. This is a significant improvement, as it represents 45% of the theoretical throughput improvement achievable when assuming an infinite cache.
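The interplay between the annealing loop and the fast throughput estimate can be sketched as follows. This is a minimal illustration, not the authors' heuristic: the timing table, the task names, the assumption that throughput is limited by the slowest task, and the cooling schedule are all hypothetical.

```python
import math
import random

# Hypothetical off-line profile: execution time (ms) of each task as a
# function of the number of cache ways it owns (index 0 is unused).
TIMINGS = {
    "parse":   [None, 9.0, 7.0, 6.5, 6.2, 6.1, 6.0, 6.0],
    "predict": [None, 20.0, 14.0, 10.0, 8.0, 7.0, 6.5, 6.2],
    "filter":  [None, 12.0, 9.0, 7.5, 7.0, 6.8, 6.7, 6.6],
}
TOTAL_WAYS = 9


def throughput(partition):
    """Light estimate: table lookups only; the slowest task sets the rate."""
    slowest_ms = max(TIMINGS[task][ways] for task, ways in partition.items())
    return 1000.0 / slowest_ms  # frames per second (assumed pipeline model)


def neighbour(partition, rng):
    """Move one cache way from a task that has more than one to another."""
    donors = [task for task, ways in partition.items() if ways > 1]
    src = rng.choice(donors)
    dst = rng.choice([task for task in partition if task != src])
    candidate = dict(partition)
    candidate[src] -= 1
    candidate[dst] += 1
    return candidate


def anneal(initial, iterations=2000, t_start=5.0, cooling=0.995, seed=0):
    rng = random.Random(seed)
    current, current_score = dict(initial), throughput(initial)
    best, best_score = dict(current), current_score
    temperature = t_start
    for _ in range(iterations):
        candidate = neighbour(current, rng)
        score = throughput(candidate)
        delta = score - current_score
        # Always accept improvements; accept a worse partition with a
        # probability that shrinks as the temperature drops.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, score
            if score > best_score:
                best, best_score = dict(candidate), score
        temperature *= cooling
    return best, best_score


initial = {"parse": 3, "predict": 3, "filter": 3}
best, best_score = anneal(initial)
```

Because each evaluation is a handful of table lookups, thousands of candidate partitions can be scored quickly, which is the property the light simulation relies on.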
ISBN: 9783031150746; 9783031150739 (print)
Predicting the performance of Artificial Neural Networks (ANNs) on embedded multi-core platforms is tedious. Concurrent accesses to shared resources are hard to model due to congestion effects on the shared communication medium, which affect the performance of the application. Most approaches therefore focus on evaluation through systematic implementation and testing, or on building analytical models, which tend to lack accuracy when targeting a wide range of architectures of varying complexity. In this paper we present a hybrid modeling environment that enables fast yet accurate timing prediction for fully-connected ANNs deployed on multi-core platforms. The modeling flow is based on the integration of an analytical computation time model with a communication time model, both of which are calibrated through measurement, inside a system-level simulation using SystemC. The ANN is described using the Synchronous DataFlow (SDF) Model of Computation (MoC), which offers a strict separation of communications and computations and thus enables separate computation and communication time models to be built. The proposed flow enables the prediction of the end-to-end latency for different mappings of several fully-connected ANNs with an average accuracy of 99.5% between the created models and the real implementation.
ISBN: 9783540736226 (print)
In this paper we describe efficient data fetch circuitry for retrieving several operands from an n-bank interleaved memory system in a single machine cycle. The proposed address generation (AGEN) unit operates with a modified version of the low-order-interleaved memory access approach. Our design supports data structures with arbitrary lengths and different (odd) strides. A detailed discussion of the 32-bit AGEN design aimed at multiple-operand functional units is presented. The experimental results indicate that our AGEN is capable of producing 8 x 32-bit addresses every 6 ns for different stride cases when implemented on a VIRTEX-II PRO xc2vp30-7ff1696 FPGA device, using trivial hardware resources.
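The restriction to odd strides can be motivated with the textbook form of low-order interleaving, in which the low address bits select the bank. The sketch below illustrates that textbook scheme only, not the paper's modified version; the function names and the choice of 8 banks are assumptions for the example.

```python
def bank_and_row(address, n_banks):
    """Textbook low-order interleaving: low bits pick the bank."""
    return address % n_banks, address // n_banks


def access_pattern(base, stride, count, n_banks):
    """(bank, row) pairs for `count` strided accesses starting at `base`."""
    return [bank_and_row(base + i * stride, n_banks) for i in range(count)]


def is_conflict_free(base, stride, n_banks):
    """True if n_banks consecutive strided accesses hit distinct banks."""
    banks = [bank for bank, _ in access_pattern(base, stride, n_banks, n_banks)]
    return len(set(banks)) == n_banks


# Assumed configuration: 8 banks, stride 3, base address 5.
banks_odd = [bank for bank, _ in access_pattern(5, 3, 8, 8)]
# banks_odd == [5, 0, 3, 6, 1, 4, 7, 2]

# An even stride revisits banks within the same group of 8 accesses.
banks_even = [bank for bank, _ in access_pattern(5, 2, 8, 8)]
# banks_even == [5, 7, 1, 3, 5, 7, 1, 3]
```

When the number of banks is a power of two, any odd stride is coprime with it, so a group of n consecutive accesses touches every bank exactly once and can in principle be served in parallel.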
ISBN: 9781509030767 (print)
Software defines the functionality of today's Cyber-Physical Systems (CPS). Many product innovations are based on software, and thus the complexity of software, even when running on platforms equipped with small microprocessors, is increasing dramatically. This calls for adequate embedded software integration testing, even before the actual hardware platform is available. The use of virtual platforms for functional validation, which allows a CPS running real target platform application code to be simulated on a generic host computer, is currently being adopted by the industry. Since the correct behavior of a CPS depends not only on the correctness of computation but also on its timeliness, virtual platforms contain a certain notion of time. This work focuses on enhancing OVP processor models with a quasi-cycle-accurate timing model. This paper demonstrates and evaluates the accuracy of the proposed timing model against real hardware measurements for the Xilinx MicroBlaze and ARM Cortex-M0 processors. Results show a mean error of 0.16% for the MicroBlaze and 0.72% for the ARM Cortex-M0 processor over all considered benchmarks, which is a clear improvement compared to previously published work.