ISBN (print): 9781538677599
When designing complex cyber-physical systems, engineers have to integrate numerical models from different modeling environments in order to simulate the whole system and estimate its overall performance. Co-simulation refers to such joint simulation of heterogeneous models. If some parts of the system are physically available, it is possible to connect these parts to the co-simulation in a Hardware-in-the-Loop (HiL) approach. In this case, the simulation has to be performed in real time, where model execution consists of periodically reacting to the real (physically available) components and providing periodic output updates. This paper deals with the parallelization and scheduling of real-time Hardware-in-the-Loop co-simulation of numerical models on multi-core architectures. A method for defining the real-time constraints that have to be met is proposed. An ILP formulation as well as a heuristic are then proposed to solve the problem of scheduling the co-simulation on a multi-core architecture while satisfying the previously defined real-time constraints. The proposed approach is evaluated for different sizes of co-simulations and multi-core processors.
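As a rough illustration of the scheduling problem described above (not the paper's ILP formulation or heuristic), the sketch below greedily maps co-simulation model steps onto cores and checks a simple real-time feasibility condition; the `ModelStep` fields, periods, and WCET values are all assumptions made for the example.

```python
# Illustrative sketch (not the paper's algorithm): a greedy list-scheduling
# heuristic that maps co-simulation model steps onto cores and checks a
# simple real-time constraint: the load assigned to each core must fit
# within the shortest model period, so every output update stays periodic.
from dataclasses import dataclass

@dataclass
class ModelStep:
    name: str
    wcet: float    # worst-case execution time of one simulation step (ms)
    period: float  # required output update period (ms)

def schedule(steps, n_cores):
    loads = [0.0] * n_cores
    mapping = {}
    # Longest-processing-time-first keeps the makespan of each core low.
    for step in sorted(steps, key=lambda s: s.wcet, reverse=True):
        core = min(range(n_cores), key=loads.__getitem__)
        loads[core] += step.wcet
        mapping[step.name] = core
    # Real-time feasibility check: no core may exceed the tightest period.
    deadline = min(s.period for s in steps)
    feasible = all(load <= deadline for load in loads)
    return mapping, loads, feasible

steps = [ModelStep("engine", 2.0, 10.0), ModelStep("battery", 1.5, 10.0),
         ModelStep("controller", 0.8, 5.0), ModelStep("brake", 1.2, 5.0)]
print(schedule(steps, n_cores=2))
```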
ISBN (print): 9781538691205
The number and diversity of connected computing platforms in the Internet of Things (IoT) is expected to increase exponentially over the coming years, together with their dependability requirements. This imposes many challenges on software and hardware developers and calls for safe and secure real-time operating systems (RTOSs) that are portable to different or changing hardware. Middleware ports, including RTOS ports, must keep functional and non-functional behavior unchanged from the application's point of view. Current middleware portability approaches for embedded systems, however, are arduous and error prone. We present a novel approach towards portability of embedded RTOSs based on a formal, hardware-independent and detailed specification of RTOS kernels. With additional models of relevant MCU properties and instruction set architectures (ISAs), we are able to generate low-level RTOS code for different target architectures. This paper focuses on the hardware-independent model of the context switch within a multi-tasking RTOS. With the general approach, we expect to (1) reduce the effort for maintaining and porting RTOS code as well as (2) the likelihood of errors, (3) make it easier to test new kernel concepts during OS development, (4) improve security by modeling different levels of access permissions for memory or peripherals depending on the execution mode, and (5) improve safety by formally proving the correctness and consistency of the models.
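To make the idea of generating target-specific kernel code from hardware-independent models more concrete, here is a minimal sketch, not the authors' formalism, in which a generic context-switch description is rendered against toy ISA models; the `ISA_MODELS` entries, register lists, and templates are illustrative assumptions.

```python
# Illustrative sketch (not the paper's formal specification): a tiny,
# hardware-independent description of a context switch ("save callee-saved
# registers, swap stack pointer, restore callee-saved registers") rendered
# into target-specific assembly from a minimal ISA model.
ISA_MODELS = {
    "armv7m": {"callee_saved": ["r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11"],
               "push": "push {{{regs}}}", "pop": "pop {{{regs}}}",
               "sp": "sp"},
    "rv32i":  {"callee_saved": ["s0", "s1", "s2", "s3"],
               "push": "# push {regs} via addi/sw sequence",
               "pop":  "# pop {regs} via lw/addi sequence",
               "sp": "sp"},
}

def generate_context_switch(target):
    isa = ISA_MODELS[target]
    regs = ", ".join(isa["callee_saved"])
    return "\n".join([
        f"; context switch generated for {target}",
        isa["push"].format(regs=regs),   # save outgoing task state
        f"; store {isa['sp']} into old TCB, load {isa['sp']} from new TCB",
        isa["pop"].format(regs=regs),    # restore incoming task state
    ])

print(generate_context_switch("armv7m"))
```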
In this paper, we propose a new approach for the predictability and optimality of the inter-core communication and execution of tasks allocated on different cores of multi-core architectures. Our approach is based on the execution of synchronous programs written in the ForeC programming language on deterministic PREcision Timed (PRET) architectures. The originality of the work resides in the time-triggered model of computation and communication that allows very precise control over thread execution. Synchronization is done via configurable Time Division Multiple Access (TDMA) arbitration, where the optimal size and offset of the time slots are computed to reduce the inter-core synchronization costs. We implemented a robotic application and simulated it using MORSE, a robotic simulation environment. Results show that the proposed model guarantees time-predictable inter-core communication and the absence of concurrent accesses (without relying on hardware mechanisms), and allows for optimized execution throughput.
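The following sketch illustrates, under strong simplifying assumptions, how TDMA slot offsets might be placed so that each core's bus slot follows the release of its data, reducing waiting time; it is not the optimization used in the paper, and the release times and slot length are made up for the example.

```python
# Illustrative sketch (not the paper's exact optimization): given the release
# time of each core's data and a fixed TDMA slot length, place each core's
# bus slot right after its data becomes ready so consumers wait as little
# as possible. Times are in bus cycles; all names are assumptions.
def place_tdma_slots(release_times, slot_len):
    # Serve cores in order of data readiness; slots must not overlap.
    order = sorted(release_times, key=release_times.get)
    offsets, t = {}, 0
    for core in order:
        start = max(t, release_times[core])   # slot cannot start before data exists
        offsets[core] = start
        t = start + slot_len                  # next slot begins after this one
    waiting = sum(offsets[c] - release_times[c] for c in release_times)
    return offsets, waiting

release = {"core0": 3, "core1": 0, "core2": 7}
print(place_tdma_slots(release, slot_len=4))
```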
The ACM/IEEE International Symposium on Computer Architecture (ISCA) conference is one of the premier forums for presenting, debating and advancing new ideas and experimental results in computer architecture. Accord...
ISBN (print): 9781538655641
In this era, the requirement for high-performance computing at low power cost can be met by the parallel execution of an application on a large number of programmable cores. Emerging many-core architectures provide dense interconnection fabrics, leading to new communication requirements. In particular, the effective exploitation of synchronous and asynchronous channels for fast communication from/to internal cores and external devices is a key issue for these architectures. In this paper, we propose a methodology for clustering the sequential commands used for configuring the parallel execution of tasks on a globally asynchronous locally synchronous multi-chip many-core neuromorphic platform. In order to reduce communication costs and maximise the exploitation of the available communication bandwidth, we adapted the Multiple Sequence Alignment (MSA) algorithm to cluster the unicast streams of packets used for the configuration of each core so as to generate a coherent multicast stream that configures all cores at once. In preliminary experiments, we demonstrate how the proposed method can lead to up to a 97% reduction in packet transmissions, thus positively affecting the overall communication cost.
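A heavily simplified sketch of the stream-merging idea is given below: commands shared by several cores are emitted once as multicast packets instead of repeated unicast packets. It ignores the ordering constraints handled by the MSA-based method in the paper, and the command payloads and core names are assumptions.

```python
# Illustrative sketch (a simplification of the paper's MSA-based approach):
# per-core unicast configuration streams are merged so that any command
# shared by several cores is transmitted once as a multicast packet
# addressed to all of them.
from collections import OrderedDict

def merge_streams(unicast_streams):
    multicast = OrderedDict()          # payload -> set of destination cores
    for core, commands in unicast_streams.items():
        for cmd in commands:
            multicast.setdefault(cmd, set()).add(core)
    return [(cmd, sorted(cores)) for cmd, cores in multicast.items()]

streams = {
    "core0": ["set_clock 200", "route north", "enable_spikes"],
    "core1": ["set_clock 200", "route south", "enable_spikes"],
    "core2": ["set_clock 200", "route north", "enable_spikes"],
}
for payload, dests in merge_streams(streams):
    print(f"{payload:<15} -> {dests}")
# 9 unicast packets collapse into 4 multicast packets in this toy example.
```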
In this paper, we propose an approach for designing application-specific heterogeneous systems based on performance models that combine accelerator and processor core models. An application-specific program is profiled through its dynamic execution trace, which is used to construct a data-flow model of the accelerator. Modeling of the processor is partitioned into instruction set architecture (ISA) execution and a micro-architecture-specific timing model. These models are implemented on FPGAs to take advantage of their parallelism and speed up the simulation as architecture complexity increases. This approach aims to ease the design of multi-core, multi-accelerator architectures and consequently helps explore the design space by automating the design steps. A case study is conducted to confirm that the presented design flow can model an accelerator starting from an algorithm and validate its integration in a simulation framework, allowing precise performance to be estimated. We also assess the performance of our RISC-V single-core and RISC-V-based heterogeneous architecture models.
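As a toy illustration of separating functional ISA execution from a micro-architecture timing model (not the FPGA-based models of the paper), the sketch below replays a dynamic instruction trace against an assumed latency table to estimate a cycle count.

```python
# Illustrative sketch (not the paper's FPGA models): a micro-architecture
# timing model kept separate from functional ISA execution. A dynamic
# instruction trace (here just opcode classes) is replayed against a latency
# table to estimate cycles; latencies and opcode classes are assumptions.
TIMING_MODEL = {"alu": 1, "load": 3, "store": 1, "branch": 2, "mul": 4}

def estimate_cycles(trace, timing=TIMING_MODEL):
    return sum(timing[op] for op in trace)

trace = ["load", "alu", "alu", "mul", "store", "branch"]
print(f"estimated cycles: {estimate_cycles(trace)}")   # 3+1+1+4+1+2 = 12
```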
ISBN (print): 9781450364942
The proceedings contain 33 papers. The topics discussed include: a performance evaluation of multi-FPGA architectures for computations of information transfer; massively parallel computation of linear recurrence equations with graphics processing units; a first-order approximation of microarchitecture energy-efficiency; delays and states in dataflow models of computation; communication-aware scheduling algorithms for synchronous dataflow graphs on multicore systems; towards power management verification of time-triggered systems using virtual platforms; architectural considerations for FPGA acceleration of machine learning applications in MapReduce; and fast parallel simulation of a manycore architecture with a flit-level on-chip network model.
ISBN (print): 9781728140704
A large number of different applications are associated with different types of embedded systems, where sensors play the key role in creating a particular view of the environment a system operates in. Embedded systems are often characterized as soft or hard real-time systems with high safety requirements, thus imposing strict requirements on the timing behavior and accuracy of sensors in order to ensure the determinism and dependability of a system. At an early stage of system design, analysis or optimization, the satisfaction of these requirements can be checked with models of sensors. The authors aim to investigate the timing performance and accuracy achieved during simulation of the same sensor model implemented in two different ways: as a software artifact and as a field-programmable gate array (FPGA) solution. This article constitutes a part of the research activities defined in [1].
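As a hedged illustration of what a sensor model implemented as a software artifact might look like (not the model studied in the article), the sketch below captures a periodic sampling grid, an output latency, and quantization; all numeric parameters are assumptions.

```python
# Illustrative sketch (not the article's sensor model): a software artifact
# of a periodic sensor with a fixed sampling period, quantization, and output
# latency -- the kind of timing/accuracy properties a comparison against an
# FPGA implementation would examine.
import math

class TemperatureSensorModel:
    def __init__(self, period_ms=10.0, latency_ms=1.5, resolution=0.25):
        self.period_ms = period_ms      # sampling period
        self.latency_ms = latency_ms    # delay before the sample is visible
        self.resolution = resolution    # quantization step (degrees C)

    def sample(self, t_ms, true_value):
        # The sensor only updates on its sampling grid and reports the value
        # it captured one latency earlier, quantized to its resolution.
        t_capture = math.floor((t_ms - self.latency_ms) / self.period_ms) * self.period_ms
        captured = true_value(t_capture)
        return round(captured / self.resolution) * self.resolution

sensor = TemperatureSensorModel()
ambient = lambda t: 20.0 + 0.01 * t            # slowly rising temperature
print(sensor.sample(t_ms=53.0, true_value=ambient))
```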
ISBN (print): 9781538695623
Deep learning is rapidly becoming a strong boost to the already pervasive field of computer vision. State-of-the-art Convolutional Neural Networks reach accuracies comparable to human senses. However, their high computational load and low energy efficiency make their implementation on modern embedded systems hard. In this paper, several strategies for designing fast convolutional engines suitable for hardware-accelerating Convolutional Neural Networks are evaluated. When implemented within a complete embedded system based on a Zynq UltraScale+ SoC device, two of the proposed architectures achieve a peak performance of 131.6 GMAC/s at a 234 MHz running frequency, while occupying at most approximately 13% of the DSP slices available on chip. All the proposed engines outperform state-of-the-art competitors, exhibiting a performance/DSP-utilization ratio up to 29.6 times higher.
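For orientation, a quick arithmetic check on the quoted figures: 131.6 GMAC/s at 234 MHz corresponds to roughly 562 multiply-accumulate operations completed per clock cycle across the parallel engine.

```python
# Arithmetic check on the figures quoted in the abstract: peak throughput
# divided by clock frequency gives the MAC operations completed per cycle.
peak_macs_per_s = 131.6e9   # 131.6 GMAC/s
clock_hz = 234e6            # 234 MHz running frequency
print(peak_macs_per_s / clock_hz)   # ~562.4 MACs per cycle
```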