ISBN: 9781538634370 (print)
Modern embedded systems increasingly accommodate several applications running concurrently on a multiprocessor platform managed by a real-time operating system (RTOS). The increasing design complexity of such systems calls for good design tools to evaluate real-time performance during the very early stages of design. To this end, fast system-level simulators that allow for efficient hardware/software co-simulation are essential. In this paper, we present SysRT, a generic and high-level RTOS simulator that is highly suited for early design space exploration (DSE). The simulator contains different types of application models and a modular RTOS kernel model, all developed in SystemC. Efficient and precise modeling of preemptive scheduling is achieved via an event-driven simulation approach, allowing simulations to be performed much faster than cycle-accurate simulations. At the same time, the kernel model is developed to be generic and modular to support easy plug-in of new schedulers as well as new resource sharing protocols. Compared with state-of-the-art simulators, SysRT achieves faster simulation speeds with an identically small simulation error. We demonstrate the flexibility of SysRT and its benefits for early DSE using experiments with a mixed workload executing on multiprocessor platforms with different numbers of cores.
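To make the event-driven idea concrete, the following is a minimal sketch in Python, not SysRT itself and not SystemC. It assumes a single core and preemptive fixed-priority scheduling; the `Job` fields and the `simulate` function are hypothetical names chosen for illustration. Simulated time jumps directly from one event (a job release or a job completion) to the next instead of advancing cycle by cycle.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    release: int   # time at which the job becomes ready
    wcet: int      # execution time the job needs
    priority: int  # lower number = higher priority (assumption)


def simulate(jobs):
    """Return {job name: finish time} for one core, preemptive fixed priority."""
    pending = sorted(jobs, key=lambda j: j.release)
    ready = []   # heap of (priority, release, name, remaining time)
    finish = {}
    now = 0
    i = 0
    while i < len(pending) or ready:
        if not ready:
            # Core is idle: jump straight to the next release event.
            now = max(now, pending[i].release)
        # Admit every job released up to the current time.
        while i < len(pending) and pending[i].release <= now:
            j = pending[i]
            heapq.heappush(ready, (j.priority, j.release, j.name, j.wcet))
            i += 1
        prio, rel, name, remaining = heapq.heappop(ready)
        # Run until the job completes or the next release, whichever is first.
        next_release = pending[i].release if i < len(pending) else float("inf")
        run = min(remaining, next_release - now)
        now += run
        remaining -= run
        if remaining == 0:
            finish[name] = now
        else:
            # Not finished: re-queue so a newly released job can preempt it.
            heapq.heappush(ready, (prio, rel, name, remaining))
    return finish


if __name__ == "__main__":
    print(simulate([Job("low", 0, 5, 2), Job("high", 2, 2, 1)]))
```

In the usage example, the low-priority job is preempted at time 2 when the high-priority job is released, and resumes once that job completes. The loop iterates once per event, so its cost depends on the number of releases and completions rather than on the length of the simulated interval.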
ISBN: 9781538634370 (print)
Exploration of task mappings plays an important role in achieving high performance in heterogeneous multi-processor system-on-chip (MPSoC) platforms. The application workloads in modern MPSoC-based embedded systems are becoming increasingly dynamic. Different applications execute concurrently and contend for resources in such systems. In this paper, a run-time algorithm is proposed to analytically evaluate the system throughput of to-be-executed applications (modelled as Kahn Process Networks, KPNs) in order to quickly determine a proper resource binding for these applications. Merging transformations on the KPNs are applied to capture the cases in which the number of processes in the KPN is larger than the number of available processing resources, thereby modeling the effects of binding multiple processes to a single processor. We evaluated our algorithm using a heterogeneous MPSoC system with several applications. Our experimental results revealed that, at run time, the performance of the selected mapping with regard to the available resources is close to the optimal performance obtained by exhaustive search and simulation. The results therefore confirm that our algorithm is effective.
ISBN: 9781538634370 (print)
Exposed datapath processor architectures allow the software to bypass register usage by directly moving intermediate results between processing units with suitable instructions. Synchronous Control Asynchronous Dataflow (SCAD) is a new exposed datapath architecture consisting of a grid of processing units with buffers to store their input and output values. Code generation inspired by queue machines can utilize the bypassing capability of such exposed datapath architectures with buffered processing units and can thereby completely eliminate the use of registers. In this paper, we consider different execution paradigms of SCAD that make a compromise between hardware complexity, hardware scalability, and the use of instruction-level parallelism. To that end, we study both resource- and time-optimal code for all variants, which we determine using satisfiability modulo theories (SMT) solvers. Experimental results show that the execution paradigm followed by our original SCAD architecture makes a reasonable compromise between hardware complexity and the use of instruction-level parallelism while maintaining hardware scalability.
ISBN: 9781538634370 (print)
Recursive programs that typically implement divide-and-conquer algorithms are well-suited for multicore systems, as they offer a high degree of parallelization potential. So far, existing parallelizing compilers have mainly focused on extracting other parallel patterns, such as data or pipeline level parallelism. In this paper, we propose a toolflow for the extraction of recursion level parallelism for embedded multicore systems. To achieve this, the toolflow not only verifies the mutual independence of recursive call-sites, but also selects an appropriate task granularity to ensure a good trade-off between load balancing and parallelization overhead. Profitable parallelization opportunities are implemented by using compiler directives from the OpenMP tasking model. Results show the effectiveness of our toolflow, as it is able to speed up sequential recursive programs by between 2.5x and 3.8x on a quad-core platform.
ISBN: 9781538634370 (print)
The functionality of DRAMs, especially the state transitions, is described in JEDEC standards. These standards contain a finite state machine, which is intended to provide an overview of the possible state transitions and the commands that control them. However, today's DRAMs are highly concurrent devices, as they provide bank parallelism. The state diagram used in JEDEC standards does not model this concurrency and, furthermore, is misleading in several aspects. In this paper, we present for the first time an easily comprehensible model of the DRAM states and transitions, using a Petri Net, which also covers the DRAM concurrency.
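As an illustration of why a Petri net can express bank concurrency where a single state machine cannot, here is a deliberately simplified sketch in Python. It is not the model from the paper: it assumes each bank has only an idle and an active place, and it models only the activate (ACT), precharge (PRE), and read (RD) commands, ignoring refresh, power-down states, and all timing constraints. The class and function names are hypothetical.

```python
class PetriNet:
    """Minimal place/transition net: a marking plus named transitions."""

    def __init__(self, marking):
        self.marking = dict(marking)  # place name -> token count
        self.transitions = {}         # transition name -> (inputs, outputs)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (list(inputs), list(outputs))

    def enabled(self, name):
        # A transition is enabled when every input place holds a token.
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) > 0 for p in inputs)

    def fire(self, name):
        if not self.enabled(name):
            raise ValueError(f"transition {name} is not enabled")
        inputs, outputs = self.transitions[name]
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] = self.marking.get(p, 0) + 1


def build_dram_net(num_banks):
    """One token per bank; each bank moves between idle and active on its own."""
    net = PetriNet({f"idle{b}": 1 for b in range(num_banks)})
    for b in range(num_banks):
        net.add_transition(f"ACT{b}", [f"idle{b}"], [f"active{b}"])
        net.add_transition(f"PRE{b}", [f"active{b}"], [f"idle{b}"])
        # A read needs an open row and leaves it open.
        net.add_transition(f"RD{b}", [f"active{b}"], [f"active{b}"])
    return net


if __name__ == "__main__":
    net = build_dram_net(2)
    net.fire("ACT0")
    net.fire("ACT1")    # bank 1 is activated while bank 0 is still active
    print(net.marking)
```

Because every bank carries its own token, the marking can hold several banks in the active place at the same time, which a single global state in a flat state diagram cannot represent.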
ISBN: 9781538634370 (print)
SystemC TLM-based virtual prototypes have become the main tool in industry and research for concurrent hardware and software development, as well as hardware design space exploration. However, there is a lack of accurate, free, changeable and realistic SystemC models of modern CPUs. Therefore, many researchers use the cycle-accurate open-source system simulator gem5, which has been developed in parallel to the SystemC standard. In this paper, we present a coupling of gem5 with SystemC that offers full interoperability between both simulation frameworks and therefore enables a huge set of possibilities for system-level design space exploration. Furthermore, we show that the coupling itself adds only a relatively small overhead to the total execution time of the simulation.
ISBN: 9781538634370 (print)
The migration of sequential embedded software to multicore processors is a challenging task. Parallelization of software introduces concurrency bugs (e.g. data races), which only conditionally appear during testing because they strongly depend on the timing of the execution. Therefore, traditional testing approaches cannot efficiently test concurrent software. More appropriate are analysis approaches that prove the absence of software faults. Current approaches often produce false positives as they fail to consider all relevant synchronization sources. In this paper, we complement current analysis techniques by considering a scheduling scheme as a synchronization mechanism. We narrow the analysis by analyzing only relevant variants in execution timing that might produce concurrency bugs. Therefore, we eliminate a family of false positives caused by ignoring the scheduling synchronization. Engineers can optimize this scheduling scheme to satisfy different requirements. Our approach uses virtual prototyping to enable design space exploration of systems with complex scheduling schemes by investigating the influence of the scheduling scheme on the synchronization of concurrent software.
ISBN: 9781538634370 (print)
An approach is introduced for mapping applications represented as Directed Acyclic Graphs (DAGs) onto platforms consisting of heterogeneous cores, taking into account the communication overhead between the cores. The approach is based on the Benders decomposition principle and integrates Integer Linear Programming (ILP) and Constraint Programming formulations. Both formulations take into account the communication delay between dependent tasks that are assigned to different cores, with the aim of optimizing the application's execution time. The proposed approach succeeds in providing the optimal solution in all cases of synthetic and real-application DAGs, while the pure ILP model fails on more than half of them. Also, the average solution time of the proposed method is about 1 minute, and for instances solved by both models, the speedup over the ILP model is 11x.
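The objective described above can be sketched in Python as follows. This is not the Benders decomposition from the paper: it is a brute-force enumeration usable only for tiny graphs, and it assumes that tasks on the same core run in a fixed topological order and that communication delay applies only between tasks mapped to different cores. All names (`makespan`, `best_mapping`, the task and core labels) are hypothetical.

```python
from itertools import product


def makespan(order, preds, exec_time, comm, mapping):
    """Finish time of the last task for a given task -> core mapping.

    order:     tasks in topological order
    preds:     task -> list of predecessor tasks
    exec_time: task -> {core: execution time on that core}
    comm:      (pred, task) -> delay paid only when the two cores differ
    """
    core_free = {}  # core -> time at which it becomes free
    finish = {}
    for t in order:
        core = mapping[t]
        ready = 0
        for p in preds.get(t, []):
            delay = comm[(p, t)] if mapping[p] != core else 0
            ready = max(ready, finish[p] + delay)
        start = max(ready, core_free.get(core, 0))
        finish[t] = start + exec_time[t][core]
        core_free[core] = finish[t]
    return max(finish.values())


def best_mapping(order, preds, exec_time, comm, cores):
    """Try every assignment of tasks to cores and keep the shortest schedule."""
    best = None
    for assignment in product(cores, repeat=len(order)):
        mapping = dict(zip(order, assignment))
        span = makespan(order, preds, exec_time, comm, mapping)
        if best is None or span < best[0]:
            best = (span, mapping)
    return best


if __name__ == "__main__":
    order = ["A", "B", "C"]                      # A and B both feed C
    preds = {"C": ["A", "B"]}
    exec_time = {"A": {"big": 2, "little": 3},
                 "B": {"big": 2, "little": 2},
                 "C": {"big": 3, "little": 6}}
    comm = {("A", "C"): 1, ("B", "C"): 1}
    print(best_mapping(order, preds, exec_time, comm, ["big", "little"]))
```

The enumeration grows exponentially with the number of tasks, which is the reason exact approaches rely on formulations such as ILP, Constraint Programming, or a decomposition of the two rather than on exhaustive search.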
ISBN: 9781538642818 (print)
The virtual network embedding (VNE) problem of mapping virtual network (VN) requests to a substrate network is a key component of network virtualization in datacenters. In a bid to improve the performance and cost of datacenter networks, there has been recent interest in "reconfigurable" network architectures, wherein the network topology can be changed at runtime to better handle current traffic patterns. Such reconfigurable networks seem naturally well-suited for efficient network virtualization, as networks can be "tailored" to accommodate the incoming VN requests. Motivated by the above, in this paper, we address the problem of virtual network embedding in reconfigurable networks; to the best of our knowledge, this has not been addressed before. In particular, we address the VNE problem in reconfigurable networks under two different models of VN link demands: fixed-bandwidth and stochastic-bandwidth demands. The former is the traditional model, while we propose the latter to improve network utilization and leverage the runtime reconfiguration capability of reconfigurable networks. For the stochastic demand model, we employ a novel concept of embedding with "runtime-binding," wherein the embedding of a VN link is "configured" at runtime (via network reconfiguration) depending on the prevailing network state and traffic. We evaluate the efficiency of our proposed models and techniques via simulation using real VN requests and traffic statistics from large datacenters, and show that our proposed models and techniques offer significant performance advantages (up to 30-40%) over traditional models.
Stray currents are the fraction of the traction return current that returns through the earth in at-grade configurations, or through metallic structures (reinforcement bars) in the case of viaducts and tunnels. These currents expose steel...