This paper presents a novel technique for the modeling and the simulation of parallel applications for Multi-Processor systems-on-Chip (MPSoCs). This technique consists of an application-transparent emulation of OS pr...
详细信息
ISBN:
(纸本)9781605584706
This paper presents a novel technique for the modeling and the simulation of parallel applications for Multi-Processor systems-on-Chip (MPSoCs). This technique consists of an application-transparent emulation of OS primitives, includ-ing task creation, scheduling, synchronization etc.;this emu-lation guarantees compatibility with any program compiled against the standard POSIX library, independently of the target OS. This methodology can be used to perform initial HW/SW partitioning and concurrent engineering of a given application, as it allows any software routine to be trans-parently emulated with systemC modules. The proposed approach has been verified on a large set of multi-threaded benchmarks, with both POSIX Threads and OpenMP pro-gramming styles. Results show that our methodology en-ables (a) fast simulation of POSIX applications, (b) accurate analysis of multi-threaded applications, and (c) co-design and fast preliminary hardware-software partitioning. This work has been partially supported by the European HiPEAC network of excellence. Copyright 2008 ACM.
Using traditional software profiling to optimize embedded software in an MPSoC design is not reliable. With multiple processors running concurrently and programs interacting, traditional profiling on individual proces...
详细信息
ISBN:
(纸本)9781605584706
Using traditional software profiling to optimize embedded software in an MPSoC design is not reliable. With multiple processors running concurrently and programs interacting, traditional profiling on individual processors cannot capture useful execution information to assist software optimization. A new method to model parallel executions of interacting programs is needed. In this paper, we consider the software optimization problem for throughput-constrained MPSoC designs. We define the "longest delay path" as a sequence of steps leading to a throughput constraint violation and propose an algorithm to build up the path dynamically during simulation. Using an industrial-strength MPEG-2 decoder design in our case study and custom instructions for software optimization, we show that we can optimize the software efficiently in MPSoC designs using frequently executed statement information from the longest delay path. Copyright 2008 ACM.
Instruction set simulation and real time operating system modeling have become important issues for the design of distributed embedded systems. This paper presents a holistic approach to simulate a distributed, embedd...
详细信息
ISBN:
(纸本)9781605584706
Instruction set simulation and real time operating system modeling have become important issues for the design of distributed embedded systems. This paper presents a holistic approach to simulate a distributed, embedded system that includes target software, processing units, and abstract RTOS within a virtual prototype environment. The processing unit is modeled by an ISS, which is embedded in a systemC environment to allow the integration into a platform model. In comparison to existing approaches, the RTOS is not directly running on the ISS but outsourced and replaced by an RTOS model. This step strongly reduces simulation time since the execution on the ISS is much more time consuming in contrast to the execution on the host processor. The results show the theoretical and measured performance gain depending on the RTOS scheduler and task switching. Copyright 2008 ACM.
Application-specific system-on-chip platforms create the opportunity to customize the cache configuration for optimal performance with minimal chip estate. Simulation, in particular trace-driven simulation, is widely ...
详细信息
ISBN:
(纸本)9781605584706
Application-specific system-on-chip platforms create the opportunity to customize the cache configuration for optimal performance with minimal chip estate. Simulation, in particular trace-driven simulation, is widely used to estimate cache hit rates. However, simulation is too slow to be deployed in the design space exploration, specially when it involves hundreds of design points and huge traces or long program execution. In this paper, we propose a novel static analysis technique for rapid and accurate design space exploration of instruction caches. Given the program control flow graph (CFG) annotated only with basic block and control flow edge execution counts, our analysis estimates the hit rates for multiple cache configurations in one pass. We achieve this by modeling the cache states at each node of the CFG in probabilistic manner and exploiting the structural similarities among related cache configurations. Experimental results indicate that our analysis is 24-3, 855 times faster compared to the fastest known cache simulator while maintaining high accuracy (0.7% average error), in predicting hit rates for popular embedded benchmarks. Copyright 2008 ACM.
Open Core Protocol (OCP) is a standard on-chip core interface specification. The current release is flexible and configurable to support the communication needs of a wide range of Intellectual Property cores, and is n...
详细信息
ISBN:
(纸本)9781605584706
Open Core Protocol (OCP) is a standard on-chip core interface specification. The current release is flexible and configurable to support the communication needs of a wide range of Intellectual Property cores, and is now in widespread use. However, it does not support system-level coherence. This paper summarizes an effort within the OCP-IP cache coherence working group on incorporating cache coherence extensions into OCP, which is expected to have strong impact on the MPSoC industry. In this paper, we propose a backward-compatible coherent Open Core Protocol interface and discuss the design challenges and implications introduced. This interface is flexible and can support a range of coherence protocols and schemes: we show how it can specify a snoopy bus-based scheme as well as a directory-based scheme. The correctness of the specification and models was verified using NuSMV, via exploring the entire state space for the two basic coherence schemes. Copyright 2008 ACM.
software-controlled scratchpad memory is increasingly employed in embedded systems as it offers better timing predictability compared to caches. Previous scratchpad allocation algorithms typically consider single proc...
详细信息
ISBN:
(纸本)9781605584706
software-controlled scratchpad memory is increasingly employed in embedded systems as it offers better timing predictability compared to caches. Previous scratchpad allocation algorithms typically consider single process applications. But embedded applications are mostly multi-tasking with real-time constraints, where the scratchpad memory space has to be shared among interacting processes that may preempt each other. In this paper, we develop a novel dynamic scratchpad allocation technique that takes these process interferences into account to improve the performance and predictability of the memory system. We model the application as a Message Sequence Chart (MSC) to best capture the interprocess interactions. Our goal is to optimize the worst-case response time (WCRT) of the application through runtime reloading of the scratchpad memory content at appropriate execution points. We propose an iterative allocation algorithm that consists of two critical steps: (1) analyze the MSC along with the existing allocation to determine potential interference patterns, and (2) exploit this interference information to tune the scratchpad reloading points and content so as to best improve the WCRT. We evaluate our memory allocation scheme on a real-world embedded application controlling an Unmanned Aerial Vehicle (UAV). Copyright 2008 ACM.
Streaming applications can be implemented with a pipeline of processors. Each processor in the pipeline can be an Application Specific Instruction Set Processor (ASIP) with the result being a heterogeneous pipelined M...
详细信息
ISBN:
(纸本)9781605584706
Streaming applications can be implemented with a pipeline of processors. Each processor in the pipeline can be an Application Specific Instruction Set Processor (ASIP) with the result being a heterogeneous pipelined MPSoC system. Since ASIPs can be of differing configurations, finding the optimal set of configurations for a multiprocessor architecture is a difficult problem. In this paper, we obtain an optimal system design for a set of processors which execute a multimedia application. The variables in the system are the presence or absence of different additional instructions and differing cache configurations for each of the processors. The problem is formulated as a 0-1 Integer Linear Programming (ILP) problem. To reduce the complexity of the ILP formulation, inferior ASIP configurations are efficiently pruned so that the solution could be reached quickly. Given a system runtime constraint, the proposed methodology finds a design with minimal area. We integrated this design methodology into a commercial design flow, and performed a case study upon the JPEG encoding application. We obtained 15 optimal designs subject to 15 different runtime constraints, each in less than 100 seconds from more than 4.2 × 10 13 design points. Copyright 2008 ACM.
With power becoming a major constraint for multi-processor embedded systems, it is becoming important for designers to characterize and model processor power dissipation. It is critical for these processor power model...
详细信息
ISBN:
(纸本)9781605584706
With power becoming a major constraint for multi-processor embedded systems, it is becoming important for designers to characterize and model processor power dissipation. It is critical for these processor power models to be useable across various modeling abstractions in an electronic system level (ESL) design flow, to guide early design decisions. In this paper, we propose a unified processor power modeling methodology for the creation of power models at multiple granularity levels that can be quickly mapped to an ESL design flow. Our experimental results based on applying the proposed methodology on an OpenRISC processor demonstrate the usefulness of having multiple power models. The generated models range from very high-level two-state and architectural/ISS models that can be used in transaction level models (TLM), to extremely detailed cycle-accurate models that enable early exploration of power optimization techniques. These models offer a designer tremendous flexibility to trade off estimation accuracy with estimation/simulation effort. Copyright 2008 ACM.
This paper describes the acceleration of calculations for public-key cryptography on hyperelliptic curves on very small FPGAs. This is achieved by using a Haxdware/softwarecodesign Approach starting with an all-softw...
详细信息
ISBN:
(纸本)9783540781523
This paper describes the acceleration of calculations for public-key cryptography on hyperelliptic curves on very small FPGAs. This is achieved by using a Haxdware/softwarecodesign Approach starting with an all-software implementation on an embedded Microprocessor and migrating very time-consuming calculations from software to hardware. Basic GF(2n)-hardware extensions are connected to work in conjunction with the Microprocessor and possible alternatives for connecting external hardware to the Microprocessor are investigated. The performance of the hardware implementations compared to their counterparts as a software approach are evaluated. Based on these results, a coprocessor is devised and optimized for performance. The system utilizes minimal resources and fits easily on a small FPCA. It allows for fast Hyperelliptic Curve Cryptography (HECC) operations while running at a very low clock speed of 33 MHz, thus making it suitable for usage in embedded systems.
The paper gives an insight to the verification of hardware/softwarecodesign of embedded systems by considering the network component commonly seen in the today's embedded systems. The approach tries to provide a ...
详细信息
暂无评论