Time-consuming cycle-accurate MPSoC simulation is often needed for debugging and verification. Its practicability is put at risk by the growing MPSoC complexity. This work presents a conservative synchronous parallel ...
详细信息
ISBN:
(纸本)9781605589053
Time-consuming cycle-accurate MPSoC simulation is often needed for debugging and verification. Its practicability is put at risk by the growing MPSoC complexity. This work presents a conservative synchronous parallel simulation approach along with a systemC framework to accelerate tightly-coupled MPSoC simulations on multi-core hosts. Key contribution is the implementation strategy, which utilizes techniques from the high-performance computing domain. Results show speed-ups of up to 4.4 on four host cores.
We present an approach to the automatic derivation of executable Process Network specifications from Weakly Dynamic Applications. We introduce the notions of Dynamic Single Assignment Code, Approximated Dependence Gra...
详细信息
ISBN:
(纸本)1581137427
We present an approach to the automatic derivation of executable Process Network specifications from Weakly Dynamic Applications. We introduce the notions of Dynamic Single Assignment Code, Approximated Dependence Graph, and Linearly Bounded Sets to model and capture weakly dynamic (data-dependent) behavior of applications at the task-level of abstraction. Process Networks are simple parallel processing models that match the emerging multi-processor architectures in the sense that the mapping of Process Network specifications of applications onto multi-processor architectures can be done in a systematic and transparent way.
Multimedia computing systems are typically heterogeneous multiprocessors which are highly optimized for both performance and implementation cost. This makes them well-suited to hardware/software co-design techniques, ...
详细信息
ISBN:
(纸本)0819425842
Multimedia computing systems are typically heterogeneous multiprocessors which are highly optimized for both performance and implementation cost. This makes them well-suited to hardware/software co-design techniques, which simultaneously optimize the hardware and software architectures of a system to meet system requirements. This paper surveys important results in co-design with emphasis on techniques necessary for the design of multimedia computing systems, particularly the analysis and synthesis of memory systems.
Transformative applications are a class of dataflow computation characterized by iterative behavior. The problem of partitioning a transformative application specification to a set of available hardware (HW) and softw...
详细信息
ISBN:
(纸本)1581139373
Transformative applications are a class of dataflow computation characterized by iterative behavior. The problem of partitioning a transformative application specification to a set of available hardware (HW) and software (SW) processing elements (PEs) and derivation of a job execution order (scheduling) on them has been quite well studied, but the problem of obtaining fast simulation of these applications poses different constraints. In this paper, we propose an efficient framework for a symmetric multi-processor (SMP) simulation host to achieve fast HW/SW co-simulation for transformative applications, given the partition solutions and the derived schedulers. The framework overcomes the limitations in existing Linux SMP kernel and requires only a reasonable amount of modifications to it. We also present a heuristic algorithm which effectively assigns simulation tasks to the processors on the simulation host, considering both average job simulation time on each processor and other simulation overhead. Our experiments show that the algorithm is able to find satisfactory suboptimal solutions with very little computation time. Based on the task assignment solution, the simulation time can be reduced by 25% to 50% from the obvious but naive approach.
This article proposes a hardware/software partitioning method targeted to performance-constrained systems for datapath applications. Exploiting a platform based design, a Timed Petri Net formalism is proposed to repre...
详细信息
ISBN:
(纸本)9781605584706
This article proposes a hardware/software partitioning method targeted to performance-constrained systems for datapath applications. Exploiting a platform based design, a Timed Petri Net formalism is proposed to represent the mapping of the application onto the platform, allowing to statically extract performance estimations in early phases of the design process and without the need of expensive simulations. The mapping process is generalized in order to allow an automatic exploration of the solution space, that identifies the best performance/area configurations among several application-architecture combinations. The method is evaluated implementing a typical datapath performance constrained system, i.e. a packet processing application. Copyright 2008 ACM.
One of the key steps in Network-on-Chip (NoC) based design is spatial mapping of cores and routing of the communication between those cores. Known solutions to the mapping and routing problem first map cores onto a to...
详细信息
ISBN:
(纸本)1595931619
One of the key steps in Network-on-Chip (NoC) based design is spatial mapping of cores and routing of the communication between those cores. Known solutions to the mapping and routing problem first map cores onto a topology and then route communication, using separated and possibly conflicting objective functions. In this paper we present a unified single-objective algorithm, called Unified MApping, Routing and Slot allocation (UMARS). As the main contribution we show how to couple path selection, mapping of cores and TDMA time-slot allocation such that the network required to meet the constraints of the application is minimized. The time-complexity of UMARS is low and experimental results indicate a run-time only 20% higher than that of path selection alone. We apply the algorithm to an MPEG decoder system-on-Chip (SoC), reducing area by 33%, power by 35% and worst-case latency by a factor four over a traditional multi-step approach.
In this paper, we propose a storage engine for MongoDB to accelerate the queries and reduce the memory usage. An FPGA-based query accelerator is deployed to speed up the queries while hot data is migrated from memory ...
详细信息
ISBN:
(纸本)9781728191980
In this paper, we propose a storage engine for MongoDB to accelerate the queries and reduce the memory usage. An FPGA-based query accelerator is deployed to speed up the queries while hot data is migrated from memory to SSD to reduce memory occupancy by our storage engine. Moreover, multiple query tasks of MongoDB are performed in parallel and query conditions are parameterized to support diversified queries. Based on TPC-H benchmark and Tencent data set, experimental results demonstrate that our storage engine can achieve higher query efficiency (saving up to 63.5% time overhead) and lower memory occupancy (reducing up to 73.4% memory usage) compared with traditional MongoDB.
This paper presents an approach to system-level optimization of error detection implementation in the context of fault-tolerant realtime distributed embedded systems used for safety-critical applications. An applicati...
详细信息
ISBN:
(纸本)9781605589053
This paper presents an approach to system-level optimization of error detection implementation in the context of fault-tolerant realtime distributed embedded systems used for safety-critical applications. An application is modeled as a set of processes communicating by messages. Processes are mapped on computation nodes connected to the communication infrastructure. To provide resiliency against transient faults, efficient error detection and recovery techniques have to be employed. Our main focus in this paper is on the efficient implementation of the error detection mechanisms. We have developed techniques to optimize the hardware/software implementation of error detection, in order to minimize the global worst-case schedule length, while meeting the imposed hardware cost constraints and tolerating multiple transient faults. We present two design optimization algorithms which are able to find feasible solutions given a limited amount of resources: the first one assumes that, when implemented in hardware, error detection is deployed on static reconfigurable FPGAs, while the second one considers partial dynamic reconfiguration capabilities of the FPGAs.
This paper presents a correct-by-construction synthesis method for generating operating system based device drivers from a formally specified device behavior model. Existing driver development is largely manual using ...
详细信息
ISBN:
(纸本)1581137427
This paper presents a correct-by-construction synthesis method for generating operating system based device drivers from a formally specified device behavior model. Existing driver development is largely manual using an ad-hoc design methodology. Consequently, this task is error prone and becomes a bottleneck in embedded system design methodology. Our solution to this problem starts by accurately specifying device access behavior with a formal model, viz. extended event driven finite state machines. We state easy to check soundness conditions on the model that subsequently guarantee properties such as bounded execution time and deadlock-free behavior. We design a deadlock-free resource accessing scheme for our device access model. Finally, we synthesize an operating system (OS) based event processing mechanism, which is the core of the device driver, using a disciplined methodology that assures the correctness of the resulting driver. We validate our synthesis method using two case studies: an infrared port and the USB device controller for an SA I 100 based handheld. Besides assuring a correct-by -construction driver, the size of the specification is 70% smaller than a manually written driver, which is a strong indicator of improved design productivity.
We present a co-synthesis approach that accelerates reactive software processing by moving the calculation of complex expressions into external combinational hardware. The starting point is a system model written in t...
详细信息
ISBN:
(纸本)9781595938244
We present a co-synthesis approach that accelerates reactive software processing by moving the calculation of complex expressions into external combinational hardware. The starting point is a system model written in the synchronous language Esterel, which can be mapped to both hardware and software. Our approach performs the partitioning at the source-code level and preserves the original, strictly synchronous semantics. It is thus platform-independent and allows to use standard simulation and synthesis tools. Furthermore, the source-level partitioning approach presented here should be applicable to non-reactive processing platforms as well. However, the challenge is to partition the program without changing its meaning under any circumstances. In particular, signal scopes and inter-partition signal dependencies must be maintained, which rules out a naive top-level partitioning. We have implemented the co-synthesis approach based on the Columbia Esterel Compiler and have validated it on the Kiel Esterel Processor. As the experimental results confirm, this can significantly reduce execution times and energy consumption per reaction, with minimal additional hardware requirements. Copyright 2007 ACM.
暂无评论