ISBN: 3540364102 (print)
A scalable, distributed micro-architecture is presented that emphasizes high-performance computing for digital signal processing applications by combining high-frequency design techniques with a very high degree of on-chip parallel processing. The architecture is based on a superscalar processor model with out-of-order execution that supports specialized, complex DSP function units and simultaneous instruction issue from multiple independent threads (SMT). Consistent application of fine-grained clustering reduces the cycle time of wire-sensitive building blocks of the processor, such as the register file, and leads to a distributed architecture model in which independent thread processing units, ALUs, register files and memories are distributed across the chip and communicate with each other over special-purpose networks, forming a "network-on-a-chip" (NOC) [1]. The communication protocol is a modified version of Tomasulo's scheme [2], extended to eliminate all central control structures for the data flow and to support multithreading. The performance of the architecture scales with both the number of function units and the number of thread units without any impact on the processor's cycle time.
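For readers unfamiliar with the baseline protocol, the following Python sketch shows the textbook, centralized form of Tomasulo's scheme that the abstract says was modified: reservation stations wait on producer tags, and a result broadcast on a common data bus wakes up dependent instructions. It is not the paper's distributed, multithreaded variant, which the abstract does not detail; the class and method names are hypothetical, and only `add` and `mul` are modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Station:
    """One reservation station entry (classic Tomasulo fields)."""
    tag: str
    op: str
    vj: Optional[int] = None  # first operand value, once available
    vk: Optional[int] = None  # second operand value, once available
    qj: Optional[str] = None  # tag of the station that will produce vj
    qk: Optional[str] = None  # tag of the station that will produce vk

    def ready(self) -> bool:
        return self.qj is None and self.qk is None


class TomasuloSketch:
    def __init__(self, regs: Dict[str, int]):
        self.regs = dict(regs)             # architectural register values
        self.reg_tag: Dict[str, str] = {}  # register -> tag of pending producer
        self.stations: List[Station] = []
        self._next = 0

    def _operand(self, reg: str) -> Tuple[Optional[int], Optional[str]]:
        # Either the value is available now, or we wait on a producer's tag.
        if reg in self.reg_tag:
            return None, self.reg_tag[reg]
        return self.regs[reg], None

    def issue(self, op: str, dst: str, src1: str, src2: str) -> str:
        tag = f"RS{self._next}"
        self._next += 1
        vj, qj = self._operand(src1)
        vk, qk = self._operand(src2)
        self.stations.append(Station(tag, op, vj, vk, qj, qk))
        self.reg_tag[dst] = tag  # renaming: later readers of dst wait on tag
        return tag

    def broadcast(self, tag: str, value: int) -> None:
        # Common data bus: every waiting station and register snoops the tag.
        for s in self.stations:
            if s.qj == tag:
                s.vj, s.qj = value, None
            if s.qk == tag:
                s.vk, s.qk = value, None
        for reg, t in list(self.reg_tag.items()):
            if t == tag:
                self.regs[reg] = value
                del self.reg_tag[reg]

    def step(self) -> Optional[str]:
        # Execute the oldest ready station; a station still waiting on
        # operands does not block younger ready ones.
        for s in self.stations:
            if s.ready():
                self.stations.remove(s)
                result = s.vj + s.vk if s.op == "add" else s.vj * s.vk
                self.broadcast(s.tag, result)
                return s.tag
        return None


cpu = TomasuloSketch({"r1": 2, "r2": 3})
cpu.issue("add", "r3", "r1", "r2")  # RS0
cpu.issue("mul", "r4", "r3", "r1")  # RS1 waits on RS0's tag, not on r3
while cpu.step():
    pass
```

Keeping the register-to-tag table and the broadcast in one place is exactly the kind of central control structure the abstract says the modified protocol eliminates.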
ISBN: 3540364102 (print)
We present a highly efficient automated clock-gating platform for rapidly developing power-efficient hardware architectures. Our language, called CoDeL, allows hardware description at the algorithm level and thus dramatically reduces design time. We have extended CoDeL to automatically insert clock gating at the behavioral level to reduce dynamic power dissipation in the resulting architecture. To our knowledge, this is the first hardware design environment that allows an algorithmic description of a component and yet produces a power-aware design. To estimate the power savings, we have developed an estimation framework, which is shown to be consistent with the power savings obtained from statistical power analysis with Synopsys tools. To evaluate our platform, we use CoDeL implementations of a counter and of various integer transforms used in DSP (digital signal processing): the discrete wavelet transform, the discrete cosine transform, and an integer transform used in the H.264 (MPEG-4 Part 10) video compression standard. These designs are then clock gated using CoDeL and Synopsys. A simulation-based power analysis of the resulting circuits shows that CoDeL's clock gating performs better than Synopsys' automated clock gating: CoDeL reduces power dissipation by 83% on average, while Synopsys achieves 81% savings.
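As a rough illustration of why behavioral-level clock gating saves dynamic power, the Python sketch below models a counter whose register is clocked every cycle (ungated) or only when its enable is asserted (gated), and counts flip-flop clock events as a proxy for clock power. This is an assumption-laden toy model, not CoDeL's gating-insertion algorithm or Synopsys's; `Counter`, `compare` and the activity pattern are hypothetical.

```python
class Counter:
    """Behavioral model of a width-bit counter with an enable input.

    clocked_bits counts flip-flop clock events, used here as a rough proxy
    for clock-tree dynamic power (an assumption of this sketch)."""

    def __init__(self, width: int, gated: bool):
        self.width = width
        self.gated = gated
        self.value = 0
        self.clocked_bits = 0

    def tick(self, enable: bool) -> None:
        if self.gated and not enable:
            return  # clock gated off: flip-flops hold state, no clock events
        self.clocked_bits += self.width
        if enable:
            self.value = (self.value + 1) % (1 << self.width)


def compare(enables, width=8):
    """Return the fraction of clock events removed by gating."""
    plain, gated = Counter(width, gated=False), Counter(width, gated=True)
    for en in enables:
        plain.tick(en)
        gated.tick(en)
    assert plain.value == gated.value  # gating must not change behavior
    return 1 - gated.clocked_bits / plain.clocked_bits


# Enable asserted in 1 of every 5 cycles (a hypothetical activity pattern).
pattern = [cycle % 5 == 0 for cycle in range(100)]
savings = compare(pattern)
```

With the enable asserted in 20% of cycles, this model reports 80% fewer clock events; that figure follows from the toy pattern and does not reproduce the paper's measured results.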
ISBN: 1424401550 (print)
The cache memory plays a crucial role in the performance of any processor. Cache memory (SRAM), especially on-chip cache, is 3-4 times faster than main memory (DRAM) and can vastly improve processor performance and speed. The cache also consumes much less energy than main memory, which yields large power savings, a very important property for embedded applications. In today's processors, although the cache memory reduces the energy consumption of the processor, the on-chip cache still accounts for almost 40% of the processor's total energy consumption. In this paper, we propose an instruction cache architecture that is a modification of the hotspot architecture. Our proposed architecture consists of a small filter cache in parallel with the hotspot cache, between the L1 cache and the main memory. The small filter cache holds the code that is not captured by the hotspot cache. We also propose a prediction mechanism to steer each memory access to the hotspot cache, the filter cache, or the L1 cache. Our design has both a faster access time and lower energy consumption than both the filter cache and the hotspot cache architectures. We use the MiBench and MediaBench benchmarks together with the SimpleScalar simulator to evaluate the performance of our proposed architecture and to compare it with the filter cache and hotspot cache architectures. The simulation results show that our design outperforms both the filter cache and the hotspot cache in both average memory access time and energy consumption.
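The abstract does not describe how the prediction mechanism works, so the Python sketch below uses a hypothetical one to show the general idea of steering a fetch: a per-block table remembers which structure to probe first, blocks start in the filter cache, and a block is promoted to the hotspot cache once it has been accessed `hot_threshold` times. Cache sizes, the promotion threshold, the energy costs and the always-hit L1 are all assumptions of this sketch.

```python
from collections import OrderedDict, defaultdict


class SmallCache:
    """Fully associative LRU store of instruction-block addresses."""

    def __init__(self, blocks: int):
        self.blocks = blocks
        self.lines = OrderedDict()

    def hit(self, addr: int) -> bool:
        if addr in self.lines:
            self.lines.move_to_end(addr)  # mark as most recently used
            return True
        return False

    def fill(self, addr: int) -> None:
        self.lines[addr] = None
        self.lines.move_to_end(addr)
        if len(self.lines) > self.blocks:
            self.lines.popitem(last=False)  # evict least recently used


# Hypothetical per-probe energy costs (arbitrary units).
ENERGY = {"hotspot": 1.0, "filter": 1.0, "l1": 5.0}


class SteeredFetch:
    def __init__(self, hot_blocks=4, filter_blocks=4, hot_threshold=3):
        self.caches = {"hotspot": SmallCache(hot_blocks),
                       "filter": SmallCache(filter_blocks)}
        self.count = defaultdict(int)  # accesses seen per block
        self.predict = {}              # block -> structure to probe first
        self.hot_threshold = hot_threshold
        self.energy = 0.0
        self.mispredicts = 0

    def fetch(self, block: int) -> str:
        """Return the name of the structure that supplied the block."""
        self.count[block] += 1
        guess = self.predict.get(block, "l1")
        if guess != "l1":
            self.energy += ENERGY[guess]
            if self.caches[guess].hit(block):
                if guess == "filter" and self.count[block] >= self.hot_threshold:
                    self.caches["hotspot"].fill(block)  # promote a hot block
                    self.predict[block] = "hotspot"
                return guess
            self.mispredicts += 1  # wrong guess: pay the probe, then go to L1
        self.energy += ENERGY["l1"]  # L1 assumed to always hit in this sketch
        target = "hotspot" if self.count[block] >= self.hot_threshold else "filter"
        self.caches[target].fill(block)
        self.predict[block] = target
        return "l1"


fetcher = SteeredFetch()
sources = [fetcher.fetch(b) for b in [0, 1, 2] * 5]  # a small hot loop
```

For this small loop, 12 of the 15 fetches are served by the small caches and the modeled energy is 27 units, versus 75 units if every fetch went to L1; the numbers only reflect the assumed costs.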
ISBN: 3540343040 (print)
The development of increasingly complex embedded systems is a very challenging task for EDA experts, due to their mixed HW/SW nature combined with high demands for quality and reliability. Recently, both industrial engineers and academic researchers have developed a very large number of dynamic verification techniques based on co-simulation, which in particular address the different nature of the hardware and software components of an embedded system. However, no widely accepted methodology exists. Thus, this paper aims to provide a general view of simulation-based modeling and verification strategies for developing embedded systems. In particular, the paper focuses on state-of-the-art co-simulation approaches and on verification strategies based on fault simulation and assertion checking.
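Assertion checking, one of the verification strategies the survey covers, can be illustrated with a simple temporal property checked over a simulation trace. The Python sketch below assumes a hypothetical request/acknowledge handshake and checks that every request is acknowledged within `max_latency` cycles; production flows typically state such properties in an assertion language such as PSL or SystemVerilog Assertions and check them inside the simulator.

```python
from typing import List, Tuple


def check_req_ack(trace: List[Tuple[int, int]], max_latency: int) -> List[int]:
    """Assertion: every cycle with req=1 is followed by ack=1 within
    max_latency cycles. trace[t] is the (req, ack) pair sampled at cycle t.
    Returns the cycles whose request violates the assertion."""
    failures = []
    for t, (req, _ack) in enumerate(trace):
        if req:
            window = trace[t + 1 : t + 1 + max_latency]
            if not any(ack for _req, ack in window):
                failures.append(t)
    return failures


# Hypothetical trace: the first request is acknowledged two cycles later,
# the second request at cycle 3 is never acknowledged.
trace = [(1, 0), (0, 0), (0, 1), (1, 0), (0, 0), (0, 0), (0, 0)]
violations = check_req_ack(trace, max_latency=2)
```

Checking the property on every simulated cycle, rather than only inspecting final outputs, is what lets assertion-based verification localize the failing cycle.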
ISBN: 3540390944 (print)
Tomorrow's embedded devices need to run high-resolution multimedia applications that demand enormous computational power under a very tight energy consumption constraint. In this context, the register file is one of the key sources of power consumption, and its inappropriate design and management can severely affect system performance. In this paper, we present a new approach to reduce the energy of the shared register file in upcoming embedded VLIW architectures with several processing units. It is based on a set of hardware extensions and a compiler-based, energy-aware register assignment algorithm that together enable parts of the register file (i.e., sub-banks) to be activated and deactivated independently at run time, and that can easily be included in these embedded architectures. Energy savings of up to 60% can be obtained in the register file without any performance penalty.
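The abstract does not give the register assignment algorithm, so the Python sketch below only illustrates the principle it relies on: if the compiler packs simultaneously live values into as few sub-banks as possible, the remaining sub-banks can be deactivated. Here a greedy lowest-free-register interval assignment stands in for the paper's energy-aware algorithm; the sub-bank size, the live ranges and the function names are hypothetical.

```python
import bisect


def assign_registers(intervals, num_regs):
    """Greedy interval assignment that always takes the lowest free register,
    packing live values into the lowest-numbered sub-banks.

    intervals maps a value name to its half-open live range [start, end).
    Assumes no spilling is needed (at most num_regs values live at once)."""
    order = sorted(intervals.items(), key=lambda kv: kv[1][0])
    free = list(range(num_regs))  # kept sorted, lowest index first
    live = []                     # (end, reg) of currently allocated values
    assignment = {}
    for name, (start, end) in order:
        for item in [x for x in live if x[0] <= start]:
            live.remove(item)
            bisect.insort(free, item[1])  # register becomes reusable
        reg = free.pop(0)
        assignment[name] = reg
        live.append((end, reg))
    return assignment


def active_banks(intervals, assignment, bank_size, t):
    """Sub-banks that must be powered at time t; all others can be switched off."""
    return {assignment[n] // bank_size
            for n, (start, end) in intervals.items() if start <= t < end}


ranges = {"a": (0, 4), "b": (0, 2), "c": (2, 6), "d": (4, 8)}
regs = assign_registers(ranges, num_regs=8)
```

With sub-banks of two registers, the four values fit in sub-bank 0 at every time step, so in an 8-register file the other three sub-banks could stay off. A naive assignment that spread the values across sub-banks would keep more of them powered.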
ISBN: 9780769526270 (print)
The ability to enhance single-thread performance, for example by increasing clock frequency, is reaching a point of diminishing returns: power is becoming a dominant factor and is limiting scalability. Adding more cores is a scalable way to increase performance, but it requires that system designers have a method for developing multi-threaded applications. Plasma (Parallel LAnguage for System modeling and Analysis) is a parallel language for system modeling and multi-threaded application development, implemented as a superset of C++. The language extensions are based on those found in Occam, which is in turn based on CSP (Communicating Sequential Processes) by C. A. R. Hoare. The goal of the Plasma project is to investigate whether a language with the appropriate constructs can ease the task of developing highly multi-threaded software. In addition, through the inclusion of a discrete-event simulation API, we seek to simplify system modeling and to increase productivity through a clearer representation and more compile-time checking of the aspect of system models that is hardest to get right: concurrency. The result is a single language that allows users to develop a parallel application and then model it within the context of a system, enabling hardware-software partitioning and various other early trade-off analyses. We believe that this language offers a simpler and more concise syntax than other offerings and can be targeted at a wide range of architectures, including heterogeneous systems and those without shared memory.
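Plasma's own C++ syntax is not shown in the abstract, so the Python sketch below only illustrates the Occam/CSP concepts it builds on: processes that share no state and communicate over synchronous, point-to-point channels, composed in parallel with a PAR-like construct. `Channel`, `par` and the `None` end-of-stream convention are hypothetical names for this sketch, not Plasma's API.

```python
import queue
import threading


class Channel:
    """Synchronous (unbuffered) point-to-point channel, as in Occam/CSP:
    send() blocks until the single receiver has taken the value."""

    def __init__(self):
        self._data = queue.Queue(maxsize=1)
        self._ack = queue.Queue(maxsize=1)

    def send(self, value):
        self._data.put(value)
        self._ack.get()  # rendezvous: wait until the receiver has the value

    def recv(self):
        value = self._data.get()
        self._ack.put(None)  # release the blocked sender
        return value


def par(*procs):
    """Occam-style PAR: run (function, args) processes concurrently, wait for all."""
    threads = [threading.Thread(target=f, args=a) for f, a in procs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def producer(out, n):
    for i in range(n):
        out.send(i)
    out.send(None)  # end-of-stream marker (a convention of this sketch)


def squarer(inp, out):
    while (x := inp.recv()) is not None:
        out.send(x * x)
    out.send(None)


def run_pipeline(n):
    a, b = Channel(), Channel()
    results = []

    def consumer():
        while (y := b.recv()) is not None:
            results.append(y)

    par((producer, (a, n)), (squarer, (a, b)), (consumer, ()))
    return results
```

Because each process touches only its own locals and its channels, there is no shared mutable state to lock; this is the property a language with CSP constructs can check at compile time.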
ISBN: 3540343830 (print)
The Dynamic Data Driven Application Systems (DDDAS) concept entails the ability to dynamically incorporate data into an executing application simulation and, conversely, the ability of applications to dynamically steer measurement processes. Such dynamic data inputs can be acquired online in real time, or they can be archival data. DDDAS offers the promise of improving modeling methods, augmenting the analysis and prediction capabilities of application simulations, and improving the efficiency of simulations and the effectiveness of measurement systems. The present workshop provides examples of research and technology advances that enable DDDAS capabilities.
ISBN: 3540333800 (print)
Currently there is a wide variety of tools for agent-based simulation that can be applied to the understanding of social phenomena. Describing such phenomena with a visual language can make these tools easier to use for people who are experts in the social sciences rather than in computer programming. To this end, we propose such a visual language, based on well-established concepts of agent-oriented software engineering and, more concretely, on the INGENIAS methodology. The proposed language is independent of any particular simulation platform, and by using INGENIAS code generation support, it is possible to generate implementations for the desired target platforms. We also consider that modeling should be application-domain oriented and that a generic language by itself does not suffice. Thus, we conclude by discussing how domain-specific simulation environments could be achieved.
ISBN: 3540390944 (print)
The search for energy efficiency in the design of embedded systems is leading toward CPUs with higher instruction-level and data-level parallelism. Unfortunately, individual applications do not have sufficient parallelism to keep all these CPU resources busy. Since embedded systems often consist of multiple tasks, task-level parallelism can be used for this purpose. Simultaneous multi-threading (SMT) has proved a valuable technique for doing so in high-performance systems, but it cannot be afforded in systems with tight energy budgets. Moreover, it does not exploit data-level parallel hardware, nor does it exploit the available information on threads. We propose software SMT (SW-SMT), a technique that exploits task-level parallelism to improve the utilization of both instruction-level and data-level parallel hardware, thereby improving performance. The technique compiles multiple threads simultaneously at design time and includes a run-time selection of the most efficient mixes. We have applied the technique to two major blocks of an SDR (software-defined radio) application, achieving energy gains of up to 46% on different ILP and DLP architectures. We show that the potential of SW-SMT increases with SIMD datapath size and VLIW issue width.
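The abstract states that SW-SMT compiles thread mixes at design time and selects the most efficient mix at run time, without giving the selection criterion. The Python sketch below assumes a simple one: among the precompiled mixes whose threads are all ready, choose the one with the lowest energy per unit of work. The `Mix` record, the metric and the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Mix:
    threads: FrozenSet[str]  # threads compiled together into one schedule
    energy: float            # energy per run of the mix (hypothetical units)
    work: int                # thread iterations completed by one run


def select_mix(mixes: List[Mix], ready: Set[str]) -> Optional[Mix]:
    """Run-time step: among design-time mixes whose threads are all ready,
    pick the one with the lowest energy per unit of work."""
    candidates = [m for m in mixes if m.threads <= ready]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m.energy / m.work)


# Hypothetical design-time table for two threads A and B.
MIXES = [
    Mix(frozenset({"A"}), energy=10.0, work=1),
    Mix(frozenset({"B"}), energy=12.0, work=1),
    Mix(frozenset({"A", "B"}), energy=16.0, work=2),
]
```

When both threads are ready, the combined mix wins (8 units of energy per unit of work versus 10 and 12); when only one thread is ready, the selector falls back to that thread's single-thread schedule.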