ISBN: 0780394518 (print)
The constant increase in gate capacity and performance of configurable hardware chips has made it possible to implement systems-on-chip (SoC) able to tackle the demanding requirements of many embedded systems. In this paper, we propose an approach to the design space exploration of a configurable SoC (CSoC) platform based on a network-on-chip (NoC) architecture for the execution of dataflow-dominated embedded systems. The approach has been validated with the design of a color image compression algorithm in an FPGA.
ISBN: 3540400567 (print)
Post-link and dynamic optimizations have become important to achieve program performance. A major challenge in post-link and dynamic optimizations is the acquisition of registers for inserting optimization code in the main program. It is difficult to achieve both correctness and transparency when software-only schemes for acquiring registers are used, as described in [1]. We propose an architecture feature that builds upon existing hardware for stacked register allocation on the Itanium processor. The hardware impact of this feature is minimal, and it allows post-link and dynamic optimization systems to obtain registers for optimization in a "safe" manner, thus preserving the transparency and improving the performance of these systems.
ISBN: 3540400567 (print)
New trends in the space industry, e.g. the development of wireless networked constellations using miniaturized satellites, have generated a pressing need for condition-based maintenance, self-repair and upgrade capabilities on board satellites. This can be achieved by using reconfigurable hardware technologies, such as high-density Field Programmable Gate Arrays, to implement an entire on-board computer on a single chip. In this paper we present a system-on-chip architecture for on-board partial run-time reconfiguration that enables system-level functional changes on board satellites, ensuring correct operation, longer life and higher quality of service.
ISBN: 3540400567 (print)
This paper introduces LEAP (Loop Engine on Array Processor), a novel coarse-grained reconfigurable architecture that accelerates applications through the Loop Self-Pipelining (LSP) technique. LSP provides an effective execution mode for application pipelining. By mapping and distributing the expression statements of high-level programming languages onto an array of processing elements, LEAP can step through loop iterations automatically. The LEAP architecture has no centralized control, no centralized multi-port registers and no centralized data memory. LEAP can exploit loop-level, instruction-level, and task-level parallelism, and it is a suitable choice for stream-based application domains such as multimedia, DSP and graphics applications.
ISBN: 3540400567 (print)
Instruction hints have become an important way to communicate compile-time information to the hardware. They can be generated by the compiler and the post-link optimizer to reduce cache misses, improve branch prediction and minimize other performance bottlenecks. This paper discusses the different instruction hints available on modern processor architectures and shows their potential performance impact on many benchmark programs. Some hints can be effectively selected at compile time with profile feedback. However, since the same program executable can behave differently on various inputs and performance bottlenecks may change on different micro-architectures, significant performance opportunities can be exploited by selecting instruction hints dynamically.
ISBN: 0780394518 (print)
In SoC designs, efficient communication between the hardware IPs and the on-chip processor is very important; however, the interface is usually affected by the processor core specification. Thus, in this paper, we focus on developing an efficient interface circuit architecture for communication between the on-chip processor and embedded hardware IP cores. We also propose a method to synthesize it. Experimental results from the design of an MPEG-4 encoding application show that our method obtains optimal interface circuits and works well.
ISBN: 3540400567 (print)
In this paper, an open performance model framework, PMPS(n), and a realization of this framework, PMPS(3), which covers memory, I/O and network, are presented and used to predict the runtime of NPB benchmarks on a P4 cluster. The experimental results demonstrate that PMPS(3) works much better than PERC for I/O-intensive applications and does as well as PERC for memory-intensive applications. Further analysis indicates that the results of the performance model can be influenced by data correlations, control correlations and operation overlaps, which must be considered in the models to improve prediction precision. The experimental results also show that PMPS(n) has great scalability.
ISBN: 3540400567 (print)
Bypass delays are expected to grow beyond 1 ns as technology scales. These delays necessitate pipelining of bypass paths at processor frequencies above 1 GHz and thus affect the performance of sequential code sequences. We propose dealing with these delays through a dynamic functional unit chaining approach. We study the performance benefits of a superscalar, out-of-order processor augmented with a two-by-two array of ALUs interconnected by a fast, partial bypass network. An online profiler guides the automatic configuration of the network to accelerate specific patterns of dependent instructions. A detailed study of benchmark simulations demonstrates that these first steps towards mapping binaries to a small coarse-grained array at runtime can improve instruction throughput by over 18% and 25% when the microarchitecture includes bypass delays of one cycle and two cycles, respectively.
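The abstract above does not say how the online profiler finds patterns of dependent instructions. As a rough illustration only, the sketch below counts producer/consumer opcode pairs linked by a register dependence in a short trace. The trace format, the register names and the function `dependent_pairs` are all assumptions made for this example, not the paper's mechanism.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical trace entry: (opcode, destination register, source registers).
Instr = Tuple[str, str, Tuple[str, ...]]


def dependent_pairs(trace: List[Instr]) -> Counter:
    """Count (producer opcode, consumer opcode) pairs linked by a register dependence."""
    last_writer: Dict[str, str] = {}  # register -> opcode of its most recent writer
    counts: Counter = Counter()
    for opcode, dest, srcs in trace:
        for src in srcs:
            if src in last_writer:
                counts[(last_writer[src], opcode)] += 1
        last_writer[dest] = opcode
    return counts


# Illustrative trace: two add -> shl chains joined by one shl -> add dependence.
trace = [
    ("add", "r1", ("r2", "r3")),
    ("shl", "r4", ("r1",)),
    ("add", "r5", ("r4", "r6")),
    ("shl", "r7", ("r5",)),
]
pairs = dependent_pairs(trace)
hottest = pairs.most_common(1)[0]
```

In this toy trace the most frequent pair is `("add", "shl")`, which is the kind of pattern a chaining scheme might choose to map onto adjacent ALUs.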
ISBN: 3540400567 (print)
Memory bandwidth and interface flexibility are often bottlenecks of embedded processors, and memory bandwidth optimization has become an active research topic. This paper introduces four new bandwidth optimization methods for the External Memory Control Interface (EMCI) integrated in high-performance digital signal processors (DSPs), aiming at maximum data transmission throughput and architectural flexibility: a programmable and decoupled structure, pipelined transmission in burst mode, programmable priority for arbitration, and preferential reading based on cache-line offset. The experimental results show that the performance improvement is remarkable, but differs between synchronous and asynchronous memories and depends on application behavior. The decoupled structure proves to be of great benefit to architectural exploration and optimization for DSPs.
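Of the four methods listed above, programmable priority for arbitration is the easiest to picture in software. The following is a minimal behavioural sketch, assuming a priority table that software can rewrite at run time. The requester names, the "lower number wins" convention and the function `arbitrate` are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, Optional


def arbitrate(requests: Iterable[str], priority: Dict[str, int]) -> Optional[str]:
    """Grant the pending requester with the highest programmed priority.

    Lower numbers mean higher priority (an assumption of this sketch);
    ties are broken by name so the result is deterministic.
    """
    pending = [r for r in requests if r in priority]
    if not pending:
        return None
    return min(pending, key=lambda r: (priority[r], r))


# Hypothetical requesters; the dict stands in for a software-writable priority register.
priority = {"dma": 2, "icache": 0, "dcache": 1}
first = arbitrate(["dma", "dcache"], priority)

# Reprogramming the table changes the grant without changing the arbiter logic.
priority["dma"] = 0
second = arbitrate(["dma", "dcache"], priority)
```

With the initial table the data cache request is granted first. After the DMA priority is reprogrammed, the same pair of requests is resolved in favour of DMA.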
ISBN: 0769522459 (print)
Existing Architecture Description Languages (ADLs) have the advantage of formally specifying the architecture of component-based systems. However, ADLs have not come into extensive use in industry, because ADL users must learn a distinct notation specific to architecture, and ADLs do not address all the stakes of a development process that grows more diverse every day. On the other hand, UML is a de facto standard general-purpose modeling language for software development, as it provides a consistent notation and various supporting tools throughout the whole software development cycle. A number of studies on UML-based architecture modeling have been carried out. In particular, many research results have been introduced that specialize UML through its extension mechanism in order to explicitly represent core architecture concepts that UML does not fully support. UML 2.0 embraces many more concepts that are important to architecture modeling than UML 1.x. In this paper, we examine the architecture modeling elements that can be represented in UML 2.0 and discuss how to extend and specialize UML 2.0 in order to make it more suitable for representing architectures.