This paper proposes an out-of-order instruction commit mechanism using a novel compiler/architecture interface. The compiler creates instruction "blocks" guaranteeing some commit conditions and the processor...
详细信息
This paper proposes an out-of-order instruction commit mechanism using a novel compiler/architecture interface. The compiler creates instruction "blocks" guaranteeing some commit conditions and the processor uses the block information to commit certain instructions out of order. Micro-architectural support for the new commit mode is made on top of the standard, ROB-based processor and includes out-of-order instruction commit with register and load queue entry release. The commit mode may be switched multiple times during execution. Initial results for a 4-wide processor show that, on average, 52% instructions are committed out of order resulting in 10% to 26% speedups over in-order commit, with minimal hardware overhead. The performance improvement is a result of an effectively larger instruction window that allows more cache misses to be overlapped for both L1 and L2 caches.
New standards in signal, multimedia, and network processing for embedded electronics are characterized by computationally intensive algorithms, high flexibility due to the swift change in specifications. In order to m...
详细信息
New standards in signal, multimedia, and network processing for embedded electronics are characterized by computationally intensive algorithms, high flexibility due to the swift change in specifications. In order to meet demanding challenges of increasing computational requirements and stringent constraints on area and power consumption in fields of embedded engineering, there is a gradual trend towards coarse-grained parallel embedded processors. Furthermore, such processors are enabled with dynamic reconfiguration features for supporting time- and space-multiplexed execution of the algorithms. However, the formidable problem in efficient mapping of applications (mostly loop algorithms) onto such architectures has been a hindrance in their mass acceptance. In this paper we present (a) a highly parameterizable, tightly coupled, and reconfigurable parallel processor architecture together with the corresponding power breakdown and reconfiguration time analysis of a case study application, (b) a retargetable methodology for mapping of loop algorithms, (c) a co-design framework for modeling, simulation, and programming of such architectures, and (d) loosely coupled communication with host processor. (C) 2008 Elsevier B.V. All rights reserved.
With the term architecture/compilerco-exploration, we denote the problem of simultaneously optimizing an application-specific instruction set processor (ASIP) architecture as well as its generated compiler. In this p...
详细信息
With the term architecture/compilerco-exploration, we denote the problem of simultaneously optimizing an application-specific instruction set processor (ASIP) architecture as well as its generated compiler. In this paper, we characterize the design space of both compiler frontend (intermediate code optimization) and backend (changes of the machine model) and present the workflow of our framework BUILDABONG. The project consists of four phases: (a) architecture entry and composition, (b) automatic simulator generation, (c) compiler generation (in particular, retargeting), and (d) automatic architecture/compilerdesign space exploration. We demonstrate the feasibility of our approach by a detailed case study.
暂无评论