Modern superscalar processors execute instructions out-of-order, but commit them in program order to provide precise exception handling and safe instruction retirement. However, in-order instruction commit is highly c...
详细信息
ISBN:
(纸本)9781450383172
Modern superscalar processors execute instructions out-of-order, but commit them in program order to provide precise exception handling and safe instruction retirement. However, in-order instruction commit is highly conservative and holds on to critical resources far longer than necessary, severely limiting the reach of general-purpose processors, ultimately reducing performance. Solutions that allow for efficient, early reclamation of these critical resources could seize the opportunity to improve performance. One such solution is out-of-order commit, which has traditionally been challenging due to inefficient, complex hardware used to guarantee safe instruction retirement and provide precise exception handling. In this work, we present NOREBA, a processor for Non-speculative out-of-order Retirement via Branch Re convergence Analysis. In NOREBA, we enable non-speculative out-of-order commit and resource reclamation in a light-weight manner, improving performance and efficiency. We accomplish this through a combination of (1) automatic compiler annotation of true branch dependencies, and (2) an efficient re-design of the reorder buffer from traditional processors. By exploiting compiler branch dependency information, this system achieves 95% of the performance of aggressive, speculative solutions, without any additional speculation, and while maintaining energy efficiency.
out-of-order execution is essential for high performance, general-purpose computation, as it can find and execute useful work instead of stalling. However, it is typically limited by the requirement of visibly sequent...
详细信息
out-of-order execution is essential for high performance, general-purpose computation, as it can find and execute useful work instead of stalling. However, it is typically limited by the requirement of visibly sequential, atomic instruction executionin other words, in-order instruction commit. While in-ordercommit has a number of advantages, such as providing precise interrupts and avoiding complications with the memory consistency model, it requires the core to hold on to resources (reorder buffer entries, load/store queue entries, physical registers) until they are released in program order. In contrast, out-of-order commit can release some resources much earlier, yielding improved performance and/or lower resource requirements. Non-speculative out-of-order commit is limited in terms of correctness by the conditions described in the work of Bell and Lipasti (2004). In this paper we revisit out-of-order commit by examining the potential performance benefits of lifting these conditions one by one and in combination, for both non-speculative and speculative out-of-order commit. While correctly handling recovery for all out-of-order commit conditions currently requires complex tracking and expensive checkpointing, this work aims to demonstrate the potential for selective, speculative out-of-order commit using an oracle implementation without speculative rollback costs. Through this analysis of the potential of out-of-order commit, we learn that: a) there is significant untapped potential for aggressive variants of out-of-order commit;b) it is important to optimize the out-of-order commit depth for a balanced design, as smaller cores benefit from reduced depth while larger cores continue to benefit from deeper designs;c) the focus on implementing only a subset of the out-of-order commit conditions could lead to efficient implementations;d) the benefits of out-of-order commit increases with higher memory latency and in conjunction with prefetching;e) out-of-order commi
Modern out-of-order processors call for more aggressive scheduling techniques such as priority scheduling and out-of-order commit to make use of increasing core resources. Since these approaches prioritize the issue o...
详细信息
ISBN:
(纸本)9798400700958
Modern out-of-order processors call for more aggressive scheduling techniques such as priority scheduling and out-of-order commit to make use of increasing core resources. Since these approaches prioritize the issue or commit of certain instructions, they face the conundrum of providing the capacity efficiency of scheduling structures while preserving the ideal ordering of instructions. Traditional collapsible queues are too expensive for today's processors, while state-of-the-art queue designs compromise with the pseudo-ordering of instructions, leading to performance degradation as well as other limitations. In this paper, we present Orinoco, a microarchitecture/circuit co-design that supports ordered issue and unordered commit with non-collapsible queues. We decouple the temporal ordering of instructions from their queue positions by introducing an age matrix with the bit count encoding, along with a commit dependency matrix and a memory disambiguation matrix to determine instructions to prioritize issue or commit. We leverage the Processingin-Memory (PIM) approach and efficiently implement the matrix schedulers as 8T SRAM arrays. Orinoco achieves an average IPC improvement of 14.8% over the baseline in-ordercommit core with the state-of-the-art scheduler while incurring overhead equivalent to a few kilobytes of SRAM.
This paper proposes an out-of-order instruction commit mechanism using a novel compiler/architecture interface. The compiler creates instruction "blocks" guaranteeing some commit conditions and the processor...
详细信息
This paper proposes an out-of-order instruction commit mechanism using a novel compiler/architecture interface. The compiler creates instruction "blocks" guaranteeing some commit conditions and the processor uses the block information to commit certain instructions out of order. Micro-architectural support for the new commit mode is made on top of the standard, ROB-based processor and includes out-of-order instruction commit with register and load queue entry release. The commit mode may be switched multiple times during execution. Initial results for a 4-wide processor show that, on average, 52% instructions are committed out of order resulting in 10% to 26% speedups over in-ordercommit, with minimal hardware overhead. The performance improvement is a result of an effectively larger instruction window that allows more cache misses to be overlapped for both L1 and L2 caches.
In Total Store order memory consistency (TSO), loads can be speculatively reordered to improve performance. If a load-load reordering is seen by other cores, speculative loads must be squashed and re-executed. In arch...
详细信息
ISBN:
(纸本)9781450348928
In Total Store order memory consistency (TSO), loads can be speculatively reordered to improve performance. If a load-load reordering is seen by other cores, speculative loads must be squashed and re-executed. In architectures with an unordered interconnection network and directory coherence, this has been the established view for decades. We show, for the first time, that it is not necessary to squash and re-execute speculatively reordered loads in TSO when their reordering is seen. Instead, the reordering can be hidden form other cores by the coherence protocol. The implication is that we can irrevocably bind speculative loads. This allows us to commit reordered loads out-of-order without having to wait (for the loads to become non-speculative) or without having to checkpoint committed state (and rollback if needed), just to ensure correctness in the rare case of some core seeing the reordering. We show that by exposing a reordering to the coherence layer and by appropriately modifying a typical directory protocol we can successfully hide load-load reordering without perceptible performance cost and without deadlock. Our solution is cost-effective and increases the performance of out-of-order commit by a sizable margin, compared to the base case where memory operations are not allowed to commit if the consistency model could be violated.
Current superscalar processors commit instructions in program order by using a reorder buffer ( ROB). The ROB provides support for speculation, precise exceptions, and register reclamation. However, committing instruc...
详细信息
Current superscalar processors commit instructions in program order by using a reorder buffer ( ROB). The ROB provides support for speculation, precise exceptions, and register reclamation. However, committing instructions in program order may lead to significant performance degradation if a long latency operation blocks the ROB head. Several proposals have been published to deal with this problem. Most of them retire instructions speculatively. However, as speculation may fail, checkpoints are required in order to rollback the processor to a precise state, which requires both extra hardware to manage checkpoints and the enlargement of other major processor structures, which, in turn, might impact the processor cycle. This paper focuses on out-of-order commit in a nonspeculative way, thus, avoiding checkpointing. To this end, we replace the ROB with a validation buffer (VB) structure. This structure keeps dispatched instructions until they are nonspeculative or mispeculated, which allows an early retirement. By doing so, the performance bottleneck is largely alleviated. An aggressive register reclamation mechanism targeted to this microarchitecture is also devised. As experimental results show, the VB structure is much more efficient than a typical ROB since, with only 32 entries, it achieves a performance close to an in-ordercommit microprocessor using a 256-entry ROB.
暂无评论