Search results - Inner Mongolia University Library

26th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)

Authors: Hajiabadi, Ali; Diavastos, Andreas; Carlson, Trevor E.

Affiliations: Natl Univ Singapore, Singapore, Singapore; Univ Politecn Cataluna, Barcelona, Spain

ISBN: (print) 9781450383172

Modern superscalar processors execute instructions out-of-order, but commit them in program order to provide precise exception handling and safe instruction retirement. However, in-order instruction commit is highly conservative and holds on to critical resources far longer than necessary, severely limiting the reach of general-purpose processors, ultimately reducing performance. Solutions that allow for efficient, early reclamation of these critical resources could seize the opportunity to improve performance. One such solution is out-of-order commit, which has traditionally been challenging due to inefficient, complex hardware used to guarantee safe instruction retirement and provide precise exception handling. In this work, we present NOREBA, a processor for Non-speculative out-of-order Retirement via Branch Reconvergence Analysis. In NOREBA, we enable non-speculative out-of-order commit and resource reclamation in a light-weight manner, improving performance and efficiency. We accomplish this through a combination of (1) automatic compiler annotation of true branch dependencies, and (2) an efficient re-design of the reorder buffer from traditional processors. By exploiting compiler branch dependency information, this system achieves 95% of the performance of aggressive, speculative solutions, without any additional speculation, and while maintaining energy efficiency.

Keywords: out-of-order commit; compilers; hardware-software co-design; processor design

Maximizing Limited Resources: A Limit-Based Study and Taxonomy of Out-of-Order Commit

JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2019, vol. 91, no. 3-4, pp. 379-397

Authors: Alipour, Mehdi; Carlson, Trevor E.; Black-Schaffer, David; Kaxiras, Stefanos

Affiliations: Uppsala Univ, Dept Informat Technol, Uppsala, Sweden; NUS, Dept Comp Sci, Singapore, Singapore

Out-of-order execution is essential for high performance, general-purpose computation, as it can find and execute useful work instead of stalling. However, it is typically limited by the requirement of visibly sequential, atomic instruction execution; in other words, in-order instruction commit. While in-order commit has a number of advantages, such as providing precise interrupts and avoiding complications with the memory consistency model, it requires the core to hold on to resources (reorder buffer entries, load/store queue entries, physical registers) until they are released in program order. In contrast, out-of-order commit can release some resources much earlier, yielding improved performance and/or lower resource requirements. Non-speculative out-of-order commit is limited in terms of correctness by the conditions described in the work of Bell and Lipasti (2004). In this paper we revisit out-of-order commit by examining the potential performance benefits of lifting these conditions one by one and in combination, for both non-speculative and speculative out-of-order commit. While correctly handling recovery for all out-of-order commit conditions currently requires complex tracking and expensive checkpointing, this work aims to demonstrate the potential for selective, speculative out-of-order commit using an oracle implementation without speculative rollback costs. Through this analysis of the potential of out-of-order commit, we learn that: a) there is significant untapped potential for aggressive variants of out-of-order commit; b) it is important to optimize the out-of-order commit depth for a balanced design, as smaller cores benefit from reduced depth while larger cores continue to benefit from deeper designs; c) the focus on implementing only a subset of the out-of-order commit conditions could lead to efficient implementations; d) the benefits of out-of-order commit increase with higher memory latency and in conjunction with prefetching; e) out-of-order commi

Keywords: Superscalar processors; out-of-order commit; Performance evaluation; Memory hierarchy parallelism

Orinoco: Ordered Issue and Unordered Commit with Non-Collapsible Queues

50th Annual International Symposium on Computer Architecture (ISCA)

Authors: Chen, Dibei; Zhang, Tairan; Huang, Yi; Zhu, Jianfeng; Liu, Yang; Gou, Pengfei; Feng, Chunyang; Li, Binghua; Wei, Shaojun; Liu, Leibo

Affiliations: Tsinghua Univ, Beijing, Peoples R China; Innovat Inst High Performance Server, Beijing, Peoples R China; HEXIN Technol, Xiamen, Peoples R China

ISBN: (print) 9798400700958

Modern out-of-order processors call for more aggressive scheduling techniques such as priority scheduling and out-of-order commit to make use of increasing core resources. Since these approaches prioritize the issue or commit of certain instructions, they face the conundrum of providing the capacity efficiency of scheduling structures while preserving the ideal ordering of instructions. Traditional collapsible queues are too expensive for today's processors, while state-of-the-art queue designs compromise with the pseudo-ordering of instructions, leading to performance degradation as well as other limitations. In this paper, we present Orinoco, a microarchitecture/circuit co-design that supports ordered issue and unordered commit with non-collapsible queues. We decouple the temporal ordering of instructions from their queue positions by introducing an age matrix with the bit count encoding, along with a commit dependency matrix and a memory disambiguation matrix to determine instructions to prioritize issue or commit. We leverage the Processing-in-Memory (PIM) approach and efficiently implement the matrix schedulers as 8T SRAM arrays. Orinoco achieves an average IPC improvement of 14.8% over the baseline in-order commit core with the state-of-the-art scheduler while incurring overhead equivalent to a few kilobytes of SRAM.

Keywords: microarchitecture; out-of-order execution; instruction scheduling; out-of-order commit; processing-in-memory
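
The age-matrix idea mentioned in the abstract above (tracking instruction age separately from queue position) can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `AgeMatrix`, its methods, and the use of a row's set-bit count as an age rank are hypothetical names and simplifications chosen for illustration. They are not the paper's circuit, its exact bit count encoding, or its commit dependency and memory disambiguation matrices.

```python
class AgeMatrix:
    """Toy age matrix for a non-collapsible queue (illustrative only).

    rows[i] is a bit vector; bit j set means the entry in slot j is
    older than the entry in slot i. Slots can be reused in any order,
    so age is independent of queue position.
    """

    def __init__(self, size):
        self.size = size
        self.rows = [0] * size
        self.valid = 0  # bit i set => slot i currently holds an instruction

    def allocate(self, slot):
        """Place a new (youngest) instruction in a free slot."""
        assert not (self.valid >> slot) & 1, "slot already in use"
        self.rows[slot] = self.valid  # every valid entry is older
        self.valid |= 1 << slot

    def free(self, slot):
        """Remove an instruction; no other entry has to move."""
        mask = ~(1 << slot)
        self.valid &= mask
        self.rows[slot] = 0
        for i in range(self.size):
            self.rows[i] &= mask  # clear this slot's column

    def oldest_ready(self, ready_mask):
        """Return the slot of the oldest ready entry, or None."""
        candidates = ready_mask & self.valid
        for i in range(self.size):
            # Oldest ready entry: no other ready entry is older than it.
            if (candidates >> i) & 1 and (self.rows[i] & candidates) == 0:
                return i
        return None

    def rank(self, slot):
        """Number of valid entries older than this slot (assumed encoding)."""
        return bin(self.rows[slot]).count("1")
```

For example, allocating slots 2, 0, 3 in that program order and then asking for the oldest ready entry among slots 0 and 3 selects slot 0, even though slot 0 sits at a lower queue position than slot 2, which was allocated first.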

Compiler-Assisted, Selective Out-of-Order Commit

IEEE COMPUTER ARCHITECTURE LETTERS, 2013, vol. 12, no. 1, pp. 21-24

Authors: Duong, Nam; Veidenbaum, Alexander V.

Affiliations: Univ Calif Irvine, Dept Comp Sci, Irvine, CA 92717, USA

This paper proposes an out-of-order instruction commit mechanism using a novel compiler/architecture interface. The compiler creates instruction "blocks" guaranteeing some commit conditions and the processor uses the block information to commit certain instructions out of order. Micro-architectural support for the new commit mode is made on top of the standard, ROB-based processor and includes out-of-order instruction commit with register and load queue entry release. The commit mode may be switched multiple times during execution. Initial results for a 4-wide processor show that, on average, 52% of instructions are committed out of order resulting in 10% to 26% speedups over in-order commit, with minimal hardware overhead. The performance improvement is a result of an effectively larger instruction window that allows more cache misses to be overlapped for both L1 and L2 caches.

Keywords: out-of-order commit; resource release; overlapping cache misses; architecture/compiler co-design

Non-Speculative Load-Load Reordering in TSO

44th Annual International Symposium on Computer Architecture (ISCA)

Authors: Ros, Alberto; Carlson, Trevor E.; Alipour, Mehdi; Kaxiras, Stefanos

Affiliations: Univ Murcia, Dept Comp Engn, Murcia, Spain; Uppsala Univ, Dept Informat Technol, Uppsala, Sweden

ISBN: (print) 9781450348928

In Total Store Order memory consistency (TSO), loads can be speculatively reordered to improve performance. If a load-load reordering is seen by other cores, speculative loads must be squashed and re-executed. In architectures with an unordered interconnection network and directory coherence, this has been the established view for decades. We show, for the first time, that it is not necessary to squash and re-execute speculatively reordered loads in TSO when their reordering is seen. Instead, the reordering can be hidden from other cores by the coherence protocol. The implication is that we can irrevocably bind speculative loads. This allows us to commit reordered loads out-of-order without having to wait (for the loads to become non-speculative) or without having to checkpoint committed state (and rollback if needed), just to ensure correctness in the rare case of some core seeing the reordering. We show that by exposing a reordering to the coherence layer and by appropriately modifying a typical directory protocol we can successfully hide load-load reordering without perceptible performance cost and without deadlock. Our solution is cost-effective and increases the performance of out-of-order commit by a sizable margin, compared to the base case where memory operations are not allowed to commit if the consistency model could be violated.

Keywords: Cache coherence; memory consistency; TSO; load reordering; out-of-order commit

A Complexity-Effective Out-of-Order Retirement Microarchitecture

IEEE TRANSACTIONS ON COMPUTERS, 2009, vol. 58, no. 12, pp. 1626-1639

Authors: Petit Marti, Salvador; Sahuquillo Borras, Julio; Lopez Rodriguez, Pedro; Ubal Tena, Rafael; Duato Marin, Jose

Affiliations: Univ Politecn Valencia, Dept Informat Sistemas & Computadores, Valencia 46021, Spain; Univ Politecn Valencia, Escuela Tecn Super Ingn Informat, Grp Arquitecturas Paralelas, Valencia 46021, Spain

Current superscalar processors commit instructions in program order by using a reorder buffer (ROB). The ROB provides support for speculation, precise exceptions, and register reclamation. However, committing instructions in program order may lead to significant performance degradation if a long latency operation blocks the ROB head. Several proposals have been published to deal with this problem. Most of them retire instructions speculatively. However, as speculation may fail, checkpoints are required in order to roll back the processor to a precise state, which requires both extra hardware to manage checkpoints and the enlargement of other major processor structures, which, in turn, might impact the processor cycle. This paper focuses on out-of-order commit in a nonspeculative way, thus, avoiding checkpointing. To this end, we replace the ROB with a validation buffer (VB) structure. This structure keeps dispatched instructions until they are nonspeculative or misspeculated, which allows an early retirement. By doing so, the performance bottleneck is largely alleviated. An aggressive register reclamation mechanism targeted to this microarchitecture is also devised. As experimental results show, the VB structure is much more efficient than a typical ROB since, with only 32 entries, it achieves a performance close to an in-order commit microprocessor using a 256-entry ROB.

Keywords: Instruction-level parallelism; out-of-order commit; long latency operations; control dependencies; exception handling
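
The effect described in the abstract above, where a long-latency operation at the ROB head blocks younger, already finished instructions, can be seen in a toy cycle-count model. This is a sketch under strong simplifying assumptions that are not taken from the paper: instructions are independent, start executing when dispatched, dispatch and commit widths are unlimited, and in the out-of-order case every instruction is treated as safe to retire as soon as it finishes (no branches or exceptions are modeled). The function name `cycles_to_retire` and its parameters are hypothetical.

```python
def cycles_to_retire(latencies, window, in_order):
    """Cycles needed to retire all instructions through a fixed-size window.

    latencies: execution latency of each instruction, in program order.
    window:    number of buffer entries (ROB-like structure).
    in_order:  True  -> only the oldest entry may retire (blocks on head);
               False -> any finished entry may retire (idealized model).
    """
    in_flight = []   # (index, finish_cycle) pairs, in program order
    next_idx = 0     # next instruction to dispatch
    retired = 0
    cycle = 0
    total = len(latencies)
    while retired < total:
        # Dispatch into free entries; execution starts immediately.
        while next_idx < total and len(in_flight) < window:
            in_flight.append((next_idx, cycle + latencies[next_idx]))
            next_idx += 1
        cycle += 1
        if in_order:
            # Retire from the head only; a slow head stalls everything.
            while in_flight and in_flight[0][1] <= cycle:
                in_flight.pop(0)
                retired += 1
        else:
            # Retire every finished entry, freeing its slot early.
            remaining = [e for e in in_flight if e[1] > cycle]
            retired += len(in_flight) - len(remaining)
            in_flight = remaining
    return cycle
```

With one 10-cycle instruction followed by seven 1-cycle instructions and a 2-entry window, `cycles_to_retire([10, 1, 1, 1, 1, 1, 1, 1], 2, True)` returns 13 while the same call with `in_order=False` returns 10: the short instructions retire and free their entry while the long one is still executing.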
