ISBN (print): 9781457702518
Distributed shared memory (DSM) is an important technology that provides programmers with the underlying execution mechanism for shared-memory programs. To improve the performance of DSM, recent studies have introduced compiler assistance, in which the compiler generates code for dependency analysis and communication. This paper proposes a high-performance DSM, called Offloaded-DSM, in which dependency analysis and communication are offloaded to the cluster network. In Offloaded-DSM, the host machine can concentrate on the application's own computation while the network maintains coherency in parallel. Preliminary evaluation shows that Offloaded-DSM reduces execution time by up to 32% on eight nodes and exhibits good scalability.
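To make the offloading idea concrete, the following sketch models the division of labour the abstract describes: the host thread keeps computing while a background "offload engine" (standing in for the cluster network interface) processes compiler-generated dependency/communication descriptors. The OffloadEngine class, its submit/fence calls, and the descriptor format are hypothetical illustrations, not the paper's actual interface.

```python
# Conceptual sketch (not the paper's implementation): the host pushes
# compiler-generated dependency/communication descriptors to an "offload
# engine" that stands in for the cluster network, then keeps computing
# while coherence traffic is handled in the background.
import threading
import queue
import time

class OffloadEngine:
    """Models a network-side unit that runs dependency analysis and
    communication on behalf of the host (hypothetical interface)."""

    def __init__(self):
        self._requests = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, descriptor):
        """Compiler-inserted call: enqueue a dependency/communication task."""
        self._requests.put(descriptor)

    def fence(self):
        """Block only when the host actually needs remotely produced data."""
        self._requests.join()

    def _run(self):
        while True:
            desc = self._requests.get()
            # Stand-in for dependency analysis + remote update of shared data.
            time.sleep(0.01)
            print(f"offload engine handled: {desc}")
            self._requests.task_done()

engine = OffloadEngine()
engine.submit({"page": 42, "op": "diff-and-send", "dest": "node3"})
local_result = sum(i * i for i in range(100_000))   # host keeps computing
engine.fence()                                      # synchronize before reuse
print("host result:", local_result)
```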
Scalability of future wide-issue processor designs is severely hampered by the use of centralized resources such as register files, memories, and interconnect networks. While centralized resources ease both hardware design and compiler code generation, they can become performance bottlenecks as access latencies increase with larger designs. The natural solution to this problem is to adapt the architecture to use smaller, decentralized resources. Decentralized architectures use smaller, faster components and exploit distributed instruction-level parallelism across the resources. A multicluster architecture is an example of such a decentralized processor, where subsets of smaller register files, functional units, and memories are grouped together in a tightly coupled unit, forming a cluster. These clusters can then be replicated and connected together to form a scalable, high-performance architecture. The main difficulty with decentralized architectures resides in compiler code generation. In a centralized Very Long Instruction Word (VLIW) processor, the compiler must statically schedule each operation to both a functional unit and a time slot for execution. In contrast, for a decentralized multicluster VLIW, the compiler must also consider the effects of cluster assignment, recognizing that communication between clusters incurs a delay penalty. In addition, if the multicluster processor has partitioned data memories, the compiler has the further task of assigning data objects to their respective memories. The decisions of cluster, functional unit, memory, and time slot are highly interrelated, and each can have dramatic effects on the best choice for every other. This dissertation addresses the issues of extracting and exploiting inherent parallelism across decentralized resources through compiler analysis and code-generation techniques. First, a static analysis technique to partition data objects is presented, which maps data objects to the partitioned memories.
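The cluster-assignment problem described in this abstract can be illustrated with a small greedy heuristic: place each operation on the cluster that already holds most of its operands, and charge a fixed delay for every operand that must cross clusters. The heuristic, the two-cluster machine, and the one-cycle inter-cluster penalty below are assumptions for illustration only, not the dissertation's algorithm.

```python
# Minimal sketch of the cluster-assignment problem (illustrative heuristic):
# place each operation on the cluster holding most of its operands, charging
# a fixed delay for every operand that must cross clusters.
from collections import Counter

NUM_CLUSTERS = 2
INTERCLUSTER_DELAY = 1   # extra cycles per cross-cluster operand (assumed)

def assign_clusters(ops):
    """ops: list of (name, [operand names]); returns ({name: cluster}, penalty)."""
    placement = {}
    load = Counter()
    penalty = 0
    for name, operands in ops:
        votes = Counter(placement[o] for o in operands if o in placement)
        if votes:
            # Prefer the cluster where most operands live; break ties by load.
            cluster = min(range(NUM_CLUSTERS),
                          key=lambda c: (-votes[c], load[c]))
        else:
            cluster = min(range(NUM_CLUSTERS), key=lambda c: load[c])
        placement[name] = cluster
        load[cluster] += 1
        # Every operand living on another cluster costs a move.
        penalty += INTERCLUSTER_DELAY * sum(
            1 for o in operands if o in placement and placement[o] != cluster)
    return placement, penalty

dag = [("a", []), ("b", []), ("c", ["a", "b"]), ("d", ["c", "a"])]
print(assign_clusters(dag))
```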
The slowdown in technology scaling puts architectural features at the forefront of innovation in modern processors. This article presents a Metric-Guided Method (MGM) that extends Top-Down analysis with carefully selected, dynamically adapted metrics in a structured approach. Using MGM, we conduct two evaluations, at the microarchitecture and the Instruction Set Architecture (ISA) levels. Our results show that simple optimizations, such as improved representation of CISC instructions, broadly improve performance, while changes to the floating-point execution units had mixed impact. Overall, we report 10 architectural insights on the microarchitecture, ISA, and compiler fronts and quantify their impact on the SPEC CPU benchmarks.
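For readers unfamiliar with the Top-Down analysis that MGM extends, the sketch below computes the standard level-1 breakdown (Retiring, Bad Speculation, Frontend Bound, Backend Bound) from raw counter values, following the commonly published formulas. The 4-wide issue width, the Intel-style event names, and the sample counts are assumptions for illustration and are not taken from this article.

```python
# Hedged sketch of the level-1 Top-Down breakdown that MGM builds on.
# Counter names and the 4-wide pipeline are assumptions, not from the article.
PIPELINE_WIDTH = 4

def topdown_level1(counters):
    """counters: raw event counts keyed by (assumed) Intel-style event names."""
    slots = PIPELINE_WIDTH * counters["CPU_CLK_UNHALTED.THREAD"]
    retiring = counters["UOPS_RETIRED.RETIRE_SLOTS"] / slots
    frontend_bound = counters["IDQ_UOPS_NOT_DELIVERED.CORE"] / slots
    bad_speculation = (counters["UOPS_ISSUED.ANY"]
                       - counters["UOPS_RETIRED.RETIRE_SLOTS"]
                       + PIPELINE_WIDTH * counters["INT_MISC.RECOVERY_CYCLES"]) / slots
    backend_bound = 1.0 - frontend_bound - bad_speculation - retiring
    return {"Retiring": retiring, "Bad Speculation": bad_speculation,
            "Frontend Bound": frontend_bound, "Backend Bound": backend_bound}

sample = {  # made-up counts for illustration only
    "CPU_CLK_UNHALTED.THREAD": 1_000_000,
    "UOPS_RETIRED.RETIRE_SLOTS": 2_200_000,
    "UOPS_ISSUED.ANY": 2_500_000,
    "IDQ_UOPS_NOT_DELIVERED.CORE": 600_000,
    "INT_MISC.RECOVERY_CYCLES": 50_000,
}
for category, share in topdown_level1(sample).items():
    print(f"{category:>16}: {share:.1%}")
```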