Search Results - Inner Mongolia University Library

Proceedings of the 22nd international conference on Parallel architectures and compilation techniques

Authors: Majedul Haque Sujon, R. Clint Whaley, Qing Yi (University of Texas at San Antonio, San Antonio, TX, USA; Louisiana State University, Baton Rouge, LA, USA; University of Colorado Colorado Springs, Colorado Springs, CO, USA)

ISBN: (print) 9781479910212

Modern architectures increasingly rely on SIMD vectorization to improve performance for floating-point-intensive scientific applications. However, existing compiler optimization techniques for automatic vectorization are inhibited by the presence of unknown control flow surrounding partially vectorizable computations. In this paper, we present a new approach, speculative vectorization, which speculates past dependent branches to aggressively vectorize computational paths that are expected to be taken frequently at runtime, while simply restarting the calculation using scalar instructions when the speculation fails. We have integrated our technique into an iterative optimizing compiler and have employed empirical tuning to select the profitable paths for speculation. When applied to optimize 9 floating-point benchmarks, our optimizing compiler has achieved up to 6.8X speedup for single-precision and 3.4X for double-precision kernels using AVX, while vectorizing some operations considered not vectorizable by prior techniques.

Keywords: ATLAS, IFKO, iterative compilation, compiler optimization, speculation, SIMD vectorization
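The idea in the abstract above (vectorize the frequently taken path, check afterwards whether the speculation held, and rerun the block with scalar code when it did not) can be sketched roughly as follows. This is an illustrative sketch only, not the paper's compiler output: the kernel (index of the largest absolute value), the function names `iamax_scalar` and `iamax_speculative`, and the `width` parameter are all assumptions chosen for the example, and a Python list slice stands in for a SIMD register.

```python
def iamax_scalar(x):
    """Reference loop: index of the first element with the largest |x[i]|."""
    best, best_i = -1.0, -1
    for i, v in enumerate(x):
        a = abs(v)
        if a > best:          # data-dependent branch that blocks naive vectorization
            best, best_i = a, i
    return best_i


def iamax_speculative(x, width=8):
    """Same result, but processes `width` elements at a time under speculation.

    Speculated path (assumed to be the frequent one): no element in the block
    exceeds the running maximum, so the branch body never executes.
    """
    best, best_i = -1.0, -1
    n = len(x)
    i = 0
    while i + width <= n:
        # Stand-in for one vector |x| operation over `width` lanes.
        block = [abs(v) for v in x[i:i + width]]
        # Stand-in for a vector compare plus a mask test.
        if max(block) > best:
            # Speculation failed: redo this block with the scalar loop.
            for j in range(i, i + width):
                a = abs(x[j])
                if a > best:
                    best, best_i = a, j
        i += width
    # Scalar cleanup for the leftover elements.
    for j in range(i, n):
        a = abs(x[j])
        if a > best:
            best, best_i = a, j
    return best_i
```

When the maximum changes rarely, most blocks take the cheap speculated path and only a few fall back to the scalar loop, which is where a benefit of this kind of speculation would come from.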

Parallelization and performance comparison of the conjugate gradient equation solver on multicore Cell and Xeon computers

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2011, Vol. 23, No. 18, pp. 2463-2476

Authors: Sibai, Fadi N.; Saad, Mohammad; Kidwai, Hashir K. (UAE Univ, IBM Cell Ctr Competency, Al Ain, U Arab Emirates; UAE Univ, Fac Informat Technol, Al Ain, U Arab Emirates)

Multicore accelerators are used today to supplement traditional superscalar processors in massively parallel computer nodes with extra floating-point computation power. This paper presents our parallelization, performance enhancement, and evaluation of the conjugate gradient (CG) linear equation solver with enhanced matrix multiplication on the Cell Broadband Engine accelerator. The paper also compares the CG performance results on the Cell with those of two CG implementations on a computer with two quad-core Xeon processors, one with OpenMP and the other with OpenMPI. We also report the enhancements made to the CG code, in particular to matrix multiplication and synchronization, and the performance analysis of CG on single and dual Cell Broadband Engine packages with 8 and 16 synergistic processing elements and on Xeon for heptadiagonal matrices. We also report the communication and computation time breakdowns and the floating-point operations per second ratio. Our parallel CG solver is shown to scale well with data size, grid dimensionality, and number of cores. Copyright (C) 2011 John Wiley & Sons, Ltd.

Keywords: conjugate gradient, matrix multiplication, Cell Broadband Engine, atomic cache, SIMD vectorization, OpenMP, OpenMPI
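For readers unfamiliar with the solver named in the abstract above, a minimal serial sketch of conjugate gradient on a heptadiagonal system might look like the following. It assumes the matrix comes from a 7-point Laplacian stencil on an n x n x n grid, which is one common source of heptadiagonal matrices; the abstract does not say which matrices the paper used. The names `matvec`, `dot`, and `cg` are hypothetical, and none of the Cell, OpenMP, or OpenMPI parallelization is shown.

```python
def matvec(u, n):
    """y = A u for the 7-point Laplacian on an n*n*n grid (zero boundary).

    A has seven nonzero diagonals at offsets 0, +-1, +-n, +-n*n,
    so it is never stored explicitly.
    """
    out = [0.0] * (n * n * n)
    for k in range(n):
        for j in range(n):
            for i in range(n):
                idx = (k * n + j) * n + i
                s = 6.0 * u[idx]
                if i > 0:
                    s -= u[idx - 1]
                if i < n - 1:
                    s -= u[idx + 1]
                if j > 0:
                    s -= u[idx - n]
                if j < n - 1:
                    s -= u[idx + n]
                if k > 0:
                    s -= u[idx - n * n]
                if k < n - 1:
                    s -= u[idx + n * n]
                out[idx] = s
    return out


def dot(a, b):
    """Inner product; in a parallel solver this is a synchronization point."""
    return sum(x * y for x, y in zip(a, b))


def cg(b, n, tol=1e-10, max_iter=1000):
    """Solve A x = b with unpreconditioned conjugate gradient, starting at x = 0."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for x = 0
    p = list(r)            # first search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        ap = matvec(p, n)  # dominant cost per iteration
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

Each iteration needs one matrix-vector product and two inner products, which is consistent with the abstract singling out matrix multiplication and synchronization as the parts that were tuned.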

ACCELERATING VIDEO-MINING APPLICATIONS USING MANY SMALL, GENERAL-PURPOSE CORES

IEEE MICRO, 2008, Vol. 28, No. 5, pp. 8-21

Authors: Li, Eric; Li, Wenlong; Tong, Xiaofeng; Li, Jianguo; Chen, Yurong; Wang, Tao; Wang, Patricia P.; Hu, Wei; Du, Yangzhou; Zhang, Yimin; Chen, Yen-Kuang (Intel Corp, Corp Technol Grp, Santa Clara, CA 95051, USA)

Emerging video-mining applications such as image and video retrieval and indexing will require real-time processing capabilities. A many-core architecture with 64 small, in-order, general-purpose cores as the accelerator can help meet the necessary performance goals and requirements. The key video-mining modules can achieve parallel speedups of 19x to 62x from 64 cores and get an extra 2.3x speedup from 128-bit SIMD vectorization on the proposed architecture.

Keywords: data mining, parallel architectures, video retrieval, SIMD vectorization, accelerator, image indexing, image retrieval, many-core architecture, real-time processing, video indexing, video-mining, core construction
