The Intel® Xeon Phi coprocessor is based on the Intel® Many Integrated Core (Intel® MIC) architecture, an innovative processor architecture that combines abundant thread parallelism with long SIMD vector units. Efficiently exploiting the SIMD vector units is one of the most important factors in achieving high performance for application code running on Intel® Xeon Phi coprocessors. In this paper, we present several practical SIMD vectorization techniques, such as less-than-full-vector loop vectorization, Intel® MIC-specific alignment optimization, and small matrix transpose/multiplication 2-D vectorization, implemented in the Intel® C/C++ and Fortran production compilers for Intel® Xeon Phi coprocessors. A set of workloads from several application domains is employed to conduct a performance study of our SIMD vectorization techniques. The performance results show that we achieved up to a 12.5x performance gain on the Intel® Xeon Phi coprocessor.
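For context, the alignment optimization mentioned in this abstract refers to aligning data to the coprocessor's 64-byte (512-bit) vector width so the compiler can emit aligned vector loads and stores, while "less-than-full-vector" vectorization covers loop trip counts that are not a multiple of the vector length. The sketch below is a minimal illustration of these ideas, not the paper's implementation; the function and array names are hypothetical, and the `#pragma vector aligned` hint is guarded so it is only seen by the Intel compiler.

```c
/* Minimal sketch: 64-byte-aligned data so the compiler can use aligned
 * loads/stores on a 512-bit SIMD unit. N is deliberately not a multiple
 * of the 16-float vector width, so the vectorizer must also handle the
 * "less-than-full-vector" remainder. Names here are illustrative only. */
#include <stdio.h>
#include <stdlib.h>

#define N 1000

void saxpy(float *restrict y, const float *restrict x, float a, int n)
{
#ifdef __INTEL_COMPILER
    /* Tell the Intel vectorizer both pointers are suitably aligned. */
#pragma vector aligned
#endif
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    /* aligned_alloc (C11) requires the size to be a multiple of the alignment. */
    size_t bytes = ((N * sizeof(float) + 63) / 64) * 64;
    float *x = aligned_alloc(64, bytes);
    float *y = aligned_alloc(64, bytes);
    if (!x || !y)
        return 1;

    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy(y, x, 3.0f, N);
    printf("y[0] = %f\n", y[0]);   /* expect 5.0 */

    free(x);
    free(y);
    return 0;
}
```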
ISBN:
(Print) 9781467309745
SIMD vectorization has received significant attention in the past decade as an important method to accelerate scientific, media, and embedded applications on SIMD architectures such as Intel® SSE, AVX, and IBM AltiVec. However, most of this focus has been directed at loops, executing their iterations on multiple SIMD lanes concurrently by relying on program hints and compiler analysis. This paper presents a set of new C/C++ high-level vector extensions for SIMD programming, and the Intel® C++ product compiler that is extended to translate these vector extensions and produce optimized SIMD instruction sequences for vectorized functions and loops. For a function, our main idea is to vectorize the entire function for its callers instead of just vectorizing any loops inside the function. This poses the challenges of dealing with complicated control flow in the function body and of matching callers and callees for SIMD vector calls while vectorizing caller functions (or loops) and callee functions. Our compilation methods for automatically compiling these vector extensions are described. We present performance results for several non-trivial visual computing, computational, and simulation workloads that utilize the SIMD units through the vector extensions on Intel® multicore 128-bit SIMD processors, and we show that significant SIMD speedups (3.07x to 4.69x) are achieved over serial execution.
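The function-level vectorization this abstract describes was exposed through compiler-specific vector (elemental) function extensions; a closely related form was later standardized as the OpenMP `declare simd` construct. The sketch below uses that OpenMP form as a stand-in to show the idea: a scalar-looking callee is compiled into a SIMD variant, and a vectorized caller loop invokes it once per SIMD lane. The function names are hypothetical, and this is not the paper's syntax.

```c
/* Sketch of function vectorization via OpenMP `declare simd`, a
 * standardized analogue of the vector extensions described in the
 * abstract. Compile with OpenMP SIMD support (e.g. -fopenmp-simd). */
#include <math.h>
#include <stdio.h>

/* Ask the compiler to generate a SIMD variant of this function so a
 * vectorized caller loop can call it on whole vectors of arguments. */
#pragma omp declare simd
static float blend(float a, float b)
{
    /* Control flow inside the callee is turned into masking/blending
       in the generated vector variant. */
    return (a > b) ? sqrtf(a - b) : a + b;
}

int main(void)
{
    enum { N = 1024 };
    float a[N], b[N], c[N];

    for (int i = 0; i < N; i++) {
        a[i] = (float)i;
        b[i] = (float)(N - i);
    }

    /* Caller loop: the compiler vectorizes the loop and matches each
       call site to the SIMD variant of blend(). */
#pragma omp simd
    for (int i = 0; i < N; i++)
        c[i] = blend(a[i], b[i]);

    printf("c[0] = %f, c[N-1] = %f\n", c[0], c[N - 1]);
    return 0;
}
```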