Search Results - Inner Mongolia University Library

Efficient Compilation of CUDA Kernels for High-Performance Computing on FPGAs

ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2013, Vol. 13, No. 2, pp. 25-25

Authors: Papakonstantinou, Alexandros; Gururaj, Karthik; Stratton, John A.; Chen, Deming; Cong, Jason; Hwu, Wen-Mei W.

Affiliations: Univ Illinois, Elect & Comp Engn Dept, Urbana, IL 60680, USA; Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024, USA

The rise of multicore architectures across all computing domains has opened the door to heterogeneous multiprocessors, where processors of different compute characteristics can be combined to effectively boost the performance per watt of different application kernels. GPUs, in particular, are becoming very popular for speeding up compute-intensive kernels of scientific, imaging, and simulation applications. New programming models that facilitate parallel processing on heterogeneous systems containing GPUs are spreading rapidly in the computing community. By leveraging these investments, the developers of other accelerators have an opportunity to significantly reduce the programming effort by supporting those accelerator models already gaining popularity. In this work, we adapt one such language, the CUDA programming model, into a new FPGA design flow called FCUDA, which efficiently maps the coarse- and fine-grained parallelism exposed in CUDA onto the reconfigurable fabric. Our CUDA-to-FPGA flow employs AutoPilot, an advanced high-level synthesis tool (available from Xilinx) which enables high-abstraction FPGA programming. FCUDA is based on a source-to-source compilation that transforms the SIMT (Single Instruction, Multiple Thread) CUDA code into task-level parallel C code for AutoPilot. We describe the details of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the resulting customized FPGA multicore accelerators. To the best of our knowledge, this is the first CUDA-to-FPGA flow to demonstrate the applicability and potential advantage of using the CUDA programming model for high-performance computing in FPGAs.
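As a rough illustration of what turning SIMT code into ordinary sequential code involves, the sketch below serializes a CUDA-style vector-add kernel into explicit loops over block and thread indices. It is only a minimal sketch under stated assumptions: the names `vec_add_thread` and `run_serialized` are hypothetical, the example is written in Python for brevity, and it does not reproduce FCUDA's actual generated C code or its task partitioning for AutoPilot.

```python
def vec_add_thread(block_idx, thread_idx, block_dim, a, b, out):
    """Work done by ONE logical CUDA-style thread (hypothetical kernel body)."""
    i = block_idx * block_dim + thread_idx  # global index, as in blockIdx.x * blockDim.x + threadIdx.x
    if i < len(out):  # bounds guard, since the grid may overshoot the data
        out[i] = a[i] + b[i]


def run_serialized(grid_dim, block_dim, a, b, out):
    """Replace the implicit SIMT launch with explicit loops.

    The outer loop walks thread blocks (coarse-grained parallelism) and the
    inner loop walks threads in a block (fine-grained parallelism). A real
    flow could map each block to its own hardware core; here both run in order.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            vec_add_thread(block_idx, thread_idx, block_dim, a, b, out)
    return out


a = [1, 2, 3, 4, 5]
b = [10, 20, 30, 40, 50]
block_dim = 2
grid_dim = (len(a) + block_dim - 1) // block_dim  # ceiling division, 3 blocks
result = run_serialized(grid_dim, block_dim, a, b, [0] * len(a))
```

Because the blocks in this example do not depend on each other, the outer loop iterations are independent, which is the property a task-level parallel mapping would rely on.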

Keywords: Design; Performance; FPGA; high-level synthesis; parallel programming model; high-performance computing; source-to-source compiler; heterogeneous compute systems


MiX10: Compiling MATLAB to X10 for High Performance


ACM SIGPLAN NOTICES, 2014, Vol. 49, No. 10, pp. 617-636

Authors: Kumar, Vineet; Hendren, Laurie

Affiliations: McGill Univ, Montreal, PQ H3A 2T5, Canada

MATLAB is a popular dynamic array-based language commonly used by students, scientists and engineers who appreciate the interactive development style, the rich set of array operators, the extensive built-in library, and the fact that they do not have to declare static types. Even though these users like to program in MATLAB, their computations are often very compute-intensive and are better suited for emerging high performance computing systems. This paper reports on MiX10, a source-to-source compiler that automatically translates MATLAB programs to X10, a language designed for "Performance and Productivity at Scale", thus helping scientific programmers make better use of high performance computing systems. There is a large semantic gap between the array-based dynamically-typed nature of MATLAB and the object-oriented, statically-typed, and high-level array abstractions of X10. This paper addresses the major challenges that must be overcome to produce sequential X10 code that is competitive with state-of-the-art static compilers for MATLAB which target more conventional imperative languages such as C and Fortran. Given that efficient basis, the paper then provides a translation for the MATLAB parfor construct that leverages the powerful concurrency constructs in X10. The MiX10 compiler has been implemented using the McLab compiler tools, is open source, and is available both for compiler researchers and end-user MATLAB programmers. We have used the implementation to perform many empirical measurements on a set of 17 MATLAB benchmarks. We show that our best MiX10-generated code is significantly faster than the de facto MathWorks MATLAB system, and that our results are competitive with state-of-the-art static compilers that target C and Fortran. We also show the importance of finding the correct approach to representing arrays in X10, and the necessity of an IntegerOkay analysis that determines which double variables can be safely represented as integers. Finally,

Keywords: Experimentation; Languages; MATLAB; X10; source-to-source compiler


ROSE::FTTransform - A Source-to-Source Translation Framework for Exascale Fault-Tolerance Research


IEEE/IFIP International Conference on Dependable Systems and Networks Workshops

Authors: Jacob Lidman; Daniel J. Quinlan; Chunhua Liao; Sally A. McKee

Affiliations: Lawrence Livermore National Laboratory; Department of Computer Science and Engineering, Chalmers University of Technology

ISBN: (print) 9781467322645

Exascale computing systems will require sufficient resilience to tolerate numerous types of hardware faults while still assuring correct program execution. Such extreme-scale machines are expected to be dominated by processors driven at lower voltages (near the minimum 0.5 volts for current transistors). At these voltage levels, the rate of transient errors increases dramatically due to the sensitivity to transient and geographically localized voltage drops on parts of the processor chip. To achieve power efficiency, these processors are likely to be streamlined and minimal, and thus they cannot be expected to handle transient errors entirely in hardware. Here we present an open, compiler-based framework to automate the armoring of High Performance Computing (HPC) software to protect it from these types of transient processor errors. We develop an open infrastructure to support research work in this area, and we define tools that, in the future, may provide more complete automated and/or semi-automated solutions to support software resiliency on future exascale architectures. Results demonstrate that our approach is feasible, pragmatic in how it can be separated from the software development process, and reasonably efficient (0% to 30% overhead for the Jacobi iteration on common hardware; and 20%, 40%, 26%, and 2% overhead for a randomly selected subset of benchmarks from the Livermore Loops [1]).
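The abstract does not spell out which protection scheme the framework inserts, so the following is only a minimal sketch of one common software technique for masking transient errors: run a computation several times and take a majority vote. The helper names `run_redundant` and `make_flaky_square` are hypothetical, and the injected bit flip merely simulates a transient fault; none of this is taken from ROSE::FTTransform itself.

```python
from collections import Counter


def run_redundant(compute, *args, copies=3):
    """Run `compute` several times and return the majority result.

    Assumes results are hashable and that a transient fault corrupts
    at most a minority of the copies.
    """
    results = [compute(*args) for _ in range(copies)]
    value, votes = Counter(results).most_common(1)[0]
    if votes <= copies // 2:
        raise RuntimeError("no majority among redundant results")
    return value


def make_flaky_square(faulty_call):
    """Build a squaring function whose `faulty_call`-th call flips one bit."""
    calls = {"n": 0}

    def square(x):
        calls["n"] += 1
        result = x * x
        if calls["n"] == faulty_call:
            result ^= 1 << 3  # simulated transient single-bit error
        return result

    return square


# The second of three executions is corrupted; the vote still recovers 7 * 7.
voted = run_redundant(make_flaky_square(faulty_call=2), 7)
```

Tripling the work is expensive in general, so the appeal of applying such a transformation at compile time is that it can be targeted at selected statements or loops without changing the original source by hand.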

Keywords: High Performance Computing; Redundancy; Fault Tolerance; Exascale; source-to-source compiler; processor; transient errors; common hardware; armor; voltage; dynamic efficiency

