Search Results - Inner Mongolia University Library

Compilation of MATLAB computations to CPU/GPU via C/OpenCL generation

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2020, Vol. 32, No. 22

Authors: Reis, Luis; Bispo, Joao; Cardoso, Joao M. P. Affiliations: Univ Porto Fac Engn R Dr Roberto Frias S-N P-4200465 Porto Portugal; Univ Porto INESC TEC R Dr Roberto Frias S-N P-4200465 Porto Portugal

In order to take advantage of the processing power of current computing platforms, programmers typically need to develop software versions for different target devices. This task is time-consuming and requires significant programming and computer architecture expertise. A more convenient alternative is to start with a single high-level description of a program with minimum implementation details, and generate custom implementations according to the target platform. In this paper, we use MATLAB as a high-level programming language and propose a compiler that targets CPU/GPU computing platforms by generating customized implementations in C and OpenCL. We propose a number of compiler techniques to automatically generate efficient C and OpenCL code from MATLAB programs. One such technique relies on heuristics to decide when and how to use Shared Virtual Memory (SVM). The experimental results show that our approach is able to generate code that provides significant speedups (e.g., a geometric mean speedup of 11x for a set of simple benchmarks) using a discrete GPU over equivalent sequential C code executing on a CPU. With more complex benchmarks, for which only some code regions can be parallelized, and are thus offloaded, the generated code achieved speedups of up to 2.2x. We also show the impact of using SVM, specifically fine-grained buffers, and the results show that the compiler is able to achieve significant speedups, both over the versions without SVM and with naive aggressive SVM use, across three CPU/GPU platforms.

Keywords: compiler optimizations; GPU; MATLAB; OpenCL; parallel programming; shared virtual memory

Deterministic parallel programming with Haskell

COMPUTING IN SCIENCE & ENGINEERING, 2012, Vol. 14, No. 6, pp. 36-42

Authors: Coutts, Duncan; Loh, Andres. Affiliations: Well-Typed LLP

Haskell is a modern, functional programming language with an interesting story to tell about parallelism: rather than using concurrent threads and locks, Haskell offers a variety of libraries that enable concise, high-level parallel programs with results that are guaranteed to be deterministic (independent of the number of cores and the scheduling being used).

Keywords: parallel programming Functional Languages Functional programming Haskell Code Deterministic parallel programming Functional programming Language High Level parallel Programs parallel Processing Concurrent Computing parallel programming Poisson Equations Message Systems Computer Languages programming Scientific Computing High Performance Code Haskell Applicative Functional programming Concurrent programming

Design and evaluation of efficient global data movement in partitioned global address space

PARALLEL COMPUTING, 2020, Vol. 96

Authors: Murai, Hitoshi; Sato, Mitsuhisa. Affiliations: RIKEN Ctr Computat Sci Chuo Ku 7-1-26 Minatojima Minami Machi Kobe Hyogo 6500047 Japan

Global data movement is the most general, and therefore important, function of inter-node communication in the partitioned global address space programming models, such as XcalableMP. Our implementation of it consists of compile-time and run-time optimization for specific cases and run-time processing based on the calculus of common-stride section descriptors for general cases, which allows efficient construction of communication schedules for global data movement. As a result of the evaluation of the implementation on the K computer and a common Linux cluster, it is verified to be effective and useful as a compiler feature in most cases. (C) 2020 Elsevier B.V. All rights reserved.

Keywords: parallel programming; parallel language; Partitioned global address space; Compiler; Data communication; High performance computing

Parallel implementation of coupled phase equilibrium-mass transfer model: Efficient and accurate simulation of fractured reservoirs

JOURNAL OF NATURAL GAS SCIENCE AND ENGINEERING, 2020, Vol. 78

Authors: Khaz'ali, Ali Reza; Bordbar, Sadegh; Mehrabani-Zeinabad, Arjomand. Affiliations: Isfahan Univ Technol Dept Chem Engn Esfahan *** Iran

Molecular diffusion plays a vital role in production from fractured reservoirs in all stages of recovery, especially for fractured reservoirs with small matrix sizes and unfavorable wettability conditions. Molecular diffusion can only be simulated by compositional reservoir simulators, which have historically employed a decoupled phase equilibrium-mass transfer model. Despite its higher performance, such a model cannot properly simulate intra- and cross-phase molecular diffusion. In the current research, a compositional fractured reservoir simulator, called Osiris, has been developed in C++ using the coupled formulation. After presenting the primary equations and algorithms, the performance of Osiris has been evaluated through a series of case studies. Utilizing MPI, Osiris could keep its runtime reasonable, despite the high computational demand of coupled modeling. Additionally, the simulation results of Osiris clearly demonstrate the precision of coupled modeling and the considerable effects of diffusive mass transfer on fractured reservoir performance.

Keywords: Compositional simulation; Dual permeability model; Immiscible gas injection; Molecular diffusion; parallel programming; Preconditioner

DtCraft: A High-Performance Distributed Execution Engine at Scale

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2019, Vol. 38, No. 6, pp. 1070-1083

Authors: Huang, Tsung-Wei; Lin, Chun-Xun; Wong, Martin D. F. Affiliations: Univ Illinois Dept Elect & Comp Engn Champaign IL 61801 USA

Recent years have seen rapid growth in data-driven distributed systems, such as Hadoop MapReduce, Spark, and Dryad. However, the counterparts for high-performance or compute-intensive applications, including large-scale optimizations, modeling, and simulations, are still nascent. In this paper, we introduce DtCraft, a modern C++ based distributed execution engine to streamline the development of high-performance parallel applications. Users need no understanding of distributed computing and can focus on high-level development, leaving difficult details, such as concurrency controls, workload distribution, and fault tolerance, to be handled transparently by our system. We have evaluated DtCraft on both micro-benchmarks and large-scale optimization problems, and shown promising performance from single multicore machines to clusters of computers. In a particular semiconductor design problem, we achieved a 30x speedup with 40 nodes and 15x less development effort than a hand-crafted implementation.

Keywords: Distributed computing; parallel programming

Safe Usage of Registers in BSPlib

34th ACM/SIGAPP Annual International Symposium on Applied Computing (SAC)

Authors: Jakobsson, Arvid; Dabrowski, Frederic; Bousdira, Wadoud. Affiliations: Huawei France Res Ctr Boulogne France; Univ Orleans INSA Ctr LIFO EA 4022 Orleans France; Univ Orleans LIFO Orleans France

ISBN: (print) 9781450359337

Bulk Synchronous Parallel (BSP) is a simple but powerful high-level model for parallel computation. Using BSPlib, programmers can write BSP programs in the general purpose language C. Direct Remote Memory Access (DRMA) communication in BSPlib is enabled using registrations: associations between the local memories of all processes in the BSP computation. However, the semantics of registration is non-trivial and ambiguously specified and thus its faulty usage is a potential source of errors. We give a formal semantics of BSPlib with which we characterize correct registration. Anticipating a static analysis, we give a simplified programming model that guarantees correct registration usage, drawing upon previous work on textual alignment.

Keywords: parallel programming; Bulk Synchronous parallelism; Static Analysis; Communication

Facilitating the learning process in parallel computing by using instant messaging

7th International Conference on Technological Ecosystems for Enhancing Multiculturality (TEEM)

Authors: Manuel Guerrero-Higueras, Angel; Sanchez-Gonzalez, Lidia; Angel Conde, Miguel; Rodriguez Lera, Francisco J.; Castejon-Limas, Manuel; Petkov, Nicolai. Affiliations: Univ Leon Dept Mech Comp Sci & Aerosp Engn Campus Vegazana S-N E-24071 Leon Spain; Univ Groningen Johann Bernoulli Inst Math & Comp Sci Groningen Netherlands

ISBN: (print) 9781450371919

Parallel programming skills may take a long time to acquire. "Thinking in parallel" is a skill that requires time, effort, and experience. In this work, we propose to facilitate the learning process in parallel programming by having students use instant messaging. Our aim is to find out whether students' interaction through instant messaging is beneficial for the learning process. We asked several students of an HPC course of the Master's degree in Computer Science to develop a specific parallel application, each of them using a different application program interface: OpenMP, MPI, CUDA, or OpenCL. Even though the APIs used are different, there are common points in the design process. We proposed that these students interact with each other using Gitter, an instant messaging tool for GitHub users. Our analysis of the communications and results demonstrates that the direct interaction of students through the Gitter tool has a positive impact on the learning process.

Keywords: High-performance Computing; Instant Messaging; parallel programming

Twister2: TSet High-Performance Iterative Dataflow

International Conference on High Performance Big Data and Intelligent Systems (HPBD&IS)

Authors: Wickramasinghe, Pulasthi; Kamburugamuve, Supun; Govindarajan, Kannan; Abeykoon, Vibhatha; Widanage, Chathura; Perera, Niranda; Uyar, Ahmet; Gunduz, Gurhan; Akkas, Selahattin; Fox, Geoffrey. Affiliations: Indiana Univ SICE Bloomington IN 47405 USA

ISBN: (print) 9781728104669

The dataflow model is gradually becoming the de facto standard for big data applications. While many popular frameworks are built around this model, very little research has been done on understanding its inner workings, which in turn has led to inefficiencies in existing frameworks. It is important to note that understanding the relationship between dataflow and HPC building blocks allows us to address and alleviate many of these fundamental inefficiencies by learning from the extensive research literature in the HPC community. In this paper we present TSet's, the dataflow abstraction of Twister2, which is a big data framework designed for high-performance dataflow and iterative computations. We discuss the dataflow model adopted by TSet's and the rationale behind implementing iteration handling at the worker level. Finally, we evaluate TSet's to show the performance of the framework.

Keywords: dataflow; big data; mapreduce; batch; stream; iterative; parallel programming

Parallelisation of practical shared sampling alpha matting with OpenMP

INTERNATIONAL JOURNAL OF COMPUTATIONAL SCIENCE AND ENGINEERING, 2020, Vol. 21, No. 1, pp. 105-115

Authors: Weng, Tien-Hsiung; Chiu, Chi-Ching; Hsieh, Meng-Yen; Lu, Huimin; Li, Kuan-Ching. Affiliations: Providence Univ Dept Comp Sci & Informat Engn CSIE Taichung 43301 Taiwan; Kyushu Inst Technol Dept Mech & Control Engn Kitakyushu Fukuoka Japan; Hubei Univ Educ Hubei Educ Cloud Serv Engn Technol Res Ctr Wuhan Peoples R China

In the modern filmmaking industry, image matting has been one of the common tasks in video side effects and a necessary intermediate step in computer vision. It pulls the foreground object from the background of an image by estimating the alpha values. However, matting high-resolution images can be very slow because of its complexity and because the computation is proportional to the size of the unknown region. In order to improve the performance, we implement a parallel alpha matting code with OpenMP from existing sequential code to run on multicore servers. We present and discuss the algorithm and experimental results from the perspective of the parallel application developer. The development takes less effort, and the results show significant performance improvement of the entire program.

Keywords: image matting; OpenMP; multicore processing; parallel programming

Cooperation of CUDA and Intel multi-core architecture in the independent component analysis algorithm for EEG data

BIO-ALGORITHMS AND MED-SYSTEMS, 2020, Vol. 16, No. 3

Authors: Gajos-Balinska, Anna; Wojcik, Grzegorz M.; Stpiczynski, Przemyslaw. Affiliations: Marie Curie Sklodowska Univ Inst Comp Sci Neuroinformat & Biomed Engn Akad 9 PL-20033 Lublin Poland; Marie Curie Sklodowska Univ Inst Comp Sci Software & Informat Syst Lublin Poland

Objectives: The electroencephalographic signal is largely exposed to external disturbances. Therefore, an important element of its processing is its thorough cleaning. Methods: One of the common methods of signal improvement is independent component analysis (ICA). However, it is a computationally expensive algorithm, hence methods are needed to decrease its execution time. One of the ICA algorithms (fastICA) and parallel computing on the CPU and GPU were used to reduce the algorithm execution time. Results: This paper presents the results of a study on the implementation of fastICA, which uses a multi-core architecture and GPU computation capabilities. Conclusions: The use of such a hybrid approach shortens the execution time of the algorithm.

Keywords: CUDA; electroencephalography; independent component analysis; parallel programming
