检索结果-内蒙古大学图书馆

34th ACM/SIGAPP Annual International Symposium on Applied Computing (SAC)

作者： Jakobsson, Arvid Dabrowski, Frederic Bousdira, Wadoud Huawei France Res Ctr Boulogne France Univ Orleans INSA Ctr LIFO EA 4022 Orleans France Univ Orleans LIFO Orleans France

ISBN: (纸本)9781450359337

Bulk Synchronous parallel (BSP) is a simple but powerful high-level model for parallel computation. Using BSPlib, programmers can write BSP programs in the general purpose language C. Direct Remote Memory Access (DRMA) communication in BSPlib is enabled using registrations: associations between the local memories of all processes in the BSP computation. However, the semantics of registration is non-trivial and ambiguously specified and thus its faulty usage is a potential source of errors. We give a formal semantics of BSPlib with which we characterize correct registration. Anticipating a static analysis, we give a simplified programming model that guarantees correct registration usage, drawing upon previous work on textual alignment.

关键词： parallel programming Bulk Synchronous parallelism Static Analysis Communication

来源：评论

学校读者我要写书评

暂无评论

Service Level Objectives via C++11 Attributes 24th

Service Level Objectives via C++11 Attributes

引用

International European Conference on parallel and Distributed Computing (Euro-Par)

作者： Griebler, Dalvan De Sensi, Daniele Vogel, Adriano Danelutto, Marco Fernandes, Luiz Gustavo Pontificia Univ Catolica Rio Grande do Sul Sch Technol Porto Alegre RS Brazil Univ Pisa Dept Comp Sci Pisa Italy Tres De Maio Fac Lab Adv Res Cloud Comp Tres De Maio Brazil

ISBN: (纸本)9783030105495;9783030105488

In recent years, increasing attention has been given to the possibility of guaranteeing Service Level Objectives (SLOs) to users about their applications, either regarding performance or power consumption. SLO can be implemented for parallel applications since they can provide many control knobs (e.g., the number of threads to use, the clock frequency of the cores, etc.) to tune the performance and power consumption of the application. Different from most of the existing approaches, we target sequential stream processing applications by proposing a solution based on C++ annotations. The user specifies which parts of the code to parallelize and what type of requirements should be enforced on that part of the code. Our solution first automatically parallelizes the annotated code and then applies self-adaptation approaches at run-time to enforce the user-expressed objectives. We ran experiments on different real-world applications, showing its simplicity and effectiveness.

关键词： parallel programming Adaptive and autonomic computing Power-aware computing Domain-specific language

来源：评论

学校读者我要写书评

暂无评论

Twister2: TSet High-Performance Iterative Dataflow

Twister2: TSet High-Performance Iterative Dataflow

引用

International Conference on High Performance Big Data and Intelligent Systems (HPBD&IS)

作者： Wickramasinghe, Pulasthi Kamburugamuve, Supun Govindarajan, Kannan Abeykoon, Vibhatha Widanage, Chathura Perera, Niranda Uyar, Ahmet Gunduz, Gurhan Akkas, Selahattin Fox, Geoffrey Indiana Univ SICE Bloomington IN 47405 USA

ISBN: (纸本)9781728104669

The dataflow model is gradually becoming the de facto standard for big data applications. While many popular frameworks are built around this model, very little research has been done on understanding its inner workings, which in turn has led to inefficiencies in existing frameworks. It is important to note that understanding the relationship between dataflow and HPC building blocks allows us to address and alleviate many of these fundamental inefficiencies by learning from the extensive research literature in the HPC community. In this paper we present TSet's, the dataflow abstraction of Twister2, which is a big data framework designed for high-performance dataflow and iterative computations. We discuss the dataflow model adopted by TSet's and the rationale behind implementing iteration handling at the worker level. Finally, we evaluate TSet's to show the performance of the framework.

关键词： dataflow big data mapreduce batch stream iterative parallel programming

来源：评论

学校读者我要写书评

暂无评论

From Mathematical Model to parallel Execution to Performance Improvement: Introducing Students to a Workflow for Scientific Computing 24th

From Mathematical Model to Parallel Execution to Performance...

引用

International European Conference on parallel and Distributed Computing (Euro-Par)

作者： Kasielke, Franziska Tschueter, Ronny Tech Univ Dresden Fac Comp Sci D-01062 Dresden Germany Tech Univ Dresden Ctr Informat Serv & High Performance Comp D-01062 Dresden Germany

ISBN: (纸本)9783030105495;9783030105488

Current courses in parallel and distributed computing (PDC) often focus on programming models and techniques. However, PDC is embedded in a scientific workflow that incorporates more than programming skills. The workflow spans from mathematical modeling to programming, data interpretation, and performance analysis. Especially the last task is covered insufficiently in educational courses. Often scientists from different fields of knowledge, each with individual expertise, collaborate to perform these tasks. In this work, the general design and the implementation of an exercise within the course "Supercomputers and their programming" at Technische Universitat Dresden, Faculty of Computer Science is presented. In the exercise, the students pass through a complete workflow for scientific computing. The students gain or improve their knowledge about: (i) mathematical modeling of systems, (ii) transferring the mathematical model to a (parallel) program, (iii) visualization and interpretation of the experiment results, and (iv) performance analysis and improvements. The exercise exactly aims at bridging the gap between the individual tasks of a scientific workflow and equip students with wide knowledge.

关键词： Workflow for scientific computing Teaching parallel programming Performance analysis Heat transfer

来源：评论

学校读者我要写书评

暂无评论

Performance of Map-Reduce Using Java-8 parallel Streams

Performance of Map-Reduce Using Java-8 Parallel Streams

引用

Computing Conference

作者： Lester, Bruce P. Maharishi Univ Management Comp Sci Dept Fairfield IA 52557 USA

ISBN: (纸本)9783030011741;9783030011734

The primary purpose of parallel streams in the recent release of Java 8 is to help Java programs make better use of multi-core processors for improved performance. However, in some cases, parallel streams can actually perform considerably worse than ordinary sequential Java code. This paper presents a Map-Reduce parallel programming pattern for Java parallel streams that produces good speedup over sequential code. An important component of the Map-Reduce pattern is two optimizations: grouping and locality. Three parallel application programs are used to illustrate the Map-Reduce pattern and its optimizations: Histogram of an Image, Document Keyword Search, and Solution to a Differential Equation. A proposal is included for a new terminal stream operation for the Java language called MapReduce() that applies this pattern and its optimizations automatically.

关键词： parallel programming Multi-core programming MapReduce Java parallel streams parallel computing

来源：评论

学校读者我要写书评

暂无评论

Analysis and Optimization of Pipelined Broadcast Algorithms on Gigabit Ethernet and InfiniBand Networks 15

Analysis and Optimization of Pipelined Broadcast Algorithms ...

引用

15th International Asian School-Seminar on Optimization Problems of Complex Systems (OPCS)

作者： Kurnosov, Mikhail Berlizov, Daniil Tkacheva, Tatiana Tokmasheva, Elizaveta Siberian State Univ Telecommun & Informat Sci Novosibirsk Russia

ISBN: (纸本)9781728129860

Theoretical and experimental analysis of MPI_Bcast algortihms is presented. The optimal tree degrees and segment sizes for pipelined versions of algorithms are obtained. Algorithms were investigated according to their implementation in the Open MPI library. Theoretical results are consistent with experiments on a computer cluster with Gigabit Ethernet and InfiniBand communication networks.

关键词： broadcast MPI parallel programming high-performance computing

来源：评论

学校读者我要写书评

暂无评论

Extending LLVM for Lightweight SPMD Vectorization: Using SIMD and Vector Instructions Easily from Any Language 2019

Extending LLVM for Lightweight SPMD Vectorization: Using SIM...

引用

17th IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

作者： Kruppe, Robin Oppermann, Julian Sommer, Lukas Koch, Andreas Tech Univ Darmstadt Embedded Syst & Applicat Grp Darmstadt Germany

ISBN: (纸本)9781728114361

Popular language extensions for parallel programming such as OpenMP or CUDA require considerable compiler support and runtime libraries and are therefore only available for a few programming languages and/or targets. We present an approach to vectorizing kernels written in an existing generalpurpose language that requires minimal changes to compiler front- ends. Programmers annotate parallel (SPMD) code regions with a few intrinsic functions, which then guide an ordinary automatic vectorization algorithm. This mechanism allows programming SIMD and vector processors effectively while avoiding much of the implementation complexity of more comprehensive and powerful approaches to parallel programming. Our prototype implementation, based on a custom vectorization pass in LLVM, is integrated into C, C++ and Rust compilers using only 29-37 lines of frontend-specific code each.

关键词： Kernel Graphics processing units parallel processing Runtime library parallel programming C++ languages

来源：评论

学校读者我要写书评

暂无评论

Theoretical and Practical Approaches for Teaching parallel Code Correctness 26

Theoretical and Practical Approaches for Teaching Parallel C...

引用

26th International Conference on High Performance Computing, Data and Analytics (HiPCW)

作者： Redondo, Carlos Arora, Ritu Trung Nguyen Ba Univ Texas Austin Austin TX 78712 USA Univ Texas Austin Texas Adv Comp Ctr Austin TX 78712 USA Univ Massachusetts Amherst MA 01003 USA

ISBN: (纸本)9781728148946

The introductory-level courses on parallel programming, typically, do not cover the topic of code correctness. Often, students learn about the logical errors in parallel programs and troubleshoot them through trial and error, and spend a significant amount of time and effort in the process. A systematic pedagogical approach to teaching parallel code correctness is therefore needed to enhance the productivity of students and instructors. In this paper, we describe some theoretical and practical approaches that can be adopted for assessing and teaching parallel code correctness. The theoretical approaches include using formal methods (e.g., Petri nets and Hoare Logic). We apply these approaches on the test cases discussed in this paper. The practical approach involves teaching code correctness through demonstrations. For enabling this, we have not only curated a repository of parallel programs with commonly made logical errors but have also added a high-level interface on top of the repository for quickly comparing fixed and incorrect versions of the sample code in the repository, seeing the explanation text on the errors, and searching the repository on the basis of the causes and symptoms of logical errors. The work presented in this paper can potentially motivate the instructors in including the content on code correctness in their parallel programming courses and trainings.

关键词： formal verification Hoare logic Petri nets mathematical modeling parallel programming CUDA MPI OpenMP

来源：评论

学校读者我要写书评

暂无评论

To Be, or Not To Be: That is the Recursive Question 10

To Be, or Not To Be: That is the Recursive Question

引用

10th IEEE Global Engineering Education Conference (EDUCON)

作者： Fernandez de Vega, Francisco Univ Extremadura Badajoz Spain

ISBN: (纸本)9781538695067

This paper discusses the opportunity of Functional programming for making students aware about data dependencies and their implications when using parallel and distributed computing infrastructures. Although other programming methodologies, such as Object Oriented programming (OOP) are usually preferred to he taught at computer science degrees, the problem is that the sequential programming approach is inherent to the model, and once students have entered the framework, it is not easy for them to learn modern parallel programming models. Thus, the methodology learned may act as a straitjacket, preventing students from taking advantage of the parallel architectures widely available. The idea presented here relies on choosing Functional programming as the methodology to be learned first. Moreover, when any selected language that embodies the functional model is shown to students, we propose to forbid loops, similarly as how go-to sentences are classically forbidden in high level programming languages, or global variables are forbidden to avoid side effects. Students must thus resort instead to recursive functions if data dependencies are present and a sequential order of operations is required, or to map functions when no dependencies exist. This way, students naturally develop the skill to automatically write parallel code within the functional programming context, and then the map/reduce model can be easily exploited in any context when parallel and distributed infrastructures are available. We describe preliminary results obtained when the model has been successfully tested with a group of middle school students.

关键词： Functional programming programming profession Task analysis parallel programming Tools Handheld computers

来源：评论

学校读者我要写书评

暂无评论

Cooperation of CUDA and Intel multi-core architecture in the independent component analysis algorithm for EEG data

引用

BIO-ALGORITHMS AND MED-SYSTEMS 2020年第3期16卷

作者： Gajos-Balinska, Anna Wojcik, Grzegorz M. Stpiczynski, Przemyslaw Marie Curie Sklodowska Univ Inst Comp Sci Neuroinformat & Biomed Engn Akad 9 PL-20033 Lublin Poland Marie Curie Sklodowska Univ Inst Comp Sci Software & Informat Syst Lublin Poland

Objectives: The electroencephalographic signal is largely exposed to external disturbances. Therefore, an important element of its processing is its thorough cleaning. Methods: One of the common methods of signal improvement is the independent component analysis (ICA). However, it is a computationally expensive algorithm, hence methods are needed to decrease its execution time. One of the ICA algorithms (fastICA) and parallel computing on the CPU and GPU was used to reduce the algorithm execution time. Results: This paper presents the results of study on the implementation of fastICA, which uses some multi-core architecture and the GPU computation capabilities. Conclusions: The use of such a hybrid approach shortens the execution time of the algorithm.

关键词： CUDA electroencephalography independent component analysis parallel programming

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：