ISBN (Print): 9781728114361
Popular language extensions for parallel programming such as OpenMP or CUDA require considerable compiler support and runtime libraries and are therefore only available for a few programming languages and/or targets. We present an approach to vectorizing kernels written in an existing general-purpose language that requires minimal changes to compiler front-ends. Programmers annotate parallel (SPMD) code regions with a few intrinsic functions, which then guide an ordinary automatic vectorization algorithm. This mechanism allows programming SIMD and vector processors effectively while avoiding much of the implementation complexity of more comprehensive and powerful approaches to parallel programming. Our prototype implementation, based on a custom vectorization pass in LLVM, is integrated into C, C++ and Rust compilers using only 29-37 lines of frontend-specific code each.
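The abstract does not list the actual intrinsic names, so the sketch below uses a hypothetical spmd_region placeholder (with a scalar fallback so it runs) purely to illustrate the programming model: the programmer marks an SPMD region that an auto-vectorizer could then widen across SIMD lanes.

```cpp
// Minimal sketch, assuming a hypothetical spmd_region annotation; not the paper's API.
#include <cstddef>
#include <cstdio>
#include <vector>

// Placeholder for the SPMD-region intrinsic: in the real system the compiler's
// vectorization pass would widen the body across SIMD lanes instead of looping.
template <class Body>
void spmd_region(std::size_t n, Body body) {
    for (std::size_t lane = 0; lane < n; ++lane)  // scalar fallback
        body(lane);
}

// saxpy expressed as an SPMD kernel: one logical lane per element.
void saxpy(float a, const float* x, float* y, std::size_t n) {
    spmd_region(n, [&](std::size_t i) { y[i] = a * x[i] + y[i]; });
}

int main() {
    std::vector<float> x(8, 1.0f), y(8, 2.0f);
    saxpy(3.0f, x.data(), y.data(), x.size());
    std::printf("%f\n", y[0]);  // prints 5.000000
}
```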
ISBN (Print): 9781538695067
This paper discusses the opportunity that Functional programming offers for making students aware of data dependencies and their implications when using parallel and distributed computing infrastructures. Although other programming methodologies, such as Object Oriented programming (OOP), are usually preferred for teaching in computer science degrees, the problem is that the sequential programming approach is inherent to that model, and once students have entered the framework, it is not easy for them to learn modern parallel programming models. The methodology learned first may thus act as a straitjacket, preventing students from taking advantage of the parallel architectures widely available. The idea presented here relies on choosing Functional programming as the methodology to be learned first. Moreover, when any selected language that embodies the functional model is shown to students, we propose to forbid loops, much as go-to statements are classically forbidden in high-level programming languages, or global variables are forbidden to avoid side effects. Students must instead resort to recursive functions when data dependencies are present and a sequential order of operations is required, or to map functions when no dependencies exist. This way, students naturally develop the skill of automatically writing parallel code within the functional programming context, and the map/reduce model can then be easily exploited in any context where parallel and distributed infrastructures are available. We describe preliminary results obtained when the model was successfully tested with a group of middle school students.
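As a rough illustration of the loop-free discipline described above (shown in C++ for consistency with the other sketches in this listing, not in a functional language): a dependence-free computation expressed as a map, and a computation with a sequential dependence expressed recursively.

```cpp
// Illustrative only: "map" for independent elements, recursion for a dependency chain.
#include <algorithm>
#include <cstdio>
#include <vector>

// No data dependencies between elements: express as a map, trivially parallelizable.
std::vector<int> squares(const std::vector<int>& xs) {
    std::vector<int> out(xs.size());
    std::transform(xs.begin(), xs.end(), out.begin(), [](int x) { return x * x; });
    return out;
}

// A true dependency (each partial sum needs the previous one): expressed
// recursively rather than with a mutable loop counter.
int sum(const std::vector<int>& xs, std::size_t i = 0) {
    return i == xs.size() ? 0 : xs[i] + sum(xs, i + 1);
}

int main() {
    std::vector<int> xs{1, 2, 3, 4};
    auto sq = squares(xs);
    std::printf("%d %d\n", sq[3], sum(xs));  // prints 16 10
}
```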
ISBN (Print): 9781728148946
Introductory-level courses on parallel programming typically do not cover the topic of code correctness. Often, students learn about the logical errors in parallel programs and troubleshoot them through trial and error, spending a significant amount of time and effort in the process. A systematic pedagogical approach to teaching parallel code correctness is therefore needed to enhance the productivity of both students and instructors. In this paper, we describe some theoretical and practical approaches that can be adopted for assessing and teaching parallel code correctness. The theoretical approaches include using formal methods (e.g., Petri nets and Hoare logic), which we apply to the test cases discussed in this paper. The practical approach involves teaching code correctness through demonstrations. To enable this, we have not only curated a repository of parallel programs with commonly made logical errors but also added a high-level interface on top of the repository for quickly comparing fixed and incorrect versions of the sample code, viewing explanatory text about the errors, and searching the repository by the causes and symptoms of logical errors. The work presented in this paper can motivate instructors to include content on code correctness in their parallel programming courses and training.
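A hedged example of the kind of commonly made logical error such a repository might contain (not taken from the paper's repository): a data race on a shared accumulator in an OpenMP loop, shown next to a fixed version using a reduction.

```cpp
// Incorrect vs. fixed parallel summation; compile with -fopenmp to see the race.
#include <cstdio>
#include <vector>

double sum_racy(const std::vector<double>& v) {
    double s = 0.0;
    #pragma omp parallel for           // BUG: all threads update s without synchronization
    for (long i = 0; i < (long)v.size(); ++i)
        s += v[i];
    return s;                          // result is nondeterministic
}

double sum_fixed(const std::vector<double>& v) {
    double s = 0.0;
    #pragma omp parallel for reduction(+ : s)  // private per-thread copies, combined at the end
    for (long i = 0; i < (long)v.size(); ++i)
        s += v[i];
    return s;
}

int main() {
    std::vector<double> v(1'000'000, 1.0);
    std::printf("racy=%.0f fixed=%.0f\n", sum_racy(v), sum_fixed(v));
}
```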
Objectives: The electroencephalographic signal is highly exposed to external disturbances; therefore, an important element of its processing is thorough cleaning. Methods: One of the common methods of signal improvement is independent component analysis (ICA). However, it is a computationally expensive algorithm, hence methods are needed to decrease its execution time. One of the ICA algorithms (fastICA), combined with parallel computing on the CPU and GPU, was used to reduce the execution time. Results: This paper presents the results of a study on an implementation of fastICA that uses a multi-core architecture and the computational capabilities of the GPU. Conclusions: The use of such a hybrid approach shortens the execution time of the algorithm.
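The abstract does not detail the implementation, so the following is only a generic sketch of the CPU-parallel part of one single-unit fastICA update (w+ = E[x g(w'x)] - E[g'(w'x)] w with g = tanh, then normalization) using OpenMP; the names and the toy data in main are illustrative, not the paper's code.

```cpp
// Sketch of one CPU-parallel fastICA (single-unit) step; data x is assumed whitened,
// stored column-wise as x[n*d + i]. Per-thread partial sums are combined at the end.
#include <cmath>
#include <cstdio>
#include <vector>

void fastica_step(const std::vector<double>& x, std::vector<double>& w,
                  std::size_t d, std::size_t N) {
    std::vector<double> ew(d, 0.0);  // accumulates E[x g(w'x)]
    double eg = 0.0;                 // accumulates E[g'(w'x)]
    #pragma omp parallel
    {
        std::vector<double> ew_loc(d, 0.0);
        double eg_loc = 0.0;
        #pragma omp for nowait
        for (long n = 0; n < (long)N; ++n) {
            double u = 0.0;
            for (std::size_t i = 0; i < d; ++i) u += w[i] * x[n * d + i];
            const double g = std::tanh(u);
            for (std::size_t i = 0; i < d; ++i) ew_loc[i] += x[n * d + i] * g;
            eg_loc += 1.0 - g * g;   // derivative of tanh
        }
        #pragma omp critical        // combine the per-thread partial sums
        {
            for (std::size_t i = 0; i < d; ++i) ew[i] += ew_loc[i];
            eg += eg_loc;
        }
    }
    double norm = 0.0;
    for (std::size_t i = 0; i < d; ++i) {
        w[i] = ew[i] / N - (eg / N) * w[i];   // w+ = E[x g(w'x)] - E[g'(w'x)] w
        norm += w[i] * w[i];
    }
    norm = std::sqrt(norm);
    for (std::size_t i = 0; i < d; ++i) w[i] /= norm;
}

int main() {
    const std::size_t d = 2, N = 1000;
    std::vector<double> x(d * N), w = {1.0, 0.0};
    for (std::size_t n = 0; n < N; ++n) {   // toy data, only to exercise the call
        double s = (double)(n % 3) - 1.0, t = (double)(n % 5) - 2.0;
        x[n * d + 0] = s + 0.5 * t;
        x[n * d + 1] = 0.5 * s + t;
    }
    fastica_step(x, w, d, N);
    std::printf("w = (%f, %f)\n", w[0], w[1]);
}
```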
ISBN (Print): 9781728122496
Collaborative robots are being applied in a growing number of usage scenarios, but their adoption is slowed down by the high complexity of robot programming. As previous prototype studies have shown, block-based programming environments can enable novice or end users to program industrial single-armed robots. Some existing block-based tools support parallel programming and therefore show potential to be used for multi-armed robot programming as well. We analyze their designs and argue how improved abstractions and visualizations could make multi-armed parallelism accessible to novice users. Based on this analysis, we then extract a list of features that a block-based environment designed for multi-armed robot programming should provide. Finally, we present our design vision for a novel programming environment for two-armed robots, show how it provides these features and discuss how it can enable both novices and experienced intermediate users to perform parallelized programming tasks.
ISBN (Print): 9783030294007; 9783030293994
As the parallelism in high-performance supercomputers continues to grow, new programming models become necessary to maintain programmer productivity at today's levels. Dataflow is a promising execution model because it can represent parallelism at different granularity levels and dynamically adapt for efficient execution. The downside is the low-level programming interface inherent to dataflow. We present a strategy to translate programs written in Hierarchically Tiled Arrays (HTA) to the dataflow API of the Open Community Runtime (OCR) system. The goal is to enable program development in a convenient notation and at the same time take advantage of the benefits of a dataflow runtime system. Using HTA produces more comprehensible code than writing directly against the dataflow runtime programming interface. Moreover, the experiments show that, for applications with high asynchrony and sparse data dependences, our implementation delivers better performance than OpenMP using parallel for loops.
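Neither the HTA notation nor the OCR API is reproduced here; the sketch below only illustrates, with OpenMP task dependences, the kind of sparse, asynchronous dependence structure for which the abstract reports dataflow execution paying off compared with barrier-separated parallel for loops.

```cpp
// Illustrative double-buffered 1-D tile sweep: each tile update at step t depends only
// on three tiles from step t-1, so tiles and even consecutive steps overlap without
// global barriers between steps.
#include <cstdio>
#include <utility>
#include <vector>

int main() {
    const int T = 4, NT = 8;                    // time steps x tiles
    std::vector<double> a(NT), b;
    for (int i = 0; i < NT; ++i) a[i] = i;      // simple 1-D "tiles"
    b = a;                                      // double buffer; boundary tiles stay fixed
    double *cur = a.data(), *next = b.data();

    #pragma omp parallel
    #pragma omp single
    {
        for (int t = 0; t < T; ++t) {
            for (int i = 1; i < NT - 1; ++i) {  // interior tiles only
                double *l = &cur[i - 1], *c = &cur[i], *r = &cur[i + 1];
                double *out = &next[i];
                // Ready as soon as its three inputs from the previous step exist;
                // no per-step barrier is needed.
                #pragma omp task depend(in: l[0], c[0], r[0]) depend(out: out[0])
                out[0] = 0.25 * l[0] + 0.5 * c[0] + 0.25 * r[0];
            }
            std::swap(cur, next);
        }
        #pragma omp taskwait
    }
    std::printf("tile[1] = %f\n", cur[1]);
}
```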
ISBN (Print): 9781450371919
Parallel programming skills may take a long time to acquire. The ability to "think in parallel" requires time, effort, and experience. In this work, we propose to facilitate the process of learning parallel programming by having students use instant messaging. Our aim is to find out whether students' interaction through instant messaging is beneficial for the learning process. We asked several students in an HPC course of a Master's degree in Computer Science to develop a specific parallel application, each of them using a different application programming interface: OpenMP, MPI, CUDA, or OpenCL. Even though these APIs differ, there are common points in the design process. We proposed that these students interact with each other using Gitter, an instant messaging tool for GitHub users. Our analysis of the communications and results shows that the direct interaction of students through the Gitter tool has a positive impact on the learning process.
ISBN (Print): 9783030178727; 9783030178710
Taking advantage of the growing number of cores in supercomputers to increase the scalability of parallel programs is an increasing challenge. Many advanced profiling tools have been developed to assist programmers in analyzing data related to the execution of their programs, and programmers can act on this information to reach higher performance levels. However, the information provided by profiling tools is generally designed to optimize the program for a specific execution environment, with a target number of cores and a target problem size. Optimizing code for scalability rather than for a specific performance target requires analyzing many distinct execution environments instead of the details of a single one. With the goal of providing more useful information for the analysis and optimization of code for parallel scalability, this work introduces the PaScal Viewer tool. It presents a novel and productive way to visualize scalability trends of parallel programs. It consists of four diagrams that offer visual support for identifying parallel efficiency trends of the whole program, or parts of it, when running in parallel environments of increasing scale with increasing problem sizes.
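A minimal sketch (not the PaScal Viewer implementation) of the quantity such diagrams visualize: parallel efficiency E(p, n) = T(1, n) / (p * T(p, n)), tabulated over core counts and problem sizes; the timing numbers below are made-up placeholders used only to exercise the formula.

```cpp
// Tabulate parallel efficiency over a grid of core counts and problem sizes.
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> cores = {1, 2, 4, 8};
    // t[s][k]: hypothetical runtime (seconds) for problem size index s on cores[k].
    std::vector<std::vector<double>> t = {
        {10.0, 5.2, 2.9, 1.8},    // small problem
        {40.0, 20.3, 10.6, 5.9},  // large problem
    };
    for (std::size_t s = 0; s < t.size(); ++s)
        for (std::size_t k = 0; k < cores.size(); ++k) {
            double eff = t[s][0] / (cores[k] * t[s][k]);  // E(p, n) = T(1, n) / (p * T(p, n))
            std::printf("size %zu, p=%d: efficiency %.2f\n", s, cores[k], eff);
        }
}
```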
ISBN (Print): 9781450371896
The success of Deep Learning (DL) algorithms in computer vision tasks has created an ongoing demand for dedicated hardware architectures that can keep up with their computation and memory requirements. This task is particularly challenging when embedded smart camera platforms have constrained resources such as power, processing elements (PEs), and communication. This article describes a heterogeneous system embedding an FPGA and a GPU for executing CNN inference for computer vision applications. The built system addresses some challenges of embedded CNN inference, such as task and data partitioning and workload balancing. The selected heterogeneous platform combines an Nvidia(R) Jetson TX2 for the CPU-GPU side and an Intel Altera(R) Cyclone 10 GX for the FPGA side, interconnected by PCIe Gen2, with a MIPI-CSI camera for prototyping. This test environment will support future work on a methodology for optimized model partitioning.
ISBN (Print): 9781728142227
New generations of high-performance computing applications depend on an increasing number of components to satisfy their growing demand for computation. On such large systems, the execution of long-running jobs is more likely to be affected by component failures. Failure classes vary from frequent transient memory faults to rather rare correlated node errors. Multilevel checkpoint/restart has been introduced to proactively cope with failures at different levels. Writing checkpoints to slower stable devices, which survive fatal failures, causes more overhead than writing them to fast devices (main memory or a local SSD), which, however, only protect against light faults. Given a graph of the components of a particular storage hierarchy, mapping their fault domains and their expected mean time to failure (MTTF), we optimize the checkpoint frequency for each level of the storage hierarchy (multilevel checkpointing) to minimize the overhead and runtime of a given job. We reduce the checkpoint/restart overhead of large data-intensive jobs by up to 10 percent in the investigated cases compared to state-of-the-art multilevel checkpointing solutions. The improvement increases further with growing checkpoint sizes.
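The paper optimizes the per-level checkpoint frequencies jointly from a fault-domain graph; as a simpler point of reference, the sketch below applies the classic Young/Daly first-order approximation, tau_opt ≈ sqrt(2 * C * MTTF), independently to each level of a hypothetical two-level hierarchy with made-up checkpoint costs and MTTFs.

```cpp
// Young/Daly checkpoint interval per storage level (illustrative numbers only).
#include <cmath>
#include <cstdio>

struct Level {
    const char* name;
    double checkpoint_cost_s;  // time C to write one checkpoint at this level
    double mttf_s;             // expected MTTF of the faults this level protects against
};

int main() {
    Level levels[] = {
        {"node-local SSD (transient faults)",     30.0,  6.0 * 3600.0},
        {"parallel file system (node failures)", 600.0, 72.0 * 3600.0},
    };
    for (const Level& l : levels) {
        double tau = std::sqrt(2.0 * l.checkpoint_cost_s * l.mttf_s);  // Young/Daly
        std::printf("%s: checkpoint every ~%.0f s\n", l.name, tau);
    }
}
```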