Search results - Inner Mongolia University Library

34th IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Author: Ferguson, Michael P. Affiliation: Hewlett Packard Enterprise Palo Alto CA 94304 USA

Language stability is an important upcoming feature of the Chapel programming language. Chapel users have both requested big changes to the language and also requested that the language become stable. This talk will d...

ISBN: (print) 9781728174457

Keywords: parallel programming

A Fast and Concise Parallel Implementation of the 8x8 2D IDCT using Halide

32nd IEEE International Symposium on Computer Architecture and High-Performance Computing (SBAC-PAD) / 11th Workshop on Applications for Multi-Core Architectures (WAMCA)

Authors: Johnson, Martin; Playne, Daniel. Affiliation: Massey Univ Sch Nat & Computat Sci Auckland New Zealand

ISBN: (print) 9781728199245

The Inverse Discrete Cosine Transform (IDCT) is commonly used for image and video decoding. Due to the ubiquitous nature of this application area, very efficient implementations of the IDCT are of great importance and have led to the development of highly optimized libraries. The popular libjpeg-turbo library contains thousands of lines of handwritten assembly code utilizing SIMD instruction sets for a variety of architectures. We present an alternative approach, implementing the 8x8 2D IDCT in the image processing language Halide, a high-level, functional language that allows for concise, portable, parallel and very efficient code. We show how fewer than 100 lines of Halide can replace over 1000 lines of code for each architecture in the libjpeg-turbo library to perform JPEG decoding. The Halide implementation is compared for ARMv8 and x86-64 SIMD extensions and shows a 5-25 percent performance improvement over the SIMD code in libjpeg-turbo while also being much easier to maintain and port to new architectures.

Keywords: Halide Inverse Discrete Cosine Transform JPEG decoding parallel programming
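
For orientation, the 8x8 2D IDCT named in this record can be computed separably: a 1D IDCT over every row, then over every column. The sketch below is a naive reference version in plain Python, not the paper's Halide code and not libjpeg-turbo's SIMD code. The function names `idct_1d` and `idct_2d` are illustrative assumptions, and the orthonormal scaling is assumed to match the usual JPEG convention.

```python
import math

N = 8  # block size used by JPEG


def idct_1d(coeffs):
    """Naive 1D inverse DCT (orthonormal scaling) of a length-8 sequence."""
    out = []
    for x in range(N):
        total = 0.0
        for u in range(N):
            # The DC term (u == 0) uses a smaller scale factor than the AC terms.
            scale = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
            total += scale * coeffs[u] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
        out.append(total)
    return out


def idct_2d(block):
    """Separable 8x8 2D IDCT: transform every row, then every column."""
    rows = [idct_1d(row) for row in block]
    cols = [idct_1d([rows[r][c] for r in range(N)]) for c in range(N)]
    # cols[c][r] holds the sample at row r, column c, so transpose back.
    return [[cols[c][r] for c in range(N)] for r in range(N)]
```

A block whose only non-zero coefficient is the DC term decodes to a flat block, which is a quick sanity check for any faster implementation.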

Optimizing the Multimode Brownian Oscillator Model for the Optical Response of Carotenoids in Solution by Fine Tuning of Differential Evolution

Lobachevskii Journal of Mathematics, 2020, Vol. 41, No. 8, pp. 1545-1553

Authors: Pishchalnikov, R. Y.; Bondarenko, A. A.; Ashikhmin, A. A. Affiliations: Russian Acad Sci Prokhorov Gen Phys Inst Moscow 119991 Russia; Russian Acad Sci Keldysh Inst Appl Math Moscow 125047 Russia; Russian Acad Sci Inst Basic Biol Problems Pushchino Sci Ctr Biol Res Pushchino 142290 Russia

During the last twenty years, the Differential Evolution algorithm (DE) has proved to be one of the most powerful methods for solving minimization problems for multidimensional functions. Being a member of the family of evolutionary optimization algorithms, its main principle is based upon the concepts of natural selection and mutation. In this study, we test the potential of DE to find a proper set of parameters for the multimode Brownian oscillator model, which was then used to simulate absorption lineshapes of carotenoid molecules in solution: spheroidene and spheroidenone. This theory assumes that the correlation function of a particular electronic state of the carotenoid is calculated using the semiclassical spectral density function. Considering our previous studies on photosynthetic pigments, we employed several DE strategies to fit the carotenoid experimental spectra. We found that simulated absorption spectra are very sensitive to several parameters that characterize carotenoid vibronic modes, namely, Huang-Rhys factors. Fine tuning of the DE crossover parameter (Cr) and the scaling factor (F) provided acceptable convergence of the algorithm. It appears that to get good convergence of DE, a certain spectral range of carotenoid absorption from 400 to 600 nm must be chosen. This fact can be explained by the limitations of the applied theory, which simply does not properly predict the carotenoid absorption at higher frequencies.

Keywords: differential evolution parallel programming carotenoids absorption spectrum cumulant expansion multimode Brownian oscillator model
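
To make the roles of the crossover parameter Cr and the scaling factor F concrete, here is a minimal sketch of one common DE variant (DE/rand/1/bin) in plain Python. It is not the authors' code, and the abstract does not say which strategies they used. The `sphere` objective is a hypothetical stand-in for the spectral fitting error, and the default values of `pop_size`, `F`, `Cr` and `generations` are illustrative assumptions.

```python
import random


def sphere(vector):
    """Hypothetical stand-in objective; the paper minimizes a spectral fitting error."""
    return sum(x * x for x in vector)


def differential_evolution(objective, bounds, pop_size=20, F=0.8, Cr=0.9,
                           generations=200, seed=0):
    """Minimize `objective` inside box `bounds` using DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: pick three distinct members other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d in range(dim):
                if rng.random() < Cr or d == forced:
                    # F scales the difference vector added to the base member.
                    value = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clamp to the search box
                else:
                    value = pop[i][d]  # binomial crossover keeps the target gene
                trial.append(value)
            # Greedy selection: the trial replaces the target only if not worse.
            trial_score = objective(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=lambda j: scores[j])
    return pop[best], scores[best]
```

Calling `differential_evolution(sphere, [(-5.0, 5.0)] * 3)` returns the best parameter vector found and its objective value. Lowering Cr or F changes how aggressively trial vectors depart from the current population, which is the kind of tuning the abstract refers to.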

The future of aliasing in parallel programming

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2013, Vol. 7850, pp. 501-502

Author: Bocchino Jr., Robert L. Affiliation: Carnegie Mellon University United States

ISBN: (print) 9783642369452

In recent years, the research community has made great strides in alias annotations that support parallel programming [1]. Using these techniques, programmers no longer have to guess where aliased mutable state may cause unintended data races or nondeterminism; instead, such problems can simply be eliminated, either at compile time or at runtime. This represents a major advance in the safety and reliability of parallel code. © Springer-Verlag Berlin Heidelberg 2013.

Keywords: parallel programming

Special design aspects of a biomechanical program: Two completely different levels of program design

IEEE 15th International Conference of System of Systems Engineering (SoSE)

Authors: Fekete, Gyorgy; Molnar, Andras. Affiliations: Obuda Univ Doctoral Sch Appl Informat & Appl Math Budapest Hungary; Obuda Univ Inst Cyber Phys Syst John von Neumann Fac Informat Budapest Hungary

ISBN: (print) 9781728180502

In this paper, we examine two completely different levels of design of a biomechanical program. The first and broadest level is the data level, where we show that we can use the whole world's data. This is covered by System of Systems engineering. The second and most specific level is the algorithm level. Our goal is to achieve the fastest program run we can. To this end, we survey the possibilities and show an example of how a parallel paradigm accelerates our program.

Keywords: biomechanical program design system of systems engineering parallel programming GPU multi-thread

Towards Profile-Guided Optimization for Safe and Efficient Parallel Stream Processing in Rust

32nd IEEE International Symposium on Computer Architecture and High-Performance Computing (SBAC-PAD) / 11th Workshop on Applications for Multi-Core Architectures (WAMCA)

Authors: Sydow, Stefan; Nabelsee, Mohannad; Glesner, Sabine; Herber, Paula. Affiliations: Tech Univ Berlin Berlin Germany; Univ Munster Munster Germany

ISBN: (print) 9781728199245

The efficient mapping of stream processing applications to parallel hardware architectures is a difficult problem. While parallelization is often highly desirable as it reduces the overall execution time, its advantages must be carefully weighed against the parallelization overhead of complexity and communication costs. This paper presents a novel profile-guided optimization for parallel stream processing based on the multi-paradigm system programming language Rust. Our approach's key idea is to systematically balance the performance gain that can be achieved from parallelization with the communication overhead. To achieve this, we 1) use profiling to gain tight estimates of task execution times, 2) evaluate the cost of the fundamental concurrency constructs in Rust with synthetic benchmarks, and exploit this information to estimate the communication overhead introduced by various degrees of parallelism, and 3) present a novel optimization algorithm that exploits both estimates to finetune the degree of parallelism and train processing in a given application. Overall, our approach enables us to map parallel stream processing applications to parallel hardware efficiently. The safety concepts anchored in Rust ensure the reliability of the resulting implementation. We demonstrate our approach's practical applicability with two case studies: the word count problem and aircraft telemetry decoding.

Keywords: Stream Processing parallel programming Rust Performance Modelling

Chapel on Accelerators

34th IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Ghangas, Rahul; Milthorpe, Josh. Affiliation: Australian Natl Univ Res Sch Comp Sci Canberra ACT Australia

ISBN: (print) 9781728174457

Chapel's high-level data-parallel constructs make parallel programming productive for general programmers. This talk introduces the 'Chapel on Accelerators' project, which proposes compiler enhancements to extend data-parallel constructs to hardware accelerators including GPUs. Previous attempts to extend Chapel to GPUs [1]-[3] have not been successfully integrated, and any such extension needs to maintain portability and consistency with the Chapel design philosophy and implementation. © 2020 IEEE.

Keywords: parallel programming

Peachy Parallel Assignments (EduHPC 2020)

Workshop on Education for High Performance Computing (EduHPC)

Authors: Casanova, Henri; da Silva, Rafael Ferreira; Gonzalez-Escribano, Arturo; Koch, William; Torres, Yuri; Bunde, David P. Affiliations: Univ Hawaii Honolulu HI 96822 USA; Univ Southern Calif Marina Del Rey CA USA; Univ Valladolid Valladolid Spain; Knox Coll Galesburg IL USA

ISBN: (print) 9780738143057

Peachy Parallel Assignments are high-quality assignments for teaching parallel and distributed computing. They are selected competitively for presentation at the Edu* workshops. All of the assignments have been successfully used in class, and they are selected based on their ease of adoption by other instructors and for being cool and inspirational to students. This paper presents a paper-and-pencil assignment asking students to analyze the performance of different system configurations and an assignment in which students parallelize a simulation of the evolution of simple living organisms.

Keywords: Peachy parallel Assignments parallel computing education High-Performance Computing education parallel programming Curriculum Development Performance analysis parallel simulation OpenMP MPI GPGPU

Fancier: A Unified Framework for Java, C, and OpenCL Integration

IEEE Access, 2021, Vol. 9, pp. 164570-164588

Authors: Afonso, Sergio; Almeida, Francisco. Affiliation: Univ La Laguna Dept Comp Engn & Syst San Cristobal De La Lagu 38200 Spain

Graphics Processing Units (GPUs) have evolved from very specialized designs geared towards computer graphics to accommodate general-purpose highly-parallel workloads. Harnessing the performance that these accelerators provide requires the use of specialized native programming interfaces, such as CUDA or OpenCL, or higher-level programming models like OpenMP or OpenACC. However, in managed programming languages, offloading execution to GPUs is much harder and more error-prone, mainly due to the need to call through a native API (Application Programming Interface), and because of mismatches between value and reference semantics. The Fancier framework provides a unified interface to Java, C/C++, and OpenCL C compute kernels, together with facilities to smooth the transitions between these programming languages. This combination of features makes GPU acceleration on Java much more approachable. In addition, Fancier Java code can be directly translated into equivalent C/C++ or OpenCL C code easily, which simplifies the implementation of higher-level abstractions targeting GPU or parallel execution on Java. Furthermore, it reduces the programming effort without adding significant overhead on top of the necessary OpenCL and Java Native Interface (JNI) API calls. We validate our approach on several image processing workloads running on different Android devices.

Keywords: Java Codes programming Standards Runtime Libraries parallel programming Application programming interfaces hardware acceleration heterogeneous systems image processing mobile computing parallel programming performance analysis

Enabling System Wide Shared Memory for Performance Improvement in PyCOMPSs Applications

9th Workshop on Python for High-Performance and Scientific Computing (PYHPC)

Authors: Foyer, Clement; Conejero, Javier; Ejarque, Jorge; Badia, Rosa M.; Tate, Adrian; McIntosh-Smith, Simon. Affiliations: HPE HPC AI EMEA Res Lab Bristol Avon England; Barcelona Supercomp Ctr Barcelona Spain; Numer Algorithms Grp Ltd NAG Oxford England; Univ Bristol Dept Comp Sci High Performance Comp Res Grp Bristol Avon England

ISBN: (print) 9780738110868

Python has been gaining some traction for years in the world of scientific applications. However, the high-level abstraction it provides may not allow the developer to use the machines to their peak performance. To address this, multiple strategies, sometimes complementary, have been developed to enrich the software ecosystem either by relying on additional libraries dedicated to efficient computation (e.g., NumPy) or by providing a framework to better use HPC scale infrastructures (e.g., PyCOMPSs). In this paper, we present a Python extension based on SharedArray that enables the support of system-provided shared memory and its integration into the PyCOMPSs programming model as an example of integration to a complex Python environment. We also evaluate the impact such a tool may have on performance in two types of distributed execution-flows, one for linear algebra with a blocked matrix multiplication application and the other in the context of data-clustering with a k-means application. We show that with very little modification of the original decorator (3 lines of code to be modified) of the task-based application the gain in performance can rise above 40% for tasks relying heavily on data reuse on a distributed environment, especially when loading the data is prominent in the execution time.

Keywords: Memory Shared Memory Task Python parallel programming Distributed Memory NumPy Data Management
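
The blocked matrix multiplication mentioned in this abstract can be pictured as follows. This is a sequential sketch in plain Python, assuming square matrices whose size is a multiple of the block size. It deliberately uses neither PyCOMPSs nor SharedArray, and all function names are illustrative assumptions. Each `matmul` call on a pair of blocks is the kind of unit a task-based runtime could schedule independently, and the same input blocks are reused across many such units, which is where the data reuse discussed in the abstract comes from.

```python
def matmul(a, b):
    """Plain triple-loop product of two small dense matrices (lists of lists)."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def add_into(acc, block):
    """Accumulate `block` into `acc` element by element, in place."""
    for i, row in enumerate(block):
        for j, value in enumerate(row):
            acc[i][j] += value


def split(matrix, bs):
    """Cut a square matrix into a grid of bs x bs blocks."""
    n = len(matrix)
    return [[[row[c:c + bs] for row in matrix[r:r + bs]]
             for c in range(0, n, bs)]
            for r in range(0, n, bs)]


def join(blocks):
    """Reassemble a grid of blocks into one matrix."""
    out = []
    for block_row in blocks:
        for r in range(len(block_row[0])):
            out.append([value for block in block_row for value in block[r]])
    return out


def blocked_matmul(a, b, bs):
    """C[i][j] = sum over k of A[i][k] * B[k][j], computed block by block."""
    A, B = split(a, bs), split(b, bs)
    nb = len(A)
    C = [[[[0] * bs for _ in range(bs)] for _ in range(nb)] for _ in range(nb)]
    for i in range(nb):
        for j in range(nb):
            for k in range(nb):
                # Block A[i][k] is reused for every j, block B[k][j] for every i.
                add_into(C[i][j], matmul(A[i][k], B[k][j]))
    return join(C)
```

For any square inputs whose size is a multiple of `bs`, `blocked_matmul(a, b, bs)` should equal `matmul(a, b)`.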
