Search Results - Inner Mongolia University Library

Adaptive tiling for parallel N-body simulations on many core

ASTRONOMY AND COMPUTING, 2021, vol. 36, pp. 100466-100466

Authors: Khan, M. A.; Al-Mouhamed, M. A.; Mohammad, N.

Affiliations: Prince Mohammad Bin Fahd Univ Coll Comp Engn & Sci Khobar Saudi Arabia; King Fahd Univ Petr & Minerals Comp Engn Khobar Saudi Arabia; Prince Mohammad Bin Fahd Univ Cybersecur Ctr Khobar Saudi Arabia

N-body simulations compute the mutual gravitational forces exerted on each body, at a cost of O(N) per body and O(N^2) for the whole simulation. The Barnes-Hut approximation allows processing a group of bodies in O(1) if they are far enough from a given body, which drops the complexity of the whole simulation to O(N log N). An octree is used to ease the pruning process, but at the cost of some irregularity in the access pattern. In a parallel N-body implementation, the bodies are partitioned among threads that are executed on multiple cores. A depth-first traversal of the octree is used for processing each body, which causes repeated cache misses during traversal. This paper proposes different types of tiling methods to improve the performance of N-body simulations. It presents an experimental analysis of octree traversal using these tiling methods to identify the potential for cache data reuse. It then evaluates these tiling methods for varying tile sizes with different galaxy sizes and a varying number of threads on several machine architectures. The efficiency of the tiling approaches depends on the chosen tile size. It is shown that a speedup of 8 times can be achieved by choosing the appropriate tile size on a 60-core Intel accelerator. To determine an appropriate tile size, the paper proposes an adaptive tiling approach that implicitly adapts the tile size to the distribution of threads, the cache capacity, cache latency, problem size and dynamic changes in the access pattern over the iterations. The proposed adaptive tiling approach can be used as an optimization option in parallel compilers. © 2021 Elsevier B.V. All rights reserved.

Keywords: N-body simulations; Tiling; Cache optimization; Many Integrated Core (MIC); parallel programming
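
The abstract above describes force evaluation as O(N) work per body and proposes tiling to improve cache reuse. The following is a minimal sketch of the tiling idea only, assuming a plain direct-sum kernel rather than the paper's octree traversal or its adaptive scheme. The function names, the tile size, and the constants `G` and `SOFTENING` are illustrative assumptions, not taken from the paper.

```python
import math

G = 1.0           # gravitational constant in arbitrary units (assumption)
SOFTENING = 1e-3  # keeps close pairs finite (assumption)


def pair_accel(pi, pj, mj):
    """Acceleration on a body at pi caused by a body of mass mj at pj."""
    dx, dy, dz = pj[0] - pi[0], pj[1] - pi[1], pj[2] - pi[2]
    r2 = dx * dx + dy * dy + dz * dz + SOFTENING * SOFTENING
    s = G * mj / (r2 * math.sqrt(r2))
    return s * dx, s * dy, s * dz


def direct_accels(pos, mass):
    """Untiled direct sum: O(N) work per body, O(N^2) overall."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                ax, ay, az = pair_accel(pos[i], pos[j], mass[j])
                acc[i][0] += ax
                acc[i][1] += ay
                acc[i][2] += az
    return acc


def tiled_accels(pos, mass, tile=4):
    """Same sum, iterated tile by tile.

    Each tile of source bodies is reused by every target body in the
    current target tile before the next source tile is touched, which is
    the access pattern that favours cache reuse in a compiled kernel.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i0 in range(0, n, tile):          # tile of target bodies
        for j0 in range(0, n, tile):      # tile of source bodies
            for i in range(i0, min(i0 + tile, n)):
                for j in range(j0, min(j0 + tile, n)):
                    if i != j:
                        ax, ay, az = pair_accel(pos[i], pos[j], mass[j])
                        acc[i][0] += ax
                        acc[i][1] += ay
                        acc[i][2] += az
    return acc
```

Both functions return the same accelerations; only the iteration order differs. In pure Python the tiling brings no speedup, so the sketch is meant to show the loop structure, not the performance effect reported in the paper.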

Informed scenario-based RRT* for aircraft trajectory planning under ensemble forecasting of thunderstorms

TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES, 2021, vol. 129, pp. 103232-103232

Authors: Andre, Eduardo; Gonzalez-Arribas, Daniel; Soler, Manuel; Kamgarpour, Maryam; Sanjurjo-Rivo, Manuel

Affiliations: Univ Carlos III Dept Bioengn & Aerosp Engn Madrid Spain; Univ British Columbia Elect & Comp Engn Vancouver BC Canada

Thunderstorms represent a major hazard for flights, as they compromise the safety of both the airframe and the passengers. To address trajectory planning under thunderstorms, three variants of the scenario-based rapidly exploring random trees (SB-RRTs) are proposed. During an iterative process, the so-called SB-RRT, the SB-RRT* and the Informed SB-RRT* find safe trajectories by meeting a user-defined safety threshold. Additionally, the last two techniques converge to solutions of minimum flight length. Through parallelization on graphical processing units, the required computational times are reduced substantially to become compatible with near real-time operation. The proposed methods are tested considering a kinematic model of an aircraft flying between two waypoints at constant flight level and airspeed; the test scenario is based on a realistic weather forecast and assumed to be described by an ensemble of equally likely members. Lastly, the influence of the number of scenarios, safety margin and iterations on the results is analyzed. Results show that the SB-RRTs are able to find safe and, in two of the algorithms, close-to-optimum solutions.

Keywords: Aircraft path planning; Sampling-based algorithms; Uncertain thunderstorm avoidance; parallel programming

Comparison of massively parallel algorithms on graphics processing unit for MIMO radar

e-Prime - Advances in Electrical Engineering, Electronics and Energy, 2022, vol. 2

Authors: Pitre, Eric; Roberge, Vincent; Bray, Joey; Hefnawi, Mostafa

Affiliations: Royal Military College of Canada Department of Electrical and Computer Engineering Canada

This paper proposes a method for accelerating an enhanced resolution 3D Multiple Input Multiple Output (MIMO) radar on a Graphics Processing Unit (GPU). Due to the size of the data required for range, bearing, and Doppler processing, computations for the MIMO radar are extensive and seldom permit real-time operation without performance compromises. Current methods for achieving reasonable frame rates include reducing the scope of the radar (i.e., limiting the number of dimensions, the field of view, or the ranges of interest), choosing efficient but coarse algorithms (i.e., the FFT for range, velocity, and bearing estimation), or offloading the computations to task-specific hardware, DSP, or FPGA. The proposed framework enables real-time operation of the MIMO radar by performing the signal processing on a GPU without compromising the radar coverage, while replacing the widely used 3D FFT with an enhanced resolution alternative. This paper compares the execution times of the various algorithms when performed on a Central Processing Unit (CPU) and when performed on the GPU. © 2022

Keywords: Chirp Z transform; Graphics processing unit; MIMO radar; parallel programming

Kcollections: A Fast and Efficient Library for K-mers

34th IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Fujimoto, M. Stanley; Lyman, Cole A.; Clement, Mark J.

Affiliations: Brigham Young Univ Dept Comp Sci Provo UT 84602 USA

ISBN: (print) 9781728174457

K-mers form the backbone of many bioinformatic algorithms. They are, however, difficult to store and use efficiently because the number of k-mers increases exponentially as k increases. Many algorithms exist for compressed storage of k-mers but suffer from slow insert times or are probabilistic, resulting in false-positive k-mers. Furthermore, k-mer libraries usually specialize in associating specific values with k-mers, such as a color in colored de Bruijn Graphs or a k-mer count. We present kcollections, a compressed and parallel data structure designed for k-mers generated from whole, assembled genomes. Kcollections is available for C++ and provides set- and map-like structures as well as a k-mer counting data structure, all of which utilize parallel operations designed using a MapReduce paradigm. Additionally, we provide basic Python bindings for rapid prototyping. Kcollections makes developing bioinformatic algorithms simpler by abstracting away the tedious task of storing k-mers.

Keywords: data structure; genomics; k-mer; parallel programming
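
The abstract above notes that the number of possible k-mers grows exponentially with k (4^k over the DNA alphabet), which is why compact storage matters. As a rough illustration, and not the kcollections API or its data structure, the sketch below packs each k-mer into an integer at 2 bits per base and counts occurrences with a rolling window. All function names here are hypothetical.

```python
from collections import Counter

# 2-bit code per base (illustrative choice, not taken from kcollections)
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(kmer):
    """Pack a DNA k-mer into an integer using 2 bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value


def decode_kmer(value, k):
    """Inverse of encode_kmer for a k-mer of known length k."""
    bases = []
    for _ in range(k):
        bases.append("ACGT"[value & 3])
        value >>= 2
    return "".join(reversed(bases))


def count_kmers(sequence, k):
    """Count every k-mer in sequence, keyed by its packed integer form.

    A rolling window updates the packed value in O(1) per base.
    Any base outside A/C/G/T (for example N) resets the window.
    """
    counts = Counter()
    mask = (1 << (2 * k)) - 1
    value = 0
    valid = 0  # consecutive valid bases currently in the window
    for base in sequence:
        code = ENCODE.get(base)
        if code is None:
            value, valid = 0, 0
            continue
        value = ((value << 2) | code) & mask
        valid += 1
        if valid >= k:
            counts[value] += 1
    return counts
```

For example, `count_kmers("ACGTACGT", 3)` records six 3-mers in total, with `ACG` and `CGT` each seen twice.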

Parallel Stream Processing with MPI for Video Analytics and Data Visualization

19th Symposium on High-Performance Computing Systems (WSCAD)

Authors: Vogel, Adriano; Rista, Cassiano; Justo, Gabriel; Ewald, Endrius; Griebler, Dalvan; Mencagli, Gabriele; Fernandes, Luiz Gustavo

Affiliations: Pontificia Univ Catolica Rio Grande do Sul Sch Technol Porto Alegre RS Brazil; Univ Pisa Dept Comp Sci Pisa Italy; Tres de Maio Fac SETREM Lab Adv Res Cloud Comp LARCC Tres De Maio Brazil

ISBN: (print) 9783030410506; 9783030410490

The amount of data generated is increasing exponentially. However, processing data and producing fast results is a technological challenge. Parallel stream processing can be implemented for handling high-frequency and big data flows. The MPI parallel programming model offers low-level and flexible mechanisms for dealing with distributed architectures such as clusters. This paper aims to use it to accelerate video analytics and data visualization applications so that insight can be obtained as soon as the data arrives. Experiments were conducted with a Domain-Specific Language for Geospatial Data Visualization and a Person Recognizer video application. We applied the same stream parallelism strategy and two task distribution strategies. The dynamic task distribution achieved better performance than the static distribution in the HPC cluster. The data visualization achieved lower throughput than the video analytics due to its I/O-intensive operations. Also, the MPI programming model shows promising performance outcomes for stream processing applications.

Keywords: parallel programming; Stream parallelism; Distributed processing; Cluster
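
The abstract above contrasts static and dynamic task distribution. The sketch below is a small single-process simulation of the two policies, assuming known per-task costs; it is not MPI code and not the paper's implementation, and the function names are hypothetical. It only illustrates why handing each task to the first idle worker can shorten the total run time when task costs are uneven.

```python
import heapq


def static_makespan(costs, workers):
    """Round-robin assignment fixed before execution starts."""
    loads = [0.0] * workers
    for i, cost in enumerate(costs):
        loads[i % workers] += cost
    return max(loads)


def dynamic_makespan(costs, workers):
    """Each task, in arrival order, goes to the worker that frees up first."""
    free_at = [0.0] * workers  # min-heap of times at which workers go idle
    heapq.heapify(free_at)
    for cost in costs:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + cost)
    return max(free_at)


if __name__ == "__main__":
    # Uneven costs: two expensive tasks land on the same worker under round-robin.
    costs = [8, 1, 1, 1, 8, 1, 1, 1]
    print("static :", static_makespan(costs, 2))
    print("dynamic:", dynamic_makespan(costs, 2))
```

With these example costs and two workers, the static policy finishes at time 18 and the dynamic policy at time 11. The simulation ignores communication cost, which a real MPI implementation would have to pay for each dynamic hand-off.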

Towards Stability in the Chapel Language

34th IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Author: Ferguson, Michael P.

Affiliation: Hewlett Packard Enterprise Palo Alto CA 94304 USA

Language stability is an important upcoming feature of the Chapel programming language. Chapel users have both requested big changes to the language and also requested that the language become stable. This talk will d...

ISBN: (print) 9781728174457

Keywords: parallel programming; programming languages

A Fast and Concise Parallel Implementation of the 8x8 2D IDCT using Halide

32nd IEEE International Symposium on Computer Architecture and High-Performance Computing (SBAC-PAD) / 11th Workshop on Applications for Multi-Core Architectures (WAMCA)

Authors: Johnson, Martin; Playne, Daniel

Affiliations: Massey Univ Sch Nat & Computat Sci Auckland New Zealand

ISBN: (print) 9781728199245

The Inverse Discrete Cosine Transform (IDCT) is commonly used for image and video decoding. Due to the ubiquitous nature of this application area, very efficient implementations of the IDCT are of great importance and have led to the development of highly optimized libraries. The popular libjpeg-turbo library contains 1000s of lines of handwritten assembly code utilizing SIMD instruction sets for a variety of architectures. We present an alternative approach, implementing the 8x8 2D IDCT in the image processing language Halide - a high-level, functional language that allows for concise, portable, parallel and very efficient code. We show how less than 100 lines of Halide can replace over 1000 lines of code for each architecture in the libjpeg-turbo library to perform JPEG decoding. The Halide implementation is compared for ARMv8 and x86-64 SIMD extensions and shows a 5-25 percent performance improvement over the SIMD code in libjpeg-turbo while also being much easier to maintain and port to new architectures.

Keywords: Halide; Inverse Discrete Cosine Transform; JPEG decoding; parallel programming
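
For readers unfamiliar with the transform named in the abstract above, here is a plain reference sketch of a separable 8x8 2D IDCT. It is written in pure Python for clarity, assumes orthonormal scaling, and is neither the paper's Halide code nor the libjpeg-turbo implementation. The function names are hypothetical.

```python
import math

N = 8  # block size


def _alpha(u):
    """Orthonormal scale factor for basis function u."""
    return math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)


# BASIS[u][x] = alpha(u) * cos((2x + 1) * u * pi / (2N))
BASIS = [
    [_alpha(u) * math.cos((2 * x + 1) * u * math.pi / (2 * N)) for x in range(N)]
    for u in range(N)
]


def idct_1d(coeffs):
    """Inverse DCT of one length-8 vector of coefficients."""
    return [sum(BASIS[u][x] * coeffs[u] for u in range(N)) for x in range(N)]


def dct_1d(samples):
    """Forward DCT, included only to check the round trip."""
    return [sum(BASIS[u][x] * samples[x] for x in range(N)) for u in range(N)]


def _apply_2d(block, transform):
    """Apply a 1D transform to every row, then to every column."""
    rows = [transform(list(r)) for r in block]
    cols = [transform(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major


def idct_2d(block):
    """Separable 8x8 2D IDCT."""
    return _apply_2d(block, idct_1d)


def dct_2d(block):
    """Separable 8x8 2D DCT."""
    return _apply_2d(block, dct_1d)
```

The row-then-column structure is what makes the transform separable: the 2D result is built from sixteen 1D transforms instead of a direct four-fold sum. Optimized implementations such as those discussed in the abstract vectorize these 1D passes.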

Optimizing the Multimode Brownian Oscillator Model for the Optical Response of Carotenoids in Solution by Fine Tuning of Differential Evolution

LOBACHEVSKII JOURNAL OF MATHEMATICS, 2020, vol. 41, no. 8, pp. 1545-1553

Authors: Pishchalnikov, R. Y.; Bondarenko, A. A.; Ashikhmin, A. A.

Affiliations: Russian Acad Sci Prokhorov Gen Phys Inst Moscow 119991 Russia; Russian Acad Sci Keldysh Inst Appl Math Moscow 125047 Russia; Russian Acad Sci Inst Basic Biol Problems Pushchino Sci Ctr Biol Res Pushchino 142290 Russia

During the last twenty years, the differential evolution (DE) algorithm has proved to be a powerful method for solving minimization problems for multidimensional functions. As a member of the family of evolutionary optimization algorithms, it is based on the concepts of natural selection and mutation. In this study, we test the potential of DE to find a proper set of parameters for the multimode Brownian oscillator model, which was then used to simulate absorption lineshapes of carotenoid molecules in solution: spheroidene and spheroidenone. This theory assumes that the correlation function of a particular electronic state of the carotenoid is calculated using the semiclassical spectral density function. Building on our previous studies of photosynthetic pigments, we employed several DE strategies to fit the experimental carotenoid spectra. We found that simulated absorption spectra are very sensitive to several parameters that characterize carotenoid vibronic modes, namely, Huang-Rhys factors. Fine tuning of the DE crossover parameter (Cr) and the scaling factor (F) provided acceptable convergence of the algorithm. It appears that to get good convergence of DE, a certain spectral range of carotenoid absorption from 400 to 600 nm must be chosen. This fact can be explained by the limitations of the applied theory, which simply does not properly predict the carotenoid absorption at higher frequencies.

Keywords: differential evolution; parallel programming; carotenoids; absorption spectrum; cumulant expansion; multimode Brownian oscillator model
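
The abstract above refers to the DE crossover parameter (Cr) and scaling factor (F). To show where these two parameters enter the algorithm, here is a minimal sketch of the common DE/rand/1/bin strategy applied to a toy objective. The abstract does not state which DE strategies the authors used, and the spectral fitting objective is not reproduced here, so the strategy, the default values, and the function names are all assumptions.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, F=0.5, Cr=0.9,
                           generations=200, seed=0):
    """Minimize objective over box bounds with DE/rand/1/bin.

    F scales the difference vector used for mutation; Cr is the
    probability that a coordinate is taken from the mutant vector.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [objective(x) for x in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct population members, none of them equal to i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for d in range(dim):
                if rng.random() < Cr or d == j_rand:
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])  # mutation
                    lo, hi = bounds[d]
                    v = min(max(v, lo), hi)                      # clip to bounds
                else:
                    v = pop[i][d]                                # keep parent value
                trial.append(v)
            f_trial = objective(trial)
            if f_trial <= fitness[i]:  # greedy selection
                pop[i], fitness[i] = trial, f_trial

    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)
```

A call such as `differential_evolution(sphere, [(-5.0, 5.0)] * 3)` returns the best parameter vector found and its objective value. Because each individual's trial vector is evaluated independently, the inner loop is a natural candidate for parallel evaluation.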

Special design aspects of a biomechanical program: Two completely different levels of program design

IEEE 15th International Conference of System of Systems Engineering (SoSE)

Authors: Fekete, Gyorgy; Molnar, Andras

Affiliations: Obuda Univ Doctoral Sch Appl Informat & Appl Math Budapest Hungary; Obuda Univ Inst Cyber Phys Syst John von Neumann Fac Informat Budapest Hungary

ISBN: (print) 9781728180502

In this paper, we examine two completely different levels of program design for a biomechanical program. The first and broadest level is the data level, where we show that we can use the whole world's data. This is covered by System of Systems engineering. The second and most specific level is the algorithm level. Our goal is to achieve the fastest program run we can. To this end, we review the possibilities and show an example of how a parallel paradigm accelerates our program.

Keywords: biomechanical program design; system of systems engineering; parallel programming; GPU; multi-thread

Towards Profile-Guided Optimization for Safe and Efficient Parallel Stream Processing in Rust

32nd IEEE International Symposium on Computer Architecture and High-Performance Computing (SBAC-PAD) / 11th Workshop on Applications for Multi-Core Architectures (WAMCA)

Authors: Sydow, Stefan; Nabelsee, Mohannad; Glesner, Sabine; Herber, Paula

Affiliations: Tech Univ Berlin Berlin Germany; Univ Munster Munster Germany

ISBN: (print) 9781728199245

The efficient mapping of stream processing applications to parallel hardware architectures is a difficult problem. While parallelization is often highly desirable as it reduces the overall execution time, its advantages must be carefully weighed against the parallelization overhead of complexity and communication costs. This paper presents a novel profile-guided optimization for parallel stream processing based on the multi-paradigm system programming language Rust. Our approach's key idea is to systematically balance the performance gain that can be achieved from parallelization with the communication overhead. To achieve this, we 1) use profiling to gain tight estimates of task execution times, 2) evaluate the cost of the fundamental concurrency constructs in Rust with synthetic benchmarks, and exploit this information to estimate the communication overhead introduced by various degrees of parallelism, and 3) present a novel optimization algorithm that exploits both estimates to finetune the degree of parallelism and train processing in a given application. Overall, our approach enables us to map parallel stream processing applications to parallel hardware efficiently. The safety concepts anchored in Rust ensure the reliability of the resulting implementation. We demonstrate our approach's practical applicability with two case studies: the word count problem and aircraft telemetry decoding.

Keywords: Stream Processing; parallel programming; Rust; Performance Modelling
