N-body simulations compute the mutual gravitational forces exerted on each body in O(N). The Barnes-Hut approximation allows a group of bodies to be processed in O(1) if they are far enough from a given body, which reduces the complexity of the whole simulation to O(N log N). An octree is used to ease the pruning process, but at the cost of some irregularity in the access pattern. In a parallel N-body implementation, the bodies are partitioned among threads that execute on multiple cores. Each body is processed with a depth-first traversal of the octree, which causes repeated cache misses during traversal. This paper proposes different types of tiling methods to improve the performance of N-body simulations. It presents an experimental analysis of octree traversal using these tiling methods to identify the potential for cache data reuse. It then evaluates the tiling methods for varying tile sizes, different galaxy sizes, and a varying number of threads on several machine architectures. The efficiency of the tiling approaches depends on the chosen tile size. It is shown that a speedup of 8 times can be achieved by choosing the appropriate tile size on a 60-core Intel accelerator. To determine an appropriate tile size, the paper proposes an adaptive tiling approach that implicitly adapts the tile size to the distribution of threads, the cache capacity, the cache latency, the problem size, and dynamic changes in the access pattern over the iterations. The proposed adaptive tiling approach can be used as an optimization option in parallel compilers. (C) 2021 Elsevier B.V. All rights reserved.
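The pruning step described in this abstract can be illustrated with a minimal sketch. It is not the paper's code and does not implement the tiling methods; it only shows the usual Barnes-Hut opening test during a depth-first traversal. The 2D setting, the hand-built tree, and the names `Node`, `leaf`, `internal`, `acceleration` and `theta` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    mass: float
    com: tuple          # center of mass (x, y)
    size: float         # side length of the cell; 0.0 for a single body
    children: list = field(default_factory=list)


def leaf(x, y, mass=1.0):
    """A single body, represented as a childless node."""
    return Node(mass, (x, y), 0.0)


def internal(children, size):
    """Aggregate children into one cell with total mass and center of mass."""
    m = sum(c.mass for c in children)
    cx = sum(c.mass * c.com[0] for c in children) / m
    cy = sum(c.mass * c.com[1] for c in children) / m
    return Node(m, (cx, cy), size, list(children))


def acceleration(pos, node, theta=0.5, G=1.0):
    """Depth-first traversal; returns (ax, ay, number_of_interactions)."""
    dx = node.com[0] - pos[0]
    dy = node.com[1] - pos[1]
    d = math.hypot(dx, dy)
    # Open the cell when it is too close (size / distance >= theta).
    if node.children and (d == 0.0 or node.size / d >= theta):
        ax = ay = 0.0
        count = 0
        for child in node.children:
            cax, cay, n = acceleration(pos, child, theta, G)
            ax, ay, count = ax + cax, ay + cay, count + n
        return ax, ay, count
    if d == 0.0:
        return 0.0, 0.0, 0   # skip self-interaction
    a = G * node.mass / d ** 3
    return a * dx, a * dy, 1  # whole cell treated as one point mass


# A distant cluster of four bodies inside a unit cell.
cluster = internal([leaf(10, 10), leaf(10, 11), leaf(11, 10), leaf(11, 11)], size=1.0)
approx = acceleration((0.0, 0.0), cluster, theta=0.5)  # cell pruned
exact = acceleration((0.0, 0.0), cluster, theta=0.0)   # every leaf visited
```

With `theta=0.0` every cell is opened, so the traversal degenerates to the direct sum; larger values trade accuracy for fewer interactions.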
Thunderstorms represent a major hazard for flights, as they compromise the safety of both the airframe and the passengers. To address trajectory planning under thunderstorms, three variants of the scenario-based rapidly exploring random trees (SB-RRTs) are proposed. During an iterative process, the so-called SB-RRT, the SB-RRT* and the Informed SB-RRT* find safe trajectories by meeting a user-defined safety threshold. Additionally, the last two techniques converge to solutions of minimum flight length. Through parallelization on graphics processing units, the required computational times are reduced substantially to become compatible with near real-time operation. The proposed methods are tested considering a kinematic model of an aircraft flying between two waypoints at constant flight level and airspeed; the test scenario is based on a realistic weather forecast and assumed to be described by an ensemble of equally likely members. Lastly, the influence of the number of scenarios, safety margin and iterations on the results is analyzed. Results show that the SB-RRTs are able to find safe and, in two of the algorithms, close-to-optimum solutions.
This paper proposes a method for accelerating an enhanced resolution 3D Multiple Input Multiple Output (MIMO) radar on a Graphics Processing Unit (GPU). Due to the size of the data required for range, bearing, and dop...
ISBN (print): 9781728174457
K-mers form the backbone of many bioinformatic algorithms. They are, however, difficult to store and use efficiently because the number of k-mers increases exponentially as k increases. Many algorithms exist for compressed storage of k-mers, but they suffer from slow insert times or are probabilistic, resulting in false-positive k-mers. Furthermore, k-mer libraries usually specialize in associating specific values with k-mers, such as a color in colored de Bruijn graphs or a k-mer count. We present kcollections, a compressed and parallel data structure designed for k-mers generated from whole, assembled genomes. Kcollections is available for C++ and provides set- and map-like structures as well as a k-mer counting data structure, all of which utilize parallel operations designed using a MapReduce paradigm. Additionally, we provide basic Python bindings for rapid prototyping. Kcollections makes developing bioinformatic algorithms simpler by abstracting away the tedious task of storing k-mers.
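To make the storage problem concrete, here is a small sketch of naive k-mer counting with a plain, uncompressed `Counter`. It does not use kcollections or its Python bindings; the function names `count_kmers` and `possible_kmers` are hypothetical, and a four-letter DNA alphabet is assumed.

```python
from collections import Counter


def possible_kmers(k: int, alphabet_size: int = 4) -> int:
    """Number of distinct k-mers over the alphabet; grows exponentially in k."""
    return alphabet_size ** k


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every length-k substring of the sequence (no compression)."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


counts = count_kmers("ACGTACGT", 3)
# counts == {"ACG": 2, "CGT": 2, "GTA": 1, "TAC": 1}
# possible_kmers(3) == 64, possible_kmers(31) == 4**31
```

A dictionary keyed by strings like this is simple but memory-hungry, which is the kind of cost a compressed k-mer structure is meant to avoid.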
ISBN (print): 9783030410506; 9783030410490
The amount of data generated is increasing exponentially. However, processing data and producing fast results is a technological challenge. Parallel stream processing can be implemented for handling high-frequency and big data flows. The MPI parallel programming model offers low-level and flexible mechanisms for dealing with distributed architectures such as clusters. This paper aims to use it to accelerate video analytics and data visualization applications so that insight can be obtained as soon as the data arrives. Experiments were conducted with a Domain-Specific Language for Geospatial Data Visualization and a Person Recognizer video application. We applied the same stream parallelism strategy and two task distribution strategies. The dynamic task distribution achieved better performance than the static distribution in the HPC cluster. The data visualization application achieved lower throughput than the video analytics application due to its I/O-intensive operations. Also, the MPI programming model shows promising performance outcomes for stream processing applications.
ISBN (print): 9781728174457
Language stability is an important upcoming feature of the Chapel programming language. Chapel users have both requested big changes to the language and also requested that the language become stable. This talk will discuss recent efforts to complete the big changes to the Chapel language so that the language can stabilize.
ISBN (print): 9781728199245
The Inverse Discrete Cosine Transform (IDCT) is commonly used for image and video decoding. Due to the ubiquitous nature of this application area, very efficient implementations of the IDCT are of great importance and have led to the development of highly optimized libraries. The popular libjpeg-turbo library contains thousands of lines of handwritten assembly code utilizing SIMD instruction sets for a variety of architectures. We present an alternative approach, implementing the 8x8 2D IDCT in the image processing language Halide - a high-level, functional language that allows for concise, portable, parallel and very efficient code. We show how less than 100 lines of Halide can replace over 1000 lines of code for each architecture in the libjpeg-turbo library to perform JPEG decoding. The Halide implementation is compared for ARMv8 and x86-64 SIMD extensions and shows a 5-25 percent performance improvement over the SIMD code in libjpeg-turbo while also being much easier to maintain and port to new architectures.
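For reference, the transform itself can be written directly from its textbook definition. The sketch below is a naive, unoptimized 8x8 2D IDCT in plain Python using the standard JPEG normalization; it is neither the Halide implementation nor the libjpeg-turbo code, and the function name `idct_8x8` is an assumption.

```python
import math

N = 8  # block size used by JPEG


def idct_8x8(coeffs):
    """Naive 2D IDCT of an 8x8 block given as a list of 8 rows of 8 floats."""

    def c(u):
        # Normalization factor: 1/sqrt(2) for the DC index, 1 otherwise.
        return 1.0 / math.sqrt(2.0) if u == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (
                        c(u) * c(v) * coeffs[u][v]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    )
            out[x][y] = total / 4.0
    return out


# A block with only a DC coefficient decodes to a flat block.
dc_only = [[0.0] * N for _ in range(N)]
dc_only[0][0] = 8.0
flat = idct_8x8(dc_only)  # every sample is approximately 1.0
```

This direct form needs 64 multiply-accumulate steps per output sample, which is why production decoders rely on factored, SIMD-friendly variants.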
During the last twenty years, the differential evolution (DE) algorithm has proved to be a powerful method for solving minimization problems for multidimensional functions. As a member of the family of evolutionary optimization algorithms, its main principle is based upon the concepts of natural selection and mutation. In this study, we test the potential of DE to find a proper set of parameters for the multimode Brownian oscillator model, which was then used to simulate absorption lineshapes of two carotenoid molecules in solution: spheroidene and spheroidenone. This theory assumes that the correlation function of a particular electronic state of the carotenoid is calculated using the semiclassical spectral density function. Building on our previous studies on photosynthetic pigments, we employed several DE strategies to fit the experimental carotenoid spectra. We found that simulated absorption spectra are very sensitive to several parameters that characterize carotenoid vibronic modes, namely, the Huang-Rhys factors. Fine tuning of the DE crossover parameter (Cr) and the scaling factor (F) provided acceptable convergence of the algorithm. It appears that, to get good convergence of DE, a certain spectral range of carotenoid absorption, from 400 to 600 nm, must be chosen. This can be explained by the limitations of the applied theory, which does not properly predict the carotenoid absorption at higher frequencies.
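The roles of the crossover parameter Cr and the scaling factor F can be seen in a minimal sketch of one common DE variant, DE/rand/1/bin. The abstract does not say which strategies were used, so the variant, the default values, the sphere test function, and the name `differential_evolution` are all assumptions; the Brownian oscillator model is not reproduced here.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, F=0.8, Cr=0.9,
                           generations=200, seed=0):
    """Minimize objective over box bounds with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: pick three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < Cr or j == j_rand:
                    # F scales the difference vector added to the base vector.
                    value = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    value = min(max(value, lo), hi)  # clip to the bounds
                else:
                    value = pop[i][j]  # binomial crossover keeps the parent gene
                trial.append(value)
            # Selection: the trial replaces the parent only if it is no worse.
            trial_score = objective(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_value = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
```

In a fitting setting, `objective` would instead return the misfit between a simulated and a measured spectrum, and `bounds` would limit each model parameter.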
ISBN (print): 9781728180502
In this paper, we examine two completely different levels of program design for a biomechanical program. The first and broadest level is the data level, where we show that we can use the whole world's data. This is covered by System of Systems engineering. The second and most specific level is the algorithm level. Our goal is to achieve the fastest program run we can. To this end, we review the possibilities and show an example of how a parallel paradigm accelerates our program.
ISBN (print): 9781728199245
The efficient mapping of stream processing applications to parallel hardware architectures is a difficult problem. While parallelization is often highly desirable as it reduces the overall execution time, its advantages must be carefully weighed against the parallelization overhead of complexity and communication costs. This paper presents a novel profile-guided optimization for parallel stream processing based on the multi-paradigm system programming language Rust. Our approach's key idea is to systematically balance the performance gain that can be achieved from parallelization with the communication overhead. To achieve this, we 1) use profiling to gain tight estimates of task execution times, 2) evaluate the cost of the fundamental concurrency constructs in Rust with synthetic benchmarks, and exploit this information to estimate the communication overhead introduced by various degrees of parallelism, and 3) present a novel optimization algorithm that exploits both estimates to finetune the degree of parallelism and train processing in a given application. Overall, our approach enables us to map parallel stream processing applications to parallel hardware efficiently. The safety concepts anchored in Rust ensure the reliability of the resulting implementation. We demonstrate our approach's practical applicability with two case studies: the word count problem and aircraft telemetry decoding.