Collision detection of a large number N of particles can be challenging. Directly testing N particles for collisions among each other leads to N-2 queries. Especially in scenarios, where fast, densely packed particles...
详细信息
Collision detection of a large number N of particles can be challenging. Directly testing N particles for collisions among each other leads to N-2 queries. Especially in scenarios, where fast, densely packed particles interact, challenges arise for classical methods like Particle-in-Cell or Monte-Carlo. Modern collision detection methods utilising bounding volume hierarchies are suitable to overcome these challenges and allow a detailed analysis of the interaction of large number of particles. This approach is applied to the analysis of the collision of two photon beams leading to the creation of electron-positron pairs. (c) 2017 Elsevier Inc. All rights reserved.
The tree method is a widely implemented algorithm for collisionless N-body simulations in astrophysics well suited for GPU(s). Adopting hierarchical time stepping can accelerate N-body simulations;however, it is infre...
详细信息
The tree method is a widely implemented algorithm for collisionless N-body simulations in astrophysics well suited for GPU(s). Adopting hierarchical time stepping can accelerate N-body simulations;however, it is infrequently implemented and its potential remains untested in GPU implementations. We have developed a Gravitational Oct-tree code accelerated by Hierarchical time step Controlling named GOTHIC, which adopts both the tree method and the hierarchical time step. The code adopts some adaptive optimizations by monitoring the execution time of each function on-the-fly and minimizes the time-to solution by balancing the measured time of multiple functions. Results of performance measurements with realistic particle distribution performed on NVIDIA Tesla M2090, K20X, and GeForce GTX TITAN X, which are representative GPUs of the Fermi, Kepler, and Maxwell generation of GPUs, show that the hierarchical time step achieves a speedup by a factor of around 3-5 times compared to the shared time step. The measured elapsed time per step of GOTHIC is 0.30 s or 0.44 s on GTX TITAN X when the particle distribution represents the Andromeda galaxy or the NFW sphere, respectively, with 2(24) = 16,777,216 particles. The averaged performance of the code corresponds to 10-30% of the theoretical single precision peak performance of the GPU. (C) 2016 The Authors. Published by Elsevier B.V.
Motivated by a concept studied in [1], we consider a property of matrices over finite fields that generalizes triangular totally nonsingular matrices to block matrices. We show that (1) matrices with this property suf...
详细信息
Motivated by a concept studied in [1], we consider a property of matrices over finite fields that generalizes triangular totally nonsingular matrices to block matrices. We show that (1) matrices with this property suffice to construct asymptotically good tree codes and (2) a random block-triangular matrix over a field of quadratic size satisfies this property. We will also show that a generalization of this randomized construction yields codes over quadratic size fields for which the sum of the rate and minimum relative distance gets arbitrarily close to 1. (C) 2021 Elsevier B.V. All rights reserved.
The Matern family of functions is a widely used covariance kernel in spatial statistics for Gaussian process modeling, which in many instances requires calculations with a covariance matrix. In this paper, we design a...
详细信息
The Matern family of functions is a widely used covariance kernel in spatial statistics for Gaussian process modeling, which in many instances requires calculations with a covariance matrix. In this paper, we design a fast summation algorithm for the Matern kernel in order to efficiently perform matrix-vector multiplications. This algorithm is based on the Barnes-Hut tree code framework and addresses several practical issues: the anisotropy of the kernel, the nonuniform distribution of the point set, and a tight error estimate of the approximation. Even though the algorithmic details differ from the standard tree code in several aspects, empirically the computational cost of our algorithm scales as O(n log n) for n points. Comprehensive numerical experiments are shown to demonstrate the practicality of the design.
In this paper, the massive multiple-input multiple-output un-sourced random access (URA) is studied, and an index modulation (IM)-aided transmission scheme is proposed. The proposed URA-IM scheme is built on the conve...
详细信息
ISBN:
(纸本)9781538682098
In this paper, the massive multiple-input multiple-output un-sourced random access (URA) is studied, and an index modulation (IM)-aided transmission scheme is proposed. The proposed URA-IM scheme is built on the conventional URA consisting of common inner compressed sensing (CS) code in all channel blocks and outer tree code across blocks. Different from conventional URA, URA-IM firstly partitions channel blocks of one transmission frame into multiple groups, and then employs the IM principle to activate only part of the channel blocks in each group. By URA-IM, the average number of active CS codewords in each channel block can be decreased, which will improve the performance of inner CS detector. Moreover, a modified tree decoder is proposed to take the IM signal property into account. Computer simulations show that the proposed URA-IM has better performance than conventional URA in terms of permitting shorter block length and servicing more active users in the scenario of 36-bit information transmission per user.
We reduce the problem of constructing asymptotically good tree codes to the construction of triangular totally nonsingular matrices over fields with polynomially many elements. We show a connection of this problem to ...
详细信息
We reduce the problem of constructing asymptotically good tree codes to the construction of triangular totally nonsingular matrices over fields with polynomially many elements. We show a connection of this problem to Birkhoff interpolation in finite fields. (C) 2015 Elsevier Inc. All rights reserved.
The highly scalable parallel tree code PEPC for rapid computation of long-range (1/r) Coulomb forces is presented. It can be used as a library for applications involving electrostatics or Newtonian gravity in 3D. The ...
详细信息
The highly scalable parallel tree code PEPC for rapid computation of long-range (1/r) Coulomb forces is presented. It can be used as a library for applications involving electrostatics or Newtonian gravity in 3D. The code is based on the hashed oct-tree algorithm, in which particle coordinates are projected onto a space-filling curve prior to sorting and construction of multipole moments. However, standard particle sorting techniques can ultimately limit the scalability of such algorithms for thousands of cores, a bottleneck which can be alleviated by a recursive sort scheme specially adapted to the Morton curve. More serious limitations of the original locally essential tree concept of Salmon and Warren, which ultimately lead to a failure in memory scaling, are identified and analyzed rigorously. Benchmarks for the code on the IBM Blue Gene/P Jugene are presented which demonstrate scaling for multi-million particle systems on up to 8192 cores. (C) 2011 Elsevier B.V. All rights reserved.
We revisit the problem of reliable interactive communication over a noisy channel and obtain the first fully (randomized) efficient constant-rate emulation procedure for reliable interactive communication. Our protoco...
详细信息
We revisit the problem of reliable interactive communication over a noisy channel and obtain the first fully (randomized) efficient constant-rate emulation procedure for reliable interactive communication. Our protocol works for any discrete memoryless noisy channel with constant capacity and fails with exponentially small probability in the total length of the protocol. Following a work by Schulman (1993), our simulation uses a tree-code, yet as opposed to the nonefficient construction of absolute tree-code used by Schulman, we introduce a relaxation in the notion of goodness for a tree code and define a potent tree code. This relaxation allows us to construct an efficient emulation procedure for any two-party protocol. Our results also extend to the case of interactive multiparty communication. We show that a randomly generated tree code (with suitable constant alphabet size) is an efficiently decodable potent tree code with overwhelming probability. Furthermore, we are able to partially derandomize this result by means of epsilon-biased distributions using only O(N) random bits, where N is the depth of the tree.
Minimal cut sets are a valuable tool for analyzing metabolic networks and for identifying optimal gene intervention strategies by eliminating unwanted metabolic functions and keeping desired functionality. Minimal cut...
详细信息
Minimal cut sets are a valuable tool for analyzing metabolic networks and for identifying optimal gene intervention strategies by eliminating unwanted metabolic functions and keeping desired functionality. Minimal cut sets rely on the concept of elementary flux modes, which are sets of indivisible metabolic pathways under steady-state condition. However, the computation of minimal cut sets is nontrivial, as even medium-sized metabolic networks with just 100 reactions easily have several hundred million elementary flux modes. We developed a minimal cut set tool that implements the well-known Berge algorithm and utilizes a novel approach to significantly reduce the program runtime by using binary bit pattern trees. By using the introduced tree approach, the size of metabolic models that can be analyzed and optimized by minimal cut sets is pushed to new and considerably higher limits.
The computation time for a particle simulation where long-reaching forces, such as electrostatic forces for charged particles, have to be taken into account, rises asymptotically with O (N(2)). A significant reduction...
详细信息
The computation time for a particle simulation where long-reaching forces, such as electrostatic forces for charged particles, have to be taken into account, rises asymptotically with O (N(2)). A significant reduction of computation time can be achieved by applying a tree code algorithm. The algorithm exploits the idea of grouping suitable particles to macroparticles thus reducing the number of necessary calculations within a reasonably limited error bound. Although this algorithm was developed for astronomical galaxy simulations, it can be also used for simulations of microscopical particle with finite radii. The tree code can be used for an efficient collision routine if some additional boundary conditions are taken into account. This results in an asymptotic computation time dependence of O (N log N).
暂无评论