For decades, rigid body dynamics has been used in several active research fields to simulate the behavior of completely undeformable, rigid bodies. Because these simulations focus on either high physical accuracy or real-time environments, state-of-the-art algorithms cannot handle more than several thousand rigid bodies: either the complexity of the algorithms would result in infeasible runtimes, or the simulation could no longer satisfy its real-time requirements. In this paper we present a novel approach for large-scale rigid body dynamics simulations. The presented algorithm enables, for the first time, simulations of several million rigid bodies. We describe in detail the parallel rigid body algorithm and the extensions necessary for a large-scale MPI parallelization, and we analyze the parallel algorithm by means of a particular simulation scenario.
A method for optimizing the schedule and allocation of uniform algorithms onto processor arrays is derived. The main results of this paper are: (1) single (integer) linear programs are given for the optimal schedule of regular algorithms with and without resource constraints, (2) the class of algorithms is extended by allowing certain non-convex index domains, (3) efficient branch and bound techniques are used so that problems of relevant size can be solved. Moreover, additional constraints such as cache memory, bus bandwidths and access conflicts can also be considered. The results are applied to an example of relevant size.
In this paper we consider the organization of three iterative methods for solving self-adjoint elliptic difference equations on a set of linearly connected processors. These algorithms are the cyclic Chebyshev semi-iterative scheme, a preconditioned conjugate gradient method, and a generalization of the Chebyshev method. We also compare their performance on this multiprocessor as a function of the cost of interprocessor communication.
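As a rough illustration of one of the three methods named above, the following is a minimal sketch of a preconditioned conjugate gradient iteration. It is not the paper's implementation: the 1D model operator (-1, 2, -1), the diagonal (Jacobi) preconditioner, and the names `apply_poisson_1d` and `pcg` are assumptions chosen for brevity. On a linear array of processors, applying such an operator needs only nearest-neighbor exchange, while each dot product needs a global reduction, which is where the cost of interprocessor communication enters.

```python
def apply_poisson_1d(u):
    """Apply the self-adjoint difference operator (-1, 2, -1), zero boundaries."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append(2.0 * u[i] - left - right)
    return out


def dot(a, b):
    """Inner product; in a parallel setting this is a global reduction."""
    return sum(x * y for x, y in zip(a, b))


def pcg(apply_a, b, diag, tol=1e-10, max_iter=1000):
    """Conjugate gradient with a diagonal (Jacobi) preconditioner (assumed)."""
    x = [0.0] * len(b)
    r = list(b)                                  # residual b - A x for x = 0
    z = [ri / di for ri, di in zip(r, diag)]     # preconditioned residual
    p = list(z)                                  # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:               # stop once the residual is small
            break
        ap = apply_a(p)
        alpha = rz / dot(p, ap)                  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        beta = rz_new / rz                       # keeps directions A-conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

A call such as `pcg(apply_poisson_1d, b, [2.0] * len(b))` solves the model system for a right-hand side `b`.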
The on-line event reconstruction in ALICE is performed by the High Level Trigger, which should process up to 2000 events per second in proton-proton collisions and up to 300 central events per second in heavy-ion collisions, corresponding to an input data stream of 30 GB/s. In order to fulfill the time requirements, a fast on-line tracker has been developed. The algorithm combines a Cellular Automaton method for fast pattern recognition with the Kalman Filter method for fitting the found trajectories and for the final track selection. The tracker was adapted to run on Graphics Processing Units (GPU) using the NVIDIA Compute Unified Device Architecture (CUDA) framework. The implementation of the algorithm had to be adjusted at many points to allow for efficient usage of the graphics cards. In particular, achieving a good overall workload across many processor cores, efficient transfer to and from the GPU, and optimized utilization of the different memories the GPU offers turned out to be critical. To cope with these problems, a dynamic scheduler was introduced, which redistributes the workload among the processor cores. Additionally, a pipeline was implemented so that the tracking on the GPU, the initialization and output processing on the CPU, and the DMA transfer can overlap. The GPU tracking algorithm significantly outperforms the CPU version for large events while fully maintaining its efficiency.
We present new numerical algorithms for solving the structural inverse gravimetry problem for the case of multiple surfaces. The inverse problem of finding the multiple surfaces that divide the constant density layers is ill-posed and is described by a nonlinear integral equation of the first kind. Solving it requires regularization. New regularized variants of gradient-type methods with weighting factors are constructed, namely the steepest descent and conjugate gradient methods. We suggest an empirical rule for choosing the regularization parameters. On the basis of the constructed methods, we develop parallel algorithms and implement them on a multicore CPU using OpenMP. A set of experiments with disturbed data is performed to test the gradient algorithms and study the performance of the developed code. For the test problems with quasi-real data, the new regularized algorithms increase the accuracy and speed up computation in comparison with the unregularized ones. On an 8-core CPU, we achieve a speedup of 8 times.
One of the fundamental algorithmic problems in computer science involves selecting the kth smallest element in a set S of n elements. In this paper we present a fast selection algorithm which runs in $O(n^{1/8} \log n)$ time on a mesh with multiple broadcasting of size $n^{3/8} \times n^{5/8}$. Our result shows that, just like semigroup computations, selection can be done much faster on a suitably chosen rectangular mesh than on square meshes. We also show that if every processor can store $n^{1/9}$ items, then our selection algorithm runs in $O(n^{1/8} \log n)$ time on a mesh with multiple broadcasting of size $n^{1/3} \times n^{5/9}$.
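A short arithmetic check of the two mesh sizes quoted above may help. This is our own calculation, not taken from the paper, and it assumes the exponents are read as fractional powers of n. The first mesh has exactly n processors, so each holds one item; the second has fewer processors, but the stated per-processor storage restores a total capacity of n items:

\[
n^{3/8} \cdot n^{5/8} = n^{3/8 + 5/8} = n,
\qquad
n^{1/3} \cdot n^{5/9} = n^{3/9 + 5/9} = n^{8/9},
\qquad
n^{8/9} \cdot n^{1/9} = n .
\]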
In this paper, a parallel algorithm for analyzing connected components in binary images is described. It is based on the extension of the Cylindrical Algebraic Decomposition (CAD) to a two-dimensional (2D) discrete space. This extension allows us to find the number of connected components, to determine their connectivity degree, and to solve the visibility problem. The parallel implementation of the algorithm is outlined and its time/space complexity is given.
We present an efficient algorithm for computing the matching polynomial of a series-parallel graph in $O(n^2)$ time. This algorithm improves on the previous result of $O(n^3)$. We also present a cost-optimal parallel algorithm for computing the matching polynomial of a series-parallel graph using an EREW PRAM computer with the number of processors p less than $n^2 / \log n$.
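To make the object being computed concrete, here is a small brute-force sketch. It is not the paper's quadratic-time algorithm: it takes exponential time in the worst case and only illustrates the definition. It assumes the common convention in which the matching polynomial of a graph on n vertices is the sum over k of (-1)^k m_k x^(n-2k), where m_k counts matchings with k edges; the abstract does not state which convention the paper uses. The function names are hypothetical.

```python
def matching_counts(edges):
    """Return a list m where m[k] is the number of matchings with k edges."""
    if not edges:
        return [1]  # only the empty matching
    (u, v), rest = edges[0], edges[1:]
    # Case 1: the first edge is not in the matching.
    without = matching_counts(rest)
    # Case 2: the first edge is in the matching, so drop edges touching u or v.
    remaining = [e for e in rest if u not in e and v not in e]
    with_edge = [0] + matching_counts(remaining)  # shift: one more edge used
    size = max(len(without), len(with_edge))
    return [
        (without[k] if k < len(without) else 0)
        + (with_edge[k] if k < len(with_edge) else 0)
        for k in range(size)
    ]


def matching_polynomial(n, edges):
    """Map exponent -> coefficient for sum_k (-1)^k m_k x^(n - 2k)."""
    counts = matching_counts(list(edges))
    return {n - 2 * k: (-1) ** k * m for k, m in enumerate(counts)}
```

For the 4-cycle, which is a series-parallel graph, `matching_polynomial(4, [(0, 1), (1, 2), (2, 3), (3, 0)])` yields the coefficients of x^4 - 4x^2 + 2.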
In this paper, a GPU-based parallel algorithm for JPEG coding is proposed. Most image compression systems have efficiency problems, and the real-time of wireless multimedia sensor networks (WMSN) which used in image...
New parallel algorithms and comparative test results are given for solving triangular systems of linear equations on distributed-memory multiprocessors. These results supplement those given in a previous paper. All of the new algorithms are variations on the cyclic algorithms discussed previously. The new algorithms are shown to provide substantial performance improvements.
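The abstract does not describe the algorithms themselves, so the following is only a sequential sketch of the underlying computation: column-oriented forward substitution for a lower triangular system. The `owner` list assumes that "cyclic" refers to a wrap-around assignment of columns to processors, which is our reading rather than something the abstract states. The function name and the `num_procs` parameter are hypothetical.

```python
def solve_lower_cyclic(L, b, num_procs=2):
    """Solve L x = b for lower triangular L by column-oriented substitution.

    Returns the solution and the assumed cyclic column-to-processor map.
    This runs sequentially; the map only marks who would own each column.
    """
    n = len(b)
    rhs = list(b)                               # working copy of the right-hand side
    x = [0.0] * n
    owner = [j % num_procs for j in range(n)]   # assumed wrap-around mapping
    for j in range(n):
        # The owner of column j would compute x[j] and share it with the others.
        x[j] = rhs[j] / L[j][j]
        # Every remaining row is updated with column j's contribution.
        for i in range(j + 1, n):
            rhs[i] -= L[i][j] * x[j]
    return x, owner
```

With a wrap-around mapping, consecutive columns belong to different processors, so the work stays spread out as the substitution moves down the matrix.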