ISBN (digital): 9781510626904
ISBN (print): 9781510626904
The goal of this paper is to describe a novel parallel high-resolution 3D numerical method for the solution of high-frequency electromagnetic wave propagation. The sequential numerical method was developed by the first author in 2014. The authors will later use the parallel algorithm discussed here to simulate data for the inverse problem of imaging mine-like targets. The solution of the forward problem presented in this paper is therefore a necessary prelude to the future solution of a related inverse problem. In this paper, land mines are modeled as small abnormalities embedded in an otherwise uniform medium with an air-ground interface. These abnormalities are characterized by their electrical permittivity and conductivity, whose values differ from those of the host medium. The main challenge in calculating the scattered electromagnetic signal in these settings is the need to solve the Helmholtz equation at high frequencies, which is excessively time-consuming with standard direct solution techniques. A high-resolution and scalable numerical procedure for the solution of this equation is described in this paper. The kernel of this algorithm is a combination of a second, fourth or sixth order compact finite-difference scheme and a preconditioned Krylov subspace approach. Both fourth and sixth order compact approximations for the Helmholtz equation are considered to reduce approximation and pollution errors, thereby softening the point-per-wavelength constraint. The coefficient matrix of the resulting system is not Hermitian and possesses positive as well as negative eigenvalues. This represents a significant challenge for constructing an efficient iterative solver. In our approach, this system is solved by a combination of a Krylov subspace-type method with a direct parallel FFT-type preconditioner. The resulting numerical method allows a natural and efficient implementation on parallel computers. Numerical results for r
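The effect of a higher-order compact stencil on the approximation error can be illustrated on a much simpler problem than the one in the abstract. The sketch below is a 1D analogue only: it assumes the model problem u'' + k²u = 0 on [0, 1] with u(0) = 0 and u(1) = sin(k), uses a Numerov-type fourth-order compact stencil, and solves the tridiagonal system directly with the Thomas algorithm. It is not the authors' 3D method and does not use their Krylov solver or FFT-type preconditioner; the function names `thomas` and `helmholtz_1d_error` are hypothetical.

```python
import math

def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system without pivoting; inputs are not modified."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def helmholtz_1d_error(k, n, compact):
    """Max nodal error for u'' + k^2 u = 0, u(0) = 0, u(1) = sin(k).

    The exact solution of this assumed model problem is u(x) = sin(k x).
    """
    h = 1.0 / n
    # Weights applied to the k^2 u term: pointwise (second order) or
    # Numerov-type averaging over three nodes (fourth-order compact).
    off_w, mid_w = (1.0 / 12.0, 10.0 / 12.0) if compact else (0.0, 1.0)
    off = 1.0 / h**2 + k**2 * off_w
    mid = -2.0 / h**2 + k**2 * mid_w
    m = n - 1  # number of interior unknowns
    rhs = [0.0] * m
    rhs[-1] = -off * math.sin(k)  # known right boundary value moved to the RHS
    u = thomas([off] * m, [mid] * m, [off] * m, rhs)
    return max(abs(u[i] - math.sin(k * (i + 1) * h)) for i in range(m))

if __name__ == "__main__":
    for compact in (False, True):
        label = "4th-order compact" if compact else "2nd-order"
        print(label, helmholtz_1d_error(10.0, 100, compact))
```

On the same grid, the compact stencil should give a markedly smaller error than the second-order one. This is the sense in which higher-order schemes soften the points-per-wavelength requirement.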
ISBN (print): 1424407281
We propose the "Multi-Split-Row" LDPC decoding method, which allows further reductions in routing complexity, greater throughput, and smaller circuit area than the previously proposed Split-Row decoding method. Multi-Split-Row is especially useful for regular LDPC codes with high row weight. A 2048-bit fully parallel decoder is implemented in a 0.18 μm CMOS technology using the standard MinSum, Split-Row-2 and Split-Row-4 methods. The Split-Row-4 decoder delivers 7.1 Gbps throughput with 15 decoding iterations, and has 3.2 times smaller circuit area and 5.2 times higher throughput than the standard MinSum decoder.
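A small sketch may help contrast the standard MinSum check-node update with a Split-Row-style update. It assumes the commonly described form of Split-Row, in which each row partition computes its minimum magnitude locally and only sign information is shared across partitions. The abstract does not spell out these details, so the function names and the partitioning scheme below are illustrative assumptions, not the authors' hardware design.

```python
def sign(x):
    """Return -1.0 for negative inputs and +1.0 otherwise."""
    return -1.0 if x < 0 else 1.0

def minsum_check_update(llrs):
    """Standard MinSum: each output combines the signs and the minimum
    magnitude of all *other* inputs in the row."""
    out = []
    for i in range(len(llrs)):
        others = llrs[:i] + llrs[i + 1:]
        s = 1.0
        for v in others:
            s *= sign(v)
        out.append(s * min(abs(v) for v in others))
    return out

def split_row_check_update(llrs, parts):
    """Split-Row-style sketch (assumed form): minima are local to each
    partition, while the sign is taken over the full row.

    Assumes len(llrs) is divisible by parts and each partition has at
    least two entries.
    """
    size = len(llrs) // parts
    row_sign = 1.0
    for v in llrs:
        row_sign *= sign(v)
    out = []
    for p in range(parts):
        block = llrs[p * size:(p + 1) * size]
        for i, v in enumerate(block):
            others = block[:i] + block[i + 1:]
            # row_sign * sign(v) equals the sign product of all other inputs.
            out.append(row_sign * sign(v) * min(abs(x) for x in others))
    return out

if __name__ == "__main__":
    row = [2.0, -0.5, 1.5, -3.0]
    print(minsum_check_update(row))
    print(split_row_check_update(row, 2))
```

Because each partition only sees its own magnitudes, the split update never underestimates the standard MinSum magnitude. In exchange, only sign information has to cross partition boundaries, which is where the routing savings would come from.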
ISBN (digital): 9781665497862
ISBN (print): 9781665497862
As data scales continue to increase, studying the porting and implementation of shared memory parallel algorithms for distributed memory architectures becomes increasingly important. In this study we consider the biconnectivity problem, which asks for the cut vertices and cut edges of a graph. As part of our study, we implemented and optimized a shared memory biconnectivity algorithm based on color propagation within a distributed memory context. This algorithm is neither work nor time efficient. However, when we compare to distributed implementations of theoretically efficient algorithms, we find that simple non-optimal algorithms can greatly outperform time-efficient algorithms in practice when implemented for real distributed-memory environments and real data. Overall, our distributed implementation for computing graph biconnectivity demonstrates an average strong scaling speedup of 15× across 64 MPI ranks on a suite of irregular real-world inputs. We also note average speedups of 11× and 7.3× relative to the optimal serial algorithm and the fastest shared-memory implementation for the biconnectivity problem, respectively.
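To make the terms "cut vertex" and "cut edge" concrete, the following is a minimal serial sketch based on the classical depth-first-search low-link technique. It is not the color-propagation algorithm studied in the paper and it is not distributed. The function name and the edge-list input format are assumptions for illustration, and the recursive DFS is only suitable for small graphs.

```python
def cut_vertices_and_edges(n, edges):
    """Return (cut_vertices, cut_edges) of an undirected graph.

    n: number of vertices labelled 0..n-1; edges: list of (u, v) pairs.
    """
    adj = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))

    disc = [-1] * n   # DFS discovery time, -1 means unvisited
    low = [0] * n     # earliest discovery time reachable from the subtree
    cut_vertices, cut_edges = set(), set()
    timer = 0

    def dfs(u, parent_edge):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        children = 0
        for v, idx in adj[u]:
            if idx == parent_edge:
                continue  # do not walk back along the tree edge
            if disc[v] != -1:
                low[u] = min(low[u], disc[v])  # back edge
            else:
                children += 1
                dfs(v, idx)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    cut_edges.add(tuple(sorted((u, v))))
                if parent_edge != -1 and low[v] >= disc[u]:
                    cut_vertices.add(u)
        # A DFS root is a cut vertex only if it has several DFS children.
        if parent_edge == -1 and children > 1:
            cut_vertices.add(u)

    for s in range(n):
        if disc[s] == -1:
            dfs(s, -1)
    return cut_vertices, cut_edges

if __name__ == "__main__":
    # A triangle 0-1-2 with a tail 2-3-4 attached.
    print(cut_vertices_and_edges(5, [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]))
```

In the example graph, removing vertex 2 or 3 disconnects the tail from the triangle, and the two tail edges are the only cut edges.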
ISBN (print): 9781450300797
In this paper we explore a simple and general approach for developing parallel algorithms that lead to good cache complexity on parallel machines with private or shared caches. The approach is to design nested-parallel algorithms that have low depth (span, critical path length) and for which the natural sequential evaluation order has low cache complexity in the cache-oblivious model. We describe several cache-oblivious algorithms with optimal work, polylogarithmic depth, and sequential cache complexities that match the best sequential algorithms, including the first such algorithms for sorting and for sparse-matrix vector multiply on matrices with good vertex separators. Using known mappings, our results lead to low cache complexities on shared-memory multiprocessors with a single level of private caches or a single shared cache. We generalize these mappings to multi-level cache hierarchies of private or shared caches, implying that our algorithms also have low cache complexities on such hierarchies. The key factor in obtaining these low parallel cache complexities is the low depth of the algorithms we propose.
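The notions of work and depth (span) used above can be illustrated with a toy nested-parallel computation. The sketch below is not one of the paper's algorithms: it is a divide-and-conquer sum, evaluated sequentially, that also reports how many additions it performs (work) and the length of the longest chain of dependent additions (depth). The function name `reduce_with_cost` is hypothetical.

```python
def reduce_with_cost(xs):
    """Return (total, work, depth) for a balanced divide-and-conquer sum.

    Assumes xs is non-empty. Work counts additions; depth is the longest
    chain of additions that must happen one after another.
    """
    if len(xs) == 1:
        return xs[0], 0, 0
    mid = len(xs) // 2
    # The two recursive calls are independent, so a nested-parallel runtime
    # could execute them in parallel; here they simply run in sequence.
    left, work_l, depth_l = reduce_with_cost(xs[:mid])
    right, work_r, depth_r = reduce_with_cost(xs[mid:])
    return left + right, work_l + work_r + 1, max(depth_l, depth_r) + 1

if __name__ == "__main__":
    print(reduce_with_cost(list(range(8))))
```

For n inputs this performs n - 1 additions in total, while the depth grows only logarithmically in n. Low depth of this kind is the property the abstract identifies as the key factor behind low parallel cache complexity.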
Based on the second-order compact upwind scheme, a group explicit method for solving the two-dimensional time-independent convection-dominated diffusion problem is developed. The stability of the group explicit method is rigorously proven. The method has second-order accuracy and good stability. This explicit scheme can be used to solve convection-dominated diffusion problems at any Reynolds number. A numerical test on a parallel computer shows high efficiency, and the numerical results agree closely with the analytic solution.
To improve the performance of denoising images with large data volumes, this paper uses a parallel algorithm to eliminate the noise and the DUD static load balancing strategy to improve the data handling capacity of t...
This paper presents a parallel algorithm for the modified Cholesky factorization of sparse matrices on a distributed memory system. The parallel strategy is based on an asynchronous scheme, to obtain maximum performance by taking advantage of the sparse nature of the matrices. The authors carry out an analysis of the communication overhead and the computational load balance in order to model the efficiency of the algorithm and predict the organization of the processors resulting in the best performance.
A bound for the number of steps that are required to evaluate Boolean expressions is obtained. It is shown that any Boolean expression of n distinct variables may be evaluated in 2log2n
ISBN (print): 9780769533223
This paper proves that the torus T(5m,2n) can be embedded into the Petersen-Torus PT(m,n) with dilation 5, congestion 5, and expansion 1. It also proves that the torus can be embedded in PT with an average dilation of 3 or less. Because the widely known torus network is embedded in PT with dilation and congestion of 5 or less, the embedding algorithm can be used in both wormhole and store-and-forward routing systems, and the processor throughput can be minimized during simulation through the one-to-one embedding.
ISBN (digital): 9783642551956
ISBN (print): 9783642551956
The field of pharmaceutical modelling has, in recent years, benefited from probabilistic methods based on cellular automata, which seek to overcome some of the limitations of differential equation based models. By modelling the interactions of discrete structural elements instead, these methods provide data of a quality adequate for the early design phases in drug modelling. In the relevant literature, both synchronous (CA) and asynchronous (ACA) types of automata have been used, without analysis of their comparative impact on the model outputs. In this paper, we compare several variations of probabilistic CA and ACA update algorithms for building models of complex systems used in controlled drug delivery, analysing the advantages and disadvantages related to different modelling scenarios. Choosing the appropriate update mechanism, besides affecting the perceived realism of the simulation, also has practical implications for the applicability of different model parallelisation algorithms and their performance when used in large-scale simulation contexts.
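The difference between synchronous and asynchronous updating can be seen in a toy one-dimensional automaton. The sketch below uses a made-up probabilistic rule (an occupied cell spreads to its right-hand neighbour with probability `p`). It is not one of the drug-delivery models from the paper, and the function names and the rule itself are assumptions for illustration.

```python
import random

def step_sync(cells, p, rng):
    """Synchronous update: every cell reads the *old* state of its neighbour."""
    old = list(cells)
    new = []
    for i in range(len(old)):
        spreads = i > 0 and old[i - 1] == 1 and rng.random() < p
        new.append(1 if old[i] == 1 or spreads else 0)
    return new

def step_async(cells, p, rng, order=None):
    """Asynchronous update: cells are visited one at a time and each update
    is immediately visible to cells visited later in the same sweep.

    If no order is given, a random permutation of the cells is used.
    """
    new = list(cells)
    if order is None:
        order = rng.sample(range(len(new)), len(new))
    for i in order:
        if new[i] == 0 and i > 0 and new[i - 1] == 1 and rng.random() < p:
            new[i] = 1
    return new

if __name__ == "__main__":
    rng = random.Random(0)
    start = [1, 0, 0, 0]
    print(step_sync(start, 1.0, rng))                 # spreads one cell per step
    print(step_async(start, 1.0, rng, [0, 1, 2, 3]))  # spreads along the sweep
    print(step_async(start, 1.0, rng, [3, 2, 1, 0]))  # order changes the outcome
```

With `p = 1.0` the synchronous step advances the front by exactly one cell, whereas a left-to-right asynchronous sweep fills the whole row in a single step. The synchronous version is also trivially parallel over cells, while the asynchronous one carries order dependencies, which hints at why the choice of update scheme matters for parallelisation.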