ISBN: (print) 0780378466; 0852967527
This paper presents the simulation of large-scale antenna arrays on parallel platforms with a conventional MoM technique (such as the mixed potential integral equation). We discuss parallel schemes for partitioning large geometry files, for computing and storing the system matrix, and for the parallel solution process. In addition, we briefly explain a parallel algorithm for radiation pattern computation in the partitioning scheme. Lastly, we demonstrate the application of the developed schemes to a 256-element slot-coupled patch antenna array with 101,555 unknowns. The size of the partitioned geometry files, the simulation time, and the computed radiation pattern are presented as results.
In previous work we compared two parallel algorithms for calculating 3D forward and backprojection on a distributed-memory cluster computer. These two methods were used to develop an implementation of fully three-dimensional ordered subset expectation maximization iterative reconstruction for emission tomography (OSEM3D). It is, however, necessary to embed these computational kernels in an environment that supports efficient data movement and other infrastructural operations, such as process management. Here we briefly describe two particular components of the infrastructure: the I/O subsystem and the service demon. For the I/O subsystem, a fortuitous relationship between the traditional representation of the fully three-dimensional sinogram S_4DO(θ, z, φ, r) and the distributed representation for image space decomposition permits an efficient solution involving minimal data movement. The service demon is similar to others, but contains additional recovery-oriented mechanisms for minimizing mean time to repair. We conclude with performance benchmarks for fully 3D reconstruction for a scanner that produces a significant volume of data.
We describe a parallel algorithm using the BSP/CGM model (Bulk Synchronous Parallel/Coarse Grained Multicomputer) to obtain the Euler tours in graphs. It is based on the PRAM (parallel random access machine) algorithm by Caceres et al. For an input graph of n vertices and m edges, the algorithm requires local computation time of O((m+n)/p), O((m+n)/p) memory and O(log p) communication rounds, where p is the number of processors. To our knowledge there are no other parallel algorithms under the coarse-grained models for the Euler tours in graphs. The proposed algorithm is implemented using MPI (message passing interface) and the C language. The parallel program runs on a Beowulf cluster with 66 nodes. The implementation results confirm the theoretical complexity results of the algorithm.
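As a point of reference for the problem this abstract addresses, the sketch below finds a closed Euler tour sequentially. It uses Hierholzer's algorithm, which is a swapped-in, well-known sequential method and not the BSP/CGM algorithm of the paper. The function name `euler_tour` and the edge-list input format are assumptions made for illustration, and the graph is assumed to be undirected.

```python
from collections import defaultdict


def euler_tour(edges):
    """Return a closed Euler tour as a list of vertices, or None if none exists.

    `edges` is a list of (u, v) pairs describing an undirected multigraph.
    Sequential Hierholzer sketch; runs in O(m + n) time.
    """
    if not edges:
        return []
    adj = defaultdict(list)  # vertex -> list of (neighbor, edge_id)
    for eid, (u, v) in enumerate(edges):
        adj[u].append((v, eid))
        adj[v].append((u, eid))
    # A closed Euler tour requires every vertex to have even degree.
    if any(len(nbrs) % 2 for nbrs in adj.values()):
        return None
    used = [False] * len(edges)
    stack = [edges[0][0]]
    tour = []
    while stack:
        v = stack[-1]
        # Drop edges that were already traversed from their other endpoint.
        while adj[v] and used[adj[v][-1][1]]:
            adj[v].pop()
        if adj[v]:
            w, eid = adj[v].pop()
            used[eid] = True
            stack.append(w)
        else:
            tour.append(stack.pop())
    # If some edge was never reached, the edges are not all connected.
    if len(tour) != len(edges) + 1:
        return None
    return tour
```

A tour over m edges lists m + 1 vertices, starting and ending at the same vertex; the function returns None when a vertex has odd degree or when the edges do not form a single connected component.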
The Euclidean distance transform (EDT) is an important tool in image analysis and machine vision. It is compute-intensive and real-time applications call for highly parallel solutions. A new linear-time parallel algorithm for EDT is proposed in this paper. The algorithm readily maps to hardware. A pipelined cellular architecture is presented. The architecture is modular and cascadable. Preliminary results of FPGA implementation indicate that the proposed architecture can compute EDT at speeds much higher than the video rate using only a small percentage of the chip (components) for fairly large image sizes.
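To make concrete what an EDT computes, here is a brute-force reference sketch. It is not the linear-time algorithm or the pipelined architecture proposed in the paper; it only defines the expected output and takes time quadratic in the number of pixels. The function name `edt_bruteforce` and the convention that pixels with value 1 are the feature pixels are assumptions for illustration.

```python
import math


def edt_bruteforce(image):
    """Return, for each pixel, the Euclidean distance to the nearest feature pixel.

    `image` is a list of equal-length rows holding 0 or 1; pixels equal to 1
    are treated as feature pixels (an assumed convention). If the image has
    no feature pixel, every distance is math.inf.
    """
    features = [
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value == 1
    ]
    result = []
    for r, row in enumerate(image):
        out_row = []
        for c in range(len(row)):
            # Exhaustively compare against every feature pixel.
            best = min(
                (math.hypot(r - fr, c - fc) for fr, fc in features),
                default=math.inf,
            )
            out_row.append(best)
        result.append(out_row)
    return result
```

A reference like this is mainly useful for checking a faster implementation on small images.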
In this work we describe and analyze sequential and parallel algorithms for the lossless compression of angiography sequences. These algorithms are based on the three-dimensional wavelet transform (3D-WT) and the SPIHT coding method. We consider two different approaches: a 3D wavelet pyramidal structure method and a temporal wavelet decomposition method. We describe our parallel implementations, developed on a shared-memory system (an SGI Origin 3800 supercomputer) using a message-passing paradigm that ensures inter-processor synchronization. We evaluate and analyze the overall processing time and the speed-up factor by varying the number of nodes and the video coding parameters.
A general framework is proposed for the study of real-time algorithms. The framework unifies previous algorithmic definitions of real-time computation. In it, state space traversal is used as a model for computational problems in a real-time environment. The proposed framework also employs a paradigm, known as discrete steepest descent, for algorithms designed to solve these problems. Sequential and parallel algorithms for traversing a state space by discrete steepest descent are then analyzed and compared. The analysis measures the value (or worth) of a computed solution. The quantity used in the evaluation may be the time required by an algorithm to reach the solution, the quality of the solution obtained, or any similar measure. The value of a real-time solution obtained in parallel is shown to be consistently superior to that of a solution computed sequentially by an amount superlinear in the size of the problem.
Clusters of homogeneous workstations built around fast networks have become popular means of solving scientific problems, and users often have access to several such clusters. Harnessing the collective power of these clusters to solve a single, challenging problem is desirable, but is often impeded by large inter-cluster network latencies and heterogeneity of different clusters. The complexity of these environments requires commensurate advances in parallel algorithm design. We support this thesis by utilizing two techniques: 1) multigrain, a novel algorithmic technique that induces coarse granularity to parallel iterative methods, providing tolerance for large communication latencies, and 2) an application-level load balancing technique applicable to a specific but important class of iterative methods. We implement both algorithmic techniques on the popular Jacobi-Davidson eigenvalue iterative solver. Our experiments on a cluster environment show that the combination of the two techniques enables effective use of heterogeneous, possibly distributed resources, that cannot be achieved by traditional implementations of the method.
Given two strings X and Y of lengths m and n, respectively, the all-substrings longest common subsequence (ALCS) problem obtains the lengths of the subsequences common to X and any substring of Y. The sequential algorithm takes O(mn) time and O(n) space. We present a parallel algorithm for ALCS on a coarse-grained multicomputer (BSP/CGM) model with p < √m processors that takes O(mn/p) time and O(n√m) space per processor, with O(log p) communication rounds. The proposed parallel algorithm also solves the well-known LCS problem. To our knowledge this is the best BSP/CGM algorithm for the ALCS problem in the literature.
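For orientation, the sketch below shows the classic sequential LCS-length recurrence using only two rows of the dynamic-programming table, which gives O(mn) time and O(n) space for a single pair of strings. It is followed by a naive ALCS reference that simply reruns it on every substring of Y. Neither function is the paper's sequential ALCS algorithm or its BSP/CGM algorithm, and the naive reference is far slower than the bounds quoted above. The names `lcs_length` and `alcs_bruteforce` are hypothetical.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence of x and y.

    Keeps only the previous and current DP rows: O(len(x) * len(y)) time,
    O(len(y)) space.
    """
    prev = [0] * (len(y) + 1)
    for xc in x:
        curr = [0]
        for j, yc in enumerate(y, start=1):
            if xc == yc:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def alcs_bruteforce(x, y):
    """Naive ALCS reference: LCS length of x against every substring y[i:j].

    Returns a dict keyed by (i, j) with 0 <= i < j <= len(y). Intended only
    to illustrate the problem definition on small inputs.
    """
    return {
        (i, j): lcs_length(x, y[i:j])
        for i in range(len(y))
        for j in range(i + 1, len(y) + 1)
    }
```

The entry for the full range `(0, len(y))` of `alcs_bruteforce` is the ordinary LCS length, which mirrors the abstract's remark that an ALCS solution also solves LCS.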
This paper describes a new algorithm for packet classification using the concept of independent sets. The algorithm has very small memory requirements. The search speed is neither sensitive to the rule table nor to the percentage of wildcards in the fields. It also scales well from two dimensional classifiers to high dimensional ones. In particular, the algorithm is inherently parallel. Hardware tailored to this algorithm can achieve very fast search speed.
The contact-impact model is widely used to simulate vehicle crashworthiness and sheet forming processes. Because contact-impact computation is costly, a parallel algorithm is presented and implemented on a distributed, scalable parallel platform. Numerical examples demonstrate the effectiveness and scalability of the parallel algorithm. The stability of the parallel platform is also tested: it carries out the parallel computation of contact-impact problems reliably, and it can be applied in developing countries because it costs less than large parallel machines.