A simulated annealing (SA) algorithm called Sample-Sort, which is artificially extended across an array of samplers, is proposed. The sequence of temperatures of a serial SA algorithm is replaced with an array of samplers operating at static temperatures, and the single stochastic sampler is replaced with a set of samplers. The set of samplers uses a biased generator to sample the same distribution as a serial SA algorithm, thereby maintaining the same convergence property. Sample-Sort was compared to SA by applying both to a set of global optimization problems and was found to be comparable if the number of iterations per sampler was sufficient. If the evaluation phase dominates the computational requirements, Sample-Sort could take advantage of parallel processing.
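The idea of replacing a cooling schedule with an array of samplers held at fixed temperatures can be illustrated with a minimal sketch. This is not the Sample-Sort algorithm itself: the abstract does not specify the biased generator or how samplers interact, so both are omitted here. The function name `fixed_temperature_samplers`, the uniform proposal, and all parameter values are assumptions made only for illustration.

```python
import math
import random


def fixed_temperature_samplers(cost, x0, temperatures, iterations, step=0.5, seed=0):
    """Run one Metropolis sampler per static temperature (illustrative only).

    Assumption: samplers are independent and use a plain uniform proposal;
    the biased generator described in the abstract is NOT reproduced here.
    """
    rng = random.Random(seed)
    states = [x0 for _ in temperatures]
    costs = [cost(x0) for _ in temperatures]
    best_x, best_c = x0, costs[0]
    for _ in range(iterations):
        # Each sampler could evaluate its candidate in parallel;
        # here they are visited one after another.
        for i, t in enumerate(temperatures):
            cand = states[i] + rng.uniform(-step, step)
            c = cost(cand)
            delta = c - costs[i]
            # Metropolis acceptance at this sampler's fixed temperature.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                states[i], costs[i] = cand, c
                if c < best_c:
                    best_x, best_c = cand, c
    return best_x, best_c


best_x, best_c = fixed_temperature_samplers(
    cost=lambda x: (x - 2.0) ** 2,
    x0=10.0,
    temperatures=[2.0, 1.0, 0.5, 0.1, 0.01],
    iterations=5000,
)
```

Because the cost evaluations inside the inner loop are independent of one another, this is the part that could be distributed when evaluation dominates the run time.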
In this study, we introduce cost-effective strategies and algorithms for parallelizing Krylov-subspace-based non-stationary iterative solvers such as Bi-CGM and Bi-CGSTAB for distributed computing on a cluster of PCs using ANULIB message passing libraries. We investigate the effectiveness of the parallel solvers on the linear systems arising in the numerical solution of some 2D and 3D nonlinear partial differential equations governing heat convection processes, discretized by finite element, finite difference, and wavelet-based numerical schemes. Bi-CGM is largely found to give better performance, measured in terms of speedup factors. (c) 2005 Elsevier Ltd. All rights reserved.
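For orientation, a serial, unpreconditioned Bi-CGSTAB iteration can be sketched as follows. This is the textbook form of the method on a small dense system, not the parallel implementation the abstract describes; the helper names `matvec` and `dot`, the test matrix, and the tolerances are assumptions for illustration. The matrix-vector products and inner products are the kernels a distributed version would typically split across processes.

```python
import math


def matvec(a, x):
    """Dense matrix-vector product; a is a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def bicgstab(a, b, tol=1e-10, max_iter=100):
    """Unpreconditioned Bi-CGSTAB starting from x = 0 (serial sketch)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)          # residual b - A x for the zero initial guess
    r_hat = list(r)      # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) < tol:
            break
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = matvec(a, p)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if math.sqrt(dot(s, s)) < tol:
            # The half step already converged; skip the stabilizing step.
            x = [xi + alpha * pi for xi, pi in zip(x, p)]
            break
        t = matvec(a, s)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        rho = rho_new
    return x


a = [[4.0, 1.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]]  # small nonsymmetric example
b = [1.0, 2.0, 3.0]
x = bicgstab(a, b)
```

The sketch has no safeguards against breakdown (a vanishing `rho_new` or `omega`), which a production solver would need.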
In this paper, we present two $O(1)$ time algorithms for solving the 2D all nearest neighbor (2D_ANN) problem, the 2D closest pair (2D_CP) problem, the 3D all nearest neighbor (3D_ANN) problem and the 3D closest pair (3D_CP) problem of $n$ points on the linear array with a reconfigurable pipelined bus system (LARPBS) from the computational geometry perspective. The first $O(1)$ time algorithm, which invokes the ANN properties (introduced in this paper) only once, can solve the 2D_ANN and 2D_CP problems of $n$ points on an LARPBS of size $\frac{1}{2} n^{5/3+\epsilon}$, and the 3D_ANN and 3D_CP problems of $n$ points on an LARPBS of size $\frac{1}{2} n^{7/4+\epsilon}$, where $0 < \epsilon = 1/(2^{c+1} - 1) \ll 1$ and $c$ is a constant positive integer. The second $O(1)$ time algorithm, which recursively invokes the ANN properties $k$ times, can solve the kD_ANN and kD_CP problems of $n$ points on an LARPBS of size $\frac{1}{2} n^{3/2+\epsilon}$, where $k = 2$ or $3$, 0 [...] algorithms known.
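To make the two problem statements concrete, the following sequential brute-force reference computes the all-nearest-neighbor and closest-pair answers for points in 2D or 3D. It only defines what the problems ask for; it takes quadratic time and has nothing to do with the constant-time LARPBS algorithms of the paper. The function names are hypothetical, and at least two points are assumed.

```python
import math


def all_nearest_neighbors(points):
    """For each point, return the index of its nearest other point."""
    nearest = []
    for i, p in enumerate(points):
        j_best = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        nearest.append(j_best)
    return nearest


def closest_pair(points):
    """Return the index pair (i, j), i < j, with the smallest distance."""
    nearest = all_nearest_neighbors(points)
    i = min(
        range(len(points)),
        key=lambda i: math.dist(points[i], points[nearest[i]]),
    )
    return tuple(sorted((i, nearest[i])))


pts_2d = [(0, 0), (5, 5), (1, 0), (5, 7)]
ann = all_nearest_neighbors(pts_2d)   # nearest-neighbor index per point
cp = closest_pair(pts_2d)             # closest pair of indices
```

The closest pair falls out of the all-nearest-neighbor result, which is why the two problems are usually treated together.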
The rewarming process of cryopreserved organs using microwave technology is analyzed by numerical simulation. The FDTD (finite-difference time-domain) method is applied to calculate the electromagnetic field in a real microwave rewarming system, composed of a cylindrical resonant cavity, an antenna source, and a frozen rabbit-kidney phantom with temperature-dependent properties. The efficiency of the FDTD codes is improved by nonuniform grid techniques and parallel algorithms. Meanwhile, an apparent specific-heat method is introduced in the temperature-field calculation. The solutions of the two fields are coupled by a previously developed algorithm. The numerical results show that, in the rewarming process of the rabbit-kidney phantom, the warming rate can reach 300-500 °C/min, which may prevent devitrification, but the maximum temperature difference in the sample (18 mm in radius) can reach 15 °C at the end, which may cause severe thermal stress. (c) 2005 Wiley Periodicals, Inc.
We discuss the use of current shared-memory systems for discrete-particle modeling of heterogeneous mesoscopic complex fluids in irregular geometries. This has been demonstrated by way of mesoscopic blood flow in bifurcating capillary vessels. The plasma is represented by fluid particles, while the other blood constituents are made of "solid" particles interacting with harmonic forces. The particle code was tested on 4 and 8 processors of SGI/Origin 3800 (R14000/500), IBM Regatta (Power4/1300), and SGI Altix 3000 (Itanium (R) 2/1300) systems and on a 2-processor AMD Opteron 240 motherboard. The tests were performed for the same system employing two million fluid and "solid" particles. We show that irregular boundary conditions and heterogeneity of the particle fluid inhibit efficient implementation of the model on superscalar processors. We improve the efficiency almost threefold by reducing the effect of computational imbalance using a simple load-balancing scheme. Additionally, in employing MPI on shared-memory machines, we have constructed a simple middleware library to simplify parallelization. The efficiency of the particle code depends critically on the memory latency. Therefore, the latest architectures with the fastest CPU-memory interface, such as AMD Opteron and Power4, represent the most promising platforms for modeling complex mesoscopic systems with fluid particles. As an example of the application of small shared-memory clusters to very complex problems, we demonstrate the results of modeling red blood cell clotting in blood flow in capillary vessels due to fibrin aggregation.
In this work, we developed an accurate and efficient radiative finite volume method (FVM) applicable to complex 2D planar and 3D geometries using an unstructured grid finite volume method. The present numerical model has been fully validated by several benchmark cases, including radiative heat transfer in a quadrilateral enclosure with an isothermal medium, a tetrahedral enclosure, and a three-dimensional idealized furnace, as well as convection-coupled radiative heat transfer in a square enclosure. The numerical results for all cases agree well with previous results. Special emphasis is given to the parallelization of the unstructured grid radiative FVM using the domain decomposition approach. Numerical results indicate that the present parallel unstructured-grid FVM performs well in terms of accuracy, geometric flexibility, and computational efficiency.
Consider an n-dimensional SIMD hypercube H_n with [3n/2] - 1 faulty nodes. This paper presents one-to-all broadcasting algorithms on the faulty SIMD H_n that take n + 3 log(n - 1) + 7, n + 2 log(n - 1) + 9, n + log(n - 1) + O(log log(n - 1)), n + log(n - 1) + 12, and n + 19 steps, respectively. The sequence of dimensions used for broadcasting in each algorithm is the same regardless of which node is the source. The proposed one-to-all broadcasting algorithms can tolerate [n/2] more faulty nodes than Raghavendra and Sridhar's algorithms (J. Parallel Distrib. Comput. 35 (1996) 57), although 8 extra steps are needed. The fault-tolerance improvement of this paper is about 50%. (c) 2004 Published by Elsevier Inc.
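As a baseline for the step counts above, the standard fault-free one-to-all broadcast on an n-dimensional hypercube takes exactly n steps and uses the same dimension sequence for every source. The sketch below simulates that fault-free case only; it does not implement the fault-tolerant algorithms of the paper, and the function name `hypercube_broadcast` is hypothetical.

```python
def hypercube_broadcast(n, source):
    """Simulate fault-free one-to-all broadcast on an n-dimensional hypercube.

    In step `dim`, every informed node sends to its neighbor across
    dimension `dim`. The dimension order 0, 1, ..., n-1 does not depend
    on the source, mirroring the SIMD constraint in the abstract.
    """
    informed = {source}
    schedule = []
    for dim in range(n):
        sends = [(u, u ^ (1 << dim)) for u in sorted(informed)]
        schedule.append(sends)
        informed.update(v for _, v in sends)
    return schedule, informed


schedule, informed = hypercube_broadcast(n=4, source=5)
messages_per_step = [len(step) for step in schedule]
```

The number of informed nodes doubles in each step, so all 2^n nodes are reached after n steps; the extra log-order terms in the abstract are the cost of routing around faulty nodes.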
A valuable geometric structure in mobile robot path planning is the complete visibility graph. This letter proposes new parallel algorithms that can be mapped to reconfigurable hardware for construction of the complete visibility graph in an environment with: 1) multiple convex polygonal objects and 2) multiple nonconvex polygonal objects. Results of implementation in a Xilinx Virtex field-programmable gate array demonstrate that the proposed approach is area-time efficient: the design for an environment with roughly 60 vertices fits on one XCV3200E device and operates at close to 60 MHz.
In this work we consider deterministic oblivious k-k routing algorithms with buffer size $O(k)$. We present an asymptotically optimal $O(k\sqrt{n^d})$ step oblivious k-k routing algorithm for d-dimensional $n \times \cdots \times n$ meshes of $n^d$ processors for all $k \geq 1$ and $d > 1$. We further show how the algorithm can be used to achieve asymptotically optimal oblivious k-k routing algorithms on other networks. (c) 2005 Elsevier B.V. All rights reserved.
Emerging computing environments, such as the Grid, promise enormous raw computational power. However, effective use of such platforms is often difficult, because conventional spatial decomposition leads to fine granularity, resulting in high communication overhead. We introduce the concept of guided simulations to parallelize along the time domain. Here, we use the fact that typically results of other simulations of closely related problems are available. In this approach, we automatically and dynamically determine a relationship between old simulations and the one being performed, and use this to parallelize along the time domain. We demonstrate the validity of this approach by applying the technique to an important application involving molecular dynamics simulation of nanomaterials. In this application, spatial decomposition is not effective due to the small size of the physical system. However, time parallelization is effective, since the granularity is much coarser. We also mention how this approach can be extended to make it inherently fault tolerant. (c) 2005 Elsevier B.V. All rights reserved.