We describe the use of spectrally-based numerical methods in process studies of rotating stratified fluid dynamics relevant to oceans, lakes and the atmosphere. The objective is to take advantage of the well-known numerical properties of methods based on expansions in terms of trigonometric functions in applications for which inhomogeneous boundary conditions and/or irregular domains are desired. The underlying mathematical idea is the exchange of inhomogeneity from boundary conditions to forcing terms. The fundamental techniques for handling inhomogeneity in boundary conditions, symmetry mismatches between body forces and dependent variables at boundaries and the imposition of boundary conditions on internal or immersed boundaries are described and illustrated using simple idealized examples. These techniques are then combined to illustrate how these methods can be applied to several examples of flows from laboratory experiments. (C) 2012 Elsevier Ltd. All rights reserved.
Network coding, a well-known technique for optimizing data flow in wired and wireless network systems, has attracted considerable attention in various fields. However, decoding complexity becomes a major performance bottleneck in practical network systems; thus, several studies have sought to improve the decoding performance of network coding. Nevertheless, previously proposed parallel network coding algorithms have shown limited scalability and performance imbalance for different-sized transfer units and multiple streams. In this paper, we propose a new parallel decoding algorithm for network coding using a graphics processing unit (GPU). This algorithm can simultaneously process multiple incoming streams and can maintain its maximum decoding performance irrespective of the size and number of transfer units. Our experimental results show that the proposed algorithm achieves a decoding bandwidth of 682.2 Mbps on a system with a GeForce GTX 285 GPU and speed-ups of up to 26 compared to the existing single-stream decoding procedure with a 128 x 128 coefficient matrix and different-sized data blocks.
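The decoding step that dominates the cost is, in essence, the inversion of the coefficient matrix applied to the received blocks. The following is a minimal sequential sketch of that idea, not the GPU algorithm of the paper. It assumes coding over GF(2), so that addition is XOR, whereas practical systems often use a larger field; the function name `decode_gf2` and the bitmask representation are hypothetical choices made for illustration.

```python
def decode_gf2(coeffs, payloads, n):
    """Recover n original blocks from coded packets by Gauss-Jordan elimination.

    Assumptions (illustrative only): coding is over GF(2), each coefficient
    vector is an int bitmask (bit k set means original block k was XORed in),
    and each payload is an int holding the XOR of the selected blocks.
    """
    rows = [[c, p] for c, p in zip(coeffs, payloads)]
    for col in range(n):
        bit = 1 << col
        # Find a packet at or below the current row that contains block `col`.
        pivot = next((r for r in range(col, len(rows)) if rows[r][0] & bit), None)
        if pivot is None:
            raise ValueError("coefficient matrix is not full rank")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Eliminate block `col` from every other packet.
        for r in range(len(rows)):
            if r != col and rows[r][0] & bit:
                rows[r][0] ^= rows[col][0]
                rows[r][1] ^= rows[col][1]
    # Row k now has coefficient vector 1 << k, so its payload is block k.
    return [rows[k][1] for k in range(n)]


# Three original blocks and three linearly independent coded packets.
original = [0x11, 0x22, 0x33]
coded_coeffs = [0b011, 0b110, 0b111]
coded_payloads = [0x11 ^ 0x22, 0x22 ^ 0x33, 0x11 ^ 0x22 ^ 0x33]
recovered = decode_gf2(coded_coeffs, coded_payloads, 3)
```

The row updates inside the elimination loop are independent of one another, which is the kind of work a GPU implementation can distribute across threads.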
The AllReduce algorithm is a promising new algorithm for parallelizing the Householder QR decomposition A = QR of a tall and skinny matrix. It divides the input matrix A vertically in a recursive manner, computes the QR decomposition of each submatrix independently, and merges the results to obtain the QR decomposition of A. While this algorithm has been shown to achieve excellent speedup in various parallel environments, its rounding error properties have not yet been elucidated. In this paper, we present a theoretical error analysis of the AllReduce algorithm. Specifically, we derive bounds for the backward error of A and the deviation from orthogonality of the computed Q factor. Our analysis shows that both of these bounds are smaller than their counterparts for the conventional Householder QR algorithm. Moreover, the bounds decrease as the number of submatrices increases. These results are supported by numerical experiments. We therefore conclude that the AllReduce algorithm can be used as a reliable method of orthogonalization in parallel environments.
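The divide, factor and merge structure described above can be sketched as follows. This is a sequential, pure-Python illustration that computes only the R factor; it assumes the matrix is split into blocks of consecutive rows and that pairs of R factors are merged by stacking and refactoring them. The helper names `householder_r` and `allreduce_r` are hypothetical, and the sketch says nothing about the error bounds derived in the paper.

```python
import math


def householder_r(a):
    """Return the upper-triangular R factor of a (list of rows) via Householder reflections."""
    a = [row[:] for row in a]
    m, n = len(a), len(a[0])
    for k in range(min(m, n)):
        v = [a[i][k] for i in range(k, m)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            continue
        # Reflector v = x + sign(x0) * ||x|| * e1 avoids cancellation.
        v[0] += math.copysign(norm, v[0])
        vv = sum(x * x for x in v)
        for j in range(k, n):
            dot = sum(v[i] * a[k + i][j] for i in range(len(v)))
            factor = 2.0 * dot / vv
            for i in range(len(v)):
                a[k + i][j] -= factor * v[i]
    # Keep the leading rows and zero the rounding noise below the diagonal.
    return [[a[i][j] if j >= i else 0.0 for j in range(n)] for i in range(min(m, n))]


def allreduce_r(a, blocks):
    """R factor of a tall matrix: factor row blocks independently, then merge pairwise."""
    size = -(-len(a) // blocks)  # ceiling division
    rs = [householder_r(a[s:s + size]) for s in range(0, len(a), size)]
    while len(rs) > 1:
        merged = []
        for i in range(0, len(rs), 2):
            stacked = rs[i] + (rs[i + 1] if i + 1 < len(rs) else [])
            merged.append(householder_r(stacked))
        rs = merged
    return rs[0]


def gram(mat):
    """Return mat^T mat, used here only to compare R factors up to row signs."""
    n = len(mat[0])
    return [[sum(row[i] * row[j] for row in mat) for j in range(n)] for i in range(n)]


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0],
     [2.0, 1.0], [4.0, 3.0], [6.0, 5.0], [8.0, 7.0]]
R = allreduce_r(A, blocks=4)
```

Because R is unique only up to the signs of its rows, a convenient check is that R^T R equals A^T A to within rounding error, which `gram` makes easy to verify.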
To solve a non-stationary Stokes-Darcy model with the Beavers-Joseph interface condition, two non-iterative domain decomposition methods are proposed. At each time step, results from previous time steps are used to approximate the information on the interface and decouple the two physics. Both methods are parallel. Numerical results suggest that the first method has accuracy of order $O(h^3 + \Delta t)$. To improve accuracy and efficiency, the second method uses a three-step backward differentiation formula to achieve accuracy of order $O(h^3 + \Delta t^3)$, which is illustrated by a numerical example. (C) 2012 Elsevier Inc. All rights reserved.
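For orientation, the standard three-step backward differentiation formula (BDF3) for a generic evolution equation $u' = f(t, u)$ with uniform step $\Delta t$ is given below. The abstract does not state how the formula is applied to the decoupled Stokes-Darcy system, so the generic symbols $u$ and $f$ are assumptions used only to show where the third-order accuracy in time comes from.

```latex
\frac{1}{\Delta t}\left(\frac{11}{6}\,u^{n+1} - 3\,u^{n} + \frac{3}{2}\,u^{n-1} - \frac{1}{3}\,u^{n-2}\right)
  = f\!\left(t^{n+1}, u^{n+1}\right)
```

The left-hand side approximates $u'(t^{n+1})$ with a truncation error of order $\Delta t^3$, and the coefficients sum to zero, as any consistent difference formula for a derivative must.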
Feature selection is an indispensable preprocessing step for effective analysis of high-dimensional data. It removes irrelevant features, improves predictive accuracy and increases the comprehensibility of the model constructed by classifiers that are sensitive to features. Finding an optimal feature subset for a problem in a very large domain becomes intractable, and many such feature selection problems have been shown to be NP-hard. Optimization algorithms are frequently designed for NP-hard problems to find nearly optimal solutions with a practical time complexity. This paper formulates the text feature selection problem as a combinatorial problem and proposes an Ant Colony Optimization (ACO) algorithm to find a nearly optimal solution to it. It differs from the earlier algorithm by Aghdam et al. by including a heuristic function based on statistics and a local search. The algorithm aims at determining a solution that includes 'n' distinct features for each category. Optimization algorithms based on wrapper models show better results, but the processes involved in them are time-intensive. The availability of parallel architectures in the form of clusters of machines connected through fast Ethernet has increased interest in the parallelization of algorithms. The proposed ACO algorithm was parallelized and demonstrated on a cluster of up to six machines. Documents from the 20 Newsgroups benchmark dataset were used for experimentation. Features selected by the proposed algorithm were evaluated using a Naive Bayes classifier and compared with standard feature selection techniques. The performance of the classifier improved with the features selected by the enhanced ACO and local search. The classifier error decreased over iterations, and the number of positive features increased with the number of iterations. (C) 2011 Elsevier Ltd. All rights reserved.
In this paper, we propose a parallel algorithm for data classification and apply it to Magnetic Resonance Image (MRI) segmentation. The classification method studied is the well-known c-means method. A parallel architecture is introduced in the classification domain to reduce the complexity of the corresponding algorithms, so that they can serve as a pre-processing procedure. The proposed algorithm is designed to be implemented on a parallel machine, the reconfigurable mesh computer (RMC). An image of size m x n to be processed must be stored on an RMC of the same size, one pixel per processing element (PE). (C) 2011 Elsevier B.V. All rights reserved.
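As a point of reference, the classification step being parallelized can be sketched sequentially as below. The sketch assumes the hard (crisp) variant of c-means applied to scalar pixel intensities; the abstract does not say whether a fuzzy variant is used, and the function name `c_means` and its parameters are hypothetical. On an RMC, the per-pixel assignment would be carried out by each processing element for its own pixel rather than in a loop.

```python
def c_means(values, centers, iterations=10):
    """Hard c-means on scalar intensities: assign to nearest center, then recompute means."""
    centers = list(centers)

    def nearest(v):
        # Index of the center closest to intensity v.
        return min(range(len(centers)), key=lambda i: abs(v - centers[i]))

    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            clusters[nearest(v)].append(v)
        # An empty cluster keeps its previous center.
        updated = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    labels = [nearest(v) for v in values]
    return centers, labels


# Six pixel intensities forming a dark group and a bright group.
pixels = [10, 12, 11, 200, 205, 198]
centers, labels = c_means(pixels, centers=[0, 255])
```

The assignment of each pixel depends only on that pixel and the current centers, which is why a one-pixel-per-PE layout suits the method; the center update is the step that needs communication across the mesh.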
The letter presents a non-massive parallel procedure to compute the biorthogonal dual system used for signal reconstruction in the case of spline-type spaces with multiple generators. The algorithm is based on the properties of the projection operator and the invertibility of the Gramian in the case of a Riesz basis. We use a parallel approach in both time and frequency for the computation of the dual system obtained by translation and sampling of a finite number of atoms. Since spline-type spaces (also known as shift-invariant spaces) play a central role in many signal and image processing applications, fast computing methods are needed, especially in the multi-window case, where the computations are expensive in terms of both execution time and memory storage. We test the implementation on car crash data.
Current MPSoCs typically consist of fewer than a dozen processing units. Future MPSoCs are likely to integrate many more. With this trend, dozens of applications can run on an MPSoC concurrently, and application deadlock on MPSoCs will become a severe problem. To address the application deadlock problem in current and future MPSoCs, this article proposes a parallel multi-unit resource deadlock detection algorithm, incorporating four contributions: (1) a classification of resource events that enables each category of events to be handled efficiently, (2) a parallel node hopping mechanism that explores the entire graph exponentially in parallel to obtain information about the reachable processes of every resource, (3) an innovative hardware implementation of the node hopping mechanism using bit-wise computations, and (4) proofs of correctness and run-time complexity of the proposed algorithm. Based on information about reachable processes as well as sink nodes in the graph, the proposed algorithm detects deadlock in O(1) run-time. Compared with the worst-case run-time of any previous algorithm that employs a single scheme to handle all resource events, ours is considerably reduced to $O(\log_2(\min(m, n)))$ when implemented in hardware, where m and n are the number of processes and resources, respectively. (C) 2011 Elsevier Inc. All rights reserved.
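The idea of exploring a graph "exponentially" with bit-wise operations can be illustrated by reachability doubling: each round ORs together the reachable sets of already reachable nodes, so the covered path length doubles per round. The sketch below is a software simplification and not the hardware algorithm of the article. It treats the wait-for relation as a plain directed graph, which corresponds to the single-unit resource case where a cycle implies deadlock; the multi-unit handling and sink-node logic are not reproduced. The names `reachability` and `nodes_on_cycle` are hypothetical.

```python
def reachability(adj):
    """Reachable sets as bitmasks, computed in about log2(n) doubling rounds.

    adj[i] is an int bitmask whose bit j is set when there is an edge i -> j.
    """
    n = len(adj)
    reach = list(adj)
    hops = 1  # reach currently covers paths of length <= hops
    while hops < n:
        doubled = []
        for i in range(n):
            acc = mask = reach[i]
            j = 0
            while mask:
                if mask & 1:
                    acc |= reach[j]  # hop through node j in one bit-wise OR
                mask >>= 1
                j += 1
            doubled.append(acc)
        reach = doubled
        hops *= 2
    return reach


def nodes_on_cycle(adj):
    """Nodes that can reach themselves, i.e. nodes lying on some cycle."""
    reach = reachability(adj)
    return [i for i in range(len(adj)) if reach[i] >> i & 1]


# Nodes 0 and 2 are processes, 1 and 3 are resources: 0 -> 1 -> 2 -> 3 -> 0.
cyclic = [1 << 1, 1 << 2, 1 << 3, 1 << 0]
# A chain 0 -> 1 -> 2 with no cycle.
chain = [1 << 1, 1 << 2, 0]
deadlocked = nodes_on_cycle(cyclic)
free = nodes_on_cycle(chain)
```

In hardware the inner OR over all nodes can be evaluated for every row at once, which is what makes a logarithmic number of rounds attractive.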
Based on two-grid discretization, a new parallel finite element algorithm for the stationary Navier-Stokes equations is proposed and analyzed. This algorithm first solves the Navier-Stokes equations on a coarse grid, and then corrects the resulting residual on a fine grid by solving local Navier-Stokes equations in parallel with homogeneous boundary conditions. An existing sequential Navier-Stokes solver can be used for each subdomain problem, so the proposed parallel algorithm can be implemented on top of existing sequential software. Error bounds for the approximate solution are derived. Moreover, the efficiency of the algorithm is demonstrated by numerical simulations of the lid-driven cavity flow, the backward-facing step flow, and the flow past a circular cylinder. (C) 2011 Elsevier B.V. All rights reserved.
A fast parallel thinning algorithm is analyzed in this *** improved algorithm with two-step method is proposed to thin the contour of military *** make programming easy, the deleting array is given and the operation speed of the algorithm is *** algorithm is programmed with Visual C++ 6.0 and a good result is obtained. There is no distorted skeleton or excessive erosion, and connectedness is preserved.
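A two-step parallel thinning pass can be sketched as follows. The sketch assumes the classic Zhang-Suen scheme as the baseline; the abstract does not name the algorithm it analyzes or describe its improvement and deleting array, so none of those details are reproduced here. The function names are hypothetical, and the image is assumed to be a list of rows of 0/1 values with a one-pixel background border.

```python
def neighbours(img, r, c):
    """Eight neighbours P2..P9, clockwise starting from the pixel above."""
    return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
            img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]


def transitions(nb):
    """Number of 0 -> 1 transitions in the circular sequence P2, P3, ..., P9, P2."""
    ring = nb + nb[:1]
    return sum(1 for a, b in zip(ring, ring[1:]) if a == 0 and b == 1)


def thin(image):
    """Zhang-Suen style thinning: alternate two sub-iterations until nothing changes."""
    img = [row[:] for row in image]
    rows, cols = len(img), len(img[0])
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            marked = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    nb = neighbours(img, r, c)
                    p2, _, p4, _, p6, _, p8, _ = nb
                    if not 2 <= sum(nb) <= 6 or transitions(nb) != 1:
                        continue
                    if step == 0:
                        removable = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        removable = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if removable:
                        marked.append((r, c))
            # Delete only after the whole scan, so every test sees the same image.
            for r, c in marked:
                img[r][c] = 0
            changed = changed or bool(marked)
    return img


# A 3 x 7 solid bar surrounded by a one-pixel background border.
bar = [[0] * 9] + [[0] + [1] * 7 + [0] for _ in range(3)] + [[0] * 9]
skeleton = thin(bar)
```

Marking pixels first and deleting them after the scan is what makes each sub-iteration parallel: every pixel is tested against the same image state, so the tests could run simultaneously.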