In this paper, a parallel nonoverlapping, nonconformal domain decomposition method based on the combined-field integral equation is proposed for fast and accurate analysis of electrically large objects situated in a half space. The method flexibly decomposes the whole model into several easily solvable closed subdomains, and an explicit boundary condition is applied to enforce the continuity of electric currents across subdomain boundaries. The adaptive direction partitioning parallelization scheme for the oct-tree of the multilevel fast multipole algorithm is adopted to accelerate matrix-vector multiplications within subdomains as well as the coupling between them. Introducing a real image source to account for the far-interaction matrix greatly accelerates the process and reduces the memory requirement while maintaining precision. Numerical examples demonstrate that the proposed method can simulate realistic problems with a maximum dimension greater than 1000 wavelengths.
In railway traffic systems, whenever disturbances occur, it is important to reschedule trains effectively while optimizing the goals of various stakeholders. Algorithms can provide significant support to traffic controllers in train rescheduling if they are well integrated into the overall traffic management process. In the railway research literature, many algorithms have been proposed to tackle different versions of the train rescheduling problem. However, limited research has been performed to assess the capabilities and performance of alternative approaches with the purpose of identifying their main strengths and weaknesses. Evaluating train rescheduling algorithms enables practitioners and decision support systems to select a suitable algorithm based on the properties of the disturbance scenario in focus. It also guides researchers and algorithm designers in improving the algorithms. In this paper, we (1) propose an evaluation framework for train rescheduling algorithms, (2) present two train rescheduling algorithms, a heuristic and a MILP-based exact algorithm, and (3) conduct a proof-of-concept experiment to compare the two multi-objective algorithms using the proposed framework. The heuristic algorithm is found to be suitable for simpler disturbance scenarios, since it quickly produces decent solutions. For complex disturbances in which multiple trains experience a primary delay due to an infrastructure failure, the exact algorithm is found to be more appropriate.
ISBN: (print) 9783030281632; 9783030281625
The paper describes the development, research, and numerical implementation of interrelated mathematical models of hydrophysics and biological kinetics. These models were implemented on a supercomputer as a system for monitoring and controlling the quality of shallow waters and for predicting the dispersion of contaminants in the boundary layers of the atmosphere and water bodies. The software complex can be used as an efficient tool for monitoring the ecological situation in water bodies subjected to increasing anthropogenic pressure, climatic and industrial challenges, and emergencies of anthropogenic or natural origin. The software complex (the monitoring and control systems) comprises discrete analogs of water ecology models based on high-order accuracy schemes. We apply the modified alternating triangular method to solve the grid equations that arise from discretizing model problems in aquatic ecology. This method has the best convergence rate under the condition of asymptotic stability of difference schemes for parabolic equations, with efficiency improved on the basis of updated spectral estimates. The design of effective parallel algorithms for the numerical implementation of problems of hydrophysics and biological kinetics offers an opportunity to consider processes of sediment dispersion in "air-water" systems in real and accelerated time.
ISBN: (print) 9781538677322
In recent years, several evolutionary algorithms that run on GPUs using the Compute Unified Device Architecture (CUDA) have been proposed in order to reduce execution time. Those studies compared the execution time and precision of the GPU versions with those of the CPU versions. In this study, we parallelize a self-adaptive harmony search algorithm and compare it with the existing evolutionary algorithms on the same GPU platform. The proposed algorithm is divided into four steps: initialization, improvising, sorting, and updating. In the experiments, we use eight well-known optimization problems to evaluate the proposed algorithm and the other existing algorithms. The results show that our algorithm achieves the best performance among all the algorithms on single-objective optimization problems with larger dimensions or populations.
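The four steps named in the abstract can be illustrated with a small sequential sketch. This is a plain (not self-adaptive, not GPU-parallel) harmony search, so it is a simplified stand-in rather than the authors' algorithm. The function name `harmony_search`, the parameter names (`hmcr`, `par`, `bandwidth`) and their default values are assumptions chosen for illustration, as is the way the sorting and updating steps are interleaved.

```python
import random


def harmony_search(objective, dim, lower, upper, memory_size=20,
                   iterations=2000, hmcr=0.9, par=0.3, bandwidth=0.05, seed=0):
    """Minimize `objective` over the box [lower, upper]^dim (illustrative sketch)."""
    rng = random.Random(seed)

    # Step 1, initialization: fill the harmony memory with random solutions.
    memory = [[rng.uniform(lower, upper) for _ in range(dim)]
              for _ in range(memory_size)]
    # Step 3, sorting: keep the memory ordered, best harmony first.
    memory.sort(key=objective)

    for _ in range(iterations):
        # Step 2, improvising: build a new harmony one coordinate at a time.
        new = []
        for d in range(dim):
            if rng.random() < hmcr:
                # Reuse a value from memory, optionally with a small pitch adjustment.
                value = rng.choice(memory)[d]
                if rng.random() < par:
                    value += rng.uniform(-1.0, 1.0) * bandwidth * (upper - lower)
            else:
                # Otherwise draw a fresh random value.
                value = rng.uniform(lower, upper)
            new.append(min(upper, max(lower, value)))

        # Step 4, updating: replace the worst harmony if the new one is better,
        # then re-sort so the worst harmony is again in the last slot.
        if objective(new) < objective(memory[-1]):
            memory[-1] = new
            memory.sort(key=objective)

    return memory[0], objective(memory[0])


def sphere(x):
    """Simple single-objective test function with its minimum at the origin."""
    return sum(v * v for v in x)


best, best_value = harmony_search(sphere, dim=5, lower=-5.0, upper=5.0)
```

In a GPU version, the natural candidates for parallelization would be the per-coordinate improvisation and the objective evaluations, but that mapping is not specified in the abstract.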
ISBN: (print) 9781450361842
In 1988, Vazirani gave an NC algorithm for computing the number of perfect matchings in $K_{3,3}$-minor-free graphs by building on Kasteleyn's scheme for planar graphs, and stated that this "opens up the possibility of obtaining an NC algorithm for finding a perfect matching in $K_{3,3}$-free graphs." In this paper, we finally settle this 30-year-old open problem. Building on recent NC algorithms for planar and bounded-genus perfect matching by Anari and Vazirani and by Sankowski, we obtain NC algorithms for perfect matching in any minor-closed graph family that forbids a one-crossing graph. This result applies to several well-studied graph families, including the $K_{3,3}$-minor-free graphs and the $K_5$-minor-free graphs. Graphs in these families not only have unbounded genus, but can have genus as high as O(n). Our method applies as well to several other problems related to perfect matching. In particular, we obtain NC algorithms for the following problems in any family of graphs (or networks) with a one-crossing forbidden minor: (1) determining whether a given graph has a perfect matching and, if so, finding one; (2) finding a minimum weight perfect matching in the graph, assuming that the edge weights are polynomially bounded; (3) computing the number of perfect matchings in the graph; and (4) finding a maximum st-flow in the network, with arbitrary capacities. The main new idea enabling our results is the definition and use of matching-mimicking networks: small replacement networks that behave the same, with respect to matching problems involving a fixed set of terminals, as the larger network they replace.
ISBN: (print) 9781450361842
Edit distance is one of the most fundamental problems in combinatorial optimization. Ulam distance is a special case of edit distance where no character is allowed to appear more than once in a string. Recent developments have been very fruitful for obtaining fast and parallel algorithms for both edit distance and Ulam distance. In this work, we present an almost optimal MPC algorithm for Ulam distance and improve MPC algorithms for edit distance. Our algorithm for Ulam distance is optimal in the sense that (1) the approximation factor of our algorithm is $1 + \epsilon$, (2) the round complexity of our algorithm is constant, (3) the total memory of our algorithm is almost linear ($\tilde{O}(n)$), and (4) the overall running time of our algorithm is almost linear, which is the best known for Ulam distance. Just as edit distance and longest common subsequence (LCS) are considered dual problems, Ulam distance and longest increasing subsequence (LIS) are also seen as dual problems. LIS is equivalent to a special case of LCS where each string can contain each character at most once. In that sense, our result for Ulam distance complements the work of Im et al., wherein a similar result is presented for LIS. We also improve the work of Hajiaghayi et al. for edit distance in terms of total memory. The best previously known MPC algorithm for edit distance requires $\tilde{O}(n^{2x})$ machines when the memory of each machine is bounded by $\tilde{O}(n^{1-x})$. In this work, we improve the number of machines to a(n1-75x) while keeping the memory limit intact. Moreover, the round complexity of our algorithm is constant and the total running time of our algorithm is truly subquadratic. However, our improvement comes at the expense of a constant factor in the approximation guarantee of the algorithm. This improvement is inspired by the recent techniques of Boroujeni et al. and Chakraborty et al. for obtaining truly subquadratic time algorithms for edit distance.
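The LIS/LCS relationship stated in the abstract can be made concrete with a small sequential sketch: when neither string repeats a character, the LCS length equals the LIS length of the positions that the first string's characters occupy in the second. This is not the MPC algorithm from the paper. The function names are hypothetical, and the quadratic `lcs_dp` is included only as a cross-check.

```python
from bisect import bisect_left


def lis_length(seq):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []  # tails[k] = smallest tail of any increasing subsequence of length k + 1
    for value in seq:
        k = bisect_left(tails, value)
        if k == len(tails):
            tails.append(value)
        else:
            tails[k] = value
    return len(tails)


def lcs_repeat_free(a, b):
    """LCS length of two strings with no repeated characters, computed via LIS."""
    if len(set(a)) != len(a) or len(set(b)) != len(b):
        raise ValueError("both strings must be repeat-free")
    pos_in_b = {ch: i for i, ch in enumerate(b)}
    # Replace each shared character of `a` by its position in `b`;
    # a common subsequence is then exactly an increasing run of positions.
    positions = [pos_in_b[ch] for ch in a if ch in pos_in_b]
    return lis_length(positions)


def lcs_dp(a, b):
    """Textbook quadratic LCS, used here only to cross-check the reduction."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0]
        for j, ch_b in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


print(lcs_repeat_free("abcde", "badce"))  # positions [1, 0, 3, 2, 4] -> 3
```

The reduction runs in O(n log n) time, compared with O(n^2) for the general dynamic program.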
ISBN: (print) 9783030308599; 9783030308582
This article describes an approach to parallelizing data mining algorithms in a logic programming framework for distributed data processing on a cluster. As an example, the article describes an implementation of the Naive Bayes algorithm in Prolog, its conversion into a parallel form, and its execution on a cluster using MPI.
ISBN: (print) 9781728147253
A Bloom filter is a space-efficient bit array data structure that can test whether a query element is in a set (i.e., true) or not (i.e., false). It returns positive (i.e., the query element is in the set) or negative (i.e., the query element is not in the set). The Bloom filter may produce false positive errors, because it can return positive for a query element that is not in the set. This paper focuses on the pattern test using Bloom filters, in which a set P of strings (or patterns) is registered in bit arrays. For an input string T, the Bloom filter detects all substrings in T that match one of the patterns in P. The main contribution of this paper is to present a new Bloom filter, which we call the folded Bloom filter, and to implement it on the GPU for the pattern test. The folded Bloom filter is designed and implemented so that GPU memory access performance and parallelism are maximized. Our GPU implementation of the folded Bloom filter on an NVIDIA Tesla V100 GPU can perform the pattern test for a set P with 744M (= 744,261,120) patterns at 25.6 Gbps throughput with false positive probability $1.20 \times 10^{-18}$. As far as we know, no previous Bloom filter implementation attains such high throughput for a large set of patterns with such a small false positive probability.
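To make the pattern test concrete, here is a minimal CPU sketch of a standard Bloom filter (not the folded variant, whose layout the abstract does not describe) scanning a text with a sliding window. It assumes all patterns share one length. The class name, the SHA-256-based double hashing, and the default sizes are illustrative choices, not details from the paper.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over byte strings (illustrative, not the folded variant)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _indexes(self, item):
        # Double hashing: derive all bit indexes from two 64-bit halves of one digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for idx in self._indexes(item):
            self.bits[idx >> 3] |= 1 << (idx & 7)

    def __contains__(self, item):
        # Positive only if every probed bit is set; false positives are possible,
        # false negatives are not.
        return all(self.bits[idx >> 3] & (1 << (idx & 7)) for idx in self._indexes(item))


def pattern_test(text, patterns, num_bits=1 << 16, num_hashes=4):
    """Return start offsets in `text` whose window the filter reports as a pattern."""
    lengths = {len(p) for p in patterns}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes non-empty patterns of one common length")
    m = lengths.pop()
    bloom = BloomFilter(num_bits, num_hashes)
    for p in patterns:
        bloom.add(p)
    # Each window is tested independently, which is what makes the scan easy to parallelize.
    return [i for i in range(len(text) - m + 1) if text[i:i + m] in bloom]


hits = pattern_test(b"00abcd11wxyz", [b"abcd", b"wxyz"])
```

Every true occurrence is guaranteed to be reported; any additional offsets in `hits` would be false positives, whose rate depends on `num_bits`, `num_hashes`, and the number of registered patterns.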
ISBN: (print) 9781728159836
Because of the speed and data rates of time-resolved experiments at facilities such as synchrotron beamlines, automation is critical during these experiments. In 3D imaging experiments like microCT (μCT), this includes recognizing features of interest and "zooming in" spatially and temporally on those features, ideally without requiring advance information about which features are being imaged. Digital Volume Correlation (DVC) can achieve this by measuring the deformation field between images, but it has not been used during autonomous experiments because existing codes scale poorly. In this work, we propose a model for global DVC and a parallel algorithm for solving it on large-scale images, suitable for giving feedback for autonomous experiments at synchrotron-based microCT beamlines. In particular, we leverage recent advancements in entropy-regularized optimal transport to develop efficient, simple-to-implement, parallel algorithms that scale linearly (O(N)) in space and time, where N is the number of voxels, and scale well with an increasing number of processors. As a demonstration, we compute the deformation field for every voxel of a μCT volume with dimensions 2560 × 2560 × 2160. We discuss implementation details, drawbacks, and future directions.
Abstract: Using a conservative numerical method, a flow of viscous heat-conducting gas in the diffusor part of a spatial axisymmetric nozzle with a slanting exit section and partially overlapped critical section is si...