In this paper, we develop a least-squares support vector machine (LS-SVM) for solving a nonlinear fractional-order Volterra population model in a closed system. The fractional rational Legendre functions, which are orthogonal on a semi-infinite domain, are used as the kernel of the LS-SVM. The solution is learned by solving a nonlinear constrained optimization problem. To accelerate the learning process, we propose two different approaches based on the orthogonality of the kernels and a shared-memory task parallelization scheme for multi-core systems. Several experiments show that the proposed approaches provide accurate solutions for the fractional-order Volterra population model. © 2021 The Authors. Published by Elsevier BV on behalf of Faculty of Engineering, Alexandria University.
Computing an arrangement of segments with some geometrical and topological guarantees is a critical step in many geometry processing applications. In this paper, we propose a method to efficiently compute arrangements of segments using a strip-based data structure. Thanks to this new data structure, the arrangement computation algorithm can easily be parallelized, as the per-strip computations are independent. Another benefit of our approach is that it allows an out-of-core, streamed construction for large datasets while keeping a low memory footprint. We prove the correctness of our structure and provide a complete comparative evaluation against the state of the art, demonstrating the value of our construction for the computation of an exact arrangement.
Based on a fully overlapping domain decomposition approach, a parallel stabilized finite element variational multiscale method for the incompressible Navier-Stokes equations is proposed, where the stabilizations for both the velocity and the pressure are based on two local Gauss integrations at the element level. The basic idea of the method is to use a locally refined global mesh to compute a stabilized solution in the given subdomain of interest. The proposed method only requires the application of an existing sequential Navier-Stokes solver on the locally refined global mesh associated with each subdomain, and thus can reuse the existing sequential solver without substantial recoding. An error bound for the approximate solutions from the proposed method is estimated using a local a priori error estimate for the stabilized solution. Algorithmic parameter scalings of the method are also derived. Some numerical simulations are presented to demonstrate the effectiveness of the method. © 2020 IMACS. Published by Elsevier B.V. All rights reserved.
The standard lattice Boltzmann method, which employs certain regular lattices coupled with discrete velocities as the computational grid, is limited in its flexibility to simulate flows in irregular geometries. To simulate large-scale complex flows, we present a cell-centered finite volume lattice Boltzmann method for incompressible flows on three-dimensional (3D) unstructured grids and its corresponding parallel algorithm. The advective fluxes are calculated by the low-diffusion Roe scheme, and the gradients of the particle distribution functions are computed with a least squares method. The presented scheme is validated by three benchmark flows: (a) a 3D Poiseuille flow, (b) cubic cavity flows with Reynolds numbers Re = 100 and 400, and (c) flows past a sphere with Re = 50, 100, 150, 200, and 250. Some parallel performance results are presented to show the scalability of the method, which reveal that the proposed parallel algorithm has considerable scalability and that the parallel efficiency is higher than 87% on 3840 processor cores. It can be seen that the presented parallel solver has significant potential for the accurate simulation of flows in complex 3D geometries.
As distributed computing systems have become widely used in many research and industrial areas, efficiently allocating tasks to the available processors has become an important concern. Since the problem is proven to be NP-hard, heuristic-based optimization techniques have been proposed to solve the task allocation problem. In particular, current cloud-based systems have grown massively and require multiple features such as lower cost, higher reliability, and higher throughput; therefore, the problem has become more challenging and approximate methods have gained more importance. The migrating birds optimization (MBO) algorithm offers successful solutions, especially for quadratic assignment problems. Inspired by the movement of birds, it exhibits good results through its population-based approach. Since the algorithm needs to deal with many individuals in the population, and the neighbor solution generation phase takes substantial time for large problem instances, parallelism is needed to improve execution time and make the algorithm practical for large-scale problems. In this work, we propose a scalable parallel implementation of the MBO algorithm, PMBO, for the multi-objective task allocation problem. We redesigned the implementation of the MBO algorithm so that its computationally heavy independent tasks are executed concurrently in separate threads. We compare our implementation with three parallel island-based approaches. The experimental results demonstrate that our implementation exhibits substantial solution quality improvements for difficult problem instances as the computing resources, namely parallelism, increase. Our scalability analysis also shows that higher parallelism levels offer larger solution improvements for PMBO over the island-based parallel implementations on very hard problem instances.
This paper discusses efficient parallel algorithms for obtaining strong lower bounds and exact solutions for large instances of the quadratic assignment problem (QAP). Our parallel architecture comprises both multicore processors and compute unified device architecture (CUDA)-enabled NVIDIA graphics processing units (GPUs) on the Blue Waters Supercomputing Facility at the University of Illinois at Urbana-Champaign. We propose a novel parallelization of the Lagrangian dual ascent algorithm on the GPUs, which is used for solving a QAP formulation based on the level-2 reformulation linearization technique. The linear assignment subproblems in this procedure are solved using our accelerated Hungarian algorithm [Date K, Rakesh N (2016) GPU-accelerated Hungarian algorithms for the linear assignment problem. Parallel Computing 57:52-72.]. We embed this accelerated dual-ascent algorithm in a parallel branch-and-bound scheme and conduct extensive computational experiments on single and multiple GPUs, using problem instances with up to 42 facilities from the quadratic assignment problem library (QAPLIB). The experiments suggest that our GPU-based approach is scalable, and it can be used to obtain tight lower bounds on large QAP instances. Our accelerated branch-and-bound scheme is able to comfortably solve Nugent and Taillard instances (up to 30 facilities) from the QAPLIB, using a modest number of GPUs.
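For readers unfamiliar with the QAP named in this abstract, the following is a minimal sketch of the objective being minimized, assuming the standard flow-times-distance formulation. It is not the paper's dual-ascent or branch-and-bound method; the names `qap_cost` and `brute_force_qap` and the 3-facility data are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def qap_cost(flow, distance, assignment):
    """Cost of placing facility i at location assignment[i].

    Sums flow[i][j] * distance[assignment[i]][assignment[j]] over all
    ordered pairs (i, j), which is the usual QAP objective.
    """
    n = len(assignment)
    return sum(
        flow[i][j] * distance[assignment[i]][assignment[j]]
        for i in range(n)
        for j in range(n)
    )


def brute_force_qap(flow, distance):
    """Enumerate all n! assignments; only feasible for very small n."""
    n = len(flow)
    return min(
        (qap_cost(flow, distance, p), p) for p in permutations(range(n))
    )


# Hypothetical 3-facility instance (symmetric flows and distances).
flow = [[0, 5, 2], [5, 0, 3], [2, 3, 0]]
distance = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]
best_cost, best_assignment = brute_force_qap(flow, distance)
```

Exhaustive enumeration grows as n!, which is why instances with around 30 facilities call for strong lower bounds and branch-and-bound rather than a search of this kind.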
Modular exponentiation, an operation widely utilized in cryptographic protocols to transfer text and other forms of data, can also be applied to Internet-of-Things (IoT) devices with high security requirements. However, due to the high resource consumption of modular exponentiation, IoT devices can face the problem of insufficient resources. Fortunately, secure outsourcing offers a new solution for resource-constrained devices. In this article, we apply a parallel secure outsourcing scheme to make modular exponentiation feasible on IoT devices. We then decompose the modular exponentiation task and describe the scheme in more detail. In addition, based on this scheme, we design an extension scheme for RSA, providing enhanced security for IoT devices. Finally, an analysis of experimental results on 512 to 4096 bits of data indicates that the scheme is superior to previous schemes in scalability and time consumption.
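The operation at the center of this abstract can be sketched as follows, assuming the textbook right-to-left square-and-multiply algorithm. The abstract does not describe how its scheme decomposes the task, so `split_mod_exp` is a hypothetical illustration of one generic identity (splitting the exponent additively) and provides no security on its own; a real outsourcing scheme would also have to hide the base and exponent from the servers.

```python
def mod_exp(base: int, exponent: int, modulus: int) -> int:
    """Compute base**exponent mod modulus by square-and-multiply."""
    if modulus <= 0 or exponent < 0:
        raise ValueError("modulus must be positive, exponent non-negative")
    result = 1 % modulus          # handles modulus == 1
    base %= modulus
    while exponent:
        if exponent & 1:          # current low bit set: multiply it in
            result = (result * base) % modulus
        base = (base * base) % modulus   # square for the next bit
        exponent >>= 1
    return result


def split_mod_exp(base: int, exponent: int, modulus: int, split: int) -> int:
    """Hypothetical decomposition: exponent = split + (exponent - split).

    The two partial powers are independent, so they could be computed
    in parallel and then combined with one modular multiplication.
    """
    if not 0 <= split <= exponent:
        raise ValueError("split must lie between 0 and exponent")
    part_a = mod_exp(base, split, modulus)
    part_b = mod_exp(base, exponent - split, modulus)
    return (part_a * part_b) % modulus
```

Both functions agree with Python's built-in three-argument `pow`, which is the practical choice when no decomposition is needed.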
Prony's method is a standard tool exploited for solving many imaging and data analysis problems that result in parameter identification in sparse exponential sums $f(k) = \sum_{j=1}^{M} c_j e^{-2\pi i \langle t_j, k \rangle}$, $k \in \mathbb{Z}^d$, where the parameters $\{t_j\}_{j=1}^{M} \subset [0, 1)^d$ are pairwise different and $\{c_j\}_{j=1}^{M} \subset \mathbb{C} \setminus \{0\}$ are nonzero. The focus of our investigation is on a variant of Prony's method based on a multivariate matrix pencil approach. The method constructs matrices $S_1, \ldots, S_d$ from the sampling values, and their simultaneous diagonalization yields the parameters $\{t_j\}_{j=1}^{M}$. The parameters $\{c_j\}_{j=1}^{M}$ are computed as the solution of a linear least squares problem, where the matrix of the problem is determined by $\{t_j\}_{j=1}^{M}$. Since the method involves independent generation and manipulation of a certain number of matrices, there is an intrinsic capacity for parallelization of the whole computational process on several levels. Hence, we propose a parallel version of Prony's method in order to increase its efficiency. The tasks concerning the generation of matrices are divided among the thread blocks of the graphics processing unit (GPU) and the central processing unit (CPU), where the heavier load is put on the GPU. From the algorithmic point of view, the CPU is dedicated to the more complex tasks of computing the singular value decomposition, the eigendecomposition, and the solution of the least squares problem, while the GPU performs matrix-matrix multiplications and summations. With a careful choice of algorithms for solving the subtasks, the load between CPU and GPU is balanced. Besides the parallelization techniques, we are also concerned with some numerical issues, and we provide a detailed numerical analysis of the method in the case of noisy input data.
Finally, we performed a set of numerical tests which confirm superior efficiency of the parallel algorithm and consistency of the forward error with the results of numeric
ISBN: 9798400708435 (print)
DBSCAN is a well-known density-based clustering algorithm for discovering clusters of arbitrary shape. While conceptually simple in serial, the algorithm is challenging to parallelize efficiently on manycore GPU architectures. Common pitfalls, such as asynchronous range query calls, result in high thread execution divergence in many implementations. In this paper, we propose a new framework for GPU-accelerated DBSCAN, and describe two tree-based algorithms within that framework. Both algorithms fuse the search for neighbors with updating cluster information, but differ in their treatment of dense regions of the data. We show that the time taken to compute clusters is at most twice that of determining the neighbors. We compare the proposed algorithms with existing CPU and GPU implementations, and demonstrate their competitiveness and performance using a fast traversal structure (bounding volume hierarchy) for low-dimensional data. We also show that the memory usage can be reduced by processing object neighbors dynamically without storing them.
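To make the "conceptually simple in serial" remark concrete, here is a minimal sketch of textbook serial DBSCAN, assuming a brute-force range query and Euclidean distance. It is not the GPU framework or either tree-based algorithm from the paper; the function names and the sample points are hypothetical.

```python
from math import dist

NOISE = -1
UNVISITED = None


def region_query(points, index, eps):
    """Indices of all points within eps of points[index] (brute force)."""
    return [j for j, q in enumerate(points) if dist(points[index], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1                   # i is a core point: start a cluster
        labels[i] = cluster
        frontier = list(neighbors)
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point, not expanded further
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                frontier.extend(j_neighbors)   # j is core: keep growing
    return labels


# Hypothetical data: two tight groups and one isolated point.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
sample_labels = dbscan(sample, eps=1.5, min_pts=3)
```

The cluster-growing loop issues a range query for each newly reached point, and that data-dependent, sequential expansion is what makes a direct GPU port prone to the thread divergence the abstract mentions.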
In this paper, a novel numerical algorithm is developed for efficient modeling of three-dimensional shape transformation governed by the modified Allen-Cahn (A-C) equation, which is of considerable significance for computer science and graphics technology. The proposed method rests on three ideas. Firstly, the operator splitting method is used to decompose the three-dimensional problem into a series of one-dimensional subproblems that can be solved in parallel in the same direction. Secondly, a temporal p-adaptive strategy based on the extrapolation technique is proposed to improve the convergence order in time while preserving computational efficiency. Finally, a parallel least distance modification technique is developed to enforce the discrete maximum bound principle. The proposed method achieves high precision and high efficiency at the same time. Numerical examples demonstrate the effectiveness of the p-adaptive method and the bound-preserving least distance modification, and include a series of complex three-dimensional shape transformation simulations.