Search Results - Inner Mongolia University Library

Parallel LS-SVM for the numerical simulation of fractional Volterra's population model

ALEXANDRIA ENGINEERING JOURNAL, 2021, Vol. 60, No. 6, pp. 5637-5647

Authors: Parand, K.; Aghaei, A. A.; Jani, M.; Ghodsi, A. Affiliations: Shahid Beheshti Univ Fac Math Sci Dept Comp Sci Gc Tehran Iran; Shahid Beheshti Univ Inst Cognit & Brain Sci Dept Cognit Modeling Gc Tehran Iran; Univ Waterloo Dept Stat & Actuarial Sci Waterloo ON Canada; Univ Waterloo David R Cheriton Sch Comp Sci Waterloo ON Canada

In this paper, we develop a least-squares support vector machine (LS-SVM) for solving a nonlinear fractional-order Volterra's population model in a closed system. The fractional rational Legendre functions with an orthogonal property on a semi-infinite domain have been used as the kernel of LS-SVM. Learning the solution is done by solving a non-linear constrained optimization problem. To accelerate the learning process, we propose two different approaches based on the orthogonality of kernels and a shared-memory task parallelization scheme for multi-core systems. By carrying out several experiments, it is seen that the proposed approaches provide accurate solutions for fractional-order Volterra's population model. (C) 2021 THE AUTHORS. Published by Elsevier BV on behalf of Faculty of Engineering, Alexandria University.

Keywords: Least squares support vector machine; Volterra's population model; Fractional derivative; Collocation; LS-SVR; Parallel algorithm


Stripped halfedge data structure for parallel computation of arrangements of segments

VISUAL COMPUTER, 2021, Vol. 37, No. 9-11, pp. 2461-2472

Authors: Damiand, Guillaume; Coeurjolly, David; Bourquat, Pierre. Affiliation: Univ Lyon UCBL LIRIS INSA Lyon CNRS UMR5205 F-69622 Lyon France

Computing an arrangement of segments with some geometrical and topological guarantees is a critical step in many geometry processing applications. In this paper, we propose a method to efficiently compute arrangements of segments using a strip-based data structure. Thanks to this new data structure, the arrangement computation algorithm can easily be parallelized as the per strip computations are independent. Another interest of our approach is that we can propose an out-of-core and streamed construction for large datasets, while keeping a low memory footprint. We prove the correctness of our structure and provide a complete comparative evaluation with respect to state-of-the-art demonstrating the interest of our construction for the computation of an exact arrangement.

Keywords: Arrangement of segments; Parallel algorithm; Out-of-core construction; Halfedge data structure


A parallel stabilized finite element variational multiscale method based on fully overlapping domain decomposition for the incompressible Navier-Stokes equations

APPLIED NUMERICAL MATHEMATICS, 2021, Vol. 159, pp. 138-158

Authors: Zheng, Bo; Shang, Yueqiang. Affiliation: Southwest Univ Sch Math & Stat Chongqing 400715 Peoples R China

Based on a fully overlapping domain decomposition approach, a parallel stabilized finite element variational multiscale method for the incompressible Navier-Stokes equations is proposed, where the stabilizations both for the velocity and pressure are based on two local Gauss integrations at the element level. The basic idea of the method is to use a locally refined global mesh to compute a stabilized solution in the given subdomain of interest. The proposed method only requires the application of an existing Navier-Stokes sequential solver on the locally refined global mesh associated with each subdomain, and thus can reuse the existing sequential solver without substantial recoding. An error bound for the approximate solutions from the proposed method is estimated with the use of a local a priori error estimate for the stabilized solution. Algorithmic parameter scalings of the method are also derived. Some numerical simulations are presented to demonstrate the effectiveness of the method. (C) 2020 IMACS. Published by Elsevier B.V. All rights reserved.

Keywords: Incompressible Navier-Stokes equations; Finite element; Stabilized method; Variational multiscale method; Parallel algorithm; Domain decomposition


A scalable parallel unstructured finite volume lattice Boltzmann method for three-dimensional incompressible flow simulations

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, 2021, Vol. 93, No. 8, pp. 2744-2762

Authors: Xu, Lei; Li, Jingzhi; Chen, Rongliang. Affiliations: Chinese Acad Sci Shenzhen Inst Adv Technol Shenzhen 518055 Peoples R China; Shenzhen Key Lab Exascale Engn & Sci Comp Shenzhen Peoples R China; Southern Univ Sci & Technol Dept Math Shenzhen 518055 Peoples R China

The standard lattice Boltzmann method, which employs certain regular lattices coupled with discrete velocities as the computational grid, is limited in its flexibility to simulate flows in irregular geometries. To simulate large-scale complex flows, we present a cell-centered finite volume lattice Boltzmann method for incompressible flows on three-dimensional (3D) unstructured grids and its corresponding parallel algorithm. The advective fluxes are calculated by the low-diffusion Roe scheme, and the gradients of the particle distribution functions are computed with a least squares method. The presented scheme is validated by three benchmark flows: (a) a 3D Poiseuille flow, (b) cubic cavity flows with Reynolds numbers Re = 100 and 400, and (c) flows past a sphere with Re = 50, 100, 150, 200, and 250. Some parallel performance results are presented to show the scalability of the method, which reveal that the proposed parallel algorithm has considerable scalability and that the parallel efficiency is higher than 87% on 3840 processor cores. It can be seen that the presented parallel solver has significant potential for the accurate simulation of flows in complex 3D geometries.

Keywords: discrete velocity; lattice Boltzmann equation; finite volume method; incompressible flows; parallel algorithm; unstructured grids


Scalable parallel implementation of migrating birds optimization for the multi-objective task allocation problem

JOURNAL OF SUPERCOMPUTING, 2021, Vol. 77, No. 3, pp. 2689-2712

Authors: Oz, Dindar; Oz, Isil. Affiliations: Yasar Univ Software Engn Dept Izmir Turkey; Izmir Inst Technol Comp Engn Dept Izmir Turkey

As distributed computing systems have been widely used in many research and industrial areas, the problem of efficiently allocating tasks to the available processors in the system has become an important concern. Since the problem is proven to be NP-hard, heuristic-based optimization techniques have been proposed to solve the task allocation problem. In particular, current cloud-based systems have grown massively, requiring multiple features such as lower cost, higher reliability, and higher throughput; therefore, the problem has become more challenging and approximate methods have gained more importance. The migrating birds optimization (MBO) algorithm offers successful solutions, especially for quadratic assignment problems. Inspired by the movement of birds, it exhibits good results through its population-based approach. Since the algorithm needs to deal with many individuals in the population, and the neighbor solution generation phase takes substantial time for large problem instances, we need parallelism to improve execution time and make the algorithm practical for large-scale problems. In this work, we propose a scalable parallel implementation of the MBO algorithm, PMBO, for the multi-objective task allocation problem. We redesigned the implementation of the MBO algorithm so that its computationally heavy independent tasks are executed concurrently in separate threads. We compare our implementation with three parallel island-based approaches. The experimental results demonstrate that our implementation exhibits substantial solution quality improvements for difficult problem instances as the computing resources, namely parallelism, increase. Our scalability analysis also shows that higher parallelism levels offer larger solution improvement for the PMBO over the island-based parallel implementations on very hard problem instances.

Keywords: Parallel algorithm; Combinatorial optimization; Task allocation problem; Migrating birds optimization


Level 2 Reformulation Linearization Technique-Based Parallel Algorithms for Solving Large Quadratic Assignment Problems on Graphics Processing Unit Clusters

INFORMS JOURNAL ON COMPUTING, 2019, Vol. 31, No. 4, pp. 771-789

Authors: Date, Ketan; Nagi, Rakesh. Affiliation: Univ Illinois Dept Ind & Enterprise Syst Engn Urbana IL 61801 USA

This paper discusses efficient parallel algorithms for obtaining strong lower bounds and exact solutions for large instances of the quadratic assignment problem (QAP). Our parallel architecture is comprised of both multicore processors and compute unified device architecture-enabled NVIDIA graphics processing units (GPUs) on the Blue Waters Supercomputing Facility at the University of Illinois at Urbana-Champaign. We propose novel parallelization of the Lagrangian dual ascent algorithm on the GPUs, which is used for solving a QAP formulation based on the level-2 reformulation linearization technique. The linear assignment subproblems in this procedure are solved using our accelerated Hungarian algorithm [Date K, Rakesh N (2016) GPU-accelerated Hungarian algorithms for the linear assignment problem. Parallel Computing 57:52-72.]. We embed this accelerated dual-ascent algorithm in a parallel branch-and-bound scheme and conduct extensive computational experiments on single and multiple GPUs, using problem instances with up to 42 facilities from the quadratic assignment problem library (QAPLIB). The experiments suggest that our GPU-based approach is scalable, and it can be used to obtain tight lower bounds on large QAP instances. Our accelerated branch-and-bound scheme is able to comfortably solve Nugent and Taillard instances (up to 30 facilities) from the QAPLIB, using a modest number of GPUs.

Keywords: quadratic assignment problem; linear assignment problem; branch-and-bound; parallel algorithm; graphics processing unit; CUDA; RLT2
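
Illustrative note: for readers unfamiliar with the problem named in this record, the following is a minimal sketch of the quadratic assignment objective in its common flow-times-distance form, with an exhaustive search that is feasible only for tiny instances. It is not the RLT2 dual-ascent or branch-and-bound method of the paper. The function names `qap_cost` and `brute_force_qap` and the 3-facility data are assumptions made for illustration.

```python
from itertools import permutations


def qap_cost(flow, distance, assignment):
    """Cost of placing facility i at location assignment[i].

    Sums flow[i][j] * distance[assignment[i]][assignment[j]] over all pairs.
    """
    n = len(assignment)
    return sum(
        flow[i][j] * distance[assignment[i]][assignment[j]]
        for i in range(n)
        for j in range(n)
    )


def brute_force_qap(flow, distance):
    """Try every permutation; O(n!) work, so only usable for very small n."""
    n = len(flow)
    best = min(permutations(range(n)), key=lambda p: qap_cost(flow, distance, p))
    return list(best), qap_cost(flow, distance, best)


# Hypothetical instance: three facilities, three locations on a line.
flow = [[0, 3, 0], [3, 0, 1], [0, 1, 0]]
distance = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
best_assignment, best_cost = brute_force_qap(flow, distance)
```

The factorial growth of the search space is the reason exact methods rely on strong lower bounds and branch-and-bound rather than enumeration.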


Efficient Parallel Secure Outsourcing of Modular Exponentiation to Cloud for IoT Applications

IEEE INTERNET OF THINGS JOURNAL, 2021, Vol. 8, No. 16, pp. 12782-12791

Authors: Hu, Qilin; Duan, Mingxing; Yang, Zhibang; Yu, Siyang; Xiao, Bin. Affiliations: Hunan Univ Coll Comp Sci & Elect Engn Changsha 410082 Peoples R China; Changsha Univ Hunan Prov Key Lab Ind Internet Technol & Secur Changsha 410022 Peoples R China; Hunan Univ Finance & Econ Coll Informat Technol & Management Changsha 410000 Peoples R China; Hong Kong Polytech Univ Dept Comp Hong Kong Peoples R China

Modular exponentiation, an operation widely utilized in cryptographic protocols to transfer text and other forms of data, can also be applied to Internet-of-Things (IoT) devices with high security requirements. However, due to the high resource consumption of modular exponentiation, IoT devices can face the problem of insufficient resources. Fortunately, the secure outsourcing scheme offers a new solution for resource-constrained devices. In this article, we apply a parallel secure outsourcing scheme to provide the possibility for the modular exponentiation operation, which is used in IoT devices. After that, the task of modular exponentiation is decomposed and we introduce the scheme in more detail. In addition, based on this scheme, we designed an extension scheme for RSA, providing enhanced security for IoT devices. Finally, the analysis of experimental results based on 512-4096 b of data indicates the superiority in scalability and time consumption over the previous schemes.

Keywords: Outsourcing; Servers; Cloud computing; Task analysis; Cryptography; Internet of Things (IoT); modular exponentiation; parallel algorithm; secure outsourcing
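
Illustrative note: the operation discussed in this record can be sketched with the textbook square-and-multiply algorithm. The second function shows, under a deliberately simplified assumption, how one exponentiation can be decomposed into two independent ones by splitting the exponent additively. This split is not the paper's outsourcing scheme and provides no security by itself; the names `mod_exp` and `split_mod_exp` are hypothetical.

```python
import random


def mod_exp(base: int, exponent: int, modulus: int) -> int:
    """Right-to-left binary (square-and-multiply) modular exponentiation."""
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    result = 1 % modulus  # equals 0 when modulus == 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:  # current low bit set: multiply the result in
            result = (result * base) % modulus
        base = (base * base) % modulus  # square for the next bit
        exponent >>= 1
    return result


def split_mod_exp(base: int, exponent: int, modulus: int, rng=None) -> int:
    """Compute base**exponent mod modulus from two independent sub-tasks.

    Uses base**(a + b) = base**a * base**b (mod modulus). The two calls to
    mod_exp do not depend on each other, so they could run in parallel.
    """
    rng = rng or random.Random()
    share = rng.randint(0, exponent)  # random additive share of the exponent
    part_a = mod_exp(base, share, modulus)
    part_b = mod_exp(base, exponent - share, modulus)
    return (part_a * part_b) % modulus
```

Python's built-in three-argument `pow(base, exponent, modulus)` computes the same value and is the practical choice; the explicit loop is shown only to make the cost per exponent bit visible.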


PARALLEL PRONY'S METHOD WITH MULTIVARIATE MATRIX PENCIL APPROACH AND ITS NUMERICAL ASPECTS

SIAM JOURNAL ON MATRIX ANALYSIS AND APPLICATIONS, 2021, Vol. 42, No. 2, pp. 635-658

Author: Bosner, Nela. Affiliation: Univ Zagreb Fac Sci Dept Math Zagreb 10000 Croatia

Prony's method is a standard tool exploited for solving many imaging and data analysis problems that result in parameter identification in sparse exponential sums $f(k) = \sum_{j=1}^{M} c_j e^{-2\pi i \langle t_j, k \rangle}$, $k \in \mathbb{Z}^d$, where the parameters $\{t_j\}_{j=1}^{M} \subset [0, 1)^d$ are pairwise different and $\{c_j\}_{j=1}^{M} \subset \mathbb{C} \setminus \{0\}$ are nonzero. The focus of our investigation is on a Prony's method variant based on a multivariate matrix pencil approach. The method constructs matrices $S_1, \ldots, S_d$ from the sampling values, and their simultaneous diagonalization yields the parameters $\{t_j\}_{j=1}^{M}$. The parameters $\{c_j\}_{j=1}^{M}$ are computed as the solution of a linear least squares problem, where the matrix of the problem is determined by $\{t_j\}_{j=1}^{M}$. Since the method involves independent generation and manipulation of a certain number of matrices, there is an intrinsic capacity for parallelization of the whole computational process on several levels. Hence, we propose a parallel version of the Prony's method in order to increase its efficiency. The tasks concerning the generation of matrices are divided among the block of threads of the graphics processing unit (GPU) and the central processing unit (CPU), where heavier load is put on the GPU. From the algorithmic point of view, the CPU is dedicated to the more complex tasks of computing the singular value decomposition, the eigendecomposition, and the solution of the least squares problem, while the GPU is performing matrix-matrix multiplications and summations. With careful choice of algorithms solving the subtasks, the load between CPU and GPU is balanced. Besides the parallelization techniques, we are also concerned with some numerical issues, and we provide detailed numerical analysis of the method in case of noisy input data. Finally, we performed a set of numerical tests which confirm superior efficiency of the parallel algorithm and consistency of the forward error with the results of numeric

Keywords: Prony's method; parallel algorithm; efficient GPU-CPU implementation; numerical analysis


Fast tree-based algorithms for DBSCAN for low-dimensional data on GPUs 23


52nd International Conference on Parallel Processing (ICPP)

Authors: Prokopenko, Andrey; Lebrun-Grandie, Damien; Arndt, Daniel. Affiliation: Oak Ridge Natl Lab Oak Ridge TN 37830 USA

ISBN: (print) 9798400708435

DBSCAN is a well-known density-based clustering algorithm to discover arbitrary shape clusters. While conceptually simple in serial, the algorithm is challenging to efficiently parallelize on manycore GPU architectures. Common pitfalls, such as asynchronous range query calls, result in high thread execution divergence in many implementations. In this paper, we propose a new framework for GPU-accelerated DBSCAN, and describe two tree-based algorithms within that framework. Both algorithms fuse the search for neighbors with updating cluster information, but differ in their treatment of dense regions of the data. We show that the time taken to compute clusters is at most twice that of determination of the neighbors. We compare the proposed algorithms with existing CPU and GPU implementations, and demonstrate their competitiveness and performance using a fast traversal structure (bounding volume hierarchy) for low dimensional data. We also show that the memory usage can be reduced by processing object neighbors dynamically without storing them.

Keywords: DBSCAN; bounding volume hierarchy; parallel algorithm; GPU
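
Illustrative note: the abstract above calls DBSCAN conceptually simple in serial. A minimal serial sketch, assuming brute-force range queries and Euclidean distance, is given here for orientation. It is the textbook algorithm, not the tree-based GPU algorithms of the paper; the names `dbscan` and `NOISE` and the sample points are assumptions.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Brute-force range query, O(n) per call; the result includes i itself.
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        labels[i] = cluster  # i is a core point and starts a new cluster
        stack = [j for j in seeds if j != i]
        while stack:
            j = stack.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is also a core point: expand
                stack.extend(reachable)
        cluster += 1
    return labels


# Hypothetical data: two tight groups and one isolated point.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
sample_labels = dbscan(sample, eps=1.5, min_pts=3)
```

Each range query here scans every point, which gives quadratic total work; a spatial index such as the bounding volume hierarchy mentioned in the abstract is what replaces the scan in practice.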


An efficient maximum bound principle preserving p-adaptive operator-splitting method for three-dimensional phase field shape transformation model

COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2022, Vol. 120, pp. 78-91

Authors: Wang, Yan; Xiao, Xufeng; Feng, Xinlong. Affiliation: Xinjiang Univ Coll Math & Syst Sci Urumqi 830046 Peoples R China

In this paper, a novel numerical algorithm for efficient modeling of three-dimensional shape transformation governed by the modified Allen-Cahn (A-C) equation is developed, which has important significance for computer science and graphics technology. The new idea of the proposed method is as follows. Firstly, the operator splitting method is used to decompose the three-dimensional problem into a series of one-dimensional subproblems that can be solved in parallel in the same direction. Secondly, a temporal p-adaptive strategy, which is based on the extrapolation technique, is proposed to improve the convergence order in time and preserve the computational efficiency simultaneously. Finally, a parallel least distance modification technique is developed to enforce the discrete maximum bound principle. The proposed method achieves high precision and high efficiency at the same time. Numerical examples demonstrate the effectiveness of the p-adaptive method and the bound preserving least distance modification, and include a series of complex three-dimensional shape transformation simulations.

Keywords: Shape transformation; Operator splitting method; p-adaptive algorithm; Maximum bound principle; Parallel algorithm
