Search results - Inner Mongolia University Library

Scalability analysis of a two level domain decomposition approach in space and time solving data assimilation models

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2024, vol. 36, no. 10, p. e7937

Authors: Cacciapuoti, Rosalba; D'Amore, Luisa. Univ Naples Federico II Dept Math & Applicat Renato Caccioppoli Complesso Univ Monte St Angelo Via Cintia Naples Italy

We are concerned with the mapping onto high performance hybrid architectures of parallel software implementing a two level overlapping domain decomposition, that is, along the space and time directions, of the four dimensional variational data assimilation model. The reference architecture belongs to the SCoPE (Sistema Cooperativo Per Elaborazioni scientifiche multidisciplinari) data center, located at the University of Naples Federico II. We consider the initial boundary problem of the shallow water equation and analyse both strong and weak scaling. Keeping the efficiency always greater than 60% and about 90% in most cases, we experimentally find that the isoefficiency function grows a little more than linearly with respect to the number of processes. Results, obtained by using the Parallel Computing Toolbox of MATLAB R2013a, are in agreement with the algorithm's performance prediction based on the scale up factor, confirming the appropriate mapping of the algorithm onto the hybrid architecture.

Keywords: data assimilation; parallel algorithm; software mapping; space-time decomposition; variational methods

Optimal group gossiping in hypercubes under a circuit-switching model

SIAM JOURNAL ON COMPUTING, 1996, vol. 25, no. 5, pp. 1045-1060

Authors: Fujita, S; Yamashita, M. HIROSHIMA UNIV FAC ENGN DEPT ELECT ENGN KAGAMIYAMA 1-4-1 HIGASHIHIROSHIMA 739 JAPAN

Let U be a given set of nodes of a parallel computer system and assume that each node u in U has a piece of information t(u) called a token. This paper discusses the problem of each node u ∈ U broadcasting its token t(u) to all nodes in U. We refer to this problem as the group-gossiping problem, which includes the (conventional) gossiping problem as a special case. We consider the group-gossiping problem in n-cubes under a circuit-switching model and propose an optimal group-gossiping algorithm for n-cubes under that model.

Keywords: parallel algorithm; gossiping; circuit-switching model; optimal-time bound; n-cubes

Research on the Construction of Financial Computing Model Based on BSDE algorithm

JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT, 2023, vol. 22, no. 4

Author: Cai, Youli. Chongqing Creat Vocat Coll Chongqing Dept Econ & Management Chongqing 402160 Peoples R China

Economic and social development has made financial engineering an increasingly important research area, and more and more financial problems cannot be solved directly by analytical formulas. In view of this, algorithms that apply computer technology to financial engineering have emerged. In this study, the Backward Stochastic Differential Equation (BSDE) algorithm is used to investigate and analyse the problem of option pricing calculation in finance. A GBSDE-Theta parallel algorithm, which combines the BSDE-Theta algorithm with a GPU algorithm, is used to establish a computing model in the financial engineering field that applies to the calculation of enterprise option pricing. The research results show that, compared with the basic algorithm, the option pricing data obtained with the GBSDE-Theta parallel algorithm match the actual option values more closely. The computational model achieves a speedup of about 230 times over the serial version with the number of time steps N=128 and the number of simulated paths 80,000. As for the relative error of the GBSDE-Theta algorithm, 80 points are within 3% and only 16 points are over 3.00%, which is a relatively small error. These results show that the financial computing system obtained in this study is highly feasible and effective, and can provide a new research idea for the development of other computations in the financial field.

Keywords: BSDE algorithm; parallel algorithm; option pricing; financial computing

CONFORMING IDENTIFICATION OF THE FUNDAMENTAL MATRIX IN THE IMAGE MATCHING PROBLEM

COMPUTER OPTICS, 2017, vol. 41, no. 4, pp. 559-563

Authors: Fursov, V. A.; Gavrilov, A. V.; Goshin, Ye. V.; Pugachev, K. G. Samara Natl Res Univ Supercomp & Gen Informat Subdept Samara Russia; Samara Natl Res Univ Samara Russia; RAS Image Proc Syst Inst Branch FSRC Crystallog & Photon Samara Russia

The article considers the conforming identification of the fundamental matrix in the image matching problem. The method consists in dividing the initial overdetermined system into lower-dimensional subsystems. On these subsystems, a set of solutions is obtained, from which a subset of the most conforming solutions is selected. The resulting solution is then deduced from this subset. Since the subsystems are formed by all possible combinations of rows in the initial system, the method demonstrates high accuracy and stability, although it is computationally complex. A comparison with the methods of least squares, least absolute deviations, and the RANSAC method is drawn.

Keywords: conforming identification; parallel algorithm; least squares method; least absolute deviations; epipolar geometry; projective geometry

Faster algorithms for RNA-folding using the Four-Russians method

ALGORITHMS FOR MOLECULAR BIOLOGY, 2014, vol. 9, no. 1, p. 5

Authors: Venkatachalam, Balaji; Gusfield, Dan; Frid, Yelena. Univ Calif Davis Dept Comp Sci Davis CA 95616 USA

Background: The secondary structure that maximizes the number of non-crossing matchings between complementary bases of an RNA sequence of length n can be computed in O(n^3) time using Nussinov's dynamic programming algorithm. The Four-Russians method is a technique that reduces the running time for certain dynamic programming algorithms by a multiplicative factor after a preprocessing step where solutions to all smaller subproblems of a fixed size are exhaustively enumerated and solved. Frid and Gusfield designed an O(n^3 / log n) algorithm for RNA folding using the Four-Russians technique. In their algorithm the preprocessing is interleaved with the algorithm computation. Theoretical results: We simplify the algorithm and the analysis by doing the preprocessing once prior to the algorithm computation. We call this the two-vector method. We also show variants where instead of exhaustive preprocessing, we only solve the subproblems encountered in the main algorithm once and memoize the results. We give a simple proof of correctness and explore the practical advantages over the earlier method. The Nussinov algorithm admits an O(n^2) time parallel algorithm. We show a parallel algorithm using the two-vector idea that improves the time bound to O(n^2 / log n). Practical results: We have implemented the parallel algorithm on graphics processing units using the CUDA platform. We discuss the organization of the data structures to exploit coalesced memory access for fast running times. The ideas to organize the data structures also help in improving the running time of the serial algorithms. For sequences of length up to 6000 bases the parallel algorithm takes only about 2.5 seconds and the two-vector serial method takes about 57 seconds on a desktop and 15 seconds on a server. Among the serial algorithms, the two-vector and memoized versions are faster than the Frid-Gusfield algorithm by a factor of 3, and are faster than Nussinov by up to a factor of 20. The source-code f

Keywords: RNA-folding; Four-Russians; CUDA; algorithms; parallel algorithm; GPU
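
For orientation, the baseline O(n^3) Nussinov recurrence that the abstract starts from can be sketched as follows. This is a minimal illustration only, not the two-vector or Four-Russians method of the paper. It assumes Watson-Crick pairs only (A-U, G-C) and no minimum hairpin-loop length; the function name `nussinov_pairs` is hypothetical.

```python
def nussinov_pairs(seq: str) -> int:
    """Maximum number of non-crossing complementary base pairs in seq.

    Assumptions (not taken from the abstract): only A-U and G-C pairs
    count, and adjacent bases are allowed to pair.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the substring seq[i..j], inclusive.
    best = [[0] * n for _ in range(n)]
    for span in range(1, n):              # substring length minus one
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]        # case 1: base j stays unpaired
            for k in range(i, j):         # case 2: base j pairs with base k
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    print(nussinov_pairs("GGGAAACCC"))
```

The three nested loops over `span`, `i`, and `k` give the cubic running time; the inner loop over `k` is the part that Four-Russians style preprocessing is meant to speed up.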

Efficient selection algorithms on distributed memory computers 98

Proceedings of the 1998 ACM/IEEE conference on Supercomputing

Authors: E. L. G. Saukas; S. W. Song. University of São Paulo, São Paulo SP 05508-900, Brazil

ISBN: (print) 9780897919845

Consider the selection problem of determining the k-th smallest element of a sequence of n elements. Under the CGM (Coarse Grained Multicomputer) model with p processors and O(n/p) local memory, we present a deterministic parallel algorithm for the selection problem that requires O(log p) communication rounds. Besides requiring a low number of communication rounds, the algorithm also attempts to minimize the total amount of data transmitted in each round (only O(p) except in the last round). The basic algorithm is then extended to solve the problem of q simultaneous selections using the same input sequence, also in O(log p) communication rounds and asymptotically the same local computing time (if q = O(p)). The simultaneous selection algorithm gives rise to a communication efficient sorting algorithm, with O(log p) communication rounds and a total of O(p^2) data transmitted in each round except in the last one. In addition to showing theoretical complexities, we present very promising experimental results obtained on two parallel machines that show almost linear speedup, indicating the efficiency and scalability of the proposed algorithms. To our knowledge, this is the best deterministic CGM algorithm in the literature for the selection problem.

Keywords: parallel algorithm; sorting; coarse grained multicomputer; selection problem; simultaneous selection

Graphics processing unit-accelerated high-quality watercolor painting image generation

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2023, vol. 35, no. 19

Authors: Huang, Jiamian; Ito, Yasuaki; Nakano, Koji. Hiroshima Univ Grad Sch Adv Sci & Engn Kagamiyama 1-4-1 Higashihiroshima Hiroshima Japan

Stroke-based rendering is a rendering method that mimics the actual painting technique by drawing stroke by stroke on a blank canvas image. In this paper, we propose a watercolor image generation method using stroke-based rendering. The proposed method generates an image that is a good approximation of the input image as well as having the characteristics of a watercolor painting by repeatedly painting strokes while referring to the input image. To generate a high-quality image, that is, an image that closely resembles an actual watercolor painting, various techniques are employed: modeling of watercolor paper, detailed physical simulation of the movement of water and pigment, strokes using a brush model, among others. The proposed method generates a large number of strokes and performs computationally intensive watercolor simulations for each stroke. Therefore, this paper also presents its parallel algorithm using a Graphics Processing Unit (GPU). We implemented this parallel algorithm on an NVIDIA A100 GPU. The experimental results show that the CPU implementations with sequential and parallel executions take 34,651 and 867 s, respectively, to generate a 4K watercolor image of size 3840×2144. In contrast, the GPU implementation with parallel execution succeeded in reducing the time to 44 s.

Keywords: GPU; parallel algorithm; stroke-based rendering; watercolor simulation
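
The speedups implied by the timings quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the three reported times (34,651 s, 867 s, 44 s); the helper name `speedup` and the constant names are illustrative and do not come from the paper.

```python
def speedup(baseline_seconds: float, improved_seconds: float) -> float:
    """Ratio of baseline running time to improved running time."""
    if improved_seconds <= 0:
        raise ValueError("improved_seconds must be positive")
    return baseline_seconds / improved_seconds


# Timings quoted in the abstract, in seconds.
CPU_SEQUENTIAL_S = 34651
CPU_PARALLEL_S = 867
GPU_PARALLEL_S = 44

if __name__ == "__main__":
    print(f"GPU vs sequential CPU: {speedup(CPU_SEQUENTIAL_S, GPU_PARALLEL_S):.1f}x")
    print(f"GPU vs parallel CPU:   {speedup(CPU_PARALLEL_S, GPU_PARALLEL_S):.1f}x")
    print(f"parallel vs sequential CPU: {speedup(CPU_SEQUENTIAL_S, CPU_PARALLEL_S):.1f}x")
```

With these figures the GPU version is a little under 790 times faster than the sequential CPU run and roughly 20 times faster than the parallel CPU run.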

A cost optimal search technique for the knapsack problem

INTERNATIONAL JOURNAL OF HIGH SPEED COMPUTING, 1997, vol. 9, no. 1, pp. 1-12

Authors: Lou, DC; Chaing, CC. NATL CHUNG CHENG UNIV INST COMP SCI & INFORMAT ENGN CHIAYI 621 TAIWAN

The knapsack problem is known to be a typical NP-complete problem, which has 2^n possible solutions to search over. Thus a task for solving the knapsack problem can be accomplished in 2^n trials if an exhaustive search is applied. In the past decade, much effort has been devoted to reducing the computation time of this problem relative to exhaustive search. In 1984, Karnin proposed a brilliant parallel algorithm, which needs O(2^(n/6)) processors to solve the knapsack problem in O(2^(n/2)) time; that is, the cost of Karnin's parallel algorithm is O(2^(2n/3)). In this paper, we propose a fast search technique to improve Karnin's parallel algorithm by reducing its search time complexity to O(2^(n/3)) with the same O(2^(n/6)) processors available. Thus, the cost of the proposed parallel algorithm is O(2^(n/2)). Furthermore, we extend this search technique to the case where the number of available processors is P = O(2^x), where x ≥ 1. From the analytical results, we see that our search technique is indeed superior to the previously proposed methods. We do believe our proposed parallel algorithm is pragmatically feasible at the moment when multiprocessor systems become more and more popular.

Keywords: knapsack problem; NP-complete problem; parallel algorithm; cryptosystem
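
The two cost figures in the abstract follow from the usual definition of the cost of a parallel algorithm as the number of processors multiplied by the parallel running time. That definition is assumed here, since the abstract does not state it. A short worked check:

```latex
\begin{align*}
\text{cost} &= (\text{number of processors}) \times (\text{parallel time}) \\
\text{Karnin:}\quad
  & O\!\left(2^{n/6}\right) \cdot O\!\left(2^{n/2}\right)
    = O\!\left(2^{n/6 + n/2}\right) = O\!\left(2^{2n/3}\right) \\
\text{proposed:}\quad
  & O\!\left(2^{n/6}\right) \cdot O\!\left(2^{n/3}\right)
    = O\!\left(2^{n/6 + n/3}\right) = O\!\left(2^{n/2}\right)
\end{align*}
```

Both products agree with the costs quoted in the abstract: the processor count is unchanged, so the entire saving comes from the shorter search time.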

GPU-based computation of the integral image

11th China Virtual Reality Conference (ICVRV2011)

Authors: Wei Huang; Ling-Da Wu; You-Gen Zhang. Science and Technology on Information Systems Engineering Laboratory National University of Defense

The integral image can be used to quickly complete common pixel-level operations over regular regions of a grey-level image, so it has been widely used in the fields of computer vision and pattern recognition. In this paper, we first present an intuitive parallel method to compute the integral image. Then, based on the intuitive method, a two-stage method based on the binary tree is introduced. In each stage of the algorithm, we perform first a top-down and then a bottom-up traversal over the tree. Finally, we analyze the case of large-scale grey-level images and optimize the computation based on the CUDA architecture. Our experiments on consumer-level PC hardware show that the GPU-based algorithm outperforms the corresponding CPU-based algorithm in terms of speed for large-scale images.

Keywords: integral image; GPU; binary tree; parallel algorithm
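
As background for the abstract above, an integral image can be built with one prefix-sum pass along each row followed by one pass down each column, and it then answers any rectangular region sum with at most four lookups. The sketch below is a sequential illustration under those assumptions; it is not the binary-tree two-stage method or the CUDA implementation of the paper, and the function names are hypothetical.

```python
from itertools import accumulate


def integral_image(img):
    """Return ii where ii[y][x] is the sum of img[0..y][0..x], inclusive."""
    # Pass 1: prefix sums along each row. The rows are independent of each
    # other, which is what makes this pass easy to run in parallel.
    rows = [list(accumulate(row)) for row in img]
    # Pass 2: prefix sums down each column. The columns are independent.
    for y in range(1, len(rows)):
        above, current = rows[y - 1], rows[y]
        for x in range(len(current)):
            current[x] += above[x]
    return rows


def region_sum(ii, top, left, bottom, right):
    """Sum of the original pixels in rows top..bottom and columns left..right."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]   # added back: it was subtracted twice
    return total


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    ii = integral_image(image)
    print(ii)
    print(region_sum(ii, 1, 1, 2, 2))
```

Each per-row or per-column prefix sum is itself a scan, which is the operation that tree-based parallel methods target.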

Matrix Multiplication using r-Train Data Structure

The 2013 AASRI Conference on Parallel and Distributed Computing and Systems (DCS 2013)

Author: Bashir Alam. Department of Computer Engineering, Jamia Millia Islamia, New Delhi

A new dynamic data structure has been proposed recently in *** are several algorithms for matrix *** none of them has used r-train data structure for storing and multiplying the *** this paper algorithm for matrix mul...

Keywords: R-Train; SIMD; parallel algorithm
