Search Results - Inner Mongolia University Library

DENSE EDGE-DISJOINT EMBEDDING OF COMPLETE BINARY-TREES IN THE HYPERCUBE

INFORMATION PROCESSING LETTERS, 1993, vol. 45, no. 6, pp. 321-325

Authors: RAVINDRAN, S; GIBBONS, A. Affiliation: Department of Computer Science, University of Warwick, Coventry CV4 7AL, United Kingdom

We show that the complete binary tree with n > 8 leaves can be embedded in the hypercube with n nodes such that: paths of the tree are mapped onto edge-disjoint paths of the hypercube, at most two tree nodes (one of which is a leaf) are mapped onto each hypercube node, and the maximum distance from a leaf to the root of the tree is log_2 n + 1 hypercube edges (which is optimally short). This embedding facilitates efficient implementation of many P-RAM algorithms on the hypercube.

Keywords: parallel algorithms; GRAPH EMBEDDING; BINARY TREE; HYPERCUBE
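The properties named in the abstract above (hypercube paths, edge-disjointness, distance in hypercube edges) can be checked mechanically. The following is a minimal sketch, not the authors' embedding construction: it assumes hypercube nodes are labeled by integers whose bits are the node address, and all function names are hypothetical.

```python
def neighbors(node: int, dim: int) -> list[int]:
    """Neighbors of `node` in a `dim`-dimensional hypercube: flip one address bit."""
    return [node ^ (1 << bit) for bit in range(dim)]


def distance(a: int, b: int) -> int:
    """Shortest-path length between two hypercube nodes (their Hamming distance)."""
    return bin(a ^ b).count("1")


def is_hypercube_path(path: list[int]) -> bool:
    """True if every consecutive pair of nodes in `path` is joined by a hypercube edge."""
    return all(distance(u, v) == 1 for u, v in zip(path, path[1:]))


def edge_disjoint(paths: list[list[int]]) -> bool:
    """True if no undirected hypercube edge is used by more than one path."""
    seen: set[frozenset[int]] = set()
    for path in paths:
        edges = {frozenset(edge) for edge in zip(path, path[1:])}
        if seen & edges:
            return False
        seen |= edges
    return True


if __name__ == "__main__":
    # Two routes from node 000 to node 011 in a 3-cube that share no edge.
    routes = [[0b000, 0b001, 0b011], [0b000, 0b010, 0b011]]
    print(all(is_hypercube_path(r) for r in routes), edge_disjoint(routes))
```

A checker like this only validates a candidate embedding; constructing one that meets the bounds stated in the abstract is the subject of the paper itself.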

NUMERICAL ALGORITHMS FOR THE HYPERCUBE CONCURRENT PROCESSOR

MATHEMATICAL AND COMPUTER MODELLING, 1988, vol. 11, no. C, pp. 55-57

Authors: PATTERSON, JE; MANSHADI, F; CALALO, RH; LIEWER, PC; IMBRIALE, WA; LYONS, JR. Affiliation: CALTECH, JET PROP LAB, PASADENA, CA 91109

With the development of concurrent computing architectures which promise cost-effective means of obtaining supercomputing performance, there is much interest in applying them to, and evaluating their actual performance on, large, computationally intensive problems. Of particular interest is the concurrent performance of large-scale electromagnetic scattering problems. Two electromagnetic codes with differing underlying algorithms have been converted to run on the Mark III Hypercube. One is a time-domain finite-difference solution of Maxwell's equations to solve for scattered fields, and the other is a frequency-domain moment-method solution. Important measures for demonstrating the utility of the parallel architecture are the size of the problem that can be solved and the efficiency with which parallelization can increase the speed of execution.

Keywords: Electromagnetic scattering; parallel processing; hypercubes; parallel algorithms

Multi-threaded parallel tetrahedral mesh improvement by combining atomic operation and graph coloring

ADVANCES IN ENGINEERING SOFTWARE, 2024, vol. 198

Authors: Wang, Yifu; Wang, Junji; Wang, Bohan; Wang, Yifei; Chen, Jianjun. Affiliation: Zhejiang Univ, Sch Aeronaut & Astronaut, Hangzhou 310027, Zhejiang, Peoples R China

In industrial numerical simulations, efficiently generating high-quality tetrahedral meshes remains a significant challenge. Advances in high-performance computing have made parallelization a practical approach to improving the quality of large-scale tetrahedral meshes. This study proposes a fine-grained multithreaded parallel method to accelerate tetrahedral mesh improvement. By utilizing atomic operations, we fundamentally address thread safety concerns. Additionally, through the precise use of atomic operations, task decomposition strategies, and a multithreaded memory model, we minimize the probability of task overlap and data races, thereby enhancing overall parallel mesh improvement efficiency. Experimental results demonstrate that our parallel mesh improver is robust and effective for complex industrial models. On a laptop with 16 threads, we achieved a tenfold increase in tetrahedral mesh improvement speed, with the quality of the improved meshes being comparable to that of the sequential process.

Keywords: parallel algorithms; Tetrahedral mesh generation; Quality improvement

BiqBin: A Parallel Branch-and-bound Solver for Binary Quadratic Problems with Linear Constraints

ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2022, vol. 48, no. 2, pp. 15-15

Authors: Gusmeroli, Nicolo; Hrga, Timotej; Luzar, Borut; Povh, Janez; Siebenhofer, Melanie; Wiegele, Angelika. Affiliations: Alpen Adria Univ Klagenfurt, Univ Str 65-67, A-9020 Klagenfurt, Austria; Univ Ljubljana, Fac Mech Engn, Askerceva C 6, Ljubljana 1000, Slovenia; Fac Informat Studies Novo Mesto, Slovenia; Fac Informat Studies Novo Mesto, Ljubljanska Cesta 31a, Novo Mesto 6038000, Slovenia

We present BiqBin, an exact solver for linearly constrained binary quadratic problems. Our approach is based on an exact penalty method to first efficiently transform the original problem into an instance of Max-Cut, and then to solve the Max-Cut problem by a branch-and-bound algorithm. All the main ingredients are carefully developed using new semidefinite programming relaxations obtained by strengthening the existing relaxations with a set of hypermetric inequalities, applying the bundle method as the bounding routine and using new strategies for exploring the branch-and-bound tree. Furthermore, an efficient C implementation of a sequential and a parallel branch-and-bound algorithm is presented. The latter is based on a load coordinator-worker scheme using MPI for multi-node parallelization and is evaluated on a high-performance computer. The new solver is benchmarked against BiqCrunch, GUROBI, and SCIP on four families of (linearly constrained) binary quadratic problems. Numerical results demonstrate that BiqBin is a highly competitive solver. The serial version outperforms the other three solvers on the majority of the benchmark instances. We also evaluate the parallel solver and show that it has good scaling properties. The general audience can use it as an on-line service available at http://***.

Keywords: Solvers; semidefinite programming; binary quadratic programming; parallel algorithms

A new parallel adaptive clustering and its application to streaming data

JOURNAL OF COMPUTATIONAL SCIENCE, 2023, vol. 66

Authors: McLaughlin, Benjamin; Ha Kang, Sung. Affiliations: Asbury Univ, Shaw Sch Sci, 1 Macklem Dr, Wilmore, KY 40390, USA; Georgia Inst Technol, Sch Math, 686 Cherry St NW, Atlanta, GA 30332, USA

This paper presents a parallel adaptive clustering (PAC) algorithm to automatically classify data while simultaneously choosing a suitable number of classes. Clustering is an important tool for data analysis and understanding in a broad set of areas including data reduction, pattern analysis, and classification. However, the requirement to specify the number of clusters in advance and the computational burden associated with clustering large sets of data persist as challenges in clustering. We propose a new parallel adaptive clustering (PAC) algorithm that addresses these challenges by adaptively computing the number of clusters and leveraging the power of parallel computing. The algorithm clusters disjoint subsets of the data on parallel computation threads. We develop regularized set k-means to efficiently cluster the results from the parallel threads. A refinement step further improves the clusters. The PAC algorithm offers the capability to adaptively cluster data sets which change over time by reusing the information from previous time steps to decrease computation. We provide theoretical analysis and numerical experiments to characterize the performance of the method, validate its properties, and demonstrate the computational efficiency of the method.

Keywords: Unsupervised clustering; k-means; parallel algorithms

Parallel Data Distribution Management on Shared-memory Multiprocessors

ACM TRANSACTIONS ON MODELING AND COMPUTER SIMULATION, 2020, vol. 30, no. 1, pp. 5-5

Authors: Marzolla, Moreno; D'Angelo, Gabriele. Affiliation: Univ Bologna, Dept Comp Sci & Engn DISI, Mura Anteo Zamboni 7, I-90126 Bologna, Italy

The problem of identifying intersections between two sets of d-dimensional axis-parallel rectangles appears frequently in the context of agent-based simulation studies. For this reason, the High Level Architecture (HLA) specification, a standard framework for interoperability among simulators, includes a Data Distribution Management (DDM) service whose responsibility is to report all intersections between a set of subscription and update regions. The algorithms at the core of the DDM service are CPU-intensive, and could greatly benefit from the large computing power of modern multi-core processors. In this article, we propose two parallel solutions to the DDM problem that can operate effectively on shared-memory multiprocessors. The first solution is based on a data structure (the interval tree) that allows concurrent computation of intersections between subscription and update regions. The second solution is based on a novel parallel extension of the Sort Based Matching algorithm, whose sequential version is considered among the most efficient solutions to the DDM problem. Extensive experimental evaluation of the proposed algorithms confirms their effectiveness in taking advantage of multiple execution units in a shared-memory architecture.

Keywords: Data distribution management (DDM); parallel and distributed simulation (PADS); high level architecture (HLA); parallel algorithms
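To make the matching problem in the abstract above concrete, here is a minimal brute-force sketch of reporting all intersections between subscription and update regions. It is a sequential baseline for illustration only, not the interval-tree or Sort Based Matching solutions the article proposes. The representation of a region as one closed (low, high) interval per dimension, and the names `overlaps` and `match_regions`, are assumptions.

```python
# A region is a d-dimensional axis-parallel rectangle: one (low, high) pair per dimension.
Region = list[tuple[float, float]]


def overlaps(a: Region, b: Region) -> bool:
    """Two rectangles intersect iff their intervals overlap in every dimension.

    Intervals are treated as closed, so rectangles that merely touch count as intersecting
    (an assumption; the source does not say which convention applies).
    """
    return all(a_lo <= b_hi and b_lo <= a_hi
               for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))


def match_regions(subscriptions: list[Region], updates: list[Region]) -> list[tuple[int, int]]:
    """Return every (subscription index, update index) pair whose regions intersect.

    This compares all pairs, so it costs O(n * m * d) time.
    """
    return [(i, j)
            for i, sub in enumerate(subscriptions)
            for j, upd in enumerate(updates)
            if overlaps(sub, upd)]


if __name__ == "__main__":
    subs = [[(0, 2), (0, 2)], [(5, 6), (5, 6)]]
    upds = [[(1, 3), (1, 3)], [(2, 5), (0, 1)]]
    print(match_regions(subs, upds))
```

The all-pairs cost is what motivates smarter data structures and parallel execution in the first place; a baseline like this is mainly useful for cross-checking the output of a faster implementation.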

Parallel decomposition of unstructured FEM-meshes

CONCURRENCY-PRACTICE AND EXPERIENCE, 1998, vol. 10, no. 1, pp. 53-72

Authors: Diekmann, R; Meyer, D; Monien, B. Affiliation: Univ Gesamthsch Paderborn, Dept Math & Comp Sci, D-4790 Paderborn, Germany

We present a parallel algorithm for static and dynamic partitioning of unstructured FEM-meshes. The method consists of two parts. First, a fast but inaccurate sequential clustering is determined, which is used, together with a simple mapping heuristic, to map the mesh initially onto the processors of a parallel system. The second part of the method uses a massively parallel algorithm to remap and optimize the mesh decomposition, taking several cost functions into account which reflect the characteristics of the underlying hardware and the requirements of the numerical solution method supposed to run after the decomposition. The parallel algorithm first calculates the number of nodes that have to be migrated between pairs of clusters in order to obtain an optimal load balancing. In a second step, nodes to be migrated are chosen according to cost functions optimizing the amount of necessary communication and the shapes of subdomains. The latter criterion is extremely important for the convergence behavior of certain numerical solution methods, especially for preconditioned conjugate gradient methods. The parallel parts of the method are implemented in C under Parix to run on the Parsytec GC systems. Results on up to 64 processors are presented and compared to those of other existing methods. (C) 1998 John Wiley & Sons, Ltd.

Keywords: parallel algorithms

A new state space-based approach for the estimation of two-dimensional frequencies and its parallel implementations

IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 1997, vol. E80A, no. 6, pp. 1099-1108

Authors: Chu, Y; Fang, WH; Chang, SH. Affiliations: Department of Electronic Engineering, National Taiwan Institute of Technology, Taipei, Taiwan; Department of Electrical Engineering, National Taiwan Ocean University, Keelung, Taiwan

In this paper, we present a new state space-based approach for the two-dimensional (2-D) frequency estimation problem, which occurs in various areas of signal processing and communications. The proposed method begins with the construction of a state space model associated with the noiseless data, which contains a summation of 2-D harmonics. Two auxiliary Hankel-block-Hankel-like matrices are then introduced, from which the two frequency components can be derived via matrix factorizations along with frequency shifting properties. Although the algorithm can render high-resolution frequency estimates, it also requires substantial computation. To alleviate this computational overhead, a highly parallelizable implementation via the principal subband component (PSC) of some appropriately chosen transforms is addressed as well. Such a PSC-based transform domain implementation not only reduces the size of the data to be processed, but also suppresses the contaminating noise outside the subband of interest. To reduce the computational complexity induced in the transformation process, we also suggest that either the discrete Fourier transform (DFT) or the Haar wavelet transform (HWT) be employed. As a consequence, such an implementation can achieve substantial computational savings; meanwhile, as demonstrated by the provided simulation results, it still retains roughly the same performance as the original algorithm. A comparison with other existing algorithms is made as well to justify the proposed approaches.

Keywords: frequency estimation; state space model; parallel algorithms; discrete Fourier transform; discrete Haar wavelet transform

Provably optimal parallel transport sweeps on semi-structured grids

JOURNAL OF COMPUTATIONAL PHYSICS, 2020, vol. 407, no. 0, pp. 109234-000

Authors: Adams, Michael P.; Adams, Marvin L.; Hawkins, W. Daryl; Smith, Timmie; Rauchwerger, Lawrence; Amato, Nancy M.; Bailey, Teresa S.; Falgout, Robert D.; Kunen, Adam; Brown, Peter. Affiliations: Texas A&M Univ, Dept Nucl Engn, 3133 TAMU, College Stn, TX 77843, USA; Texas A&M Univ, Dept Comp Sci & Engn, 3112 TAMU, College Stn, TX 77843, USA; Univ Illinois, Dept Comp Sci, Chicago, IL 60680, USA; Lawrence Livermore Natl Lab, Livermore, CA 94550, USA

We have found provably optimal algorithms for full-domain discrete-ordinate transport sweeps on a class of grids in 2D and 3D Cartesian geometry that are regular at a coarse level but arbitrary within the coarse blocks. We describe these algorithms and show that they always execute the full eight-octant (or four-quadrant if 2D) sweep in the minimum possible number of stages for a given P_x × P_y × P_z partitioning. Computational results confirm that our optimal scheduling algorithms execute sweeps in the minimum possible stage count. Observed parallel efficiencies agree well with our performance model. Our PDT transport code has achieved approximately 68% parallel efficiency with > 1.5M parallel threads, relative to 8 threads, on a simple weak-scaling problem with only three energy groups, 10 directions per octant, and 4096 cells/thread. Our ARDRA code has achieved 71% efficiency with > 1.5M cores, relative to 16 cores, with 36 directions per octant and 48 energy groups. We demonstrate similar efficiencies with PDT on a realistic set of nuclear-reactor test problems, with unstructured meshes that resolve fine geometric details. These results demonstrate that discrete-ordinates transport sweeps can be executed with high efficiency using more than 10^6 parallel processes. (C) 2020 Published by Elsevier Inc.

Keywords: parallel transport sweeps; parallel algorithms; STAPL; Performance models; Unstructured mesh; Scheduling algorithms

Residual parallel Reynolds stress due to turbulence intensity gradient in tokamak plasmas

PHYSICS OF PLASMAS, 2010, vol. 17, no. 11, pp. 112309-112309

Authors: Gurcan, O. D.; Diamond, P. H.; Hennequin, P.; McDevitt, C. J.; Garbet, X.; Bourdelle, C. Affiliations: Ecole Polytech, CNRS, Lab Phys Plasmas, F-91128 Palaiseau, France; Natl Fus Res Inst, WCI Ctr Fus Theory, Taejon 305333, South Korea; Univ Calif San Diego, Ctr Momentum Transport & Flow Org, La Jolla, CA 92093, USA; Univ Calif San Diego, Ctr Astrophys & Space Sci, La Jolla, CA 92093, USA; CEA, IRFM, F-13108 St Paul Les Durance, France

A novel mechanism for driving residual stress in tokamak plasmas based on k_∥ symmetry breaking by the turbulence intensity gradient is proposed. The physics of this mechanism is explained, and its connection to the wave kinetic equation and the wave-momentum flux is described. Applications to the H-mode pedestal, in particular to internal transport barriers, are discussed. Also, the effect of heat transport on the momentum flux is discussed. (C) 2010 American Institute of Physics. [doi:10.1063/1.3503624]

Keywords: parallel algorithms; REYNOLDS stress; TURBULENCE; TOKAMAKS; PLASMA (Ionized gases); BROKEN symmetry (Physics); WAVE mechanics; MOMENTUM (Mechanics); TRANSPORT theory (Mathematics)
