Search Results - Inner Mongolia University Library

Improving vertex-frontier based GPU breadth-first search

Journal of Central South University, 2014, Vol. 21, No. 10, pp. 3828-3836

Authors: 杨博, 卢凯, 高颖慧, 徐凯, 王小平, 程志权; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology; College of Computer, National University of Defense Technology; Department of Electronic Science and Engineering, National University of Defense Technology; Avatar Science Company

Breadth-first search (BFS) is an important kernel for graph traversal and has been used by many graph processing applications. Extensive studies have been devoted to boosting the performance of BFS. As the most effective solution, GPU acceleration achieves the state-of-the-art result of 3.3×10^9 traversed edges per second on an NVIDIA Tesla C2050 GPU. A novel vertex-frontier based GPU BFS algorithm is proposed, and its main features are three-fold. Firstly, to obtain a better workload balance for irregular graphs, a virtual-queue task decomposition and mapping strategy is introduced for vertex frontier expansion. Secondly, a global duplicate detection scheme is proposed to remove duplicate vertices from the vertex frontier effectively. Finally, a GPU-based bottom-up BFS approach is employed to process large frontiers. The experimental results demonstrate that the algorithm can achieve a 10% improvement over the state-of-the-art method on diverse graphs. In particular, it exhibits a 2-3 times speedup over the state of the art on low-diameter and scale-free graphs on an NVIDIA Tesla K20c GPU, reaching a peak traversal rate of 11.2×10^9 edges/s.

Keywords: breadth-first search; GPU; graph traversal; vertex frontier
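
The frontier-based traversal summarized in this abstract can be illustrated with a minimal sequential sketch. It is not the authors' GPU algorithm: the function name `bfs_levels`, the `switch_fraction` threshold, and the adjacency-list input are assumptions made here for illustration. The sketch only shows the two ideas the abstract names, expanding a vertex frontier level by level while discarding duplicates, and switching to a bottom-up step when the frontier is large.

```python
def bfs_levels(adj, source, switch_fraction=0.25):
    """Level-synchronous BFS on an undirected graph given as adjacency lists.

    Returns a list of hop distances from `source` (-1 for unreachable vertices).
    `switch_fraction` is a hypothetical tuning knob: when the frontier holds more
    than this fraction of all vertices, the bottom-up step is used instead.
    """
    n = len(adj)
    dist = [-1] * n
    dist[source] = 0
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        if len(frontier) > switch_fraction * n:
            # Bottom-up step: each unvisited vertex searches its neighbours
            # for a parent in the current frontier (needs an undirected graph).
            in_frontier = [False] * n
            for u in frontier:
                in_frontier[u] = True
            for v in range(n):
                if dist[v] == -1 and any(in_frontier[u] for u in adj[v]):
                    dist[v] = level
                    next_frontier.append(v)
        else:
            # Top-down step: expand every frontier vertex. Marking a vertex on
            # its first visit keeps duplicates out of the next frontier.
            for u in frontier:
                for v in adj[u]:
                    if dist[v] == -1:
                        dist[v] = level
                        next_frontier.append(v)
        frontier = next_frontier
    return dist


if __name__ == "__main__":
    # Small undirected example: vertex 0 is a hub, vertex 4 hangs off vertex 1.
    graph = [[1, 2, 3], [0, 4], [0], [0], [1]]
    print(bfs_levels(graph, 0))
```

Both steps produce the same distances; the choice between them only changes how much work each level costs, which is why a threshold on frontier size is a common switching rule.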

Bisection technique for designing synchronous parallel algorithms

Science China Mathematics, 1995, Vol. 38, No. 5, pp. 635-640

Author: 王能超; Parallel Computation Research Institute, Huazhong University of Science and Technology, Wuchang 430074, China

A basic technique for designing synchronous parallel algorithms, the so-called bisection technique, is proposed. The basic pattern of designing parallel algorithms is described. The relationship between the designing ...

Keywords: synchronous parallel algorithm; recursive doubling; bisection; Taiji thinking; I Ching
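
The keywords mention recursive doubling, a standard pattern for synchronous parallel algorithms. The truncated abstract does not describe the paper's bisection technique in detail, so the sketch below is only the textbook recursive-doubling prefix sum, not the author's method; the function name `prefix_sum_recursive_doubling` is chosen here for illustration. Each pass of the loop stands for one synchronous parallel step in which every position reads the previous array and adds the element `stride` places to its left.

```python
def prefix_sum_recursive_doubling(values):
    """Inclusive prefix sum computed in about log2(n) synchronous steps.

    Each list comprehension models one parallel step: all positions read the
    old array `x` and write a new one, so no position sees a partial update.
    """
    x = list(values)
    n = len(x)
    stride = 1
    while stride < n:
        x = [x[i] + x[i - stride] if i >= stride else x[i] for i in range(n)]
        stride *= 2  # the distance doubles after every step
    return x


if __name__ == "__main__":
    print(prefix_sum_recursive_doubling([1, 2, 3, 4, 5]))
```

With n processors, each step takes constant time, so the whole prefix sum needs a logarithmic number of steps rather than the n - 1 steps of the sequential loop.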

MilkyWay-2 supercomputer: system and application

Frontiers of Computer Science, 2014, Vol. 8, No. 3, pp. 345-356

Authors: Xiangke LIAO, Liquan XIAO, Canqun YANG, Yutong LU; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; College of Computer, National University of Defense Technology, Changsha 410073, China

On June 17, 2013, the MilkyWay-2 (Tianhe-2) supercomputer was crowned the fastest supercomputer in the world on the 41st TOP500 list. This paper provides an overview of the MilkyWay-2 project and describes the design of its hardware and software systems. The key architectural features of MilkyWay-2 are highlighted, including neo-heterogeneous compute nodes integrating commodity off-the-shelf processors and accelerators that share a similar instruction set architecture, powerful networks that employ proprietary interconnection chips to support massively parallel message-passing communications, a proprietary 16-core processor designed for scientific computing, and efficient software stacks that provide a high performance file system, an emerging programming model for heterogeneous systems, and intelligent system administration. We perform extensive evaluation with wide-ranging applications, from the LINPACK and Graph500 benchmarks to massively parallel software deployed in the system.

Keywords: MilkyWay-2 supercomputer; petaflops computing; neo-heterogeneous architecture; interconnect network; heterogeneous programming model; system management; benchmark optimization; performance evaluation

The TH Express high performance interconnect networks

Frontiers of Computer Science, 2014, Vol. 8, No. 3, pp. 357-366

Authors: Zhengbin PANG, Min XIE, Jun ZHANG, Yi ZHENG, Guibin WANG, Dezun DONG, Guang SUO; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; College of Computer, National University of Defense Technology, Changsha 410073, China

Interconnection networks play an important role in scalable high performance computer (HPC) systems. The TH Express-2 interconnect has been used in the MilkyWay-2 system to provide high-bandwidth and low-latency interprocessor communications, and continuous efforts are devoted to the development of our proprietary interconnect. This paper describes the state of the art of our proprietary interconnect, with emphasis on the design of the network interface. Several key features are introduced, such as user-level communication, remote direct memory access, offload collective operation, and hardware reliable end-to-end communication. The design of a low-level message passing infrastructure and an upper-level message passing service is also proposed. The preliminary performance results demonstrate the efficiency of the TH interconnect interface.

Keywords: HPC; network interface chip (NIC); TH Express; interconnect; offload collective operation

Accelerated Selective Algebraic Multigrid Method for Fully-Coupled Incompressible Flow Solver

International Journal of Computational Fluid Dynamics, 2025

Authors: Liang, Yuechao; Guo, Xiao-Wei; Zhang, Qingyang; Li, Chao; Liu, Jie; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China; Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Changsha, China; College of Computer Science and Technology, National University of Defense Technology, Changsha, China

The fully coupled pressure-based algorithm is widely recognised for its superior convergence and robustness in solving incompressible flow problems. However, the increased scale of the equations and the difficulty of solving the linear systems have limited the widespread use of this algorithm in large-scale simulations. This paper presents an optimised block selective algebraic multigrid method that significantly reduces computational complexity. Our approach employs a parallel modified independent set algorithm, allowing each process to perform matrix coarsening individually. Furthermore, an aggressive coarsening strategy is introduced to reduce complexity and enable the solution of larger-scale problems. Numerical experiments demonstrate that the solution time is shortened by 14% to 49% compared to the latest existing methods and that the method outperforms the segregated algorithm. By addressing the computational challenges associated with the selective algebraic multigrid solver, this work unleashes the superior convergence properties of the fully coupled method. © 2025 Informa UK Limited, trading as Taylor & Francis Group.

Keywords: Convergence of numerical methods

Novel Mobility Formula for Parallel Mechanisms Expressed with Mobility of General Link Group

Chinese Journal of Mechanical Engineering, 2013, Vol. 26, No. 6, pp. 1082-1090

Authors: ZHANG Yitong, LU Wenjuan, MU Dejun, YANG Yandong, ZHANG Lijie, ZENG Daxing; Key Laboratory of Parallel Robot and Mechatronic System, Yanshan University; Key Laboratory of Advanced Forging & Stamping Technology and Science, Yanshan University

The determination of virtual constraints has always been one of the key difficulties in traditional mobility calculation. To simplify mobility calculation while avoiding virtual constraints, several new formulae have been presented. However, these formulae can hardly reflect intuitively the restrictions that a general link group places on the output member or its influence on the independence of the output parameters, which is a prerequisite for judging the properties of mobility. To reveal the intrinsic relationship between the degree of freedom (DOF) of a mechanism, the link group, and the dimension of the output parameters, and to avoid the determination of virtual constraints, a new formula for calculating the mobility of mechanisms is presented, based on the new concepts of the "DOF of general link group" and "node parameters". It is expressed with the DOFs of the general link groups and the rank of the motion parameters of the base point of the output link, and is named the GOM (mobility of groups and output parameter) formula. On the basis of the new concepts of "effective parameters" and "invalid parameters", a rule is put forward for solving the DOF of mechanisms with invalid parameters by the GOM formula: the base point parameters are a subset of the effective parameters of the link group. Several examples are then given, and the results coincide with the prototype data, which proves the validity of the proposed formula. It is also shown that the necessary and sufficient condition for the independence of the output parameters is that the DOF of each link group is not less than zero. The proposed formula, which is simple to calculate, provides a theoretical basis for judging the independence of output parameters and a reference for the type synthesis of novel parallel mechanisms with independence requirements on their output parameters.

Keywords: parallel mechanism; degree of freedom; general link group; output parameters

MilkyWay-2: back to the world Top 1

Frontiers of Computer Science, 2014, Vol. 8, No. 3, pp. 343-344

Author: Xiangke LIAO; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; College of Computer, National University of Defense Technology, Changsha 410073, China

On the 41st Top500 list, announced in June 2013, the MilkyWay-2 system produced by the National University of Defense Technology (NUDT) in China won first place with a LINPACK test result of 33.86 PFLOPS. It has been one and a half years since its predecessor, MilkyWay-1 (TH-1), reached the same place for the first time. On the newest Top500 list, published in November 2013, MilkyWay-2 retained first place.

The spline alternating group explicit (splage) method

International Journal of Computer Mathematics, 1997, Vol. 63, No. 3-4, pp. 245-263

Authors: Evans, DJ; Kadhum, NI; Parallel Algorithm Research Centre, University of Technology, Loughborough, Leics, United Kingdom; College of Science, University of Sana'a, Yemen

In this paper the AGE iterative method is applied to the set of linear three-term recurrence equations derived from the cubic spline approximations to the one-dimensional diffusion equation. Convergence and stability of the method are proved, and the derivation and existence of the optimal acceleration parameters for the stationary and nonstationary forms of the method are established.

Keywords: parabolic PDE; AGE iterative method; cubic spline approximations

Spiral waves in CIMA model and its LBGK simulation

Communications in Nonlinear Science and Numerical Simulation, 2001, Vol. 6, No. 2, pp. 68-73

Authors: Qing LI, Chuguang ZHENG and Nengchao WANG; e-mail: qingli @ public. ***; College of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; State Key Laboratory of Coal Combustion, Huazhong University of Science and Technology, Wuhan 430074, China; Parallel Computation Institute, Huazhong University of Science and Technology, Wuhan 430074, China

Two types of structures, Turing patterns and spiral waves, are obtained in the chlorite-iodide-malonic acid (CIMA) reaction-diffusion model by using the lattice Bhatnagar-Gross-Krook (LBGK) method.

Keywords: lattice Boltzmann method; reaction-diffusion; Turing patterns; spiral waves

A 3.2 GFLOPS neural network accelerator

IEICE Transactions on Electronics, 1997, Vol. E80C, No. 7, pp. 859-867

Authors: Komori, S; Arima, Y; Kondo, Y; Tsubota, H; Tanaka, K; Kyuma, K; Department of Neural and Parallel Processing Technology, Advanced Technology R and D Center, Mitsubishi Electric Corporation, Itami-shi 664, Japan; Department of Neural and Parallel Processing Technology, Advanced Technology R and D Center, Mitsubishi Electric Corporation, Amagasaki-shi 661, Japan; Division of Information and Media Science, Graduate School of Science and Technology, Kobe University, Kobe-shi 675, Japan

We have developed a SIMD-type neural-network processor (NEURO4) and its software environment. With the SIMD architecture, the chip executes 24 operations in a clock cycle and achieves 1.2 GFLOPS peak performance. An accelerator board, which contains four NEURO4 chips, achieves 3.2 GFLOPS. In this paper we describe features of the neural network chip, accelerator board, software environment and performance evaluation for several neural network models (LVQ, BP and Hopfield). The 3.2 GFLOPS neural network accelerator board demonstrates 1.7 GCPS and 261 MCUPS for Hopfield networks.

Keywords: neural network; parallel processing; SIMD; LSI
