A load balancing-supported ID assignment method is the foundation for implementing and maintaining DHT overlays. Existing constant-degree DHTs usually adopt simple, purely centralized or purely distributed ID management strategies, which cannot resolve the contradiction between the cost of maintaining topology information and topology balance. By analyzing the tree structures common to these topologies, an ID assignment method named RFIDAM, based on the internal routing forest structure, is proposed; it periodically aggregates local balancing information to guide the joining of new nodes toward overall balance. Experimental results show that, with low maintenance and routing message overhead, the system's load balance is effectively ensured, with ID lengths differing by at most 2.
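The abstract does not spell out RFIDAM's internals, but its balance goal (ID lengths differing by only a small constant) can be illustrated with a toy trie-based assignment that always splits the shallowest leaf. This is a didactic sketch, not the RFIDAM algorithm: the routing-forest aggregation and distributed joining protocol are not modeled.

```python
import heapq

def assign_ids(n):
    """Assign binary-string IDs to n nodes by repeatedly splitting the
    shallowest leaf of an ID trie. Splitting the shallowest leaf keeps
    leaf depths (= ID lengths) tightly balanced, illustrating the kind
    of balance RFIDAM maintains via aggregated local information."""
    # Heap of trie leaves, keyed by (depth, id-prefix).
    heap = [(0, "")]
    while len(heap) < n:
        depth, prefix = heapq.heappop(heap)      # shallowest leaf
        heapq.heappush(heap, (depth + 1, prefix + "0"))
        heapq.heappush(heap, (depth + 1, prefix + "1"))
    return [node_id for _, node_id in heap]

ids = assign_ids(5)
lengths = [len(i) for i in ids]
# Lengths stay within 1 of each other in this toy version.
```

In a real DHT the split decision would be made by a joining node using balance summaries gathered from the routing forest rather than a global heap; the heap here only makes the balancing invariant easy to see.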
Memory-intensive applications often suffer from the poor performance of disk swapping when memory is inadequate. Remote memory sharing schemes, which provide a remote memory that is faster than the local hard disk, ar...
Scalability is a crucial factor determining the performance of massive heterogeneous parallel CFD applications on the multi-GPUs platforms, particularly after the single-GPU implementations have achieved optimal perfo...
In large-scale asynchronous distributed virtual environments (DVEs), one of the difficult problems is delivering concurrent events in a consistent order at each node. Previous consistency control approaches generally fall into two categories: causal order and timestamped order. However, causal order approaches can only preserve the cause-effect relation of events, and timestamped order approaches are intrinsically too complex for serverless large-scale asynchronous DVEs. In this paper, we propose a novel distributed algorithm to identify concurrent events and preserve their consistent order of delivery at different nodes. Simulation studies compare the performance of this algorithm with that of previous approaches. The results show that the new algorithm effectively delivers concurrent events in a consistent order at each node and is more efficient than previous algorithms in large-scale asynchronous DVEs.
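The paper's specific algorithm is not given in the abstract, but the underlying problem it addresses can be sketched with standard vector clocks: detect which events are concurrent (neither causally precedes the other), then impose a total order that every node can compute locally without a server. All names below are illustrative, not taken from the paper.

```python
def happened_before(vc_a, vc_b):
    """True if the event with vector clock vc_a causally precedes vc_b."""
    return all(a <= b for a, b in zip(vc_a, vc_b)) and vc_a != vc_b

def concurrent(vc_a, vc_b):
    """Two events are concurrent when neither causally precedes the other."""
    return not happened_before(vc_a, vc_b) and not happened_before(vc_b, vc_a)

def consistent_order(events):
    """Total delivery order computable locally at every node.

    The component sum of a vector clock strictly increases along any
    causal chain, so sorting by (sum, sender id) respects causality;
    the sender-id tie-break orders concurrent events deterministically,
    giving every node the same delivery order."""
    return sorted(events, key=lambda e: (sum(e["vc"]), e["sender"]))
```

A serverless DVE node would apply `consistent_order` to its delivery queue; because the sort key depends only on the events themselves, all nodes agree without exchanging extra ordering messages.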
With the rapid development of Internet technology, new network attack methods emerge one after another. SQL injection has become one of the most severe threats to Web applications and seriously threatens vario...
ISBN (digital): 9781728143286
ISBN (print): 9781728143293
Polar codes are a class of codes that achieve the symmetric channel capacity. They have been adopted as the control channel code for the enhanced mobile broadband (eMBB) scenario of the fifth-generation (5G) standard. Although polar codes can be efficiently decoded by the successive cancellation algorithm with complexity O(N log N), its decoding performance is not good enough for short codewords. The successive cancellation list (SCL) decoder, investigated in many recent studies, has better frame error rate (FER) performance but poor latency and throughput. In this study, a parallel SCL decoder based on the graphics processing unit (GPU) is designed to reduce latency and improve decoding throughput. An efficient approach for sharing intermediate values among different decoding paths is introduced, which reduces computing complexity and decoding latency. A parallel non-recursive decoding algorithm further increases throughput significantly. For the typical case of code length N = 1024 and list size L = 4 with code rate R = 0.5, the parallel GPU decoder achieves a throughput of 49 Mbps on an Nvidia GTX 980 and 79 Mbps on an Nvidia Titan X, 240 and 392 times higher than a CPU-based decoder.
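The building block that both the successive cancellation (SC) and SCL decoders share is the recursive LLR update with the f (check-node) and g (variable-node) functions. The sketch below shows a plain recursive SC decoder using the min-sum approximation of f; it is a reference illustration only, not the paper's parallel GPU/SCL implementation, which parallelizes these updates across list paths and unrolls the recursion.

```python
import math

def f_minsum(a, b):
    """Check-node LLR update of SC decoding (min-sum approximation)."""
    return math.copysign(1.0, a) * math.copysign(1.0, b) * min(abs(a), abs(b))

def g_update(a, b, u):
    """Variable-node LLR update; u is the already-decided partial-sum bit."""
    return b + (1 - 2 * u) * a

def sc_decode(llr, frozen):
    """Recursive SC decoder for a polar code of length N = 2^n,
    complexity O(N log N). `frozen[i]` marks frozen positions, decoded
    as 0. Returns the re-encoded codeword estimate (the partial sums
    at the root); the leaf decisions are the source-bit estimates."""
    n = len(llr)
    if n == 1:
        bit = 0 if (frozen[0] or llr[0] >= 0) else 1
        return [bit]
    half = n // 2
    # Left subtree: combine paired channel LLRs with f.
    left_llr = [f_minsum(llr[i], llr[i + half]) for i in range(half)]
    u_left = sc_decode(left_llr, frozen[:half])
    # Right subtree: g uses the bits already decided on the left.
    right_llr = [g_update(llr[i], llr[i + half], u_left[i]) for i in range(half)]
    u_right = sc_decode(right_llr, frozen[half:])
    # Combine partial sums upward.
    return [u_left[i] ^ u_right[i] for i in range(half)] + u_right
```

An SCL decoder would explore both leaf decisions for each information bit, keep the L most likely paths, and share the intermediate `left_llr`/`right_llr` arrays among paths, which is the sharing the abstract says the GPU design exploits.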
The development of a basic scalable preprocessing tool is the key routine to accelerate the entire computational fluid dynamics (CFD) workflow toward the exascale computing era. In this work, a parallel preprocessing ...
Meteorology Grid Computing aims to provide scientists with seamless, reliable, secure, and inexpensive access to meteorological resources. In this paper, we present a semantic-based meteorology grid service registry, ...
Stencil Computation has long been an omnipresent kernel of a wide range of scientific and engineering applications. There is much work investigating the stencil performance on x86 processors and accelerators such as G...