The ability of servers to execute tasks effectively within Cloud datacenters varies due to heterogeneous CPU and memory capacities, resource contention, network configurations, and operational age. Unexpectedly slow server nodes (node-level stragglers) cause their assigned tasks to become task-level stragglers, which dramatically impede parallel job execution. However, it is currently unknown how slow nodes correlate with task straggler manifestation. To address this knowledge gap, we propose a method for node performance modeling and ranking in Cloud datacenters based on analyzing parallel job execution tracelog data. Using a production Cloud system as a case study, we demonstrate that node execution performance is driven by temporal changes in node operation rather than by node hardware capacity. We filter different sample sets to evaluate the generality of our framework, and the analytic results demonstrate that a node's ability to execute parallel tasks tends to follow a 3-parameter log-logistic distribution. Statistical attributes such as confidence intervals, quantile values, and extreme-case probabilities can then be used to rank nodes and identify potential straggler nodes within the cluster. We apply a graph-based algorithm to partition server nodes into five levels, identifying 0.83% of nodes as node-level stragglers. Our work lays the foundation for enhancing scheduling algorithms by avoiding slow nodes, reducing task straggler occurrence, and improving parallel job performance.
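To make the distribution-fitting and ranking step concrete, here is a minimal Python sketch. It assumes per-node task execution times extracted from tracelogs (synthetic here); scipy's fisk distribution is the log-logistic family, and fitting it with a free loc parameter gives the 3-parameter form the abstract mentions. The node names, sample sizes, and the 95th-percentile ranking criterion are illustrative, not the paper's exact procedure.

```python
# Sketch: fit a 3-parameter log-logistic distribution to per-node task
# durations and rank nodes by a fitted quantile. `node_durations` is
# illustrative synthetic data, not the production tracelog in the paper.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated task execution times (seconds) for a set of nodes.
node_durations = {
    f"node{i}": rng.lognormal(mean=3.0, sigma=0.4, size=500) for i in range(20)
}
node_durations["node19"] *= 2.5  # one deliberately slow node

scores = {}
for node, durations in node_durations.items():
    # scipy's `fisk` is the log-logistic family; fitting with a free
    # `loc` yields the 3-parameter form.
    c, loc, scale = stats.fisk.fit(durations)
    # Rank nodes by their fitted 95th-percentile execution time.
    scores[node] = stats.fisk.ppf(0.95, c, loc=loc, scale=scale)

ranked = sorted(scores.items(), key=lambda kv: kv[1])
print("slowest node by fitted 95th percentile:", ranked[-1])
```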
ISBN (Print): 9781665443326
With the increasing adoption of graph neural networks (GNNs) in the graph-based deep learning community, various graph programming frameworks and models have been developed to improve the productivity of GNNs. Current GNN frameworks rely on the GPU as an essential tool to accelerate GNN training. However, it is still challenging to train GNNs on large graphs with limited GPU memory. Unlike traditional neural networks, generating mini-batch data by sampling in GNNs requires complicated tasks such as traversing the graph to select neighboring nodes and gathering their features. This process takes up most of the training time, and we find that the main bottleneck is transferring node features from CPU to GPU over limited bandwidth. In this paper, we propose Reusing Batch Data, a method that addresses this data-transmission problem. The method exploits the similarity between adjacent mini-batches to reduce repeated data transmission from CPU to GPU. Furthermore, to reduce the overhead this method introduces, we design a fast GPU-based algorithm that detects repeated nodes' data with short additional computation time. Evaluations on three representative GNN models show that our method reduces transmission time by up to 60% and speeds up end-to-end GNN training by up to 1.79× over state-of-the-art baselines. In addition, Reusing Batch Data effectively reduces the GPU memory footprint by about 19% to 40% while still reducing training time compared to a static cache strategy.
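A rough sketch of the reuse idea, in plain NumPy: consecutive mini-batches share many nodes, so only the features of nodes absent from the previous batch need to cross the CPU-to-GPU link. The load_batch helper, the dictionary cache, and the batch sizes are hypothetical stand-ins; the paper's GPU-side detection algorithm is not reproduced here.

```python
# Sketch of the batch-reuse idea: keep the previous mini-batch's features
# resident on the device and transfer only features of nodes that did not
# appear in the previous batch. Pure NumPy stand-in; a real implementation
# would hold `device_cache` in GPU memory.
import numpy as np

features = np.random.rand(10_000, 128).astype(np.float32)  # host feature table

def load_batch(batch_ids, prev_ids, device_cache):
    reused_mask = np.isin(batch_ids, prev_ids)
    new_ids = batch_ids[~reused_mask]
    # Only the new nodes' rows cross the (simulated) CPU->GPU link.
    transferred = features[new_ids]
    for nid, row in zip(new_ids, transferred):
        device_cache[nid] = row
    batch_feats = np.stack([device_cache[nid] for nid in batch_ids])
    print(f"reused {reused_mask.sum()}/{len(batch_ids)} rows, "
          f"transferred {len(new_ids)}")
    return batch_feats

cache = {}
prev = np.array([], dtype=np.int64)
for _ in range(3):
    batch = np.random.choice(10_000, size=1024, replace=False)
    load_batch(batch, prev, cache)
    prev = batch
```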
Graph neural networks (GNNs) have become important tools for processing structured graph data and have been successfully applied in multiple graph-based application scenarios. Existing GNN systems adopt sample-based training on large-scale graphs over multiple GPUs. Although they support large-scale graph training, the data-loading overhead of transferring vertex features between CPUs and GPUs remains a bottleneck. In this work, we propose SCGraph, a method that supports high-speed GPU feature caching. SCGraph classifies graph vertices sorted by out-degree. For high out-degree vertices, SCGraph sets graded caches across different GPUs, increasing the overall cache capacity through high-speed NVLink data transmission between them. For low out-degree vertices, SCGraph expands training vertices' neighborhoods in advance to regenerate the cache. We evaluate SCGraph against two state-of-the-art industrial GNN frameworks, DGL and PaGraph, on various benchmarks. Experimental results show that SCGraph improves the GPU cache hit rate by up to 23.6% and achieves up to 1.71× performance speedup over the state-of-the-art baselines while keeping convergence almost unchanged.
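The graded-cache idea can be sketched as follows, with all capacities and the lookup path invented for illustration: vertices are sorted by out-degree, the hottest grade is cached on the local GPU, the next grade is spread over peer GPUs (served over NVLink in the real system), and everything else falls back to host memory.

```python
# Sketch of SCGraph-style graded caching. Capacities, the degree
# distribution, and the lookup path are illustrative assumptions.
import numpy as np

out_degree = np.random.zipf(1.5, size=100_000)   # skewed degree distribution
order = np.argsort(-out_degree)                  # hottest vertices first

LOCAL_CAP, PEER_CAP, NUM_PEERS = 5_000, 5_000, 3
local_cache = set(order[:LOCAL_CAP])
peer_caches = [
    set(order[LOCAL_CAP + i * PEER_CAP: LOCAL_CAP + (i + 1) * PEER_CAP])
    for i in range(NUM_PEERS)
]

def locate(vid):
    """Return where a vertex's features would be served from."""
    if vid in local_cache:
        return "local GPU cache"
    for i, cache in enumerate(peer_caches):
        if vid in cache:
            return f"peer GPU {i} via NVLink"
    return "host memory over PCIe"

for v in np.random.choice(100_000, size=5):
    print(v, "->", locate(v))
```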
With the continuous deepening of Artificial Neural Network (ANN) research, ANN model structures and functions are improving towards diversification and intelligence. However, models are mostly evaluated by the pros and cons of their problem-solving results, and evaluation from the biomimetic aspect of imitating neural networks is lacking. To address this problem, a new ANN model evaluation strategy is proposed in this paper from the perspective of bionics. First, four classical neural network models are illustrated: the Back Propagation (BP) network, Deep Belief Network (DBN), LeNet5 network, and olfactory bionic model (KIII model); the neuron transmission mode and equations, network structure, and weight-updating principle of each model are analyzed. The analysis shows that the KIII model comes closer to the actual biological nervous system than the other models, and the LeNet5 network simulates the nervous system to some extent. Then, evaluation indexes for ANNs are constructed from the perspective of bionics: small-world, synchronization, and chaos characteristics. Finally, the network models are quantitatively analyzed using these evaluation indexes. The experimental results show that the DBN, LeNet5, and BP networks have synchronization characteristics, and the DBN and LeNet5 networks have certain chaotic characteristics, but there is still a certain distance between these three classical neural networks and actual biological neural networks. The KIII model has certain small-world characteristics in structure, and its network also exhibits synchronization and chaotic characteristics. Compared with the DBN, LeNet5, and BP networks, the KIII model is closer to the real biological neural network.
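As an illustration of one of these bionic indexes, the sketch below computes a standard small-world coefficient, sigma = (C/C_rand)/(L/L_rand), with networkx; the Watts-Strogatz test graph merely stands in for a topology extracted from an ANN model, and the paper's exact index definitions may differ.

```python
# Sketch of a small-world index: compare a network's clustering C and
# characteristic path length L against an edge-matched random graph.
# The Watts-Strogatz graph is a placeholder for an ANN-derived topology.
import networkx as nx

G = nx.connected_watts_strogatz_graph(n=200, k=8, p=0.1, seed=1)
R = nx.gnm_random_graph(200, G.number_of_edges(), seed=1)
R = R.subgraph(max(nx.connected_components(R), key=len))  # ensure connectivity

C, C_rand = nx.average_clustering(G), nx.average_clustering(R)
L, L_rand = (nx.average_shortest_path_length(G),
             nx.average_shortest_path_length(R))

sigma = (C / C_rand) / (L / L_rand)
print(f"C={C:.3f}  L={L:.2f}  sigma={sigma:.2f}  (sigma > 1 suggests small-world)")
```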
In this paper, we introduce a generic model for the event matching problem of content-based publish/subscribe systems over structured P2P overlays. In this model, we show that there are three methods (event-oriented, subscription-oriented, and hybrid) by which every matched (event, subscription) pair can meet in a system. By theoretically analyzing the inherent problems of both the event-oriented and subscription-oriented methods, we propose PEM (Popularity-based Event Matching), a variant of the hybrid method. PEM achieves a better trade-off between the event processing load and the subscription storage load of a system. PEM has been verified through both mathematical and simulation-based evaluation.
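A toy sketch of how a popularity-driven hybrid choice could look is given below. The threshold, the popularity table, and the direction of the decision rule are assumptions made for illustration; PEM's actual decision rule and its load analysis are in the paper.

```python
# Toy sketch of a hybrid matching choice: every (event, subscription) pair
# must meet at some rendezvous node in the DHT, and the method picks per
# topic whether events travel to stored subscriptions or vice versa.
# The threshold and the rule's direction are illustrative assumptions.
POPULARITY_THRESHOLD = 0.05  # assumed knob; the paper derives the trade-off

subscription_popularity = {"sports": 0.30, "weather": 0.12, "rare-topic": 0.01}

def placement(topic):
    """Decide how matching for `topic` is organised in the overlay."""
    if subscription_popularity.get(topic, 0.0) >= POPULARITY_THRESHOLD:
        # Many subscribers: store subscriptions at the rendezvous node and
        # route each event there once (subscription-oriented).
        return "subscription-oriented"
    # Few subscribers: routing subscriptions toward events keeps the
    # scattered storage load down (event-oriented).
    return "event-oriented"

for t in subscription_popularity:
    print(t, "->", placement(t))
```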
ISBN (Print): 9780769519197
I/O performance remains a weakness of parallel computing systems today. While this weakness is partly attributed to rapid advances in other system components, the I/O interfaces available to programmers and the I/O methods supported by file systems have traditionally not matched well with the types of I/O operations that scientific applications perform, particularly noncontiguous accesses. The MPI-IO interface allows rich descriptions of the I/O patterns of scientific applications, and implementations such as ROMIO have taken advantage of this ability while remaining limited by underlying file system methods. A method of noncontiguous data access, list I/O, was recently implemented in the Parallel Virtual File System (PVFS). We implement support for this interface in the ROMIO MPI-IO implementation. Through a suite of noncontiguous I/O tests, we compare ROMIO list I/O to ROMIO's current methods of noncontiguous access and find that the list I/O interface provides performance benefits in many noncontiguous cases.
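To illustrate the kind of noncontiguous pattern at stake, here is a minimal mpi4py sketch that builds a strided file view from a derived datatype, so one collective call expresses many (offset, length) pieces. This shows the general MPI-IO interface, not ROMIO's list I/O path; the file name and block geometry are arbitrary.

```python
# Minimal mpi4py sketch of a noncontiguous MPI-IO access: a strided file
# view built from a derived datatype lets one collective write express
# many pieces. Run under mpiexec with mpi4py installed.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank writes 4 blocks of 8 doubles, strided so ranks interleave.
blocks, blocklen = 4, 8
filetype = MPI.DOUBLE.Create_vector(blocks, blocklen, blocklen * comm.Get_size())
filetype.Commit()

buf = np.full(blocks * blocklen, rank, dtype="d")
fh = MPI.File.Open(comm, "strided.out", MPI.MODE_CREATE | MPI.MODE_WRONLY)
# The byte displacement offsets each rank's blocks within the shared view.
fh.Set_view(rank * blocklen * MPI.DOUBLE.Get_size(), MPI.DOUBLE, filetype)
fh.Write_all(buf)
fh.Close()
filetype.Free()
```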
We present a high-performance and memory-efficient hardware implementation of matrix multiplication for dense matrices of any size on FPGA devices. By applying a series of transformations and optimizations to the original serial algorithm, we obtain an I/O- and memory-optimized block algorithm for matrix multiplication on FPGAs. A linear array of processing elements (PEs) is proposed to implement this block algorithm. We show a significant reduction in hardware resource consumption compared to related work while increasing the clock frequency. Moreover, the memory requirement is reduced from O(S^2) to O(S), where S is the block size. Therefore, more PEs can be integrated into the same FPGA device.
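The blocking itself can be shown in a few lines of Python. The sketch below accumulates the product block by block with S x S tiles; the O(S) on-chip memory claim concerns how the linear PE array streams these blocks through the FPGA, which software can only hint at. Matrix sizes and the block size are illustrative.

```python
# Sketch of the S x S block algorithm behind the FPGA design: the product
# is accumulated one block-product at a time, so each step touches only a
# few S-sized working sets. NumPy shows the blocking, not the PE array.
import numpy as np

def block_matmul(A, B, S):
    n = A.shape[0]                      # assume n is a multiple of S
    C = np.zeros((n, n), dtype=A.dtype)
    for i in range(0, n, S):
        for j in range(0, n, S):
            for k in range(0, n, S):
                # One block-product per step; on the FPGA this maps onto
                # the linear array of processing elements.
                C[i:i+S, j:j+S] += A[i:i+S, k:k+S] @ B[k:k+S, j:j+S]
    return C

A = np.random.rand(128, 128)
B = np.random.rand(128, 128)
assert np.allclose(block_matmul(A, B, S=32), A @ B)
```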
In this paper, we present an automatic synthesis framework that maps loop nests to processor arrays with local memories on FPGAs. An affine transformation approach is first proposed to address the space-time mapping problem. A data-driven architecture model is then introduced to enable automatic generation of processor arrays, by extracting this model from the transformed loop nests. Techniques for memory allocation, communication generation, and control generation are presented. Synthesizable RTL code can be generated directly from the architecture model built with these techniques. A preliminary synthesis tool is implemented on top of PLUTO, an automatic polyhedral source-to-source transformation and parallelization framework.
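A minimal sketch of the affine space-time mapping step, with an invented schedule: each iteration (i, j) of a 2-D loop nest is assigned a firing time and a processor by affine functions, here time = i + j and processor = j. The framework in the paper derives legal mappings from the dependences rather than assuming one.

```python
# Sketch of affine space-time mapping: iteration (i, j) -> (time, proc)
# via affine functions. The schedule here is illustrative, not derived.
from collections import defaultdict

N = 4
schedule = defaultdict(list)
for i in range(N):
    for j in range(N):
        time, proc = i + j, j            # affine schedule and allocation
        schedule[time].append((proc, (i, j)))

for t in sorted(schedule):
    fired = ", ".join(f"PE{p}:{it}" for p, it in sorted(schedule[t]))
    print(f"t={t}: {fired}")
```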
Grid computing presents a new trend to distributed computation and Internet applications, which can construct a virtual single image of heterogeneous resources, provide uniform application interface and integrate wide...
ISBN (Print): 9781479909735
In online social networks, the social influence of a user reflects his or her reputation or importance, either in the whole network or relative to a personalized user. Social influence analysis can be used in many real applications, such as link prediction, friend recommendation, and personalized search. Personalized PageRank, which ranks nodes according to the probabilities that a random walk starting from a personalized node stops at each node, is one of the most popular metrics for influence analysis. In this paper, we study the problem of inverse influence in online social networks. Different from Personalized PageRank, the inverse influence for a personalized node ranks nodes according to the probabilities that random walks starting from all nodes stop at the personalized node within a limited number of steps. We propose two computation models for inverse influence, one random walk based and one path based. Both models have high computational complexity and cannot be used on large graphs, so we propose a Monte Carlo based approximation algorithm. Experiments on synthetic and real-world datasets show that our algorithm has equal or even better accuracy than related approaches in link prediction, and thus can be used for friend recommendation in online social networks.
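A minimal Monte Carlo sketch of the inverse-influence estimate, under assumptions: for each node u, it approximates the probability that a random walk from u reaches the personalized target within a limited number of steps. The toy graph, walk length, and trial count are illustrative, and the paper's estimator may differ (e.g., in its per-step stopping rule).

```python
# Monte Carlo sketch of inverse influence: for every node u, estimate the
# probability that a random walk started at u hits the personalized
# target within MAX_STEPS steps, then rank nodes by that estimate.
import random

graph = {0: [1, 2], 1: [2, 3], 2: [0, 3], 3: [4], 4: [0]}
TARGET, MAX_STEPS, TRIALS = 0, 5, 20_000

def hit_probability(start):
    hits = 0
    for _ in range(TRIALS):
        node = start
        for _ in range(MAX_STEPS):
            node = random.choice(graph[node])
            if node == TARGET:
                hits += 1
                break
    return hits / TRIALS

ranking = sorted(((hit_probability(u), u) for u in graph), reverse=True)
for p, u in ranking:
    print(f"node {u}: estimated inverse influence on {TARGET} = {p:.3f}")
```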