Search Results - Inner Mongolia University Library

26th European Conference on Artificial Intelligence, ECAI 2023

Authors: Wu, Lilei; Liu, Jie. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Changsha 410073, China

ISBN: (print) 9781643684369

Unsupervised visual representation learning has gained much attention from the computer vision community because of recent achievements in contrastive learning. Current work mainly adopts instance discrimination as the pretext task, which treats every single instance as a different class (negative) and uses a collection of data augmentation techniques to generate more examples (positive) for each class. The idea is straightforward and efficient but generally causes similar instances to be classified into different classes. This problem has been defined as 'class collision' in some previous works and is shown to hurt the representation ability. Motivated by this observation, we present a solution that addresses the issue by filtering similar negative examples from each mini-batch. Concretely, we model the problem as a Determinantal Point Process (DPP) so that similar instances can be filtered stochastically, and diverse samples are expected to be sampled for contrastive training. In addition, we introduce a priority term for each instance, which indicates the hardness of its positives, so that instances with more hard positives are more likely to be sampled and contribute to the optimization. Our sampling can be efficiently implemented in a feed-forward manner and further accelerated by our encouraged complement DPP. Extensive experimental results demonstrate the superiority of our approach over the standard setup of contrastive learning. © 2023 The Authors.

ParTransgrid: A scalable parallel preprocessing tool for unstructured-grid cell-centered computational fluid dynamics applications

Authors: Zhang, Jian; Liu, Jie; Zhou, Naichun; Tang, Jing; He, Xie; Chen, Jianqiang. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China; Computational Aerodynamics Institute, China Aerodynamics Research and Development Center, Mianyang, China

The development of a basic scalable preprocessing tool is key to accelerating the entire computational fluid dynamics (CFD) workflow toward the exascale computing era. In this work, a parallel preprocessing tool, called ParTransgrid, is developed to translate general grid formats such as the CFD General Notation System into an efficient distributed mesh data format for large-scale parallel computing. Through ParTransgrid, a flexible face-based parallel unstructured mesh data structure designed in Hierarchical Data Format can be obtained to support various cell-centered unstructured CFD solvers. The parallel preprocessing operations include parallel grid I/O, parallel mesh partition, and parallel mesh migration, which are linked together to resolve the run-time and memory consumption bottlenecks for increasingly large grid sizes. An inverted index search strategy combined with a multi-master-slave communication paradigm is proposed to improve the pairwise face matching efficiency and reduce the communication overhead when constructing the distributed sparse graph in the parallel mesh partition phase. We also present a simplified owner update rule that speeds up, by an order of magnitude, the migration of raw partition boundaries and the building of the shared faces/nodes communication mapping list between new sub-meshes. Experimental results reveal that ParTransgrid can be easily scaled to billion-level grid CFD applications, and the preparation time for parallel computing with hundreds of thousands of cores is reduced to a few minutes. © 2021 John Wiley & Sons, Ltd.

Keywords: Computational fluid dynamics
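As a rough illustration of the inverted-index idea mentioned in the abstract above, the sketch below pairs up faces shared by two cells by keying each face on its sorted node IDs. It is a serial toy and not ParTransgrid's implementation: the function name `match_faces`, the input layout, and the two-triangle mesh are assumptions made for this example, and the multi-master-slave communication is not modeled.

```python
from collections import defaultdict


def match_faces(cells):
    """Pair cells that share a face using an inverted index.

    cells: dict mapping cell id -> list of faces, each face a tuple of node ids.
    Returns (interior, boundary):
      interior maps a face key to the two cells that share it,
      boundary maps a face key to the single cell that owns it.
    """
    # Inverted index: canonical face key (sorted node ids) -> owning cells.
    index = defaultdict(list)
    for cell_id, faces in cells.items():
        for face in faces:
            index[tuple(sorted(face))].append(cell_id)

    interior = {key: tuple(owners) for key, owners in index.items() if len(owners) == 2}
    boundary = {key: owners[0] for key, owners in index.items() if len(owners) == 1}
    return interior, boundary


# Hypothetical 2D mesh: two triangles sharing the edge between nodes 1 and 2.
example_cells = {
    0: [(0, 1), (1, 2), (2, 0)],
    1: [(1, 3), (3, 2), (2, 1)],
}
interior_faces, boundary_faces = match_faces(example_cells)
print(interior_faces)
print(sorted(boundary_faces))
```

Each face is visited once, so the matching cost grows linearly with the number of faces rather than with the number of face pairs, which is the usual motivation for an index-based lookup.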

A Float-Point Arithmetic Logic Unit for High Performance RISC-V Processor

6th International Conference on Frontier Technologies of Information and Computer, ICFTIC 2024

Authors: Wei, Chunyuan; Yuan, Chuan; Qi, Yu; Xu, Bangjian; Niu, Xin. Affiliations: College of Computer Science and Electronic Engineering, Hunan University, Changsha, China; College of Electrical and Information Engineering, Hunan University, Changsha, China; College of Computer, National University of Defense Technology, Science and Technology on Parallel and Distributed Laboratory, Changsha, China

ISBN: (print) 9798331541750

With the rapid advancement of artificial intelligence, chips have become increasingly important. The emerging RISC-V instruction set is gradually providing powerful computing support for this field. In this context, and in line with the computing requirements of deep learning, this paper presents the design of a high-performance floating-point arithmetic logic unit (FALU) that supports calculations on double-precision, single-precision, half-precision, and Bfloat16 data. The design is based on a single-channel algorithm with merged rounding. It improves and implements a composite adder that combines high and low bits. It also proposes a tree-like floating-point comparator based on the Kogge-Stone parallel prefix network. To ensure that the FALU components meet performance requirements, we perform functional verification in the Vivado simulation environment. Operating at 1.47 GHz under the 28 nm CMOS process, the components achieve the predetermined performance indicators. © 2024 IEEE.

Keywords: Computer circuits
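The parallel-prefix comparison idea named in the abstract above can be sketched in software as follows. This is a hypothetical bit-level model of a Kogge-Stone-style comparator for unsigned magnitudes only; the name `prefix_compare`, the (greater, equal) signal pair, and the omission of sign, NaN, and format handling are assumptions for illustration and not the paper's FALU design.

```python
def prefix_compare(a, b, width=8):
    """Compare two unsigned integers with a Kogge-Stone-style prefix network.

    Returns (greater, equal) as 0/1 flags for a > b and a == b.
    """
    # Per-bit signals: gt[i] = a_i AND NOT b_i, eq[i] = a_i XNOR b_i.
    gt = [((a >> i) & 1) & (1 - ((b >> i) & 1)) for i in range(width)]
    eq = [1 - (((a >> i) ^ (b >> i)) & 1) for i in range(width)]

    # After the stage with distance d, position i summarizes bits i .. i+2d-1
    # (clipped at the top bit). The more significant group takes priority.
    d = 1
    while d < width:
        new_gt, new_eq = gt[:], eq[:]
        for i in range(width - d):
            new_gt[i] = gt[i + d] | (eq[i + d] & gt[i])
            new_eq[i] = eq[i + d] & eq[i]
        gt, eq = new_gt, new_eq
        d *= 2

    # Position 0 now covers the whole word.
    return gt[0], eq[0]


print(prefix_compare(5, 3, width=4))
print(prefix_compare(3, 5, width=4))
print(prefix_compare(7, 7, width=4))
```

The loop runs about log2(width) stages, and every update inside one stage is independent of the others, which is what allows such a network to be evaluated in parallel in hardware.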

A relative coordinate based distributed sparse-preserving matrix factorization approach towards self-stabilizing network location service - Withdrawn

IEEE Transactions on Parallel and Distributed Systems, 2023, pp. 1-1

Authors: Fu, Yongquan; Wang, Yijie; Pei, Xiaoqiang; Li, Xiaoyong. Affiliation: Science and Technology Laboratory of Parallel and Distributed Processing, National University of Defense Technology, China

Withdrawn. IEEE

Keywords: Linear matrix inequalities

High-performance Network Traffic Classification Based on Graph Neural Network

6th IEEE Information Technology, Networking, Electronic and Automation Control Conference, ITNEC 2023

Authors: Pang, Bo; Fu, Yongquan; Ren, Siyuan; Jia, Yan. Affiliations: Harbin Institute of Technology, College of Computer Science and Technology, Shenzhen, China; National University of Defense Technology, National Key Laboratory for Parallel and Distributed Processing, College of Computer, Changsha, China; Peng Cheng Laboratory, Shenzhen, China

ISBN: (print) 9781665460033

Network traffic classification is crucial for network security and network management and is one of the most important network tasks. Current state-of-the-art traffic classifiers are based on deep learning models that automatically extract features from packet streams. Unfortunately, current approaches fail to effectively combine the structural information of traffic packets with the content features of the packets, resulting in limited classification accuracy. In this paper, we propose a graph neural network model for network traffic classification that effectively captures the interaction features of packets in traffic. First, we design a graph structure for packet flows that holds the interaction information between packets and embeds both packet contents and sequence relationships into a unified graph. Second, we propose a graph neural network framework for graph classification that automatically learns the structural features of the packet flows together with the packet features. Extensive evaluation results on real-world traffic data show that the proposed model improves the prediction accuracy by 2% to 37% for malicious traffic classification. © 2023 IEEE.

Keywords: Graph neural networks

Optimizing Depthwise Convolutions on ARMv8 Architecture

23rd International Conference on Parallel and Distributed Computing, Applications, and Technologies, PDCAT 2022

Authors: Hao, Ruochen; Wang, Qinglin; Yin, Shangfei; Zhou, Tianyang; Zhang, Qingyang; Mei, Songzhu; Shen, Siqi; Liu, Jie. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; College of Computer, National University of Defense Technology, Changsha 410073, China; Xiamen University, Xiamen, China

ISBN: (print) 9783031299261

Depthwise convolutions are widely used in lightweight convolutional neural networks (CNNs). Unlike classic convolutions, whose performance is bounded by arithmetic operations, depthwise convolutions are mainly bounded by memory access, so direct algorithms are often more efficient than indirect ones (matrix multiplication-, Winograd-, and FFT-based convolutions), which incur additional memory accesses. However, the existing direct implementations of depthwise convolutions on ARMv8 architectures strike a poor trade-off in register-level reuse among different tensors, which usually leads to sub-optimal performance. In this paper, we propose a new direct implementation of depthwise convolutions by means of implicit padding, register tiling, etc. Compared to the existing ones, our new implementation incurs much less communication overhead between registers and cache. Experimental results on two ARMv8 CPUs show that our implementation delivers an average 4.88× performance improvement over the existing direct ones in open-source libraries. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.

Keywords: Convolution
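To make the terms in the abstract above concrete, here is a minimal sketch of a direct depthwise convolution with implicit zero padding: each channel is convolved with its own filter, and out-of-range taps are skipped instead of reading from a materialized padded copy. The function name `depthwise_conv2d`, the nested-list layout, and the stride of 1 are assumptions for this example; register tiling and the ARMv8-specific optimizations from the paper are not modeled.

```python
def depthwise_conv2d(x, filters, pad=1):
    """Direct depthwise convolution (stride 1) with implicit zero padding.

    x:       C x H x W input as nested lists.
    filters: C x KH x KW, one filter per input channel.
    Returns a C x OH x OW output as nested lists.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    kh, kw = len(filters[0]), len(filters[0][0])
    out_h = height + 2 * pad - kh + 1
    out_w = width + 2 * pad - kw + 1

    out = [[[0] * out_w for _ in range(out_h)] for _ in range(channels)]
    for c in range(channels):            # channels never mix in a depthwise conv
        for r in range(out_h):
            for s in range(out_w):
                acc = 0
                for i in range(kh):
                    y = r + i - pad
                    if y < 0 or y >= height:
                        continue         # tap falls in the padding: contributes zero
                    for j in range(kw):
                        col = s + j - pad
                        if 0 <= col < width:
                            acc += x[c][y][col] * filters[c][i][j]
                out[c][r][s] = acc
    return out


# Hypothetical two-channel input with a box filter and an identity filter.
image = [[[1, 2], [3, 4]], [[1, 0], [0, 0]]]
kernels = [
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
]
print(depthwise_conv2d(image, kernels, pad=1))
```

Skipping the padded taps avoids writing and re-reading a padded tensor, which is the kind of extra memory traffic the abstract says matters for a memory-bound operator.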

Transformer in reinforcement learning for decision-making: a survey

Frontiers of Information Technology & Electronic Engineering, 2024, Vol. 25, No. 6, pp. 763-790

Authors: Weilin YUAN; Jiaxing CHEN; Shaofei CHEN; Dawei FENG; Zhenzhen HU; Peng LI; Weiwei ZHAO. Affiliations: College of Information and Communication, National University of Defense Technology, Wuhan 430014, China; College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410072, China; Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha 410072, China

Reinforcement learning (RL) has become a dominant decision-making paradigm and has achieved notable success in many real-world ***, deep neural networks play a crucial role in unlocking RL’s potential in large-scale decision-making *** by current major success of Transformer in natural language processing and computer vision, numerous bottlenecks have been overcome by combining Transformer with RL for *** paper presents a multiangle systematic survey of various Transformer-based RL (TransRL) models applied in decision-making tasks, including basic models, advanced algorithms, representative implementation instances, typical applications, and known *** work aims to provide insights into problems that inherently arise with the current RL approaches, and examines how we can address them with better TransRL *** our knowledge, we are the first to present a comprehensive review of the recent Transformer research developments in RL for *** hope that this survey provides a comprehensive review of TransRL models and inspires the RL community in its pursuit of future *** keep track of the rapid TransRL developments in the decision-making domains, we summarize the latest papers and their open-source implementations at https://***/williamyuanv0/Transformer-in-Reinforcement-Learning-for-Decision-Making-A-Survey.

Keywords: Transformer; Reinforcement learning (RL); Decision-making (DM); Deep neural network (DNN); Multi-agent reinforcement learning (MARL); Meta-reinforcement learning (Meta-RL)

Model Provenance Management in MLOps Pipeline

8th International Conference on Computing and Data Engineering, ICCDE 2022

Authors: Mei, Songzhu; Liu, Cong; Wang, Qinglin; Su, Huayou. Affiliations: Science and Technology on Parallel and Distributed Laboratory, National University of Defense Technology, China; Information Center, CMC Logistic Support Department, China

ISBN: (print) 9781450395717

Machine learning engineering is an important technology that has attracted the attention of academia and industry in the past two years. For AI to become a productive force in enterprises, it must be engineered to solve the problems of model development, deployment, management, prediction, and other aspects of lifecycle management, and the MLOps system is the technology of greatest concern in the current machine learning engineering process. This paper starts from the training evolution process of intelligent computing models in machine learning engineering, focuses on the influence of data, training algorithms, and requirement indicators on the model evolution process, and gives the definition of Model Provenance and its evolutionary correlation elements. Based on the concept of model lineage, this paper designs a method to manage the model evolution process in the MLOps system and implements the related system. © 2022 Association for Computing Machinery. All rights reserved.

Keywords: Life cycle

Adaptive Self-Supervised Continual Learning

26th European Conference on Artificial Intelligence, ECAI 2023

Authors: Wu, Lilei; Wang, Zhen; Liu, Jie. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Changsha 410073, China; Tsinghua University, Beijing 100084, China

ISBN: (print) 9781643684369

Continual Learning (CL) studies the problem of developing a robust model that can learn new tasks while retaining previously learned knowledge. However, the current CL methods exclusively focus on data with annotations, disregarding that unlabelled data is the mainstream in real-world applications. To close this research gap, this study concentrates on continual self-supervised learning, which is plagued by challenges of memory over-fitting and class imbalance. Besides, these challenges are exacerbated throughout incremental training. Aimed at addressing these challenges from both loss and data perspectives, we introduce a framework, Adaptive Self-supervised Continual Learning (ASCL). Specifically, we devise an Adaptive Sharpness-Aware Minimization (ASAM) module responsible for identifying flatter local minima in the loss landscape with a smaller memory over-fitting risk. Additionally, we design an Adaptive Memory Enhancement (AME) module responsible for rebalancing self-supervised loss with new and old tasks from a data perspective. Finally, the adaptive mechanisms in AME and ASAM modules dynamically adjust the loss landscape sharpness and memory enhancement strength with the feedback of intermediate training results. The results of our extensive experiments demonstrate the state-of-the-art performance of our methods in continual self-supervised learning scenarios across multiple datasets. © 2023 The Authors.

Evaluating matrix multiplication-based convolution algorithm on multi-core digital signal processors

Guofang Keji Daxue Xuebao/Journal of National University of Defense Technology, 2023, Vol. 45, No. 1, pp. 86-94

Authors: Wang, Qinglin; Pei, Xiangdong; Liao, Linyu; Wang, Haoxu; Li, Rongchun; Mei, Songzhu; Li, Dongsheng. Affiliations: College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China

The matrix multiplication-based convolutional algorithm, which can efficiently implement convolutions with different parameters, is the first choice of convolution performance optimization for a given chip. Based on the architecture of Phytium heterogeneous multi-core DSPs (digital signal processors) developed by National University of Defense Technology and the characteristics of the matrix multiplication-based convolutional algorithm, a parallel implementation of the matrix multiplication-based convolutional algorithm (called ftmEConv) for different convolutions on multi-core DSPs was proposed. The ftmEConv consists of four parallelized parts (input feature maps transformation, filter transformation, matrix multiplication, and output feature maps transformation), all of which were optimized for multi-core DSPs, and the performance of each part was improved by effectively exploiting the potential of all functional units in DSP cores. The experimental results demonstrate that ftmEConv achieves computational efficiency of up to 42.90%. Compared with other implementations of the matrix multiplication-based convolutional algorithm on heterogeneous chips, ftmEConv gets a speedup of up to 7.79 times. © 2023 National University of Defense Technology. All rights reserved.

Keywords: Convolution
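The four stages listed in the abstract above (input feature maps transformation, filter transformation, matrix multiplication, output feature maps transformation) correspond to the generic im2col approach, which can be sketched serially as follows. This is a plain-Python illustration under assumed conventions (stride 1, no padding, nested lists, hypothetical function names); it is not ftmEConv and does not model the DSP-specific parallelization.

```python
def im2col(x, kh, kw):
    """Input transformation: unfold a C x H x W input into a
    (C*kh*kw) x (OH*OW) matrix, stride 1, no padding."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    out_h, out_w = height - kh + 1, width - kw + 1
    cols = []
    for c in range(channels):
        for i in range(kh):
            for j in range(kw):
                cols.append([x[c][r + i][s + j]
                             for r in range(out_h) for s in range(out_w)])
    return cols, out_h, out_w


def matmul(a, b):
    """Naive matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def conv2d_im2col(x, filters):
    """Convolution via matrix multiplication; filters is K x C x kh x kw."""
    kh, kw = len(filters[0][0]), len(filters[0][0][0])
    cols, out_h, out_w = im2col(x, kh, kw)
    # Filter transformation: flatten each filter in the same (c, i, j) order.
    fmat = [[v for channel in f for row in channel for v in row] for f in filters]
    flat = matmul(fmat, cols)
    # Output transformation: fold each row back into an OH x OW feature map.
    return [[row[r * out_w:(r + 1) * out_w] for r in range(out_h)] for row in flat]


# Hypothetical single-channel input and one 2 x 2 filter.
feature_map = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
weights = [[[[1, 0], [0, 1]]]]
print(conv2d_im2col(feature_map, weights))
```

The unfolded matrix duplicates input elements, trading extra memory for the ability to reuse a single tuned matrix-multiplication kernel across convolution shapes.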
