检索结果-内蒙古大学图书馆

IEEE International Conference on Big Data and Cloud Computing (BdCloud)

作者： Jintao Peng Jie Liu Yi Dai Min Xie Chunye Gong National University of Defense Technology ChangSha China Science and Technology on Parallel and Distributed Processing Laboratory NUDT Changsha China Changsha and Laboratory of Software Engineering for Complex System NUDT Changsha China

All-to-all communication has a wide range of ap-plications in parallel applications like FFT. On most supercom-puters, there are multiple cores in a node. Message aggregation is an efficient method for smaller messages. Using multi-leader to aggregate message show significant improvement in intra-node overhead. However, compared to one-leader aggregation, existing multi-leader design incurs more message count and less aggregated message size. This paper proposes an Overlapped Multi-worker Multi-port all-to-all (OVALL) to scale the message size and parallelism of the aggregation algorithm. The algorithm exploits the all-to-all multi-core parallelism, concurrent commu-nication, and overlapping capabilities. Experiment results show that, compared to systems built-in MPI, OVALL'implementation achieves up to 5.9x or 18x speedup compared to MPI on different HPC systems. For the Fast Fourier Transform (FFT) application, OVALL is up to 2.7x (8192 cores, system A) or 5.6x (4800 cores, system B) faster compared to built-in MPI on peak performance.

关键词： Multicore processing Fast Fourier transforms Scalability Aggregates Supercomputers Libraries Pipeline processing

来源：评论

学校读者我要写书评

暂无评论

A Memory Saving Mechanism Based on Data Transferring for Pipeline parallelism

A Memory Saving Mechanism Based on Data Transferring for Pip...

引用

IEEE International Conference on Big Data and Cloud Computing (BdCloud)

作者： Wei Jiang Rui Xu Sheng Ma Qiong Wang Xiang Hou Hongyi Lu Science and Technology on Parallel and Distributed Processing Laboratory College of Computer National University of Defense Technology Changsha China

The correctness and robustness of the neural network model are usually proportional to its depth and width. Currently, the neural network models become deeper and wider to cope with complex applications, which leads to high memory capacity requirement and computer capacity requirements of the training process. The multi-accelerator parallelism is a promising choice for the two challenges, which deploys multiple accelerators in parallel for training neural networks. Among them, the pipeline parallel scheme has a great advantage in training speed, but its memory capacity requirements are relatively higher than other parallel schemes. Aiming at solving this challenge of pipeline parallel scheme, we propose a data transfer mechanism, which effectively reduces the peak memory usage of the training process by real-time data transferring. In the experiment, we implement our design and apply it to Pipedream, a mature pipeline parallel scheme. The memory requirement of training process is reduced by up to 48.5%, and the speed loss is kept within a reasonable range.

关键词： Training Computational modeling Pipelines Neural networks Memory management Bandwidth parallel processing

来源：评论

学校读者我要写书评

暂无评论

Modeling full information with graph network for joint entity-relation extraction 4

Modeling full information with graph network for joint entit...

引用

4th International Conference on Algorithms, Computing and Artificial Intelligence, ACAI 2021

作者： Wan, Qian Wei, Luona Chen, Xinhai Liu, Jie Science and Technology on Parallel and Distributed Processing Laboratory National University of Defense Technology Hunan Changsha China College of Systems Engineering National University of Defense Technology Hunan Changsha China Laboratory of Software Engineering for Complex Systems National University of Defense Technology Hunan Changsha China

ISBN: (纸本)9781450385053

Fully capturing contextual information and analyzing the association between entity semantics and type is helpful for joint extraction task: 1) The context can reflect the part of speech and semantics of entity. 2) The entity type is closely related to the relation between entities. Previous research used to simply embed the contextual information into shallow layer of the model, ignoring the association between entity semantics and type. In this paper, we propose a graph network with full-information modeling to explicitly model different-level information in the text. The contextual information of entity is dynamically embedded in each span representation to improve the reasoning ability. To capture the fine-grained association between the semantics and type of entity, the graph network uses the feature of entity types to generate edge information between different nodes. Experimental results show that our model outperforms previous models on the CoNLL04 dataset and obtains competitive results on the SciERC dataset in both entity recognition and relation extraction. Extensive additional experiments further verify the effectiveness of the model. © 2021 ACM.

关键词： Natural language processing systems

来源：评论

学校读者我要写书评

暂无评论

Co-designing the Topology/Algorithm to Accelerate distributed Training

Co-designing the Topology/Algorithm to Accelerate Distribute...

引用

IEEE International Conference on Big Data and Cloud Computing (BdCloud)

作者： Xiang Hou Rui Xu Sheng Ma Qiong Wang Wei Jiang Hongyi Lu Science and Technology on Parallel and Distributed Processing Laboratory College of Computer National University of Defense Technology Changsha China

With the development of Deep Learning (DL), Deep Neural Network (DNN) models have become more complex. At the same time, the development of the Internet makes it easy to obtain large data sets for DL training. Large-scale model parameters and training data enhance the level of AI by improving the accuracy of DNN models. But on the other hand, they also present more severe challenges to the hardware training platform because training a large model needs a lot of computing and memory resources that can easily exceed the capacity of a single processor. In this context, integrating more processors on a hierarchical system to conduct distributed training is a direction for the development of training platforms. In distributed training, collective communication operations (including all-to-all, all-reduce, and all-gather) take up a lot of training time, making the interconnection network between computing nodes one of the most critical factors affecting the system performance. The hierarchical torus topology, combined with the Ring All-Reduce collective communication algorithm, is one of the current mainstream distributed interconnection networks. However, we believe that its communication performance is not the best. In this work, we first designed a new intra-package communication topology, i.e. the switch-based fully connected topology, which shortens the time consumed by cross-node communication. Then, considering the characteristics of this topology, we carefully devised more efficient all-reduce and all-gather communication algorithms. Finally, combined with the torus topology, we implemented a novel distributed DL training platform. Compared with the hierarchical torus, our platform improves communication efficiency and provides 1.16-2.68 times speedup in distributed training of DNN models.

关键词： Training Deep learning Network topology Multiprocessor interconnection Computational modeling Training data Switches

来源：评论

学校读者我要写书评

暂无评论

科学研究的第五范式——以智能驱动的材料设计为例

引用

Engineering 2023年第5期24卷 126-137,I0003,I0004页

作者： Can Leng Zhuo Tang Yi-Ge Zhou Zean Tian Wei-Qing Huang Jie Liu Keqin Li Kenli Li Science and Technology on Parallel and Distributed Processing Laboratory National University of Defense TechnologyChangsha 410073China Laboratory of Software Engineering for Complex Systems National University of Defense TechnologyChangsha 410073China National Supercomputing Center in Changsha Changsha 410082China College of Computer Science and Electronic Engineering Hunan UniversityChangsha 410082China Institute of Chemical Biology and Nanomedicine State Key Laboratory of Chemo/Biosensing and ChemometricsCollege of Chemistry and Chemical EngineeringHunan UniversityChangsha 410082China Department of Applied Physics School of Physics and ElectronicsHunan UniversityChangsha 410082China Department of Computer Science State University of New YorkNew PaltzNY 12561USA

科学正在进入一个新时代——第五范式——它被认为是知识整合到不同领域的主要特征,是基于无所不在的机器学习系统的计算社区中智能驱动的工作。在此,我们通过在天河一号超级计算机系统上构建的催化材料专门设计的典型平台案例,生动地... 详细信息

科学正在进入一个新时代——第五范式——它被认为是知识整合到不同领域的主要特征,是基于无所不在的机器学习系统的计算社区中智能驱动的工作。在此,我们通过在天河一号超级计算机系统上构建的催化材料专门设计的典型平台案例,生动地阐明了第五范式的本质,旨在促进第五范式在其他领域的培养。第五范式平台主要包括模型自动构建(原始数据提取)、指纹自动构建(神经网络特征选择)以及跨学科知识串联的重复迭代(“火山图”)。与分解一起进行的是对迭代中实现的体系结构的性能评估。通过讨论,第五范式的智能驱动平台可以极大地简化和改进研究中极其繁琐和具有挑战性的工作,并通过补偿机器学习中缺少样本和替代一些由于计算资源不足而导致的数值计算来实现数值计算与机器学习之间的相互反馈,从而加速探索过程。在数据驱动的学科中,跨学科专家的协同作用和对动态数据需求的急剧增长仍然是一个挑战。我们相信,对第五范式平台的一瞥可以为其在其他领域的应用铺平道路。

关键词：机器学习自动构建天河一号数据驱动神经网络动态数据知识整合科学研究

来源：评论

学校读者我要写书评

暂无评论

Surrogate Supervision-based Deep Weakly-supervised Anomaly Detection

Surrogate Supervision-based Deep Weakly-supervised Anomaly D...

引用

IEEE International Conference on Data Mining Workshops (ICDM Workshops)

作者： Zhiyue Wu Hongzuo Xu Yijie Wang Yongjun Wang Science and Technology on Parallel and Distributed Processing Laboratory College of Computer National University of Defense Technology Changsha China

ISBN: (纸本)9781665424288

Many anomaly detection applications can provide partially observed anomalies, but only limited work is for this setting. Additionally, a number of anomaly detectors focus on learning a particular model of normal/abnormal class. However, the intra-class model might be too complicated to be accurately learned. It is still a non-trivial task to handle data with anomalies/inliers in skewed and heterogeneous distributions. To address these problems, this paper proposes an anomaly detection method to leverage Partially Labeled anomalies via Surrogate supervision-based Deviation learning (denominated PLSD). The original supervision (i.e., known anomalies and a set of explored inliers) is transferred to semantic-rich surrogate supervision signals (i.e., anomaly-inlier and inlier-inlier class) via vector concatenation. Then different relationships and interactions between anomalies and inliers are directly and efficiently learned thanks to the neural network’s connection property. Anomaly scoring is processed via the trained network and the high-efficacy inliers. Extensive experiments show that PLSD significantly prevails state-of-the-art semi/weakly-supervised anomaly detectors.

关键词： Learning systems Conferences Detectors Gaussian distribution Benchmark testing Task analysis Anomaly detection

来源：评论

学校读者我要写书评

暂无评论

Word Embedding-based Context-sensitive Network Flow Payload Anomaly Detection

Word Embedding-based Context-sensitive Network Flow Payload ...

引用

International Conference on Applied Machine Learning (ICAML)

作者： Yizhou Li Yijie Wang Li Cheng Hongzuo Xu Science and Technology on Parallel and Distributed Processing Laboratory College of Computer National University of Defense Technology Changsha China

ISBN: (纸本)9781665421263

Payload anomaly detection can discover malicious behaviors hidden in network packets. It is hard to handle payload due to its various possible characters and complex semantic context, and thus identifying abnormal payload is also a non-trivial task. Prior art only uses the n-gram language model to extract features, which directly leads to ultra-high-dimensional feature space and also fails to capture the context semantics fully. Accordingly, this paper proposes a word embedding-based context-sensitive network flow payload anomaly detection method (termed WECAD). First, WECAD obtains the initial feature representation of the payload through the word embedding-based method. Then, we propose a corpus pruning algorithm, which applies the cosine similarity clustering and frequency distribution to prune inconsequential characters. We only keep the essential characters to reduce the calculation space. Subsequently, we propose a context learning algorithm. It employs the co-occurrence matrix transformation technology and introduces the backward step size to consider the order relationship of essential characters. Comprehensive experiments on real-world intrusion detection datasets validate the effectiveness of our method.

关键词： Art Semantics Clustering algorithms Intrusion detection Machine learning Feature extraction Task analysis

来源：评论

学校读者我要写书评

暂无评论

A survey on deep learning approaches for data integration in autonomous driving system

arXiv

引用

arXiv 2023年

作者： Zhu, Xi Wang, Likang Zhou, Caifa Cao, Xiya Gong, Yue Chen, Lei Riemann Laboratory Huawei Technologies 2012 Laboratories China Department of Computer Science and Engineering Hong Kong University of Science and Technology Hong Kong Parallel Distributed Computing Laboratory Huawei Technologies 2012 Laboratories China

The perception module of self-driving vehicles relies on a multi-sensor system to understand its environment. Recent advancements in deep learning have led to the rapid development of approaches that integrate multi-sensory measurements to enhance perception capabilities. This paper surveys the latest deep learning integration techniques applied to the perception module in autonomous driving systems, categorizing integration approaches based on "what, how, and when to integrate." A new taxonomy of integration is proposed, based on three dimensions: multi-view, multi-modality, and multi-frame. The integration operations and their pros and cons are summarized, providing new insights into the properties of an "ideal" data integration approach that can alleviate the limitations of existing methods. After reviewing hundreds of relevant papers, this survey concludes with a discussion of the key features of an optimal data integration approach. © 2023, CC BY-NC-SA.

关键词： Autonomous vehicles

来源：评论

学校读者我要写书评

暂无评论

Fden: Mining Effective Information of Features in Detecting Network Anomalies

Fden: Mining Effective Information of Features in Detecting ...

引用

IEEE International Conference on Acoustics, Speech and Signal processing

作者： Bin Li Yijie Wang Mingyu Liu Kele Xu Zhongyang Wang Li Cheng Yizhou Li Science and Technology on Parallel and Distributed Processing Laboratory College of Computer National University of Defense Technology Changsha China

Network anomaly detection is important for detecting and reacting to the presence of network attacks. In this paper, we propose a novel method to effectively leverage the features in detecting network anomalies, named FDEn, consisting of flow-based Feature Derivation (FD) and prior knowledge incorporated Ensemble models (En pk). To mine the effective information in features, 149 features are derived to enrich the feature set of the original data with covering more characteristics of network traffic. To leverage these features effectively, an ensemble model En pk, including CatBoost and XGBoost, based on the bagging strategy is proposed to first detect anomalies by combining numerical features and categorical features. And then, En pk adjusts the predicted label of specific data by incorporating the prior knowledge of network security. We conduct empirically experiments on the data set provided by the Network Anomaly Detection Challenge (NADC), in which we obtain average improvement up to 61.6%, 31.7%, 50.2%, and 45.0%, in terms of the cost score, precision, recall and F1-score, respectively.

关键词： Knowledge engineering Conferences Telecommunication traffic Signal processing Feature extraction Numerical models Security

来源：评论

学校读者我要写书评

暂无评论

Deep Isolation Forest for Anomaly Detection

arXiv

引用

arXiv 2022年

作者： Xu, Hongzuo Pang, Guansong Wang, Yijie Wang, Yongjun The College of Computer National University of Defense Technology Changsha410073 China The Science and Technology on Parallel and Distributed Processing Laboratory China The School of Computing and Information Systems Singapore Management University 178902 Singapore

Isolation forest (iForest) has been emerging as arguably the most popular anomaly detector in recent years due to its general effectiveness across different benchmarks and strong scalability. Nevertheless, its linear axis-parallel isolation method often leads to (i) failure in detecting hard anomalies that are difficult to isolate in high-dimensional/non-linear-separable data space, and (ii) notorious algorithmic bias that assigns unexpectedly lower anomaly scores to artefact regions. These issues contribute to high false negative errors. Several iForest extensions are introduced, but they essentially still employ shallow, linear data partition, restricting their power in isolating true anomalies. Therefore, this paper proposes deep isolation forest. We introduce a new representation scheme that utilises casually initialised neural networks to map original data into random representation ensembles, where random axis-parallel cuts are subsequently applied to perform the data partition. This representation scheme facilitates high freedom of the partition in the original data space (equivalent to non-linear partition on subspaces of varying sizes), encouraging a unique synergy between random representations and random partition-based isolation. Extensive experiments show that our model achieves significant improvement over state-of-the-art isolation-based methods and deep detectors on tabular, graph and time series datasets;our model also inherits desired scalability from iForest. © 2022, CC BY.

关键词： Anomaly detection

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：