All-to-all communication has a wide range of applications in parallel programs such as FFT. On most supercomputers, each node contains multiple cores, and message aggregation is an efficient method for smaller messages. Using multiple leaders to aggregate messages shows significant improvement in intra-node overhead; however, compared with one-leader aggregation, existing multi-leader designs incur a higher message count and smaller aggregated message sizes. This paper proposes an Overlapped Multi-worker Multi-port all-to-all (OVALL) algorithm to scale both the message size and the parallelism of the aggregation algorithm. The algorithm exploits the multi-core parallelism, concurrent communication, and overlapping capabilities available to all-to-all. Experimental results show that OVALL's implementation achieves up to 5.9x or 18x speedup over the system's built-in MPI on two different HPC systems. For the Fast Fourier Transform (FFT) application, OVALL is up to 2.7x (8192 cores, system A) or 5.6x (4800 cores, system B) faster than built-in MPI at peak performance.
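The abstract does not spell out OVALL's algorithm, but the one-leader aggregation baseline it improves on is easy to illustrate. The sketch below simulates it in plain Python: each node's leader bundles all messages bound for a remote node into one aggregated send, so cross-node message count drops from (cores per node)^2 per node pair to 1. All names are illustrative.

```python
# Minimal sketch of one-leader message aggregation for all-to-all,
# the baseline the multi-leader/OVALL designs improve on.
# Processes and network sends are simulated with plain Python lists.

def aggregated_all_to_all(messages, procs_per_node):
    """messages[src][dst] -> payload; returns (delivered, inter_node_sends).

    Phase 1: each process hands its outgoing messages to its node leader.
    Phase 2: leaders exchange one aggregated message per ordered node pair.
    Phase 3: the destination leader scatters payloads to local processes.
    """
    n_procs = len(messages)
    n_nodes = n_procs // procs_per_node
    node_of = lambda p: p // procs_per_node

    inter_node_sends = 0
    delivered = [[None] * n_procs for _ in range(n_procs)]
    for a in range(n_nodes):
        for b in range(n_nodes):
            # Leader of node a bundles everything bound for node b.
            bundle = [(src, dst, messages[src][dst])
                      for src in range(n_procs) if node_of(src) == a
                      for dst in range(n_procs) if node_of(dst) == b]
            if a != b:
                inter_node_sends += 1  # one aggregated network message
            # Same-node traffic is routed locally by the leader (no send).
            for src, dst, payload in bundle:
                delivered[dst][src] = payload
    return delivered, inter_node_sends
```

With 4 processes on 2 nodes, the naive all-to-all would cross the network 8 times, while the aggregated version needs only 2 inter-node sends (one per ordered node pair).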
The accuracy and robustness of a neural network model are usually proportional to its depth and width. Neural network models are becoming deeper and wider to cope with complex applications, which leads to high memory and compute capacity requirements for the training process. Multi-accelerator parallelism, which deploys multiple accelerators to train a neural network in parallel, is a promising answer to both challenges. Among these schemes, pipeline parallelism has a great advantage in training speed, but its memory requirements are higher than those of other parallel schemes. To address this challenge, we propose a data transfer mechanism that effectively reduces the peak memory usage of the training process by transferring data in real time. In our experiments, we implement the design and apply it to PipeDream, a mature pipeline-parallel scheme. The memory requirement of the training process is reduced by up to 48.5%, while the speed loss is kept within a reasonable range.
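The paper's mechanism is not detailed in the abstract; the following is only a schematic of the general idea of trading peak device memory for transfers, namely moving stashed activations off-device between the forward pass that produces them and the backward pass that consumes them. The class and its bookkeeping are invented for illustration.

```python
# Illustrative sketch (not the paper's implementation) of lowering peak
# device memory by transferring stashed pipeline activations to the host
# after the forward pass and fetching them back for the backward pass.

class ActivationStash:
    def __init__(self):
        self.device = {}          # activation id -> data held on "device"
        self.host = {}            # activation id -> data offloaded to "host"
        self.peak_device_items = 0

    def stash(self, act_id, data):
        self.device[act_id] = data
        self.peak_device_items = max(self.peak_device_items, len(self.device))

    def offload(self, act_id):
        # Real systems would use an asynchronous device-to-host copy here.
        self.host[act_id] = self.device.pop(act_id)

    def fetch(self, act_id):
        # Bring the activation back just in time for its backward pass.
        if act_id in self.host:
            self.stash(act_id, self.host.pop(act_id))
        return self.device.pop(act_id)
```

Without offloading, stashing three activations keeps three resident at once; offloading after each stash caps the device-side peak at one, which is the effect the 48.5% memory reduction relies on at a much larger scale.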
Fully capturing contextual information and analyzing the association between entity semantics and type is helpful for joint extraction task: 1) The context can reflect the part of speech and semantics of entity. 2) Th...
With the development of Deep Learning (DL), Deep Neural Network (DNN) models have become more complex. At the same time, the growth of the Internet makes it easy to obtain large datasets for DL training. Large-scale model parameters and training data raise the level of AI by improving the accuracy of DNN models, but they also pose severe challenges to the hardware training platform, because training a large model requires computing and memory resources that can easily exceed the capacity of a single processor. In this context, integrating more processors into a hierarchical system for distributed training is one direction for the development of training platforms. In distributed training, collective communication operations (including all-to-all, all-reduce, and all-gather) take up a large share of training time, making the interconnection network between computing nodes one of the most critical factors affecting system performance. The hierarchical torus topology, combined with the Ring All-Reduce collective communication algorithm, is one of the current mainstream distributed interconnection networks. However, we believe its communication performance can be improved. In this work, we first design a new intra-package communication topology, a switch-based fully connected topology, which shortens the time consumed by cross-node communication. Then, exploiting the characteristics of this topology, we devise more efficient all-reduce and all-gather communication algorithms. Finally, combined with the torus topology, we implement a novel distributed DL training platform. Compared with the hierarchical torus, our platform improves communication efficiency and provides a 1.16-2.68x speedup in distributed training of DNN models.
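For reference, the Ring All-Reduce baseline named above works in two phases of p-1 ring steps each: a reduce-scatter in which every node accumulates one chunk of the sum, then an all-gather that circulates the reduced chunks. A small simulation (communication replaced by list indexing) makes the chunk rotation concrete:

```python
# Sketch of Ring All-Reduce: p nodes each hold a vector; after 2*(p-1)
# ring steps every node holds the element-wise sum. "Receiving" from the
# ring neighbour is simulated by reading a snapshot of the previous step.

def ring_all_reduce(vectors):
    p = len(vectors)
    n = len(vectors[0])
    assert n % p == 0, "vector length must divide into p chunks"
    size = n // p
    data = [list(v) for v in vectors]

    # Reduce-scatter: at step s, node r receives chunk ((r-1)-s) % p from
    # node r-1 and adds it. Afterwards node r fully owns chunk (r+1) % p.
    for s in range(p - 1):
        snapshot = [row[:] for row in data]
        for r in range(p):
            c = ((r - 1) - s) % p
            for i in range(c * size, (c + 1) * size):
                data[r][i] += snapshot[(r - 1) % p][i]

    # All-gather: reduced chunks circulate; node r receives chunk (r-s) % p.
    for s in range(p - 1):
        snapshot = [row[:] for row in data]
        for r in range(p):
            c = (r - s) % p
            for i in range(c * size, (c + 1) * size):
                data[r][i] = snapshot[(r - 1) % p][i]
    return data
```

Each node sends only 2(p-1)/p of its data volume per phase, which is why the algorithm is bandwidth-optimal on a ring; the paper's switch-based fully connected intra-package topology targets the latency of these many small ring steps.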
ISBN (print): 9781665424288
Many anomaly detection applications can provide partially observed anomalies, but only limited work targets this setting. Additionally, many anomaly detectors focus on learning a particular model of the normal or abnormal class. However, the intra-class model may be too complicated to learn accurately, and it remains a non-trivial task to handle data whose anomalies and inliers follow skewed and heterogeneous distributions. To address these problems, this paper proposes an anomaly detection method that leverages Partially Labeled anomalies via Surrogate supervision-based Deviation learning (denoted PLSD). The original supervision (i.e., known anomalies and a set of explored inliers) is transformed into semantically rich surrogate supervision signals (i.e., an anomaly-inlier class and an inlier-inlier class) via vector concatenation. Different relationships and interactions between anomalies and inliers are then learned directly and efficiently thanks to the neural network's connection property. Anomaly scoring is performed with the trained network and the high-efficacy inliers. Extensive experiments show that PLSD significantly outperforms state-of-the-art semi- and weakly-supervised anomaly detectors.
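The surrogate-supervision construction via vector concatenation can be sketched directly from the abstract: each anomaly-inlier concatenation forms one surrogate class and each inlier-inlier concatenation the other. PLSD's pair sampling strategy and deviation network are not reproduced here; the exhaustive pairing below is purely illustrative.

```python
# Illustrative surrogate-supervision construction: raw labels on single
# samples become labels on concatenated sample pairs, giving the network
# two semantically rich surrogate classes to separate.

def build_surrogate_pairs(anomalies, inliers):
    """anomalies/inliers: lists of feature lists -> (pairs, labels)."""
    pairs, labels = [], []
    for a in anomalies:
        for i in inliers:
            pairs.append(a + i)        # anomaly-inlier surrogate class
            labels.append(1)
    for i1 in inliers:
        for i2 in inliers:
            if i1 is not i2:
                pairs.append(i1 + i2)  # inlier-inlier surrogate class
                labels.append(0)
    return pairs, labels
```

At scoring time a test sample would be concatenated with trusted inliers and pushed through the trained network, so a sample that behaves like an anomaly yields high-scoring anomaly-inlier pairs.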
ISBN (print): 9781665421263
Payload anomaly detection can discover malicious behaviors hidden in network packets. Payloads are hard to handle because of their wide range of possible characters and complex semantic context, so identifying abnormal payloads is a non-trivial task. Prior art uses only the n-gram language model to extract features, which leads directly to an ultra-high-dimensional feature space and fails to fully capture context semantics. Accordingly, this paper proposes a word embedding-based, context-sensitive network flow payload anomaly detection method (termed WECAD). First, WECAD obtains an initial feature representation of the payload through word embedding. Then, we propose a corpus pruning algorithm, which applies cosine-similarity clustering and frequency distribution to prune inconsequential characters; only the essential characters are kept, reducing the computation space. Subsequently, we propose a context learning algorithm, which employs a co-occurrence matrix transformation and introduces a backward step size to account for the order relationship of essential characters. Comprehensive experiments on real-world intrusion detection datasets validate the effectiveness of our method.
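The co-occurrence step can be illustrated in a few lines. The sketch below builds a character co-occurrence matrix over a payload, looking only backward within a configurable step size so that the order of characters is preserved; WECAD's exact transformation and pruned vocabulary are not specified in the abstract, so the window scheme and names here are assumptions.

```python
# Illustrative backward-windowed co-occurrence matrix over payload
# characters: m[i][j] counts how often character i appears within
# `window` positions *before* character j, keeping order information.

def cooccurrence(payload, vocab, window=2):
    """payload: str; vocab: iterable of kept (essential) characters."""
    index = {ch: k for k, ch in enumerate(vocab)}
    m = [[0] * len(index) for _ in index]
    for pos, ch in enumerate(payload):
        if ch not in index:
            continue  # pruned, inconsequential character
        for back in range(1, window + 1):  # the "backward step size"
            if pos - back >= 0 and payload[pos - back] in index:
                m[index[payload[pos - back]]][index[ch]] += 1
    return m
```

Because the matrix is asymmetric, "a before b" and "b before a" are counted separately, which is what lets the method keep ordering that a plain bag-of-n-grams count would blur.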
The perception module of self-driving vehicles relies on a multi-sensor system to understand its environment. Recent advancements in deep learning have led to the rapid development of approaches that integrate multi-s...
Network anomaly detection is important for detecting and reacting to the presence of network attacks. In this paper, we propose a novel method, named FDEn, that effectively leverages features to detect network anomalies; it consists of flow-based Feature Derivation (FD) and a prior-knowledge-incorporated Ensemble model (En_pk). To mine the effective information in features, 149 features are derived to enrich the feature set of the original data, covering more characteristics of network traffic. To leverage these features effectively, the ensemble model En_pk, which includes CatBoost and XGBoost and is based on the bagging strategy, first detects anomalies by combining numerical and categorical features. En_pk then adjusts the predicted labels of specific data by incorporating prior knowledge of network security. We conduct empirical experiments on the dataset provided by the Network Anomaly Detection Challenge (NADC), where we obtain average improvements of up to 61.6%, 31.7%, 50.2%, and 45.0% in the cost score, precision, recall, and F1-score, respectively.
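The two-stage flow of En_pk (score with an ensemble, then let domain rules override) can be sketched generically. The base models here are plain callables standing in for trained CatBoost/XGBoost classifiers, and the example rule (flagging a known-bad destination port) is purely illustrative, not a rule from the paper.

```python
# Generic sketch of ensemble scoring followed by prior-knowledge label
# adjustment. model_a/model_b stand in for trained gradient-boosting
# models; prior_rules return 0/1 to override a label, or None to pass.

def ensemble_predict(flows, model_a, model_b, prior_rules, threshold=0.5):
    labels = []
    for flow in flows:
        score = 0.5 * (model_a(flow) + model_b(flow))  # bagging-style average
        label = int(score >= threshold)
        for rule in prior_rules:                       # prior knowledge wins
            override = rule(flow)
            if override is not None:
                label = override
        labels.append(label)
    return labels
```

Keeping the rules as a separate pass after scoring mirrors the paper's split: the statistical ensemble handles the bulk of traffic, while security knowledge corrects the specific cases the models are known to miss.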
Isolation forest (iForest) has been emerging as arguably the most popular anomaly detector in recent years due to its general effectiveness across different benchmarks and strong scalability. Nevertheless, its linear ...