ISBN (digital): 9798331541750
ISBN (print): 9798331541767
With the rapid advancement of artificial intelligence, chips have become increasingly important, and the emerging RISC-V instruction set is gradually providing powerful computing support for this field. In this context, and driven by the computing requirements of deep learning, this paper presents the design of a high-performance floating-point arithmetic logic unit (FALU) that supports double-precision, single-precision, half-precision, and Bfloat16 data. The design is based on a single-channel algorithm with merged rounding, improves and implements a composite adder that combines high and low bits, and proposes a tree-like floating-point comparator based on the Kogge-Stone parallel prefix network. To ensure that the FALU components meet the performance requirements, they undergo functional verification in the Vivado simulation environment. Operating at 1.47 GHz in a 28 nm CMOS process, the components achieve the predetermined performance targets.
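As a rough illustration of the comparator's structure (a software sketch, not the paper's RTL), the snippet below models a Kogge-Stone-style parallel-prefix comparison: each bit position produces an (equal, greater) pair, and log2(n) prefix stages combine them with the higher-order slice dominating. The bit encoding and combining operator are assumptions for illustration only.

```python
# Minimal software model of a Kogge-Stone-style prefix comparator
# (illustrative assumption, not the paper's hardware design).
def compare_kogge_stone(a_bits, b_bits):
    """Return True if a > b for two equal-length MSB-first bit lists."""
    n = len(a_bits)
    # Per-bit signals: eq[i] = bits equal, gt[i] = bit of a strictly greater.
    eq = [x == y for x, y in zip(a_bits, b_bits)]
    gt = [x > y for x, y in zip(a_bits, b_bits)]

    # Kogge-Stone prefix: at distance d, position i absorbs position i - d, so
    # after ceil(log2(n)) stages position i covers all bits from the MSB down to i.
    d = 1
    while d < n:
        new_eq, new_gt = eq[:], gt[:]
        for i in range(d, n):
            # Combine (eq, gt) pairs: the higher-order slice dominates.
            new_gt[i] = gt[i - d] or (eq[i - d] and gt[i])
            new_eq[i] = eq[i - d] and eq[i]
        eq, gt = new_eq, new_gt
        d *= 2

    return gt[n - 1]   # prefix over all n bits

print(compare_kogge_stone([1, 0, 1, 1], [1, 0, 1, 0]))  # True: 11 > 10
```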
Facts in the military field tend to involve elements of time, space, quantity, status, and so on. Existing methods of representing knowledge in the form of triples fail to adequately express these facts, and also cause ob...
Much effort has been devoted to handling the complex relationships and localized cooperation among a large number of agents in large-scale multi-agent systems. However, global cooperation among all agents is also important, even though interactions between agents often happen locally. Enabling agents to learn global and localized cooperation information simultaneously in multi-agent systems is a challenging problem. In this paper, we model the global and localized cooperation among agents with global and localized agent graphs and propose a novel graph convolutional reinforcement learning mechanism based on these two graphs, which allows each agent to communicate with its neighbors and all agents to cooperate at a high level. Experiments on large-scale multi-agent scenarios in StarCraft II show that our proposed method outperforms state-of-the-art algorithms and allows agents to learn to cooperate efficiently.
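As a loose sketch of the two-graph idea (the shapes, mean-aggregation rule, and concatenation of the two branches are illustrative assumptions, not the paper's layer), one graph convolution step over a localized neighbor graph and a global all-agent graph might look like:

```python
import numpy as np

def graph_conv(features, adjacency, weight):
    """Mean-aggregate neighbor features along the graph, then project."""
    deg = adjacency.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                              # guard isolated agents
    aggregated = (adjacency / deg) @ features
    return np.tanh(aggregated @ weight)

n_agents, feat_dim, hidden = 8, 16, 32
rng = np.random.default_rng(0)
features = rng.normal(size=(n_agents, feat_dim))

local_adj = (rng.random((n_agents, n_agents)) < 0.3).astype(float)   # localized neighbor graph
global_adj = np.ones((n_agents, n_agents))                            # global all-agent graph

w_local = rng.normal(size=(feat_dim, hidden))
w_global = rng.normal(size=(feat_dim, hidden))

# Each agent's new state combines a localized message and a global summary.
h = np.concatenate([graph_conv(features, local_adj, w_local),
                    graph_conv(features, global_adj, w_global)], axis=1)
print(h.shape)   # (8, 64)
```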
Controlled thermonuclear fusion has always been a dream pursued by mankind. However, the physical processes of controlled thermonuclear fusion are complex, requiring numerical simulations with high performance computi...
The correctness and robustness of a neural network model are usually proportional to its depth and width. Neural network models are becoming deeper and wider to cope with complex applications, which leads to high memory and compute capacity requirements during training. Multi-accelerator parallelism, which deploys multiple accelerators in parallel to train neural networks, is a promising answer to both challenges. Among the parallel schemes, pipeline parallelism has a clear advantage in training speed, but its memory requirements are higher than those of other schemes. To address this challenge of pipeline parallelism, we propose a data transfer mechanism that effectively reduces the peak memory usage of the training process by transferring data in real time. In our experiments, we implement the design and apply it to Pipedream, a mature pipeline parallel scheme. The memory requirement of the training process is reduced by up to 48.5%, and the speed loss is kept within a reasonable range.
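A minimal sketch of the kind of real-time transfer described above, assuming stage activations are stashed to host memory after the forward pass and fetched back just before the corresponding backward pass; the class and policy below are hypothetical, not Pipedream's actual mechanism:

```python
# Hypothetical activation-offloading helper for a pipeline stage (illustrative only).
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

class ActivationOffloader:
    def __init__(self):
        self.stash = {}                              # microbatch id -> host-side copy

    def save(self, mb_id, activation):
        # Move the stashed activation to host memory so it stops occupying the device.
        self.stash[mb_id] = activation.detach().to("cpu", non_blocking=True)

    def load(self, mb_id):
        # Bring the activation back right before its backward pass needs it.
        return self.stash.pop(mb_id).to(device, non_blocking=True)

offloader = ActivationOffloader()
act = torch.randn(64, 1024, device=device)           # activation produced by a stage
offloader.save(mb_id=0, activation=act)
del act                                               # peak device memory drops here
restored = offloader.load(mb_id=0)
print(restored.shape)
```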
Knee osteoarthritis (OA) is a common musculoskeletal illness. To solve the problem that inaccurate knee joint localization and inadequate knee OA features extracted from plain radiographs affect the accuracy of knee O...
With the development of Deep Learning (DL), Deep Neural Network (DNN) models have become more complex. At the same time, the development of the Internet makes it easy to obtain large datasets for DL training. Large-scale model parameters and training data raise the level of AI by improving the accuracy of DNN models, but they also pose severe challenges to the hardware training platform, because training a large model needs computing and memory resources that can easily exceed the capacity of a single processor. In this context, integrating more processors into a hierarchical system to conduct distributed training is one direction for the development of training platforms. In distributed training, collective communication operations (including all-to-all, all-reduce, and all-gather) take up a large share of training time, making the interconnection network between computing nodes one of the most critical factors affecting system performance. The hierarchical torus topology, combined with the Ring All-Reduce collective communication algorithm, is one of the current mainstream distributed interconnection networks, but we believe its communication performance is not the best achievable. In this work, we first design a new intra-package communication topology, a switch-based fully connected topology, which shortens the time consumed by cross-node communication. Then, considering the characteristics of this topology, we devise more efficient all-reduce and all-gather communication algorithms. Finally, combined with the torus topology, we implement a novel distributed DL training platform. Compared with the hierarchical torus, our platform improves communication efficiency and provides a 1.16-2.68x speedup in distributed training of DNN models.
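For reference, the baseline Ring All-Reduce works in two phases of p-1 chunk exchanges each (reduce-scatter, then all-gather). The NumPy sketch below is a functional model of that baseline, not the paper's new algorithms:

```python
import numpy as np

def ring_all_reduce(node_buffers):
    """Sum identical-length buffers across p nodes with the ring algorithm."""
    p = len(node_buffers)
    chunks = [np.array_split(buf.astype(float), p) for buf in node_buffers]

    # Reduce-scatter: in step s, node i forwards chunk (i - s) mod p to node i+1;
    # after p-1 steps node j owns the fully reduced chunk (j + 1) mod p.
    for step in range(p - 1):
        for i in range(p):
            c = (i - step) % p
            chunks[(i + 1) % p][c] += chunks[i][c]

    # All-gather: circulate the owned chunks around the ring for p-1 more steps.
    for step in range(p - 1):
        for i in range(p):
            c = (i + 1 - step) % p
            chunks[(i + 1) % p][c] = chunks[i][c].copy()

    return [np.concatenate(c) for c in chunks]

data = [np.arange(8.0) + 10 * r for r in range(4)]   # 4 nodes, 8 elements each
print(ring_all_reduce(data)[0])                       # every node ends with the elementwise sum
```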
ISBN (print): 9781450385053
Fully capturing contextual information and analyzing the association between entity semantics and entity type helps the joint extraction task: 1) the context reflects an entity's part of speech and semantics; 2) the entity type is closely related to the relation between entities. Previous research simply embedded the contextual information into shallow layers of the model, ignoring the association between entity semantics and type. In this paper, we propose a graph network with full-information modeling to explicitly model the different levels of information in the text. The contextual information of an entity is dynamically embedded in each span representation to improve reasoning ability. To capture the fine-grained association between an entity's semantics and its type, the graph network uses entity-type features to generate edge information between nodes. Experimental results show that our model outperforms previous models on the CoNLL04 dataset and obtains competitive results on the SciERC dataset for both entity recognition and relation extraction. Extensive additional experiments further verify the effectiveness of the model.
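A rough sketch of how entity-type features might generate edge information between span nodes; the soft-type embedding, bilinear edge scoring, and single message-passing step are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_spans, span_dim, n_types, type_dim = 5, 64, 4, 16

span_repr = rng.normal(size=(n_spans, span_dim))            # contextual span representations
type_probs = rng.dirichlet(np.ones(n_types), size=n_spans)  # soft entity-type predictions
type_emb = rng.normal(size=(n_types, type_dim))             # type embeddings

# Edge information between spans i and j derived from their (soft) type embeddings.
node_types = type_probs @ type_emb                           # (n_spans, type_dim)
bilinear = rng.normal(size=(type_dim, type_dim))
edge_logits = node_types @ bilinear @ node_types.T           # (n_spans, n_spans)
edge_logits -= edge_logits.max(axis=1, keepdims=True)        # softmax stability
edge_weights = np.exp(edge_logits) / np.exp(edge_logits).sum(axis=1, keepdims=True)

# One message-passing step: spans aggregate neighbors through type-aware edges.
w = rng.normal(size=(span_dim, span_dim))
updated = np.tanh(edge_weights @ span_repr @ w)
print(updated.shape)   # (5, 64)
```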
ISBN (print): 9781665424288
Many anomaly detection applications can provide partially observed anomalies, but only limited work addresses this setting. Additionally, a number of anomaly detectors focus on learning a particular model of the normal or abnormal class, yet the intra-class model may be too complicated to learn accurately. Handling data whose anomalies and inliers follow skewed and heterogeneous distributions remains a non-trivial task. To address these problems, this paper proposes an anomaly detection method that leverages Partially Labeled anomalies via Surrogate supervision-based Deviation learning (termed PLSD). The original supervision (i.e., known anomalies and a set of explored inliers) is transformed into semantically rich surrogate supervision signals (i.e., anomaly-inlier and inlier-inlier classes) via vector concatenation. Different relationships and interactions between anomalies and inliers are then learned directly and efficiently thanks to the neural network's connectivity. Anomaly scoring is performed with the trained network and the high-efficacy inliers. Extensive experiments show that PLSD significantly outperforms state-of-the-art semi-/weakly-supervised anomaly detectors.
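A small sketch of the surrogate-supervision construction, assuming pairs are formed by concatenating feature vectors: (anomaly, inlier) pairs become one class and (inlier, inlier) pairs the other. The sampling scheme and synthetic data are illustrative, not PLSD's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
anomalies = rng.normal(loc=3.0, size=(20, 8))    # few labeled anomalies
inliers = rng.normal(loc=0.0, size=(500, 8))     # explored inliers

def make_pairs(n_pairs):
    ai_idx = rng.integers(0, len(anomalies), n_pairs)
    in_idx = rng.integers(0, len(inliers), (n_pairs, 2))
    # Class 1: anomaly ++ inlier, Class 0: inlier ++ inlier (vector concatenation).
    anomaly_inlier = np.concatenate([anomalies[ai_idx], inliers[in_idx[:, 0]]], axis=1)
    inlier_inlier = np.concatenate([inliers[in_idx[:, 0]], inliers[in_idx[:, 1]]], axis=1)
    x = np.vstack([anomaly_inlier, inlier_inlier])
    y = np.concatenate([np.ones(n_pairs), np.zeros(n_pairs)])
    return x, y

x_train, y_train = make_pairs(1000)
print(x_train.shape, y_train.shape)   # (2000, 16) (2000,)
# A classifier trained on (x_train, y_train) can then score a test point by pairing
# it with trusted inliers and averaging the predicted anomaly-inlier probability.
```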
ISBN (print): 9781665421263
Payload anomaly detection can discover malicious behaviors hidden in network packets. Payloads are hard to handle because of their wide range of possible characters and complex semantic context, so identifying abnormal payloads is a non-trivial task. Prior art uses only the n-gram language model to extract features, which leads directly to an ultra-high-dimensional feature space and fails to fully capture the contextual semantics. Accordingly, this paper proposes a word embedding-based, context-sensitive network flow payload anomaly detection method (termed WECAD). First, WECAD obtains the initial feature representation of the payload through word embedding. Then, we propose a corpus pruning algorithm that applies cosine-similarity clustering and the frequency distribution to prune inconsequential characters, keeping only the essential characters to reduce the calculation space. Subsequently, we propose a context learning algorithm that employs a co-occurrence matrix transformation and introduces a backward step size to account for the order relationships of essential characters. Comprehensive experiments on real-world intrusion detection datasets validate the effectiveness of our method.
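One plausible reading of the context learning step, sketched below, builds a character co-occurrence matrix over the pruned corpus with a backward step size so that nearer preceding characters weigh more. The window size and weighting scheme are assumptions, not WECAD's exact algorithm:

```python
import numpy as np

def cooccurrence(payloads, kept_chars, backward_step=3):
    """Count how often each kept character is preceded by another within the backward window."""
    index = {ch: i for i, ch in enumerate(kept_chars)}
    matrix = np.zeros((len(kept_chars), len(kept_chars)))
    for payload in payloads:
        chars = [c for c in payload if c in index]     # pruned corpus: essential characters only
        for pos, cur in enumerate(chars):
            for back in range(1, backward_step + 1):
                if pos - back < 0:
                    break
                prev = chars[pos - back]
                matrix[index[prev], index[cur]] += 1.0 / back   # nearer context weighs more
    return matrix

payloads = ["GET /index.php?id=1", "GET /index.php?id=1 OR 1=1"]
kept = sorted(set("".join(payloads)))
print(cooccurrence(payloads, kept).shape)
```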