Transformer models, such as BERT, GPT, and ViT, have been applied to a wide range of areas in recent years due to their efficacy. To improve the training efficiency of Transformer models, various distributed training approaches have been proposed, such as Megatron-LM [8]. However, when multi-dimensional parallelism strategies are considered, existing works cannot harmonize the different strategies well enough to obtain a globally optimal solution because of the complexity involved. In this paper, we propose PTIP, a parallelism strategy searching algorithm that generates operator-level parallelism strategies combining three schemes: data parallelism, tensor parallelism, and pipeline parallelism. PTIP abstracts these three parallelism schemes simultaneously into an auxiliary graph, reformulates the search problem as a mixed-integer programming (MIP) problem, and uses a MIP solver to obtain a high-quality multi-dimensional strategy. Experiments conducted on Transformers demonstrate that PTIP obtains a 13.9%–24.7% performance improvement compared to Megatron-LM [8].
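To make the MIP reduction concrete, the toy sketch below selects one parallelism scheme per operator with the PuLP solver interface. It is not PTIP's actual auxiliary-graph formulation; the operators, strategies, cost and memory numbers are invented for illustration.

```python
# A minimal sketch of casting operator-level strategy selection as a MIP.
# Costs and memory footprints are hypothetical, not profiled values.
import pulp

operators = ["attention", "mlp", "embedding"]
strategies = ["data_parallel", "tensor_parallel", "pipeline_parallel"]

# Hypothetical estimated time (ms) and memory (GB) per (operator, strategy).
cost = {(o, s): 1.0 + 0.1 * i + 0.2 * j
        for i, o in enumerate(operators) for j, s in enumerate(strategies)}
mem = {(o, s): 2.0 - 0.3 * j
       for o in operators for j, s in enumerate(strategies)}
MEM_BUDGET = 5.0  # assumed per-device memory budget (GB)

prob = pulp.LpProblem("parallelism_strategy_search", pulp.LpMinimize)
# x[o, s] = 1 if operator o uses strategy s.
x = {(o, s): pulp.LpVariable(f"x_{o}_{s}", cat="Binary")
     for o in operators for s in strategies}

# Objective: total estimated execution time of the chosen strategies.
prob += pulp.lpSum(cost[o, s] * x[o, s] for o in operators for s in strategies)

# Each operator picks exactly one strategy.
for o in operators:
    prob += pulp.lpSum(x[o, s] for s in strategies) == 1

# The chosen strategies must fit the memory budget.
prob += pulp.lpSum(mem[o, s] * x[o, s] for o in operators for s in strategies) <= MEM_BUDGET

prob.solve(pulp.PULP_CBC_CMD(msg=False))
plan = {o: s for (o, s), var in x.items() if var.value() and var.value() > 0.5}
print(plan)
```

A realistic formulation would additionally encode communication costs between adjacent operators on the auxiliary graph; the sketch keeps only per-operator costs and a memory budget to stay short.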
Recent advances in single-cell RNA sequencing (scRNA-seq) technology provide unprecedented opportunities for reconstructing gene regulatory networks (GRNs). Many models have been proposed to infer GRNs from large volumes of RNA-seq data, but most deep learning models rely on a prior gene regulatory network to infer potential GRNs. Reconstructing GRNs from scRNA-seq data remains challenging due to the noise and sparsity introduced by the dropout effect. Here, we propose GAALink, a novel unsupervised deep learning method. It first constructs a gene similarity matrix and then refines it with a threshold value. It then learns feature representations of genes through a graph attention autoencoder that propagates information across genes with different weights. Finally, we use the learned gene features for matrix completion so that the GRNs are reconstructed. Compared with seven existing GRN reconstruction methods, GAALink achieves more accurate performance on seven scRNA-seq datasets with four ground-truth networks. GAALink provides a useful tool for inferring GRNs from scRNA-seq expression data.
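As a rough illustration of the first two steps (similarity matrix construction and threshold refinement), the sketch below uses a toy expression matrix, Pearson correlation as the similarity measure, and an arbitrary cutoff; none of these choices are taken from GAALink itself.

```python
# A minimal sketch: build a gene-gene similarity matrix from scRNA-seq
# expression and refine it with a threshold to obtain an adjacency matrix.
import numpy as np

rng = np.random.default_rng(0)
expr = rng.poisson(1.0, size=(200, 1000)).astype(float)  # genes x cells (toy data)

# Pearson correlation between genes as one common similarity measure.
sim = np.corrcoef(expr)           # shape: (n_genes, n_genes)
np.fill_diagonal(sim, 0.0)        # ignore self-similarity

# Refine by threshold: keep only sufficiently similar gene pairs as edges.
threshold = 0.1                   # assumed cutoff
adj = (np.abs(sim) >= threshold).astype(float)
print("edges:", int(adj.sum()) // 2)
```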
Discourse structure analysis has been shown to be useful for many artificial intelligence (AI) tasks such as text summarization and text categorization. However, for the Chinese news domain, discourse structure analysis systems are still immature due to the lack of expert-annotated datasets. In this paper, we present CNA, a Chinese news corpus containing 1155 news articles annotated by human experts, covering four domains and four news media sources. We then implement several text classification methods as baselines. Experimental results demonstrate that document-level methods achieve better performance, and we further propose a document-level neural network model with multiple sentence features that achieves state-of-the-art performance. Finally, we analyze the content type distribution of each sentence in CNA and the prediction errors our model makes on the test set. The code and dataset will be open-sourced at https://***/gzl98/Chinese_Discourse_Profiling.
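A hedged sketch of what a document-level model with per-sentence predictions can look like is given below; the embedding size, hidden size, and number of content-type labels are assumptions, and the architecture is a generic illustration rather than the model proposed in the paper.

```python
# Sketch of a document-level sentence classifier: sentence embeddings
# (assumed precomputed) are contextualized with a BiGRU over the whole
# document before a label is predicted for each sentence.
import torch
import torch.nn as nn

class DocSentenceClassifier(nn.Module):
    def __init__(self, sent_dim=768, hidden=256, num_labels=8):
        super().__init__()
        self.context = nn.GRU(sent_dim, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_labels)

    def forward(self, sent_embs):          # (batch, n_sents, sent_dim)
        ctx, _ = self.context(sent_embs)   # (batch, n_sents, 2*hidden)
        return self.classifier(ctx)        # per-sentence logits

# Toy usage: one document with 12 sentences, 768-d sentence embeddings.
model = DocSentenceClassifier()
logits = model(torch.randn(1, 12, 768))
print(logits.shape)  # torch.Size([1, 12, 8])
```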
Offline imitation learning (OIL) is often used to solve complex continuous decision-making tasks. For tasks such as robot control and autonomous driving, it is either difficult to design an effective reward ...
The size of deep learning models has been increasing to enhance model quality. The linear increase in training computation budget with model size means that training an extremely large-scale model is exceedingly time-consuming. Recently, the Mixture of Experts (MoE) has drawn significant attention as it can scale models to extra-large sizes with a stable computation budget. However, inefficient distributed training of large-scale MoE models hinders their broader application. Specifically, a considerable dynamic load imbalance occurs among devices during training, significantly reducing throughput. Several load-balancing works have been proposed to address this challenge. System-level solutions draw more attention than algorithm-level ones for their hardware affinity and non-disruption of model convergence. However, they suffer from high communication costs and poor communication-computation overlapping. To address these challenges, we propose a systematic load-balancing method, Pro-Prophet, which consists of a planner and a scheduler for efficient parallel training of large-scale MoE models. To adapt to the dynamic load imbalance, we profile training statistics and use them to design Pro-Prophet. To lower communication volume, the Pro-Prophet planner determines a series of lightweight load-balancing strategies and efficiently searches for a communication-efficient one based on the statistics. To sufficiently overlap communication and computation, the Pro-Prophet scheduler schedules data-dependent operations based on the statistics and operation features, further improving training throughput. We conduct extensive experiments on four clusters and five MoE models. The results indicate that Pro-Prophet achieves up to 2.66x speedup compared to two popular MoE frameworks, DeepSpeed-MoE and FasterMoE. Furthermore, Pro-Prophet demonstrates a load-balancing improvement of up to 11.01x compared to a representative load-balancing work.
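One simple way to picture a lightweight, statistics-driven load-balancing strategy is a greedy placement of experts onto devices based on profiled token loads, as in the sketch below. The loads, device count, and greedy rule are illustrative and are not Pro-Prophet's actual planner.

```python
# Greedy longest-processing-time placement: assign the heaviest expert to
# the currently least-loaded device, repeatedly.
import heapq

def balance_experts(expert_loads, num_devices):
    heap = [(0, d) for d in range(num_devices)]   # (current load, device id)
    heapq.heapify(heap)
    placement = {}
    for expert, load in sorted(expert_loads.items(), key=lambda kv: -kv[1]):
        dev_load, dev = heapq.heappop(heap)
        placement[expert] = dev
        heapq.heappush(heap, (dev_load + load, dev))
    return placement

# Toy profiled token counts for 8 experts, placed on 4 devices.
loads = {f"expert_{i}": tokens
         for i, tokens in enumerate([900, 120, 400, 310, 700, 90, 260, 520])}
print(balance_experts(loads, num_devices=4))
```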
Representation learning on textual networks, or textual network embedding, which leverages the rich textual information associated with the network structure to learn low-dimensional embeddings of vertices, has been useful in a variety of tasks. However, most approaches learn textual network embeddings using only direct neighbors. In this paper, we employ a powerful and spatially localized operation, personalized PageRank (PPR), to remove the restriction of using only direct connection relationships. We also analyze the relationship between PPR and spectral-domain theory, which provides insight into the empirical performance boost. Experiments show that the proposed method yields a substantial improvement on link prediction tasks compared to existing methods, achieving new state-of-the-art results on several real-world benchmark datasets.
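For reference, personalized PageRank itself can be computed by a short power iteration; the sketch below uses a toy adjacency matrix and a standard restart probability, independent of the paper's specific embedding model.

```python
# Personalized PageRank (PPR) by power iteration on a row-normalized
# adjacency matrix: the walk restarts at the seed node with probability alpha.
import numpy as np

def personalized_pagerank(adj, seed, alpha=0.15, iters=100):
    n = adj.shape[0]
    deg = adj.sum(axis=1, keepdims=True)
    # Row-stochastic transition matrix; isolated nodes get all-zero rows.
    P = np.divide(adj, deg, out=np.zeros_like(adj, dtype=float), where=deg > 0)
    e = np.zeros(n)
    e[seed] = 1.0                            # restart distribution
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = alpha * e + (1 - alpha) * (P.T @ r)
    return r

# Toy 4-node undirected graph as a symmetric adjacency matrix.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(personalized_pagerank(A, seed=0).round(3))
```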
Monte Carlo (MC) simulation plays a key role in radiotherapy. Since the simulation time of MC programs cannot fully meet clinical requirements, we use the ARM-based FT-2000+ multi-core processor for paralleliza...
The matrix multiplication-based convolution algorithm, which can efficiently implement convolutions with different parameters, is the first choice for convolution performance optimization on a given chip. Based on t...
ISBN: 9798350359312 (digital), 9798350359329 (print)
Temporal Knowledge Graph Completion (TKGC) aims to predict the missing parts of quadruples, which is crucial for real-life knowledge graphs. Compared with methods that only use graph neural networks, the emergence of pre-trained language models has introduced a trend of leveraging text and graph structure information simultaneously. However, most current methods based on pre-trained models struggle to effectively utilize text and multi-hop graph structure information concurrently, resulting in insufficient association mining among relations. To address this challenge, we propose a novel model: Temporal Closing Path for Pre-trained Language Model-based TKGC (TCP-PLM). We obtain the temporal closing relation path of the target relation through sampling, and use the relation path as a bridge to exploit text and multi-hop graph structure information simultaneously. Moreover, the relation path serves as a tool for mining associations between relations. Furthermore, because the relation paths are designed to be entity-independent, our model can also handle the inductive setting. Our experiments on three benchmarks, along with extensive analysis, demonstrate that our model not only achieves substantial performance improvements across four metrics compared to other models but also adeptly handles inductive settings.
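To give a sense of what sampling an entity-independent temporal closing relation path might involve, the sketch below runs a breadth-first search over edges whose timestamps fall in a window around the query time and returns only the relation labels; the toy quadruples, window size, and hop limit are assumptions rather than the paper's exact procedure.

```python
# Sample a relation path between the head and tail of a target quadruple,
# restricted to edges whose timestamps are close to the query time, and
# keep only relation labels so the path is entity-independent.
from collections import defaultdict, deque

def sample_relation_path(quads, head, tail, query_time, window=2, max_hops=3):
    graph = defaultdict(list)                     # entity -> [(relation, neighbor)]
    for h, r, t, ts in quads:
        if abs(ts - query_time) <= window:
            graph[h].append((r, t))
    frontier = deque([(head, [])])
    seen = {head}
    while frontier:
        node, rel_path = frontier.popleft()
        if node == tail and rel_path:
            return rel_path                       # relation labels only
        if len(rel_path) >= max_hops:
            continue
        for rel, nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, rel_path + [rel]))
    return None

# Toy temporal KG: (head, relation, tail, timestamp).
quads = [("A", "visits", "B", 3), ("B", "meets", "C", 4), ("A", "calls", "C", 9)]
print(sample_relation_path(quads, head="A", tail="C", query_time=4))
```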
Graphs are a significant data structure for describing relationships between entities, and many real-world application domains depend heavily on graph data. However, graph applications differ greatly from traditional applications, and running them on general-purpose platforms is inefficient, which has motivated research on dedicated graph processing platforms. In this survey, we systematically categorize graph workloads and applications, and provide a detailed review of existing graph processing platforms, dividing them into general-purpose and specialized systems. We thoroughly analyze the implementation technologies, including programming models, partitioning strategies, communication models, execution models, and fault tolerance strategies. Finally, we analyze recent advances and present four open problems for future research.