Temporal knowledge graph question answering (TKGQA) poses a significant challenge, due to the temporal constraints hidden in questions and the answers sought from dynamic structured knowledge. Although large lang...
In recent years, there has been growing interest in knowledge graph embedding (KGE), which maps symbolic entities and relations into a low-dimensional vector space to effectively represent structured data from the knowledge graph. In addition, the concept of the temporal knowledge graph has been proposed to document dynamically changing facts in the real world. Existing works attempt to incorporate temporal information into static KGE methods to obtain temporal knowledge representations. However, existing static and temporal KGE approaches focus on the single query fact and ignore query-relevant contextual information in the graph structure. This paper moves beyond the traditional way of scoring facts in distinct vector spaces and proposes a unified framework with pre-trained language models (PLMs) to learn dynamic contextualized static/temporal knowledge graph embeddings, called CoS/TKGE. Given a query-specific subgraph, our model transforms it into an input sequence and uses the PLM to obtain contextualized knowledge representations that flexibly adapt to the input graph context. We reformulate the link prediction task as a mask prediction problem to fine-tune the pre-trained language model, and employ contrastive learning to align dynamic contextual embeddings with static global embeddings. Experimental results on three widely used static and temporal KG datasets show the superiority of our model.
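The abstract describes three moving parts: linearizing a query-specific subgraph into a PLM input sequence, casting link prediction as mask prediction over that sequence, and aligning dynamic contextual embeddings with static global embeddings via contrastive learning. A minimal sketch of the linearization and the alignment loss follows; the function names, the [MASK]/[SEP] conventions, and the InfoNCE-style formulation are illustrative assumptions, not the paper's implementation.

```python
# Sketch only: how a query-specific subgraph might be serialized for mask
# prediction, and how contextual embeddings might be aligned with static ones.
import torch
import torch.nn.functional as F

def linearize_subgraph(query, context_facts, mask_token="[MASK]"):
    """Turn a (head, relation, ?, timestamp) query plus neighbouring facts into one sequence."""
    head, relation, timestamp = query          # the tail entity is the prediction target
    tokens = [head, relation, mask_token]
    if timestamp is not None:                  # temporal case; omit for static KGs
        tokens.append(timestamp)
    for (h, r, t, ts) in context_facts:        # query-relevant contextual facts
        tokens += ["[SEP]", h, r, t] + ([ts] if ts is not None else [])
    return tokens

def contrastive_alignment(contextual, static, tau=0.07):
    """InfoNCE-style loss pulling each dynamic contextual embedding toward its static counterpart."""
    contextual = F.normalize(contextual, dim=-1)
    static = F.normalize(static, dim=-1)
    logits = contextual @ static.t() / tau     # [batch, batch] similarity matrix
    labels = torch.arange(contextual.size(0))  # positives sit on the diagonal
    return F.cross_entropy(logits, labels)
```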
In the field of nuclear energy, the Loss of Coolant Accident (LOCA) is recognized as one of the most severe types of nuclear reactor accidents, characterized by its complex physical processes and potentially catastrop...
Disaggregated memory (DM) is a widely discussed datacenter architecture in academia and industry. It decouples computing and memory resources from monolithic servers into two network-connected resource pools. Range in...
Time series data are pervasive in varied real-world applications, and accurately identifying anomalies in time series is of great importance. Many current methods are insufficient to model long-term dependence, even though some anomalies can only be identified through long-range temporal context; missing these anomalies (false negatives) can ultimately lead to disastrous outcomes. Prior works employ Transformers (a neural network architecture with a powerful capability for modeling long-term dependence and global association) to alleviate this problem; however, Transformers are insensitive to local context, which may cause subtle anomalies to be neglected. Therefore, in this paper, we propose a local-adaptive Transformer based on cross-correlation for time series anomaly detection, which unifies global and local information to capture comprehensive time series patterns. Specifically, we devise a cross-correlation mechanism that employs causal convolution to adaptively capture local pattern variation, injecting diverse local information into the long-term temporal learning process. Furthermore, a novel optimization objective jointly optimizes the reconstruction of the entire time series and of the matrix derived from the cross-correlation mechanism, which prevents the cross-correlation from becoming trivial during training. The resulting cross-correlation matrix reveals underlying interactions between the dimensions of multivariate time series, providing valuable insights for anomaly diagnosis. Extensive experiments on six real-world datasets demonstrate that our model outperforms state-of-the-art competing methods and achieves a 6.8%-27.5% $F_{1}$ score improvement. Our method also offers good anomaly interpretability and is effective for anomaly diagnosis.
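Roughly, the mechanism above combines a causal convolution for local context, a cross-correlation matrix between series dimensions, and a joint objective that reconstructs both the series and the matrix so the cross-correlation does not collapse. The sketch below illustrates these pieces under stated assumptions; the module names, the normalization, and the weighting `alpha` are not taken from the paper.

```python
# Illustrative sketch of the three ingredients; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """1-D convolution that only sees past timesteps (left padding)."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.pad = kernel_size - 1
        self.conv = nn.Conv1d(channels, channels, kernel_size)

    def forward(self, x):                        # x: [batch, channels, time]
        return self.conv(F.pad(x, (self.pad, 0)))

def cross_correlation(features):                 # features: [batch, channels, time]
    """Normalized correlation between every pair of dimensions."""
    z = features - features.mean(dim=-1, keepdim=True)
    z = z / (z.std(dim=-1, keepdim=True) + 1e-6)
    return z @ z.transpose(1, 2) / z.size(-1)    # [batch, channels, channels]

def joint_loss(x, x_hat, corr, corr_target, alpha=0.5):
    """Jointly optimize series reconstruction and the cross-correlation matrix."""
    return F.mse_loss(x_hat, x) + alpha * F.mse_loss(corr, corr_target)
```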
As deep learning grows rapidly, model training increasingly relies on parallel methods, and numerous cluster configurations exist. However, current preferences for parallel training focus on data centers, overlooking the financial constraints faced by most researchers. To attain the best performance within a cost limit, we introduce a throughput-cost metric that accurately characterizes a cluster's cost-effectiveness. Based on this metric, we design a cost-effective cluster built around RTX 3090 GPUs with NVLink. The experimental results demonstrate that our cluster achieves remarkable cost-effectiveness across various distributed model training schemes.
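The abstract does not spell out the metric, so the snippet below shows only one plausible reading of a throughput-cost metric, throughput per unit of hardware cost; the function name and all numbers are hypothetical.

```python
def throughput_cost(samples_per_second: float, cluster_cost_usd: float) -> float:
    """One plausible throughput-cost metric: samples/s per dollar (higher is better)."""
    return samples_per_second / cluster_cost_usd

# Hypothetical comparison of two cluster configurations (numbers are made up):
print(throughput_cost(1200.0, 20000.0))   # 0.06 -> e.g. an RTX 3090 + NVLink node
print(throughput_cost(1800.0, 60000.0))   # 0.03 -> a pricier data-center node
```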
Multidimensional parallel training has been widely applied to train large-scale deep learning models such as GPT-3. The efficiency of parameter communication among training devices/processes is often the performance bottleneck of large-model training. Analyzing parameter communication patterns and traffic provides an important reference for interconnection network design and computing task scheduling aimed at improving training performance. In this paper, we analyze the parameter communication modes in typical 3D parallel training (data parallelism, pipeline parallelism, and tensor parallelism) and model the traffic of each communication mode. Finally, taking GPT-3 as an example, we present the communication involved in its 3D parallel training.
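As a rough illustration of how such traffic can be modeled, the sketch below uses the standard ring all-reduce and point-to-point estimates rather than the paper's exact model; the GPT-3 figure in the usage line assumes fp16 gradients of a 175B-parameter model and is a placeholder.

```python
# Back-of-the-envelope traffic estimates for the three parallel dimensions.
def data_parallel_traffic(grad_bytes, dp_size):
    """Ring all-reduce of gradients: each rank moves ~2*(p-1)/p of the gradient bytes."""
    return 2 * (dp_size - 1) / dp_size * grad_bytes

def tensor_parallel_traffic(activation_bytes, tp_size, allreduces_per_layer, num_layers):
    """All-reduces of activations inside each transformer layer."""
    return num_layers * allreduces_per_layer * 2 * (tp_size - 1) / tp_size * activation_bytes

def pipeline_parallel_traffic(activation_bytes, num_microbatches):
    """Point-to-point transfer of boundary activations between adjacent stages."""
    return num_microbatches * activation_bytes

# Placeholder example: 175e9 params * 2 bytes of fp16 gradients, 8-way data parallelism.
print(data_parallel_traffic(350e9, dp_size=8) / 1e9, "GB moved per rank per step")
```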
Neural Radiance Field (NeRF) has received widespread attention for its photo-realistic novel view synthesis quality. Current methods mainly represent the scene based on point sampling along cast rays, ignoring how the observed area changes with distance. In addition, current sampling strategies focus on the distribution of sampling points along a ray, without paying attention to how rays themselves are sampled. We found that the current ray sampling strategy severely reduces convergence speed for scenes in which the camera moves forward. In this work, we extend the point representation to an area representation by using relative positional encoding, and propose a ray sampling strategy suited to forward-moving camera trajectories. We validate the effectiveness of our method on multiple public datasets.
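The abstract only states that the point representation is extended to an area representation via relative positional encoding; the sketch below shows one way such an area-aware frequency encoding could look (in the spirit of integrated positional encodings), with the Gaussian damping and all names being assumptions rather than the paper's formulation.

```python
# Speculative sketch: a positional encoding damped by the area a sample covers,
# so distant samples that integrate over a larger observed region keep fewer
# high-frequency components.
import numpy as np

def area_aware_encoding(x, footprint, num_freqs=10):
    """x: 3-D sample position; footprint: scalar area scale that grows with distance."""
    freqs = 2.0 ** np.arange(num_freqs)                # 1, 2, 4, ...
    damp = np.exp(-0.5 * (freqs * footprint) ** 2)     # larger footprint suppresses high freqs
    angles = np.outer(freqs, x)                        # (num_freqs, 3)
    damp_full = np.concatenate([damp, damp])[:, None]  # same damping for sin and cos blocks
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=0) * damp_full
    return enc.ravel()

# The same point encoded with a small versus a large observed area.
p = np.array([0.3, -0.1, 0.7])
near = area_aware_encoding(p, footprint=0.01)
far = area_aware_encoding(p, footprint=0.5)
```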
Large models have achieved impressive performance in many downstream tasks. Using pipeline parallelism to fine-tune large models on commodity GPU servers is an important way to make the excellent performance of large models available to the general public. Previous solutions fail to achieve efficient, memory-balanced pipeline parallelism. In this poster, we introduce a memory load-balanced pipeline parallel solution. It balances memory consumption across stages on commodity GPU servers via NVLink bridges, establishing a new pathway to offload data from GPU to CPU that uses the PCIe link of the adjacent GPU connected by the NVLink bridge. Furthermore, our method orchestrates offload operations to minimize offload latency during large-model fine-tuning. Experiments demonstrate that our solution balances the memory footprint among pipeline stages without sacrificing training performance.
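The offload pathway described here, routing data from a busy GPU to the CPU through an adjacent GPU reached over the NVLink bridge, might look roughly as follows in PyTorch; the device naming, the absence of stream management, and the function itself are illustrative assumptions rather than the poster's implementation.

```python
# Sketch: GPU -> neighbor GPU (NVLink bridge) -> pinned CPU memory, so the
# offload rides the neighbor's PCIe link instead of the busy local one.
import torch

def offload_via_neighbor(activation, neighbor="cuda:1"):
    staged = activation.to(neighbor, non_blocking=True)        # peer-to-peer hop over NVLink
    host = torch.empty(staged.shape, dtype=staged.dtype,
                       device="cpu", pin_memory=True)
    host.copy_(staged, non_blocking=True)                      # neighbor's PCIe link to host memory
    return host
```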
As the demand for superior agents grows, the training complexity of Deep Reinforcement Learning (DRL) increases. Thus, accelerating DRL training has become a major research focus. Dividing the DRL training pro...