Temporal knowledge graph question answering (TKGQA) poses a significant challenge, due to the temporal constraints hidden in questions and the answers sought from dynamic structured knowledge. Although large lang...
In recent years, there has been growing interest in knowledge graph embedding (KGE), which maps symbolic entities and relations into a low-dimensional vector space to effectively represent structured data from the knowledge graph. In addition, the concept of the temporal knowledge graph has been proposed to document dynamically changing facts in the real world. Existing works attempt to incorporate temporal information into static KGE methods to obtain temporal knowledge representations. However, existing static and temporal KGE approaches focus on the single query fact and ignore query-relevant contextual information in the graph structure. This paper moves beyond the traditional way of scoring facts in distinct vector spaces and proposes a unified framework with pre-trained language models (PLMs) to learn dynamic contextualized static/temporal knowledge graph embeddings, called CoS/TKGE. Given a query-specific subgraph, our model transforms it into an input sequence and uses the PLM to obtain contextualized knowledge representations that flexibly adapt to the input graph context. We reformulate the link prediction task as a mask prediction problem to fine-tune the pre-trained language model, and employ contrastive learning to align dynamic contextual embeddings with static global embeddings. Experimental results on three widely used static and temporal KG datasets show the superiority of our model.
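The abstract describes three moving parts: linearizing a query-specific subgraph into a PLM input sequence, casting link prediction as mask prediction over that sequence, and aligning dynamic contextual embeddings with static global embeddings via contrastive learning. A minimal sketch of the linearization and the alignment loss follows; the function names, the [MASK]/[SEP] conventions, and the InfoNCE-style formulation are illustrative assumptions, not the paper's implementation.

```python
# Sketch only: how a query-specific subgraph might be serialized for mask
# prediction, and how contextual embeddings might be aligned with static ones.
import torch
import torch.nn.functional as F

def linearize_subgraph(query, context_facts, mask_token="[MASK]"):
    """Turn a (head, relation, ?, timestamp) query plus neighbouring facts into one sequence."""
    head, relation, timestamp = query          # the tail entity is the prediction target
    tokens = [head, relation, mask_token]
    if timestamp is not None:                  # temporal case; omit for static KGs
        tokens.append(timestamp)
    for (h, r, t, ts) in context_facts:        # query-relevant contextual facts
        tokens += ["[SEP]", h, r, t] + ([ts] if ts is not None else [])
    return tokens

def contrastive_alignment(contextual, static, tau=0.07):
    """InfoNCE-style loss pulling each dynamic contextual embedding toward its static counterpart."""
    contextual = F.normalize(contextual, dim=-1)
    static = F.normalize(static, dim=-1)
    logits = contextual @ static.t() / tau     # [batch, batch] similarity matrix
    labels = torch.arange(contextual.size(0))  # positives sit on the diagonal
    return F.cross_entropy(logits, labels)
```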
In the field of nuclear energy, the Loss of Coolant Accident (LOCA) is recognized as one of the most severe types of nuclear reactor accidents, characterized by its complex physical processes and potentially catastrop...
Disaggregated memory (DM) is a widely discussed datacenter architecture in academia and industry. It decouples computing and memory resources from monolithic servers into two network-connected resource pools. Range in...
Time series data are pervasive in varied real-world applications, and accurately identifying anomalies in time series is of great importance. Many current methods are insufficient to model long-term dependence, even though some anomalies can only be identified through long-range temporal context; missing these anomalies (false negatives) can ultimately lead to disastrous outcomes. Prior works employ Transformers (a neural network architecture with a powerful capability for modeling long-term dependence and global association) to alleviate this problem; however, Transformers are insensitive to local context, which may cause subtle anomalies to be neglected. Therefore, in this paper, we propose a local-adaptive Transformer based on cross-correlation for time series anomaly detection, which unifies global and local information to capture comprehensive time series patterns. Specifically, we devise a cross-correlation mechanism that employs causal convolution to adaptively capture local pattern variation, injecting diverse local information into the long-term temporal learning process. Furthermore, a novel optimization objective jointly optimizes the reconstruction of the entire time series and of the matrix derived from the cross-correlation mechanism, which prevents the cross-correlation from becoming trivial during training. The resulting cross-correlation matrix reveals underlying interactions between the dimensions of multivariate time series, providing valuable insights for anomaly diagnosis. Extensive experiments on six real-world datasets demonstrate that our model outperforms state-of-the-art competing methods and achieves a 6.8%-27.5% $F_{1}$ score improvement. Our method also offers good anomaly interpretability and is effective for anomaly diagnosis.
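Roughly, the mechanism above combines a causal convolution for local context, a cross-correlation matrix between series dimensions, and a joint objective that reconstructs both the series and the matrix so the cross-correlation does not collapse. The sketch below illustrates these pieces under stated assumptions; the module names, the normalization, and the weighting `alpha` are not taken from the paper.

```python
# Illustrative sketch of the three ingredients; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """1-D convolution that only sees past timesteps (left padding)."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.pad = kernel_size - 1
        self.conv = nn.Conv1d(channels, channels, kernel_size)

    def forward(self, x):                        # x: [batch, channels, time]
        return self.conv(F.pad(x, (self.pad, 0)))

def cross_correlation(features):                 # features: [batch, channels, time]
    """Normalized correlation between every pair of dimensions."""
    z = features - features.mean(dim=-1, keepdim=True)
    z = z / (z.std(dim=-1, keepdim=True) + 1e-6)
    return z @ z.transpose(1, 2) / z.size(-1)    # [batch, channels, channels]

def joint_loss(x, x_hat, corr, corr_target, alpha=0.5):
    """Jointly optimize series reconstruction and the cross-correlation matrix."""
    return F.mse_loss(x_hat, x) + alpha * F.mse_loss(corr, corr_target)
```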
As deep learning grows rapidly, model training increasingly relies on parallel methods, and numerous cluster configurations exist. However, current preferences for parallel training focus on data centers, overlooking the financial constraints faced by most researchers. To attain the best performance within a cost limit, we introduce a throughput-cost metric that accurately characterizes a cluster's cost-effectiveness. Based on this metric, we design a cost-effective cluster built around RTX 3090 GPUs with NVLink. The experimental results demonstrate that our cluster achieves remarkable cost-effectiveness across various distributed model training schemes.
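The abstract does not spell out the metric, so the snippet below shows only one plausible reading of a throughput-cost metric, throughput per unit of hardware cost; the function name and all numbers are hypothetical.

```python
def throughput_cost(samples_per_second: float, cluster_cost_usd: float) -> float:
    """One plausible throughput-cost metric: samples/s per dollar (higher is better)."""
    return samples_per_second / cluster_cost_usd

# Hypothetical comparison of two cluster configurations (numbers are made up):
print(throughput_cost(1200.0, 20000.0))   # 0.06 -> e.g. an RTX 3090 + NVLink node
print(throughput_cost(1800.0, 60000.0))   # 0.03 -> a pricier data-center node
```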
Multidimensional parallel training has been widely applied to train large-scale deep learning models such as GPT-3. The efficiency of parameter communication among training devices/processes is often the performance bottleneck of large-model training. Analyzing parameter communication patterns and traffic provides an important reference for interconnection network design and computing task scheduling aimed at improving training performance. In this paper, we analyze the parameter communication modes in typical 3D parallel training (data parallelism, pipeline parallelism, and tensor parallelism) and model the traffic of each communication mode. Finally, taking GPT-3 as an example, we present the communication involved in its 3D parallel training.
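As a rough illustration of how such traffic can be modeled, the sketch below uses the standard ring all-reduce and point-to-point estimates rather than the paper's exact model; the GPT-3 figure in the usage line assumes fp16 gradients of a 175B-parameter model and is a placeholder.

```python
# Back-of-the-envelope traffic estimates for the three parallel dimensions.
def data_parallel_traffic(grad_bytes, dp_size):
    """Ring all-reduce of gradients: each rank moves ~2*(p-1)/p of the gradient bytes."""
    return 2 * (dp_size - 1) / dp_size * grad_bytes

def tensor_parallel_traffic(activation_bytes, tp_size, allreduces_per_layer, num_layers):
    """All-reduces of activations inside each transformer layer."""
    return num_layers * allreduces_per_layer * 2 * (tp_size - 1) / tp_size * activation_bytes

def pipeline_parallel_traffic(activation_bytes, num_microbatches):
    """Point-to-point transfer of boundary activations between adjacent stages."""
    return num_microbatches * activation_bytes

# Placeholder example: 175e9 params * 2 bytes of fp16 gradients, 8-way data parallelism.
print(data_parallel_traffic(350e9, dp_size=8) / 1e9, "GB moved per rank per step")
```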
Neural Radiance Field (NeRF) has received widespread attention for its photo-realistic novel view synthesis quality. Current methods mainly represent the scene based on point sampling along cast rays, ignoring how the observed area changes with distance. In addition, current sampling strategies focus on the distribution of sampling points along a ray, without paying attention to how rays themselves are sampled. We found that the current ray sampling strategy severely reduces convergence speed for scenes in which the camera moves forward. In this work, we extend the point representation to an area representation by using relative positional encoding, and propose a ray sampling strategy suited to forward-moving camera trajectories. We validate the effectiveness of our method on multiple public datasets.
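The abstract only states that the point representation is extended to an area representation via relative positional encoding; the sketch below shows one way such an area-aware frequency encoding could look (in the spirit of integrated positional encodings), with the Gaussian damping and all names being assumptions rather than the paper's formulation.

```python
# Speculative sketch: a positional encoding damped by the area a sample covers,
# so distant samples that integrate over a larger observed region keep fewer
# high-frequency components.
import numpy as np

def area_aware_encoding(x, footprint, num_freqs=10):
    """x: 3-D sample position; footprint: scalar area scale that grows with distance."""
    freqs = 2.0 ** np.arange(num_freqs)                # 1, 2, 4, ...
    damp = np.exp(-0.5 * (freqs * footprint) ** 2)     # larger footprint suppresses high freqs
    angles = np.outer(freqs, x)                        # (num_freqs, 3)
    damp_full = np.concatenate([damp, damp])[:, None]  # same damping for sin and cos blocks
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=0) * damp_full
    return enc.ravel()

# The same point encoded with a small versus a large observed area.
p = np.array([0.3, -0.1, 0.7])
near = area_aware_encoding(p, footprint=0.01)
far = area_aware_encoding(p, footprint=0.5)
```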
Large models have achieved impressive performance in many downstream tasks. Using pipeline parallelism to fine-tune large models on commodity GPU servers is an important way to make the excellent performance of large models available to the general public. Previous solutions fail to achieve efficient, memory-balanced pipeline parallelism. In this poster, we introduce a memory load-balanced pipeline parallel solution. It balances memory consumption across stages on commodity GPU servers via NVLink bridges, establishing a new pathway to offload data from GPU to CPU that uses the PCIe link of the adjacent GPU connected by the NVLink bridge. Furthermore, our method orchestrates offload operations to minimize offload latency during large-model fine-tuning. Experiments demonstrate that our solution balances the memory footprint among pipeline stages without sacrificing training performance.
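The offload pathway described here, routing data from a busy GPU to the CPU through an adjacent GPU reached over the NVLink bridge, might look roughly as follows in PyTorch; the device naming, the absence of stream management, and the function itself are illustrative assumptions rather than the poster's implementation.

```python
# Sketch: GPU -> neighbor GPU (NVLink bridge) -> pinned CPU memory, so the
# offload rides the neighbor's PCIe link instead of the busy local one.
import torch

def offload_via_neighbor(activation, neighbor="cuda:1"):
    staged = activation.to(neighbor, non_blocking=True)        # peer-to-peer hop over NVLink
    host = torch.empty(staged.shape, dtype=staged.dtype,
                       device="cpu", pin_memory=True)
    host.copy_(staged, non_blocking=True)                      # neighbor's PCIe link to host memory
    return host
```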
As the demand for superior agents grows, the training complexity of Deep Reinforcement Learning (DRL) increases. Thus, accelerating DRL training has become a major research focus. Dividing the DRL training pro...