ISBN (digital): 9798331509767
ISBN (print): 9798331509774
Retrieval-Augmented Generation (RAG) enhances large language models' response quality by incorporating external knowledge. However, standard RAG's text chunking often causes semantic incompleteness due to absent entity information, reducing recall rates. This challenge is particularly evident in narrative texts, where the number of pronouns often far exceeds the number of explicit entity references. To address this, we introduce CoRAG, a novel framework that integrates a coreference resolution module into RAG's preprocessing pipeline. Representing, to our knowledge, the first application of coreference resolution to RAG preprocessing, CoRAG restores semantic coherence to text chunks efficiently. We systematically evaluated the impact of various coreference resolution and generative models on CoRAG's performance. Experiments on the NarrativeQA dataset, utilizing DeepSeek-14B as the foundational model, demonstrate that CoRAG increases accuracy from 0.37 to 0.40 and the F1 score from 0.26 to 0.32, surpassing conventional RAG methods. Furthermore, by processing text offline, CoRAG achieves these gains cost-effectively without sacrificing real-time response efficiency. These results underscore CoRAG's effectiveness in improving RAG performance for narrative tasks. This framework offers a viable avenue for future optimization in knowledge-augmented language modeling.
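The preprocessing idea is straightforward to sketch: run coreference resolution once, offline, so that every chunk entering the retrieval index names its entities explicitly instead of through pronouns. Below is a minimal illustration in Python; `resolve_coreferences` is a hypothetical stand-in, since the abstract does not name a specific resolver, and the chunker is a plain fixed-size splitter.

```python
# Minimal sketch of CoRAG-style preprocessing: resolve coreferences
# offline, then chunk the rewritten text for the RAG index.
# `resolve_coreferences` is a hypothetical stand-in for any coref model
# (the paper does not prescribe a specific one).

def resolve_coreferences(text: str) -> str:
    """Placeholder: replace pronouns with their antecedent entities,
    e.g. "Alice left. She was tired." -> "Alice left. Alice was tired."."""
    raise NotImplementedError("plug in a coreference resolver here")

def chunk(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Simple fixed-size character chunking with overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def build_index_corpus(documents: list[str]) -> list[str]:
    chunks = []
    for doc in documents:
        resolved = resolve_coreferences(doc)  # offline, one-time cost
        chunks.extend(chunk(resolved))        # chunks now carry explicit entities
    return chunks
```

Because the resolution pass runs once at indexing time, its cost is amortized and query-time latency is unchanged, which matches the abstract's claim about preserving real-time response efficiency.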
ISBN (digital): 9798331509712
ISBN (print): 9798331509729
Large deep neural network (DNN) models have demonstrated exceptional performance across diverse downstream tasks. Sharded data parallelism (SDP) has been widely used to reduce the memory footprint of model states. In a DNN training cluster, a device usually has multiple inter-device links, such as NVLink and InfiniBand, connecting it to other devices. However, existing SDP approaches employ only a single link at any given time, incurring significant communication overheads that hinder efficient training. We observe that the inter-device links can work independently without affecting each other. To reduce the substantial communication overhead of distributed training of large DNNs, this paper introduces HSDP, an efficient SDP training approach that enables the simultaneous utilization of multiple inter-device links. HSDP partitions models in a novel fine-grained manner and orchestrates the communication of partitioned parameters with the available inter-device links in mind. This design enables concurrent communication execution and reduces communication overhead. To further optimize the training performance of HSDP, we propose an HSDP planner. The planner first abstracts HSDP's model partitioning and execution into communication-parallel strategies and builds a cost model to estimate the performance of each strategy. We then formulate the strategy search as an optimization problem and solve it with an off-the-shelf solver. Evaluations on representative DNN workloads demonstrate that HSDP achieves up to 1.30× speedup over state-of-the-art SDP training approaches.
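The abstract's key observation, that independent links can carry traffic concurrently, can be approximated with asynchronous collectives on separate process groups. The sketch below shows only this overlap pattern, not HSDP's planner or partitioning scheme; whether the two groups actually map onto distinct physical links (e.g., NVLink vs. InfiniBand) depends on the cluster topology and the communication backend's configuration.

```python
# Sketch: split a local parameter shard in two and all-gather the halves
# concurrently on two process groups, so that (in principle) different
# inter-device links carry traffic in parallel.
import torch
import torch.distributed as dist

def dual_group_all_gather(shard: torch.Tensor, group_a, group_b) -> torch.Tensor:
    """All-gather the two halves of `shard` on two process groups at once.
    Assumes shard.numel() is even and both groups span all ranks."""
    world = dist.get_world_size()
    half_a, half_b = shard.view(-1).chunk(2)
    out_a = torch.empty(world * half_a.numel(), dtype=shard.dtype, device=shard.device)
    out_b = torch.empty(world * half_b.numel(), dtype=shard.dtype, device=shard.device)
    # async_op=True returns immediately, so both collectives are in flight together
    work_a = dist.all_gather_into_tensor(out_a, half_a.contiguous(), group=group_a, async_op=True)
    work_b = dist.all_gather_into_tensor(out_b, half_b.contiguous(), group=group_b, async_op=True)
    work_a.wait()
    work_b.wait()
    # reassembly into per-rank parameter order is omitted for brevity
    return torch.cat([out_a, out_b])
```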
Disaggregated memory (DM) is a widely discussed datacenter architecture in academia and industry. It decouples computing and memory resources from monolithic servers into two network-connected resource pools. Range in...
The size of deep learning models has been increasing to enhance model quality. Because the training computation budget grows linearly with model size, training an extremely large-scale model is exceedingly time-consuming. Recently, the Mixture of Experts (MoE) has drawn significant attention, as it can scale models to extra-large sizes with a stable computation budget. However, inefficient distributed training of large-scale MoE models hinders their broader application. Specifically, considerable dynamic load imbalance occurs among devices during training, significantly reducing throughput. Several load-balancing works have been proposed to address this challenge. System-level solutions draw more attention for their hardware affinity and non-disruption of model convergence compared to algorithm-level ones; however, they suffer from high communication costs and poor communication-computation overlap. To address these challenges, we propose a systematic load-balancing method, Pro-Prophet, which consists of a planner and a scheduler for efficient parallel training of large-scale MoE models. To adapt to the dynamic load imbalance, we profile training statistics and use them to drive Pro-Prophet's design. To lower communication volume, the Pro-Prophet planner determines a series of lightweight load-balancing strategies and efficiently searches for a communication-efficient one based on the profiled statistics. To sufficiently overlap communication with computation, the Pro-Prophet scheduler schedules data-dependent operations based on the statistics and operation features, further improving training throughput. We conduct extensive experiments on four clusters and five MoE models. The results indicate that Pro-Prophet achieves up to 2.66x speedup compared to two popular MoE frameworks, DeepSpeed-MoE and FasterMoE, and demonstrates a load-balancing improvement of up to 11.01x over a representative load-balancing work.
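As a concrete picture of the kind of statistic Pro-Prophet profiles, the sketch below derives per-expert token loads from a gating network's top-1 routing decisions; the actual strategy search and scheduling in Pro-Prophet go well beyond this illustration.

```python
# Illustrative only: compute the per-expert token load implied by a
# gating network's top-1 routing, the raw signal behind load imbalance.
import torch

def expert_loads(gate_logits: torch.Tensor, num_experts: int) -> torch.Tensor:
    """gate_logits: [num_tokens, num_experts] -> tokens routed to each expert."""
    top1 = gate_logits.argmax(dim=-1)                   # routing decision per token
    return torch.bincount(top1, minlength=num_experts)  # load per expert

def imbalance_ratio(loads: torch.Tensor) -> float:
    """Max/mean load; 1.0 means the devices are perfectly balanced."""
    return (loads.max() / loads.float().mean()).item()
```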
Neural Radiance Fields (NeRF) have received widespread attention for their photo-realistic novel view synthesis quality. Current methods mainly represent the scene by point sampling along cast rays, ignoring how the observed area changes with distance. In addition, current sampling strategies all focus on the distribution of sample points along a ray, without paying attention to how the rays themselves are sampled. We find that, for scenes captured with a forward-moving camera, the standard ray sampling strategy severely slows convergence. In this work, we extend the point representation to an area representation using relative positional encoding, and propose a ray sampling strategy suited to forward-moving camera trajectories. We validate the effectiveness of our method on multiple public datasets.
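For context, the frequency-based positional encoding from the original NeRF, which the paper's relative/area encoding builds on, lifts each coordinate p to (sin(2^0·π·p), cos(2^0·π·p), ..., sin(2^(L-1)·π·p), cos(2^(L-1)·π·p)):

```python
# Standard NeRF frequency encoding (for reference, not the paper's variant).
import numpy as np

def positional_encoding(p: np.ndarray, num_freqs: int = 10) -> np.ndarray:
    """p: [..., 3] sample positions -> [..., 3 * 2 * num_freqs] features."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi  # 2^k * pi for k = 0..L-1
    scaled = p[..., None] * freqs                  # [..., 3, L]
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return enc.reshape(*p.shape[:-1], -1)
```

This encoding treats each sample as a dimensionless point; the paper's contribution is to make the encoding reflect the area a sample actually covers as viewing distance changes.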
ISBN (digital): 9798350359312
ISBN (print): 9798350359329
Temporal Knowledge Graph Completion (TKGC) aims to predict the missing parts of quadruples, which is crucial for real-life knowledge graphs. Compared with methods that only use graph neural networks, the emergence of pre-trained language models has introduced a trend of leveraging text and graph structure information simultaneously. However, most current methods based on pre-trained models struggle to effectively utilize both text and multi-hop graph structure information concurrently, resulting in insufficient mining of associations between relations. To address this challenge, we propose a novel model: Temporal Closing Path for Pre-trained Language Model-based TKGC (TCP-PLM). We obtain the temporal closing relation path of the target relation through sampling, and use the relation path as a bridge to exploit text and multi-hop graph structure information simultaneously. Moreover, the relation path serves as a tool for mining associations between relations. At the same time, because the relation paths are entity-independent by design, our model can also handle the inductive setting. Our experiments on three benchmarks, along with extensive analysis, demonstrate that our model not only achieves substantial performance improvements on four metrics compared to other models but also adeptly handles inductive settings.
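The notion of an entity-independent temporal relation path can be conveyed with a small sketch: for a target quadruple (h, r, t, τ), collect two-hop paths h → x → t whose edges occur no later than τ, keeping only the relations. The paper's actual sampling procedure is more involved; this only illustrates the idea.

```python
# Illustrative sketch: enumerate 2-hop temporal closing relation paths
# between h and t, keeping relations only (entity-independent).
from collections import defaultdict

def closing_relation_paths(quads, h, t, tau, max_paths=10):
    """quads: iterable of (head, relation, tail, time). Returns [(r1, r2), ...]."""
    out_edges = defaultdict(list)          # head -> [(relation, tail, time)]
    for qh, qr, qt, qtime in quads:
        if qtime <= tau:                   # only edges no later than the query time
            out_edges[qh].append((qr, qt, qtime))
    paths = []
    for r1, x, _ in out_edges[h]:
        for r2, y, _ in out_edges[x]:
            if y == t:
                paths.append((r1, r2))     # the path "closes" back at t
                if len(paths) >= max_paths:
                    return paths
    return paths
```

Because the returned paths contain no entity identifiers, the same path vocabulary transfers to unseen entities, which is what makes the inductive setting tractable.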
In recent years, there has been growing interest in knowledge graph embedding (KGE), which maps symbolic entities and relations into a low-dimensional vector space to effectively represent structured data from the knowledge graph. In addition, the concept of the temporal knowledge graph has been proposed to document dynamically changing facts in the real world. Existing works attempt to incorporate temporal information into static KGE methods to accomplish temporal knowledge representation. However, existing static or temporal KGE approaches focus on the single query fact and ignore query-relevant contextual information in the graph structure. This paper moves beyond the traditional way of scoring facts in a distinct vector space and proposes a unified framework with pre-trained language models (PLMs) to learn dynamic contextualized static/temporal knowledge graph embeddings, called CoS/TKGE. Given a query-specific subgraph, our model transforms it into an input sequence and uses the PLM to obtain contextualized knowledge representations that flexibly adapt to the input graph context. We reformulate the link prediction task as a mask prediction problem to fine-tune the pre-trained language model, and employ contrastive learning to align dynamic contextual embeddings with static global embeddings. Experimental results on three widely used static and temporal KG datasets show the superiority of our model.
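The mask-prediction reformulation can be sketched with an off-the-shelf masked language model: serialize a query fact as text, put a [MASK] where the missing entity goes, and read candidate scores from the MLM head. The paper fine-tunes the PLM on serialized subgraphs; the snippet below (using bert-base-uncased purely as a placeholder) shows only the mechanics and works only for single-token candidates.

```python
# Minimal sketch: link prediction as mask prediction with a generic MLM.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def score_candidates(head, relation, time, candidates):
    text = f"{head} {relation} {tok.mask_token} in {time}."
    inputs = tok(text, return_tensors="pt")
    mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = mlm(**inputs).logits[0, mask_pos]   # vocabulary scores at [MASK]
    ids = tok.convert_tokens_to_ids(candidates)      # single-token, lowercased candidates
    return {c: logits[i].item() for c, i in zip(candidates, ids)}

# e.g. score_candidates("barack obama", "visited", "2014", ["france", "germany"])
```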
As the demands for superior agents grow, the training complexity of Deep Reinforcement Learning (DRL) becomes higher. Thus, accelerating training of DRL has become a major research focus. Dividing the DRL training pro...
The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) has completed observations of nearly 20 million celestial objects, including a class of spectra labeled "Unknown." Besides low signal-to-noise ratios, these spectra often show anomalous features that do not work well with current models. In this paper, a total of 637,889 "Unknown" spectra from LAMOST DR5 are selected, and an unsupervised analytical framework for "Unknown" spectra, named SA-Frame (Spectra Analysis-Frame), is provided to explore their origins from different perspectives. The SA-Frame is composed of three parts: NAPC-Spec clustering, characterization, and origin analysis. First, NAPC-Spec (Nonparametric density clustering algorithm for spectra) characterizes different features in the "Unknown" spectra by adjusting the influence space and divergence distance to minimize the effects of noise and high dimensionality, resulting in 13 clusters. Second, characteristic extraction and representation of the clustering results are carried out based on spectral lines and continua, where these 13 types are characterized as regular spectra with low S/Ns, splicing problems, suspected galactic emission signals, contamination from city light, and un-gregarious types. Finally, a preliminary analysis of their origins is made from the characteristics of the observational targets, contamination from the sky, and the working status of the instruments. These results would be valuable for improving the overall data quality of large-scale spectral surveys.
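NAPC-Spec itself is a nonparametric density clustering algorithm tuned for spectra via its influence space and divergence distance; as a rough stand-in for readers who want to experiment, the sketch below unit-normalizes flux vectors and clusters them with generic DBSCAN, which is not the paper's algorithm.

```python
# Rough stand-in for density clustering of spectra (not NAPC-Spec).
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_spectra(flux: np.ndarray, eps: float = 0.5, min_samples: int = 20):
    """flux: [n_spectra, n_wavelength_bins]. Returns a label per spectrum (-1 = noise)."""
    norm = flux / (np.linalg.norm(flux, axis=1, keepdims=True) + 1e-12)  # unit-normalize
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(norm)
```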
ISBN (digital): 9798331535087
ISBN (print): 9798331535094
Surgical hemorrhage is a common occurrence in surgeries. Accurate segmentation of hemorrhage regions is important for surgical navigation and post-operative assessment. Some segmentation models focus on medical images, but their performance on hemorrhage data is limited. Besides, annotating large amounts of hemorrhage data is an essential challenge, and previous segmentation methods struggle with complex hemorrhage characteristics such as unclear boundaries and scattered targets. The Segment Anything Model 2 (SAM2) shows significant zero-shot ability in general image segmentation and generalizes well when fine-tuned on downstream tasks. However, it often faces limitations on hemorrhage segmentation tasks, where annotations are scarce. This paper proposes a fine-tuning approach for SAM2 that significantly improves its performance on hemorrhage segmentation with limited data. Our method provides better segmentation performance on few-shot hemorrhage data than SAM- and SAM2-based models.
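A hedged sketch of a fine-tune in the spirit the abstract describes: freeze the large image encoder and train only the mask decoder on a handful of labeled hemorrhage frames. The attribute names `model.image_encoder` and `model.mask_decoder`, the single-tensor forward, and the data loader are assumptions for illustration, not the released SAM2 API.

```python
# Sketch: parameter-efficient fine-tuning on few-shot segmentation data.
import torch
import torch.nn.functional as F

def finetune(model, loader, epochs=20, lr=1e-4, device="cuda"):
    model.to(device)
    for p in model.image_encoder.parameters():  # assumed attribute; keep encoder frozen
        p.requires_grad = False
    opt = torch.optim.AdamW(model.mask_decoder.parameters(), lr=lr)  # assumed attribute
    for _ in range(epochs):
        for images, masks in loader:            # few-shot hemorrhage image/mask pairs
            images, masks = images.to(device), masks.to(device)
            logits = model(images)              # assumed: [B, 1, H, W] mask logits
            loss = F.binary_cross_entropy_with_logits(logits, masks.float())
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```

Freezing the encoder keeps the number of trainable parameters small, which is what makes fine-tuning feasible when only a few annotated frames are available.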