ISBN (digital): 9798331509767
ISBN (print): 9798331509774
Retrieval-Augmented Generation (RAG) enhances large language models' response quality by incorporating external knowledge. However, standard RAG's text chunking often causes semantic incompleteness due to absent entity information, reducing recall rates. This challenge is particularly evident in narrative texts, where the number of pronouns often far exceeds the number of explicit entity references. To address this, we introduce CoRAG, a novel framework that integrates a coreference resolution module into RAG's preprocessing pipeline. Representing, to our knowledge, the first application of coreference resolution to RAG preprocessing, CoRAG restores semantic coherence to text chunks efficiently. We systematically evaluated the impact of various coreference resolution and generative models on CoRAG's performance. Experiments on the NarrativeQA dataset, utilizing DeepSeek-14B as the foundational model, demonstrate that CoRAG increases accuracy from 0.37 to 0.40 and the F1 score from 0.26 to 0.32, surpassing conventional RAG methods. Furthermore, by processing text offline, CoRAG achieves these gains cost-effectively without sacrificing real-time response efficiency. These results underscore CoRAG's effectiveness in improving RAG performance for narrative tasks. This framework offers a viable avenue for future optimization in knowledge-augmented language modeling.
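The preprocessing idea is straightforward to sketch: run coreference resolution once, offline, so that every chunk entering the retrieval index names its entities explicitly instead of through pronouns. Below is a minimal illustration in Python; `resolve_coreferences` is a hypothetical stand-in, since the abstract does not name a specific resolver, and the chunker is a plain fixed-size splitter.

```python
# Minimal sketch of CoRAG-style preprocessing: resolve coreferences
# offline, then chunk the rewritten text for the RAG index.
# `resolve_coreferences` is a hypothetical stand-in for any coref model
# (the paper does not prescribe a specific one).

def resolve_coreferences(text: str) -> str:
    """Placeholder: replace pronouns with their antecedent entities,
    e.g. "Alice left. She was tired." -> "Alice left. Alice was tired."."""
    raise NotImplementedError("plug in a coreference resolver here")

def chunk(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Simple fixed-size character chunking with overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def build_index_corpus(documents: list[str]) -> list[str]:
    chunks = []
    for doc in documents:
        resolved = resolve_coreferences(doc)  # offline, one-time cost
        chunks.extend(chunk(resolved))        # chunks now carry explicit entities
    return chunks
```

Because the resolution pass runs once at indexing time, its cost is amortized and query-time latency is unchanged, which matches the abstract's claim about preserving real-time response efficiency.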
ISBN (digital): 9798331509712
ISBN (print): 9798331509729
Large deep neural network (DNN) models have demonstrated exceptional performance across diverse downstream tasks. Sharded data parallelism (SDP) has been widely used to reduce the memory footprint of model states. In a DNN training cluster, a device usually has multiple inter-device links, such as NVLink and InfiniBand, connecting it to other devices. However, existing SDP approaches employ only a single link at any given time, incurring significant communication overheads that hinder efficient training. We observe that the inter-device links can work independently without affecting each other. To reduce the substantial communication overhead of distributed training of large DNNs, this paper introduces HSDP, an efficient SDP training approach that enables the simultaneous utilization of multiple inter-device links. HSDP partitions models in a novel fine-grained manner and orchestrates the communication of partitioned parameters with the available inter-device links in mind. This design enables concurrent communication execution and reduces communication overhead. To further optimize the training performance of HSDP, we propose an HSDP planner. The planner first abstracts HSDP's model partitioning and execution into communication-parallel strategies and builds a cost model to estimate the performance of each strategy. We then formulate the strategy search as an optimization problem and solve it with an off-the-shelf solver. Evaluations on representative DNN workloads demonstrate that HSDP achieves up to 1.30× speedup over state-of-the-art SDP training approaches.
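The abstract's key observation, that independent links can carry traffic concurrently, can be approximated with asynchronous collectives on separate process groups. The sketch below shows only this overlap pattern, not HSDP's planner or partitioning scheme; whether the two groups actually map onto distinct physical links (e.g., NVLink vs. InfiniBand) depends on the cluster topology and the communication backend's configuration.

```python
# Sketch: split a local parameter shard in two and all-gather the halves
# concurrently on two process groups, so that (in principle) different
# inter-device links carry traffic in parallel.
import torch
import torch.distributed as dist

def dual_group_all_gather(shard: torch.Tensor, group_a, group_b) -> torch.Tensor:
    """All-gather the two halves of `shard` on two process groups at once.
    Assumes shard.numel() is even and both groups span all ranks."""
    world = dist.get_world_size()
    half_a, half_b = shard.view(-1).chunk(2)
    out_a = torch.empty(world * half_a.numel(), dtype=shard.dtype, device=shard.device)
    out_b = torch.empty(world * half_b.numel(), dtype=shard.dtype, device=shard.device)
    # async_op=True returns immediately, so both collectives are in flight together
    work_a = dist.all_gather_into_tensor(out_a, half_a.contiguous(), group=group_a, async_op=True)
    work_b = dist.all_gather_into_tensor(out_b, half_b.contiguous(), group=group_b, async_op=True)
    work_a.wait()
    work_b.wait()
    # reassembly into per-rank parameter order is omitted for brevity
    return torch.cat([out_a, out_b])
```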
Disaggregated memory (DM) is a widely discussed datacenter architecture in academia and industry. It decouples computing and memory resources from monolithic servers into two network-connected resource pools. Range in...
The size of deep learning models has been increasing to enhance model quality. Because the training computation budget grows linearly with model size, training an extremely large-scale model is exceedingly time-consuming. Recently, the Mixture of Experts (MoE) has drawn significant attention, as it can scale models to extra-large sizes with a stable computation budget. However, inefficient distributed training of large-scale MoE models hinders their broader application. Specifically, considerable dynamic load imbalance occurs among devices during training, significantly reducing throughput. Several load-balancing works have been proposed to address this challenge. System-level solutions draw more attention for their hardware affinity and non-disruption of model convergence compared to algorithm-level ones; however, they suffer from high communication costs and poor communication-computation overlap. To address these challenges, we propose a systematic load-balancing method, Pro-Prophet, which consists of a planner and a scheduler for efficient parallel training of large-scale MoE models. To adapt to the dynamic load imbalance, we profile training statistics and use them to drive Pro-Prophet's design. To lower communication volume, the Pro-Prophet planner determines a series of lightweight load-balancing strategies and efficiently searches for a communication-efficient one based on the profiled statistics. To sufficiently overlap communication with computation, the Pro-Prophet scheduler schedules data-dependent operations based on the statistics and operation features, further improving training throughput. We conduct extensive experiments on four clusters and five MoE models. The results indicate that Pro-Prophet achieves up to 2.66x speedup compared to two popular MoE frameworks, DeepSpeed-MoE and FasterMoE, and demonstrates a load-balancing improvement of up to 11.01x over a representative load-balancing work.
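As a concrete picture of the kind of statistic Pro-Prophet profiles, the sketch below derives per-expert token loads from a gating network's top-1 routing decisions; the actual strategy search and scheduling in Pro-Prophet go well beyond this illustration.

```python
# Illustrative only: compute the per-expert token load implied by a
# gating network's top-1 routing, the raw signal behind load imbalance.
import torch

def expert_loads(gate_logits: torch.Tensor, num_experts: int) -> torch.Tensor:
    """gate_logits: [num_tokens, num_experts] -> tokens routed to each expert."""
    top1 = gate_logits.argmax(dim=-1)                   # routing decision per token
    return torch.bincount(top1, minlength=num_experts)  # load per expert

def imbalance_ratio(loads: torch.Tensor) -> float:
    """Max/mean load; 1.0 means the devices are perfectly balanced."""
    return (loads.max() / loads.float().mean()).item()
```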
Neural Radiance Fields (NeRF) have received widespread attention for their photo-realistic novel view synthesis quality. Current methods mainly represent the scene by point sampling along cast rays, ignoring how the observed area changes with distance. In addition, current sampling strategies all focus on the distribution of sample points along a ray, without paying attention to how the rays themselves are sampled. We find that, for scenes captured with a forward-moving camera, the standard ray sampling strategy severely slows convergence. In this work, we extend the point representation to an area representation using relative positional encoding, and propose a ray sampling strategy suited to forward-moving camera trajectories. We validate the effectiveness of our method on multiple public datasets.
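For context, the frequency-based positional encoding from the original NeRF, which the paper's relative/area encoding builds on, lifts each coordinate p to (sin(2^0·π·p), cos(2^0·π·p), ..., sin(2^(L-1)·π·p), cos(2^(L-1)·π·p)):

```python
# Standard NeRF frequency encoding (for reference, not the paper's variant).
import numpy as np

def positional_encoding(p: np.ndarray, num_freqs: int = 10) -> np.ndarray:
    """p: [..., 3] sample positions -> [..., 3 * 2 * num_freqs] features."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi  # 2^k * pi for k = 0..L-1
    scaled = p[..., None] * freqs                  # [..., 3, L]
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return enc.reshape(*p.shape[:-1], -1)
```

This encoding treats each sample as a dimensionless point; the paper's contribution is to make the encoding reflect the area a sample actually covers as viewing distance changes.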
ISBN (digital): 9798350359312
ISBN (print): 9798350359329
Temporal Knowledge Graph Completion (TKGC) aims to predict the missing parts of quadruples, which is crucial for real-life knowledge graphs. Compared with methods that only use graph neural networks, the emergence of pre-trained language models has introduced a trend of leveraging text and graph structure information simultaneously. However, most current methods based on pre-trained models struggle to effectively utilize both text and multi-hop graph structure information concurrently, resulting in insufficient mining of associations between relations. To address this challenge, we propose a novel model: Temporal Closing Path for Pre-trained Language Model-based TKGC (TCP-PLM). We obtain the temporal closing relation path of the target relation through sampling, and use the relation path as a bridge to exploit text and multi-hop graph structure information simultaneously. Moreover, the relation path serves as a tool for mining associations between relations. At the same time, because the relation paths are entity-independent by design, our model can also handle the inductive setting. Our experiments on three benchmarks, along with extensive analysis, demonstrate that our model not only achieves substantial performance improvements on four metrics compared to other models but also adeptly handles inductive settings.
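The notion of an entity-independent temporal relation path can be conveyed with a small sketch: for a target quadruple (h, r, t, τ), collect two-hop paths h → x → t whose edges occur no later than τ, keeping only the relations. The paper's actual sampling procedure is more involved; this only illustrates the idea.

```python
# Illustrative sketch: enumerate 2-hop temporal closing relation paths
# between h and t, keeping relations only (entity-independent).
from collections import defaultdict

def closing_relation_paths(quads, h, t, tau, max_paths=10):
    """quads: iterable of (head, relation, tail, time). Returns [(r1, r2), ...]."""
    out_edges = defaultdict(list)          # head -> [(relation, tail, time)]
    for qh, qr, qt, qtime in quads:
        if qtime <= tau:                   # only edges no later than the query time
            out_edges[qh].append((qr, qt, qtime))
    paths = []
    for r1, x, _ in out_edges[h]:
        for r2, y, _ in out_edges[x]:
            if y == t:
                paths.append((r1, r2))     # the path "closes" back at t
                if len(paths) >= max_paths:
                    return paths
    return paths
```

Because the returned paths contain no entity identifiers, the same path vocabulary transfers to unseen entities, which is what makes the inductive setting tractable.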
In recent years, there has been growing interest in knowledge graph embedding (KGE), which maps symbolic entities and relations into a low-dimensional vector space to effectively represent structured data from the knowledge graph. In addition, the concept of the temporal knowledge graph has been proposed to document dynamically changing facts in the real world. Existing works attempt to incorporate temporal information into static KGE methods to accomplish temporal knowledge representation. However, existing static or temporal KGE approaches focus on the single query fact and ignore query-relevant contextual information in the graph structure. This paper moves beyond the traditional way of scoring facts in a distinct vector space and proposes a unified framework with pre-trained language models (PLMs) to learn dynamic contextualized static/temporal knowledge graph embeddings, called CoS/TKGE. Given a query-specific subgraph, our model transforms it into an input sequence and uses the PLM to obtain contextualized knowledge representations that flexibly adapt to the input graph context. We reformulate the link prediction task as a mask prediction problem to fine-tune the pre-trained language model, and employ contrastive learning to align dynamic contextual embeddings with static global embeddings. Experimental results on three widely used static and temporal KG datasets show the superiority of our model.
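The mask-prediction reformulation can be sketched with an off-the-shelf masked language model: serialize a query fact as text, put a [MASK] where the missing entity goes, and read candidate scores from the MLM head. The paper fine-tunes the PLM on serialized subgraphs; the snippet below (using bert-base-uncased purely as a placeholder) shows only the mechanics and works only for single-token candidates.

```python
# Minimal sketch: link prediction as mask prediction with a generic MLM.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def score_candidates(head, relation, time, candidates):
    text = f"{head} {relation} {tok.mask_token} in {time}."
    inputs = tok(text, return_tensors="pt")
    mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = mlm(**inputs).logits[0, mask_pos]   # vocabulary scores at [MASK]
    ids = tok.convert_tokens_to_ids(candidates)      # single-token, lowercased candidates
    return {c: logits[i].item() for c, i in zip(candidates, ids)}

# e.g. score_candidates("barack obama", "visited", "2014", ["france", "germany"])
```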
As the demands for superior agents grow, the training complexity of Deep Reinforcement Learning (DRL) becomes higher. Thus, accelerating training of DRL has become a major research focus. Dividing the DRL training pro...
The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) has completed observations of nearly 20 million celestial objects, including a class of spectra labeled "Unknown." Besides low signal-to-noise ratios, these spectra often show anomalous features that do not work well with current models. In this paper, a total of 637,889 "Unknown" spectra from LAMOST DR5 are selected, and an unsupervised analytical framework for "Unknown" spectra, named SA-Frame (Spectra Analysis-Frame), is provided to explore their origins from different perspectives. The SA-Frame is composed of three parts: NAPC-Spec clustering, characterization, and origin analysis. First, NAPC-Spec (Nonparametric density clustering algorithm for spectra) characterizes different features in the "Unknown" spectra by adjusting the influence space and divergence distance to minimize the effects of noise and high dimensionality, resulting in 13 clusters. Second, characteristic extraction and representation of the clustering results are carried out based on spectral lines and continua, where these 13 types are characterized as regular spectra with low S/Ns, splicing problems, suspected galactic emission signals, contamination from city light, and un-gregarious types. Finally, a preliminary analysis of their origins is made from the characteristics of the observational targets, contamination from the sky, and the working status of the instruments. These results would be valuable for improving the overall data quality of large-scale spectral surveys.
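NAPC-Spec itself is a nonparametric density clustering algorithm tuned for spectra via its influence space and divergence distance; as a rough stand-in for readers who want to experiment, the sketch below unit-normalizes flux vectors and clusters them with generic DBSCAN, which is not the paper's algorithm.

```python
# Rough stand-in for density clustering of spectra (not NAPC-Spec).
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_spectra(flux: np.ndarray, eps: float = 0.5, min_samples: int = 20):
    """flux: [n_spectra, n_wavelength_bins]. Returns a label per spectrum (-1 = noise)."""
    norm = flux / (np.linalg.norm(flux, axis=1, keepdims=True) + 1e-12)  # unit-normalize
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(norm)
```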
ISBN (digital): 9798331535087
ISBN (print): 9798331535094
Surgical hemorrhage is a common occurrence in surgeries. Accurate segmentation of hemorrhage regions is important for surgical navigation and post-operative assessment. Some segmentation models focus on medical images, but their performance on hemorrhage data is limited. Besides, annotating large amounts of hemorrhage data is an essential challenge, and previous segmentation methods struggle with complex hemorrhage characteristics such as unclear boundaries and scattered targets. The Segment Anything Model 2 (SAM2) shows significant zero-shot ability in general image segmentation and generalizes well when fine-tuned on downstream tasks. However, it often faces limitations on hemorrhage segmentation tasks, where annotations are scarce. This paper proposes a fine-tuning approach for SAM2 that significantly improves its performance on hemorrhage segmentation with limited data. Our method provides better segmentation performance on few-shot hemorrhage data than SAM- and SAM2-based models.
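A hedged sketch of a fine-tune in the spirit the abstract describes: freeze the large image encoder and train only the mask decoder on a handful of labeled hemorrhage frames. The attribute names `model.image_encoder` and `model.mask_decoder`, the single-tensor forward, and the data loader are assumptions for illustration, not the released SAM2 API.

```python
# Sketch: parameter-efficient fine-tuning on few-shot segmentation data.
import torch
import torch.nn.functional as F

def finetune(model, loader, epochs=20, lr=1e-4, device="cuda"):
    model.to(device)
    for p in model.image_encoder.parameters():  # assumed attribute; keep encoder frozen
        p.requires_grad = False
    opt = torch.optim.AdamW(model.mask_decoder.parameters(), lr=lr)  # assumed attribute
    for _ in range(epochs):
        for images, masks in loader:            # few-shot hemorrhage image/mask pairs
            images, masks = images.to(device), masks.to(device)
            logits = model(images)              # assumed: [B, 1, H, W] mask logits
            loss = F.binary_cross_entropy_with_logits(logits, masks.float())
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```

Freezing the encoder keeps the number of trainable parameters small, which is what makes fine-tuning feasible when only a few annotated frames are available.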