The scale of model parameters and the amount of training data are growing exponentially, and GPU memory requirements grow with the number of model parameters. Recomputation and swapping are the two main memory optimization methods and have been studied extensively; strategies that combine them also exist. However, most of these rely on heuristic search, which does not explore the complete solution space and cannot guarantee an optimal solution. Large-scale model training calls for an optimal search strategy with tensor-level recomputation and swapping. In this paper, we propose an optimal strategy search algorithm that combines tensor-based recomputation and swapping. Specifically, the memory swapping strategy is reformulated as an optimization problem in which the memory constraints are expressed as a mixed integer program, so that the optimal memory optimization strategy can be found. By leveraging the advantages of both recomputation and swapping, this approach minimizes computation cost without exceeding the available memory. Experimental results show that our method reduces memory requirements during training by about 60%. Furthermore, our method reduces overall training time relative to existing algorithms. Compared to Checkmate, our approach achieves about a 0.3–0.9% reduction in computation cost per iteration.
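To make the idea of choosing between keeping, swapping, and recomputing each tensor under a memory budget concrete, here is a minimal sketch. It is not the paper's formulation: the tensor names, sizes, and time costs are hypothetical, memory is modeled as a single static sum rather than a per-step timeline, and exhaustive enumeration stands in for a mixed integer programming solver (it is only practical for a handful of tensors).

```python
from itertools import product

# Hypothetical per-tensor costs: (memory if kept, time to swap, time to recompute).
TENSORS = {
    "act1": (4, 3.0, 1.0),
    "act2": (6, 4.0, 5.0),
    "act3": (2, 1.5, 0.5),
}
CHOICES = ("keep", "swap", "recompute")


def plan_cost(plan, tensors):
    """Return (resident memory, extra time) for a {tensor: choice} plan."""
    mem = 0
    extra_time = 0.0
    for name, choice in plan.items():
        size, swap_t, recompute_t = tensors[name]
        if choice == "keep":
            mem += size              # tensor stays resident on the GPU
        elif choice == "swap":
            extra_time += swap_t     # pay transfer time instead of memory
        else:
            extra_time += recompute_t  # pay recomputation time instead of memory
    return mem, extra_time


def best_plan(tensors, budget):
    """Enumerate every assignment and keep the cheapest one that fits the budget."""
    names = sorted(tensors)
    best = None
    for combo in product(CHOICES, repeat=len(names)):
        plan = dict(zip(names, combo))
        mem, extra_time = plan_cost(plan, tensors)
        if mem <= budget and (best is None or extra_time < best[1]):
            best = (plan, extra_time, mem)
    return best


if __name__ == "__main__":
    print(best_plan(TENSORS, budget=6))
```

Because every assignment is visited, the result is optimal for this toy model, which is the property heuristic search cannot guarantee. A real solver would encode the same choices as binary variables with linear memory constraints.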
Unlike the Emotion Cause Extraction (ECE) task, which consists of pre-annotated emotions and a passage, emotion-cause pair extraction (ECPE) aims at extracting potential emotions and corresponding causes in the document witho...
ISBN (digital): 9798350386226
ISBN (print): 9798350386233
Accurate and efficient airway segmentation is essential for evaluating pulmonary diseases, aiding diagnosis, reducing the preoperative burden of airway identification, and minimizing patient discomfort during prolonged surgeries. However, current pulmonary airway reconstruction techniques are hindered by two major challenges: difficulty in accurately reconstructing fine airway branches due to the tendency to overlook small targets, and insufficient structural connectivity leading to frequent branch discontinuities within the airway tree. These limitations directly affect the clinical applicability of reconstructed airways. To overcome these challenges, a novel multi-task framework for 3D pulmonary airway segmentation is proposed, designed to enhance the performance of existing backbone models. This approach integrates Anatomical Prior-Based Multi-Task Learning (AP-MTL) through the use of Gaussian-constructed connectivity-enhanced isosurfaces, significantly improving the network’s ability to maintain airway continuity. Additionally, a Class-Balanced CT Density Distribution Reconstruction mechanism (DDR-CB) is introduced, further refining the model’s capability to detect and segment fine airway branches. As a result of these enhancements, the model demonstrates an 11.5% average improvement in segmentation accuracy and connectivity compared to the baseline. The source code is publicly accessible at https://***/inexhaustible419/APMTLAirwaySegment.
Foundation models are becoming the dominant deep learning technology. Pretraining a foundation model is always time-consuming due to the large scale of both the model parameters and the training dataset. ...
Graph neural networks (GNNs) have become important tools for processing structured graph data and have been successfully applied to many graph-based application scenarios. Existing GNN systems adopt sample-based training on large-scale graphs over multiple GPUs. Although they support large-scale graph training, the overhead of transferring vertex features between CPUs and GPUs remains a bottleneck. In this work, we propose SCGraph, a method that supports high-speed feature caching on GPUs. SCGraph classifies the graph vertices after sorting them by out-degree. For high out-degree vertices, SCGraph sets up graded caches across different GPUs, increasing the overall cache content by using high-speed NVLink data transmission between them. For low out-degree vertices, SCGraph expands the neighborhood of training vertices in advance to regenerate the cache. We evaluate SCGraph against two state-of-the-art industrial GNN frameworks, DGL and PaGraph, on various benchmarks. Experimental results show that SCGraph improves the cache hit rate on GPUs by up to 23.6% and achieves up to 1.71x speedup over the state-of-the-art baselines, while convergence remains almost unchanged.
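The degree-based caching idea can be illustrated with a small sketch. This is not SCGraph's implementation: the function names, the round-robin placement, and the access trace are assumptions made for illustration. It only shows why giving each GPU a distinct slice of the highest out-degree vertices enlarges the aggregate cache compared with replicating the same hot set on every GPU.

```python
def build_partitioned_cache(out_degree, num_gpus, slots_per_gpu):
    """Spread the highest out-degree vertices across per-GPU caches.

    out_degree: {vertex_id: out-degree}. Each GPU holds a distinct slice,
    so the aggregate cache covers num_gpus * slots_per_gpu vertices.
    """
    # Sort by descending out-degree; break ties by vertex id for determinism.
    ranked = sorted(out_degree, key=lambda v: (-out_degree[v], v))
    hot = ranked[: num_gpus * slots_per_gpu]
    caches = [set() for _ in range(num_gpus)]
    for i, vertex in enumerate(hot):
        caches[i % num_gpus].add(vertex)  # round-robin placement (an assumption)
    return caches


def hit_rate(caches, accesses):
    """Fraction of feature accesses served by any GPU cache.

    Assumes a peer GPU's cache is reachable over a fast interconnect.
    """
    if not accesses:
        return 0.0
    cached = set().union(*caches)
    hits = sum(1 for v in accesses if v in cached)
    return hits / len(accesses)


if __name__ == "__main__":
    degrees = {0: 10, 1: 8, 2: 5, 3: 1, 4: 1}
    caches = build_partitioned_cache(degrees, num_gpus=2, slots_per_gpu=1)
    print(caches, hit_rate(caches, [0, 1, 2, 0]))
```

With one slot per GPU, a replicated cache would hold only vertex 0, while the partitioned layout holds vertices 0 and 1, so more accesses avoid a CPU-to-GPU transfer.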
The great success of Deep Neural Networks (DNNs) has inspired the algorithmic development of DNN-based Fixed-Point (DNN-FP) for computer vision tasks. DNN-FP methods, trained by Back-Propagation Through Time or comput...
As the application scenarios of convolutional neural network (CNN) become more and more complex, the general CNN accelerator based on matrix multiplication has become a new research focus. The existing mapping methods...
ISBN (print): 9798350337488
With the exponential growth of biomedical knowledge in unstructured text repositories such as PubMed, there is an urgent need to establish a knowledge-graph-style, efficiently searchable, and targeted database that can support the information retrieval needs of researchers and clinicians. To mine knowledge from graph databases, most previous methods view a triple in a graph (see Fig. 1) as the basic processing unit and embed the triple elements (i.e., drugs/chemicals, proteins/genes, and their interactions) as separate embedding matrices, which cannot capture the semantic correlation among triple elements. To remedy the loss of semantic correlation caused by disjoint embeddings, we propose a novel approach that learns triple embeddings by combining entities and interactions into a unified representation. Furthermore, traditional methods usually learn triple embeddings from scratch and therefore cannot take advantage of the rich domain knowledge embedded in pre-trained models; this is another significant reason why they cannot distinguish the different meanings the same entity takes on across multi-interaction triples. In this paper, we propose a novel fine-tuning-based approach that learns better triple embeddings by creating weakly supervised signals from pre-trained knowledge graph embeddings. The method automatically samples triples from knowledge graphs and estimates their pairwise similarity from pre-trained embedding models. The triples are then fed pairwise into a Siamese-like neural architecture, where the triple representation is fine-tuned in a bootstrapped manner using the triple similarity scores. Finally, we demonstrate that triple embeddings learned with our method can be readily applied to several downstream applications (e.g., triple classification and triple clustering). We evaluated the proposed method on two open-source drug-protein knowledge graphs constructed from PubMed abstracts, as provided by BioCreative. Our method achieves consistent improvement in both t
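The weak-supervision step, in which sampled triples are paired and scored by similarity under pre-trained embeddings, might look like the sketch below. The abstract does not say how similarity is computed, so cosine similarity over concatenated head, relation, and tail vectors is an assumption, and the entity names and two-dimensional embeddings are hypothetical toy values.

```python
import math
from itertools import combinations


def triple_vector(triple, emb):
    """Concatenate the pre-trained vectors of head, relation, and tail."""
    head, relation, tail = triple
    return emb[head] + emb[relation] + emb[tail]  # list concatenation


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def weak_pairs(triples, emb):
    """Return (triple_a, triple_b, similarity) for every pair of triples.

    The similarity serves as a weak label for pairwise fine-tuning.
    """
    return [
        (a, b, cosine(triple_vector(a, emb), triple_vector(b, emb)))
        for a, b in combinations(triples, 2)
    ]


if __name__ == "__main__":
    # Hypothetical toy embeddings; real ones would come from a pre-trained model.
    emb = {
        "drugA": [1.0, 0.0], "drugB": [0.0, 1.0],
        "inhibits": [0.0, 1.0], "activates": [1.0, 0.0],
        "protX": [1.0, 0.0], "protY": [0.0, 1.0],
    }
    triples = [
        ("drugA", "inhibits", "protX"),
        ("drugB", "activates", "protY"),
        ("drugA", "inhibits", "protY"),
    ]
    for a, b, score in weak_pairs(triples, emb):
        print(a, b, round(score, 3))
```

Each scored pair could then serve as one training example for a Siamese-style encoder, with the score as the regression target.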
With the development of the Internet, the volume of information is expanding rapidly, and the complex information makes it particularly important to extract information quickly and intelligently. Event extraction algo...
General Matrix Multiplication (GEMM) has a wide range of applications in scientific simulation and artificial intelligence. Although traditional libraries can achieve high performance on large regular-shaped GEMMs, th...