Despite their prevalence in deep-learning communities, over-parameterized models demand high computational costs for proper training. This work studies the fine-grained, module-level learning dynamics of over-parameterized models to attain a more efficient and fruitful training strategy. Empirical evidence reveals that when scaling down to network modules, such as heads in self-attention models, we can observe varying learning patterns implicitly associated with each module's trainability. To describe such module-level learning capabilities, we introduce a novel concept dubbed modular neural tangent kernel (mNTK), and we demonstrate that the quality of a module's learning is tightly associated with its mNTK's principal eigenvalue λmax. A large λmax indicates that the module learns features with better convergence, while a small one may harm generalization. Inspired by this discovery, we propose a novel training strategy termed Modular Adaptive Training (MAT) that selectively updates only those modules whose λmax exceeds a dynamic threshold, concentrating the model on learning common features and ignoring superfluous ones. Unlike most existing training schemes, which run a complete BP cycle across all network modules, MAT saves significant computation through its partial-update strategy and can further improve performance. Experiments show that MAT nearly halves the computational cost of model training while outperforming baselines in accuracy.
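The abstract does not detail how λmax is estimated or how the dynamic threshold is set. As a rough illustration only, here is a minimal Python sketch under assumed choices (per-sample gradients as a cheap mNTK surrogate, a quantile-based threshold); the paper's actual estimator and update policy may differ.

```python
# Hypothetical sketch of module-level lambda_max gating; not the authors' code.
import torch

def module_lambda_max(per_sample_grads):
    # per_sample_grads: (B, P) matrix J whose rows are per-sample gradients
    # taken w.r.t. one module's parameters on a mini-batch of size B.
    # A cheap mNTK surrogate is K = J @ J.T (B x B); lambda_max is its top eigenvalue.
    K = per_sample_grads @ per_sample_grads.T
    return torch.linalg.eigvalsh(K)[-1].item()  # eigvalsh returns ascending order

def mat_update_mask(lambda_maxes, quantile=0.5):
    # Dynamic threshold (assumed policy): update only modules whose lambda_max
    # exceeds the current quantile across all modules.
    thresh = torch.tensor(lambda_maxes).quantile(quantile).item()
    return [lam > thresh for lam in lambda_maxes]
```

In a training loop, modules masked out would simply be excluded from the backward pass for that step, which is where the claimed compute savings would come from.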
ISBN (digital): 9798350391244
ISBN (print): 9798350391251
Early identification of lung cancer is vital for improving patient outcomes, since the disease is a major contributor to cancer-related mortality, largely because it is often diagnosed at a late stage. This study addresses the challenge of accurately identifying early-stage lung cancer from MRI scans, where subtle tumors are often overlooked. Traditional diagnostic methods, reliant on manual interpretation and conventional machine learning models, fall short in handling the complexity of MRI data. To address these constraints, we propose a hybrid deep learning model that combines Convolutional Neural Networks (CNNs) for spatial feature extraction with Transformer networks for contextual analysis. This approach significantly enhances the accuracy of early-stage lung cancer detection. Performance evaluation on extensive MRI datasets demonstrates that the hybrid model achieves an accuracy of 95%, a sensitivity of 93%, and a specificity of 96%, outperforming traditional diagnostic methods. The results highlight the potential of this hybrid model to improve early-detection strategies, and ultimately treatment outcomes and survival rates, for lung cancer patients.
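As a hedged illustration of the hybrid architecture described above (not the authors' code), the following PyTorch sketch routes CNN feature maps into a Transformer encoder for classification; the layer sizes and the omission of positional encoding are simplifications.

```python
# Illustrative CNN + Transformer hybrid for single-channel MRI slices.
import torch
import torch.nn as nn

class CNNTransformer(nn.Module):
    def __init__(self, num_classes=2, d_model=128):
        super().__init__()
        self.cnn = nn.Sequential(  # assumed small backbone; real one would be deeper
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        enc = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc, num_layers=2)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x):                        # x: (B, 1, H, W)
        f = self.cnn(x)                          # (B, C, H', W') spatial features
        tokens = f.flatten(2).transpose(1, 2)    # (B, H'*W', C) token sequence
        z = self.transformer(tokens).mean(dim=1) # pooled contextual representation
        return self.head(z)
```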
Pre-trained models of source code have gained widespread popularity in many code intelligence tasks. Recently, with the scaling of model and corpus size, large language models have shown the ability of in-context learning (ICL). ICL employs task instructions and a few examples as demonstrations, and then feeds the demonstrations to the language model to make predictions. This new learning paradigm is training-free and has shown impressive performance on various natural language processing and code intelligence tasks. However, the performance of ICL heavily relies on the quality of demonstrations, e.g., the selected examples. It is therefore important to systematically investigate how to construct a good demonstration for code-related tasks. In this paper, we empirically explore the impact of three key factors on the performance of ICL in code intelligence tasks: the selection, order, and number of demonstration examples. We conduct extensive experiments on three code intelligence tasks: code summarization, bug fixing, and program synthesis. Our experimental results demonstrate that all three factors dramatically impact the performance of ICL in code intelligence tasks. Additionally, we summarize our findings and provide takeaway suggestions on how to construct effective demonstrations from these three perspectives. We also show that a carefully designed demonstration based on our findings can lead to substantial improvements over widely used demonstration construction methods, e.g., improving BLEU-4 on code summarization and exact match (EM) on bug fixing and program synthesis by at least 9.90%, 175.96%, and 50.81%, respectively.
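To make the three studied factors concrete, here is a hypothetical sketch of a demonstration builder; the `similarity` function, the task instruction, and the ordering policy are assumptions for illustration, not the paper's prescribed recipe.

```python
# Hypothetical ICL prompt construction exposing the three knobs the paper studies:
# which examples to select, in what order, and how many to include.
def build_icl_prompt(query, pool, similarity, k=4):
    # Selection: keep the k most similar examples from the candidate pool.
    # Order: sort ascending so the most similar example sits closest to the query.
    ranked = sorted(pool, key=lambda ex: similarity(query, ex["input"]))
    demos = ranked[-k:]                       # number: capped at k demonstrations
    parts = ["Summarize the following code."]  # assumed task instruction
    for ex in demos:
        parts.append(f"Code:\n{ex['input']}\nSummary: {ex['output']}")
    parts.append(f"Code:\n{query}\nSummary:")
    return "\n\n".join(parts)
```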
Knowledge transfer (KT) has been regarded as an efficient method in evolutionary multitask optimization (EMTO) by utilizing the information of other tasks to promote the optimization of the current task. Most KT metho...
Robot-assisted minimally invasive surgery benefits from enhancing dynamic scene reconstruction, as it improves surgical outcomes. While Neural Radiance Fields (NeRF) have been effective in scene reconstruction, their ...
Deep hashing is an appealing approach for large-scale image retrieval. Most existing supervised deep hashing methods learn hash functions from pairwise or triplet image similarities in randomly sampled mini-batches, and thus suffer from low training efficiency, insufficient coverage of the data distribution, and pair-imbalance problems. Recently, central similarity quantization (CSQ) attacked these problems by using “hash centers” as a global similarity metric, encouraging the hash codes of similar images to approach their common hash center and to distance themselves from other hash centers. Although it achieves state-of-the-art retrieval performance, CSQ lacks a worst-case guarantee on the minimal distance between its constructed hash centers, i.e., the hash centers can be arbitrarily close. This paper presents an optimization method that finds hash centers subject to a constraint on the minimal distance between any pair of hash centers, which is non-trivial due to the non-convex nature of the problem. More importantly, we adopt the Gilbert-Varshamov bound from coding theory, which helps us obtain a large minimal distance while ensuring the empirical feasibility of our optimization approach. With these clearly separated hash centers, each assigned to one image class, we propose several effective loss functions to train deep hashing networks. Extensive experiments on three image retrieval datasets demonstrate that the proposed method achieves superior retrieval performance over state-of-the-art deep hashing methods.
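The Gilbert-Varshamov argument can be made concrete with a short calculation: for q-bit binary codes, the bound guarantees a code of minimum Hamming distance d with at least 2^q / Σ_{j<d} C(q, j) codewords, so one can find the largest d that still accommodates one center per class. A small Python sketch (illustrative; the paper's exact usage may differ):

```python
# Gilbert-Varshamov feasibility check for hash-center construction.
from math import comb

def gv_lower_bound(n_bits, d):
    # GV bound: a binary code of length n and min distance d exists with at
    # least 2^n / sum_{j=0}^{d-1} C(n, j) codewords.
    return 2 ** n_bits // sum(comb(n_bits, j) for j in range(d))

def max_feasible_distance(n_bits, num_classes):
    # Largest d whose GV guarantee still covers all classes (one center each).
    d = 1
    while d < n_bits and gv_lower_bound(n_bits, d + 1) >= num_classes:
        d += 1
    return d

# e.g. max_feasible_distance(64, 100) yields a safe target distance for 100 classes.
```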
ISBN (digital): 9798331501488
ISBN (print): 9798331501495
WSNs function as key components across environmental monitoring systems, healthcare facilities, and industrial automation networks. Optimizing WSN performance remains an ongoing challenge because these networks must cope with restricted energy supplies, shifting operational environments, and limited data-transfer capacity. Current optimization techniques rely on static hierarchical approaches combined with heuristic methods, which cannot adjust in real time and thus lead to deteriorating network conditions and resource inefficiency. This research puts forward the Self-Attention-based Sparse Graph Convolutional Neural Network (SA-SGCN), a real-time parameter-prediction framework that optimizes WSN performance by automatically adjusting critical metrics including energy usage, packet success ratio, latency, and data-transfer speed. Real-time prediction becomes possible through the self-attention mechanism, which captures distant dependencies across the sensor network, combined with sparse graph convolution, which reduces computational requirements. The Crested Porcupine Optimizer (CPO) tunes the hyperparameters of the SA-SGCN model for better efficiency and accuracy. Experimental evidence shows that the proposed method outperforms current approaches, reaching a prediction accuracy of 99.9% for WSN configurations. The research presents a dependable, energy-efficient, adaptive framework for WSNs that enables high scalability and reliable operation in fast-changing environments.
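A speculative sketch of the core SA-SGCN layer, as read from the abstract: sparse graph convolution over the WSN topology fused with self-attention for long-range node dependencies. The dimensions, head count, and fusion rule are all assumptions.

```python
# Speculative SA-SGCN layer sketch; not the authors' implementation.
import torch
import torch.nn as nn

class SASGCNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=2, batch_first=True)

    def forward(self, x, adj):
        # x: (N, F) per-node features; adj: sparse (N, N) normalized adjacency.
        h = torch.sparse.mm(adj, self.lin(x))              # sparse graph convolution
        a, _ = self.attn(h.unsqueeze(0), h.unsqueeze(0), h.unsqueeze(0))
        return torch.relu(h + a.squeeze(0))                # fuse local + global context
```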
Semi-supervised learning (SSL) aims to leverage massive unlabeled data when labels are expensive to obtain. Unfortunately, in many real-world applications, the collected unlabeled data will inevitably contain unseen-class outliers not belonging to any of the labeled classes. To deal with this challenging open-set SSL task, mainstream methods tend to first detect outliers and then filter them out. However, we observe the surprising fact that such an approach can result in more severe performance degradation when labels are extremely scarce, as the unreliable outlier detector may wrongly exclude a considerable portion of valuable inliers. To tackle this issue, we introduce a novel open-set SSL framework, IOMatch, which can jointly utilize inliers and outliers, even when it is difficult to distinguish exactly between them. Specifically, we propose to employ a multi-binary classifier in combination with the standard closed-set classifier to produce unified open-set classification targets, which regard all outliers as a single new class. By adopting these targets as open-set pseudo-labels, we optimize an open-set classifier with all unlabeled samples, including both inliers and outliers. Extensive experiments show that IOMatch significantly outperforms baseline methods across different benchmark datasets and settings despite its remarkable simplicity. Our code and models are available at https://***/nukezil/IOMatch.
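The unified open-set targets can be sketched as follows. This is one reading of the abstract rather than the released code: the closed-set distribution is fused with per-class multi-binary inlier scores, and the leftover probability mass is assigned to a single (K+1)-th outlier class.

```python
# Rough sketch of unified open-set pseudo-label construction.
import torch

def open_set_targets(closed_probs, ova_inlier_probs):
    # closed_probs: (B, K) softmax over the K seen classes.
    # ova_inlier_probs: (B, K) per-class binary "is an inlier of class k" scores.
    seen = closed_probs * ova_inlier_probs                    # (B, K) seen-class mass
    outlier = (1.0 - seen.sum(dim=1, keepdim=True)).clamp(min=0)
    return torch.cat([seen, outlier], dim=1)                  # (B, K+1) open-set target
```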
ISBN (digital): 9798331529833
ISBN (print): 9798331529840
This paper introduces an automated grading system for mangoes that improves efficiency and accuracy over human-based methods. The system uses the Lion Assisted Firefly Algorithm (LA-FF) to select the best features from multiple candidate highlights, and then applies LA-FF again to fine-tune the convolutional layers of a deep CNN to the specific requirements of mango grading. By integrating recent algorithms, automation, and adaptation, the system delivers an effective and precise grading pipeline suitable for rural agricultural contexts.
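Since the abstract leaves the LA-FF details unspecified, the following is only a loose, generic sketch of such a population-based tuning loop: candidates are scored by a user-supplied `evaluate` function (e.g., validation accuracy of a CNN configuration) and attracted toward the best-scoring one, firefly-style. The search space and step sizes are hypothetical.

```python
# Generic firefly-style hyperparameter tuning sketch (LA-FF specifics assumed).
import random

def laff_tune(evaluate, bounds, pop=6, iters=10, step=0.3):
    # bounds: hypothetical search space, e.g. {"filters": (16, 128), "lr_exp": (-4, -2)}.
    xs = [{k: random.uniform(*b) for k, b in bounds.items()} for _ in range(pop)]
    for _ in range(iters):
        scores = [evaluate(x) for x in xs]
        best = dict(xs[max(range(pop), key=scores.__getitem__)])  # snapshot the best
        for x in xs:  # attraction toward the brightest (best-scoring) candidate
            for k, (lo, hi) in bounds.items():
                x[k] += step * (best[k] - x[k]) + random.gauss(0, 0.02) * (hi - lo)
                x[k] = min(max(x[k], lo), hi)
    return max(xs, key=evaluate)
```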