ISBN: 9798350368741 (digital)
ISBN: 9798350368758 (print)
Despite the remarkable generation capabilities of Diffusion Models (DMs), training and inference remain computationally expensive. Previous works have focused on accelerating diffusion sampling, but data-efficient diffusion training has often been overlooked. In this work, we investigate efficient diffusion training from the perspective of dataset pruning. Inspired by the principles of data-efficient training for generative models such as generative adversarial networks (GANs), we first extend the data selection scheme used in GANs to DM training: data features are encoded by a surrogate model, and a score criterion is then applied to select the coreset. To further improve generation performance, we employ a class-wise reweighting approach that derives class weights through distributionally robust optimization (DRO) over a pre-trained reference DM. For a pixel-wise DM (DDPM) on CIFAR-10, experiments demonstrate that our methodology outperforms existing approaches and achieves image synthesis comparable to that of the original full-data model, with speed-ups between 2.34× and 8.32×. Additionally, our method can be generalized to latent DMs (LDMs), e.g., Masked Diffusion Transformer (MDT) and Stable Diffusion (SD), and achieves competitive generation capability on ImageNet.
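The two stages described above (score-based coreset selection, then class-wise reweighting) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-sample `scores` are taken as already computed from surrogate-model features (the paper's score criterion is not reproduced), and the per-class losses are assumed to be measured with the pre-trained reference DM. The weights use the closed-form solution of a KL-regularized DRO objective, which may differ from the paper's exact formulation. `select_coreset`, `dro_class_weights`, and `temperature` are hypothetical names.

```python
import math


def select_coreset(scores, keep_ratio):
    """Return sorted indices of the top `keep_ratio` fraction of samples by score.

    Assumption: `scores` were precomputed from surrogate-model features.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def dro_class_weights(class_losses, temperature=1.0):
    """KL-regularized DRO class weights: w_c proportional to exp(L_c / temperature).

    This closed form maximizes sum_c w_c * L_c - temperature * KL(w || uniform)
    over the probability simplex, so classes on which the reference model
    has higher loss receive more weight.
    """
    peak = max(class_losses.values())  # subtract the max for numerical stability
    exps = {c: math.exp((loss - peak) / temperature) for c, loss in class_losses.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}
```

A lower `temperature` concentrates weight on the hardest classes; as it grows, the weights approach uniform, which recovers unweighted training on the coreset.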
Banana farms are at risk from a variety of diseases, including Black Sigatoka and Panama disease, as well as from fertilizer shortages. Accurate and prompt identification of these issues is necessary to halt financial ...
ISBN: 9798331523411 (digital)
ISBN: 9798331523428 (print)
Spelling mistakes in written communication among Arabic speakers have become more common, especially with the rise of social media. Correcting these errors requires effective natural language processing (NLP) tools. This paper introduces a ByT5-based approach to correcting two types of spelling mistakes: directed and general. ByT5, a token-free pre-trained transformer model, processes UTF-8 text as raw bytes, which minimizes preprocessing and enhances robustness. We run experiments with varying error injection rates to identify the optimal injection rate for correction. We evaluate on two test sets, Test200 and TSMTS, both of which contain real Arabic sentences with actual spelling mistakes. The findings demonstrate ByT5's strong performance in spelling error correction: it reduces the character error rate (CER) from 5% to 1.37% on Test200 and from 5% to 1.77% on TSMTS. Overall, our results show that this approach offers a promising solution for spelling error correction and contributes to generating effective synthetic datasets for training large language models.
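The byte-level input, error injection, and CER metric mentioned above can be illustrated with a short sketch. The byte-to-id mapping follows the Hugging Face ByT5 tokenizer convention of reserving ids 0 to 2 for the pad, end-of-sequence, and unknown tokens (the real tokenizer also appends an end-of-sequence id, omitted here). The injector is hypothetical: it only performs uniform random substitutions and does not model the paper's "directed" and "general" error types. `byt5_ids`, `inject_errors`, and `cer` are illustrative names.

```python
import random

NUM_SPECIAL_TOKENS = 3  # ByT5 reserves ids 0-2 for pad, eos, and unk


def byt5_ids(text):
    """Map text to ByT5-style ids: each UTF-8 byte b becomes b + 3 (no eos appended)."""
    return [b + NUM_SPECIAL_TOKENS for b in text.encode("utf-8")]


def inject_errors(text, rate, alphabet, seed=0):
    """Hypothetical injector: replace each character with probability `rate`.

    The replacement is drawn uniformly from `alphabet` and may coincide
    with the original character.
    """
    rng = random.Random(seed)
    return "".join(rng.choice(alphabet) if rng.random() < rate else ch for ch in text)


def levenshtein(a, b):
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def cer(hypothesis, reference):
    """Character error rate: edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(hypothesis, reference) / len(reference)
```

For example, the Arabic letter "ب" (U+0628) occupies two UTF-8 bytes, so `byt5_ids("ب")` yields two ids rather than one, which is why byte-level models need no Arabic-specific vocabulary.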
Author:
R. Renugadevi, Associate Professor
Department of Computer Science and Engineering, Saveetha Engineering College, Saveetha Nagar, Thandalam, Chennai 602105, India
A new automatic classification method for identifying lung anomalies in CT and chest X-ray images is introduced. The problem of a limited and imbalanced dataset is addressed through data augmentation, and various preprocessing methods are applied to remove noise. The model is built on the original CCT model, which undergoes an ablation study on a CT scan dataset of 32 × 32 images to determine the best configuration. The model is then trained on an X-ray dataset to test its performance on a different modality, and its performance is compared with that of six pre-trained models. These classic models showed mediocre results, with test accuracies between 40% and 78% for CT and between 50% and 75% for X-ray, despite a large number of training cycles. In contrast, the proposed model achieved test accuracies of 99.70% for CT and 95.30% for X-ray. Its robustness is verified by progressively shrinking the training set, which shows that the model still performs well with less data. The model's decisions are explained with the explainable AI method Grad-CAM, which produces color visualizations that help medical practitioners make swift and confident judgments. By integrating deep learning methods with image preprocessing to identify lung abnormalities, this paper addresses issues of training time and computational complexity.
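Grad-CAM, the explanation method cited above, weights each feature map of a convolutional layer by the spatial average of the class score's gradient with respect to that map, sums the weighted maps, and applies a ReLU. The pure-Python sketch below shows only that computation; it assumes the activations and gradients have already been extracted from a deep learning framework, and it omits upsampling to image resolution and color mapping. `grad_cam` is a hypothetical helper name, not part of the paper.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap for one convolutional layer.

    activations, gradients: nested lists shaped [channels][height][width], where
    gradients[k][i][j] is d(class score) / d(activations[k][i][j]).
    Returns a [height][width] map scaled to [0, 1].
    """
    n_channels = len(activations)
    h, w = len(activations[0]), len(activations[0][0])
    # Channel weights alpha_k: global average pooling of the gradients.
    alphas = [sum(sum(row) for row in gradients[k]) / (h * w) for k in range(n_channels)]
    # Weighted sum of the activation maps followed by ReLU.
    cam = [[max(0.0, sum(alphas[k] * activations[k][i][j] for k in range(n_channels)))
            for j in range(w)]
           for i in range(h)]
    # Normalize so the strongest region is 1.0 (all-zero maps are left as-is).
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam
```

The ReLU keeps only regions that increase the target class score, which is what makes the resulting color overlay readable as "evidence for" a given lung-abnormality class.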
ISBN: 9789819780952 (print)
In an era of rapid data collection across many industries, sustainable technology and community participation depend on extracting insights from high-dimensional datasets. Visualizing massive multivariate datasets is challenging yet essential for ecosystem sustainability. Multivariate data visualization supports adaptive and sustainable practices by revealing intricate relationships, patterns, and trends that lower-dimensional perspectives overlook. These techniques are popular in many domains because they provide unique insights into complex information, supporting sustainable and flexible solutions. Using these visualization methodologies, researchers and data analysts can quickly study correlations among multiple variables for sustainable technology development and community participation. Their characteristics and their role in ecological sustainability are discussed below. Researchers often use scatter plot matrices to show bivariate relationships between community engagement measures; arranging several scatter plots in a matrix helps researchers uncover patterns and trends in the data, supporting sustainability. Parallel coordinates, which plot many variables on parallel axes and connect each data point's values with lines, encourage community engagement: the patterns and intersections of these lines can help researchers understand dynamic interactions within and across communities, promoting sustainability. Academics use scatter plot matrices and parallel coordinates to see complex links and patterns in technological development data. These approaches work in many domains, offering academics a holistic view of ecosystem interconnection and emphasizing the need for data-driven adaptation. The Andrews plot represents data using a Fourier series, concentrating on how the leading terms generate adaptable and sustainable functions. This study also recommends Multidimensional Scaling (MDS) and glyph plots for visualizing high-dimensional data by imposing order on data variances to identify patterns, outliers, and interrelationships in complex info...
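Two of the techniques named above have simple underlying computations. An Andrews plot maps each observation x = (x1, x2, x3, ...) to the curve f_x(t) = x1/√2 + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ... over t in [-π, π], and parallel coordinates typically rescale each variable to a common axis range before drawing one polyline per observation. The sketch below computes only the plot coordinates (drawing is left to any plotting library); the function names are hypothetical, and min-max scaling is one common choice for parallel coordinates, not necessarily the one used in the study.

```python
import math


def andrews_curve(x, t):
    """Evaluate the Andrews function f_x(t) for one observation x at angle t.

    f_x(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ...
    """
    total = x[0] / math.sqrt(2)
    for idx, value in enumerate(x[1:], start=1):
        harmonic = (idx + 1) // 2               # 1, 1, 2, 2, 3, 3, ...
        trig = math.sin if idx % 2 == 1 else math.cos
        total += value * trig(harmonic * t)
    return total


def sample_curve(x, n_points=100):
    """Sample f_x on an evenly spaced grid over [-pi, pi] for plotting."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    step = 2 * math.pi / (n_points - 1)
    return [(-math.pi + i * step, andrews_curve(x, -math.pi + i * step))
            for i in range(n_points)]


def parallel_coordinates(rows):
    """Min-max scale each variable to [0, 1]; each row becomes one polyline.

    Constant variables are placed at 0.5 to avoid division by zero.
    """
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.5
             for v, lo, hi in zip(row, lows, highs)]
            for row in rows]
```

Because the leading terms of the Andrews function dominate its shape, the order of the variables matters; placing the most important variables first (and standardizing them) is a common practice.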
To meet the stringent low-latency and high-throughput demands of cloud datacenter applications, recent receiver-driven transport protocols transmit only one packet upon receiving each credit packet from the receiver...
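The one-packet-per-credit rule described above can be modeled in a few lines. This is a hypothetical toy model of the sender side only, not the protocol studied in the (truncated) abstract: it ignores timing, pacing, and loss, and simply shows that the sender never transmits without a credit and that a credit arriving when nothing is queued releases no data. `CreditDrivenSender` and its members are illustrative names.

```python
from collections import deque


class CreditDrivenSender:
    """Toy receiver-driven sender: exactly one data packet per credit received."""

    def __init__(self, packets):
        self.pending = deque(packets)  # data waiting for credits
        self.sent = []                 # (credit_id, packet) pairs, in order
        self.wasted_credits = 0        # credits that arrived with nothing to send

    def on_credit(self, credit_id):
        """Handle one credit packet from the receiver; release at most one data packet."""
        if not self.pending:
            self.wasted_credits += 1
            return None
        packet = self.pending.popleft()
        self.sent.append((credit_id, packet))
        return packet
```

For instance, a sender holding three packets that receives five credits sends all three and leaves two credits unused; in this model the receiver, not the sender, controls the sending rate through the credits it issues.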
Effective management of construction project portfolios demands informed decisions driven by data and mathematical models, aiming to enhance decision-making and address complex decision problems. This article introduc...
The increasing demand for dynamic, efficient, and resilient network management in 5G systems has driven advancements in AI-enhanced architectures. This research proposes an AI-Augmented Hybrid 5G Model that integrates...
FAPL-DM-BC is a new federated learning (FL)-based solution for privacy, security, and scalability in the Internet of Vehicles (IoV). It leverages Federated Adaptive Privacy-Aware Learning (FAPL) and Dynamic Masking (DM) to learn...
K-means clustering is a fundamental data mining technique. It heavily relies on parameter optimization (number of clusters, initial centers, and distance measures) for accurate and meaningful results. This study addre...
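The three parameters listed above (number of clusters, initial centers, distance measure) map directly onto the arguments of a basic Lloyd-style k-means loop. The sketch below is a generic illustration, not the method of the (truncated) study; `kmeans` and its argument names are hypothetical. Note that the update step uses the arithmetic mean, which is the exact minimizer only for squared Euclidean distance; with other distance measures (e.g., Manhattan) a different center update, such as the coordinate-wise median, would be more appropriate.

```python
import math
import random


def kmeans(points, k, init_centers=None, distance=math.dist, max_iter=100, seed=0):
    """Lloyd-style k-means with a pluggable distance and optional initial centers.

    points: list of equal-length numeric tuples.
    Returns (centers, labels).
    """
    if init_centers is None:
        init_centers = random.Random(seed).sample(points, k)  # random initialization
    if len(init_centers) != k:
        raise ValueError("init_centers must contain exactly k centers")
    centers = [tuple(c) for c in init_centers]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest center under the chosen distance measure.
        labels = [min(range(k), key=lambda c: distance(p, centers[c])) for p in points]
        # Update step: arithmetic mean of each cluster (optimal for squared Euclidean).
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center in place
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    return centers, labels
```

Running the same data with different `init_centers` or `seed` values is a quick way to see the sensitivity to initialization that motivates parameter optimization: a poor start can converge to a different local optimum.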