ISBN:
(Print) 9798350344868; 9798350344851
Domain generalization aims to train models on multiple source domains so that they generalize well to unseen target domains. Among the many domain generalization methods, Fourier-transform-based approaches have gained popularity primarily because they exploit the Fourier transform to capture essential patterns and regularities in the data, making models more robust to domain shifts. The mainstream Fourier-transform-based method swaps the Fourier amplitude spectrum between source and target images while preserving the phase spectrum. However, it neglects background interference in the amplitude spectrum. To overcome this limitation, we introduce a soft-thresholding function in the Fourier domain. We apply this newly designed algorithm to retinal fundus image segmentation, which is important for diagnosing ocular diseases but where neural network performance can degrade across data sources due to domain shifts. The proposed technique enhances fundus image augmentation by eliminating small values in the Fourier domain, providing better generalization. Fusing soft thresholding with Fourier-transform-based domain generalization significantly improves neural network performance by reducing background interference from the target images. Experiments on public data validate our approach's effectiveness over conventional and state-of-the-art methods, with superior segmentation metrics.
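The amplitude-swap augmentation with soft thresholding described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the relative threshold `tau` is a hypothetical parameter choice:

```python
import numpy as np

def soft_threshold(x, tau):
    """Soft-thresholding: shrink values toward zero and zero out those below tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def fourier_amplitude_swap(src, tgt, tau=0.1):
    """Replace the source amplitude spectrum with the (soft-thresholded) target
    amplitude while keeping the source phase.  Small amplitude values, which
    often carry background interference, are suppressed before the swap."""
    F_src, F_tgt = np.fft.fft2(src), np.fft.fft2(tgt)
    amp = soft_threshold(np.abs(F_tgt), tau * np.abs(F_tgt).max())
    return np.real(np.fft.ifft2(amp * np.exp(1j * np.angle(F_src))))
```

Keeping the source phase preserves the anatomical structure of the fundus image, while the thresholded target amplitude injects a cleaned-up "style" from another domain.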
ISBN:
(Print) 9798350374520; 9798350374513
Deep steganalyzers combined with neural networks have achieved great success in image classification in recent years. However, they suffer from the following persistent challenges: i) deep steganalyzers are extremely vulnerable and risk being attacked via adversarial steganography when performing image classification tasks; ii) pre-processing-based methods that aim to remove adversarial perturbations from cover images jeopardize accuracy, as the embedded steganographic signal is wiped out as well. In this context, to defend against adversarial attacks, we propose an adversarial steganography detection scheme based on pre-processing and feature migration. In brief, sub-images are sampled to expand the dimensionality of the extracted features while reducing the effect of adversarial perturbations. In particular, by computing statistical features and normalizing them, our approach improves classification accuracy on these samples. Our experimental results show that the proposed approach detects adversarial steganographic images with an accuracy gain of up to 35.9% over state-of-the-art methods.
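A rough sketch of the sub-image sampling with statistical features and normalization mentioned above. The patch size and the mean/variance features are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

def subimage_features(img, patch=8):
    """Sample non-overlapping sub-images, compute simple statistical features
    (mean, variance) per patch, then z-score normalize across patches so the
    feature dimensionality grows with the number of sub-images."""
    h, w = img.shape
    feats = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            p = img[i:i + patch, j:j + patch]
            feats.append([p.mean(), p.var()])
    feats = np.array(feats)
    # normalize each statistical feature across patches
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)
```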
When an abnormal event occurs, the features of the video gradually change along the temporal sequence. Since an abnormal event is a continuous process, abnormal features should exhibit a sustained prominence compared to normal features. Incorporating this differential information is critical for effective anomaly detection; nevertheless, prior multi-instance learning (MIL) methods have overlooked it. Therefore, this study proposes the spatial-temporal feature fusion enhancement (STFFE) learning approach to address this issue. STFFE improves the discriminative power of normal and abnormal features by enhancing temporal information through the fusion of top-k video segment features with their corresponding temporal-information features. Furthermore, a temporal-information constraint is proposed to increase the concentration of abnormal information and accentuate the saliency of abnormal features. Extensive experiments demonstrate that our approach surpasses state-of-the-art (SOTA) methods on the Shanghai-Tech and XD-Violence datasets. Moreover, our method achieves competitive results on the UCF-Crime dataset, with an AUC of 84.52%. These results demonstrate the effectiveness of the proposed method.
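The top-k MIL scoring and the fusion of segment features with temporal-difference information can be illustrated roughly as follows. The norm-based selection rule and the simple difference feature are simplified assumptions, not the exact STFFE design:

```python
import numpy as np

def topk_mil_score(segment_scores, k=3):
    """Video-level anomaly score: mean of the top-k segment scores (MIL style)."""
    return float(np.sort(segment_scores)[-k:].mean())

def fuse_temporal(features, k=3):
    """Pick the k segments with the largest feature norms and concatenate each
    with its temporal-difference feature (change relative to the previous
    segment), so the fused feature carries the 'continuous prominence' cue."""
    diffs = np.diff(features, axis=0, prepend=features[:1])  # temporal change
    idx = np.argsort(np.linalg.norm(features, axis=1))[-k:]  # top-k segments
    return np.concatenate([features[idx], diffs[idx]], axis=1)
```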
Stochastic gradient descent is a canonical tool for addressing stochastic optimization problems, which form the bedrock of modern machine learning. In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence with the fact that a constant step-size learns faster, up to a limiting error. To do so, rather than fixing the mini-batch size and the step-size at the outset, we propose a strategy that allows these parameters to evolve adaptively. Specifically, the batch size is set to be a piecewise-constant increasing sequence, where an increase occurs whenever a suitable error criterion is satisfied, and the step-size is selected as the one yielding the fastest convergence. The overall algorithm, the two-scale adaptive (TSA) scheme, is developed for both convex and non-convex problems. It inherits exact convergence and, more importantly, achieves the optimal error-decreasing rate with an overall reduction in computation. Furthermore, we extend the TSA method to the generalized adaptive batching framework, a generic methodology modular to any stochastic algorithm pursuing a trade-off between convergence rate and stochastic variance. We evaluate the TSA method on image classification with the MNIST and CIFAR-10 datasets, comparing against standard SGD and existing adaptive batch-size methods, to corroborate the theoretical findings.
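The piecewise-constant increasing batch-size rule can be sketched as below. The gradient-norm criterion and the decay factor are illustrative stand-ins for the paper's error criterion, not the exact TSA rule:

```python
def tsa_schedule(grad_norms, init_batch=8, growth=2, tol_decay=0.5):
    """Two-scale-adaptive-style batch schedule sketch: the batch size stays
    piecewise constant and is multiplied by `growth` whenever the observed
    gradient norm drops below the current tolerance (a proxy for the point
    where stochastic noise dominates optimization progress)."""
    batch, tol = init_batch, grad_norms[0]
    sizes = []
    for g in grad_norms:
        if g < tol:           # error criterion met: variance now dominates
            batch *= growth   # larger batch cuts stochastic variance
            tol *= tol_decay  # tighten the criterion for the next stage
        sizes.append(batch)
    return sizes
```

With the sequence of gradient norms `[1.0, 0.9, 0.4, 0.3, 0.1]` this produces the increasing batch sizes `[8, 16, 32, 32, 64]`: the batch grows only when progress stalls, which is the computation-saving idea behind the scheme.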
COVID-19 is a modern virus that has spread worldwide and is affecting billions of people. Timely and accurate detection is of great significance for slowing the spread of the virus and treating infected people effectively. Deep learning methods have shown promise in COVID-19 detection due to their accurate feature extraction, which can enhance the classification ability of a detection model. Existing studies mostly focus on information compression and feature extraction, which inevitably leads to loss of original information. This paper proposes a novel end-to-end COVID-19 classification model called MSCGRU, based on fusing multi-faceted global features with local information from the original images. First, a multi-scale CNN extracts multi-scale global features from multiple channels, and these features are fused with local information to obtain comprehensive features. Second, a GRU extracts deep abstract features from the comprehensive features. The experimental results demonstrate that the proposed MSCGRU greatly strengthens learning ability and achieves better prediction performance than other methods: the multi-class accuracy is 98.2%, and the binary classification accuracy is 100%.
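The fusion of multi-scale global features with local information from the original image can be illustrated with a toy pooling sketch; the scales and the average-pooling choice are assumptions for illustration, not the MSCGRU architecture:

```python
import numpy as np

def multi_scale_fuse(img, scales=(2, 4, 8)):
    """Toy multi-scale global feature extraction: average-pool the image at
    several scales, concatenate the pooled features, and append the flattened
    original image as the 'local information' so nothing is compressed away."""
    feats = []
    for s in scales:
        h, w = img.shape[0] // s, img.shape[1] // s
        pooled = img[:h * s, :w * s].reshape(h, s, w, s).mean(axis=(1, 3))
        feats.append(pooled.ravel())       # global feature at this scale
    feats.append(img.ravel())              # local information, uncompressed
    return np.concatenate(feats)
```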
Frequent occurrences of forest fires globally cause significant harm to both human populations and the natural environment. Smoke is the earliest visual signal of wildfire, and its efficient detection is crucial. However, due to the lack of real-world smoke video data, existing deep learning methods often face difficulties in effective training, and their generalization ability and detection stability across different scenarios are neither validated nor guaranteed. Since smoke characteristics are strongly correlated with environmental conditions, this study additionally exploits the weather-state information in the smoke data to benefit model training and optimization. This is a new approach to extracting more useful information from limited smoke data, inspired by human perceptual tendencies. For efficient modeling of smoke features and to achieve a trade-off between computational complexity and recognition accuracy, we further propose BFBlock for enhanced smoke feature extraction and build WISNet, a two-stage lightweight video detection framework. Compared to regular 3D CNN models, the proposed framework reduces the model's parameters by 85% to only 5.6M while achieving 96.69% accuracy and a 0.58% false-alarm rate. Extensive experimental results on real-world smoke samples prove the effectiveness of the proposed method.
In a neutral hydrogen (H I) galaxy survey, a significant challenge is to identify and extract the H I galaxy signal from observational data contaminated by radio frequency interference (RFI). For a drift-scan survey, or more generally a survey of a spatially continuous region, the H I galaxies and RFI both appear in the time-ordered spectral data as regions extending over an area of the time-frequency waterfall plot, so extracting H I galaxies and RFI from such data can be regarded as an image segmentation problem, and machine-learning methods can be applied to solve it. In this study, we develop a method to effectively detect and extract H I galaxy signals based on a Mask R-CNN network combined with the PointRend method. By simulating FAST-observed galaxy signals and potential RFI impact, we created a realistic data set for the training and testing of our neural network. We compared five different architectures and selected the best-performing one. This architecture successfully performs instance segmentation of H I galaxy signals in the RFI-contaminated time-ordered data, achieving a precision of 98.64% and a recall of 93.59%.
Although diabetes is often considered a vascular disease due to its impact on blood vessels, it is a complex condition involving various metabolic and autoimmune factors. One of the long-term comorbidities of diabetes is microvascular complications. These complications can be analyzed using nailfold capillaroscopy, a non-invasive technique that allows the visualization and analysis of capillaries in the proximal nailfold area. Using advanced video capillaroscopy with high magnification, capillary images can be captured and processed to analyze their morphology. Capillary images of a normal group and a diabetic group were acquired from 118 participants using nailfold capillaroscopy, and the obtained images were preprocessed using image-processing filters. Identifying and segmenting the capillaries are the challenges to be addressed in processing the images; hence, segmentation is performed using morphological operations, thresholding, and convolutional neural networks. The performance of the filters and segmentation methods is evaluated using Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), the Jaccard index, and the Sorensen coefficient. By analyzing morphological features in both groups, namely capillary diameter, density, distribution, presence of hemorrhage, and capillary shape, the capillary changes associated with the diabetic condition were studied. The non-diabetic participants in this study were found to have capillary diameters in the range of 8-14 μm and capillary densities of 10-30 capillaries per mm², whereas the diabetic participants had capillary diameters greater than 30 μm and capillary densities below 10 capillaries per mm². In addition to capillary density and diameter, the presence of hemorrhage and the orientation and distribution of the capillaries are also considered to differentiate the diabetic group from the normal group.
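The Jaccard index and the Sorensen (Dice) coefficient used above to evaluate segmentation quality have standard definitions over binary masks:

```python
import numpy as np

def jaccard_index(pred, truth):
    """Jaccard index (intersection over union) between two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return inter / union if union else 1.0

def sorensen_dice(pred, truth):
    """Sorensen (Dice) coefficient: 2|A∩B| / (|A| + |B|)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    return 2 * inter / total if total else 1.0
```

Both range over [0, 1], with 1 meaning the predicted capillary mask matches the ground truth exactly; Dice weights the overlap more heavily than Jaccard.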
It is widely accepted that human expressions, accounting for roughly sixty percent of all daily interactions, are among the most authentic forms of communication. Numerous studies explore the importance of facial expressions and the development of machine-assisted recognition techniques. Significant progress has been made in facial and expression recognition, largely due to the rapid growth of machine learning and computer vision. A variety of algorithmic approaches and methods exist for detecting and recognizing facial expressions and features. This study investigates various optimization algorithms used with convolutional neural networks for facial expression recognition. The primary focus is on the Adam, RMSProp, stochastic gradient descent, and AdaMax optimizers. A comprehensive comparison examines the key aspects of each optimizer, including its advantages and disadvantages. Furthermore, the study incorporates findings from recent work that used these optimizers in various applications, highlighting their performance in terms of training time and precision. The aim is to illuminate the process of selecting a suitable optimizer for a specific application, analysing the trade-offs between training speed and higher accuracy. Moreover, this study provides a deeper analysis of the role optimizers play in machine-learning-based facial expression recognition models. A discussion of the technical challenges posed by these optimizers and of future improvements toward more optimal results concludes the study.
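For reference, the SGD and Adam update rules compared in such studies look like this in plain NumPy; the learning rates are illustrative defaults:

```python
import numpy as np

def sgd_step(w, g, lr=0.1):
    """Plain stochastic gradient descent update: step against the gradient."""
    return w - lr * g

def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m) and its
    square (v), bias-corrected, then an RMS-scaled step.  `t` starts at 1."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)       # bias correction for the first moment
    v_hat = v / (1 - b2 ** t)       # bias correction for the second moment
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

The per-coordinate scaling by `sqrt(v_hat)` is what gives Adam its fast early progress, while plain SGD's step size must be tuned to the gradient scale by hand, which is exactly the training-speed versus final-accuracy trade-off the study examines.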
Fine-grained image recognition is a challenging task that focuses on identifying images from similar subordinate categories. Recently, methods based on the vision transformer (ViT) have demonstrated remarkable achievements in fine-grained image recognition, as their inherent multi-head self-attention (MHSA) can effectively capture the discriminative regions in images. However, most of these ViT-based methods ignore the channel relationships of image features, and there are also problems with the inconsistent learning performance of different heads in MHSA and different layers in ViT. To address these issues, an innovative feature-enhanced transformer is proposed, named TS-ViT. TS-ViT includes three key modules: soft channel attention (Soft-CA), multi-head token selection (MHTS), and multi-level feature enhancement (MLFE). Soft-CA enables the model to concentrate on the relationships among the various channels of image features. MHTS is proposed to address the inconsistent multi-head learning performance: it selects tokens at discriminative region positions based on attention maps to form the multi-level feature. By employing contrastive learning and enhanced feature extraction, MLFE effectively utilizes multi-level features while mitigating background noise. Extensive experiments have demonstrated that TS-ViT achieves superior performance compared to popular methods, with average accuracies of 91.8%, 91.2%, 99.5%, and 93.9% on the experimental data sets, respectively. Furthermore, TS-ViT demonstrated outstanding performance in terms of computational complexity and efficiency, with an average parameter count of 93.5M, FLOPs of 73.2G, a training time of 7.8 h, and an inference time of 4.1 milliseconds.
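The idea of selecting tokens by their attention weights, as in the MHTS module, can be sketched roughly as follows. Averaging the CLS attention over heads and taking the top-k patch tokens is a simplified assumption, not the exact TS-ViT rule:

```python
import numpy as np

def select_tokens(attn, tokens, k=4):
    """Token selection sketch: average the CLS token's attention over all heads
    and keep the k patch tokens receiving the highest attention, i.e. the
    positions the model treats as most discriminative.
    attn:   (heads, n_tokens, n_tokens) attention maps, index 0 is CLS.
    tokens: (n_tokens, dim) token embeddings."""
    cls_attn = attn[:, 0, 1:].mean(axis=0)   # head-averaged CLS-to-patch attention
    idx = np.argsort(cls_attn)[-k:] + 1      # top-k patch indices (skip CLS at 0)
    return tokens[idx]
```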