ISBN (print): 9781665405409
Few-shot learning (FSL) aims to generalize from few labeled samples. Recently, metric-based methods have achieved surprising classification performance on many FSL benchmarks. However, those methods ignore the impact of noise, which keeps few-shot learning difficult. In this work, we identify noise suppression as important for improving the performance of FSL algorithms. Hence, we propose a novel attention-based contrastive learning model with discrete cosine transform input (ACL-DCT), which suppresses the noise in input images, image labels, and learned features, respectively. ACL-DCT takes the frequency-domain representations produced by the DCT as input and removes the high-frequency part to suppress input noise. In addition, an attention-based alignment of the feature maps and a supervised contrastive loss are used to mitigate feature and label noise. We evaluate ACL-DCT against previous methods on two widely used datasets for few-shot classification (i.e., miniImageNet and CUB). The results indicate that the proposed method outperforms state-of-the-art methods.
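The input-noise step described in this abstract (transform to the frequency domain with the DCT, then drop the high-frequency part) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `dct_matrix`, `dct_lowpass`, and `reconstruct`, the orthonormal DCT-II normalization, and the square `keep x keep` low-frequency mask are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C of size n x n, so that C @ C.T == I."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)    # rescale the DC row for orthonormality
    return m


def dct_lowpass(image: np.ndarray, keep: int) -> np.ndarray:
    """Return the keep x keep lowest-frequency 2-D DCT coefficients.

    `keep` is a hypothetical cut-off parameter; everything beyond it
    (the high-frequency part) is discarded.
    """
    h, w = image.shape
    coeffs = dct_matrix(h) @ image @ dct_matrix(w).T  # separable 2-D DCT
    return coeffs[:keep, :keep]


def reconstruct(low: np.ndarray, shape: tuple) -> np.ndarray:
    """Zero-pad the kept coefficients and invert the DCT (for inspection only)."""
    h, w = shape
    full = np.zeros(shape)
    full[: low.shape[0], : low.shape[1]] = low
    return dct_matrix(h).T @ full @ dct_matrix(w)
```

In a pipeline like the one the abstract outlines, the truncated coefficient block from `dct_lowpass` would be fed to the network in place of raw pixels; `reconstruct` is included only to show what information survives the cut-off.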
Real-time communication through cell phones and telephones often involves challenging acoustic environments where the original speech signal is contaminated by environmental noise, known as the cocktail party problem....
Gallstones and kidney stones are two types of stone diseases that are very afflictive and often fatal. Biological studies in this field have primarily aimed to discover what causes gallstone formation in the urinary ...
ISBN (print): 9781665483605
Time-frequency representations are frequently used in signal processing applications. This paper presents a framework for recursive implementation of biorthogonal nonstationary discrete Gabor transforms. Unlike the well-known Fourier transform, these transforms can achieve a non-uniform frequency resolution. Typically they are realized with finite impulse response filters. This paper shows an observer-based recursive implementation of these transforms based on Hostetter's approach. In addition, it reviews in detail the construction of generalized Gabor frames and the conditions for their invertibility. The design of the observer's parameters is discussed, and multiple examples are given to illustrate the properties and capabilities of both the design process and the observer.
This article presents an exhaustive exploration of convolutional vectors, a cornerstone concept in deep learning. Initially, it introduces the fundamental principles of convolutional vectors, and then delves into th...
The aim of signal processing with transformation techniques is to precisely identify frequencies and extract minute signal variations. Previous studies have employed various time-frequency distributions (TFD) to extr...
ISBN (print): 9781665405409
Few-shot learning (FSL), which aims to recognize samples from novel categories given only a few reference samples, is a challenging problem. We found that existing works often build their few-shot models on image-level features obtained by mixing all local-level features, which leads to discriminative location bias and loss of information in local details. To tackle this problem, this paper returns to the local-level feature and proposes a series of local-level strategies. Specifically, we present (a) a local-agnostic training strategy to avoid the discriminative location bias between the base and novel categories, (b) a novel local-level similarity measure to compare local-level features accurately, and (c) a local-level knowledge transfer that synthesizes different knowledge transfers from the base categories according to different location features. Extensive experiments show that the proposed local-level strategies significantly boost performance, achieving 2.8%-7.2% improvements over the baseline across different benchmark datasets and reaching state-of-the-art accuracy.
Object detection is one of the most important and challenging branches of computer vision, which has been widely applied in social life, such as intelligent security, autonomous driving, and so on. However, the perfor...
To create a single image with the most information possible, two photographs of the same model are combined through the process of image fusion. Many image-processing applications, including satellite imaging, remote sens...
ISBN (digital): 9781665496209
ISBN (print): 9781665496209
Weakly Supervised Object Detection (WSOD) aims to train a detector to localize the targets of interest in an image using only image-level labels. An important trend in current WSOD methods is to integrate object detection with Weakly Supervised Semantic Segmentation (WSSS), so that more discriminative regions can be obtained and more accurate detection can be achieved. However, because the segmentation supervision generated by WSOD is unreliable, their performance is still very limited. To address this problem, we propose a novel end-to-end framework termed Class activation map Guided Detection Network (CGDN), in which the detection process is guided by the Class Activation Map (CAM) rather than by segmentation results. The proposed CGDN is composed of a detection branch and a CAM refinement branch: the CAM refinement branch refines the CAMs generated by the detection branch, and the refined CAMs are in turn used to provide more reliable foreground cues for the detection branch. The two branches therefore interact, which leads to progressively improved detection and CAM outputs. Extensive experiments on the PASCAL VOC 2007 and 2012 datasets verify the effectiveness of the proposed network.
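For readers unfamiliar with the term, a class activation map in its classic form is a class-specific weighted sum of the last convolutional feature maps. The sketch below shows only that generic formulation in NumPy; it is not how CGDN generates or refines its CAMs, which the abstract does not detail. The function name `class_activation_map`, the array layouts, and the ReLU-plus-peak normalization are assumptions chosen for illustration.

```python
import numpy as np


def class_activation_map(
    features: np.ndarray, fc_weights: np.ndarray, class_idx: int
) -> np.ndarray:
    """Classic CAM: weighted sum of feature maps for one class.

    features:   (K, H, W) activations of the last convolutional layer
    fc_weights: (num_classes, K) weights of the classifier that follows
                global average pooling
    Returns an (H, W) map scaled to [0, 1].
    """
    # Sum over the channel axis: cam[y, x] = sum_k w[class_idx, k] * features[k, y, x]
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    cam = np.maximum(cam, 0.0)  # keep positive evidence only (a common convention)
    peak = cam.max()
    return cam / peak if peak > 0 else cam  # avoid dividing by zero
```

High values in the returned map mark locations that contribute most to the chosen class score, which is the kind of foreground cue the abstract refers to.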