ISBN (digital): 9798331506582
ISBN (print): 9798331506599
Meta-learning is currently the mainstream approach to data scarcity in few-shot text classification. Challenges remain, however: embedding vectors are not compact enough, meta-task sampling strategies are suboptimal, and the convolution operations in traditional Text CNNs are rigid. To address these issues, we propose the Attention Mechanism-based Improved Meta-learning Contrast Network (AMIMC), which enhances intra-class aggregation, increases inter-class separation, and improves embedding quality. Additionally, Double Dynamic Similar Sampling (DDSS) generates more challenging meta-tasks, and the attention mechanism makes Text CNNs more flexible, significantly boosting accuracy on five few-shot text classification datasets.
Night is an inevitable scene for surveillance video. Due to the high image resolution, complex background, uneven illumination, and similarity between the target and the background of hawk-eye surveillance video, it i...
In today's highly interactive human-computer world, speech synthesis is widely used in many scenarios, and the requirements for rhyme effects in speech synthesis technology are increasing, so rhyme-controllable mo...
Currently, research on speaker verification tasks is primarily concentrated on enhancing deep speaker models to extract high-quality speaker embeddings. Nevertheless, these speaker embeddings can be regarded as potenti...
The accuracy and reliability of automatic speaker verification (ASV) face significant challenges in noisy environments. In recent years, joint training of speech enhancement front-end and ASV back-end has been widely ...
In recent years, saliency object detection methods based on convolutional neural networks have been widely studied and have achieved excellent performance on clear images. However, because images captured in foggy conditions have low visibility, existing saliency object detection methods are seriously degraded or even fail. To address this problem, we introduce an end-to-end multi-task learning network. We design two subnetworks, for depth estimation and image restoration, as auxiliary tasks to improve saliency object detection in foggy conditions. Different shared layers are assigned according to the characteristics of each vision task to improve saliency object detection performance. Experiments show that our method achieves large improvements on both synthetic and real-world foggy datasets, outperforming many state-of-the-art saliency object detection methods.
Sound source localization and detection is a joint task of identifying the presence of individual sound events and locating the sound sources in space. In order to promote the combination of two different tasks, we pr...
A major challenge in speech emotion recognition (SER) is building a lightweight model from limited training data for deployment on resource-constrained devices. In this paper, we propose a lightweight SER model with a Bias-Focal loss function, in which a Dynamic Separable Convolution (DySC) block is designed to extract more fine-grained emotional features and to make the model smaller. The Bias-Focal loss addresses the inconsistency among training samples while focusing on data points with high feature diversity during training. Experimental results show that our proposed model is the smallest among the compared methods, with 0.85M parameters. It also achieves the best performance on the IEMOCAP (scripted+improvised) dataset: an Unweighted Accuracy (UA) of 75.01%, a Weighted Accuracy (WA) of 74.05%, and an F1-score of 74.29%.
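The two accuracy figures above measure different things. As commonly defined in SER work, WA is the overall fraction of correctly classified utterances, while UA is the mean of the per-class recalls, so that rare emotion classes count as much as frequent ones. The sketch below illustrates that convention only; the function names and the toy labels are made up for illustration and are not taken from the paper.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all samples."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    total = defaultdict(int)  # samples per true class
    hit = defaultdict(int)    # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hit[t] += int(t == p)
    return sum(hit[c] / total[c] for c in total) / len(total)


# Toy, made-up labels (not IEMOCAP data): four classes of unequal size.
y_true = ["ang", "ang", "ang", "ang", "hap", "hap", "sad", "neu"]
y_pred = ["ang", "ang", "ang", "hap", "hap", "sad", "sad", "neu"]

wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
print(f"WA = {wa:.4f}, UA = {ua:.4f}")
```

On an imbalanced label set the two values diverge, which is why SER results are usually reported with both.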
The sound emitted by machines under abnormal working conditions exhibits various frequency patterns. Currently, the most advanced anomalous sound detection (ASD) approach is to apply a multi-head self-attention mechan...
Snow images usually contain snow grains, snow streaks, and mist, which greatly reduce image visibility. Supervised learning on synthetic data often generalizes poorly to real-world snow images. To address this issue, this work proposes an unsupervised domain adaptation framework for image snow removal. The framework improves performance on real-world images by learning a domain classifier in an adversarial training manner. Additionally, considering the diversity of snowflake shapes and sizes in real-world snow images, we design a multiple-kernel dilated convolution module. Extensive experiments on three representative datasets validate that our model achieves better results than existing desnowing methods. More importantly, experiments on real datasets show that the proposed method obtains state-of-the-art performance in real-world desnowing.
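The abstract does not describe the internals of the multiple-kernel dilated convolution module, so the following is not the authors' design. It is only a minimal 1-D sketch of the generic dilated convolution operation, showing why varying the kernel size and dilation rate lets parallel branches cover structures of different sizes. The helper names `dilated_conv1d` and `receptive_field` are hypothetical.

```python
def receptive_field(kernel_size, dilation):
    """Number of input positions spanned by one output of a dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D dilated convolution (cross-correlation form, no padding)."""
    span = receptive_field(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` positions apart in the input.
        out.append(sum(w * signal[start + i * dilation]
                       for i, w in enumerate(kernel)))
    return out


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]  # simple difference filter

d1 = dilated_conv1d(signal, kernel, dilation=1)  # spans 3 input positions
d2 = dilated_conv1d(signal, kernel, dilation=2)  # spans 5 input positions
print(d1, d2)
```

With the same three weights, the branch with dilation 2 spans five input positions instead of three, so a set of branches with different kernel sizes and dilation rates responds to both small and large structures without adding many parameters.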