Search Results - Inner Mongolia University Library

Cloud Computing, Big Data Applications and Software Engineering (CBASE), International Conference on

Authors: Senyan Li; Ye Dong; Xizhong Qin (Xinjiang Signal Detection and Processing Key Laboratory, Xinjiang University, Urumqi, China)

ISBN: (digital) 9798331506582

ISBN: (print) 9798331506599

Currently, meta-learning is the mainstream approach to solving the problem of scarce data in few-shot text classification. Still, challenges remain, such as embedding vectors not being compact enough, suboptimal meta-task sampling strategies, and the rigidity of convolution operations in traditional Text CNNs. To address these issues, we propose the Attention Mechanism-based Improved Meta-learning Contrast Network (AMIMC), which enhances intra-class aggregation, increases inter-class separation, and improves embedding quality. Additionally, Double Dynamic Similar Sampling (DDSS) generates more challenging meta-tasks, and the attention mechanism enhances the flexibility of Text CNNs, significantly boosting accuracy on five few-shot text classification datasets.

Keywords: Metalearning; Attention mechanisms; Reviews; Text categorization; Semantics; Vectors; Robustness; Rigidity; Noise measurement; Software engineering

Towards tiny object tracking for low-illumination wide-field

2022 International Conference on Computer Application and Information Security, ICCAIS 2022

Authors: Xie, Zhaodong; Jia, Zhenhong; Jiang, Wangxi (College of Information Science and Engineering, Xinjiang University, Xinjiang, Urumqi 830046, China; Key Laboratory of Signal Detection and Processing, Xinjiang University, Xinjiang, Urumqi 830046, China)

ISBN: (print) 9781510663459

Night scenes are unavoidable in surveillance video. Because hawk-eye surveillance video combines high image resolution, complex backgrounds, uneven illumination, and targets that resemble the background, previous trackers struggle to track tiny objects in such scenes. This paper therefore proposes combining an online, automatically and adaptively learned spatio-temporal regularized tracking algorithm with an efficient and effective low-light image enhancement algorithm to improve tracker performance. We constructed a new benchmark of 41 night surveillance sequences captured by hawk-eye cameras. Exhaustive experiments on this dataset show that combining the two methods lets the original algorithm obtain better results while meeting real-time tracking requirements, which supports the application of tiny object tracking in hawk-eye surveillance video at night. © 2023 SPIE.

Keywords: Image resolution

Research on Speech Synthesis Based on Prosodic Controllability

5th International Congress on Human-Computer Interaction, Optimization and Robotic Applications, HORA 2023

Authors: Gao, Wenyu; Hamdulla, Askar; Yang, Xipeng (School of Information Science and Engineering, Xinjiang University of China; Xinjiang Key Laboratory of Signal Detection and Processing, Xinjiang, Urumqi, China; Mobvoi AI Lab, Suzhou, China)

ISBN: (print) 9798350337525

In today's highly interactive human-computer world, speech synthesis is widely used in many scenarios, and the requirements for prosody in speech synthesis technology are increasing, so prosody-controllable models have become a research hotspot. Most current prosody-controllable models use a separate neural network to generate reference features, but this approach requires training more complex neural network models and needs reference audio to achieve explicit prosody control. This paper proposes a prosody-controllable solution based on an end-to-end acoustic model to address the problem of models being unable to precisely control the tone at the word level. The proposed model includes a tone-controllable module, which obtains duration information through the MFA alignment tool and adjusts the tone of the words by using word-level pitch control values and duration information. The acoustic model in this paper is improved by introducing pitch control during the generation of acoustic features and generating more robust audio by combining it with the decoder. In addition, to adjust the overall tone of the audio, the pitch values of all frames are multiplied by a fixed coefficient. Furthermore, this paper also proposes a 48 kHz ultra-high-definition audio model by increasing the spectral parameter dimensions and upsampling by a factor. © 2023 IEEE.

Keywords: Speech synthesis
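The pitch-control idea in this abstract (word-level control values spread over frames using durations, plus one fixed coefficient for all frames) can be illustrated with a small sketch. This is not the paper's implementation: the function names, the choice to treat word-level control values as multiplicative scale factors, and the assumption that durations are given as frame counts are all hypothetical.

```python
from typing import List


def expand_to_frames(word_values: List[float], word_durations: List[int]) -> List[float]:
    """Repeat each word-level value for as many frames as the word lasts."""
    frame_values: List[float] = []
    for value, duration in zip(word_values, word_durations):
        frame_values.extend([value] * duration)
    return frame_values


def apply_pitch_control(
    f0: List[float],
    word_scales: List[float],
    word_durations: List[int],
    global_scale: float = 1.0,
) -> List[float]:
    """Scale frame-level pitch by word-level factors and one global coefficient.

    Assumption: control values act as multipliers; the paper may define them differently.
    """
    if len(word_scales) != len(word_durations):
        raise ValueError("one scale and one duration are needed per word")
    if sum(word_durations) != len(f0):
        raise ValueError("word durations must cover every pitch frame exactly")
    frame_scales = expand_to_frames(word_scales, word_durations)
    return [pitch * scale * global_scale for pitch, scale in zip(f0, frame_scales)]


# Two words: the first spans 2 frames, the second spans 1 frame.
controlled = apply_pitch_control(
    f0=[100.0, 100.0, 200.0],
    word_scales=[1.0, 1.5],
    word_durations=[2, 1],
    global_scale=2.0,
)
```

In practice the per-word frame counts would come from a forced aligner such as the MFA tool the abstract mentions; here they are supplied by hand.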

Improving Speaker Verification Back-End with Graph Neural Networks

Journal of Shanghai Jiaotong University (Science), 2025, pp. 1-9

Authors: Chen, Jinfeng; Fang, Zhihua; He, Liang (School of Computer Science and Technology, Xinjiang University, Urumqi 830017, China; Xinjiang Key Laboratory of Signal Detection and Processing, Urumqi 830017, China; School of Intelligence Science and Technology, Xinjiang University, Urumqi 830017, China; Department of Electronic Engineering, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China)

Currently, research on speaker verification tasks is primarily concentrated on enhancing deep speaker models to extract high-quality speaker embeddings. Nevertheless, these speaker embeddings can be regarded as potential graph structures. Therefore, this paper introduces a graph neural network-based speaker verification back-end approach, treating the speaker embeddings derived from the front-end as graph structures and leveraging graph neural networks to uncover the relationships among embeddings, thereby achieving superior-quality speaker embeddings. In addition, we propose a group updating method to solve the problem of memory overflow when the number of nodes is excessive. Extensive experiments and ablation studies were carried out on the VoxCeleb dataset, and the experimental outcomes validate the effectiveness of our proposed graph neural network speaker back-end in significantly boosting the performance of speaker verification systems. © Shanghai Jiao Tong University 2025.

Keywords: Graph embeddings

Dual-Path Spectrogram Refinement Network for Robust Speaker Verification

Journal of Shanghai Jiaotong University (Science), 2025, pp. 1-9

Authors: Wang, Zonghui; Fang, Zhihua; He, Liang (School of Computer Science and Technology, Xinjiang University, Urumqi 830017, China; School of Intelligence Science and Technology, Xinjiang University, Urumqi 830017, China; Xinjiang Key Laboratory of Signal Detection and Processing, Urumqi 830017, China; Department of Electronic Engineering, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China)

The accuracy and reliability of automatic speaker verification (ASV) face significant challenges in noisy environments. In recent years, joint training of speech enhancement front-end and ASV back-end has been widely applied to improve the robustness of ASV systems. Traditional joint training directly uses the enhanced speech features as the input of the back-end. However, the diversity of noise types and noise intensities will excessively suppress the enhanced features, resulting in speech distortion or residual noise. To alleviate this problem, we propose a dual-path spectrogram refinement network that enables the enhanced features to learn supplementary information from the separated noise features. In addition, we incorporate coordinate attention into the overall joint architecture to capture more comprehensive frequency and temporal information of the speaker from different spatial positions. We conduct extensive experiments on the VoxCeleb1 test set, the out-of-domain noise test set, and the VOiCES corpus. The experimental results demonstrate that our proposed method significantly improves the accuracy and robustness of speaker verification systems in both clean and noisy environments. © Shanghai Jiao Tong University 2025.

Keywords: Spectrographs

Multi-Task Model Based on Vision Task Level for Saliency Object Detection in Foggy Conditions

IEEE International Conference on Image Processing

Authors: Yusen Zhu; Gang Zhou; Jingxu Ren; Jiakun Tian; Zhenhong Jia (Key Laboratory of Signal Detection and Processing, Xinjiang University, Urumqi, China)

In recent years, saliency object detection methods based on convolutional neural networks have been widely studied and have achieved excellent performance on clear images. However, due to the low visibility of images in foggy conditions, existing saliency object detection methods are seriously affected or even rendered ineffective. To address this problem, we introduce an end-to-end multi-task learning network. We design two subnetworks for depth estimation and image restoration as auxiliary tasks to improve saliency object detection in foggy conditions. According to the different characteristics of vision tasks, different shared layers are assigned to improve the performance of saliency object detection. Experiments show that our method achieves large improvements on both synthetic and real-world foggy datasets, outperforming many state-of-the-art saliency object detection methods.

Sound source localization and detection based on parameter transfer learning

24th International Congress on Acoustics, ICA 2022

Authors: Sun, Xinghao; Ma, Mengzhen; Hu, Ying (Department of Information Science and Engineering, Xinjiang University, Urumqi 830000, China; Key Laboratory of Signal Detection and Processing in Xinjiang, China)

Sound source localization and detection is a joint task of identifying the presence of individual sound events and locating the sound sources in space. To promote the combination of these two different tasks, we propose a sound source localization and detection method based on parameter transfer learning. First, to solve the problem of unevenly distributed time-frequency features, we propose a time-frequency feature extractor, which extracts features along the time dimension and the frequency dimension separately and superimposes them. Second, to promote feature interaction between different resolutions, we propose a refined feature learning module that fuses features with different resolutions. Then, to reduce training and inference time, we explore a temporal context representation module for learning temporal context information. This module has better global feature capture capability and better parallelism than the recurrent neural network and the gated recurrent unit. Finally, the rationality of our proposed model was verified by ablation experiments, and its effectiveness was verified by comparison with the best methods. © 2022 Proceedings of the International Congress on Acoustics. All rights reserved.

Keywords: Recurrent neural networks

A Lightweight Speech Emotion Recognition Model with Bias-Focal Loss

IEEE International Conference on Information Communication and Signal Processing (ICICSP)

Authors: Shijing Hou; Huamin Yang; Ying Hu (Key Laboratory of Signal Detection and Processing, Xinjiang University, Urumqi, China)

A major challenge in speech emotion recognition (SER) is building a lightweight model from limited training data for deployment on resource-constrained devices. In this paper, we propose a lightweight SER model with a Bias-Focal loss function, in which a Dynamic Separable Convolution (DySC) block is designed to extract more fine-grained emotional features while making the model smaller. The Bias-Focal loss addresses the inconsistency of training samples while focusing on data points with high feature diversity during the training phase. Experimental results show that our proposed lightweight model is the smallest among the compared methods, with 0.85M parameters. It also achieves the best performance on the IEMOCAP (scripted+improvised) dataset, with an Unweighted Accuracy (UA) of 75.01%, a Weighted Accuracy (WA) of 74.05%, and an F1-score of 74.29%.
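The abstract does not define the Bias-Focal loss, so the sketch below shows only the standard focal loss that the name builds on, as a point of reference. It is not the paper's loss: the function name, the default `gamma`, and the example probabilities are illustrative assumptions.

```python
import math
from typing import Sequence


def focal_loss(probs: Sequence[float], target: int, gamma: float = 2.0) -> float:
    """Standard focal loss for one sample: -(1 - p_t)^gamma * log(p_t).

    probs  : predicted class probabilities (assumed to sum to 1)
    target : index of the true class
    gamma  : focusing parameter; gamma = 0 reduces to cross-entropy
    """
    p_t = probs[target]
    if not 0.0 < p_t <= 1.0:
        raise ValueError("probability of the true class must be in (0, 1]")
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# A confidently correct sample contributes far less than an uncertain one.
easy = focal_loss([0.9, 0.1], target=0)
hard = focal_loss([0.5, 0.5], target=0)
```

The `(1 - p_t)^gamma` factor shrinks the loss of well-classified samples, so training concentrates on harder ones; how the paper's bias term modifies this weighting is not stated in the abstract.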

Anomalous Sound Detection Using Time-Frequency Feature and Mixbatch

Journal of Shanghai Jiaotong University (Science), 2025, pp. 1-8

Authors: Huang, Shun; Zhang, Yunxiang; Fang, Zhihua; Tang, Minrui; Xu, Ruifeng; He, Liang (School of Computer Science and Technology, Xinjiang University, Urumqi 830017, China; Xinjiang Key Laboratory of Signal Detection and Processing, Urumqi 830017, China; School of Intelligence Science and Technology, Xinjiang University, Urumqi 830017, China; Department of Electronic Engineering, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China)

The sound emitted by machines under abnormal working conditions exhibits various frequency patterns. Currently, the most advanced anomalous sound detection (ASD) approach is to apply a multi-head self-attention mechanism to the Log-Mel spectrogram for automatic frequency pattern analysis. However, the Log-Mel spectrogram may filter out high-frequency components of abnormal sounds; thus, the use of self-attention mechanisms on the Log-Mel spectrogram seems to have certain limitations. In this paper, we construct a simple convolutional neural network to extract comprehensive frequency features from raw audio to complement spectral-temporal information fusion. The parameters of this neural network are continuously updated during the training process to extract better frequency features for downstream classification neural networks in ASD. Additionally, a method for data augmentation in the batch dimension was developed to help the classification model learn both types of comprehensive features simultaneously. The proposed method achieved an AUC of 94.13% and a pAUC of 89.09% on the DCASE 2020 Challenge Task 2 dataset. Even when using only the features proposed in this paper, an AUC of 84.62% was achieved. © Shanghai Jiao Tong University 2025.

Keywords: Spectrographs
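For readers unfamiliar with the AUC metric reported in this abstract, the following minimal sketch computes it from anomaly scores as the fraction of (anomalous, normal) pairs that are ranked correctly. The function name and the example scores are made up for illustration and are unrelated to the paper's data.

```python
from typing import Sequence


def auc(normal_scores: Sequence[float], anomaly_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise ranking.

    Counts how often an anomalous clip receives a higher anomaly score than a
    normal clip; ties count as half a correct ranking.
    """
    if not normal_scores or not anomaly_scores:
        raise ValueError("both score lists must be non-empty")
    correct = 0.0
    for a in anomaly_scores:
        for n in normal_scores:
            if a > n:
                correct += 1.0
            elif a == n:
                correct += 0.5
    return correct / (len(normal_scores) * len(anomaly_scores))


# Hypothetical anomaly scores: three of the four pairs are ranked correctly.
example_auc = auc(normal_scores=[0.1, 0.4], anomaly_scores=[0.35, 0.8])
```

The pAUC figure reported alongside it restricts the same kind of measurement to a low false-positive-rate range; that restriction is not implemented in this sketch.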

Unsupervised Domain Adaptive Learning for Image Desnowing with Real-World Data

IEEE International Conference on Image Processing

Authors: Jingxu Ren; Gang Zhou; Yusen Zhu; Yangxin Liu; Juan Chen; Zhenhong Jia (Key Laboratory of Signal Detection and Processing, Xinjiang University, Urumqi, China)

Snow images usually contain snow grains, snow streaks, and mist, which greatly affect the visibility of images. Currently, supervised learning with synthetic data often faces limitations when handling real-world snow images. To address this crucial issue, this work proposes an unsupervised domain adaptation framework for image snow removal. The framework improves performance on real-world images by learning a domain classifier in an adversarial training manner. Additionally, considering the diversity of snowflake shapes and sizes in real-world snow images, we design a multiple-kernel dilated convolution module. Extensive experiments on three representative datasets have validated that our model can achieve better results than existing desnowing methods. More importantly, experiments on real datasets show that the proposed method obtains state-of-the-art performance in real-world desnowing.
