Search Results - Inner Mongolia University Library

Improving ECAPA-TDNN Performance with Coordinate Attention

Journal of Shanghai Jiaotong University (Science), 2024, pp. 1-7

Authors: Liu, Shuanghong; Song, Zhida; He, Liang. Affiliations: School of Computer Science and Technology, Xinjiang University, Urumqi 830017, China; Xinjiang Key Laboratory of Signal Detection and Processing, Urumqi 830017, China; Department of Electronic Engineering and Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China

The current mainstream networks, such as squeeze and excitation residual neural network (SE-ResNet) and emphasized channel attention, propagation and aggregation based time delay neural network (ECAPA-TDNN), enhance the capability of speaker embedding extractors to extract more discriminative speaker embeddings by incorporating squeeze and excitation (SE) attention within the convolutional blocks. However, the SE attention focuses solely on encoding inter-channel information, overlooking the importance of spatial positional information and time-frequency information, which are crucial for the model’s performance. In this paper, we first experimentally compare the effectiveness of several mainstream attention mechanisms in the computer vision domain for the ECAPA-TDNN model. Next, we focus on the substantial improvements that coordinate attention (CA) brings to the ECAPA-TDNN model. The introduction of CA can help the model embed time-frequency information into the channel representation. Even without using AS-Norm, our proposed model achieves relative reductions of about 5.3% equal error rate (EER) and 5.5% minimum detection cost function (minDCF) on both the Voxceleb-O and Voxceleb-H test sets compared to the ECAPA-TDNN baseline model. In addition, the EER is relatively reduced by 9.46% on the CN-Celeb1 test set. This result strongly demonstrates that the CA module can effectively improve the generalization ability of the ECAPA-TDNN model. © Shanghai Jiao Tong University 2024.
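
The way coordinate attention embeds time-frequency position into the channel weights can be sketched in a few lines of NumPy. This is a simplified illustration, not the authors' implementation: the function name, the random weights, the ReLU nonlinearity, and the omission of batch normalization are all assumptions made for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coordinate_attention(x, w_reduce, w_freq, w_time):
    """Re-weight a (channels, freq, time) feature map.

    Hypothetical helper: w_reduce has shape (C_r, C); w_freq and w_time
    have shape (C, C_r). Batch norm is omitted and ReLU is assumed.
    """
    _, n_freq, _ = x.shape
    pooled_freq = x.mean(axis=2)                      # (C, F): average over time
    pooled_time = x.mean(axis=1)                      # (C, T): average over frequency
    y = np.concatenate([pooled_freq, pooled_time], axis=1)  # (C, F + T)
    y = np.maximum(w_reduce @ y, 0.0)                 # shared channel reduction
    y_freq, y_time = y[:, :n_freq], y[:, n_freq:]     # split back per axis
    gate_freq = sigmoid(w_freq @ y_freq)              # (C, F) gates in (0, 1)
    gate_time = sigmoid(w_time @ y_time)              # (C, T) gates in (0, 1)
    # Each element is scaled by its frequency gate and its time gate.
    return x * gate_freq[:, :, None] * gate_time[:, None, :]

rng = np.random.default_rng(0)
C, F, T, R = 8, 10, 20, 4
x = rng.standard_normal((C, F, T))
out = coordinate_attention(
    x,
    rng.standard_normal((C // R, C)),
    rng.standard_normal((C, C // R)),
    rng.standard_normal((C, C // R)),
)
print(out.shape)
```

Unlike SE attention, which collapses the whole map to one scalar per channel, the two directional poolings keep one axis each, so the resulting gates vary along frequency and along time.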

Keywords: Neural network models

Research on Shaking Video Image Stabilization Algorithm for Moving Object Detection

International Conference on Algorithm, Image Processing and Machine Vision (AIPMV)

Authors: Chuangchuang Wang; Zhenhong Jia; Xiaohui Huang; Sensen Song; Jiajia Wang; Gang Zhou; Fei Shi. Affiliation: Key Laboratory of Signal Detection and Processing, Xinjiang University, Urumqi, China

ISBN (digital): 9798350390254

ISBN (print): 9798350390261

Video captured by surveillance equipment jitters when the equipment shakes, and this jitter degrades the results of moving target detection algorithms that rely on stable video frames. This paper proposes an improved ORB algorithm to solve the video jitter problem, so that the moving target detection algorithm can accurately detect moving targets in jittery videos. First, a wavelet transform is used to locate the high-frequency area of the image, and feature extraction is then performed only on that area, which improves the efficiency of ORB feature extraction. Next, the Boosted Efficient Binary Local Image Descriptor (BEBLID) replaces the descriptors of Oriented FAST and Rotated BRIEF (ORB) to improve matching accuracy. In the feature matching stage, a neighborhood query method is proposed to replace the global search of the traditional BFMatcher, which improves the matching speed. Progressive Sample Consensus (PROSAC) is then employed to ensure accurate matching of point pairs, yielding a motion matrix for video stabilization. Finally, a Gaussian mixture model with an adaptive number of distributions is used to quickly detect moving targets. Comparative experiments with scale-invariant feature transform (SIFT), Accelerated-KAZE (AKAZE), ORB, Qtree_ORB and SIRB demonstrate the superior accuracy and speed of this algorithm.
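
The idea of replacing a global brute-force search with a neighborhood query can be illustrated as follows. This is only one plausible reading of the abstract, not the paper's method: the function names, the fixed search `radius`, and the `max_dist` threshold are hypothetical, and binary descriptors are assumed to be packed into `uint8` arrays and compared by Hamming distance.

```python
import numpy as np

def hamming(a, b):
    """Hamming distance between two bit-packed uint8 descriptors."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

def neighborhood_match(kp1, desc1, kp2, desc2, radius=20.0, max_dist=64):
    """Match each keypoint in frame 1 only against keypoints in frame 2
    that lie within `radius` pixels, instead of against all of them."""
    matches = []
    for i, (point, desc) in enumerate(zip(kp1, desc1)):
        # Candidate set: frame-2 keypoints close to this keypoint's position.
        near = np.where(np.linalg.norm(kp2 - point, axis=1) <= radius)[0]
        if near.size == 0:
            continue
        dists = [hamming(desc, desc2[j]) for j in near]
        best = int(np.argmin(dists))
        if dists[best] <= max_dist:
            matches.append((i, int(near[best]), dists[best]))
    return matches

# Synthetic check: frame 2 is frame 1 shifted by a small jitter.
rng = np.random.default_rng(0)
kp1 = rng.uniform(0, 640, size=(50, 2))
kp2 = kp1 + np.array([3.0, -2.0])
desc1 = rng.integers(0, 256, size=(50, 32), dtype=np.uint8)
desc2 = desc1.copy()
matches = neighborhood_match(kp1, desc1, kp2, desc2)
print(len(matches))
```

Because inter-frame jitter is usually small, restricting candidates to a local window cuts the number of descriptor comparisons per keypoint from the full keypoint count to the handful that fall inside the window.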

Keywords: Wavelet transforms; Accuracy; Surveillance; Search methods; Object detection; Jitter; Streaming media; Feature extraction; Real-time systems; Gaussian mixture model

Beyond the Snowfall: Enhancing Snowy Day Object Detection Through Progressive Restoration and Multi-Feature Fusion

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Zhong Wang; Gang Zhou; Jing Ma; Tianhao Xue; Zhenhong Jia. Affiliation: Key Laboratory of Signal Detection and Processing, Xinjiang University, Urumqi, China

In the field of computer vision, object detection is a prominent and challenging task. Although deep learning-based object detection techniques perform well on clear images, they fail in inclement weather conditions such as snow because of image degradation. Recent efforts have explored using image restoration methods to enhance degraded images before object detection. However, direct restoration can sometimes introduce new disturbances, impeding improvements in detection performance. To address this issue, we propose a joint framework that connects an iterative desnowing module and a detection module in an end-to-end manner. Specifically, we design an Advantage Union structure for multi-feature fusion, which effectively combines original, intermediate, and restored features, reducing potential information loss from restoration. Experimental results show that our method achieves higher accuracy than recent state-of-the-art methods on both synthetic datasets and real-world snowy images.

Coarse-to-fine change detection in low-illumination video sequences via classifier trained on ResNet18

2024 International Conference on Image Processing and Artificial Intelligence, ICIPAI 2024

Authors: Lin, Jiajun; Jia, Zhenhong. Affiliations: School of Computer Science and Technology, Xinjiang University, Urumqi 830046, China; Key Laboratory of Signal Detection and Processing, Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi 830046, China

ISBN (print): 9781510681514

In low-illumination environments, the contrast between targets and the background sharply decreases, and imaging sensors introduce complex random noise while capturing more light, leading to severe distortion in the target features of visible light images. Under these conditions, target feature-based detection algorithms face significant challenges. In this paper, we propose a coarse-to-fine change detection algorithm that detects targets in low-illumination environments by focusing on the change features generated by the targets. First, a binary classifier based on ResNet18 is trained with a specially curated dataset of targets and noise. Second, this classifier is applied to identify potential change objects in test data, preserve targets, and extract corresponding bi-temporal local regions of interest. Third, novel difference feature extraction operators are employed to generate local difference images. Next, a Laplacian-of-Gaussian-based graph cut algorithm is used to perform binary segmentation, distinguishing foreground from background in the images. We validated the feasibility of this algorithm in three challenging night scenes. Compared with the current state-of-the-art unsupervised change detection algorithms, the proposed algorithm shows overall better detection performance. © 2024 SPIE.

Keywords: Change detection

RVDNet: A Two-Stage Network for Real-World Video Desnowing with Domain Adaptation

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Tianhao Xue; Gang Zhou; Runlin He; Zhong Wang; Juan Chen; Zhenhong Jia. Affiliation: Key Laboratory of Signal Detection and Processing, Xinjiang University, Urumqi, China

Video snow removal is an important task in computer vision, as the snowflakes in videos reduce visibility and negatively affect the performance of outdoor visual systems. However, due to the complexity of real snowy scenarios, it is difficult to apply existing supervised learning-based methods to process real-world snowy videos. In this paper, we propose a novel two-stage video desnow network for the real world, called RVDNet. The first stage of RVDNet utilizes Spatial Feature Extraction Modules (SFEM) to extract the spatial features of the input frames. In the second stage, we design Spatial-Temporal Desnowing Modules (STDM) to remove snowflakes via spatio-temporal learning. Furthermore, we introduce the unsupervised domain adaptation module, which is embedded for aligning the feature space of real and synthetic data in the spatial and spatio-temporal domains, respectively. Experiments on the proposed SnowScape dataset prove that our method has superior desnow performance not only on synthetic data, but also in the real world.

Unsupervised wide field-of-view video image change detection

2024 International Conference on Image Processing and Artificial Intelligence, ICIPAI 2024

Authors: Liu, Rui; Jia, Zhenhong; Huang, Xiaohui; Wang, Jiajia; Zhou, Gang; Shi, Fei. Affiliations: School of Computer Science and Technology, Xinjiang University, Urumqi 830046, China; Key Laboratory of Signal Detection and Processing, Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi 830046, China

ISBN (print): 9781510681514

Video surveillance requires simultaneous monitoring of multiple areas. Consequently, real-time automatic change detection of the monitored areas becomes very important. In the context of wide field-of-view conditions, the combination of a wide field-of-view, intricate environmental factors, and a substantial presence of random noise can lead to the degradation of visual fidelity and a diminished signal-to-noise ratio in the video images acquired through the image sensor. As a consequence, the task of detecting subtle changes becomes challenging for the surveillance system. To address the above problems, we have proposed a change detection method that leverages improved difference images and super fast and robust fuzzy c-means with constraints clustering. Initially, we employ an improved log-ratio operator and an improved mean-ratio operator to generate two distinct difference images. Subsequently, the wavelet fusion algorithm is applied to merge these two difference images, effectively integrating their distinctive features and producing a fused difference image with differentiability. Then, the new difference image is subjected to soft threshold primary classification and a cumulative distribution function normalization to obtain the difference image after primary classification. Finally, the super fast and robust fuzzy c-means with constraints clustering algorithm is employed for the ultimate classification, enabling the separation of changed and unchanged areas within the image. © 2024 SPIE.
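
The two difference images mentioned above build on the classical log-ratio and mean-ratio operators. The sketch below shows only the standard textbook forms, not the improved variants proposed in the paper; the function names, the 3x3 window, and the `eps` stabilizer are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_mean(img, k=3):
    """Mean over a k x k window centered on each pixel (edge-padded)."""
    padded = np.pad(img, k // 2, mode="edge")
    return sliding_window_view(padded, (k, k)).mean(axis=(-2, -1))

def log_ratio(a, b):
    """Classical log-ratio difference image; +1 avoids log(0)."""
    return np.abs(np.log((b + 1.0) / (a + 1.0)))

def mean_ratio(a, b, k=3, eps=1e-6):
    """Classical mean-ratio difference image built from local means."""
    ma, mb = local_mean(a, k), local_mean(b, k)
    return 1.0 - (np.minimum(ma, mb) + eps) / (np.maximum(ma, mb) + eps)

# Two synthetic frames: a bright 4 x 4 patch appears in the second one.
before = np.full((16, 16), 50.0)
after = before.copy()
after[4:8, 4:8] = 200.0

d_log = log_ratio(before, after)
d_mean = mean_ratio(before, after)
print(d_log[5, 5], d_mean[5, 5])
```

The log-ratio image compresses large intensity ratios and responds pixel by pixel, while the mean-ratio image is smoother because it compares local averages, which is why fusing the two can be attractive.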

Keywords: Clustering algorithms

Tor encrypted traffic recognition and classification technology based on flow information graph and deep learning

2024 International Conference on Image Processing and Artificial Intelligence, ICIPAI 2024

Authors: Zhang, Yu; Jia, Zhenhong; Huang, Xiaohui; Wang, Jiajia; Zhou, Gang; Shi, Fei; Lu, Changwu. Affiliations: School of Computer Science and Technology, Xinjiang University, Urumqi 830046, China; Key Laboratory of Signal Detection and Processing, Xinjiang University, Xinjiang Uygur Autonomous Region, Urumqi 830046, China

ISBN (print): 9781510681514

Internet traffic analysis is the core approach to network management and security. In the rapidly changing environment of encrypted traffic, traditional plaintext-based analysis methods have become obsolete. Although there are currently some methods for analysing encrypted traffic, they overlook the inherent logic and hierarchy of different encrypted traffic analysis requirements and lack research into the essential characteristics of encrypted *** article proposes a framework, FM-ENet, based on Graph Neural Network and Deep Learning for Tor encrypted traffic classification to meet the practical needs of network management and security monitoring. A Flow Information Graph has been designed on the basis of Graph Neural Network, which can effectively alleviate the problem of concept *** the area of deep neural networks, we have developed an end-to-end multi-level spatio-temporal feature fusion enhanced network module, ST-FENet, which has the advantage of automatically learning the non-linear relationship between input and output data without the need for manual feature extraction. These structures complement each other in classification performance and solve the problem of low classifier efficiency. We compared FM-ENet to current popular methods using the public ISCX-Tor dataset. FM-ENet can achieve higher performance while saving costs, with a 9% improvement in accuracy compared to FlowPrint and 4.4%, 4.5% and 4.4% improvements in the PR, RC and F1 indicators compared to TSCRNN. © 2024 SPIE.

Keywords: Network security

Infrared Small Target Detection with Feature Refinement and Context Enhancement

31st International Conference on Multimedia Modeling, MMM 2025

Authors: Li, Xiuhong; Zhu, Xinyue; Li, Boyuan; Li, Songlin; Wang, Luyao; Jia, Zhenhong. Affiliations: School of Computer Science and Technology, Xinjiang University, Urumqi, China; Key Laboratory of Signal Detection and Processing, Xinjiang University, Urumqi, China; College of Mathematics and System Science, Xinjiang University, Urumqi, China

ISBN (print): 9789819620609

Infrared small target detection has received widespread application and attention in both civilian and military fields. However, due to the very small size and lack of unique features of these targets, existing methods often suffer from inaccurate edge localization, and the targets are easily overwhelmed by complex backgrounds. To effectively address these issues, we designed a Feature Refinement and Context Enhancement Network (FCNet). The network consists of a multi-branch feature extraction module (MFEM), which integrates ordinary convolution and center difference convolution for diversified feature extraction, and further refines features through attention mechanism to enhance feature expression ability. In addition, in order to better capture the contextual information of the target, we have introduced the Context Enhancement Module (CEM). The CEM improves the robustness and detection accuracy of the network in complex backgrounds by preserving and enhancing important information in the image. Especially when processing infrared images with complex backgrounds and low signal-to-noise ratios, CEM can effectively highlight the target area and reduce false positives and false negatives. The experimental results on the SIRST dataset show that our FCNet performs well on multiple evaluation metrics. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.

Keywords: Infrared imaging

Dual-MambaNet: A Lightweight Dual-Branch Brain Image Segmentation Network Based on Local Attention and Mamba

27th International Conference on Pattern Recognition, ICPR 2024

Authors: Zhang, Feifei; Shi, Fei; Ren, Dayong; Jia, Zhenhong; Wang, Jianyi. Affiliations: School of Computer Science and Technology, Xinjiang University, Urumqi, Xinjiang 830046, China; Key Laboratory of Signal Detection and Processing, Xinjiang University, Urumqi, Xinjiang 830046, China; National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China

ISBN (print): 9783031781032

Brain tissue segmentation is critical for diagnosing and treating brain diseases. While Mamba-based models excel in the medical field, they face performance bottlenecks with high-resolution MRI images, often losing local feature information in complex texture structures. To address these challenges and enable deployment in resource-limited settings, we propose Dual-MambaNet, a lightweight segmentation model based on Mamba. In Dual-MambaNet, we introduce the Outlook attention module to capture local complex textures and structures in brain MRI images. Subsequently, we combined it with the Mamba block to construct a feature extractor (FE) encoder layer to couple local and global features. Additionally, we integrate dual decoder branches and a multi-level pixel contrastive loss function(MPCL) to better integrate local and global features. This method optimizes global feature representation by refining local complex textures and structural details, effectively capturing multi-level features in MRI images. Experimental results on public brain MRI datasets OASIS1 and MRBrainS13 demonstrate that Dual-MambaNet achieves high segmentation accuracy with minimal parameters and computational complexity, making it suitable for deployment in resource-limited medical environments. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Magnetic resonance imaging

E-RNS: Enhancing Negative Sample Quality from Gradient Perspective for Graph Recommendation

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Qiangsheng Feng; Jiwei Qin; Jie Ma. Affiliations: School of Computer Science and Technology, Xinjiang University; Key Laboratory of Signal Detection and Processing

ISBN (digital): 9798350368741

ISBN (print): 9798350368758

Bayesian Personalized Ranking (BPR) is a widely used optimization function in GNN-based recommender systems, and negative samples are usually obtained through the Random Negative Sampling (RNS) method during BPR training. However, from the gradient perspective, RNS tends to select low-quality samples with minimal information, resulting in small gradients. These small gradients contribute little to BPR optimization, limiting the model’s ability to effectively distinguish between positive and negative samples. To alleviate this issue, we propose a general negative sample information enhancement method: Enhancing Random Negative Sampling (E-RNS), which constructs hard negative samples by enhancing the information in randomly selected negative samples. Specifically, in the Noise Injection step, it generates initial noise and injects a certain amount of noise in the same direction into the vector dimensions of positive samples to create enriched information. Then, in the Information Fusion step, this enriched information is mixed with the negative samples to synthesize new hard negative samples. Extensive experiments demonstrate that applying E-RNS to GNN-based recommender models significantly improves performance.
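
The two steps described above (Noise Injection, then Information Fusion) can be sketched roughly as follows, together with the BPR loss they feed into. This is a loose interpretation of the abstract and not the published E-RNS algorithm: the uniform noise, the sign-based reading of "same direction", the linear mixing rule, and the parameters `eps` and `alpha` are all assumptions.

```python
import numpy as np

def bpr_loss(user, pos, neg):
    """BPR loss: mean of -log(sigmoid(score_pos - score_neg))."""
    diff = np.sum(user * pos, axis=1) - np.sum(user * neg, axis=1)
    return float(np.mean(np.log1p(np.exp(-diff))))

def enhance_negatives(pos, neg, rng, eps=0.1, alpha=0.5):
    """Hypothetical two-step enhancement of randomly sampled negatives."""
    # Noise Injection (assumed form): unit-norm positive noise, pushed in
    # the direction of each positive embedding dimension's sign.
    noise = rng.uniform(0.0, 1.0, size=pos.shape)
    noise /= np.linalg.norm(noise, axis=1, keepdims=True)
    enriched = pos + eps * np.sign(pos) * noise
    # Information Fusion (assumed form): linear mix with the random negative.
    return alpha * enriched + (1.0 - alpha) * neg

rng = np.random.default_rng(0)
user = rng.standard_normal((4, 16))
pos = rng.standard_normal((4, 16))
neg = rng.standard_normal((4, 16))

hard_neg = enhance_negatives(pos, neg, rng)
print(bpr_loss(user, pos, neg), bpr_loss(user, pos, hard_neg))
```

With `alpha=0` the function returns the original random negatives unchanged, so `alpha` controls how much positive-side information is blended into each synthesized negative.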

Keywords: Training; Limiting; Noise; Information filters; Vectors; Acoustics; Bayes methods; Speech processing; Optimization; Recommender systems
