Search Results - Inner Mongolia University Library

Deep Feature Blend Attention: A New Frontier in Super Resolution Image Generation

NEUROCOMPUTING, 2025, Vol. 618

Authors: Dhanusha, P. B.; Muthukumar, A.; Lakshmi, A. Affiliations: Kalasalingam Acad Res & Educ Dept Elect & Commun Engn Virudunagar Tamil Nadu India; Ramco Inst Technol Dept Elect & Commun Engn Rajapalayam Tamil Nadu India

Super Resolution (SR) images contain more useful information than Low Resolution (LR) images. SR images are generally preferred over LR images in the medical field because of their higher quality. However, SR images are affected by many factors, such as blur, noise, and decimation. Therefore, the present study proposes a Deep Feature Blend Attention Mechanism for generating SR images with optimal outcomes. In this study, a Deep Learning (DL) based encoder-decoder is used to extract detailed information from the LR image, assist in noise removal, and improve image quality. An LR image is produced using Gaussian blur and given as input to the encoder-decoder. The attention mechanism with feature blending is then applied to reconstruct the SR image from the LR image. The attention mechanism addresses the low- and high-frequency component issues in the feature map. The feature blend technique helps select and blend features to attain the optimal feature set. The study is performed with and without the feature blend technique in the attention mechanism to expose its impact; the technique is also used to avoid overfitting and produce optimal results. The performance of the proposed system is assessed using PSNR and SSIM, and the system is compared with other state-of-the-art studies to demonstrate its efficacy.

Keywords: Super resolution image generation encoder-decoder Feature blend Attention mechanism
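
The abstract above names PSNR as one of its evaluation metrics. As a reading aid, here is a minimal sketch of the standard PSNR definition, assuming 8-bit pixel intensities and images given as flat lists; the function name `psnr` and the `max_value` parameter are illustrative choices, not taken from the paper.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are flat sequences of pixel intensities (an illustrative
    simplification; real pipelines would use 2-D arrays).
    """
    if len(reference) != len(reconstructed) or not reference:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    high_res = [52, 55, 61, 59]        # made-up reference pixels
    super_res = [50, 56, 60, 60]       # made-up reconstructed pixels
    print(round(psnr(high_res, super_res), 2))
```

A higher value means the reconstruction is closer to the reference. SSIM, the other metric named in the abstract, compares local structure and is not covered by this sketch.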

Explicit estimation of magnitude and phase spectra in parallel for high-quality speech enhancement

NEURAL NETWORKS, 2025, Vol. 189, p. 107562

Authors: Lu, Ye-Xin; Ai, Yang; Ling, Zhen-Hua. Affiliation: Univ Sci & Technol China Natl Engn Res Ctr Speech & Language Informat Proc Hefei Peoples R China

Phase information has a significant impact on speech perceptual quality and intelligibility. However, existing speech enhancement methods encounter limitations in explicit phase estimation due to the non-structural nature and wrapping characteristics of the phase, leading to a bottleneck in enhanced speech quality. To overcome this issue, in this paper we propose MP-SENet, a novel Speech Enhancement Network that explicitly enhances Magnitude and Phase spectra in parallel. The proposed MP-SENet comprises a Transformer-embedded encoder-decoder architecture. The encoder aims to encode the input distorted magnitude and phase spectra into time-frequency representations, which are further fed into time-frequency Transformers for alternately capturing time and frequency dependencies. The decoder comprises a magnitude mask decoder and a phase decoder, directly enhancing magnitude and wrapped phase spectra by incorporating a magnitude masking architecture and a phase parallel estimation architecture, respectively. Multi-level loss functions explicitly defined on the magnitude spectra, wrapped phase spectra, and short-time complex spectra are adopted to jointly train the MP-SENet model. A metric discriminator is further employed to compensate for the incomplete correlation between these losses and human auditory perception. Experimental results demonstrate that the proposed MP-SENet achieves state-of-the-art performance across multiple speech enhancement tasks, including speech denoising, dereverberation, and bandwidth extension. Compared to existing phase-aware speech enhancement methods, it further mitigates the compensation effect between the magnitude and phase by explicit phase estimation, elevating the perceptual quality of enhanced speech. Remarkably, for the speech denoising task, the proposed MP-SENet yields a PESQ of 3.60 on the VoiceBank+DEMAND dataset and 3.62 on the DNS challenge dataset.

Keywords: Speech enhancement encoder-decoder Magnitude prediction Phase prediction Explicit estimation
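
The abstract above attributes the difficulty of phase estimation partly to phase wrapping. The sketch below illustrates that point only: it splits a complex spectral bin into magnitude and wrapped phase, and measures phase error on the circle instead of the real line. It is not the MP-SENet loss, and all function names are hypothetical.

```python
import cmath
import math


def to_magnitude_phase(z):
    """Split one complex spectral bin into (magnitude, wrapped phase)."""
    return abs(z), cmath.phase(z)  # cmath.phase returns a value in [-pi, pi]


def from_magnitude_phase(magnitude, phase):
    """Rebuild the complex bin from its magnitude and phase."""
    return cmath.rect(magnitude, phase)


def wrap_phase(phi):
    """Map any angle to its principal value in [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi


def wrapped_phase_error(estimated, target):
    """Absolute phase error measured on the circle, not the real line."""
    return abs(wrap_phase(estimated - target))


if __name__ == "__main__":
    # Two phases on opposite sides of the +/-pi boundary are close on the
    # circle even though their raw difference is almost 2*pi.
    a, b = math.pi - 0.1, -math.pi + 0.1
    print(abs(a - b), wrapped_phase_error(a, b))
```

The raw difference penalizes an estimate that is almost correct, which is one reason a loss on wrapped phase needs some form of wrap-aware distance.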

A Framework for Compound Interrupted Sampling Repeater Jamming Parameter Measurement

IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2025, Vol. 74

Authors: Lv, Qinzhe; Liu, Liyi; Wu, Yaojun; Peng, Mimi; Quan, Yinghui. Affiliations: Xidian Univ Sch Elect Engn Dept Remote Sensing Sci & Technol Key Lab Collaborat Intelligence Syst Minist Educ Xian 710071 Peoples R China; Xidian Univ Xian Key Lab Adv Remote Sensing Xian 710071 Peoples R China

Active compound jamming, particularly compound interrupted sampling repeater jamming (ISRJ), possesses excellent flexibility and jamming effectiveness, making it one of the major threats to radar systems. Accurately measuring the key parameters of each ISRJ component within the compound ISRJ can provide critical prior information for subsequent anti-jamming efforts. However, most of the existing ISRJ parameter measurement methods target a single ISRJ and lack in-depth research on the measurement of compound ISRJ parameters. Therefore, we propose a unified framework for compound ISRJ parameter measurement that contains a compound ISRJ separation network based on an encoder-decoder architecture and a parameter regression module for measuring the key parameters of each ISRJ component in the compound ISRJ. Experimental results indicate that the proposed framework achieves parameter measurement accuracies of over 89% and 85% for dual-compound and multicompound ISRJ, respectively, significantly outperforming existing methods.

Keywords: Compounds Jamming Training Time-domain analysis Feature extraction Time measurement Analytical models Repeaters Kernel Decoding Channel attention compound interrupted sampling repeater jamming (ISRJ) encoder-decoder jamming separation parameter measurement

Devising single in-out long short-term memory univariate models for predicting the electricity price on the day-ahead markets

CONNECTION SCIENCE, 2024, Vol. 36, No. 1

Authors: Bara, Adela; Oprea, Simona Vasilica. Affiliation: Bucharest Univ Econ Studies Dept Econ Informat & Cybernet Bucharest Romania

We investigate the performance of intelligent systems such as various Long Short-Term Memory (LSTM) and hybrid models to forecast electricity spot prices, considering univariate and multivariate models. Six models are created to handle the Electricity Price Forecast (EPF). Furthermore, an EPF methodology that consists of an LSTM univariate model, namely the Single in-out (Sio) model, is proposed. It builds on the Day-Ahead electricity Market (DAM) specificity and, as a novelty, it inserts the predicted value back into the sliding input vector to predict the next values until the entire vector of 24 prices is predicted. The proposed model is further enhanced by the convolutional reading of input data that is embedded into the LSTM cell or by a hybrid combination of LSTM and Convolutional Neural Networks (CNN) that interprets sub-sequences of input data and extracts features that are provided as a sequence to the LSTM model. The methodology is validated using data sets from the Romanian Market Operator (OPCOM) and other market operators from Serbia (SEEPEX), Hungary (HUPX) and Bulgaria (IBEX). Our models improve the results for the day-ahead forecast in comparison with other models by 21.02% in terms of Mean Absolute Error (MAE).

Keywords: Univariate and multivariate input encoder-decoder convolutional neural networks long short-term memory electricity price day-ahead forecast
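
The abstract above describes feeding each predicted value back into a sliding input vector until all 24 day-ahead prices are produced. A minimal sketch of that recursive loop follows. The one-step predictor is a placeholder callable standing in for the trained LSTM, and the window length of 24 is an assumption; the abstract only fixes the 24-price output.

```python
def forecast_day_ahead(history, predict_next, window=24, horizon=24):
    """Recursively forecast `horizon` prices with a one-step-ahead model.

    `predict_next` is any callable mapping a window of past prices to a
    single next price; it stands in for the trained model (hypothetical).
    """
    if len(history) < window:
        raise ValueError("need at least `window` past prices")
    buffer = list(history[-window:])  # sliding input vector
    forecast = []
    for _ in range(horizon):
        next_price = predict_next(buffer)
        forecast.append(next_price)
        # Slide the window: drop the oldest value, append the prediction.
        buffer = buffer[1:] + [next_price]
    return forecast


def mean_model(window_values):
    """Placeholder predictor: average of the current window (not an LSTM)."""
    return sum(window_values) / len(window_values)


if __name__ == "__main__":
    past_prices = [40.0 + (hour % 6) for hour in range(48)]  # made-up data
    print(forecast_day_ahead(past_prices, mean_model))
```

Because later steps consume earlier predictions, errors can accumulate across the 24 hours, which is the usual trade-off of this kind of recursive scheme.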

Image Captioning- Bangladesh's Heritage Perspective Using Deep Learning

IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS)

Authors: Alam, Sarowar; Islam, Khalidul; Sharmila, Nishat; Sovon, Ziaur Rahman; Rahman, Rashedur M. Affiliation: North South Univ Dept Elect & Comp Engn Plot 15 Block B Dhaka 1229 Bangladesh

ISBN: (print) 9781665486842

Image captioning aims to produce a short textual description of a given image. Although it appears to be a straightforward task for a human being, it is difficult for computers because it requires the ability to analyze the image and provide a human-like description. Encoder-decoder architectures have recently achieved advanced results in image captioning. With some existing datasets, e.g., Flickr_data, Flickr8k_***, and a heritage dataset, we build a model that can create captions for images related to Bangladeshi culture, tradition and historical places. Bangladesh is enriched with great culture; many heritage places and cultural programs attract travelers to visit our country. We try to relate our culture, places, and food with machine learning techniques and to promote our cultural strengths through appropriate captioning. Our image captioning tool can be very helpful for travel lovers who want to know more about Bangladesh.

Keywords: encoder-decoder datasets LSTM CNN RNNs ResNet-50 tensorflow & keras RNN

MULTI-SCALE REFINEMENT NETWORK BASED ACOUSTIC ECHO CANCELLATION

47th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Authors: Cui, Fan; Guo, Liyong; Li, Wenfeng; Gao, Peng; Wang, Yujun. Affiliation: Xiaomi Inc Beijing Peoples R China

ISBN: (print) 9781665405409

Recently, deep encoder-decoder networks have shown outstanding performance in acoustic echo cancellation (AEC). However, subsampling operations such as convolution striding in the encoder layers significantly decrease the feature resolution, leading to fine-grained information loss. This paper proposes an encoder-decoder network for acoustic echo cancellation with multi-scale refinement paths to exploit the information at different feature scales. In the encoder stage, high-level features are obtained to get a coarse result. Then, the decoder layers with multiple refinement paths can directly refine the result with fine-grained features. Refinement paths with different feature scales are combined by learnable weights. The experimental results show that using the proposed multi-scale refinement structure can significantly improve the objective criteria. In the ICASSP 2022 Acoustic Echo Cancellation Challenge, our submitted system achieves an overall MOS score of 4.439 with 4.37 million parameters at a system latency of 40 ms.

Keywords: acoustic echo cancellation encoder-decoder multi-scale
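
The abstract above states that refinement paths at different feature scales are combined by learnable weights, without saying how. One common way to do this is a softmax-normalized weighted sum, sketched below. The softmax normalization, the function names, and the assumption that all path outputs have already been brought to the same length are mine, not the paper's.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw weights."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_paths(path_outputs, logits):
    """Weighted sum of same-length feature vectors, one per refinement path.

    `logits` plays the role of the learnable weights; in a trained network
    they would be parameters updated by gradient descent (assumption).
    """
    if len(path_outputs) != len(logits):
        raise ValueError("need exactly one weight per path")
    weights = softmax(logits)
    length = len(path_outputs[0])
    return [
        sum(w * path[i] for w, path in zip(weights, path_outputs))
        for i in range(length)
    ]


if __name__ == "__main__":
    coarse = [1.0, 2.0]   # made-up features from a low-resolution path
    fine = [3.0, 4.0]     # made-up features from a high-resolution path
    print(combine_paths([coarse, fine], [0.0, 0.0]))
```

With equal raw weights the result is the plain average of the paths; training would shift the weights toward whichever scale helps most.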

Weakly Supervised Semantic Segmentation of Echocardiography Videos via Multi-level Features Selection

5th Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Authors: Chen, Erna; Cai, Zemin; Lai, Jian-huang. Affiliations: Shantou Univ Dept Elect Engn Shantou 515063 Guangdong Peoples R China; Sun Yat Sen Univ Sch Data & Comp Sci Guangzhou 510006 Peoples R China

ISBN: (print) 9783031189098; 9783031189104

Echocardiography is capable of assessing the global and regional functions of the heart. With the obvious benefits of being non-invasive, visual and mobile, it has become an indispensable technology for the clinical evaluation of cardiac function. However, measurement uncertainty in ultrasonic equipment and inter-reader variability are inevitable. In view of this, researchers have proposed many deep learning based methods for cardiac function assessment. In this paper, we propose UDeep, an encoder-decoder model for left ventricular segmentation of echocardiography, which pays attention to both multi-scale high-level semantic information and multi-scale low-level fine-grained information. Our model maintains sensitivity to semantic edges so as to accurately segment the left ventricle. The encoder extracts multi-scale high-level semantic features through a computationally efficient backbone named Separated Xception and the Atrous Spatial Pyramid Pooling module. At the same time, a new decoder module consisting of several Upsampling Fusion Modules (UPFMs) is applied to fuse features of different levels. To improve the generalization of our model to different echocardiography images, we propose a Pseudo-Segmentation Penalty loss function. Our model accurately segments the left ventricle with a Dice Similarity Coefficient of 0.9290 on the test set of an echocardiography video dataset.

Keywords: Echocardiography Left ventricle Semantic segmentation encoder-decoder Pseudo-Segmentation Penalty Loss Function
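
The abstract above reports its result as a Dice Similarity Coefficient. For reference, a minimal sketch of the standard definition, 2|A ∩ B| / (|A| + |B|), on binary masks follows. Representing masks as flat lists of 0/1 labels and returning 1.0 when both masks are empty are illustrative conventions, not details from the paper.

```python
def dice_coefficient(predicted, target):
    """Dice Similarity Coefficient between two binary masks.

    Masks are flat sequences of 0/1 labels (an illustrative simplification).
    """
    if len(predicted) != len(target):
        raise ValueError("masks must be the same size")
    # Pixels labelled 1 in both masks.
    intersection = sum(p * t for p, t in zip(predicted, target))
    total = sum(predicted) + sum(target)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * intersection / total


if __name__ == "__main__":
    prediction = [1, 1, 0, 0]    # made-up predicted mask
    ground_truth = [1, 0, 0, 0]  # made-up reference mask
    print(dice_coefficient(prediction, ground_truth))
```

The score ranges from 0 (no overlap) to 1 (identical masks).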

Spatio-temporal Super-resolution Network: Enhance Visual Representations for Video Captioning

IEEE International Symposium on Circuits and Systems (ISCAS)

Authors: Cao, Quanhui; Tang, Pengjie; Wang, Hanli. Affiliations: Tongji Univ Dept Comp Sci & Technol Shanghai Peoples R China; Jinggangshan Univ Coll Elect & Informat Engn Jian Jiangxi Peoples R China; Tongji Univ Key Lab Embedded Syst & Serv Comp Minist Educ Shanghai Peoples R China; Frontiers Sci Ctr Intelligent Autonomous Syst Shanghai Peoples R China

ISBN: (print) 9781665484855

Video captioning is a sequence-to-sequence task of automatically generating descriptions for given videos. Due to the diversity of video scenes, learning rich representations is critical for video captioning. However, previous works mainly exploited elaborate features but neglected the loss of information caused by frame sampling and image compression. In this paper, we propose a novel spatio-temporal super-resolution (STSR) network which is jointly trained for the video captioning task and the video super-resolution task in an end-to-end fashion. Specifically, a video super-resolution task consists of two subtasks: spatial super-resolution restores high-resolution image features while temporal super-resolution reconstructs missing frame features between two adjacent sampled frames. By sharing multi-modal encoders across both of these two tasks, STSR encourages encoders to capture salient visual contents and learn context-aware representations. Experiments on two benchmark datasets demonstrate that the proposed STSR boosts video captioning performances significantly and outperforms most state-of-the-art approaches.

Keywords: video captioning video super-resolution encoder-decoder

Automatic facial expressions, gaze direction and head movements generation of a virtual agent

24th ACM International Conference on Multimodal Interaction (ICMI)

Authors: Delbosc, Alice; Ochs, Magalie; Ayache, Stephane. Affiliation: Aix Marseille Univ Marseille France

ISBN: (print) 9781450393898

In this article, we present two models to jointly and automatically generate the head, facial and gaze movements of a virtual agent from acoustic speech features. Two architectures are explored: a Generative Adversarial Network and an Adversarial encoder-decoder. Head movements and gaze orientation are generated as 3D coordinates, while facial expressions are generated using action units based on the facial action coding system. A large corpus of almost 4 hours of videos, involving 89 different speakers, is used to train our models. We extract the speech and visual features automatically from these videos using existing tools. The evaluation of these models is conducted objectively with measures such as density evaluation and a visualisation from PCA reduction, as well as subjectively through a user perception study. Our results show that on 15-second sequences, the encoder-decoder architecture drastically improves the perception of generated behaviours on two criteria: coordination with speech and naturalness. Our code can be found at: https://***/aldelb/non-verbal-behaviours-generation.

Keywords: Non-verbal behaviour behaviour generation embodied conversational agent neural networks adversarial learning encoder-decoder

Enhancing Indic Handwritten Text Recognition Using Global Semantic Information

18th International Conference on Frontiers in Handwriting Recognition (ICFHR)

Authors: Mondal, Ajoy; Jawahar, C. V. Affiliation: Int Inst Informat Technol Hyderabad India

ISBN: (digital) 9783031216480

ISBN: (print) 9783031216473; 9783031216480

Handwritten Text Recognition (HTR) is more interesting and challenging than printed text recognition due to uneven variations in the handwriting style of the writers, content, and time. HTR becomes more challenging for the Indic languages because (i) multiple characters combine to form conjuncts, which increases the number of characters of the respective languages, and (ii) there are nearly 100 unique basic Unicode characters in each Indic script. Recently, many recognition methods based on the encoder-decoder framework have been proposed to handle such problems. They still face many challenges, such as image blur and incomplete characters due to varying writing styles and ink density. We argue that most encoder-decoder methods are based on local visual features without explicit global semantic information. In this work, we enhance the performance of Indic handwritten text recognizers using global semantic information. We use a semantic module in an encoder-decoder framework for extracting global semantic information to recognize Indic handwritten texts. The semantic information is used in both the encoder for supervision and the decoder for initialization. The semantic information is predicted from the word embedding of a pre-trained language model. Extensive experiments demonstrate that the proposed framework achieves state-of-the-art results on handwritten texts of ten Indic languages.

Keywords: Indic handwritten text encoder-decoder Global semantic information Word embedding Language model Indic language
