Super Resolution (SR) images contain more useful information than Low Resolution (LR) images, and they are generally preferred over LR images in the medical field because of their high quality. SR image quality is affected by many factors, such as blur, noise, and decimation. Therefore, the present study proposes a Deep Feature Blend Attention Mechanism for generating SR images with optimal outcomes. In this study, a Deep Learning (DL) based encoder-decoder is used to extract detailed information from the LR image, assist in noise removal, and improve image quality. An LR image is produced using Gaussian blur and is given as input to the encoder-decoder. An attention mechanism with feature blending is then applied to produce the reconstructed SR image. The attention mechanism improves the handling of low- and high-frequency components in the feature map. The feature blend technique helps select the optimal features and blends them to attain an optimal feature set. The study is performed with and without the feature blend technique in the attention mechanism to expose its impact; the technique is also used to avoid overfitting and produce optimal results. The performance of the proposed system is assessed using PSNR and SSIM, and the system is compared with other state-of-the-art studies to demonstrate its efficacy.
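PSNR, one of the two metrics named in this abstract, can be illustrated with a minimal sketch. The function name `psnr`, the nested-list image representation, and the default peak value of 255 (8-bit images) are assumptions made for illustration; the abstract does not describe the authors' evaluation code.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images.

    Images are given as nested lists of pixel intensities (an assumption
    made here so the sketch has no third-party dependencies).
    """
    ref = [p for row in reference for p in row]
    rec = [p for row in reconstructed for p in row]
    if len(ref) != len(rec):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the reconstructed image is closer to the reference, so an SR method is typically scored by computing `psnr(ground_truth, sr_output)` over a test set and averaging.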
Phase information has a significant impact on speech perceptual quality and intelligibility. However, existing speech enhancement methods encounter limitations in explicit phase estimation due to the non-structural nature and wrapping characteristics of the phase, leading to a bottleneck in enhanced speech quality. To overcome this issue, we propose MP-SENet, a novel Speech Enhancement Network that explicitly enhances Magnitude and Phase spectra in parallel. The proposed MP-SENet comprises a Transformer-embedded encoder-decoder architecture. The encoder encodes the input distorted magnitude and phase spectra into time-frequency representations, which are further fed into time-frequency Transformers that alternately capture time and frequency dependencies. The decoder comprises a magnitude mask decoder and a phase decoder, which directly enhance the magnitude and wrapped phase spectra by incorporating a magnitude masking architecture and a phase parallel estimation architecture, respectively. Multi-level loss functions explicitly defined on the magnitude spectra, wrapped phase spectra, and short-time complex spectra are adopted to jointly train the MP-SENet model. A metric discriminator is further employed to compensate for the incomplete correlation between these losses and human auditory perception. Experimental results demonstrate that the proposed MP-SENet achieves state-of-the-art performance across multiple speech enhancement tasks, including speech denoising, dereverberation, and bandwidth extension. Compared to existing phase-aware speech enhancement methods, it further mitigates the compensation effect between the magnitude and phase by explicit phase estimation, elevating the perceptual quality of enhanced speech. Remarkably, for the speech denoising task, the proposed MP-SENet yields a PESQ of 3.60 on the VoiceBank+DEMAND dataset and 3.62 on the DNS challenge dataset.
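The wrapping characteristic of phase mentioned in this abstract can be made concrete with a small sketch. It operates on a single complex spectral bin and shows why a naive difference between two wrapped phases is misleading near ±π. The helper names `to_mag_phase`, `wrap`, and `apply_mask` are hypothetical and illustrate the general idea only; they are not the MP-SENet loss functions or decoders.

```python
import cmath
import math


def to_mag_phase(z):
    """Split a complex spectral bin into magnitude and wrapped phase."""
    return abs(z), cmath.phase(z)  # cmath.phase returns a value in [-pi, pi]


def wrap(angle):
    """Map an arbitrary angle (radians) to its principal value near zero."""
    return angle - 2.0 * math.pi * round(angle / (2.0 * math.pi))


def apply_mask(noisy_magnitude, mask, phase):
    """Rebuild a complex bin from a masked magnitude and a phase estimate."""
    return cmath.rect(noisy_magnitude * mask, phase)


# Two phases that are almost identical on the unit circle.
naive_error = 3.1 - (-3.1)      # 6.2 rad, looks like a large error
wrapped_error = wrap(3.1 - (-3.1))  # small once the 2*pi ambiguity is removed
```

Because of this 2π ambiguity, a loss defined directly on raw phase differences would penalize estimates that are in fact nearly correct, which is one reason phase is harder to estimate explicitly than magnitude.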
Active compound jamming, particularly compound interrupted sampling repeater jamming (ISRJ), possesses excellent flexibility and jamming effectiveness, making it one of the major threats to radar systems. Accurately measuring the key parameters of each ISRJ component within the compound ISRJ can provide critical prior information for subsequent anti-jamming efforts. However, most of the existing ISRJ parameter measurement methods target a single ISRJ and lack in-depth research on the measurement of compound ISRJ parameters. Therefore, we propose a unified framework for compound ISRJ parameter measurement that contains a compound ISRJ separation network based on an encoder-decoder architecture and a parameter regression module for measuring the key parameters of each ISRJ component in the compound ISRJ. Experimental results indicate that the proposed framework achieves parameter measurement accuracies of over 89% and 85% for dual-compound and multicompound ISRJ, respectively, significantly outperforming existing methods.
We investigate the performance of intelligent systems, such as various Long Short-Term Memory (LSTM) and hybrid models, in forecasting electricity spot prices with univariate and multivariate models. Six models are created to handle the Electricity Price Forecast (EPF). Furthermore, we propose an EPF methodology built around an LSTM univariate model, the Single in-out (Sio) model. It builds on the specificity of the Day-Ahead electricity Market (DAM) and, as a novelty, inserts each predicted value back into the sliding input vector to predict the next values until the entire vector of 24 prices is predicted. The proposed model is further enhanced either by a convolutional reading of the input data that is embedded into the LSTM cell, or by a hybrid combination of LSTM and Convolutional Neural Networks (CNN) that interprets sub-sequences of input data and extracts features that are provided as a sequence to the LSTM model. The methodology is validated using data sets from the Romanian Market Operator (OPCOM) and other market operators from Serbia (SEEPEX), Hungary (HUPX) and Bulgaria (IBEX). Our models improve the results for the day-ahead forecast in comparison with other models by 21.02% in terms of Mean Absolute Error (MAE).
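The feedback loop described in this abstract, where each predicted price is pushed back into the sliding input vector until all 24 values are produced, can be sketched as follows. Here `predict_next` is a hypothetical stand-in for a trained one-step model such as the LSTM, and the window length of 24 is an assumption; the abstract does not state the input size.

```python
def recursive_forecast(history, predict_next, window=24, horizon=24):
    """Roll a one-step predictor forward, feeding each prediction back in.

    history      -- past prices, oldest first
    predict_next -- callable mapping a list of `window` values to one value
                    (hypothetical placeholder for the trained model)
    """
    if len(history) < window:
        raise ValueError("history is shorter than the input window")
    buffer = list(history[-window:])
    forecasts = []
    for _ in range(horizon):
        next_value = predict_next(buffer)
        forecasts.append(next_value)
        # Slide the window: drop the oldest value, append the prediction.
        buffer = buffer[1:] + [next_value]
    return forecasts


def mean_absolute_error(actual, predicted):
    """MAE, the metric the abstract uses for comparison."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

A known trade-off of this recursive scheme is that prediction errors can accumulate over the horizon, since later steps rely on earlier predictions rather than observed prices.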
ISBN:
(print) 9781665486842
Image captioning aims to produce a short textual description of a given image. Although it appears to be a straightforward task for humans, it is difficult for computers because it requires the ability to analyze the image and provide a human-like description. Encoder-decoder architectures have recently achieved strong results in image captioning. Using existing datasets, e.g., Flickr_data, Flickr8k_***, and a heritage dataset, we build a model that can create captions for images related to Bangladeshi culture, tradition, and historical places. Bangladesh is rich in culture, with many heritage places and cultural programs that attract travelers to the country. We try to connect our culture, places, and food with machine learning techniques and to promote our cultural strengths through appropriate captioning. Our image captioning tool can be very helpful for travel lovers who want to know more about Bangladesh.
ISBN:
(print) 9781665405409
Recently, deep encoder-decoder networks have shown outstanding performance in acoustic echo cancellation (AEC). However, subsampling operations such as convolution striding in the encoder layers significantly decrease the feature resolution, leading to a loss of fine-grained information. This paper proposes an encoder-decoder network for acoustic echo cancellation with multi-scale refinement paths to exploit the information at different feature scales. In the encoder stage, high-level features are obtained to get a coarse result. Then, the decoder layers with multiple refinement paths directly refine the result with fine-grained features. Refinement paths with different feature scales are combined by learnable weights. The experimental results show that the proposed multi-scale refinement structure significantly improves the objective criteria. In the ICASSP 2022 Acoustic Echo Cancellation Challenge, our submitted system achieves an overall MOS score of 4.439 with 4.37 million parameters at a system latency of 40 ms.
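Combining refinement paths by learnable weights can be sketched as a weighted sum of equally shaped feature vectors. Normalizing the weights with a softmax is an assumption made here for illustration (the abstract does not say how the weights are constrained), and `combine_paths` is a hypothetical name.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw weights."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_paths(paths, logits):
    """Blend features from several refinement paths with normalized weights.

    paths  -- list of feature vectors, one per path, all the same length
              (paths at other scales are assumed to be resampled beforehand)
    logits -- one raw weight per path; in training these would be parameters
    """
    if len(paths) != len(logits):
        raise ValueError("need exactly one weight per path")
    length = len(paths[0])
    if any(len(p) != length for p in paths):
        raise ValueError("all paths must have the same feature length")
    weights = softmax(logits)
    return [sum(w * p[i] for w, p in zip(weights, paths)) for i in range(length)]
```

With equal logits every path contributes equally; during training, gradient updates to the logits would let the network emphasize whichever scale carries the most useful detail.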
ISBN:
(print) 9783031189098;9783031189104
Echocardiography is capable of assessing the global and regional functions of the heart. With the obvious benefits of non-invasiveness, visualization, and mobility, it has become an indispensable technology for the clinical evaluation of cardiac function. However, measurement uncertainty in ultrasonic equipment and inter-reader variability are inevitable. In response, researchers have proposed many deep learning based methods for cardiac function assessment. In this paper, we propose UDeep, an encoder-decoder model for left ventricular segmentation in echocardiography, which attends to both multi-scale high-level semantic information and multi-scale low-level fine-grained information. Our model maintains sensitivity to semantic edges so as to accurately segment the left ventricle. The encoder extracts multi-scale high-level semantic features through a computationally efficient backbone named Separated Xception and the Atrous Spatial Pyramid Pooling module. At the same time, a new decoder module consisting of several Upsampling Fusion Modules (UPFMs) is applied to fuse features of different levels. To improve the generalization of our model to different echocardiography images, we propose the Pseudo-Segmentation Penalty loss function. Our model accurately segments the left ventricle with a Dice Similarity Coefficient of 0.9290 on the test set of an echocardiography video dataset.
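The Dice Similarity Coefficient reported in this abstract measures the overlap between a predicted mask and the ground truth as 2|A∩B| / (|A| + |B|). The sketch below uses binary masks stored as nested lists; the function name and the convention of returning 1.0 when both masks are empty are assumptions for illustration.

```python
def dice_coefficient(predicted, target):
    """Dice Similarity Coefficient between two binary masks of equal size.

    Masks are nested lists where any truthy value marks a foreground pixel.
    """
    pred = [bool(v) for row in predicted for v in row]
    targ = [bool(v) for row in target for v in row]
    if len(pred) != len(targ):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, targ) if p and t)
    total = sum(pred) + sum(targ)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total
```

A value of 1.0 means the two masks coincide exactly and 0.0 means they share no foreground pixels, so a score of 0.9290 indicates a close match to the annotated left ventricle.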
ISBN:
(print) 9781665484855
Video captioning is a sequence-to-sequence task of automatically generating descriptions for given videos. Due to the diversity of video scenes, learning rich representations is critical for video captioning. However, previous works mainly exploited elaborate features but neglected the loss of information caused by frame sampling and image compression. In this paper, we propose a novel spatio-temporal super-resolution (STSR) network which is jointly trained for the video captioning task and the video super-resolution task in an end-to-end fashion. Specifically, the video super-resolution task consists of two subtasks: spatial super-resolution restores high-resolution image features, while temporal super-resolution reconstructs missing frame features between two adjacent sampled frames. By sharing multi-modal encoders across both tasks, STSR encourages the encoders to capture salient visual content and learn context-aware representations. Experiments on two benchmark datasets demonstrate that the proposed STSR boosts video captioning performance significantly and outperforms most state-of-the-art approaches.
ISBN:
(print) 9781450393898
In this article, we present two models to jointly and automatically generate the head, facial and gaze movements of a virtual agent from acoustic speech features. Two architectures are explored: a Generative Adversarial Network and an Adversarial encoder-decoder. Head movements and gaze orientation are generated as 3D coordinates, while facial expressions are generated using action units based on the facial action coding system. A large corpus of almost 4 hours of videos involving 89 different speakers is used to train our models. We extract the speech and visual features automatically from these videos using existing tools. The models are evaluated objectively, with measures such as density evaluation and a visualisation from PCA reduction, as well as subjectively, through a user perception study. Our results show that on 15-second sequences, the encoder-decoder architecture drastically improves the perception of generated behaviours on two criteria: coordination with speech and naturalness. Our code can be found at: https://***/aldelb/non-verbal-behaviours-generation.
ISBN:
(digital) 9783031216480
ISBN:
(print) 9783031216473;9783031216480
Handwritten Text Recognition (HTR) is more interesting and challenging than printed text recognition due to uneven variations in the handwriting style of the writers, content, and time. HTR becomes more challenging for the Indic languages because (i) multiple characters combine to form conjuncts, which increases the number of characters in the respective languages, and (ii) each Indic script has close to 100 unique basic Unicode characters. Recently, many recognition methods based on the encoder-decoder framework have been proposed to handle such problems. They still face many challenges, such as image blur and incomplete characters due to varying writing styles and ink density. We argue that most encoder-decoder methods are based on local visual features without explicit global semantic information. In this work, we enhance the performance of Indic handwritten text recognizers using global semantic information. We use a semantic module in an encoder-decoder framework to extract global semantic information for recognizing Indic handwritten texts. The semantic information is used in both the encoder for supervision and the decoder for initialization. The semantic information is predicted from the word embedding of a pre-trained language model. Extensive experiments demonstrate that the proposed framework achieves state-of-the-art results on handwritten texts of ten Indic languages.