With the rise of deep learning, numerous methods based on convolutional neural networks have emerged in various fields of image restoration. Neural networks and recent advancements like transformers have performed exc...
Neural surveillance video compression methods have demonstrated significant improvements over traditional video compression techniques. In current surveillance video compression frameworks, the first frame in a Group of Pictures (GOP) is usually compressed fully as an I frame, and the subsequent P frames are compressed by referencing this I frame in Low Delay P (LDP) encoding mode. However, this compression approach overlooks background information, which limits its adaptability to different scenarios. In this paper, we propose a novel Adaptive Surveillance Video Compression framework based on a background hyperprior, dubbed ASVC. The background hyperprior serves as side information to assist coding in both the temporal and spatial domains. Our method consists of two main components. First, the background information of a GOP is extracted, modeled as a hyperprior, and compressed with existing methods. This hyperprior is then used as side information to compress both I frames and P frames. ASVC effectively captures the temporal dependencies in the latent representations of surveillance videos by leveraging the background hyperprior for auxiliary video encoding. Experimental results demonstrate that applying ASVC to both traditional and learning-based methods significantly improves performance.
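The abstract does not specify how the background hyperprior is obtained. As a hedged illustration, the sketch below uses a per-pixel temporal median over a GOP as the background model and forms a foreground residual that a codec could encode with the background as side information; the function names and the median-based model are assumptions, not the ASVC implementation.

```python
# Minimal sketch of one plausible background-extraction step for a GOP, using a
# per-pixel temporal median as the background estimate. Illustrative assumption
# only; the paper's hyperprior modeling and compression are not reproduced here.
import numpy as np

def extract_background(gop_frames: np.ndarray) -> np.ndarray:
    """gop_frames: (T, H, W, C) uint8 frames of one Group of Pictures."""
    # The temporal median suppresses moving foreground objects and keeps the
    # static surveillance background.
    return np.median(gop_frames.astype(np.float32), axis=0).astype(np.uint8)

def residual_against_background(frame: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Foreground residual that a codec could encode with the background as side information."""
    return frame.astype(np.int16) - background.astype(np.int16)

if __name__ == "__main__":
    gop = np.random.randint(0, 256, size=(16, 128, 128, 3), dtype=np.uint8)  # stand-in GOP
    bg = extract_background(gop)
    res = residual_against_background(gop[0], bg)
    print(bg.shape, res.dtype)
```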
Most existing infrared image enhancement algorithms focus on detail and contrast enhancement of ordinary infrared images; when applied to low-light infrared images, detail and target texture are often severely lost. The reason is that most algorithms process images at a single scale and have difficulty coping with the degradation of image features while enhancing brightness. To solve this problem, we propose a multi-layer and multi-scale feature fusion network (MMFF-Net). It improves the brightness of low-light infrared images in the absence of normal-light reference samples while keeping the image details consistent with the source image. Features at different layers of the image are extracted using an adaptively modified deep network. A multi-scale adaptive feature fusion module (MAFFM) is designed to preserve and fuse multi-scale information from different convolutional layer features. The fused features are passed to an iterative enhancement function as pixel-wise parameters for image brightness enhancement. We also propose a local feature fusion module (LFFM), which reconstructs images after fusing multiple features, including the brightness-enhanced image and the source image. Finally, to train the whole network, a set of loss functions is carefully designed. Extensive experiments show that the proposed algorithm effectively enhances low-light infrared images and performs well in both subjective visual and quantitative evaluations compared with existing methods.
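The iterative, pixel-wise enhancement described above resembles curve-estimation enhancers. The sketch below assumes a quadratic light-enhancement curve driven by per-pixel parameters predicted from the fused features; this formulation is an assumption for illustration, not the exact MMFF-Net iteration.

```python
# Hedged sketch of an iterative, pixel-wise brightness enhancement step.
# The quadratic curve x <- x + a * x * (1 - x) is a common curve-estimation
# formulation and only an assumption about how fused features could act as
# pixel-wise parameters.
import torch

def iterative_enhance(image: torch.Tensor, curve_params: torch.Tensor, n_iter: int = 8) -> torch.Tensor:
    """
    image:        (B, 1, H, W) low-light infrared image scaled to [0, 1]
    curve_params: (B, n_iter, H, W) pixel-wise parameters predicted from fused features
    """
    x = image
    for i in range(n_iter):
        a = curve_params[:, i:i + 1]          # (B, 1, H, W) parameters for this iteration
        x = x + a * x * (1.0 - x)             # brightens dark pixels more than bright ones
    return x.clamp(0.0, 1.0)

if __name__ == "__main__":
    img = torch.rand(2, 1, 64, 64)
    params = torch.rand(2, 8, 64, 64) * 0.5
    out = iterative_enhance(img, params)
    print(out.shape, out.min().item(), out.max().item())
```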
In this letter, an improved gated linear unit (GLU) structure for end-to-end (E2E) speech enhancement is proposed. In the U-Net structure, which is widely used as the foundational architecture for E2E deep neural network-based speech denoising, the input noisy speech signal undergoes multiple layers of encoding and is compressed into an essential latent representation at the bottleneck. This latent information is then passed to the decoder stage to restore the target clean speech. Among these approaches, CleanUNet, a prominent state-of-the-art (SOTA) method, enhances temporal attention in the latent space by employing multi-head self-attention. In contrast to applying the attention mechanism only to the compressed representation at the bottleneck layer, the proposed method assigns an attention module to the GLU of each encoder/decoder block. The proposed method is validated by measuring short-term objective speech intelligibility and sound quality. The objective evaluation results indicate that the proposed residual-attention GLU outperforms existing SOTA models such as FAIR-denoiser and CleanUNet across signal-to-noise ratios ranging from 0 to 15 dB. The conventional GLU uses half of the signal as a gating signal, applying it to the main signal portion corresponding to the speech feature map of the other half. In contrast, the proposed residual-attention GLU employs a residual-attention network to improve the channel and temporal context within the signal, enhancing the noise-robust feature map in the main signal part.
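As a rough illustration of placing attention inside the GLU of each encoder/decoder block, the PyTorch sketch below refines the gating branch with residual channel and temporal attention before the sigmoid gate. The specific attention design (squeeze-excitation plus a depthwise temporal convolution) is an assumption, not the letter's exact residual-attention GLU.

```python
# Minimal PyTorch sketch of a GLU whose gating branch is refined by a residual
# attention block; the attention design is an illustrative assumption.
import torch
import torch.nn as nn

class ResidualAttentionGLU(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size=1)  # main / gate split
        self.channel_attn = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),
            nn.Conv1d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv1d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
        self.temporal_attn = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=7, padding=3, groups=channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        main, gate = self.conv(x).chunk(2, dim=1)
        # Residual attention on the gating signal: refine channel and temporal context.
        gate = gate + gate * self.channel_attn(gate) + gate * self.temporal_attn(gate)
        return main * torch.sigmoid(gate)

if __name__ == "__main__":
    block = ResidualAttentionGLU(channels=32)
    y = block(torch.randn(2, 32, 16000))   # (batch, channels, time)
    print(y.shape)
```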
The categories of diabetic retinopathy (DR) are interrelated, and different ophthalmologists often give different results for the same fundus image. Automatic cross-image retrieval of DR can provide an effective diagnostic solution for ophthalmologists and is of great significance in clinical practice. Cross-image information (i.e., the left and right fundus images of a patient) is highly correlated and complementary and can be harnessed to improve various computer vision tasks such as image classification, object detection, image segmentation, and image retrieval. Previous studies did not explore the correlation between lesion areas in the left and right fundus images of patients, limiting the effective diagnosis of DR. In this study, we propose a cross-image siamese graph convolutional network (CIS-GCN) to retrieve fine-grained diabetic retinopathy fundus images. First, we construct a global-specific structure to obtain the specific features of the left and right eyes. Then, we pass the specific features through a pathological localization network to obtain the location features of the lesions. Finally, a graph convolutional neural network is introduced to construct node sets for the left and right eyes, representing relatively consistent regions in the patient's fundus images and learning their correlations. We tested our method on the Diabetic Retinopathy Detection datasets, and the results show that our algorithm outperforms other state-of-the-art methods by 2.2% to 3.7% in image retrieval.
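To make the siamese-plus-graph idea concrete, the sketch below shares one backbone between the left and right fundus images, treats spatial grid cells as graph nodes, and applies a single graph-convolution step over a given adjacency. The node construction and adjacency are illustrative assumptions rather than the CIS-GCN architecture.

```python
# Conceptual sketch: shared (siamese) encoder feeding one graph-convolution
# layer over left/right fundus-image nodes; all sizes are toy values.
import torch
import torch.nn as nn

class SiameseGraphLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.backbone = nn.Sequential(          # shared weights for both eyes
            nn.Conv2d(3, in_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),            # 4x4 grid -> 16 nodes per eye
        )
        self.weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, left: torch.Tensor, right: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        nodes = []
        for eye in (left, right):
            f = self.backbone(eye)                          # (B, C, 4, 4)
            nodes.append(f.flatten(2).transpose(1, 2))      # (B, 16, C)
        h = torch.cat(nodes, dim=1)                         # (B, 32, C): left + right node sets
        # One GCN propagation step over the (row-normalized) adjacency.
        h = torch.relu(adj @ self.weight(h))
        return h.mean(dim=1)                                # graph-level embedding for retrieval

if __name__ == "__main__":
    layer = SiameseGraphLayer(in_dim=32, out_dim=64)
    adj = torch.full((32, 32), 1.0 / 32)                    # toy fully connected, normalized adjacency
    emb = layer(torch.randn(2, 3, 128, 128), torch.randn(2, 3, 128, 128), adj)
    print(emb.shape)                                        # (2, 64)
```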
Pneumonia is a common and sometimes fatal lung infection that continues to be a major global health concern. The prediction of pneumonia has become a crucial factor in saving people's lives and improving their quality of life. For this purpose, traditional clinical procedures are considered time-consuming. In addition, researchers have used various algorithms to forecast pneumonia owing to advances in image processing techniques. However, these algorithms have proven ineffective in terms of feature extraction, which negatively impacts prediction rates. This research aims to predict pneumonia worldwide and address the problem of low accuracy. This work introduces a novel method for pneumonia prediction using a deep convolutional neural network (CNN) with an InceptionV3 model for feature extraction. Additionally, it introduces an entropy-normalized Neighbourhood Component Analysis (NCA) technique, complemented by Ensemble-Modified Classifiers (EMC) with Naive Bayes, XGBoost, and Random Forest for classification to enhance predictive accuracy. Accurate pneumonia diagnosis is crucial for patient care, but misdiagnoses and delays in diagnosis are not uncommon. This research establishes a robust deep learning framework for pneumonia prediction, capable of identifying both normal and atypical pneumonia patterns in medical images. To enhance feature extraction and improve model generalization, the proposed approach combines entropy normalization with NCA-based dimensionality reduction, resulting in more efficient and discriminative feature representations. Furthermore, an ensemble-modified classifier is introduced to refine predictions and improve the model's ability to differentiate between pneumonia and non-pneumonia cases. Experimental results demonstrate that the proposed model surpasses existing methods in terms of accuracy, sensitivity, and specificity. The effectiveness of the proposed system has been confirmed b
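A hedged sketch of the classification stage is given below: pre-extracted features are reduced with scikit-learn's Neighbourhood Components Analysis and classified by a soft-voting ensemble of Naive Bayes, Random Forest, and XGBoost. The entropy normalization and the Ensemble-Modified Classifiers are approximated here by standard scaling and plain soft voting, and the feature array is a synthetic stand-in for InceptionV3 outputs.

```python
# Sketch of the NCA + ensemble classification stage on stand-in features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 256))           # stand-in for InceptionV3 pooled features (really 2048-dim)
y = rng.integers(0, 2, size=600)          # 0 = normal, 1 = pneumonia

ensemble = VotingClassifier(
    estimators=[
        ("nb", GaussianNB()),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("xgb", XGBClassifier(n_estimators=200, eval_metric="logloss")),
    ],
    voting="soft",
)

model = make_pipeline(
    StandardScaler(),
    NeighborhoodComponentsAnalysis(n_components=64, random_state=0),
    ensemble,
)
model.fit(X[:500], y[:500])
print("held-out accuracy:", model.score(X[500:], y[500:]))
```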
ISBN (print): 9798350344868; 9798350344851
Scene Text Image Super-Resolution (STISR) aims to enhance the resolution and legibility of text within low-resolution (LR) images, consequently elevating recognition accuracy in Scene Text Recognition (STR). Previous methods predominantly employ discriminative convolutional neural networks (CNNs) augmented with diverse forms of text guidance to address this issue. Nevertheless, they remain deficient when confronted with severely blurred images, owing to their insufficient generation capability when little structural or semantic information can be extracted from the original images. Therefore, we introduce RGDiffSR, a Recognition-Guided Diffusion model for Scene Text Image Super-Resolution, which exhibits great generative diversity and fidelity even in challenging scenarios. Moreover, we propose a Recognition-Guided Denoising Network to guide the diffusion model in generating LR-consistent results through succinct semantic guidance. Experiments on the TextZoom dataset demonstrate the superiority of RGDiffSR over prior state-of-the-art methods in both text recognition accuracy and image fidelity.
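The toy sketch below illustrates one way recognition guidance could condition a diffusion denoiser: the network predicts noise from the noisy high-resolution image, the upsampled LR image, and an embedding of recogniser logits, trained with the standard epsilon-prediction loss. The tiny ConvNet and the recogniser stub are assumptions, not the RGDiffSR architecture.

```python
# Toy sketch of recognition-guided conditioning for a diffusion denoiser.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedDenoiser(nn.Module):
    def __init__(self, num_classes: int = 37, width: int = 32):
        super().__init__()
        self.guide = nn.Linear(num_classes, width)           # embed recogniser output
        self.net = nn.Sequential(
            nn.Conv2d(3 + 3, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, 3, 3, padding=1),
        )

    def forward(self, noisy_hr, lr, rec_logits):
        lr_up = F.interpolate(lr, size=noisy_hr.shape[-2:], mode="bilinear", align_corners=False)
        h = self.net[0](torch.cat([noisy_hr, lr_up], dim=1))
        # Inject the semantic guidance as a per-channel bias.
        h = h + self.guide(rec_logits)[:, :, None, None]
        for layer in list(self.net)[1:]:
            h = layer(h)
        return h                                             # predicted noise

if __name__ == "__main__":
    model = GuidedDenoiser()
    hr, lr = torch.rand(2, 3, 32, 128), torch.rand(2, 3, 16, 64)
    logits = torch.rand(2, 37)                               # stand-in recogniser output
    noise = torch.randn_like(hr)
    alpha_bar = torch.tensor(0.7)                            # one diffusion timestep
    noisy_hr = alpha_bar.sqrt() * hr + (1 - alpha_bar).sqrt() * noise
    loss = F.mse_loss(model(noisy_hr, lr, logits), noise)    # epsilon-prediction loss
    print(loss.item())
```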
This study presents the RBP-CNN model, a convolutional neural network specifically designed for the precise classification of brain tumors in medical imaging. Conventional methods often encounter difficulties in extracting image noise and texture features, which has led to the incorporation of regional binary pattern (RBP) and Gray Standard Normalization (GSN) preprocessing techniques into the CNN. The research addresses fundamental questions regarding the model's impact on accuracy, false classifications, and efficiency. The novelty of RBP-CNN lies in its distinctive approach to extracting texture features, which involves optimizing pixel values through GSN preprocessing and generating regional binary patterns based on integral images. The objective of this research is to bridge a critical gap by providing a more accurate and efficient model for classifying brain tumors. The key findings reveal the exceptional performance of RBP-CNN, achieving a classification accuracy of 96% with a reduced false classification ratio of 7% across a dataset of 3000 samples. Comparative analyses position RBP-CNN as superior to alternative models in terms of accuracy, false classification rates, and efficiency. The structural insights and hyperparameter values of the model, as well as its application to the FigShare dataset, demonstrate its robustness and scalability. RBP-CNN emerges as an innovative and effective solution, advancing the field of medical image categorization. The findings of this study contribute a novel methodology, paving the way for future exploration in hyperspectral image applications and positioning RBP-CNN as a potential state-of-the-art tool for medical image analysis.
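As an illustration of "regional binary patterns based on integral images", the sketch below computes a multi-block LBP-style code: the sum of each of eight neighbouring blocks is compared against the central block, with block sums read from an integral image. This generic formulation is an assumption; the paper's exact RBP definition and GSN preprocessing are not reproduced.

```python
# Sketch of a regional (multi-block) binary pattern computed with an integral image.
import numpy as np

def block_sum(ii: np.ndarray, r: int, c: int, size: int) -> float:
    """Sum of the size x size block with top-left corner (r, c), via integral image."""
    return ii[r + size, c + size] - ii[r, c + size] - ii[r + size, c] + ii[r, c]

def regional_binary_pattern(img: np.ndarray, block: int = 3) -> np.ndarray:
    # Integral image with a zero row/column prepended so ii[r, c] = sum(img[:r, :c]).
    ii = np.pad(img.astype(np.float64), ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    h, w = img.shape
    codes = np.zeros((h - 3 * block + 1, w - 3 * block + 1), dtype=np.uint8)
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    for r in range(codes.shape[0]):
        for c in range(codes.shape[1]):
            centre = block_sum(ii, r + block, c + block, block)
            bits = [block_sum(ii, r + dr * block, c + dc * block, block) >= centre
                    for dr, dc in offsets]
            codes[r, c] = sum(int(b) << i for i, b in enumerate(bits))
    return codes

if __name__ == "__main__":
    image = np.random.randint(0, 256, size=(64, 64))
    print(regional_binary_pattern(image).shape)   # texture map that could feed the CNN
```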
Biometric facial identification presents a distinct and reliable method for distinguishing individuals based on unique physical or behavioral characteristics. Unlike traditional security measures such as passwords, facial features offer a level of security that cannot be shared, replicated, or forgotten. This study focuses on the application of facial biometrics for person identification, leveraging the advantages of non-contact biometrics like facial features over other methods such as fingerprint or palm recognition. Facial recognition in this work is predicated on the geometric shapes of facial characteristics. Emphasis is placed on three fundamental views of the face: upward, frontal, and downward. For each of these views, specific regions are extracted for processing, including the right-eye region and its width; the dimensions of the mouth, both height and width, are extracted in a similar manner. Training and evaluation of the proposed system are accomplished using three soft computing models: an Artificial Neural Network (ANN), a Particle Swarm Optimization Neural Network (PSO-NN), and an Adaptive Neuro-Fuzzy Inference System (ANFIS), each trained on a dataset constructed for each view. The models are optimized by adjusting parameters such as the number of neurons in the hidden layer for the neural network-based procedures. Performance is evaluated by computing the mean square error obtained under random data division. The models demonstrated a training set accuracy of 97.20% and a testing set accuracy of 90.86%. These results indicate the effectiveness of the proposed system for both individual and combined face views, underscoring the potential of facial biometrics in secure identification applications.
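A minimal sketch of the ANN-based variant is given below: a small multilayer perceptron over geometric measurements (right-eye width, mouth height and width). The synthetic feature table and the hidden-layer size are placeholders for the per-view datasets described above.

```python
# Minimal sketch of an MLP identifying people from geometric facial measurements.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n_people, n_samples = 10, 400
X = rng.normal(size=(n_samples, 3))        # columns: eye width, mouth height, mouth width
y = rng.integers(0, n_people, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=1),
)
model.fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy:", model.score(X_test, y_test))
```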
In electronic warfare, radar signal deinterleaving is a critical task. While many researchers have applied deep learning and utilised known radar classes to construct interleaved pulse sequence training sets for deinterleaving models, these models face challenges in distinguishing between known and unknown radar classes in open-set scenarios. To address this challenge, the authors propose a novel model, the Reconstruction Bidirectional Recurrent Neural Network (RBi-RNN). RBi-RNN utilises input reconstruction and employs a joint training strategy incorporating cross-entropy loss, reconstruction loss, and centre loss. These strategies aim to maximise inter-class latent representation distances while minimising intra-class disparities. By incorporating an open-set recognition method based on extreme value theory, RBi-RNN adapts to open-set scenarios. Simulation results demonstrate the superiority of RBi-RNN over conventional models in both closed-set and open-set scenarios. In open-set scenarios, it successfully discriminates between known and unknown radar signals within interleaved pulse sequences, deinterleaving known radar classes with high stability. The authors lay the foundation for future unsupervised deinterleaving methods designed specifically for unknown radar pulses.
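The joint objective can be sketched as follows, assuming a bidirectional GRU backbone whose per-pulse features feed a classifier, an input-reconstruction head, and a centre loss; the layer sizes and loss weights are arbitrary, and the extreme-value open-set step is omitted.

```python
# Sketch of the joint training objective: cross-entropy + reconstruction + centre loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RBiRNNSketch(nn.Module):
    def __init__(self, in_dim=4, hidden=64, n_classes=6):
        super().__init__()
        self.rnn = nn.GRU(in_dim, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, n_classes)      # per-pulse class
        self.decoder = nn.Linear(2 * hidden, in_dim)             # input reconstruction
        self.centres = nn.Parameter(torch.zeros(n_classes, 2 * hidden))

    def forward(self, pulses):                                    # (B, T, in_dim)
        feats, _ = self.rnn(pulses)                               # (B, T, 2*hidden)
        return self.classifier(feats), self.decoder(feats), feats

def joint_loss(model, pulses, labels, w_rec=1.0, w_centre=0.01):
    logits, recon, feats = model(pulses)
    ce = F.cross_entropy(logits.flatten(0, 1), labels.flatten())  # classification
    rec = F.mse_loss(recon, pulses)                               # input reconstruction
    centre = ((feats - model.centres[labels]) ** 2).sum(-1).mean()  # intra-class compactness
    return ce + w_rec * rec + w_centre * centre

if __name__ == "__main__":
    model = RBiRNNSketch()
    pulses = torch.randn(8, 50, 4)                    # interleaved pulse descriptor words
    labels = torch.randint(0, 6, (8, 50))             # per-pulse radar class
    loss = joint_loss(model, pulses, labels)
    loss.backward()
    print(loss.item())
```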