360° depth estimation has been extensively studied because 360° images provide a full field of view of the surrounding environment as well as a detailed description of the entire scene. However, most well-studied convolutional neural networks (CNNs) for 360° depth estimation extract local features well but fail to capture rich global features from the panorama due to their fixed receptive field. PCformer, a parallel convolutional transformer network that combines the benefits of CNNs and transformers, is proposed for 360° depth estimation. Transformers are inherently suited to modelling long-range dependencies and extracting global features, so with PCformer both global dependencies and local spatial features can be efficiently captured. To fully incorporate global and local features, a dual attention fusion module is designed. In addition, a distortion-weighted loss function is designed to reduce the effect of distortion in panoramas. Extensive experiments demonstrate that the proposed method achieves competitive results against state-of-the-art methods on three benchmark datasets. Additional experiments also demonstrate that the proposed model has benefits in terms of model complexity and generalisation capability.
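A common way to weight losses on equirectangular panoramas is by the cosine of each row's latitude, since rows near the poles are over-sampled by the projection. The sketch below illustrates that idea; the abstract does not give PCformer's exact weighting, so the function name `distortion_weighted_l1` and the cosine scheme are illustrative assumptions.

```python
import numpy as np

def distortion_weighted_l1(pred, gt):
    """Latitude-weighted L1 depth loss for an equirectangular panorama.

    Each pixel row is weighted by cos(latitude), a common (assumed)
    choice that down-weights the distorted polar regions.
    """
    h, w = pred.shape
    # latitude of each row, from +pi/2 (top) to -pi/2 (bottom)
    lat = (0.5 - (np.arange(h) + 0.5) / h) * np.pi
    weight = np.cos(lat)[:, None]  # (h, 1), broadcast over columns
    return (weight * np.abs(pred - gt)).sum() / (weight.sum() * w)
```

For a uniform depth error the weighted loss reduces to the error magnitude itself, which makes the scale easy to interpret.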
Precise segmentation of lesions from dermoscopy images is an essential task in computer-aided surgical planning. Unlike current methods that often concentrate on attention mechanisms, we build a pixel-to-pixel segmentation model called the Graph reasoning and Inception Attention Network (GIAN). First, we propose a graph reasoning module that is data-dependent. The node matrix of the graph reasoning derives from the original image and the feature map, so our graph reasoning module can accurately capture global information in feature maps. Second, to avoid the information redundancy caused by channels, we propose the Inception attention module based on the original Inception module, which extracts local spatial semantic information from the features. The Inception attention module can select representative node graphs as feature guidance graphs for image segmentation. The spatial information extracted by multiple parallel convolution kernels ensures the stability of subsequent pixel classification. In this way, GIAN accounts for both the extraction of global information and the guidance of local information. The organic combination of the two modules provides a sound conceptual basis for the segmentation task. In particular, we extensively evaluate the proposed method on two challenging datasets. The experimental results show that GIAN obtains performance comparable to state-of-the-art deep learning models under the same environmental conditions.
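The general pattern behind data-dependent graph reasoning is to project pixels to a small set of graph nodes, propagate information along node affinities, and project back. The sketch below shows that pattern only; GIAN's actual node matrix, adjacency, and learned projections are not specified in the abstract, so every detail here (the stand-in projection, the number of nodes) is an assumption.

```python
import numpy as np

def graph_reasoning(feat, num_nodes=4):
    """Minimal graph-reasoning sketch over a flattened feature map.

    feat: (C, N) features with N = H*W pixels. A data-dependent
    assignment maps pixels to nodes, reasoning happens in node space
    via a normalized affinity matrix, and the result is projected
    back to pixel space with a residual connection.
    """
    # data-dependent pixel-to-node assignment (softmax over nodes);
    # the first rows of feat stand in for a learned 1x1 projection
    logits = feat[:num_nodes]
    assign = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)  # (K, N)
    nodes = feat @ assign.T                              # (C, K) node features
    adj = nodes.T @ nodes                                # (K, K) node affinity
    adj = adj / np.abs(adj).sum(axis=1, keepdims=True)   # row-normalize
    reasoned = nodes @ adj.T                             # propagate along edges
    return feat + reasoned @ assign                      # back-project, residual
```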
ISBN: (print) 9798350343557
Medical imaging techniques are frequently used for tumor detection and diagnosis. Segmentation of tumors from medical images is a popular field of study. To this end, various deep neural network based methods have been introduced for segmenting tumor regions. Within the scope of this study, we first collected a dataset consisting of thorax CT (Computed Tomography) images with two class labels, benign and malignant, with the help of chest radiologists and chest disease clinicians. Then, we trained four different deep neural network based segmentation methods, Mask R-CNN, YOLACT, SOLOv2, and U-Net, and compared their accuracies. Finally, we conducted experiments to show which CT image channels are more useful for segmentation. Among the tested methods, the YOLACT algorithm returned the best results in classifying tumors, and U-Net yielded the best segmentation masks.
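Segmentation masks such as those produced by U-Net are typically compared with overlap metrics; the Dice coefficient below is one standard choice (the study's exact evaluation protocol is not stated in the abstract, so this metric is an assumption).

```python
import numpy as np

def dice(mask_a, mask_b, eps=1e-7):
    """Dice overlap between two binary segmentation masks.

    Returns 1.0 for identical masks and 0.0 for disjoint ones;
    eps guards against division by zero when both masks are empty.
    """
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum() + eps)
```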
The denoising task for low-dose CT (LDCT) images is a highly complex and uncertain inverse problem. Previous studies have primarily relied on convolutional neural networks (CNNs) to reduce noise by learning the mapping from LDCT images to normal-dose CT (NDCT) images. However, simply increasing the network depth is not an optimal choice because of the limited performance improvement and significant computational cost. In contrast, integrating prior knowledge of images with a model to assist image reconstruction is a more efficient approach. This study proposes a new framework for denoising LDCT images, named the Noise-Optimized Edge Feature Guided Network (NEFGN). The task of NEFGN is to integrate a noise optimization model based on adaptive weighted total variation, an edge detection model guided by Gaussian curvature, and image reconstruction into an end-to-end CNN framework. To achieve this goal, the noise optimization model is first constructed by learning the parameters of the adaptive weighted total variation regularization model to approximate the noise level of the NDCT image. The edge detection network is constructed using Gaussian curvature, predicting clear edges directly from the noisy image. Finally, under the guidance of the noise optimization model and the edge detail model, NEFGN is more capable of suppressing artifact noise, demonstrates good accuracy and robustness, and can restore finer details. Numerous experiments demonstrate that the NEFGN denoising framework effectively restores the structure of LDCT images with limited image detail and outperforms other methods.
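The core idea of weighted total variation is to smooth noise while letting a per-pixel weight map protect edges. The step below is an illustrative stand-in for NEFGN's learned adaptive-weight model: the discretization (forward differences, roll-based divergence) and the step size are assumptions, not the paper's formulation.

```python
import numpy as np

def weighted_tv_denoise_step(img, weight, step=0.1):
    """One gradient-descent step on a weighted total-variation energy.

    weight: per-pixel map; small weights near edges preserve detail,
    large weights in flat regions smooth noise more aggressively.
    A constant image is a fixed point (zero gradient, zero update).
    """
    # forward differences with replicated last row/column
    dx = np.diff(img, axis=1, append=img[:, -1:])
    dy = np.diff(img, axis=0, append=img[-1:, :])
    # normalized, weighted gradient field
    mag = np.sqrt(dx**2 + dy**2) + 1e-8
    px, py = weight * dx / mag, weight * dy / mag
    # divergence via backward differences (roll wraps at the border,
    # an acceptable approximation for this sketch)
    div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
    return img + step * div
```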
Biometric systems play a crucial role in securely recognizing an individual's identity based on physical and behavioral traits. Among these methods, finger vein recognition stands out due to the veins' unique position beneath the skin, providing heightened security and individual distinctiveness that cannot be easily manipulated. In our study, we propose a robust biometric recognition system that combines a lightweight architecture with depth-wise separable convolutions and residual blocks, along with a machine-learning algorithm. This system employs two distinct learning strategies: single-instance and multi-instance. Using these strategies demonstrates the benefits of combining largely independent information. Initially, we address the shading problem in finger vein images by applying histogram equalization to enhance their quality. After that, we extract the features using a MobileNetV2 model that has been fine-tuned for this task. Finally, our system uses a support vector machine (SVM) to classify the finger vein features into their classes. Our experiments are conducted on two widely recognized datasets, SDUMLA and FV-USM; the results are promising, with excellent rank-one identification rates of 99.57% and 99.90%, respectively.
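The preprocessing step described above, histogram equalization, spreads an image's intensity distribution to counteract shading. A minimal 8-bit implementation is sketched below (the MobileNetV2 feature extraction and SVM stages are omitted; this is a generic equalizer, not the paper's exact pipeline).

```python
import numpy as np

def hist_equalize(img):
    """Histogram equalization for an 8-bit grayscale image.

    Builds a lookup table from the cumulative histogram so that
    output intensities span the full 0-255 range approximately
    uniformly, which mitigates shading in vein images.
    """
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # first non-empty bin
    scale = max(img.size - cdf_min, 1)  # guard for constant images
    lut = np.round(np.clip(cdf - cdf_min, 0, None) / scale * 255)
    return lut.astype(np.uint8)[img]
```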
Recently, Video Coding for Machines (VCM) has gained increasing attention due to its role in machine vision tasks. As a crucial track in VCM, feature compression preserves and transmits critical feature information for machine vision. Most existing studies apply dimensionality reduction to the raw multi-scale feature before compression. However, feature sparsity remains insufficiently considered when removing redundancy from compressed features. In this letter, we propose a novel framework for image feature compression for machines, where the multi-scale feature is hierarchically transformed into a sparse representation for compression. The multi-scale feature is first fused by convolutional neural networks and an attention mechanism. To introduce sparsity into the fused feature, informative channels are identified by a channel-wise binary mask whose activated elements are sampled from an importance distribution over channels learned from the feature content. Then, the fused feature is masked to generate a sparse representation for compression. Experiments conducted on two machine tasks show significant improvements of our model over state-of-the-art methods.
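The channel-masking idea can be illustrated with a deterministic top-k selection over importance scores; note that the letter samples the mask from a learned importance distribution, so the top-k rule, function name, and keep ratio below are simplifying assumptions for illustration only.

```python
import numpy as np

def sparse_channel_mask(feat, importance, keep_ratio=0.5):
    """Content-adaptive channel sparsification sketch.

    feat: (C, H, W) fused feature; importance: (C,) per-channel scores.
    The highest-scoring channels are kept and the rest zeroed, yielding
    a sparse representation for the downstream entropy coder.
    """
    c = feat.shape[0]
    k = max(1, int(round(keep_ratio * c)))
    keep = np.argsort(importance)[-k:]     # most informative channels
    mask = np.zeros(c, dtype=feat.dtype)
    mask[keep] = 1.0
    return feat * mask[:, None, None], mask
```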
Background: Spiking Neural Networks (SNNs) hold significant potential for brain simulation and temporal data processing. While recent research has focused on developing neuron models and leveraging temporal dynamics to enhance performance, explicit studies on neuromorphic datasets are lacking. This research aims to address this gap by exploring temporal information dynamics in SNNs. New Method: To quantify the dynamics of temporal information during training, this study measures the Fisher information in SNNs trained on neuromorphic datasets. The information centroid is calculated to analyze the influence of key factors, such as the parameter k, on temporal information dynamics. Results: Experimental results reveal that the information centroid exhibits two distinct behaviors: stability and fluctuation. This study terms this phenomenon the Stable Information Centroid (SIC), which is closely related to the parameter k. Based on these findings, we propose the Fast Temporal Efficient Training (FTET) algorithm. Comparison with Existing Methods: First, the proposed method does not require additional complex training techniques. Second, it can reduce the computational load by 30% in the final 50 epochs. A drawback, however, is slow convergence during the early stages of training. Conclusion: This study reveals that the learning processes of SNNs vary across datasets, providing new insights into the mechanisms of human brain learning. A limitation is the restricted sample size, covering only a few datasets and image classification tasks. The code is available at https://***/gtii123/fasttemporal-efficient-training.
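An information centroid over timesteps can be read as the information-weighted mean time index: if Fisher information concentrates at late timesteps, the centroid shifts late. The abstract does not give the paper's exact formula, so the weighted-mean definition below is a paraphrased assumption.

```python
import numpy as np

def information_centroid(info_per_step):
    """Information-weighted mean timestep index.

    info_per_step: per-timestep information measure (e.g. Fisher
    information summed over parameters). A uniform profile gives the
    middle index; a late-concentrated profile gives a late centroid.
    """
    info = np.asarray(info_per_step, dtype=float)
    t = np.arange(len(info))
    return (t * info).sum() / info.sum()
```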
ISBN: (print) 9798350349405; 9798350349399
Explainable methods for understanding deep neural networks are currently employed for many visual tasks and provide valuable insights into their decisions. While post-hoc visual explanations offer easily understandable human cues behind neural networks' decision-making processes, comparing their outcomes remains challenging. Furthermore, balancing the performance-explainability trade-off can be time-consuming and requires deep domain knowledge. In this regard, we propose a novel auxiliary module, built upon convolutional encoders, which acts on the final layers of convolutional neural networks (CNNs) to learn orthogonal feature maps with more discriminative and explainable power. This module is trained via a disentangle loss that specifically aims to decouple the object from the background in the input image. To quantitatively assess its impact on standard CNNs and compare the quality of the resulting visual explanations, we employ metrics specifically designed for semantic segmentation tasks. These metrics rely on bounding-box annotations that may accompany image classification (or recognition) datasets, allowing us to compare ground-truth and predicted regions. Finally, we explore the impact of various self-supervised pre-training strategies, given their positive influence on vision tasks, and assess their effectiveness on our considered metrics.
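One plausible ingredient of a loss that encourages orthogonal feature maps is a penalty on off-diagonal entries of the normalized Gram matrix. The sketch below shows that penalty only; the paper's disentangle loss also separates object from background, which is not modeled here, so treat this as an assumed simplification.

```python
import numpy as np

def orthogonality_loss(feats):
    """Mean squared off-diagonal cosine similarity between feature maps.

    feats: (K, D) flattened feature maps. Zero when the maps are
    mutually orthogonal; approaches 1 when they are all identical.
    """
    f = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    gram = f @ f.T
    off_diag = gram - np.eye(len(f))
    return (off_diag**2).sum() / (len(f)**2 - len(f))
```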
Convolutional neural networks and Transformer methods have been widely applied in medical image segmentation and have shown tremendous potential. However, existing methods still face challenges in effectively integrating local semantic information and long-range dependencies, leading to suboptimal performance and reduced efficiency. Moreover, due to complex deformations and low-contrast blurry edges, the recognition of small organs is also unsatisfactory. To address these issues, we propose DGCA-Net. First, we design a Dual-axis Generalized Cross Attention (DGCA) module in the encoding phase to effectively integrate long- and short-range semantic relationships. DGCA consists of two consecutive attention mechanisms based on axial features, namely Generalized Channel Attention (GCA) and Generalized Efficient Attention (GEA), which enhance the recognition of large organs with long-range dependencies through axis-based generalized features and more efficient computation. Second, we design a boundary-constrained decoder comprising an Inter-scale Boundary Detector (IBD) and Boundary Attention Guidance (BAG) to better identify small organs with blurry boundaries. The IBD extracts boundary information of foreground objects from multi-scale features, while the BAG leverages enhanced boundary features to guide the fusion of encoder features and decoder contexts, complementing fine spatial and edge details. DGCA-Net achieves state-of-the-art performance on four public datasets covering different modalities and segmentation regions (Synapse, FLARE2023, ACDC, and MoNuSeg), demonstrating its superiority, transferability, and strong generalization capability. Our code: ***/zzm3zz/DGCA-Net.
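The kind of boundary signal an IBD-style detector targets can be illustrated by extracting the foreground boundary from a binary map with a morphological-style rule; this is a generic sketch, not DGCA-Net's learned module, and the 4-neighbour rule (with wrap-around at the array border via `np.roll`) is an assumption.

```python
import numpy as np

def boundary_map(mask):
    """Foreground boundary of a binary segmentation map.

    A pixel is boundary if it is foreground and at least one of its
    four neighbours is background (borders wrap, which is harmless
    when the mask is zero-padded at the edges).
    """
    m = mask.astype(bool)
    shifted = [np.roll(m, s, axis=a) for a in (0, 1) for s in (1, -1)]
    interior = m & np.logical_and.reduce(shifted)
    return m & ~interior
```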
Quadratic time-frequency (TF) methods are commonly used for the analysis, modeling, and classification of time-varying non-stationary electroencephalogram (EEG) signals. Commonly employed TF methods suffer from an inherent tradeoff between cross-term suppression and preservation of auto-terms. In this paper, we propose a new convolutional neural network (CNN) based approach to enhancing TF images. The proposed method trains a CNN using the Wigner-Ville distribution as the input image and the ideal time-frequency distribution, with the total concentration of signal energy along the instantaneous frequency (IF) curves, as the output image. The results show significant improvement over other state-of-the-art TF enhancement methods. The code for reproducing the results is available on GitHub at https://***/nabeelalikhan1/CNN-based-TF-image-enhancement.
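The CNN's input representation, the discrete Wigner-Ville distribution, is built from the instantaneous autocorrelation x(t+τ)·x*(t−τ) followed by an FFT over the lag τ. The basic implementation below is a textbook sketch (no windowing or analytic-signal computation), not the paper's exact preprocessing.

```python
import numpy as np

def wigner_ville(x):
    """Discrete Wigner-Ville distribution of a (complex) signal.

    Returns an (n_freq, n_time) real-valued TF image. Lag support
    shrinks near the signal edges, as in the standard discrete form.
    """
    n = len(x)
    wvd = np.zeros((n, n))
    for t in range(n):
        taumax = min(t, n - 1 - t)
        tau = np.arange(-taumax, taumax + 1)
        # instantaneous autocorrelation x(t+tau) * conj(x(t-tau)),
        # placed at indices tau mod n for the FFT over lag
        acf = np.zeros(n, dtype=complex)
        acf[tau % n] = x[t + tau] * np.conj(x[t - tau])
        wvd[:, t] = np.real(np.fft.fft(acf))
    return wvd
```

A useful sanity check is the marginal property: summing each column over frequency recovers n·|x(t)|².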