Recently, deep learning has made significant progress in image denoising. However, most existing deep-learning-based methods are purely data-driven and do not exploit domain knowledge about image denoising. Moreover, the parameters of deep denoising networks are not interpretable. To address these issues, this paper proposes a deep side group sparse coding network for image denoising, named the side group sparse coding network (SGSC-Net). First, an SGSC model for image denoising is developed that exploits prior information about the consistency of group sparse coefficients. Specifically, the side information is constructed as a weighted combination of intermediate estimates and is updated iteratively. Then, the optimisation solution of the SGSC model is turned into a deep neural network via deep unfolding, yielding SGSC-Net. The computational path of SGSC-Net follows the iterations of the optimisation solution step by step, so the network parameters are interpretable. Furthermore, the design of SGSC-Net draws on insights from the SGSC denoising model. Experimental results on well-known datasets demonstrate, quantitatively and qualitatively, that SGSC-Net is competitive with existing deep-unfolding-based and typical deep-neural-network-based methods.
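Deep unfolding turns each iteration of an iterative solver into one network layer. The sketch below is a generic unrolled proximal-gradient (ISTA-style) sparse coder, not the SGSC model itself. It assumes an L1 penalty of the form t·‖x − s‖₁, whose proximal operator shrinks toward the side information s instead of toward zero, and it builds s as a weighted average of earlier layers' outputs. The dictionary, step size, thresholds and `side_weights` are hypothetical fixed numbers here; in a trained unfolded network they would be learnable parameters.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*|x|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def unfolded_sparse_coding(y: Vector, d: Matrix, step: float,
                           thresholds: Vector, side_weights: Vector) -> Vector:
    """Run len(thresholds) unrolled iterations ("layers").

    Each layer takes a gradient step on 0.5*||y - D x||^2, then applies the
    prox of t*||x - s||_1, i.e. shrinkage toward the side information s.
    s is the side_weights-weighted average of earlier layers' outputs
    (assumes len(side_weights) >= len(thresholds) - 1 and positive weights).
    """
    d_t = transpose(d)
    x = [0.0] * len(d_t)
    history: List[Vector] = []
    for t in thresholds:
        # Side information: weighted combination of intermediate estimates.
        if history:
            ws = side_weights[:len(history)]
            total = sum(ws)
            s = [sum(w * h[i] for w, h in zip(ws, history)) / total
                 for i in range(len(x))]
        else:
            s = [0.0] * len(x)  # no estimates yet: plain soft thresholding
        # Gradient step on the data-fidelity term.
        residual = [y_i - r_i for y_i, r_i in zip(y, matvec(d, x))]
        v = [x_i + step * g_i for x_i, g_i in zip(x, matvec(d_t, residual))]
        # Prox of t*||x - s||_1 is s + soft(v - s, t).
        x = [s_i + soft_threshold(v_i - s_i, t) for v_i, s_i in zip(v, s)]
        history.append(x)
    return x
```

With an identity dictionary and a single layer this reduces to soft thresholding of `y`; later layers keep coefficients close to the side information unless the data pull them further than the threshold.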
The stomata on the leaf surface are mainly responsible for material exchange between the plant's internal and external environments. Many methods have been proposed to automatically measure the positions and number of stomata, but few can both count stomata and determine whether they are open or closed. Therefore, this study proposes an automatic detection method for leaf stomatal morphology analysis based on an attention mechanism and deep learning. To capture more stomatal feature information for the network to learn from, the proposed method adds a coordinate attention (CA) mechanism to the backbone of YOLOv5. In addition, to avoid overfitting during training, the authors apply the label smoothing training trick. Finally, the stomatal detection ability of the proposed method is verified on a broad bean leaf stomata dataset. The experimental results show that the method achieves a detection accuracy of 0.934 and an mAP of 0.968, a significant improvement over the other state-of-the-art algorithms compared. The generalization of the model is verified on a wheat leaf stomata dataset, where the method achieves a detection accuracy of 0.894 and an mAP of 0.907.
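Label smoothing replaces hard 0/1 training targets with softened ones, so the loss no longer rewards pushing predictions to full confidence. The Python sketch below shows the common softmax form (1 − ε)·onehot + ε/K and a binary variant often used with per-class binary cross-entropy. The two classes (index 0 = "open", 1 = "closed") and ε = 0.1 are hypothetical; the abstract does not state the authors' exact configuration.

```python
import math
from typing import List, Tuple


def smooth_one_hot(class_index: int, num_classes: int, eps: float = 0.1) -> List[float]:
    """Label-smoothed target (1 - eps) * one_hot + eps / K for softmax cross-entropy."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must be in [0, 1)")
    target = [eps / num_classes] * num_classes
    target[class_index] += 1.0 - eps
    return target


def smooth_binary_targets(eps: float = 0.1) -> Tuple[float, float]:
    """(positive, negative) targets for per-class binary cross-entropy."""
    return 1.0 - 0.5 * eps, 0.5 * eps


def cross_entropy(target: List[float], probs: List[float]) -> float:
    """Cross-entropy between a (possibly smoothed) target and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)
```

Under a smoothed target such as `smooth_one_hot(0, 2)`, an over-confident prediction like `[0.999, 0.001]` incurs a higher loss than `[0.95, 0.05]`. This is the regularising effect that discourages overfitting.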
The human finger is an essential carrier of biometric features. A single finger contains multi-modal traits, including the fingerprint and the finger vein, which makes finger bi-modal fusion recognition convenient and practical. Scale inconsistency and feature-space mismatch between finger bi-modal images are important factors affecting fusion performance. Graph-based feature extraction can effectively resolve the feature-space mismatch between the two finger modalities, and end-to-end fusion recognition can be realised with graph convolutional neural networks (GCNs). However, this GCN-based fusion recognition strategy still faces two pressing problems: first, the lack of a stable and efficient graph fusion method; second, the over-smoothing problem of GCNs, which degrades recognition performance. A novel fusion method is proposed to integrate the graph features of the fingerprint (FP) and finger vein (FV). Furthermore, we analyse the relationship between the information transmission process and the over-smoothing problem in GCNs from an optimisation perspective, and point out that the differentiating information between neighbouring nodes decreases as the number of layers increases, which is the direct cause of over-smoothing. A modified deep graph convolutional neural network is proposed to alleviate over-smoothing. The intuition is that the distinctive features of each node should be properly preserved to keep the node unique. Thus, a constraint term is added to the objective function of the GCN to emphasise each node's own distinctive features. The experimental results show that the proposed fusion method achieves more satisfactory performance in finger bi-modal biometric recognition, and that the proposed constrained GCN effectively alleviates over-smoothing.
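The optimisation view of over-smoothing can be illustrated on a toy graph. Repeated neighbourhood averaging drives all node features toward the same value. One common way to write a "keep each node distinctive" constraint is a fidelity term (μ/2)‖X − X₀‖² added to a Laplacian smoothing objective. For a symmetric propagation matrix Â, a gradient step of size 1/(1 + μ) then gives X ← (1 − α)ÂX + αX₀ with α = μ/(1 + μ). This is an assumed form chosen for illustration, not the paper's exact constraint term. The toy also uses plain mean aggregation for readability, so the correspondence is only approximate there.

```python
from typing import List

Graph = List[List[int]]  # adjacency lists; self-loops are added implicitly


def propagate(adj: Graph, x: List[float]) -> List[float]:
    """One propagation layer: each node becomes the mean of itself and its neighbours."""
    return [sum(x[j] for j in [i] + nbrs) / (len(nbrs) + 1)
            for i, nbrs in enumerate(adj)]


def deep_propagation(adj: Graph, x0: List[float], layers: int, alpha: float = 0.0) -> List[float]:
    """Stack `layers` propagation steps.

    alpha = 0 gives plain propagation (which over-smooths); alpha > 0 pulls
    every node back toward its own input features:
    x <- (1 - alpha) * A x + alpha * x0.
    """
    x = list(x0)
    for _ in range(layers):
        x = [(1.0 - alpha) * s + alpha * x0_i
             for s, x0_i in zip(propagate(adj, x), x0)]
    return x


def spread(x: List[float]) -> float:
    """How distinguishable the nodes still are (max minus min feature)."""
    return max(x) - min(x)
```

On the path graph 0–1–2 with features `[1, 0, -1]`, plain propagation halves the spread at every layer, so after 50 layers the nodes are indistinguishable. With `alpha=0.2`, the spread settles near 2/3.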
Iris biometrics is one of the fastest-growing biometric technologies and has received considerable attention from the research community. Iris-based human recognition does not require contact with the human body. The iris contains crypts, Wolfflin nodules, concentric furrows, and pigment spots. Existing methods feed the whole eye image into a deep learning network, which yields improper iris features and reduces accuracy. This study proposes a model that feeds an accurately preprocessed iris region, delimited by its boundary, into an AlexNet-based deep neural network for classification. The pupil centre and boundary are first identified in the given eye images. The iris boundary and centre are then identified using the pupil centre and boundary as a reference. The iris region, excluding the pupil area, is segmented using the parameters of the multiple left-right point (MLRP) algorithm. Layers 23, 24, and 25 of the AlexNet network are replaced to match the segmented iris classes. The remaining AlexNet layers are trained on the iris features using the squared gradient decay factor (GDF). The trained AlexNet iris model is validated on suitable classes. The proposed system classifies the iris with an accuracy of 99.1%. Its sensitivity, specificity, and F1-score are 99.68%, 98.36%, and 0.995, respectively. The experimental results show that the proposed model has advantages over current models.
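The layer numbers suggest MATLAB's pretrained AlexNet. If that is the case, layers 23 to 25 are the final fully connected layer (fc8), the softmax layer and the classification output layer. The squared gradient decay factor is then a training option of the RMSProp and Adam solvers. The abstract does not say which solver was used, so the Python sketch below assumes RMSProp on a one-dimensional toy objective. The learning rate, decay value 0.9 and objective are hypothetical.

```python
import math
from typing import Callable


def rmsprop_minimize(grad: Callable[[float], float], w0: float, lr: float = 0.01,
                     decay: float = 0.9, eps: float = 1e-8, steps: int = 2000) -> float:
    """Minimise a 1-D function with RMSProp.

    `decay` is the squared gradient decay factor: the weight kept on the running
    average of squared gradients that normalises each update.
    """
    w, v = w0, 0.0
    for _ in range(steps):
        g = grad(w)
        v = decay * v + (1.0 - decay) * g * g  # running mean of squared gradients
        w -= lr * g / (math.sqrt(v) + eps)     # normalised step, roughly lr in size
    return w
```

For example, `rmsprop_minimize(lambda w: 2 * (w - 3), 0.0)` approaches the minimiser 3 of (w − 3)². A larger decay factor averages squared gradients over a longer history and gives smoother step sizes.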
Infrared and visible images come from different sensors, and each has its own advantages and disadvantages. To make the fused images contain as much salient information as possible, a practical fusion method, termed EDAfuse, is proposed in this paper. In EDAfuse, the authors introduce an encoder-decoder with an atrous spatial pyramid network for infrared and visible image fusion. An encoding network consisting of three convolutional neural network (CNN) layers extracts deep features from the input images. The proposed atrous spatial pyramid model then produces features at five different scales. Features of the same scale from the two source images are fused by the proposed fusion strategy, which combines an attention model and an information quantity model. Finally, the decoding network reconstructs the fused image. During training, the authors introduce a loss function with a saliency term to improve the model's ability to extract salient features from the source images. In the experiments, the authors use the average values of seven metrics over 21 fused images to evaluate the proposed method and seven other existing methods. The results show that the proposed method obtains four best values and three second-best values. Subjective assessment also demonstrates that the proposed method outperforms state-of-the-art fusion methods.
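An atrous (dilated) spatial pyramid applies filters at several dilation rates so that each branch sees a different receptive field without adding parameters. The Python sketch below shows this on a 1-D signal. The rates (1, 2, 3) are hypothetical, since the abstract states only that five scales are produced. Real ASPP branches work on 2-D feature maps with "same" padding, whereas this toy uses a valid convolution whose output shrinks.

```python
from typing import Dict, List, Sequence


def dilated_conv1d(x: List[float], kernel: List[float], rate: int) -> List[float]:
    """'Valid' 1-D atrous convolution (cross-correlation, as in CNN layers).

    Kernel taps are spaced `rate` samples apart, so a k-tap kernel covers
    k + (k - 1) * (rate - 1) input samples.
    """
    span = len(kernel) + (len(kernel) - 1) * (rate - 1)
    return [sum(w * x[i + j * rate] for j, w in enumerate(kernel))
            for i in range(len(x) - span + 1)]


def atrous_pyramid(x: List[float], kernel: List[float],
                   rates: Sequence[int]) -> Dict[int, List[float]]:
    """Apply the same kernel at several dilation rates: one output per scale."""
    return {r: dilated_conv1d(x, kernel, r) for r in rates}
```

With kernel `[1, 1, 1]`, rate 2 sums samples `i`, `i + 2` and `i + 4`. The three-tap filter therefore spans five input samples.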
In occlusion and interaction scenarios, human action recognition (HAR) accuracy is low. To address this issue, this paper proposes a novel multi-modal fusion framework for HAR. In this framework, a module called improved attention long short-term memory (IAL) is proposed, which combines an improved SE-ResNet50 (ISE-ResNet50) with long short-term memory (LSTM). IAL extracts both the video sequence features and the skeleton sequence features of human behaviour. To improve HAR performance at a high semantic level, the resulting multi-modal sequence features are fed into a coupled hidden Markov model (CHMM), and a multi-modal IAL+CHMM method called IALC is developed based on this probabilistic graphical model. To test the proposed method, experiments are conducted on the HMDB51, UCF101, Kinetics 400k, and ActivityNet datasets, yielding recognition accuracies of 86.40%, 97.78%, 81.12%, and 69.36%, respectively. The experimental results show that, in complex environments, the proposed IALC-based multi-modal fusion method for HAR achieves more accurate recognition results.
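A coupled HMM links two hidden chains, here one per modality (video and skeleton). Each chain's next state depends on the current states of both chains. The Python sketch below scores a pair of observation sequences with the forward algorithm over the joint state space. It then classifies by choosing the action model with the highest likelihood. Discrete observation symbols (for example, quantised IAL features) are an assumption made for brevity. The paper's emission model and training procedure are not described in the abstract, and all parameter values are hypothetical.

```python
import itertools
from typing import Dict, List, Sequence, Tuple

Dist = List[float]


def chmm_likelihood(pi1: Dist, pi2: Dist,
                    a1: List[List[Dist]], a2: List[List[Dist]],
                    b1: List[Dist], b2: List[Dist],
                    obs1: Sequence[int], obs2: Sequence[int]) -> float:
    """P(obs1, obs2) under a two-chain coupled HMM (forward algorithm).

    a1[i][j][k] = P(chain-1 state k at t+1 | chain-1 = i, chain-2 = j at t)
    a2[i][j][l] = P(chain-2 state l at t+1 | chain-1 = i, chain-2 = j at t)
    b1[k][o], b2[l][o]: discrete emission probabilities of each chain.
    """
    joint = list(itertools.product(range(len(pi1)), range(len(pi2))))
    # Initialise with both chains' priors and their first emissions.
    alpha = {(i, j): pi1[i] * pi2[j] * b1[i][obs1[0]] * b2[j][obs2[0]]
             for i, j in joint}
    for t in range(1, len(obs1)):
        # Coupling: each chain's transition is conditioned on both current states.
        alpha = {
            (k, l): sum(alpha[(i, j)] * a1[i][j][k] * a2[i][j][l] for i, j in joint)
            * b1[k][obs1[t]] * b2[l][obs2[t]]
            for k, l in joint
        }
    return sum(alpha.values())


def classify(models: Dict[str, Tuple], obs1: Sequence[int], obs2: Sequence[int]) -> str:
    """Pick the action whose CHMM assigns the observations the highest likelihood."""
    return max(models, key=lambda name: chmm_likelihood(*models[name], obs1, obs2))
```

For long sequences, a practical version would rescale `alpha` at each step or work in log space to avoid numerical underflow.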
Magnetic resonance imaging (MRI) produces high-quality medical images that are used to detect brain tumours, but diagnosis from them is complex and time-consuming. In this study, a back propagation neural network (BPNN) trained with the Levenberg-Marquardt algorithm (LMA) is proposed to classify MRIs and diagnose brain tumours in a simple and fast process. The BPNN has 10 neurons in the hidden layer, and the default performance function of the feedforward network is the mean squared error (MSE). The LMA, used as a multivariable adaptive optimization approach, considerably decreases the MSE of the BPNN and thus reduces tumour classification errors. The proposed method follows four steps: preprocessing, skull removal, feature extraction, and classification. In the preprocessing step, the input MRIs are converted to greyscale, resized, and thresholded, after which the skull is removed. The morphological operations of closing, opening, and dilation are used to segment abnormal areas in the MRIs, and the opening operator recognizes the tumour most accurately. Using statistical analysis and a grey-level co-occurrence matrix (GLCM), 12 features are extracted from the MRIs and used as inputs to the BPNN. To evaluate the proposed method, 670 normal and 670 abnormal brain MRIs are used as input data, and classification is performed in 0.494 s. The accuracy, sensitivity, specificity, precision, Dice coefficient, recall, and MSE are 98.7%, 97.61%, 99.7%, 97.61%, 98.6%, 97.61%, and 0.005, respectively. The approach is accurate and fast for medical image classification.
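A GLCM counts how often grey level i occurs next to grey level j at a fixed pixel offset. Texture features are then computed from the normalised counts. The abstract does not list its 12 features, so the Python sketch below assumes a horizontal offset of one pixel and computes four common Haralick-style features (contrast, angular second moment, homogeneity, entropy) for illustration only.

```python
import math
from typing import Dict, List


def glcm(image: List[List[int]], levels: int, dr: int = 0, dc: int = 1) -> List[List[int]]:
    """Count co-occurrences of grey levels (i, j) at pixel offset (dr, dc)."""
    m = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                m[image[r][c]][image[r2][c2]] += 1
    return m


def glcm_features(m: List[List[int]]) -> Dict[str, float]:
    """A few texture features from the normalised co-occurrence matrix."""
    total = sum(sum(row) for row in m)
    p = [[v / total for v in row] for row in m]
    n = len(p)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in pairs),
        "asm": sum(p[i][j] ** 2 for i, j in pairs),  # angular second moment
        "homogeneity": sum(p[i][j] / (1 + (i - j) ** 2) for i, j in pairs),
        "entropy": -sum(p[i][j] * math.log2(p[i][j]) for i, j in pairs if p[i][j] > 0),
    }
```

Consider the 4×4 image `[[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]]` with four grey levels. It produces the counts `[[2, 2, 1, 0], [0, 2, 0, 0], [0, 0, 3, 1], [0, 0, 0, 1]]`, from which the contrast is 7/12.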
Self-attention has been successfully leveraged to capture long-range feature-wise similarities in deep learning super-resolution (SR) methods. However, most SR methods explore features only at the original scale and do not take full advantage of self-similar features across different scales, especially in generative adversarial networks (GANs). In this paper, self-similarity generative adversarial networks (SSGAN) are proposed as an SR framework. The framework establishes multi-scale feature correlation by adding two modules to the generator network: a downscale attention block (DAB) and an upscale attention block (UAB). Specifically, DAB is designed to restore repetitive details from the corresponding downsampled image, achieving multi-scale feature restoration through self-similarity. UAB improves on the baseline up-sampling operations and captures the low-resolution to high-resolution feature mapping, which enhances cross-scale repetitive features for reconstructing the high-resolution image. Experimental results demonstrate that the proposed SSGAN achieves better visual performance, especially on details with similar patterns.
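Cross-scale self-similarity can be expressed as attention in which queries come from the original-scale features, while keys and values come from a downsampled copy of the same features. Each position can then borrow detail from similar patterns at a coarser scale. The Python sketch below shows this idea on a 1-D toy sequence of feature vectors. It uses plain scaled dot-product attention with no learned query/key/value projections. It is an illustration of the general mechanism, not the SSGAN downscale attention block.

```python
import math
from typing import List

Vec = List[float]


def downsample(feats: List[Vec], factor: int = 2) -> List[Vec]:
    """Average consecutive groups of `factor` feature vectors (toy downscaling)."""
    out = []
    for start in range(0, len(feats) - factor + 1, factor):
        group = feats[start:start + factor]
        out.append([sum(col) / factor for col in zip(*group)])
    return out


def softmax(scores: Vec) -> Vec:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_scale_attention(queries: List[Vec], keys: List[Vec], values: List[Vec]) -> List[Vec]:
    """Scaled dot-product attention; queries and keys may come from different scales."""
    d = len(queries[0])
    out = []
    for q in queries:
        w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys])
        out.append([sum(wi * v[c] for wi, v in zip(w, values))
                    for c in range(len(values[0]))])
    return out
```

Usage: `low = downsample(feats)` followed by `cross_scale_attention(feats, low, low)` returns, for every original-scale position, a convex combination of the downscaled features, weighted toward the most similar ones.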
Generative AI art has exploded onto the scene over the past few months through advanced online platforms such as DALL-E2, Midjourney, and Stable Diffusion, which enable anyone with a smartphone or PC to create highly polished art by typing simple text instructions.
3D point cloud segmentation is a non-trivial problem because of the irregular, sparse, and unordered structure of the data. Existing methods consider only the structural relationships between a 3D point and its spatial neighbours, whereas inner-point interactions and the long-distance context of a 3D point cloud have been less investigated. In this study, we propose an effective plug-and-play module called the Long Short-Distance Topologically Modelled (LSDTM) Graph Convolutional Neural Network (GCNN) to learn the underlying structure of 3D point clouds. Specifically, we introduce the concept of a subgraph to model contextual point relationships within a short distance. The proposed topology is then reconstructed by recursive aggregation of subgraphs, which, importantly, propagates the contextual scope to a long range. LSDTM parses point cloud data while maximally preserving its geometric and contextual structure, and the topological graph can be trained end-to-end through a seamlessly integrated GCNN. We provide a case study of a triple-layer ternary topology, and experimental results on the ShapeNetPart, Stanford 3D Indoor Semantics, and ScanNet datasets indicate a significant improvement in 3D point cloud segmentation, validating the effectiveness of our approach.
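The link between short-distance subgraphs and long-range context can be seen in a toy example. Each round of aggregation over a local neighbourhood lets information travel one more hop, so recursive aggregation steadily widens the contextual scope. The Python sketch below assumes k-nearest-neighbour subgraphs and max aggregation as stand-ins. The abstract does not define how LSDTM builds or aggregates its subgraphs, or what "ternary" means precisely, so this is not the authors' module.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def knn(points: List[Point], k: int) -> List[List[int]]:
    """Indices of each point's k nearest neighbours (itself excluded; ties broken by index)."""
    nbrs = []
    for i, p in enumerate(points):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points) if j != i
        )
        nbrs.append([j for _, j in dists[:k]])
    return nbrs


def aggregate(nbrs: List[List[int]], feats: List[float], rounds: int) -> List[float]:
    """Recursive max-aggregation over each point's short-distance subgraph."""
    h = list(feats)
    for _ in range(rounds):
        h = [max([h[i]] + [h[j] for j in nbrs[i]]) for i in range(len(h))]
    return h
```

Take ten collinear points at x = 0, ..., 9 with k = 2 and each point's x coordinate as its feature. Point 0 sees values up to 2 after one round and up to 4 after three rounds, even though each subgraph spans only two neighbours.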