Hand gesture recognition is versatile and easy to use, making it one of the best methods for facilitating human-computer interaction. High recognition performance and user independence should be the goals of real-time gesture recognition systems. Convolutional neural networks (CNNs) have recently demonstrated impressive recognition rates in image classification tasks. Motivated by this performance, we employ multi-scale deep convolutional neural networks together with the Entropy Controlled Tiger Optimization (ENcTO) classification method to recognize and classify human palms and palmprints. The processing flow comprises preprocessing of hand regions of interest using mask images, feature extraction, finger segmentation, and finger recognition with a multi-scale deep CNN classifier. A mask image is used to preprocess the hand region of the whole image. Adaptive histogram equalization is applied to boost the contrast of every pixel in the image. Next, features are extracted from the preprocessed images using the Scale-Invariant Feature Transform (SIFT). The gesture recognition pipeline first separates the fingers in the mask image, then segments the hand's region of interest and normalizes the segmented finger images. Hand images with segmented finger regions are input into a multi-scale deep CNN that classifies them into several categories using the ENcTO classification method. This research presents a high-performance, state-of-the-art approach for gesture detection and identification that combines a multi-scale deep CNN, the ENcTO classification algorithm, and augmentation techniques, achieving a recognition rate of 96.72%; the results demonstrate the superiority of the proposed method over alternative approaches. These results demonstrate how well entropy-controlled optimization and deep learning work together to increase the precision of human identification from palm images.
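The contrast-enhancement step of the pipeline above can be illustrated with a minimal sketch. The paper uses adaptive (tile-based) histogram equalization; the global variant below shows the same core idea of remapping gray levels through the normalized cumulative histogram, and is an assumption-laden simplification, not the authors' implementation.

```python
import numpy as np

def equalize_histogram(img: np.ndarray, levels: int = 256) -> np.ndarray:
    """Global histogram equalization on a uint8 grayscale image.

    The adaptive variant applies this per tile; the global version shown
    here spreads the occupied gray levels over the full dynamic range.
    """
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                       # first nonzero CDF value
    # Map each gray level through the normalized CDF.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * (levels - 1))
    lut = np.clip(lut, 0, levels - 1).astype(np.uint8)
    return lut[img]

# A low-contrast image confined to [100, 150] is stretched to [0, 255].
rng = np.random.default_rng(0)
img = rng.integers(100, 151, size=(64, 64), dtype=np.uint8)
out = equalize_histogram(img)
```

In practice OpenCV's `cv2.createCLAHE` provides the contrast-limited adaptive version, which additionally clips the histogram per tile to avoid amplifying noise.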
ISBN: (print) 9798350344868; 9798350344851
The recently discovered neural collapse (NC) phenomenon states that the last-layer weights of deep neural networks (DNNs) converge to the so-called simplex Equiangular Tight Frame (ETF) at the terminal phase of their training. This ETF geometry is equivalent to vanishing within-class variability of the last-layer activations. Inspired by NC properties, we explore in this paper the transferability of DNN models trained with their last-layer weights fixed according to the ETF. This enforces class separation by eliminating class covariance information, effectively providing implicit regularization. We show that DNN models trained with such a fixed classifier significantly improve transfer performance, particularly on out-of-domain datasets. On a broad range of fine-grained image classification datasets, our approach outperforms i) baseline methods that do not perform any covariance regularization (by up to 22%), as well as ii) methods that explicitly whiten the covariance of activations throughout training (by up to 19%). Our findings suggest that DNNs trained with fixed ETF classifiers offer a powerful mechanism for improving transfer learning across domains.
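The fixed classifier described above is a standard construction: class prototypes are unit vectors whose pairwise cosine similarity is exactly -1/(C-1), the maximally separated arrangement of C directions. A minimal sketch of how such a weight matrix can be built (the dimension, class count, and random basis here are illustrative, not taken from the paper):

```python
import numpy as np

def simplex_etf(num_classes: int, dim: int, seed: int = 0) -> np.ndarray:
    """Return a (dim, num_classes) simplex ETF classifier weight matrix.

    Columns are unit-norm class prototypes whose pairwise cosine is
    -1/(num_classes - 1): the geometry neural collapse converges to.
    """
    assert dim >= num_classes - 1
    rng = np.random.default_rng(seed)
    # Random orthonormal basis U (dim x num_classes) via QR decomposition.
    u, _ = np.linalg.qr(rng.standard_normal((dim, num_classes)))
    c = num_classes
    center = np.eye(c) - np.ones((c, c)) / c        # remove the global mean
    return np.sqrt(c / (c - 1)) * u @ center

W = simplex_etf(num_classes=10, dim=64)
G = W.T @ W                                         # Gram matrix of prototypes
```

Training then proceeds with `W` frozen; only the feature extractor is optimized, which is what removes class covariance information from the head.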
Outdoor haze images are typically degraded by noise due to the external environment and imaging equipment. Existing haze image enhancement methods ignore the interrelation between haze and noise and therefore cannot suppress the noise and remove the haze simultaneously. To address these intractable problems, a dual-branch architecture that combines dehazing and denoising is proposed here to restore clear images. First, the image dehazing branch adopts the dark channel prior and unsupervised networks to remove the haze. Then, the image denoising branch removes the image noise in parallel by constructing a mean/extreme sampler and a self-supervised network. Finally, a convolutional neural network fusion strategy is presented to fuse the output images from the two branches and generate the final results. Extensive experiments reveal that the proposed haze image enhancement method outperforms other state-of-the-art methods in terms of peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM).
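The dark channel prior mentioned in the dehazing branch has a simple closed form: for each pixel, take the minimum intensity over the three color channels within a local patch; haze-free outdoor regions tend toward zero, so large values indicate haze. A minimal numpy sketch (patch size and test image are illustrative assumptions):

```python
import numpy as np

def dark_channel(img: np.ndarray, patch: int = 3) -> np.ndarray:
    """Dark channel prior of an HxWx3 image with values in [0, 1]."""
    per_pixel_min = img.min(axis=2)                 # min over color channels
    pad = patch // 2
    padded = np.pad(per_pixel_min, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (patch, patch))
    return windows.min(axis=(2, 3))                 # min over the local patch

# A synthetic haze-free image: one channel is near zero everywhere, so the
# dark channel should be close to zero as the prior predicts.
rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
img[..., 2] = 0.01                                  # dark blue channel
dc = dark_channel(img)
```

In a full dehazing pipeline the dark channel is then used to estimate the transmission map and atmospheric light before inverting the haze imaging model.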
In this manuscript, we propose a novel method to perform audio inpainting, i.e., the restoration of audio signals presenting multiple missing parts. Audio inpainting can be interpreted in the context of inverse problems as the task of reconstructing an audio signal from its corrupted observation. For this reason, our method is based on a deep prior approach, a recently proposed technique that has proved effective in the solution of many inverse problems, including image inpainting. Deep prior allows one to consider the structure of a neural network as an implicit prior and to adopt it as a regularizer. Differently from the classical deep learning paradigm, deep prior performs single-element training and thus can be applied to corrupted audio signals independently of any available training data set. In the context of audio inpainting, a network presenting relevant audio priors can generate a restored version of an audio signal, provided only with its corrupted observation. Our method exploits a time-frequency representation of audio signals and makes use of a multi-resolution convolutional autoencoder that has been enhanced to perform the harmonic convolution operation. Results show that the proposed technique is able to provide a coherent and meaningful reconstruction of the corrupted audio. It is also able to outperform the methods considered for comparison, in its domain of application.
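The key mechanism in deep-prior inpainting is that the reconstruction loss is evaluated only on the reliable (observed) samples; the gaps contribute nothing, so the network's structure alone decides how they are filled. A minimal sketch of that masked loss (the signal and gap positions are illustrative, and the "network output" is stubbed out, so this shows the loss, not the paper's autoencoder):

```python
import numpy as np

def masked_mse(pred: np.ndarray, observed: np.ndarray, mask: np.ndarray) -> float:
    """Deep-prior reconstruction loss: MSE restricted to observed samples.

    Where mask == 0 (the gaps) the network output is never penalized, so
    the implicit prior of the architecture fills them in.
    """
    diff = (pred - observed) * mask
    return float((diff ** 2).sum() / mask.sum())

# A sinusoid with 20 missing samples.
t = np.linspace(0, 1, 100)
signal = np.sin(2 * np.pi * 5 * t)
mask = np.ones_like(signal)
mask[40:60] = 0                                     # the gap
corrupted = signal * mask

perfect = signal.copy()                             # ideal network output
garbage = signal.copy()
garbage[40:60] = 99.0                               # nonsense inside the gap
```

Both `perfect` and `garbage` achieve zero masked loss, which is exactly why the network architecture (and not the data term) determines the in-gap reconstruction.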
In recent years, the vision transformer (ViT) has achieved remarkable breakthroughs in fine-grained visual classification (FGVC) because of its self-attention mechanism, which excels at extracting distinctive features from different pixels. However, pure ViT falls short in capturing the crucial multi-scale, local, and low-layer features that are significant for FGVC. To compensate for these shortcomings, a new hybrid network called HVCNet is designed, which fuses the advantages of ViT and convolutional neural networks (CNNs). The three modifications to the original ViT are: 1) using a multi-scale image-to-tokens (MIT) module instead of directly tokenizing the raw input image, thus enabling the network to capture features at different scales; 2) substituting the feed-forward network in ViT's encoder with a mixed convolution feed-forward (MCF) module, which enhances the network's capability to capture local and multi-scale features; 3) designing a multi-layer feature selection (MFS) module to prevent ViT's deep-layer tokens from ignoring local and low-layer features. The experimental results indicate that the proposed method surpasses state-of-the-art methods on public datasets.
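The multi-scale image-to-tokens idea in modification 1) can be sketched concretely: tokenize the image with several patch sizes, project each token to a shared embedding width, and concatenate the resulting sequences. The patch sizes, embedding dimension, and random projections below are illustrative assumptions, not HVCNet's actual MIT module:

```python
import numpy as np

def to_tokens(img: np.ndarray, patch: int) -> np.ndarray:
    """Split an HxWxC image into flattened non-overlapping patch tokens."""
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0
    t = img.reshape(h // patch, patch, w // patch, patch, c)
    t = t.transpose(0, 2, 1, 3, 4)                  # (nH, nW, patch, patch, C)
    return t.reshape(-1, patch * patch * c)         # one row per token

def multi_scale_tokens(img, patch_sizes=(4, 8, 16), embed_dim=64, seed=0):
    """Tokenize at several patch sizes and project each scale to a shared
    width, so tokens from all scales form one sequence."""
    rng = np.random.default_rng(seed)
    seqs = []
    for p in patch_sizes:
        tok = to_tokens(img, p)
        # Stand-in for a learned linear projection.
        proj = rng.standard_normal((tok.shape[1], embed_dim)) / np.sqrt(tok.shape[1])
        seqs.append(tok @ proj)
    return np.concatenate(seqs, axis=0)

img = np.zeros((32, 32, 3))
seq = multi_scale_tokens(img)                       # 64 + 16 + 4 = 84 tokens
```

A 32x32 input thus yields 64, 16, and 4 tokens at patch sizes 4, 8, and 16 respectively, giving the downstream encoder access to features at three scales.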
The rapid advancement of Artificial Intelligence (AI) has led to the displacement of traditional fossil pattern recognition methods in paleontological studies, particularly through the application of image processing technologies. This study focuses on the fossilized whorls of ancient organisms from the Yangquan region, employing state-of-the-art AI-driven techniques to identify and extract distinctive features from these fossils for automated pattern recognition. Existing paleontological databases of whorl fossils were reviewed, and a deep learning model was developed using convolutional neural networks (CNNs) to facilitate the extraction and classification of fossil whorl patterns. The model incorporates multi-level feature abstraction through various image preprocessing techniques to enhance both the accuracy and robustness of the recognition process. A transfer learning strategy based on CNNs was introduced, allowing rapid adaptation to new fossil patterns despite limited sample sizes. Furthermore, an improved feature extraction algorithm leveraging the Scale-Invariant Feature Transform (SIFT) for feature point matching was implemented, significantly improving the speed and accuracy of the feature extraction process. In the experimental phase, over 300 images of fossilized whorls were utilized for model training and validation, achieving a recognition accuracy exceeding 95%, which represents an improvement of nearly 30% over traditional manual methods. The generalization ability of the model was also evaluated, confirming its stability and reliability across diverse fossil data sets. This research underscores the transformative potential of AI-based image processing technologies in the extraction and analysis of paleontological patterns, offering new tools for the study of Yangquan fossils while also contributing to broader applications in cultural heritage preservation and scientific education. This work provides a solid foundation for the further integration of AI-driven image processing into paleontological research.
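The SIFT feature-point matching step mentioned above is conventionally paired with Lowe's ratio test: a query descriptor is matched to its nearest neighbor only when that neighbor is clearly closer than the second-nearest, which rejects ambiguous correspondences. A minimal sketch on synthetic descriptors (real SIFT descriptors are 128-D; the 8-D vectors and 0.75 ratio here are illustrative):

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Match two descriptor sets using Lowe's ratio test.

    Accepts (i, j) only when the nearest neighbour j of desc_a[i] is
    closer than `ratio` times the second-nearest: ambiguous matches
    (two almost equally good candidates) are discarded.
    """
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(d):
        j1, j2 = np.argsort(row)[:2]
        if row[j1] < ratio * row[j2]:
            matches.append((i, int(j1)))
    return matches

# desc_b is a slightly noisy copy of desc_a, so each a[i] should match b[i].
rng = np.random.default_rng(0)
a = rng.standard_normal((20, 8))
b = a + 0.01 * rng.standard_normal((20, 8))
m = ratio_test_matches(a, b)
```

With OpenCV one would obtain the descriptors via `cv2.SIFT_create().detectAndCompute` and apply the same ratio test to the two best `knnMatch` candidates.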
This study aims to explore deep learning-based image target recognition methods to improve the performance of target detection and classification in the field of computer vision. The experiments use satellite-acquired...
An artist's style can be quickly imitated by fine-tuning a text-to-image model on the artist's artworks, which raises serious copyright concerns. Scholars have proposed many watermarking methods to protect artists' copyright. To evaluate the security and enhance the performance of existing watermarking, this paper proposes, for the first time, a watermark removal attack against text-to-image generative model watermarking. This attack aims to invalidate watermarking designed to detect art-theft mimicry in text-to-image models. The method comprises a watermark recognition network and a watermark removal network. The watermark recognition network identifies whether an artwork contains a watermark, and the watermark removal network removes it. Consequently, text-to-image models fine-tuned with watermark-removed artworks can reproduce an artist's style while evading watermark detection, rendering the copyright authentication of artworks ineffective. Experiments show that the proposed attack can effectively remove watermarks, with watermark extraction accuracy dropping below 48.64%. Additionally, the images after watermark removal retain high similarity to the original images, with PSNR exceeding 27.96 and SSIM exceeding 0.92.
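The PSNR figure quoted at the end is a standard fidelity metric between the watermark-removed image and the original. A minimal sketch of how it is computed (the test images here are synthetic, not from the paper's experiments):

```python
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                         # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# A uint8 image perturbed by mild Gaussian noise stays around 30-40 dB,
# the range in which degradation is barely visible.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
noisy = np.clip(ref + rng.normal(0, 5, ref.shape), 0, 255).astype(np.uint8)
```

A PSNR above roughly 28 dB, as reported, means the removal network alters pixel values only slightly while still defeating watermark extraction.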
ISBN: (print) 9798350349405; 9798350349399
Depth sensing is of paramount importance for unmanned aerial and autonomous vehicles. Nonetheless, contemporary monocular depth estimation methods employing complex convolutional neural networks are not fast enough for real-time inference on embedded platforms. This paper addresses this challenge by proposing two efficient and lightweight architectures, RT-MonoDepth and RT-MonoDepth-S, which reduce computational complexity and latency. Our methods not only attain accuracy comparable to prior depth estimation methods but also yield faster inference speeds. Specifically, RT-MonoDepth and RT-MonoDepth-S achieve frame rates of 18.4 and 30.5 FPS on the NVIDIA Jetson Nano and 253.0 and 364.1 FPS on the Jetson AGX Orin, using a single RGB image of resolution 640x192. The experimental results underscore the superior accuracy and faster inference speed of our methods in comparison to existing fast monocular depth estimation methods on the KITTI dataset.
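Frame-rate figures like those above are conventionally obtained by timing repeated single-image inference after a warm-up phase, so one-time costs (allocation, kernel compilation, cache population) do not skew the result. A minimal, framework-agnostic sketch of such a harness (the sleeping stand-in "model" is of course an assumption, not the paper's network):

```python
import time

def benchmark_fps(infer, n_warmup: int = 10, n_runs: int = 100) -> float:
    """Average frames per second of a zero-argument inference callable.

    Warm-up iterations are run first and excluded from the measurement.
    """
    for _ in range(n_warmup):
        infer()
    start = time.perf_counter()
    for _ in range(n_runs):
        infer()
    elapsed = time.perf_counter() - start
    return n_runs / elapsed

# Stand-in "model" that sleeps 2 ms per frame: at most ~500 FPS.
fps = benchmark_fps(lambda: time.sleep(0.002), n_warmup=3, n_runs=50)
```

When benchmarking on GPUs, one must additionally synchronize the device before reading the clock, otherwise queued-but-unfinished kernels make the model look faster than it is.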
Medical image fusion analyzes multiple images obtained by the same or different medical modalities and constructs a robust image that is more useful for physicians by merging the complementary details contained in these images. Recently, pulse coupled neural network (PCNN) models have yielded efficient image fusion algorithms, but at the expense of many parameters. Here, a novel adaptive Gaussian PCNN (AGPCNN) model is proposed that requires few parameters, adopts an adaptive linking strength, and employs a Gaussian filter to effectively combine the surrounding neurons. In this paper, a new medical image fusion algorithm is introduced in the non-subsampled Shearlet transform domain that applies the novel AGPCNN to combine the high-pass sub-bands, whereas a new improved Roberts operator-based mechanism is incorporated to merge the low-pass sub-bands. The power of the proposed method is demonstrated through experimental comparisons against seven recent methods with twelve objective metrics on ten diverse medical image pairs, including the image pairs of an AIDS dementia complex patient.
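A PCNN treats each pixel as a neuron whose internal activity combines its own stimulus with Gaussian-weighted pulses from neighbors; a neuron fires when activity exceeds a decaying threshold, and firing raises the threshold again. The sketch below is a standard simplified PCNN with a Gaussian linking kernel, not the paper's AGPCNN (in particular, the adaptive linking strength is replaced by a fixed constant, and all parameter values are illustrative):

```python
import numpy as np

def gaussian_kernel(size: int = 3, sigma: float = 1.0) -> np.ndarray:
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    k[size // 2, size // 2] = 0.0                   # no self-linking
    return k / k.sum()

def pcnn_fire_counts(stim, iters=12, beta=0.2, alpha=0.7, v_theta=20.0):
    """Simplified PCNN: returns how often each neuron fired.

    stim: HxW stimulus (e.g. sub-band coefficients) in (0, 1].
    Brighter neurons fire earlier and more often, so the count map is a
    usable activity measure for fusion rules.
    """
    h, w = stim.shape
    y = np.zeros((h, w))                            # pulses from last step
    theta = np.full((h, w), v_theta)                # dynamic threshold
    fired = np.zeros((h, w))
    kern = gaussian_kernel()
    for _ in range(iters):
        padded = np.pad(y, 1)
        win = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
        link = (win * kern).sum(axis=(2, 3))        # Gaussian-weighted pulses
        u = stim * (1.0 + beta * link)              # internal activity
        y = (u > theta).astype(float)               # fire where over threshold
        theta = np.exp(-alpha) * theta + v_theta * y
        fired += y
    return fired

rng = np.random.default_rng(0)
stim = 0.1 + 0.9 * rng.random((32, 32))            # stimuli in [0.1, 1.0)
fired = pcnn_fire_counts(stim)
```

In a fusion rule, the sub-band whose coefficients produce the larger fire count at a pixel is typically the one retained in the fused image.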