Hand gesture recognition is versatile and easy to use, making it one of the best methods for facilitating human-computer interaction. High recognition performance and user independence should be the goals of real-time gesture recognition systems. Convolutional neural networks (CNNs) have recently demonstrated impressive recognition rates in image classification tasks. Motivated by this performance, we employ multi-scale deep convolutional neural networks together with the Entropy Controlled Tiger Optimization (ENcTO) classification method to recognize and classify human palms and palmprints. The processing flow comprises preprocessing of hand regions of interest using mask images, feature extraction, finger segmentation, and finger recognition with a multi-scale deep CNN classifier. A mask image is used to preprocess the hand region of the whole image. Adaptive histogram equalization is applied to boost the contrast of every pixel in the image. Next, features are extracted from the preprocessed images using the Scale-Invariant Feature Transform (SIFT). The gesture recognition pipeline first separates the fingers in the mask image, then segments the hand's region of interest and normalizes the segmented finger images. Hand images with segmented finger regions are input into a multi-scale deep CNN that classifies them into several categories using the ENcTO classification method. This research presents a high-performance, state-of-the-art approach for gesture detection and identification that combines a multi-scale deep CNN, the ENcTO classification algorithm, and augmentation techniques, achieving a recognition rate of 96.72%; the results demonstrate the superiority of the proposed method over alternative approaches. These results demonstrate how well entropy-controlled optimization and deep learning work together to increase the precision of human identification from palm images.
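The contrast-enhancement step of the pipeline above can be illustrated with a minimal sketch. The paper uses adaptive (tile-based) histogram equalization; the global variant below shows the same core idea of remapping gray levels through the normalized cumulative histogram, and is an assumption-laden simplification, not the authors' implementation.

```python
import numpy as np

def equalize_histogram(img: np.ndarray, levels: int = 256) -> np.ndarray:
    """Global histogram equalization on a uint8 grayscale image.

    The adaptive variant applies this per tile; the global version shown
    here spreads the occupied gray levels over the full dynamic range.
    """
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                       # first nonzero CDF value
    # Map each gray level through the normalized CDF.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * (levels - 1))
    lut = np.clip(lut, 0, levels - 1).astype(np.uint8)
    return lut[img]

# A low-contrast image confined to [100, 150] is stretched to [0, 255].
rng = np.random.default_rng(0)
img = rng.integers(100, 151, size=(64, 64), dtype=np.uint8)
out = equalize_histogram(img)
```

In practice OpenCV's `cv2.createCLAHE` provides the contrast-limited adaptive version, which additionally clips the histogram per tile to avoid amplifying noise.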
ISBN: (print) 9798350344868; 9798350344851
The recently discovered neural collapse (NC) phenomenon states that the last-layer weights of deep neural networks (DNNs) converge to the so-called simplex Equiangular Tight Frame (ETF) at the terminal phase of their training. This ETF geometry is equivalent to vanishing within-class variability of the last-layer activations. Inspired by NC properties, we explore in this paper the transferability of DNN models trained with their last-layer weights fixed according to the ETF. This enforces class separation by eliminating class covariance information, effectively providing implicit regularization. We show that DNN models trained with such a fixed classifier significantly improve transfer performance, particularly on out-of-domain datasets. On a broad range of fine-grained image classification datasets, our approach outperforms i) baseline methods that do not perform any covariance regularization (by up to 22%), as well as ii) methods that explicitly whiten the covariance of activations throughout training (by up to 19%). Our findings suggest that DNNs trained with fixed ETF classifiers offer a powerful mechanism for improving transfer learning across domains.
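The fixed classifier described above is a standard construction: class prototypes are unit vectors whose pairwise cosine similarity is exactly -1/(C-1), the maximally separated arrangement of C directions. A minimal sketch of how such a weight matrix can be built (the dimension, class count, and random basis here are illustrative, not taken from the paper):

```python
import numpy as np

def simplex_etf(num_classes: int, dim: int, seed: int = 0) -> np.ndarray:
    """Return a (dim, num_classes) simplex ETF classifier weight matrix.

    Columns are unit-norm class prototypes whose pairwise cosine is
    -1/(num_classes - 1): the geometry neural collapse converges to.
    """
    assert dim >= num_classes - 1
    rng = np.random.default_rng(seed)
    # Random orthonormal basis U (dim x num_classes) via QR decomposition.
    u, _ = np.linalg.qr(rng.standard_normal((dim, num_classes)))
    c = num_classes
    center = np.eye(c) - np.ones((c, c)) / c        # remove the global mean
    return np.sqrt(c / (c - 1)) * u @ center

W = simplex_etf(num_classes=10, dim=64)
G = W.T @ W                                         # Gram matrix of prototypes
```

Training then proceeds with `W` frozen; only the feature extractor is optimized, which is what removes class covariance information from the head.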
Outdoor haze images are typically degraded by noise due to the external environment and imaging equipment. Existing haze image enhancement methods ignore the interrelation between haze and noise and therefore cannot suppress the noise and remove the haze simultaneously. To address these intractable problems, a dual-branch architecture that combines dehazing and denoising is proposed here to restore clear images. First, the image dehazing branch adopts the dark channel prior and unsupervised networks to remove the haze. Then, the image denoising branch removes the image noise in parallel by constructing a mean/extreme sampler and a self-supervised network. Finally, a convolutional neural network fusion strategy is presented to fuse the output images from the two branches and generate the final results. Extensive experiments reveal that the proposed haze image enhancement method outperforms other state-of-the-art methods in terms of peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM).
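The dark channel prior mentioned in the dehazing branch has a simple closed form: for each pixel, take the minimum intensity over the three color channels within a local patch; haze-free outdoor regions tend toward zero, so large values indicate haze. A minimal numpy sketch (patch size and test image are illustrative assumptions):

```python
import numpy as np

def dark_channel(img: np.ndarray, patch: int = 3) -> np.ndarray:
    """Dark channel prior of an HxWx3 image with values in [0, 1]."""
    per_pixel_min = img.min(axis=2)                 # min over color channels
    pad = patch // 2
    padded = np.pad(per_pixel_min, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (patch, patch))
    return windows.min(axis=(2, 3))                 # min over the local patch

# A synthetic haze-free image: one channel is near zero everywhere, so the
# dark channel should be close to zero as the prior predicts.
rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
img[..., 2] = 0.01                                  # dark blue channel
dc = dark_channel(img)
```

In a full dehazing pipeline the dark channel is then used to estimate the transmission map and atmospheric light before inverting the haze imaging model.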
In this manuscript, we propose a novel method to perform audio inpainting, i.e., the restoration of audio signals presenting multiple missing parts. Audio inpainting can be interpreted in the context of inverse problems as the task of reconstructing an audio signal from its corrupted observation. For this reason, our method is based on a deep prior approach, a recently proposed technique that has proved effective in the solution of many inverse problems, including image inpainting. Deep prior allows one to consider the structure of a neural network as an implicit prior and to adopt it as a regularizer. Differently from the classical deep learning paradigm, deep prior performs single-element training and thus can be applied to corrupted audio signals independently of any available training data set. In the context of audio inpainting, a network presenting relevant audio priors can generate a restored version of an audio signal, provided only with its corrupted observation. Our method exploits a time-frequency representation of audio signals and makes use of a multi-resolution convolutional autoencoder that has been enhanced to perform the harmonic convolution operation. Results show that the proposed technique is able to provide a coherent and meaningful reconstruction of the corrupted audio. It is also able to outperform the methods considered for comparison, in its domain of application.
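The key mechanism in deep-prior inpainting is that the reconstruction loss is evaluated only on the reliable (observed) samples; the gaps contribute nothing, so the network's structure alone decides how they are filled. A minimal sketch of that masked loss (the signal and gap positions are illustrative, and the "network output" is stubbed out, so this shows the loss, not the paper's autoencoder):

```python
import numpy as np

def masked_mse(pred: np.ndarray, observed: np.ndarray, mask: np.ndarray) -> float:
    """Deep-prior reconstruction loss: MSE restricted to observed samples.

    Where mask == 0 (the gaps) the network output is never penalized, so
    the implicit prior of the architecture fills them in.
    """
    diff = (pred - observed) * mask
    return float((diff ** 2).sum() / mask.sum())

# A sinusoid with 20 missing samples.
t = np.linspace(0, 1, 100)
signal = np.sin(2 * np.pi * 5 * t)
mask = np.ones_like(signal)
mask[40:60] = 0                                     # the gap
corrupted = signal * mask

perfect = signal.copy()                             # ideal network output
garbage = signal.copy()
garbage[40:60] = 99.0                               # nonsense inside the gap
```

Both `perfect` and `garbage` achieve zero masked loss, which is exactly why the network architecture (and not the data term) determines the in-gap reconstruction.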
In recent years, the vision transformer (ViT) has achieved remarkable breakthroughs in fine-grained visual classification (FGVC) because of its self-attention mechanism, which excels at extracting distinctive features from different pixels. However, pure ViT falls short in capturing the crucial multi-scale, local, and low-layer features that are significant for FGVC. To compensate for these shortcomings, a new hybrid network called HVCNet is designed, which fuses the advantages of ViT and convolutional neural networks (CNNs). The three modifications to the original ViT are: 1) using a multi-scale image-to-tokens (MIT) module instead of directly tokenizing the raw input image, thus enabling the network to capture features at different scales; 2) substituting the feed-forward network in ViT's encoder with a mixed convolution feed-forward (MCF) module, which enhances the network's capability to capture local and multi-scale features; 3) designing a multi-layer feature selection (MFS) module to prevent ViT's deep-layer tokens from ignoring local and low-layer features. The experimental results indicate that the proposed method surpasses state-of-the-art methods on public datasets.
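The multi-scale image-to-tokens idea in modification 1) can be sketched concretely: tokenize the image with several patch sizes, project each token to a shared embedding width, and concatenate the resulting sequences. The patch sizes, embedding dimension, and random projections below are illustrative assumptions, not HVCNet's actual MIT module:

```python
import numpy as np

def to_tokens(img: np.ndarray, patch: int) -> np.ndarray:
    """Split an HxWxC image into flattened non-overlapping patch tokens."""
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0
    t = img.reshape(h // patch, patch, w // patch, patch, c)
    t = t.transpose(0, 2, 1, 3, 4)                  # (nH, nW, patch, patch, C)
    return t.reshape(-1, patch * patch * c)         # one row per token

def multi_scale_tokens(img, patch_sizes=(4, 8, 16), embed_dim=64, seed=0):
    """Tokenize at several patch sizes and project each scale to a shared
    width, so tokens from all scales form one sequence."""
    rng = np.random.default_rng(seed)
    seqs = []
    for p in patch_sizes:
        tok = to_tokens(img, p)
        # Stand-in for a learned linear projection.
        proj = rng.standard_normal((tok.shape[1], embed_dim)) / np.sqrt(tok.shape[1])
        seqs.append(tok @ proj)
    return np.concatenate(seqs, axis=0)

img = np.zeros((32, 32, 3))
seq = multi_scale_tokens(img)                       # 64 + 16 + 4 = 84 tokens
```

A 32x32 input thus yields 64, 16, and 4 tokens at patch sizes 4, 8, and 16 respectively, giving the downstream encoder access to features at three scales.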
The rapid advancement of Artificial Intelligence (AI) has led to the displacement of traditional fossil pattern recognition methods in paleontological studies, particularly through the application of image processing technologies. This study focuses on the fossilized whorls of ancient organisms from the Yangquan region, employing state-of-the-art AI-driven techniques to identify and extract distinctive features from these fossils for automated pattern recognition. Existing paleontological databases of whorl fossils were reviewed, and a deep learning model was developed using convolutional neural networks (CNNs) to facilitate the extraction and classification of fossil whorl patterns. The model incorporates multi-level feature abstraction through various image preprocessing techniques to enhance both the accuracy and robustness of the recognition process. A transfer learning strategy based on CNNs was introduced, allowing rapid adaptation to new fossil patterns despite limited sample sizes. Furthermore, an improved feature extraction algorithm leveraging the Scale-Invariant Feature Transform (SIFT) for feature point matching was implemented, significantly improving the speed and accuracy of the feature extraction process. In the experimental phase, over 300 images of fossilized whorls were utilized for model training and validation, achieving a recognition accuracy exceeding 95%, which represents an improvement of nearly 30% over traditional manual methods. The generalization ability of the model was also evaluated, confirming its stability and reliability across diverse fossil data sets. This research underscores the transformative potential of AI-based image processing technologies in the extraction and analysis of paleontological patterns, offering new tools for the study of Yangquan fossils while also contributing to broader applications in cultural heritage preservation and scientific education. This work provides a solid foundation for the further integration of AI-driven image processing into paleontological research.
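The SIFT feature-point matching step mentioned above is conventionally paired with Lowe's ratio test: a query descriptor is matched to its nearest neighbor only when that neighbor is clearly closer than the second-nearest, which rejects ambiguous correspondences. A minimal sketch on synthetic descriptors (real SIFT descriptors are 128-D; the 8-D vectors and 0.75 ratio here are illustrative):

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Match two descriptor sets using Lowe's ratio test.

    Accepts (i, j) only when the nearest neighbour j of desc_a[i] is
    closer than `ratio` times the second-nearest: ambiguous matches
    (two almost equally good candidates) are discarded.
    """
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(d):
        j1, j2 = np.argsort(row)[:2]
        if row[j1] < ratio * row[j2]:
            matches.append((i, int(j1)))
    return matches

# desc_b is a slightly noisy copy of desc_a, so each a[i] should match b[i].
rng = np.random.default_rng(0)
a = rng.standard_normal((20, 8))
b = a + 0.01 * rng.standard_normal((20, 8))
m = ratio_test_matches(a, b)
```

With OpenCV one would obtain the descriptors via `cv2.SIFT_create().detectAndCompute` and apply the same ratio test to the two best `knnMatch` candidates.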
This study aims to explore deep learning-based image target recognition methods to improve the performance of target detection and classification in the field of computer vision. The experiments use satellite-acquired...
An artist's style can be quickly imitated by fine-tuning a text-to-image model on the artist's artworks, which raises serious copyright concerns. Scholars have proposed many watermarking methods to protect artists' copyright. To evaluate the security and enhance the performance of existing watermarking, this paper proposes, for the first time, a watermark removal attack against text-to-image generative model watermarking. This attack aims to invalidate watermarking designed to detect art-theft mimicry in text-to-image models. The method comprises a watermark recognition network and a watermark removal network. The watermark recognition network identifies whether an artwork contains a watermark, and the watermark removal network removes it. Consequently, text-to-image models fine-tuned with watermark-removed artworks can reproduce an artist's style while evading watermark detection, rendering the copyright authentication of artworks ineffective. Experiments show that the proposed attack can effectively remove watermarks, with watermark extraction accuracy dropping below 48.64%. Additionally, the images after watermark removal retain high similarity to the original images, with PSNR exceeding 27.96 and SSIM exceeding 0.92.
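The PSNR figure quoted at the end is a standard fidelity metric between the watermark-removed image and the original. A minimal sketch of how it is computed (the test images here are synthetic, not from the paper's experiments):

```python
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                         # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# A uint8 image perturbed by mild Gaussian noise stays around 30-40 dB,
# the range in which degradation is barely visible.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
noisy = np.clip(ref + rng.normal(0, 5, ref.shape), 0, 255).astype(np.uint8)
```

A PSNR above roughly 28 dB, as reported, means the removal network alters pixel values only slightly while still defeating watermark extraction.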
ISBN: (print) 9798350349405; 9798350349399
Depth sensing is of paramount importance for unmanned aerial and autonomous vehicles. Nonetheless, contemporary monocular depth estimation methods employing complex convolutional neural networks are not fast enough for real-time inference on embedded platforms. This paper addresses this challenge by proposing two efficient and lightweight architectures, RT-MonoDepth and RT-MonoDepth-S, which reduce computational complexity and latency. Our methods not only attain accuracy comparable to prior depth estimation methods but also yield faster inference speeds. Specifically, RT-MonoDepth and RT-MonoDepth-S achieve frame rates of 18.4 and 30.5 FPS on the NVIDIA Jetson Nano and 253.0 and 364.1 FPS on the Jetson AGX Orin, using a single RGB image of resolution 640x192. The experimental results underscore the superior accuracy and faster inference speed of our methods in comparison to existing fast monocular depth estimation methods on the KITTI dataset.
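Frame-rate figures like those above are conventionally obtained by timing repeated single-image inference after a warm-up phase, so one-time costs (allocation, kernel compilation, cache population) do not skew the result. A minimal, framework-agnostic sketch of such a harness (the sleeping stand-in "model" is of course an assumption, not the paper's network):

```python
import time

def benchmark_fps(infer, n_warmup: int = 10, n_runs: int = 100) -> float:
    """Average frames per second of a zero-argument inference callable.

    Warm-up iterations are run first and excluded from the measurement.
    """
    for _ in range(n_warmup):
        infer()
    start = time.perf_counter()
    for _ in range(n_runs):
        infer()
    elapsed = time.perf_counter() - start
    return n_runs / elapsed

# Stand-in "model" that sleeps 2 ms per frame: at most ~500 FPS.
fps = benchmark_fps(lambda: time.sleep(0.002), n_warmup=3, n_runs=50)
```

When benchmarking on GPUs, one must additionally synchronize the device before reading the clock, otherwise queued-but-unfinished kernels make the model look faster than it is.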
Medical image fusion analyzes multiple images obtained by the same or different medical modalities and constructs a robust image that is more useful for physicians by merging the complementary details contained in these images. Recently, pulse coupled neural network (PCNN) models have yielded efficient image fusion algorithms, but at the expense of many parameters. Here, a novel adaptive Gaussian PCNN (AGPCNN) model is proposed that requires few parameters, adopts an adaptive linking strength, and employs a Gaussian filter to effectively combine the surrounding neurons. In this paper, a new medical image fusion algorithm is introduced in the non-subsampled Shearlet transform domain that applies the novel AGPCNN to combine the high-pass sub-bands, whereas a new improved Roberts operator-based mechanism is incorporated to merge the low-pass sub-bands. The power of the proposed method is demonstrated through experimental comparisons against seven recent methods with twelve objective metrics on ten diverse medical image pairs, including the image pairs of an AIDS dementia complex patient.
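A PCNN treats each pixel as a neuron whose internal activity combines its own stimulus with Gaussian-weighted pulses from neighbors; a neuron fires when activity exceeds a decaying threshold, and firing raises the threshold again. The sketch below is a standard simplified PCNN with a Gaussian linking kernel, not the paper's AGPCNN (in particular, the adaptive linking strength is replaced by a fixed constant, and all parameter values are illustrative):

```python
import numpy as np

def gaussian_kernel(size: int = 3, sigma: float = 1.0) -> np.ndarray:
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    k[size // 2, size // 2] = 0.0                   # no self-linking
    return k / k.sum()

def pcnn_fire_counts(stim, iters=12, beta=0.2, alpha=0.7, v_theta=20.0):
    """Simplified PCNN: returns how often each neuron fired.

    stim: HxW stimulus (e.g. sub-band coefficients) in (0, 1].
    Brighter neurons fire earlier and more often, so the count map is a
    usable activity measure for fusion rules.
    """
    h, w = stim.shape
    y = np.zeros((h, w))                            # pulses from last step
    theta = np.full((h, w), v_theta)                # dynamic threshold
    fired = np.zeros((h, w))
    kern = gaussian_kernel()
    for _ in range(iters):
        padded = np.pad(y, 1)
        win = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
        link = (win * kern).sum(axis=(2, 3))        # Gaussian-weighted pulses
        u = stim * (1.0 + beta * link)              # internal activity
        y = (u > theta).astype(float)               # fire where over threshold
        theta = np.exp(-alpha) * theta + v_theta * y
        fired += y
    return fired

rng = np.random.default_rng(0)
stim = 0.1 + 0.9 * rng.random((32, 32))            # stimuli in [0.1, 1.0)
fired = pcnn_fire_counts(stim)
```

In a fusion rule, the sub-band whose coefficients produce the larger fire count at a pixel is typically the one retained in the fused image.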