Search Results - Inner Mongolia University Library

Elastic temporal alignment for few-shot action recognition

IET COMPUTER VISION, 2023, Vol. 17, No. 1, pp. 39-50

Authors: Pan, Fei; Xu, Chunlei; Zhang, Hongjie; Guo, Jie; Guo, Yanwen. Nanjing Univ, Natl Key Lab Novel Software Technol, 163 Xianlin Ave, Nanjing, Peoples R China

Few-shot action recognition aims to learn a classification model with good generalisation ability when trained with only a few labelled videos. However, it is difficult to learn discriminative feature representations for videos in such a setting. The Elastic Temporal Alignment (ETA) for few-shot action recognition is proposed. First, a convolutional neural network is employed to extract feature representations of video frames sparsely sampled from videos. In order to obtain the similarity of two videos, a temporal alignment estimation function is utilised to estimate the matching score between each pair of frames from the two videos through an elastic alignment mechanism. The analysis shows that when we judge whether two frames from respective videos are matched, multiple adjacent frames in the videos should be considered, so as to embody the temporal information. Thus, before feeding per-frame feature vectors of videos into the temporal alignment estimation function, a temporal message passing function is leveraged to propagate the information of per-frame features in the temporal domain. The method has been evaluated on four action recognition datasets, including Kinetics, Something-Something V2, HMDB51, and UCF101. The experimental results verify the effectiveness of ETA and show its superiority over state-of-the-art methods.

Keywords: temporal alignment estimation function; feature extraction; temporal message passing function; video signal processing; image representation; learning (artificial intelligence); neural nets; elastic alignment mechanism; per-frame feature vectors; message passing; multiple adjacent frames; action recognition datasets; computer vision and image processing techniques; Elastic Temporal Alignment; convolutional neural nets; per-frame features; respective videos; image recognition; temporal information; temporal domain; labelled videos; few-shot action recognition; discriminative feature representations; image motion analysis; video frames

A Representation-Enhanced Vibration Signal Imaging Method Based on MTF-NMF for Φ-OTDR Recognition

JOURNAL OF LIGHTWAVE TECHNOLOGY, 2024, Vol. 42, No. 18, pp. 6395-6401

Authors: Wei, Ziyi; Dai, Jingyi; Huang, Yi; Shen, Wei; Hu, Chengyong; Pang, Fufei; Zhang, Xiaobei; Wang, Tingyun. Shanghai Univ, Shanghai Inst Adv Commun & Data Sci, Key Lab Specialty Fiber Opt & Opt Access Networks, Shanghai 200444, Peoples R China

The phase signals, combined with time-domain signal processing methods, are often used for recognition with the phase-sensitive optical time-domain reflectometer (Phi-OTDR). Considering the advanced and sophisticated algorithms prevalent in the field of image processing, a vibration signal imaging method is proposed to enhance the adaptability of phase signals for learning by image networks. The phase time series is converted to a Markov Transition Field (MTF) matrix, from which the basis matrix is extracted by Non-negative Matrix Factorization (NMF) and saved as an RGB image. A one-dimensional (1-D) Convolutional Neural Network (CNN) and a 2-D CNN are applied in the experiment to classify the phase signals and images, respectively. The experimental results show that the training convergence efficiency of the 2-D CNN using NMF-MTF images is significantly higher than that of the 1-D CNN, demonstrating the effectiveness of converting phase signals into images. In addition, the average recognition accuracy for the four fence events is improved by more than 13% by introducing the NMF algorithm on the MTF matrix.

Keywords: Distributed sensing; disturbance recognition; Markov transition fields; non-negative matrix factorization; Phi-OTDR
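
The conversion of a time series into a Markov Transition Field mentioned in this abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the quantile binning, the default of four bins, and the function names `quantile_bins` and `markov_transition_field` are assumptions made for illustration, and the NMF and RGB-encoding steps are omitted.

```python
from bisect import bisect_right


def quantile_bins(series, n_bins):
    """Assign each sample to one of n_bins quantile bins (0 .. n_bins - 1)."""
    ordered = sorted(series)
    n = len(ordered)
    # Bin edges are taken at the interior quantiles of the sorted samples.
    edges = [ordered[(k * n) // n_bins] for k in range(1, n_bins)]
    return [bisect_right(edges, x) for x in series]


def markov_transition_field(series, n_bins=4):
    """Return the len(series) x len(series) MTF as a list of lists."""
    states = quantile_bins(series, n_bins)
    # Count first-order transitions between consecutive samples.
    counts = [[0.0] * n_bins for _ in range(n_bins)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1.0
    # Row-normalise the counts into transition probabilities.
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    # Entry (i, j) is the probability of moving from the bin of
    # sample i to the bin of sample j.
    return [[probs[si][sj] for sj in states] for si in states]
```

Calling `markov_transition_field(signal)` on a list of floats gives a square matrix that can be treated as a single-channel image.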

Cross-Domain Deepfake Detection Based on Latent Domain Knowledge Distillation

IEEE SIGNAL PROCESSING LETTERS, 2025, Vol. 32, pp. 896-900

Authors: Wang, Chunpeng; Meng, Lingshan; Xia, Zhiqiu; Ren, Na; Ma, Bin. Qilu Univ Technol, Shandong Acad Sci, Shandong Comp Sci Ctr, Minist Educ, Key Lab Comp Power Network & Informat, Jinan 250353, Peoples R China; Shenyang Inst Engn, Coll Informat, Shenyang 110136, Peoples R China

The rapid development of deepfake technology poses challenges to face-centered data security. Existing methods primarily focus on how to transfer deepfake detectors from the source domain to the target domain to handle diverse deepfake techniques. In practical application scenarios, it is usually difficult to access the true and false labels of the source domain. In this letter, we introduce a new adaptation framework called Latent Domain Knowledge Distillation (LDKD) for cross-domain deepfake detection. In the proposed framework, we construct a knowledge distillation structure that includes a student network and a teacher network, which are jointly optimized in a coupled manner to facilitate the model's adaptation to the target domain. Furthermore, to improve the quality of pseudo-labels generated by the teacher network, we propose a Fourier Latent Domain Generation Module (FLGM) and a Stochastic Complementary Mask Module (SCMM). The former is used to generate latent domains to bridge domain differences at the image level, while the latter is employed to mine richer contextual cues for the model. Extensive cross-domain experimental results demonstrate that our method achieves state-of-the-art performance, and the model analysis proves the effectiveness of our key components.

Keywords: Cross-domain deepfake detection; Fourier latent domain generation; knowledge distillation; stochastic complementary mask

1D-convolutional neural network approach and feature extraction methods for automatic detection of schizophrenia

SIGNAL IMAGE AND VIDEO PROCESSING, 2023, Vol. 17, No. 5, pp. 2627-2636

Author: Goker, Hanife. Kutahya Dumlupinar Univ, Fac Simav Technol, Dept Elect Elect Engn, TR-43500 Kutahya, Turkiye

Schizophrenia is a complex psychiatric disorder characterized by delusions, hallucinations, disorganized speech, mood disturbances, and abnormal behavior. Early diagnosis of schizophrenia depends on the manifestation of the disorder; its symptoms are complex and heterogeneous, and cannot be clearly separated from other neurological categories. Therefore, its early diagnosis is quite difficult. An objective, effective and simple diagnostic model and procedure are essential for diagnosing schizophrenia. Electroencephalography (EEG)-based models are a strong candidate to overcome these limits. In this study, we proposed an EEG-based solution for the diagnosis of schizophrenia using a 1D-convolutional neural network deep learning approach and the multitaper method. Firstly, the raw EEG signals were segmented and denoised using multiscale principal component analysis. Then, three different feature sets were extracted using leading feature extraction methods such as periodogram, Welch, and multitaper. The performance of each feature extraction method was compared. Finally, the classification performance of support vector machine, decision trees, k-nearest neighbors, and 1D-convolutional neural network algorithms was tested according to model evaluation criteria. The highest performance was obtained with the multitaper and 1D-convolutional neural network approach, and the highest accuracy was 98.76%. The results of the model were found to be 0.991 sensitivity, 0.984 precision, 0.983 specificity, 0.975 Matthews correlation coefficient, 0.987 f1-score, and 0.975 kappa statistic. This study presents the multitaper and 1D-convolutional neural network approach framework for the first time in the diagnosis of schizophrenia. Moreover, this study achieved satisfactorily high classification performance for the diagnosis of schizophrenia compared to methods in the relevant literature.

Keywords: signal processing; Deep learning; 1D-convolutional neural network; Feature extraction; EEG; Schizophrenia
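
The evaluation criteria listed in this abstract (sensitivity, precision, specificity, Matthews correlation coefficient, F1-score, and kappa) all follow from the four counts of a binary confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (each class and each
    prediction occurs at least once).
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / total
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: observed agreement corrected for chance agreement.
    chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - chance) / (1 - chance)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }
```

For example, `binary_metrics(tp=40, fp=10, tn=45, fn=5)` returns a dictionary keyed by metric name for a hypothetical test set of 100 recordings.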

Reliable Automated ECG Arrhythmia Classification Using Reinforced VGG-27 Neural Network Framework

INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, 2025, Vol. 39, No. 1, pp. 163-176

Authors: Thite, Trupti G.; Jagtap, Sonal K. GH Raisoni Coll Engn & Management, Dept Elect & Telecommun Engn, Pune, Maharashtra, India; Trinity Acad Engn, Dept Elect & Telecommun, Pune, Maharashtra, India; STESs Smt Kashibai Navale Coll Engn, Dept Elect & Telecommun Engn, Pune, Maharashtra, India

Automated categorization of electrocardiogram (ECG) waveforms using deep learning (DL) methods has garnered considerable attention in recent research. However, prevalent DL networks encounter challenges including overfitting, class imbalance, limitations in deeper network training, and high computational demands. To address these issues, this study proposes an Automated ECG Arrhythmia Classification framework employing the Reinforced Visual Geometry Group-27 (REF-VGG-27). Initially, the framework encompasses preprocessing steps such as denoising, R-peak identification, data balancing, and cross-validation. For automatic feature extraction and classification, two DL architectures are suggested: a novel hybrid model combining 2D convolutional neural network (2DCNN) with VGG-16, featuring a deep architecture for extracting morphological characteristics, frequency features related to heart rate variability (HRV), and statistical attributes crucial for identifying atrial fibrillation (AF). Subsequently, to classify arrhythmia patterns, the VGG-16 Model is employed. Utilizing publicly available ECG image datasets, the proposed model achieved remarkable accuracy benchmarks: 99.61% accuracy, precision of 99.61%, and recall of 99.48%. Comparative analysis with existing approaches substantiates the efficiency and robustness of our model.

Keywords: arrhythmia detection; convolutional neural network; deep learning; electrocardiogram; feature extraction and classification

Advancements in Image Feature-Based Classification of Motor Imagery EEG Data: A Comprehensive Review

TRAITEMENT DU SIGNAL, 2023, Vol. 40, No. 5, pp. 1857-1868

Authors: Yilmaz, Cagatay Murat; Yilmaz, Bahar Hatipoglu. Karadeniz Tech Univ, Dept Software Engn, TR-61080 Trabzon, Turkiye; Karadeniz Tech Univ, Dept Comp Engn, TR-61080 Trabzon, Turkiye

Non-invasive acquisition and analysis of human brain signals play a crucial role in the development of brain-computer interfaces, enabling their widespread applicability in daily life. Motor imagery has emerged as a prominent technique for the advancement of such interfaces. While initial machine and deep learning studies have shown promising results in the context of motor imagery, several challenges remain to be addressed prior to their extensive adoption. Deep learning, renowned for its automated feature extraction and classification capabilities, has been successfully employed in various domains. Notably, recent research efforts have focused on processing and classifying motor imagery EEG signals using two-dimensional data formats, yielding noteworthy advancements. Although existing literature encompasses reviews primarily centered on machine learning or deep learning techniques, this paper uniquely emphasizes the review of methods for constructing two-dimensional image features, marking the first comprehensive exploration of this subject. In this study, we present an overview of datasets, survey a range of signal-to-image conversion methods, and discuss classification approaches. Furthermore, we comprehensively examine the current challenges and outline future directions for this research domain.

Keywords: motor imagery; brain-computer interface; signal-to-image conversion; short-time Fourier transform; deep learning; convolutional neural networks; reviews

Improved Step-Size Schedules for Proximal Noisy Gradient Methods

IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2023, Vol. 71, pp. 189-201

Authors: Khirirat, Sarit; Wang, Xiaoyu; Magnusson, Sindri; Johansson, Mikael. Royal Inst Technol KTH, Div Decis & Control Syst, S-11428 Stockholm, Sweden; Stockholm Univ, Dept Comp & Syst Sci, S-11419 Stockholm, Sweden

Noisy gradient algorithms have emerged as one of the most popular algorithms for distributed optimization with massive data. Choosing proper step-size schedules is an important task to tune in the algorithms for good performance. For the algorithms to attain fast convergence and high accuracy, it is intuitive to use large step-sizes in the initial iterations when the gradient noise is typically small compared to the algorithm-steps, and reduce the step-sizes as the algorithm progresses. This intuition has been confirmed in theory and practice for stochastic gradient descent. However, similar results are lacking for other methods using approximate gradients. This paper shows that the diminishing step-size strategies can indeed be applied for a broad class of noisy gradient algorithms. Our analysis framework is based on two classes of systems that characterize the impact of the step-sizes on the convergence performance of many algorithms. Our results show that such step-size schedules enable these algorithms to enjoy the optimal rate. We exemplify our results on stochastic compression algorithms. Our experiments validate fast convergence of these algorithms with the step decay schedules.

Keywords: Gradient methods; compression algorithms; convex functions; machine learning; neural networks
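
The intuition in this abstract, large step-sizes early and smaller ones as the iterates approach the noise floor, can be sketched with a step decay schedule on a toy problem. This is only an illustration under stated assumptions: the quadratic objective, the Gaussian noise model, the halving factor, and the names `step_decay` and `noisy_gradient_descent` are hypothetical, and the proximal and compression aspects studied in the paper are not modelled.

```python
import random


def step_decay(eta0, k, period, factor=0.5):
    """Step-size at iteration k: eta0 scaled by `factor` once every `period` iterations."""
    return eta0 * factor ** (k // period)


def noisy_gradient_descent(x0, iters, eta0, period, noise_std=0.1, seed=0):
    """Minimise f(x) = 0.5 * x**2 using gradients corrupted by Gaussian noise."""
    rng = random.Random(seed)
    x = x0
    for k in range(iters):
        # The exact gradient of f at x is x; add zero-mean noise to it.
        noisy_grad = x + rng.gauss(0.0, noise_std)
        x -= step_decay(eta0, k, period) * noisy_grad
    return x
```

With a constant step-size the iterate keeps fluctuating at a level set by the noise, whereas `noisy_gradient_descent(5.0, 200, 0.5, 20)` shrinks that fluctuation each time the step-size is halved.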

Age Estimation from Speech Using Tuned CNN Model on Edge Devices

JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2024, Vol. 96, No. 10, pp. 569-585

Authors: Durgam, Laxmi Kantham; Jatoth, Ravi Kumar. NIT Warangal, Dept Elect & Commun Engn, Hanamkonda 506004, Telangana, India

The speaker's emotions, age, and gender have all been ascertained through imaginative investigation. This information can be applied to communications and to common applications like biometric identification and human-machine interactions. The Edge Impulse framework employs a tiny model that has been trained to identify the speaker's age based on speech attributes. As a result, a speaker's age can be inferred from their voice. With the help of an external microphone connected to the Jetson Nano and the MP34DT05 digital microphone on the Arduino Nano BLE 33 device, it is possible to record and determine a person's age from their speech in real-time applications. Making an effective human-machine interface for practical applications is speech recognition's fundamental goal. The Arduino Nano BLE 33 has an integrated RGB LED that enables it to indicate whether a speaker is a child or an adult. A red LED will be used to signify a child speaker, while a blue LED will be used to identify an adult speaker. The proposed tuned deep convolution neural networks outperform the more commonly used convolutional neural networks in tests compared to training *** proposed tuned 1D CNN with MFCC speech features are outperforming compared to existing traditional methods. The Nvidia Jetson Nano and Nano BLE 33 Microcontrollers are ideal for applications needing speaker age detection because of their low power consumption, ease of use, small size, and excellent computational performance.

Keywords: Speech recognition; Age identification; Tiny ML; Edge impulse; Convolutional neural networks

An Asynchronous Spiking Neural Membrane System for Edge Detection

INTERNATIONAL JOURNAL OF NEURAL SYSTEMS, 2024, Vol. 34, No. 6, pp. 2450023-2450023

Authors: Zhang, Luping; Xu, Fei; Neri, Ferrante. East China Univ Technol, Jiangxi Engn Technol Res Ctr Nucl Geosci Data Sci, Sch Informat Engn, Jiangxi Engn Lab Radioact Geosci & Big Data Techno, Nanchang 330013, Peoples R China; Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Key Lab Image Informat Proc & Intelligent Control, Educ Minist China, Wuhan 430074, Peoples R China; Univ Surrey, Sch Comp Sci & Elect Engn, NICE Res Grp, Guildford GU2 7XH, Surrey, England

Spiking neural membrane systems (SN P systems) are a class of bio-inspired models inspired by the activities and connectivity of neurons. Extensive studies have been made on SN P systems with synchronization-based communication, while further efforts are needed for the systems with rhythm-based communication. In this work, we design an asynchronous SN P system with resonant connections where all the enabled neurons in the same group connected by resonant connections should instantly produce spikes with the same rhythm. In the designed system, each of the three modules implements one type of the three operations associated with the edge detection of digital images, and they collaborate with each other through the resonant connections. An algorithm called EDSNP for edge detection is proposed to simulate the working of the designed asynchronous SN P system. A quantitative analysis of EDSNP and the related methods for edge detection has been conducted to evaluate the performance of EDSNP. The performance of EDSNP in processing the testing images is superior to the compared methods, based on the quantitative metrics of accuracy, error rate, mean square error, peak signal-to-noise ratio and true positive rate. The results indicate the potential of the temporal firing and the proper neuronal connections in the SN P system to achieve good performance in edge detection.

Keywords: Bio-inspired computing; membrane computing; spiking neural P system; communication network; image processing
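
Two of the quantitative metrics named in this abstract, mean square error and peak signal-to-noise ratio, have standard definitions that can be sketched briefly. The code below assumes greyscale images stored as equally sized lists of rows with a peak value of 255; the function names are hypothetical, and nothing here reproduces the EDSNP algorithm itself.

```python
import math


def mean_square_error(img_a, img_b):
    """MSE between two equally sized greyscale images given as lists of rows."""
    diffs = [
        (a - b) ** 2
        for row_a, row_b in zip(img_a, img_b)
        for a, b in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


def psnr(img_a, img_b, max_value=255.0):
    """Peak signal-to-noise ratio in decibels; infinite for identical images."""
    mse = mean_square_error(img_a, img_b)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR (equivalently, a lower MSE) means the detected edge map is closer to the reference edge map.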

INVERTIBLE MOSAIC IMAGE HIDING NETWORK FOR VERY LARGE CAPACITY IMAGE STEGANOGRAPHY

49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Chen, Zihan; Liu, Tianrui; Huang, Jun-Jie; Zhao, Wentao; Bi, Xing; Wang, Meng. Natl Univ Def Technol, Coll Comp Sci & Technol, Changsha, Peoples R China; Natl Univ Def Technol, Coll Syst Engn, Changsha, Peoples R China; Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Peoples R China

ISBN: (print) 9798350344868; 9798350344851

The existing image steganography methods either sequentially conceal secret images or conceal a concatenation of multiple images. In such ways, the interference of information among multiple images becomes increasingly severe as the number of secret images grows, thus restricting the development of very large capacity image steganography. In this paper, we propose an Invertible Mosaic Image Hiding Network (InvMIHNet) which realizes very large capacity image steganography with high quality by concealing a single mosaic secret image. InvMIHNet consists of an Invertible Image Rescaling (IIR) module and an Invertible Image Hiding (IIH) module. The IIR module downscales the single mosaic secret image formed by spatially splicing the multiple secret images, and the IIH module then conceals this mosaic image under the cover image. The proposed InvMIHNet successfully conceals and reveals up to 16 secret images with a small number of parameters and low memory consumption. Extensive experiments on ImageNet-1K, COCO and DIV2K show that InvMIHNet outperforms state-of-the-art methods in terms of the imperceptibility of the stego image, the recovery accuracy of the secret images, and security against steganalysis methods. The code is available at https://***/Brittany-Chen/InvMIHNet.

Keywords: image steganography; image rescaling; Invertible Neural Networks
