Few-shot action recognition aims to learn a classification model that generalises well when trained with only a few labelled videos. However, it is difficult to learn discriminative feature representations for videos in such a setting. Elastic Temporal Alignment (ETA) is proposed for few-shot action recognition. First, a convolutional neural network extracts feature representations of frames sparsely sampled from each video. To obtain the similarity of two videos, a temporal alignment estimation function estimates the matching score between each pair of frames from the two videos through an elastic alignment mechanism. The analysis shows that, when judging whether two frames from different videos match, multiple adjacent frames in each video should be considered so that temporal information is taken into account. Thus, before the per-frame feature vectors are fed into the temporal alignment estimation function, a temporal message passing function propagates per-frame feature information in the temporal domain. The method has been evaluated on four action recognition datasets: Kinetics, Something-Something V2, HMDB51, and UCF101. The experimental results verify the effectiveness of ETA and show its superiority over state-of-the-art methods.
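The two stages described in this abstract (temporal message passing, then frame-level alignment) can be illustrated with a toy sketch. The abstract does not define either function, so everything here is an assumption for illustration: `temporal_smooth` stands in for message passing by averaging neighbouring frame features, and `video_similarity` stands in for elastic alignment by letting each frame match the best frame within a small positional `slack`. Both videos are assumed to have the same number of sampled frames. This is not the ETA method itself.

```python
import math


def temporal_smooth(frames, radius=1):
    """Average each frame feature with its neighbours within `radius` (hypothetical stand-in for message passing)."""
    n = len(frames)
    smoothed = []
    for t in range(n):
        lo, hi = max(0, t - radius), min(n, t + radius + 1)
        window = frames[lo:hi]
        # Column-wise mean over the frames in the window.
        smoothed.append([sum(col) / len(window) for col in zip(*window)])
    return smoothed


def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def video_similarity(video_a, video_b, radius=1, slack=1):
    """Mean, over frames of A, of the best cosine match in B within `slack` positions.

    Assumes both videos have the same number of frames.
    """
    a = temporal_smooth(video_a, radius)
    b = temporal_smooth(video_b, radius)
    scores = []
    for i, frame_a in enumerate(a):
        lo, hi = max(0, i - slack), min(len(b), i + slack + 1)
        scores.append(max(cosine(frame_a, frame_b) for frame_b in b[lo:hi]))
    return sum(scores) / len(scores)
```

In a few-shot setting, a query video would be assigned the class of the support video with the highest similarity score; a larger `slack` tolerates more temporal misalignment between the two videos.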
Phase signals, combined with time-domain signal processing methods, are often used for recognition with the phase-sensitive optical time-domain reflectometer (Phi-OTDR). Considering the advanced and sophisticated algorithms prevalent in the field of image processing, a vibration signal imaging method is proposed to make phase signals better suited to learning by image networks. The phase time series is converted to a Markov Transition Field (MTF) matrix, from which the basis matrix is extracted by Non-negative Matrix Factorization (NMF) and saved as an RGB image. A one-dimensional (1-D) convolutional neural network (CNN) and a 2-D CNN are applied in the experiment to classify the phase signals and the images, respectively. The experimental results show that the training convergence efficiency of the 2-D CNN using NMF-MTF images is significantly higher than that of the 1-D CNN, demonstrating the effectiveness of converting phase signals into images. In addition, the average recognition accuracy for the four fence events is improved by more than 13% by applying the NMF algorithm to the MTF matrix.
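As a rough illustration of the signal-to-image step, the sketch below builds a Markov Transition Field from a 1-D series using the standard construction: quantile binning, a first-order transition matrix, then a lookup for every pair of time steps. The function name `markov_transition_field` and the parameter `n_bins` are illustrative assumptions, and the NMF and RGB-encoding stages of the paper are not reproduced here.

```python
from bisect import bisect_right


def markov_transition_field(series, n_bins=4):
    """Return the MTF of a 1-D series as an n x n list of lists."""
    n = len(series)
    ordered = sorted(series)
    # Interior quantile cut points that split the samples into n_bins groups.
    edges = [ordered[(k * n) // n_bins] for k in range(1, n_bins)]
    # Bin index (0 .. n_bins - 1) of every sample.
    bins = [bisect_right(edges, x) for x in series]

    # Count transitions between the bins of consecutive samples.
    counts = [[0.0] * n_bins for _ in range(n_bins)]
    for src, dst in zip(bins, bins[1:]):
        counts[src][dst] += 1.0

    # Row-normalise the counts into transition probabilities.
    transition = []
    for row in counts:
        total = sum(row)
        transition.append([c / total if total else 0.0 for c in row])

    # Entry (i, j) is the probability of moving from the bin of sample i
    # to the bin of sample j.
    return [[transition[bins[i]][bins[j]] for j in range(n)] for i in range(n)]
```

For a series of length n the result is an n x n matrix with values in [0, 1], which can be rendered as a grayscale image or processed further before being passed to a 2-D CNN.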
The rapid development of deepfake technology poses challenges to face-centered data security. Existing methods primarily focus on how to transfer deepfake detectors from the source domain to the target domain to handle diverse deepfake techniques. In practical application scenarios, it is usually difficult to access the true and false labels of the source domain. In this letter, we introduce a new adaptation framework called Latent Domain Knowledge Distillation (LDKD) for cross-domain deepfake detection. In the proposed framework, we construct a knowledge distillation structure consisting of a student network and a teacher network, which are jointly optimized in a coupled manner to facilitate the model's adaptation to the target domain. Furthermore, to improve the quality of the pseudo-labels generated by the teacher network, we propose a Fourier Latent Domain Generation Module (FLGM) and a Stochastic Complementary Mask Module (SCMM). The former generates latent domains to bridge domain differences at the image level, while the latter mines richer contextual cues for the model. Extensive cross-domain experimental results demonstrate that our method achieves state-of-the-art performance, and the model analysis confirms the effectiveness of our key components.
Schizophrenia is a complex psychiatric disorder characterized by delusions, hallucinations, disorganized speech, mood disturbances, and abnormal behavior. Early diagnosis of schizophrenia depends on the manifestation of the disorder, but its symptoms are complex and heterogeneous and cannot be clearly separated from those of other neurological categories, so early diagnosis is quite difficult. An objective, effective and simple diagnostic model and procedure are essential for diagnosing schizophrenia. Electroencephalography (EEG)-based models are a strong candidate to overcome these limits. In this study, we propose an EEG-based solution for the diagnosis of schizophrenia using a 1D convolutional neural network deep learning approach and the multitaper method. First, the raw EEG signals were segmented and denoised using multiscale principal component analysis. Then, three different feature sets were extracted using leading feature extraction methods: periodogram, Welch, and multitaper. The performance of each feature extraction method was compared. Finally, the classification performance of support vector machine, decision tree, k-nearest neighbors, and 1D convolutional neural network algorithms was tested according to model evaluation criteria. The highest performance was obtained with the multitaper and 1D convolutional neural network approach, with an accuracy of 98.76%. The model achieved 0.991 sensitivity, 0.984 precision, 0.983 specificity, 0.975 Matthews correlation coefficient, 0.987 F1-score, and 0.975 kappa statistic. This study presents the multitaper and 1D convolutional neural network framework for the first time in the diagnosis of schizophrenia. Moreover, this study achieved satisfactorily high classification performance for the diagnosis of schizophrenia compared to methods in the relevant literature.
Automated categorization of electrocardiogram (ECG) waveforms using deep learning (DL) methods has garnered considerable attention in recent research. However, prevalent DL networks encounter challenges including overfitting, class imbalance, limitations in deeper network training, and high computational demands. To address these issues, this study proposes an Automated ECG Arrhythmia Classification framework employing the Reinforced Visual Geometry Group-27 (REF-VGG-27). Initially, the framework applies preprocessing steps such as denoising, R-peak identification, data balancing, and cross-validation. For automatic feature extraction and classification, two DL architectures are suggested: a novel hybrid model combining a 2D convolutional neural network (2DCNN) with VGG-16, featuring a deep architecture for extracting morphological characteristics, frequency features related to heart rate variability (HRV), and statistical attributes crucial for identifying atrial fibrillation (AF). Subsequently, to classify arrhythmia patterns, the VGG-16 model is employed. On publicly available ECG image datasets, the proposed model achieved 99.61% accuracy, 99.61% precision, and 99.48% recall. Comparative analysis with existing approaches substantiates the efficiency and robustness of our model.
Non-invasive acquisition and analysis of human brain signals play a crucial role in the development of brain-computer interfaces, enabling their widespread applicability in daily life. Motor imagery has emerged as a prominent technique for the advancement of such interfaces. While initial machine and deep learning studies have shown promising results in the context of motor imagery, several challenges remain to be addressed prior to their extensive adoption. Deep learning, renowned for its automated feature extraction and classification capabilities, has been successfully employed in various domains. Notably, recent research efforts have focused on processing and classifying motor imagery EEG signals using two-dimensional data formats, yielding noteworthy advancements. Although existing literature encompasses reviews primarily centered on machine learning or deep learning techniques, this paper uniquely emphasizes the review of methods for constructing two-dimensional image features, marking the first comprehensive exploration of this subject. In this study, we present an overview of datasets, survey a range of signal-to-image conversion methods, and discuss classification approaches. Furthermore, we comprehensively examine the current challenges and outline future directions for this research domain.
Noisy gradient algorithms have emerged as one of the most popular classes of algorithms for distributed optimization with massive data. Choosing proper step-size schedules is important for tuning these algorithms to perform well. For the algorithms to attain fast convergence and high accuracy, it is intuitive to use large step-sizes in the initial iterations, when the gradient noise is typically small compared to the algorithm steps, and to reduce the step-sizes as the algorithm progresses. This intuition has been confirmed in theory and practice for stochastic gradient descent. However, similar results are lacking for other methods that use approximate gradients. This paper shows that diminishing step-size strategies can indeed be applied to a broad class of noisy gradient algorithms. Our analysis framework is based on two classes of systems that characterize the impact of the step-sizes on the convergence performance of many algorithms. Our results show that such step-size schedules enable these algorithms to enjoy the optimal rate. We exemplify our results on stochastic compression algorithms. Our experiments validate the fast convergence of these algorithms with step decay schedules.
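A step decay schedule of the kind mentioned here can be sketched as follows. The constants (`eta0`, `factor`, `period`), the quadratic objective, and the Gaussian noise model are all illustrative assumptions and are not taken from the paper; the point is only to show a large initial step-size that is cut by a fixed factor at regular intervals.

```python
import random


def step_decay(t, eta0=0.5, factor=0.5, period=10):
    """Step-size at iteration t: start at eta0, multiply by `factor` every `period` iterations."""
    return eta0 * factor ** (t // period)


def noisy_gradient_descent(x0, n_iters=50, noise_std=0.1, seed=0):
    """Minimise f(x) = 0.5 * x**2 using gradients corrupted by Gaussian noise."""
    rng = random.Random(seed)
    x = x0
    for t in range(n_iters):
        # The exact gradient of f is x; the added term models gradient noise.
        noisy_grad = x + rng.gauss(0.0, noise_std)
        x -= step_decay(t) * noisy_grad
    return x
```

Early iterations take large steps while the iterate is far from the optimum and the noise is small relative to the gradient; later iterations take smaller steps, which damps the effect of the noise near the optimum.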
The speaker's emotions, age, and gender have all been ascertained through imaginative investigation. This information can be applied to communications and to common applications such as biometric identification and human-machine interaction. The Edge Impulse framework employs a tiny model that has been trained to identify the speaker's age based on speech attributes, so a speaker's age can be inferred from their voice. With the help of an external microphone connected to the Jetson Nano and the MP34DT05 digital microphone on the Arduino Nano BLE 33 device, it is possible to record and determine a person's age from their speech in real-time applications. The fundamental goal of speech recognition is to build an effective human-machine interface for practical applications. The Arduino Nano BLE 33 has an integrated RGB LED that is used to indicate whether the detected speaker is a child or an adult: a red LED signifies a child speaker, while a blue LED signifies an adult speaker. The proposed tuned deep convolution neural networks outperform the more commonly used convolutional neural networks in tests compared to training *** proposed tuned 1D CNN with MFCC speech features outperforms existing traditional methods. The Nvidia Jetson Nano and Nano BLE 33 microcontrollers are ideal for applications needing speaker age detection because of their low power consumption, ease of use, small size, and excellent computational performance.
Authors: Zhang, Luping; Xu, Fei; Neri, Ferrante
East China Univ Technol, Jiangxi Engn Technol Res Ctr Nucl Geosci Data Sci Sch Informat Engn Jiangxi Engn Lab Radioact Geosci & Big Data Techno, Nanchang 330013, Peoples R China
Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat Key Lab Image Informat Proc & Intelligent Control Educ Minist China, Wuhan 430074, Peoples R China
Univ Surrey, Sch Comp Sci & Elect Engn NICE Res Grp, Guildford GU2 7XH, Surrey, England
Spiking neural membrane systems (SN P systems) are a class of bio-inspired models based on the activities and connectivity of neurons. Extensive studies have been made on SN P systems with synchronization-based communication, while further efforts are needed for systems with rhythm-based communication. In this work, we design an asynchronous SN P system with resonant connections, where all the enabled neurons in the same group connected by resonant connections should instantly produce spikes with the same rhythm. In the designed system, each of the three modules implements one of the three operations associated with the edge detection of digital images, and the modules collaborate with each other through the resonant connections. An algorithm called EDSNP for edge detection is proposed to simulate the working of the designed asynchronous SN P system. A quantitative analysis of EDSNP and related edge detection methods was conducted to evaluate the performance of EDSNP. On the test images, EDSNP outperforms the compared methods on the quantitative metrics of accuracy, error rate, mean square error, peak signal-to-noise ratio and true positive rate. The results indicate the potential of temporal firing and proper neuronal connections in the SN P system to achieve good performance in edge detection.
ISBN: (print) 9798350344868; 9798350344851
Existing image steganography methods either conceal secret images sequentially or conceal a concatenation of multiple images. With either approach, the interference of information among multiple images becomes increasingly severe as the number of secret images grows, which restricts the development of very large capacity image steganography. In this paper, we propose an Invertible Mosaic Image Hiding Network (InvMIHNet), which realizes very large capacity image steganography with high quality by concealing a single mosaic secret image. InvMIHNet consists of an Invertible Image Rescaling (IIR) module and an Invertible Image Hiding (IIH) module. The IIR module downscales the single mosaic secret image formed by spatially splicing the multiple secret images, and the IIH module then conceals this mosaic image under the cover image. The proposed InvMIHNet successfully conceals and reveals up to 16 secret images with a small number of parameters and low memory consumption. Extensive experiments on ImageNet-1K, COCO and DIV2K show that InvMIHNet outperforms state-of-the-art methods in terms of the imperceptibility of the stego image, the recovery accuracy of the secret images, and security against steganalysis methods. The code is available at https://***/Brittany-Chen/InvMIHNet.
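The spatial splicing step, in which several secret images are tiled into one mosaic before rescaling and hiding, can be illustrated with a small sketch. The helpers `make_mosaic` and `split_mosaic` are hypothetical names, images are represented as plain lists of pixel rows, and a square grid of equally sized images is assumed. The invertible rescaling and hiding networks themselves are not reproduced.

```python
def make_mosaic(images, grid):
    """Tile grid * grid equally sized 2-D images (lists of rows) into one mosaic, row-major."""
    if len(images) != grid * grid:
        raise ValueError("expected grid * grid images")
    mosaic = []
    for r in range(grid):
        row_images = images[r * grid:(r + 1) * grid]
        for y in range(len(row_images[0])):
            line = []
            # Concatenate row y of every image in this mosaic row.
            for img in row_images:
                line.extend(img[y])
            mosaic.append(line)
    return mosaic


def split_mosaic(mosaic, grid):
    """Inverse of make_mosaic: cut the mosaic back into grid * grid tiles."""
    tile_h = len(mosaic) // grid
    tile_w = len(mosaic[0]) // grid
    return [
        [row[c * tile_w:(c + 1) * tile_w] for row in mosaic[r * tile_h:(r + 1) * tile_h]]
        for r in range(grid)
        for c in range(grid)
    ]
```

With `grid=4`, sixteen secret images become a single image whose side length is four times that of one tile, which matches the maximum of 16 secret images reported in the abstract.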