ISBN (digital): 9798350368741
ISBN (print): 9798350368758
Speech-based interaction systems are widely used in mobile devices like smartphones. With advances in deep neural networks, tasks such as speech emotion recognition (SER) enhance these systems’ user-friendliness. However, deploying SER models on mobile devices is challenging due to their complexity and computational demands. While pruning can reduce complexity, it often compromises accuracy, and hardware accelerators like FPGAs are difficult to integrate into mobile devices. This paper proposes AMSER, a real-time speech emotion recognition framework using signal compression and task offloading. AMSER utilizes logarithmic Mel-filter bank coefficients (Fbank) and singular value decomposition (SVD) for feature extraction and compression. The compressed signal is only 6.25% of the original size, achieving 2.24x faster transfer rates and 55.35% energy savings compared to raw audio transmission. Despite the compression, the features preserve key audio information for text and emotion recognition, performed server-side. Experiments show a WER of 4.68% (Librispeech), 10.69% (CommonVoice), and 69.83% emotion recognition accuracy (IEMOCAP).
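The compression step described above can be illustrated with a truncated SVD of a feature matrix. This is a minimal sketch, not AMSER's implementation: the synthetic matrix stands in for real log-Mel filter bank features, and the function names, matrix size, and rank are all assumptions chosen for illustration. The storage ratio printed here depends on the chosen rank and is not meant to reproduce the 6.25% figure reported in the abstract.

```python
import numpy as np


def compress_features(features, rank):
    """Keep only the top `rank` singular triplets of a (frames x bins) matrix."""
    u, s, vt = np.linalg.svd(features, full_matrices=False)
    # Fold the singular values into U so only two factors need to be sent.
    return u[:, :rank] * s[:rank], vt[:rank, :]


def decompress_features(scaled_u, vt):
    """Rebuild the low-rank approximation on the receiving side."""
    return scaled_u @ vt


def storage_ratio(shape, rank):
    """Numbers stored after compression divided by numbers stored before."""
    frames, bins = shape
    return rank * (frames + bins) / (frames * bins)


rng = np.random.default_rng(0)
# Hypothetical stand-in for Fbank features: 200 frames x 80 bins, nearly rank 4.
base = rng.standard_normal((200, 4)) @ rng.standard_normal((4, 80))
fbank = base + 0.01 * rng.standard_normal((200, 80))

scaled_u, vt = compress_features(fbank, rank=4)
approx = decompress_features(scaled_u, vt)

rel_err = np.linalg.norm(fbank - approx) / np.linalg.norm(fbank)
ratio = storage_ratio(fbank.shape, rank=4)
print(f"relative error: {rel_err:.4f}, storage ratio: {ratio:.3f}")
```

In a client-server split like the one the abstract describes, the two factors would be sent in place of the full matrix and multiplied back together on the server before recognition.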
In modern software development, Python third-party libraries play a critical role, especially in fields like deep learning and scientific computing. However, API parameters in these libraries often change during evolu...
In this paper, we consider the learning of a Reduced-Order Linear Parameter-Varying Model (ROLPVM) of a nonlinear dynamical system based on data. This is achieved by a two-step procedure. In the first step, we learn a projection to a lower dimensional state-space. In step two, an LPV model is learned on the reduced-order state-space using a novel, efficient parameterization in terms of neural networks. The improved modeling accuracy of the method compared to an existing method is demonstrated by simulation examples.
The core task of tracking control is to make the controlled plant track a desired *** traditional performance index used in previous studies cannot completely eliminate the tracking error as the number of time steps *** this paper, a new cost function is introduced to develop the value-iteration-based adaptive critic framework to solve the tracking control *** the regulator problem, the iterative value function of the tracking control problem cannot be regarded as a Lyapunov function. A novel stability analysis method is developed to guarantee that the tracking error converges to *** discounted iterative scheme under the new cost function for the special case of linear systems is ***, the tracking performance of the present scheme is demonstrated by numerical results and compared with those of the traditional approaches.
The Gestalt principles of perceptual learning elucidate how the human brain categorizes and comprehends a set of visual elements grouped together. One of these principles is the law of closure, which holds that human perception tends to visualize a fragmented object as a previously known whole by bridging the missing gaps. Herein, a letter recognition scheme emulating the Gestalt closure principle is demonstrated, utilizing artificial synapses made of a 3D integrated MA3Bi2I9 (MBI) perovskite nanowire (NW) array. The artificial synapses exhibit short-term plasticity (STP) and long-term potentiation (LTP), and a transition from STP to LTP with an increasing number of input electrical pulses. Initiatory ab initio molecular dynamics (AIMD) simulations attribute the conductance change in the MBI NW artificial synapses to the rotation of MA+ clusters, culminating in charge exchange between MA+ and Bi2I9^3-. Each device yields 40 conductance states with excellent retention (>10^5 s), minimal variation (2σ/mean <10%), and endurance of approximately 10^5 cycles. An MBI NW-based artificial neural network (ANN) is constructed to recognize fragmented letters as it does their unabridged forms, and the gradual weakening of synaptic connectivity as more fragments go missing is also demonstrated, thereby successfully implementing the Gestalt closure principle.
Image segmentation is a significant problem in image *** this paper, we propose a new two-stage scheme for segmentation based on the Fischer-Burmeister total variation (FBTV). The first stage of our method is to calculate a smooth solution from the FBTV Mumford-Shah ***, we design a new difference of convex algorithm (DCA) with the semi-proximal alternating direction method of multipliers (sPADMM) *** the second stage, we make use of the smooth solution and the K-means method to obtain the segmentation *** simulate images more accurately, a useful operator is introduced, which enables the proposed model to segment not only the noisy or blurry images but the images with missing pixels *** demonstrate that the proposed method produces preferable results compared with some state-of-the-art methods, especially on the images with missing pixels.
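The second stage mentioned above, clustering a smooth image with K-means, can be sketched in a few lines. This is only an illustration under stated assumptions: the first stage (the FBTV Mumford-Shah solve) is not implemented, the "smooth solution" is a synthetic two-region image, and the function name `kmeans_segment` and its even-spacing initialisation are choices made for this example rather than anything taken from the paper.

```python
import numpy as np


def kmeans_segment(smooth_image, k, n_iter=50):
    """Cluster pixel intensities with 1-D K-means and return a label map."""
    pixels = smooth_image.reshape(-1).astype(float)
    # Assumed initialisation: centers spaced evenly over the intensity range.
    centers = np.linspace(pixels.min(), pixels.max(), k)
    for _ in range(n_iter):
        # Assign every pixel to its nearest center.
        labels = np.abs(pixels[:, None] - centers[None, :]).argmin(axis=1)
        new_centers = centers.copy()
        for c in range(k):
            members = pixels[labels == c]
            if members.size:  # leave an empty cluster's center unchanged
                new_centers[c] = members.mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = np.abs(pixels[:, None] - centers[None, :]).argmin(axis=1)
    return labels.reshape(smooth_image.shape), centers


# Hypothetical stage-one output: a dark left half and a bright right half.
smooth = np.full((8, 8), 0.1)
smooth[:, 4:] = 0.9

segmentation, centers = kmeans_segment(smooth, k=2)
print(segmentation)
print(centers)
```

Because clustering acts only on intensities, the quality of the segmentation rests on how well the first stage separates the regions in the smooth solution.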
User intent recognition from multimodal neurophysiological signals, particularly electroencephalography (EEG) and electromyography (EMG), is critical for enhancing human-machine interaction in assistive robotics. Recent advances in neurophysiological signal processing have enabled enhanced user intent recognition for assistive robotics and human-machine interfaces. However, achieving high accuracy and real-time adaptability in electromyography (EMG) and electroencephalography (EEG)-based gesture recognition remains challenging due to temporal misalignment, weak cross-modality fusion, and lack of adaptive learning. This paper proposes NeuroFusion-Trans, a novel transformer-based framework that improves EEG-EMG gesture recognition by improving temporal resolution, using cross-modality attention, and integrating adaptive online learning. Temporal resolution enhancement ensures dynamic EEG-EMG synchronization for improved signal alignment. The cross-modality attention mechanism captures interdependencies between EEG and EMG signals, leading to more accurate intent classification. Adaptive online learning enables real-time personalization by dynamically adjusting to user-specific variations. The model is evaluated on two publicly available EEG-EMG upper-limb gesture datasets: Dataset 1 (5,296 for training, 1,324 for validation) and Dataset 2 (5,276 for training, 1,304 for validation). NeuroFusion-Trans achieves state-of-the-art performance, with an accuracy of 97% and 96% and Cohen’s Kappa of 0.97 and 0.95 after online adaptation, significantly outperforming baseline models such as CNN-LSTM, GRU, and LSTMNet. Ablation studies reveal that removing the cross-modality attention mechanism reduces accuracy by 6.1%, underscoring its importance in exploiting the EEG-EMG dependencies. Turning off synchronization leads to a 6.7% performance drop, demonstrating the necessity of real-time learning for robust intent recognition. Furthermore, NeuroFusion-Trans enhances EEG-EMG synchr
The Quantum K-means clustering algorithm offers the advantage of quantum parallel computing, but suffers from issues related to cluster center initialization and sensitivity to noisy data due to its similarity with th...
In this paper, we propose a new method, called DoubleCoverUDF, for extracting the zero level-set from unsigned distance fields (UDFs). DoubleCoverUDF takes a learned UDF and a user-specified parameter r (a small posit...
Ensembling has a long history in statistical data analysis, with many impactful applications. However, in many modern machine learning settings, the benefits of ensembling are less ubiquitous and less obvious. We study, both theoretically and empirically, the fundamental question of when ensembling yields significant performance improvements in classification tasks. Theoretically, we prove new results relating the ensemble improvement rate (a measure of how much ensembling decreases the error rate versus a single model, on a relative scale) to the disagreement-error ratio. We show that ensembling improves performance significantly whenever the disagreement rate is large relative to the average error rate; and that, conversely, one classifier is often enough whenever the disagreement rate is low relative to the average error rate. On the way to proving these results, we derive, under a mild condition called competence, improved upper and lower bounds on the average test error rate of the majority vote classifier. To complement this theory, we study ensembling empirically in a variety of settings, verifying the predictions made by our theory, and identifying practical scenarios where ensembling does and does not result in large performance improvements. Perhaps most notably, we demonstrate a distinct difference in behavior between interpolating models (popular in current practice) and non-interpolating models (such as tree-based methods, where ensembling is popular), demonstrating that ensembling helps considerably more in the latter case than in the former.
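The quantities named in this abstract can be computed directly from a matrix of model predictions. The sketch below follows the verbal definitions given above: the ensemble improvement rate is the relative drop in error from the average single model to the majority vote, and the disagreement-error ratio divides the disagreement rate by the average error rate. Two details are assumptions of this example, not statements from the paper: disagreement is averaged over distinct model pairs, and vote ties go to the lowest class index. The function name `ensemble_stats` is hypothetical.

```python
import numpy as np


def ensemble_stats(preds, labels, n_classes):
    """preds: (n_models, n_samples) integer class predictions."""
    preds = np.asarray(preds)
    labels = np.asarray(labels)
    n_models = preds.shape[0]

    # Average error rate of a single model (mean over models and samples).
    avg_error = float(np.mean(preds != labels))
    if avg_error == 0.0:
        raise ValueError("both ratios are undefined when every model is perfect")

    # Disagreement rate, averaged over distinct model pairs (an assumption).
    pair_rates = [
        np.mean(preds[i] != preds[j])
        for i in range(n_models)
        for j in range(i + 1, n_models)
    ]
    disagreement = float(np.mean(pair_rates))

    # Majority vote; argmax breaks ties toward the lowest class index.
    votes = np.stack([(preds == c).sum(axis=0) for c in range(n_classes)])
    mv_error = float(np.mean(votes.argmax(axis=0) != labels))

    return {
        "avg_error": avg_error,
        "mv_error": mv_error,
        "disagreement": disagreement,
        "eir": (avg_error - mv_error) / avg_error,  # ensemble improvement rate
        "der": disagreement / avg_error,  # disagreement-error ratio
    }


labels = [0, 0, 0, 0, 0, 0]
# Three models that each make one mistake, on different samples.
diverse = [[1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0]]
# Three models that all make the same mistake.
identical = [[1, 0, 0, 0, 0, 0]] * 3

diverse_stats = ensemble_stats(diverse, labels, n_classes=2)
identical_stats = ensemble_stats(identical, labels, n_classes=2)
print(diverse_stats)
print(identical_stats)
```

In this toy case the diverse models disagree often relative to their error rate and the majority vote removes every error, while the identical models never disagree and the vote gains nothing, which matches the qualitative pattern the abstract describes.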