Automatic speech emotion recognition (ASER) from speech signals is a challenging task because recognition accuracy depends heavily on the speech features extracted for emotion classification. The pre-processing and classification phases also play a key role in improving the accuracy of an ASER system. This paper therefore proposes a deep learning convolutional neural network (DLCNN)-based ASER model, hereafter denoted ASERNet. Speech denoising is performed with spectral subtraction (SS), and deep features are extracted by integrating linear predictive coding (LPC) with Mel-frequency cepstral coefficients (MFCCs). Finally, the DLCNN classifies the speech emotion from the extracted LPC-MFCC deep features. The simulation results demonstrate the superior performance of the proposed ASERNet model in terms of accuracy, precision, recall, and F1-score compared to state-of-the-art ASER approaches.
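The spectral subtraction step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the magnitude spectrum of one frame and a noise magnitude estimate are already available (the transform to and from the frequency domain is not shown), and the function name and the over-subtraction factor `alpha` and spectral floor `beta` are illustrative choices, not parameters taken from the paper.

```python
def spectral_subtract(frame_mag, noise_mag, alpha=1.0, beta=0.01):
    """Magnitude-domain spectral subtraction for a single frame.

    frame_mag: magnitude spectrum of the noisy frame (list of floats)
    noise_mag: estimated noise magnitude per bin (same length)
    alpha:     over-subtraction factor (assumed, not from the paper)
    beta:      spectral floor as a fraction of the noisy magnitude,
               which keeps every bin non-negative
    """
    cleaned = []
    for y, n in zip(frame_mag, noise_mag):
        diff = y - alpha * n               # subtract the noise estimate
        cleaned.append(max(diff, beta * y))  # clamp to the spectral floor
    return cleaned


if __name__ == "__main__":
    noisy = [1.0, 0.5, 0.1]
    noise = [0.2, 0.2, 0.2]
    print(spectral_subtract(noisy, noise, alpha=1.0, beta=0.1))
```

In a complete denoiser the cleaned magnitudes would typically be recombined with the noisy phase before the inverse transform; that part is outside this sketch.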
ISBN: 9798350361933 (digital)
ISBN: 9798350361940 (print)
This paper presents how different audio emotional states are classified using various feature extraction types. Our audio soundbase for a robot service application, in the Romanian language, was used alongside four established soundbases: Crema, Ravdess, Savee and Emo-DB. We analyzed the influence of different feature extraction methods on classification accuracy. We focus on Mel Frequency Cepstral Coefficients (MFCC), linear predictive coding (LPC), Magnitude-based Spectral Root Cepstral Coefficients (MSRCC), Normalized Gammachirp Cepstral Coefficients (NGCC) and Gammatone Frequency Cepstral Coefficients (GFCC). A single classifier, k-nearest neighbor (k-NN), is used to evaluate the effectiveness of each feature set in recognizing emotional states. We compared the correct classification rates to determine which features give the best results in emotion recognition. We show that by using 30-NGCC and 5 nearest neighbors we reach an overall correct classification rate of 96.46% in the testing phase. Computational time for the training and testing phases was also included in the analysis. This study contributes to the field of emotion recognition from audio signals, facilitating advancements in human-computer interaction.
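The k-NN evaluation described in this abstract can be sketched as a majority vote over the nearest training feature vectors. This is a minimal illustration only: the Euclidean distance, the function name `knn_predict`, and the toy feature vectors and labels are assumptions, while the default `k=5` mirrors the 5 nearest neighbors reported above.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=5):
    """Classify `query` by majority vote among its k nearest training vectors.

    Euclidean distance is an assumption; the abstract does not state the metric.
    """
    # Pair each training vector's distance to the query with its label
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    # Count labels among the k closest vectors
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors standing in for cepstral features
    train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
    train_y = ["calm", "calm", "calm", "angry", "angry", "angry"]
    print(knn_predict(train_x, train_y, [0.2, 0.1], k=3))
```

In practice each training vector would hold the cepstral coefficients of one utterance, and the same function would be run once per feature type to compare classification rates.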
This paper proposes a real-time heart rate detection method based on 77 GHz FMCW radar. First, the method establishes a new motion model according to respiratory and heartbeat rules and extracts the motion signals of the chest and the abdomen; then, the random body motion (RBM) signal is eliminated by a combination of polynomial fitting and recursive least squares (RLS) adaptive filtering; lastly, multi-detection-point adaptive harmonics cancellation (AHC) is used to eliminate respiratory harmonics. In addition, the method introduces a spectrum analysis algorithm based on linear predictive coding (LPC). The experimental results show that the method can effectively eliminate the RBM signal and respiratory harmonics, and that the average real-time heart rate detection error rate is 2.925%.
ISBN: 9781467363204 (print)
Speech coders are a fundamental component of telecommunication and multimedia infrastructure. Several systems, such as mobile telephony, voice over internet protocol (VoIP), and audio conferencing, rely on efficient speech coding. Speech coders strive to provide a low bit rate while maintaining speech quality and intelligibility. Linear predictive coding uses spectral properties of speech to "optimize" the coder's performance for the human ear. In this paper we perform a comparative assessment of the speech coding performance of some state-space filters to give designers insight into the capabilities of these filters. The filters considered are the Kalman filter, state-space recursive least-squares (SSRLS), and SSRLS with adaptive memory (SSRLSWAM). The results of RLS and LMS are also quoted. Performance is judged in terms of perceptual evaluation of speech quality (PESQ) and prediction gain.
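Prediction gain, one of the two metrics named in this abstract, is the ratio of signal energy to prediction-residual energy in decibels. The sketch below computes it for a classical block-based linear predictor obtained with the autocorrelation method and the Levinson-Durbin recursion. This is a stand-in technique chosen for brevity, not the Kalman or SSRLS filters the paper compares, and all function names are illustrative.

```python
import math


def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns (a, err) with a[0] == 1, so the prediction of x[n] is
    -sum(a[k] * x[n - k] for k in 1..order), and err is the residual energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # update residual energy
    return a, err


def prediction_gain_db(x, order):
    """Prediction gain in dB: 10 * log10(signal energy / residual energy)."""
    r = autocorr(x, order)
    _, err = levinson_durbin(r, order)
    return 10.0 * math.log10(r[0] / err)


if __name__ == "__main__":
    # Toy signal: a decaying exponential, which a first-order predictor fits well
    signal = [0.9 ** n for n in range(200)]
    print(round(prediction_gain_db(signal, order=1), 2), "dB")
```

A higher prediction gain means the predictor removes more of the signal's redundancy, leaving a lower-energy residual to encode.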
The validity of glottal inverse filtering (GIF) to obtain a glottal flow waveform from a radiated pressure signal in the presence and absence of source-filter interaction was studied systematically. A driven vocal fold surface model of vocal fold vibration was used to generate source signals. A one-dimensional wave reflection algorithm was used to solve for acoustic pressures in the vocal tract. Several test signals were generated with and without source-filter interaction at various fundamental frequencies and vowels. Linear predictive coding (LPC), Quasi Closed Phase (QCP), and Quadratic Programming (QPR) based algorithms, along with supraglottal impulse response, were used to inverse filter the radiated pressure signals to obtain the glottal flow pulses. The accuracy of each algorithm was tested for its recovery of maximum flow declination rate (MFDR), peak glottal flow, open phase ripple factor, closed phase ripple factor, and mean squared error. The algorithms were also tested for their absolute relative errors of the Normalized Amplitude Quotient, the Quasi-Open Quotient, and the Harmonic Richness Factor. The results indicated that the mean squared error decreased as the source-filter interaction level increased, suggesting that the inverse filtering algorithms perform better in the presence of source-filter interaction. All glottal inverse filtering algorithms predicted the open phase ripple factor better than the closed phase ripple factor of a glottal flow waveform, irrespective of the source-filter interaction level. Major prediction errors occurred in the estimation of the closed phase ripple factor, MFDR, peak glottal flow, Normalized Amplitude Quotient, and Quasi-Open Quotient. Feedback-related nonlinearity (source-filter interaction) affected the recovered signal primarily when f(o) was well below the first formant frequency of a vowel. The prediction error increased when f(o) was close to the first formant frequency due to the difficulty of estimating th
Staggered synthetic aperture radar (SAR) is an innovative SAR acquisition concept which exploits digital beam-forming (DBF) in elevation to form multiple receive beams and continuous variation of the pulse repetition interval (PRI) to achieve high-resolution imaging of a wide continuous swath. Staggered SAR requires a higher azimuth oversampling than a SAR with constant PRI, which results in an increased volume of data. In this article, we investigate the use of linear predictive coding, which exploits the correlation properties exhibited by the nonuniform azimuth raw data stream. With this approach, the prediction of each sample is calculated onboard as a linear combination of a set of previous samples. The resulting prediction error is then quantized and downlinked (instead of the original value), which allows for a reduction of the signal entropy and, in turn, of the onboard data rate achievable for a given target performance. In addition, the a priori knowledge of the gap positions can be exploited to dynamically adapt the bit rate allocation and the prediction order to further improve the performance. Simulations of the proposed dynamic predictive block-adaptive quantization (DP-BAQ) are carried out considering a Tandem-L-like staggered SAR system for different orders of prediction and target scenarios, demonstrating that a significant data reduction can be achieved with a modest increase of the system complexity.
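The core idea of this abstract, predicting each sample from previous ones and transmitting only the quantized prediction error, can be sketched as a simple predictive quantizer. The sketch makes several simplifying assumptions that are not from the article: real-valued samples, fixed predictor coefficients, a uniform quantizer, and prediction from reconstructed samples so that encoder and decoder stay in step. Gap handling, dynamic bit allocation and block adaptation are not modelled, and all names are illustrative.

```python
import math


def _predict(history, coeffs):
    """Linear combination of the most recent reconstructed samples."""
    return sum(c * history[-1 - k] for k, c in enumerate(coeffs) if k < len(history))


def quantize(value, step):
    """Uniform quantizer with step size `step`."""
    return step * round(value / step)


def predictive_encode(samples, coeffs, step):
    """Return (quantized prediction errors, reconstruction seen by the decoder)."""
    recon, errors = [], []
    for x in samples:
        pred = _predict(recon, coeffs)
        q = quantize(x - pred, step)   # only this value would be transmitted
        errors.append(q)
        recon.append(pred + q)
    return errors, recon


def predictive_decode(errors, coeffs):
    """Rebuild the signal from the quantized prediction errors alone."""
    recon = []
    for q in errors:
        recon.append(_predict(recon, coeffs) + q)
    return recon


if __name__ == "__main__":
    signal = [math.sin(0.1 * n) for n in range(50)]
    errors, recon = predictive_encode(signal, coeffs=[0.95], step=0.05)
    decoded = predictive_decode(errors, coeffs=[0.95])
    print(max(abs(a - b) for a, b in zip(signal, decoded)))
```

Because the prediction errors of a correlated signal have a smaller spread than the samples themselves, they can be represented with fewer bits for the same quantization step.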
This paper presents a method for improving performance by combining feature vectors in piano authentication from the audio signal. We have previously shown that the combination of the linear predictive coding spectral envelope (LPCSE), the Mel-frequency cepstral coefficients (MFCC) and the piecewise linear predictive coding pole distribution (pLPCPD) improves the performance of speaker verification or authentication, where pLPCPD is the feature vector that we introduced and developed for speaker verification. We have also analyzed the performance improvement from the point of view of the aperiodicity extracted by pLPCPD and the periodicity extracted by LPCSE and MFCC. This paper applies the method to the verification or authentication of three piano makers: Yamaha, Bösendorfer, and Steinway. Unlike a speaker, a piano has 88 keys that produce a wide range of pitches and has several play styles, such as normal, staccato, tremolo (repeated notes), and pedal. Through piano maker verification experiments using all datasets, covering different pitches and play styles, we show that pLPCPD+LPCSE (the combination of pLPCPD and LPCSE) achieves the best performance. Through experiments using restricted datasets, we show that pLPCPD+MFCC+LPCSE and pLPCPD+MFCC achieve the best performance in pitch-dependent and play-style-dependent verification, respectively, while pLPCPD+MFCC and pLPCPD achieve the best performance in pitch-independent and play-style-independent verification, respectively. As a result, pLPCPD carries the largest amount of information that depends on the piano maker and is independent of pitch and play style, while the three features carry information that is complementary to each other.
Classification of intellectually disabled children through manual assessment of speech at an early age is inconsistent, subjective, time-consuming and prone to error. This study attempts to classify children with intellectual disabilities using two speech feature extraction techniques: linear predictive coding (LPC) based cepstral parameters and Mel-frequency cepstral coefficients (MFCC). Four classification models are employed: k-nearest neighbour (k-NN), support vector machine (SVM), linear discriminant analysis (LDA) and radial basis function neural network (RBFNN). 48 speech samples from each group are analyzed, taken from subjects of similar age and socio-economic background. The effect of different frame lengths combined with the number of filterbanks in the MFCC, and of different frame lengths combined with the order in the LPC, is also examined for better accuracy. The experimental outcomes show that the proposed technique can be used to help speech pathologists in estimating intellectual disability at early ages.
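LPC-based cepstral parameters, as named in this abstract, are commonly obtained from the LPC coefficients with a standard recursion rather than from a separate spectral analysis. The sketch below shows that recursion under the convention that the all-pole model is 1/A(z) with A(z) = 1 + a1 z^-1 + ... + ap z^-p. The function name is illustrative, the gain-dependent zeroth coefficient is omitted, and the paper's exact variant may differ.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert LPC coefficients to cepstral coefficients c[1..n_ceps].

    a: [1, a1, ..., ap], the coefficients of A(z) in the model 1 / A(z)
    n_ceps: number of cepstral coefficients to return (may exceed p)
    """
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)          # c[0] (gain term) is left unused
    for n in range(1, n_ceps + 1):
        # Weighted sum over earlier cepstral coefficients
        acc = sum(k * c[k] * a[n - k] for k in range(max(1, n - p), n))
        direct = -a[n] if n <= p else 0.0
        c[n] = direct - acc / n
    return c[1:]


if __name__ == "__main__":
    # First-order example: 1 / (1 - 0.9 z^-1)
    print(lpc_to_cepstrum([1.0, -0.9], n_ceps=3))
```

For the first-order example the result follows the series 0.9^n / n, which gives a quick sanity check of the sign convention before the coefficients are passed to a classifier.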
This paper presents a method for improving performance by combining feature vectors for speaker detection from the mixed speech of multiple speakers. Recently, we have shown that the combination of the linear predictive coding spectral envelope (LPCSE), the Mel-frequency cepstral coefficients (MFCC) and the piecewise linear predictive coding pole distribution (pLPCPD) improves the performance of speaker verification, where pLPCPD is the feature vector that we introduced and developed for speaker verification. Separately, we have examined a method of speaker verification from the mixed speech of multiple speakers using pLPCPD feature vectors, where we showed that the recall, one performance measure for the verification, drops sharply when the speech changes from unmixed to mixed, while the precision decreases much less. This paper applies the above method of combining features and the findings on speaker verification to speaker detection from the mixed speech of multiple speakers, using the probabilistic prediction that we are also developing. Through experiments, we show the performance improvement obtained by combining the three features and the effectiveness of the present prediction method.
Fault isolation in electronic circuits is an active area of interest because analog circuits are widely used in industry. Failures in circuit systems severely disrupt normal operation, which calls for an automatic method of fault isolation in analog circuits. To address the fault isolation issues reported in the literature, a novel model is proposed to isolate the fault-causing component in the circuit. The proposed Multi-Rider Optimization-based Neural Network (M-RideNN) isolates the faulty part of the circuit from the fault-free areas so that fault diagnosis is structured effectively. Fault isolation proceeds in four major steps: establishing the fault dictionary, signal normalization using linear predictive coding (LPC), dimensionality reduction using Probabilistic Principal Component Analysis (PPCA), and fault isolation using the proposed M-RideNN classifier. Finally, experiments on three circuits, namely a Triangular Wave Generator (TWG), a Bipolar Transistor Amplifier (BTA), and a differentiator (DIF), and on an application circuit, a Solar Power Converter (SPC), show that the proposed M-RideNN classifier achieves a classification accuracy of 93.18% with a minimum Mean Square Error (MSE) of 0.0682.