Search Results - Inner Mongolia University Library

ASERNet: Automatic speech emotion recognition system using MFCC-based LPC approach with deep learning CNN

INTERNATIONAL JOURNAL OF MODELING SIMULATION AND SCIENTIFIC COMPUTING, 2023, Vol. 14, No. 4

Authors: Jagadeeshwar, Kalyanapu Sreenivasarao, T. Pulicherla, Padmaja Satyanarayana, K. N. V. Lakshmi, K. Mohana Kumar, Pala Mahesh. Affiliations: VIT AP Univ Dept Comp Sci & Engn Amaravati Andhra Pradesh India; Seshadri Rao Gudlavalleru Engn Coll Dept Comp Sci & Engn Gudlavalleru Andhra Pradesh India; Teegala Krishna Reddy Engn Coll Dept Comp Sci & Engn Hyderabad Telangana India; Sagi Rama Krishnam Raju Engn Coll Dept Elect & Commun Engn Bhimavaram Andhra Pradesh India; CMR Tech Campus Dept Elect & Commun Engn Hyderabad Telangana India; SAK Informat Dept Artificial Intelligence Hyderabad Telangana India

Automatic speech emotion recognition (ASER) from source speech signals is a challenging task, since recognition accuracy depends strongly on the speech features extracted for classifying emotion. In addition, the pre-processing and classification phases also play a key role in the accuracy of an ASER system. Therefore, this paper proposes a deep learning convolutional neural network (DLCNN)-based ASER model, hereafter denoted ASERNet. Speech denoising is performed with spectral subtraction (SS), and deep features are extracted by integrating linear predictive coding (LPC) with Mel-frequency cepstral coefficients (MFCCs). Finally, the DLCNN classifies the emotion of speech from the extracted LPC-MFCC deep features. The simulation results demonstrate that the proposed ASERNet model outperforms state-of-the-art ASER approaches on quality metrics such as accuracy, precision, recall, and F1-score.

Keywords: Automatic speech emotion recognition; Mel-frequency cepstral coefficients; linear predictive coding; convolutional neural networks
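
The abstract fuses LPC with MFCCs but does not say how the LPC part is computed. As a hedged illustration only, here is a minimal sketch of the textbook autocorrelation method with the Levinson-Durbin recursion for one speech frame. The function names are hypothetical, and windowing, pre-emphasis, spectral subtraction, and the MFCC fusion step are all omitted.

```python
from typing import List, Tuple


def autocorrelation(frame: List[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], float]:
    """Solve the Yule-Walker equations for predictor coefficients.

    Convention: x_hat[n] = sum_{i=1}^{order} a[i-1] * x[n-i].
    Returns (coefficients, final prediction-error power).
    """
    a: List[float] = []
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient of stage m.
        acc = r[m] - sum(a[i] * r[m - 1 - i] for i in range(m - 1))
        k = acc / err
        # Update the previous coefficients, then append the new one.
        a = [a[i] - k * a[m - 2 - i] for i in range(m - 1)] + [k]
        err *= 1.0 - k * k
    return a, err


def lpc(frame: List[float], order: int) -> List[float]:
    """LPC coefficients of one frame via the autocorrelation method."""
    r = autocorrelation(frame, order)
    if r[0] == 0.0:
        # Silent frame: no prediction is possible, return all-zero coefficients.
        return [0.0] * order
    coeffs, _ = levinson_durbin(r, order)
    return coeffs
```

For example, `levinson_durbin([2.0, 1.0, 0.0], 2)` returns coefficients close to `[2/3, -1/3]`, which solve the 2x2 Yule-Walker system directly.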

What Influence Does the Type of Extracted Audio Features Have on Emotional States?

IEEE International Conference on Automation, Quality and Testing, Robotics, AQTR

Authors: Toma Telembici, Lorena Muscar, Corneliu Rusu (Signal Processing Group, Faculty of Electronics, Telecommunications and Information Technology, Technical University of Cluj-Napoca, Cluj-Napoca, Romania)

ISBN (electronic): 9798350361933

ISBN (print): 9798350361940

This paper presents how different audio emotional states are classified using various types of feature extraction. Our audio soundbase for a robot service application, in the Romanian language, was used alongside four established soundbases: Crema, Ravdess, Savee, and Emo-DB. We analyzed the influence of different feature extraction methods on classification accuracy, focusing on Mel Frequency Cepstral Coefficients (MFCC), linear predictive coding (LPC), Magnitude-based Spectral Root Cepstral Coefficients (MSRCC), Normalized Gammachirp Cepstral Coefficients (NGCC), and Gammatone Frequency Cepstral Coefficients (GFCC). A single classifier, k-nearest neighbor (k-NN), is used to evaluate the effectiveness of each feature set in recognizing emotional states. We compared the correct classification rates to determine which features give the best results for emotion recognition. We show that using 30-NGCC and 5 nearest neighbors yields an overall correct classification rate of 96.46% in the testing phase. Computational time for both the training and testing phases was also included in the analysis. This study contributes to the field of emotion recognition from audio signals, facilitating advancements in human-computer interaction.

Keywords: Training; Human computer interaction; Emotion recognition; Automation; Accuracy; Feature extraction; linear predictive coding
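
The study evaluates each feature set with a k-NN classifier (best result with k = 5). The sketch below shows what that classification step looks like for generic feature vectors. It is a hedged illustration, not the authors' code: Euclidean distance and the tie-breaking rule are assumptions, and the two-dimensional "feature vectors" in the usage example are made up for clarity (real NGCC or MFCC vectors would have many more dimensions).

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple

Example = Tuple[Sequence[float], Hashable]


def knn_predict(train: Sequence[Example], query: Sequence[float], k: int = 5) -> Hashable:
    """Majority vote among the k training vectors closest to `query` (Euclidean)."""
    if not train:
        raise ValueError("training set is empty")
    # Keep the k nearest training examples, closest first.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    best = max(votes.values())
    # Tie-break (assumption): label of the closest neighbour among the top-voted labels.
    for _, label in neighbours:
        if votes[label] == best:
            return label
    raise AssertionError("unreachable")
```

With four "neutral" vectors near the origin and four "angry" vectors near (5, 5), a query at (0.2, 0.1) gets four "neutral" votes out of five and is labelled "neutral".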

Real-Time Heart Rate Detection Method Based on 77 GHz FMCW Radar

MICROMACHINES, 2022, Vol. 13, No. 11, p. 1960

Authors: Huang, Xiaohong; Ju, Zedong; Zhang, Rundong. Affiliations: North China Univ Sci & Technol Coll Artificial Intelligence Tangshan 063210 Peoples R China; Hebei Key Lab Ind Intelligent Percept Tangshan 063210 Peoples R China; North China Univ Sci & Technol Coll Management Tangshan 063210 Peoples R China

This paper proposes a real-time heart rate detection method based on 77 GHz FMCW radar. First, the method establishes a new motion model according to respiratory and heartbeat patterns and extracts the motion signals of the chest and abdomen; then, the random body motion (RBM) signal is eliminated by a combination of polynomial fitting and recursive least squares (RLS) adaptive filtering; lastly, multi-detection-point adaptive harmonics cancellation (AHC) is used to eliminate respiratory harmonics. In addition, the method introduces a spectrum analysis algorithm based on linear predictive coding (LPC). The experimental results show that the method can effectively eliminate the RBM signal and respiratory harmonics, and that the average real-time heart rate detection error rate is 2.925%.

Keywords: real-time heart rate; 77 GHz FMCW radar; random body motion; multi-detection-point adaptive harmonics cancellation; linear predictive coding
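
The abstract names an LPC-based spectrum analysis step without detailing it. A generic version evaluates the all-pole spectrum gain / |A(e^{jw})|^2 of the fitted LPC coefficients and picks its peak inside a heart-rate band. The sketch below is that generic technique, not the authors' algorithm. The band limits (0.8 to 2.5 Hz), the 20 Hz slow-time sampling rate, and the 0.01 Hz search step are hypothetical, and the LPC fit of the cleaned chest-displacement signal is assumed to have been done already.

```python
import cmath
import math
from typing import List, Sequence


def lpc_power_spectrum(a: Sequence[float], freqs_hz: Sequence[float], fs: float,
                       gain: float = 1.0) -> List[float]:
    """All-pole spectrum gain / |A(e^{jw})|^2 with A(z) = 1 - sum_i a_i z^-i."""
    out = []
    for f in freqs_hz:
        w = 2.0 * math.pi * f / fs
        a_w = 1.0 - sum(ai * cmath.exp(-1j * w * (i + 1)) for i, ai in enumerate(a))
        out.append(gain / abs(a_w) ** 2)
    return out


def heart_rate_bpm(a: Sequence[float], fs: float, f_lo: float = 0.8,
                   f_hi: float = 2.5, step: float = 0.01) -> float:
    """Frequency of the LPC spectral peak inside [f_lo, f_hi] Hz, in beats per minute."""
    n_bins = int(round((f_hi - f_lo) / step)) + 1
    freqs = [f_lo + i * step for i in range(n_bins)]
    spectrum = lpc_power_spectrum(a, freqs, fs)
    peak = max(range(n_bins), key=spectrum.__getitem__)
    return 60.0 * freqs[peak]
```

As a usage check, a second-order model with poles at radius 0.98 and angle 2*pi*1.2/20 (a 1.2 Hz resonance at fs = 20 Hz) gives a peak near 72 beats per minute. An all-pole model resolves a sharp peak from a short window, which is the usual motivation for LPC spectra over a plain FFT in this setting.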

State-space approach to linear predictive coding of speech - A comparative assessment

IEEE Conference on Industrial Electronics and Applications

Authors: Azeem Irshad, Muhammad Salman (College of Electrical and Mechanical Engineering, National University of Sciences and Technology)

ISBN (print): 9781467363204

Speech coders are a fundamental component of telecommunication and multimedia infrastructure. Systems such as mobile telephony, voice over internet protocol (VoIP), and audio conferencing rely on efficient speech coding. Speech coders strive to provide a low bit rate while maintaining speech quality and intelligibility. Linear predictive coding uses the spectral properties of speech to "optimize" the coder's performance for the human ear. In this paper, we perform a comparative assessment of the speech coding performance of several state-space filters to give designers insight into the capabilities of these filters. The filters considered are the Kalman filter, state-space recursive least-squares (SSRLS), and SSRLS with adaptive memory (SSRLSWAM). The results of RLS and LMS are also reported. Performance is judged in terms of perceptual evaluation of speech quality (PESQ) and prediction gain.

Keywords: Speech coding; Kalman Filter; SSRLS; SSRL-SWAM; linear predictive coding
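
One of the baselines quoted in this study is LMS, and one of the figures of merit is prediction gain, 10*log10(signal power / prediction-error power). The sketch below shows an adaptive one-step linear predictor and that gain measure. It uses the normalized LMS (NLMS) update rather than plain LMS, and the order, step size, and test signal are hypothetical choices; it is not a reproduction of the paper's Kalman or SSRLS filters.

```python
import math
from typing import List, Sequence


def nlms_predict(x: Sequence[float], order: int = 2, mu: float = 0.5,
                 eps: float = 1e-8) -> List[float]:
    """One-step prediction errors e[n] = x[n] - w . u[n] of an NLMS adaptive predictor.

    u[n] holds the `order` previous samples (newest first); errors start at n = order.
    """
    w = [0.0] * order
    errors = []
    for n in range(order, len(x)):
        u = [x[n - i] for i in range(1, order + 1)]
        pred = sum(wi * ui for wi, ui in zip(w, u))
        e = x[n] - pred
        # Normalized step: divide by the input energy so 0 < mu < 2 stays stable.
        norm = eps + sum(ui * ui for ui in u)
        w = [wi + mu * e * ui / norm for wi, ui in zip(w, u)]
        errors.append(e)
    return errors


def prediction_gain_db(x: Sequence[float], e: Sequence[float]) -> float:
    """10*log10 of signal power over prediction-error power (error power floored)."""
    px = sum(v * v for v in x) / len(x)
    pe = max(sum(v * v for v in e) / len(e), 1e-30)
    return 10.0 * math.log10(px / pe)
```

For a noise-free sinusoid, a second-order predictor can model the signal exactly, so after convergence the measured gain is very large; on real speech the gain would be far smaller and would depend on the frame and the filter memory.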

Analysis of glottal inverse filtering in the presence of source-filter interaction

SPEECH COMMUNICATION, 2020, Vol. 123, pp. 98-108

Authors: Palaparthi, Anil; Titze, Ingo R. Affiliations: Univ Utah Natl Ctr Voice & Speech Salt Lake City UT 84112 USA; Univ Utah Dept Biomed Engn Salt Lake City UT 84112 USA

The validity of glottal inverse filtering (GIF) for obtaining a glottal flow waveform from the radiated pressure signal in the presence and absence of source-filter interaction was studied systematically. A driven vocal fold surface model of vocal fold vibration was used to generate source signals. A one-dimensional wave reflection algorithm was used to solve for acoustic pressures in the vocal tract. Several test signals were generated with and without source-filter interaction at various fundamental frequencies and vowels. Linear predictive coding (LPC), Quasi Closed Phase (QCP), and Quadratic Programming (QPR) based algorithms, along with the supraglottal impulse response, were used to inverse filter the radiated pressure signals to obtain the glottal flow pulses. The accuracy of each algorithm was tested for its recovery of maximum flow declination rate (MFDR), peak glottal flow, open phase ripple factor, closed phase ripple factor, and mean squared error. The algorithms were also tested for their absolute relative errors of the Normalized Amplitude Quotient, the Quasi-Open Quotient, and the Harmonic Richness Factor. The results indicated that the mean squared error decreased with increasing source-filter interaction level, suggesting that the inverse filtering algorithms perform better in the presence of source-filter interaction. All glottal inverse filtering algorithms predicted the open phase ripple factor better than the closed phase ripple factor of a glottal flow waveform, irrespective of the source-filter interaction level. Major prediction errors occurred in the estimation of the closed phase ripple factor, MFDR, peak glottal flow, normalized amplitude quotient, and Quasi-Open Quotient. Feedback-related nonlinearity (source-filter interaction) affected the recovered signal primarily when f_o was well below the first formant frequency of a vowel. The prediction error increased when f_o was close to the first formant frequency due to the difficulty of estimating th

Keywords: Glottal inverse filtering; linear predictive coding; Source-filter interaction; Speech synthesis
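
LPC-based inverse filtering, one of the algorithms compared above, models the vocal tract as an all-pole filter 1/A(z) and removes it by passing the pressure signal through A(z). The sketch below shows only that core round trip on a toy signal. The coefficients are assumed to be known already (in practice they come from an LPC fit), and the further steps a full GIF pipeline needs, such as compensating for lip radiation, are omitted. The function names are hypothetical.

```python
from typing import List, Sequence


def all_pole_synthesis(excitation: Sequence[float], a: Sequence[float]) -> List[float]:
    """Filter through 1/A(z): x[n] = s[n] + sum_i a_i x[n-1-i] (zero initial state)."""
    x: List[float] = []
    for n, s in enumerate(excitation):
        x.append(s + sum(a[i] * x[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0))
    return x


def inverse_filter(x: Sequence[float], a: Sequence[float]) -> List[float]:
    """Filter through A(z) = 1 - sum_i a_i z^-i; samples before n = 0 are taken as zero."""
    residual = []
    for n in range(len(x)):
        pred = sum(a[i] * x[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
        residual.append(x[n] - pred)
    return residual
```

Synthesizing an impulse train `[1, 0, 0, 0, 1, 0, 0, 0]` with `a = [0.5]` and then inverse filtering with the same coefficients recovers the impulse train exactly; with real speech the coefficients are only estimates, which is one source of the ripple errors discussed above.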

Predictive Quantization for Data Volume Reduction in Staggered SAR Systems

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2020, Vol. 58, No. 8, pp. 5575-5587

Authors: Martone, Michele; Gollin, Nicola; Villano, Michelangelo; Rizzoli, Paola; Krieger, Gerhard. Affiliation: German Aerosp Ctr DLR Microwaves & Radar Inst D-82234 Wessling Germany

Staggered synthetic aperture radar (SAR) is an innovative SAR acquisition concept that exploits digital beamforming (DBF) in elevation to form multiple receive beams and continuously varies the pulse repetition interval to achieve high-resolution imaging of a wide continuous swath. Staggered SAR requires a higher azimuth oversampling than a SAR with a constant pulse repetition interval (PRI), which results in an increased data volume. In this article, we investigate the use of linear predictive coding, which exploits the correlation properties exhibited by the nonuniform azimuth raw data stream. In this scheme, the prediction of each sample is calculated onboard as a linear combination of a set of previous samples. The resulting prediction error is then quantized and downlinked (instead of the original value), which reduces the signal entropy and, in turn, the onboard data rate required for a given target performance. In addition, a priori knowledge of the gap positions can be exploited to dynamically adapt the bit rate allocation and the prediction order to further improve performance. Simulations of the proposed dynamic predictive block-adaptive quantization (DP-BAQ) are carried out for a Tandem-L-like staggered SAR system with different prediction orders and target scenarios, demonstrating that a significant data reduction can be achieved with a modest increase in system complexity.

Keywords: Block adaptive quantization (BAQ); data reduction; linear predictive coding; staggered synthetic aperture radar (SAR)
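
The core idea described above (predict each sample from previous ones, then quantize and send only the prediction error) can be sketched as closed-loop predictive quantization, also known as DPCM. In the sketch the predictor runs on previously *reconstructed* samples so the ground decoder can form the identical prediction. This is a hedged, simplified illustration: the fixed coefficients and the uniform quantizer step are hypothetical, and the block-adaptive quantization, gap-aware bit allocation, and adaptive prediction order of DP-BAQ are not modeled.

```python
from typing import List, Sequence, Tuple


def _predict(recon: Sequence[float], a: Sequence[float], n: int) -> float:
    """Linear prediction of sample n from earlier reconstructed samples."""
    return sum(a[i] * recon[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)


def dpcm_encode(x: Sequence[float], a: Sequence[float],
                step: float) -> Tuple[List[int], List[float]]:
    """Quantize prediction errors; returns (quantizer indices, encoder-side reconstruction)."""
    indices: List[int] = []
    recon: List[float] = []
    for n, sample in enumerate(x):
        pred = _predict(recon, a, n)
        q = round((sample - pred) / step)  # uniform quantizer, no overload clipping
        indices.append(q)
        recon.append(pred + q * step)
    return indices, recon


def dpcm_decode(indices: Sequence[int], a: Sequence[float], step: float) -> List[float]:
    """Rebuild the signal from the downlinked indices alone."""
    recon: List[float] = []
    for n, q in enumerate(indices):
        recon.append(_predict(recon, a, n) + q * step)
    return recon
```

Because the loop is closed, each reconstructed sample stays within step/2 of the original, and for a correlated signal the indices span a much smaller range than direct quantization would need, which is where the entropy and data-rate reduction comes from.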

Performance Improvement and Analysis of a Method for Piano Authentication From Audio Signal by Combining Feature Vectors of LPC Spectral Envelope, MFCC, and pLPC Pole Distribution

Computer Science and Data Engineering (CSDE), IEEE Asia-Pacific Conference on

Authors: Kodai Komatsu, Shun Oyabu, Shuichi Kurogi (Dept. of Mechanical and Control Eng., Kyushu Institute of Technology, Kitakyushu, Japan)

This paper presents a method for improving performance by combining feature vectors in piano authentication from the audio signal. We have previously shown that combining the linear predictive coding spectral envelope (LPCSE), the Mel-frequency cepstral coefficients (MFCC), and the piecewise linear predictive coding pole distribution (pLPCPD) improves the performance of speaker verification or authentication, where pLPCPD is a feature vector that we introduced and developed for speaker verification. We have also analyzed the performance improvement from the point of view of the aperiodicity extracted by pLPCPD and the periodicity extracted by LPCSE and MFCC. This paper applies the method to the verification or authentication of three piano makers: Yamaha, Bösendorfer, and Steinway. Unlike speaker verification, a piano has 88 keys that produce a wide range of pitches and has several play styles, such as normal, staccato, tremolo (repeated notes), and pedal. Through piano-maker verification experiments using all datasets, which involve different pitches and play styles, we show that pLPCPD+LPCSE (the combination of pLPCPD and LPCSE) achieves the best performance. Through experiments using restricted datasets, we show that pLPCPD+MFCC+LPCSE and pLPCPD+MFCC achieve the best performance in pitch-dependent and play-style-dependent verification, respectively, while pLPCPD+MFCC and pLPCPD achieve the best performance in pitch-independent and play-style-independent verification, respectively. As a result, pLPCPD carries the largest amount of information that depends on the piano maker and is independent of pitch and play style, while the three features carry information complementary to one another.

Keywords: Computer science; Analytical models; Computational modeling; Authentication; Feature extraction; Data engineering; linear predictive coding

Comparisons of Speech Parameterisation Techniques for Classification of Intellectual Disability Using Machine Learning

INTERNATIONAL JOURNAL OF COGNITIVE INFORMATICS AND NATURAL INTELLIGENCE, 2020, Vol. 14, No. 2, pp. 16-34

Authors: Aggarwal, Gaurav; Singh, Latika. Affiliations: Manipal Univ Dept Informat Technol Jaipur Rajasthan India; Ansal Univ Gurgaon India

Classification of intellectually disabled children through manual assessment of speech at an early age is inconsistent, subjective, time-consuming, and prone to error. This study attempts to classify children with intellectual disabilities using two speech feature extraction techniques: linear predictive coding (LPC) based cepstral parameters and Mel-frequency cepstral coefficients (MFCC). Four classification models, k-nearest neighbour (k-NN), support vector machine (SVM), linear discriminant analysis (LDA), and radial basis function neural network (RBFNN), are employed. Forty-eight speech samples from each group, taken from subjects of similar age and socio-economic background, are analyzed. To improve accuracy, the study also examines the effect of varying the frame length together with the number of filterbanks in MFCC, and the frame length together with the order in LPC. The experimental outcomes show that the proposed technique can help speech pathologists estimate intellectual disability at early ages.

Keywords: Classification; Intellectual Disability (ID); linear predictive coding; Mel-Frequency Cepstral Coefficients; Typically Developed (TD)
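
"LPC-based cepstral parameters" are commonly obtained from the predictor coefficients with the standard LPC-to-cepstrum recursion for the all-pole model 1/(1 - sum_k a_k z^-k): c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k} for n <= p, and the same sum restricted to k >= n - p (without the a_n term) for n > p. The sketch below assumes this recursion is what the feature means here; the abstract does not confirm it, and the frame lengths and LPC orders examined in the study are not reproduced. The gain term c_0 is omitted.

```python
from typing import List, Sequence


def lpc_to_cepstrum(a: Sequence[float], n_ceps: int) -> List[float]:
    """Cepstral coefficients c[1..n_ceps] of the all-pole model 1 / (1 - sum_k a_k z^-k)."""
    p = len(a)
    c: List[float] = []
    for n in range(1, n_ceps + 1):
        # Direct term a_n exists only up to the model order p.
        acc = a[n - 1] if n <= p else 0.0
        # Recursive contribution from earlier cepstral coefficients.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c
```

As a sanity check, a first-order model with a_1 = 0.5 gives c_n = 0.5^n / n (0.5, 0.125, 0.0417, ...), matching the series expansion of -ln(1 - 0.5 z^-1).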

Performance Improvement for Speaker Detection from Mixed Speech of Multiple Speakers Using Probabilistic Prediction and Combining LPCSE, MFCC and pLPCPD

Computer Science and Data Engineering (CSDE), IEEE Asia-Pacific Conference on

Authors: Kodai Komatsu, Shuichi Kurogi (Dept. of Mechanical and Control Eng., Kyushu Institute of Technology, Kitakyushu, Japan)

This paper presents a method for improving performance by combining feature vectors for speaker detection from the mixed speech of multiple speakers. We have recently shown that combining the linear predictive coding spectral envelope (LPCSE), the Mel-frequency cepstral coefficients (MFCC), and the piecewise linear predictive coding pole distribution (pLPCPD) improves speaker-verification performance, where pLPCPD is a feature vector that we introduced and developed for speaker verification. Separately, we have examined a method of speaker verification from the mixed speech of multiple speakers using pLPCPD feature vectors and showed that recall, the performance measure for verification, drops sharply when the speech changes from unmixed to mixed, while precision does not decrease as much. This paper applies the above feature-combination method and these findings on speaker verification to speaker detection from the mixed speech of multiple speakers, using a probabilistic prediction method that we are also developing. Through experiments, we show the performance improvement obtained by combining the three features and the effectiveness of the present prediction method.

Keywords: Computer science; Feature extraction; Probabilistic logic; Data engineering; linear predictive coding; Mel frequency cepstral coefficient

Multi-Rider Optimization-Based Neural Network for Fault Isolation in Analog Circuits

JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2021, Vol. 30, No. 3

Authors: Binu, D.; Kariyappa, B. S. Affiliation: VTU Univ RV Coll Engn Dept Elect & Commun Engn Bangalore Karnataka India

Fault isolation in electronic circuits is an active area of interest, as analog circuits find valuable application in industry. Failures in circuit systems severely disrupt the normal functioning of the system, which calls for an automatic method of fault isolation in analog circuits. The literature reports several issues associated with fault isolation; to address them, a novel model is proposed to isolate the fault-causing component in the circuit. The proposed Multi-Rider Optimization-based Neural Network (M-RideNN) isolates the faulty part of the circuit from the fault-free areas so that fault diagnosis is structured effectively. Fault isolation proceeds in four major steps: establishing the fault dictionary, signal normalization using linear predictive coding (LPC), dimensionality reduction using Probabilistic Principal Component Analysis (PPCA), and fault isolation using the proposed M-RideNN classifier. Finally, experiments using three circuits, namely a Triangular Wave Generator (TWG), a Bipolar Transistor Amplifier (BTA), and a differentiator (DIF), and an application circuit, a Solar Power Converter (SPC), show that the proposed M-RideNN classifier offers better classification accuracy, reaching 93.18% with a minimum Mean Square Error (MSE) of 0.0682.

Keywords: Rider optimization; fault isolation; linear predictive coding; probabilistic principal component analysis; analog circuits
