ISBN: (print) 9781509020744
In this paper, we propose using linear prediction coding (LPC) to process the temporal feature streams in speech recognition for noise robustness. Using LPC, an FIR filter can be obtained and applied to the time series of Mel-frequency cepstral coefficients (MFCC), which in general attenuates the fast-varying component in the modulation spectrum of MFCC. We have found that smoothing the MFCC modulation spectrum helps to reduce the effect of noise and enhances the noise robustness of MFCC. Experiments conducted on the Aurora-2 connected-digit database show that the proposed LPC-based method improves the recognition accuracy of MVN- and HEQ-preprocessed MFCC under a wide range of noise-corrupted conditions.
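As a rough illustration of the LPC estimation step mentioned in this abstract, the following minimal sketch computes linear prediction coefficients for one temporal feature stream with the autocorrelation method and the Levinson-Durbin recursion. The function names, the model order, and the test sequence are assumptions made for illustration only; the abstract does not say how the FIR filter is derived from the coefficients, so that step is not reproduced here.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a sequence x."""
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r.

    Returns (a, err) with a[0] == 1, where the predictor is
    x_hat[t] = -sum(a[k] * x[t - k] for k in 1..order)
    and err is the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for the current order.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


# Hypothetical feature trajectory: a decaying first-order sequence.
stream = [0.9 ** t for t in range(200)]
coeffs, residual_energy = levinson_durbin(autocorrelation(stream, 2), 2)
```

For this test sequence the first predictor coefficient comes out close to -0.9 and the second close to zero, which matches the first-order structure of the data.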
We have devised a high-quality frequency-domain audio coder, based on a state-of-the-art monaural wide-band coder, intended for use in low-delay and low-bit-rate conditions. The coder efficiently represents the frequency spectral envelopes of the target signals with low computational complexity using optimally prepared non-negative sparse matrices. The experimental results reveal that this representation has positive effects on the objective and subjective quality of the coder, yielding quality comparable to that of 3GPP Extended Adaptive Multi-Rate WideBand (AMR-WB+) at the same bit rate, a coder that permits a delay more than four times longer than that of the proposed coder. Consequently, this coder is suitable for applications in mobile communications, which require low delay and low complexity.
This research addresses the transmission of wide band (WB) speech (with cut-off frequency f_c = 8 kHz) over a standard narrow band (NB) communication link (supporting a bandwidth of 300-3,400 Hz). The long transition time for upgrading from NB to WB systems has led to the development of backward-compatible techniques such as artificial bandwidth extension (ABE), which can provide a bandwidth of 50-7,000 Hz and in turn toll-quality recovered speech at the receiving end. This paper investigates a novel approach that computes high band (HB) features at the transmitter from a given input WB speech corpus using the linear predictive coding (LPC) technique. These encoded features are embedded into the bit stream of the proposed GSM Full Rate 06.10 NB speech coder using a joint source coding and data hiding technique and then transmitted to the receiver. At the receiver, the HB features are extracted using a watermark extraction algorithm to reproduce the HB speech, for which different extension-of-excitation techniques have been adopted and implemented. An e-test bench is created to implement the proposed ABE coder in MATLAB, and a series of simulations is carried out using subjective (mean opinion score, MOS) and objective (perceptual evaluation of speech quality, PESQ) analysis. The results of both analyses indicate a performance improvement of the proposed ABE coder over the legacy GSM 06.10 FRNB coder for the various extension-of-excitation techniques.
In this paper, the application of an artificial neural network classifier to recognize pest birds in agricultural areas, as part of a comprehensive system of protection against vermin, is demonstrated. First, the idea of the whole system is outlined. Then, the method of recognition is described, the process of artificial neural network design is illustrated, and the classifier is validated using data gathered in the field. Finally, the results are compared with those of similar works.
This paper presents a real-time robust formant tracking system for speech using a real-time phase equalization-based autoregressive exogenous model (PEAR) with electroglottography (EGG). Although linear predictive coding (LPC) analysis is a popular method for estimating formant frequencies, its estimation accuracy is known to degrade for speech with a high fundamental frequency F0, since the harmonic structure of the glottal source spectrum deviates more from the Gaussian noise assumption in LPC as F0 increases. In contrast, PEAR, which employs phase equalization and LPC with an impulse train as the glottal source signal, estimates formant frequencies robustly even for speech with high F0. However, PEAR has higher computational complexity than LPC. In this study, to reduce this computational complexity, a novel formulation of PEAR was derived, which enabled us to implement PEAR in a real-time robust formant tracking system. In addition, since PEAR requires the timings of glottal closures, a stable detection method using EGG was devised. We developed the real-time system on a digital signal processor and showed that, for both synthesized and natural vowels, the proposed method can estimate formant frequencies more robustly than LPC over a wider range of F0.
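For context, the conventional LPC baseline referred to in this abstract typically reads formant candidates off the angles of the roots of the prediction polynomial. The sketch below illustrates only that baseline, not PEAR; the function name, the sampling rate, and the synthetic pole positions are assumptions chosen for illustration.

```python
import numpy as np


def formants_from_lpc(a, sample_rate):
    """Candidate formant frequencies (Hz) from LPC coefficients a, a[0] == 1.

    The poles of the all-pole model 1 / A(z) are the roots of
    z^p + a[1] z^(p-1) + ... + a[p]; each complex-conjugate pair
    contributes one resonance at angle * sample_rate / (2 * pi).
    """
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]  # keep one root per conjugate pair
    freqs = np.angle(roots) * sample_rate / (2.0 * np.pi)
    return np.sort(freqs)


# Hypothetical all-pole model with resonances at 700 Hz and 1200 Hz.
fs = 8000.0
poles = [0.95 * np.exp(2j * np.pi * f / fs) for f in (700.0, 1200.0)]
poles += [np.conj(p) for p in poles]
a = np.real(np.poly(poles))
candidates = formants_from_lpc(a, fs)
```

In practice the candidate list is usually filtered further, for example by pole radius or bandwidth, before being reported as formants.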
In this study, we aim to automatically detect behavioral patterns of patients with autism. Many stereotypical behavioral patterns may hinder their learning ability as children, and patterns such as self-injurious behaviors (SIB) can lead to critical damage or wounds, as patients tend to repeatedly harm one single location. Our custom-designed accelerometer-based wearable sensor can be placed at various locations on the body to detect stereotypical self-stimulatory behaviors (stereotypy) and self-injurious behaviors of patients with Autism Spectrum Disorder (ASD). A microphone was used to record sounds so that we could understand the surrounding environment, and video provided ground truth for analysis. The analysis was done on four children diagnosed with ASD who showed repeated self-stimulatory behaviors involving part of the body, such as flapping arms and body rocking, and self-injurious behaviors such as punching their face or hitting their legs. The goal of this study is to devise novel algorithms to detect these events and to open the possibility of designing intervention methods. In this paper, we use time-domain pattern matching together with linear predictive coding (LPC) of the data to detect and classify these ASD behavioral events. We observe clusters of pole locations from the LPC roots to select candidates and apply pattern matching for classification. We also show novel event detection using an online dictionary update method. We show that our proposed method achieves recall rates of 95.5% for SIB, 93.5% for flapping, and 95.5% for rocking, which is an increase of approximately 5% compared to flapping events detected using wrist-worn sensors in our previous study.
ISBN: (print) 9781424442959
Location template matching (LTM) is a source localization technique in solids that is robust to dispersion and multipath. This is possible since LTM compares the input with a database of signals made at known locations. With this in place, it is possible to employ LTM in situations where the surface of interest takes an irregular shape. However, one of the existing LTM approaches uses cross-correlation to compare the input and the database. If any two of the known locations stored in the database are too close, the cross-correlation method may have difficulty differentiating between signals generated from the neighboring points. To address this, we propose an algorithm that employs linear predictive coding (LPC), which takes into account the dominant frequencies of a received signal. Using this approach, we show that the proposed algorithm is able to improve LTM's source localization accuracy in a real environment in the context of source localization for a touch interface.
ISBN: (print) 9781479975914
Lag windowing has long been used in the autocorrelation method of linear predictive (LP) analysis to prevent possible instability of the synthesis filter built from the obtained coefficients. We have investigated the lag-window shape in terms of the trade-off between stability and coding efficiency. On the basis of these investigations, we have devised an adaptive selection scheme in which the window shape selected depends on the periodicity of the signal. This scheme has proven to be effective for LP analysis, enhancing the coding efficiency in both the time and frequency domains in general. This scheme has thus been included in the speech and audio coding schemes of the newly established 3GPP EVS codec standard.
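To make the idea of lag windowing concrete, the sketch below applies a fixed Gaussian lag window to autocorrelation values before they would be passed to LP analysis. The Gaussian shape, the 60 Hz bandwidth, the 8 kHz sampling rate, and the autocorrelation values are illustrative assumptions; the adaptive, periodicity-dependent selection described in the abstract is not specified there and is not reproduced here.

```python
import math


def gaussian_lag_window(num_lags, bandwidth_hz, sample_rate):
    """Gaussian lag window w[k] = exp(-0.5 * (2*pi*bandwidth*k/fs)^2)."""
    return [
        math.exp(-0.5 * (2.0 * math.pi * bandwidth_hz * k / sample_rate) ** 2)
        for k in range(num_lags)
    ]


def apply_lag_window(r, window):
    """Multiply autocorrelation values r[k] by the window, lag by lag."""
    return [rk * wk for rk, wk in zip(r, window)]


# Hypothetical autocorrelation values for lags 0..4.
r = [1.0, 0.95, 0.85, 0.70, 0.55]
w = gaussian_lag_window(len(r), bandwidth_hz=60.0, sample_rate=8000.0)
r_windowed = apply_lag_window(r, w)
```

Because the window equals one at lag zero and decays with increasing lag, it leaves the signal energy untouched while attenuating long-lag correlations, which corresponds to smoothing the power spectrum. A wider bandwidth smooths more strongly, and that is the kind of stability versus efficiency trade-off the abstract describes.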
ISBN: (print) 9781467377584
The paper presents a novel approach to identifying singers using a harmonic spectral envelope constructed from the pitch of the singing voice. This new representation of the singing voice demonstrates that the harmonic spectral envelope exhibits certain acoustic qualities that can characterize the identity of the singer. Two different approaches are implemented to extract the pitch of the singing voice: the cepstrum technique and linear predictive coding. Ten singers, six male and four female, are analyzed in this work. To allow accurate analysis and estimation of the acoustics of the singing voice, only a cappella sections are investigated. Along with a discussion of singer identification, the results include a comparison of pitch extraction techniques and gender identification of the singer. We achieve an average accuracy of 77% in identifying the singers, covering a large class of polyphonic recordings of Indian movie songs.
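As a rough illustration of cepstrum-based pitch extraction, one of the two approaches named in this abstract, the sketch below locates the dominant peak of the real cepstrum within a plausible pitch range. The function name, the Hamming window, the 80-500 Hz search range, and the synthetic pulse-train input are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def cepstral_pitch(frame, sample_rate, fmin=80.0, fmax=500.0):
    """Estimate the fundamental frequency (Hz) of one frame via the real cepstrum."""
    windowed = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)
    cepstrum = np.fft.irfft(log_mag)
    # Search only quefrencies that correspond to the assumed pitch range.
    qmin = int(sample_rate / fmax)
    qmax = int(sample_rate / fmin)
    peak = qmin + int(np.argmax(cepstrum[qmin:qmax + 1]))
    return sample_rate / peak


# Hypothetical voiced frame: a pulse train with a 40-sample period at 8 kHz.
fs = 8000.0
frame = np.zeros(1024)
frame[::40] = 1.0
f0_estimate = cepstral_pitch(frame, fs)
```

The estimate is quantized to integer quefrency lags, so real systems often interpolate around the peak and add a voicing decision before trusting the value.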
ISBN: (print) 9781467380324
A large part of the latest research in speech coding algorithms is motivated by the need for secure military communications that allow effective operation in a hostile environment. Since the bandwidth of the communication channel is a sensitive problem in military applications, low bit-rate speech compression methods are mostly used. Several speech processing applications, such as Mixed Excitation Linear Prediction (MELP), are characterized by very strict requirements on power consumption, size, and voltage supply. These requirements are difficult to fulfill, given the complexity and number of functions to be implemented, together with the real-time requirement and the large dynamic range of the input signals. To meet these constraints, careful optimization should be done at all levels, ranging from the algorithmic level, through system and circuit architecture, to the layout and design of the cell library. The key points of this optimization are, among others, the choice of the algorithms, the modification of the algorithms to reduce computational complexity, the choice of a fixed-point arithmetic unit, the minimization of the number of bits required at every node of the algorithm, and a careful match between algorithms and architecture. This paper concentrates on low bit rate speech coding technology, mainly MELP, and addresses the problem of optimizing the MELP program on a Digital Signal Processor (DSP) platform. The algorithm was ported onto a fixed-point DSP, the Blackfin 537, and stage-by-stage optimization was performed to meet the real-time requirements. The main functions involved were analysis, parameter encoding, parameter decoding, and synthesis. The fixed-point source code at the MELP front end was also thoroughly optimized at the C level. Memory optimization techniques such as data placement and caching were also used to reduce the processing time. The results we obtained show that real-time implementations of a speech vocoder based on the MELP standard for low bit rate commu