ISBN: 9783642038815 (print)
This paper investigates the use of neural networks for recognizing Malay vowels spoken by children in a speaker-independent manner. The Malay vowels are /a/, /e/, /ə/, /i/, /o/, and /u/. The speech database is collected from 300 Malay children between seven and twelve years old. Each speaker contributes two samples per vowel sound. The database is split equally into a training set and a test set. The speech sounds are sampled at 20 kHz with 16-bit resolution. A single frame of cepstral coefficients is extracted around the vowel onset point using linear predictive coding (LPC). A multi-layer perceptron (MLP) with one hidden layer is trained to recognize the vowel sounds. The output layer of the MLP consists of 6 neurons, which correspond to the 6 vowel sounds. Experiments are conducted to determine the optimal vowel signal length and the number of hidden neurons in the MLP. A maximum recognition rate of 75.00% is achieved at signal lengths of 30 ms and 35 ms.
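The feature extraction step described above (LPC analysis of one frame, then conversion to cepstral coefficients) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the LPC order, the window, or the number of cepstral coefficients, so `order` and `n_ceps` are assumed parameters, and the autocorrelation method with the Levinson-Durbin recursion is one common choice among several.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (a, err). The predictor is
    x_hat[n] = sum_{k=1..order} a[k] * x[n-k]; a[0] is an unused 0.0.
    err is the final prediction-error energy.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a, err


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c[1..n_ceps] of the all-pole model 1 / (1 - sum a[k] z^-k)."""
    order = len(a) - 1
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n] if n <= order else 0.0
        for k in range(max(1, n - order), n):
            acc += (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]


# Toy check: r[k] = 0.5**k is the autocorrelation shape of a first-order
# all-pole model with coefficient 0.5, so the recursion should recover it.
a, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
cepstrum = lpc_to_cepstrum(a, n_ceps=3)
```

At the stated 20 kHz sampling rate, a 30 ms signal length corresponds to a 600-sample frame; in practice `autocorrelation` would be applied to such a frame (usually after windowing) and the resulting cepstral vector fed to the MLP.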
ISBN: 9781479954230 (print)
In this paper, an adaptive linear predictor based on the instantaneous total error is presented for linear predictive coding (LPC) of speech signals. In LPC, the speech signal is predicted by a linear combination of delayed input signals that are contaminated by noise. For this reason, the total least mean squares (T-LMS) algorithm is used to decode the noisy input signals and to predict the speech signal. A compressed speech prediction is obtained when the mean-square total error is minimized, showing the efficiency of the T-LMS-based LPC model. Experimental results are recorded for different signal-to-noise ratios (SNRs) of the input signals, and a comparative study is presented against an adaptive filter based on the instantaneous squared error. The results favor the proposed predictor.
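For orientation, the comparison baseline mentioned above (an adaptive predictor driven by the instantaneous squared error, i.e. conventional LMS) can be sketched as below. This is not the T-LMS algorithm of the paper, whose update rule is not given in the abstract; the function name, `order`, and step size `mu` are illustrative assumptions.

```python
import math


def lms_predict(x, order=2, mu=0.1):
    """One-step-ahead linear prediction with the conventional LMS update.

    At each sample the predictor forms sum_k w[k] * x[n-1-k], measures the
    instantaneous error, and moves the weights along the negative gradient
    of the squared error. Returns (final_weights, list_of_errors).
    """
    w = [0.0] * order
    errors = []
    for n in range(order, len(x)):
        past = [x[n - k] for k in range(1, order + 1)]
        pred = sum(wk * pk for wk, pk in zip(w, past))
        e = x[n] - pred
        w = [wk + mu * e * pk for wk, pk in zip(w, past)]
        errors.append(e)
    return w, errors


# Noise-free sinusoid: exactly predictable from its two previous samples,
# since sin(w0*n) = 2*cos(w0)*sin(w0*(n-1)) - sin(w0*(n-2)).
signal = [math.sin(0.5 * n) for n in range(3000)]
weights, errors = lms_predict(signal, order=2, mu=0.1)
```

On this clean input the weights approach `2*cos(0.5)` and `-1`. With noisy delayed inputs, as in the setting the abstract describes, the plain LMS solution becomes biased, which is the motivation usually given for total-least-squares variants.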
ISBN: 9781615673780 (print)
In CELP coders, the past excitation signal used to build the adaptive codebook is known to be the main source of error propagation when a frame is lost. This paper presents a novel resynchronization technique that uses very low bit rate side information to correct the past excitation signal after a frame erasure. The novelty is that the correction is computed in a closed-loop fashion, based on the actual error introduced by the concealment. Subjective test results show that this approach is a promising direction for future research on frame loss recovery.
A comparative performance study of seven pitch detection algorithms was conducted. A speech data base, consisting of eight utterances spoken by three males, three females, and one child was constructed. Telephone, clo...
ISBN: 9781467350518 (print)
Vector precoding enables non-cooperative signal acquisition in the multi-user broadcast channel. Its performance advantage over the more straightforward linear precoding algorithms comes from an added perturbation vector, which enhances the properties of the precoded signal. Nevertheless, computing the perturbation signal entails a search for the closest point in an infinite lattice, which is known to belong to the class of non-deterministic polynomial-time hard (NP-hard) problems. This contribution presents a novel tree-search scheme that achieves an error-rate performance close to the optimum given by the sphere encoder, but with a significantly simpler tree-search structure that considers only the most promising nodes for expansion. To showcase the low complexity and simple datapath of the proposed tree-search technique, it has been implemented in hardware on a 65 nm ASIC target device.
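To make the role of the perturbation vector concrete, the sketch below searches exhaustively over a small set of integer perturbations for a real-valued two-user toy channel with channel-inversion precoding. It is only an illustration of the problem being solved, not the tree search proposed in the paper and not a sphere encoder. The channel matrix, the 4-PAM symbols, the modulo constant `tau = 4`, and the search radius are all assumptions, and transmit power normalization is omitted.

```python
import itertools
import math


def inv2(H):
    """Inverse of a 2x2 real matrix given as nested lists."""
    (a, b), (c, d) = H
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(M, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def best_perturbation(H, s, tau, radius=1):
    """Exhaustive search for the integer vector l that minimizes the energy
    of the precoded signal x = H^{-1} (s + tau * l).

    Returns (energy, l, x). The candidate set is the hypercube
    {-radius..radius}^2, a truncated stand-in for the infinite lattice.
    """
    Hinv = inv2(H)
    best = None
    for l in itertools.product(range(-radius, radius + 1), repeat=len(s)):
        x = matvec(Hinv, [si + tau * li for si, li in zip(s, l)])
        energy = sum(xi * xi for xi in x)
        if best is None or energy < best[0]:
            best = (energy, l, x)
    return best


def centered_mod(y, tau):
    """Receiver-side modulo that removes the unknown perturbation tau * l."""
    return y - tau * math.floor(y / tau + 0.5)


# Ill-conditioned toy channel: plain inversion needs a lot of energy here.
H = [[1.0, 0.9], [0.9, 1.0]]
s = [1.5, -1.5]                      # two 4-PAM symbols, one per user
energy, l, x = best_perturbation(H, s, tau=4.0)
received = matvec(H, x)              # noiseless channel output
recovered = [centered_mod(y, 4.0) for y in received]
```

The cost of this exhaustive loop grows exponentially with the number of users, which is why reduced-complexity tree searches are of interest.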
ISBN: 9781424414369 (print)
The CD revolution of 1984 unleashed a massive campaign of rerelease of vintage analog recordings. To their collective horror, record labels found that many of the original master recordings had deteriorated. Often the only existing copy of a recording was a vinyl pressing. This rush to release beloved music in the new digital form spawned intense interest in digital methods of restoring these old recordings. A number of DSP techniques have been developed to give new life to these recordings. This paper is a survey of some of those methods.
ISBN: 9781509065363 (print)
Voiced-unvoiced (V-UV) classification is a well-understood but still not perfectly solved problem: determining whether a signal frame contains harmonic content or not. This paper presents a new approach to this problem using a conventional multi-layer perceptron neural network trained with linear predictive coding (LPC) coefficients. LPC yields a set of coefficients that can be transformed into the spectral envelope of the input frame. Since a spectrum is suitable for determining the harmonic content, so are the LPC coefficients. The proposed neural network works reasonably well compared to other approaches and has been evaluated on a small dataset of 4 different speakers.
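The statement that LPC coefficients can be transformed into the spectral envelope can be illustrated with a short sketch. It evaluates the magnitude response of the all-pole model on a frequency grid; the sign convention (predictor `x_hat[n] = sum a[k] * x[n-k]`, with `a[0]` unused), the `gain` argument, and the grid size are assumptions, since the abstract does not fix them.

```python
import cmath
import math


def lpc_envelope(a, gain=1.0, n_points=256):
    """Magnitude of gain / (1 - sum_{k>=1} a[k] e^{-j w k}) for w in [0, pi].

    a[0] is ignored; a[1:] are the predictor coefficients.
    Returns a list of n_points envelope values from DC to Nyquist.
    """
    envelope = []
    for i in range(n_points):
        w = math.pi * i / (n_points - 1)
        denom = 1.0 - sum(a[k] * cmath.exp(-1j * w * k)
                          for k in range(1, len(a)))
        envelope.append(gain / abs(denom))
    return envelope


# First-order example: a single real pole at z = 0.5 gives a low-pass tilt.
env = lpc_envelope([0.0, 0.5], gain=1.0, n_points=5)
```

For this first-order example the envelope falls from 2 at DC to 2/3 at the Nyquist frequency.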
ISBN: 9783642005985 (print)
This paper presents a blind watermark detection scheme for the additive watermark embedding model. The proposed estimation-correlation-based watermark detector first estimates the embedded watermark by exploiting the non-Gaussianity of real-world audio signals and the mutual independence between the host signal and the embedded watermark; a correlation-based detector is then used to determine the presence or absence of the watermark. For watermark estimation, blind source separation (BSS) based on underdetermined independent component analysis (UICA) is used. A low watermark-to-signal ratio (WSR) is one of the limitations of blind detection for the additive embedding model. The proposed detector uses two-stage processing to improve the WSR at the blind detector: the first stage removes the audio spectrum from the watermarked audio signal using linear predictive (LP) filtering, and the second stage uses the resulting residue from the LP filtering stage to estimate the embedded watermark using BSS based on UICA. Simulation results show that the proposed detector performs significantly better than existing estimation-correlation-based detection schemes.
ISBN: 0780374029 (print)
Low-complexity scalable methods are important for achieving multi-rate and variable-rate speech coding. Adopting the CELP concept, we focus on the innovation coding and propose an adaptive companding VQ. We present a scheme with essentially no increase in complexity as the rate is increased, yet with competitive distortion performance. This is achieved by avoiding an explicit perceptual filtering step in the coding while still using close-to-optimal VQ techniques in a perceptual domain. Subjective and objective distortion performance is better than, or on par with, that of conventional white-noise excitation methods or multipulse structures.
ISBN: 9781479975914 (print)
Lag windowing has long been used in the autocorrelation method of linear predictive (LP) analysis to prevent possible instability of the synthesis filter built from the obtained coefficients. We have investigated the lag-window shape in terms of the trade-off between stability and coding efficiency. On the basis of these investigations, we have devised an adaptive selection scheme in which the selected window shape depends on the periodicity of the signal. This scheme has proven effective in LP analysis for enhancing the coding efficiency in both the time and frequency domains in general. It has thus been included in the speech and audio coding schemes of the newly established 3GPP EVS codec standard.
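As background, a fixed lag window of the kind long used in speech coders can be sketched as follows: each autocorrelation lag is multiplied by a Gaussian taper before the LP coefficients are computed, which is equivalent to smoothing the power spectrum. The adaptive, periodicity-dependent selection rule is not specified in the abstract and is not reproduced here; the bandwidth `f0_hz = 60` and sampling rate `fs_hz = 8000` are assumed defaults for illustration only.

```python
import math


def gaussian_lag_window(order, f0_hz=60.0, fs_hz=8000.0):
    """Gaussian lag window w[k] = exp(-0.5 * (2*pi*f0*k/fs)^2), k = 0..order.

    A larger f0_hz gives a faster-decaying window, i.e. stronger spectral
    smoothing and a more robust but less sharply resolved LP model.
    """
    return [math.exp(-0.5 * (2.0 * math.pi * f0_hz * k / fs_hz) ** 2)
            for k in range(order + 1)]


def apply_lag_window(r, window):
    """Multiply autocorrelation lags r[0..order] by the lag window."""
    return [rk * wk for rk, wk in zip(r, window)]


# Example: taper a short autocorrelation sequence before LP analysis.
r = [1.0, 0.9, 0.7, 0.4]
window = gaussian_lag_window(order=3)
r_windowed = apply_lag_window(r, window)
```

An adaptive scheme in the spirit of the abstract would choose among several such window shapes per frame; how that choice is tied to periodicity is left open here.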