In this paper, we present a novel bitstream scalable audio coder. In the proposed coder, the full bandwidth of the input audio is first split into two bands. A hybrid WLPC-wavelet representation is used to encode the low frequency components (<11 kHz). In this method, the excitation to the WLPC synthesis filter is decomposed into subbands using a wavelet filterbank and perceptually encoded. Two-stage quantisation of the wavelet coefficients provides scalability. The high frequency components of the input are assumed to be noise-like and are efficiently encoded using an LPC noise model. The output bitstream can be decoded at rates between 16 kbit/s and 80 kbit/s, and signal quality increases with bitrate. At 80 kbit/s, the quality is near-transparent. At the intermediate rates, the coder gives performance comparable to the MPEG layer III coder when the MPEG coder operates at similar, but fixed, bitrates.
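The two-stage quantisation idea can be illustrated with a minimal sketch. It assumes plain uniform scalar quantisers; the function names and step sizes are hypothetical, and the coder's perceptual bit allocation is not modelled. Stage 1 forms a core layer, and stage 2 quantises what stage 1 missed, so a decoder can stop after the core layer or add the refinement.

```python
import numpy as np

def two_stage_quantise(coeffs, coarse_step, fine_step):
    """Return (base, refinement) integer indices for a two-stage scalar quantiser."""
    coeffs = np.asarray(coeffs, dtype=float)
    base = np.round(coeffs / coarse_step).astype(int)         # stage 1: core layer
    residual = coeffs - base * coarse_step                     # error left by stage 1
    refinement = np.round(residual / fine_step).astype(int)    # stage 2: enhancement layer
    return base, refinement

def decode(base, coarse_step, refinement=None, fine_step=None):
    """Reconstruct from the core layer alone, or from both layers if available."""
    out = np.asarray(base, dtype=float) * coarse_step
    if refinement is not None:
        out = out + np.asarray(refinement, dtype=float) * fine_step
    return out
```

Decoding with only `base` corresponds to the low-rate operating point; passing `refinement` as well lowers the reconstruction error at the cost of extra bits.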
A segmentation algorithm for noisy speech signals is introduced. The method is built on a combination of the time scaling technique and the binary Walsh transform. While segmenting the whole noisy waveform, it can also detect closely spaced gaps between the signal components. The locations of speech segments and gaps are determined from the reconstructed time-scaled signal. Using the Walsh transform in the reconstruction phase of the time scaling process is found to separate the signal components from narrow gaps effectively. Although the method has low complexity, the experimental results are satisfactory in the absence of a priori knowledge about the speech and noise.
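For readers unfamiliar with the Walsh transform, the following is a minimal sketch of the standard fast Walsh-Hadamard transform in Hadamard ordering. It is an assumption that this corresponds to the transform used; the abstract does not specify the ordering or the "binary" variant, and the function name `fwht` is illustrative.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform (Hadamard ordering, unnormalised)."""
    a = np.array(x, dtype=float)
    n = a.size
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly: combine blocks of size h into sums and differences.
        for start in range(0, n, 2 * h):
            top = a[start:start + h].copy()
            bottom = a[start + h:start + 2 * h].copy()
            a[start:start + h] = top + bottom
            a[start + h:start + 2 * h] = top - bottom
        h *= 2
    return a
```

Because the basis functions take only the values +1 and -1, the transform needs additions and subtractions only, and applying it twice and dividing by the length recovers the input.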
A vocoder-based segmentation algorithm for audio signals is presented. The signal to be detected consists of multiple components that are closely located in time and buried under a high level of noise. While detecting the signal components, the method can also estimate the duration of the narrow gaps between them. It is built on a combination of a vocoder and the binary Walsh transform. The locations of audio segments and gaps are determined from the output waveform of the vocoder. Using the Walsh transform at the synthesis stage of the vocoder is found to separate the signal components from narrow gaps effectively. Although the method is computationally inexpensive, the experimental results are satisfactory in the absence of any a priori knowledge about the signal and noise.
A modification of the classical predictive vector quantization (PVQ) technique with switched-adaptive prediction for line spectrum frequency (LSF) quantization is proposed in this paper, enabling a significant reduction in complexity. Lower complexity is achieved by using a larger number of switched prediction matrices, each with a reduced number of nonzero elements. The structures of these matrices and the optimal matrix elements are obtained by maximizing the closed-loop prediction gain of the quantizer. The proposed quantizer is compared with quantizers that use full prediction matrices and with a quantizer that uses diagonal matrices. The effectiveness of the proposed approach is shown, and the trade-off between complexity and quality of the quantizer is analyzed.
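The switching idea can be sketched as follows. This is a simplified open-loop illustration, not the closed-loop design described in the abstract: the selection rule (minimum squared prediction error), the function names, and the example matrices are all assumptions. The nonzero count stands in for the per-frame multiply cost that sparser matrices reduce.

```python
import numpy as np

def switched_predict(prev, target, matrices):
    """Pick the prediction matrix that minimises the squared prediction error."""
    prev = np.asarray(prev, dtype=float)
    target = np.asarray(target, dtype=float)
    errors = [float(np.sum((target - A @ prev) ** 2)) for A in matrices]
    best = int(np.argmin(errors))
    residual = target - matrices[best] @ prev  # this residual would go to the VQ stage
    return best, residual

def multiplies_per_prediction(A):
    """Number of multiplications needed to apply A, counting nonzero entries only."""
    return int(np.count_nonzero(A))
```

With this cost model, a full 10x10 matrix costs 100 multiplications per prediction and a diagonal one costs 10; a sparse matrix with a few off-diagonal entries falls in between.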
Some natural sounds, such as parts of speech, can essentially be considered noise. Existing models, for instance, treat the noisy parts of sounds as weak parts and apply only basic approximations to them. However, transformations such as time stretching do not preserve the noisy characteristics of sounds. Moreover, we show that these transformations introduce artificial intensity variations. In this paper, we propose a spectral model for noise that takes into account the statistical properties of such sounds. The analysis is based on classical spectral models. The synthesis consists of randomly defining sinusoidal components, which are then added using an adapted overlap-add method that keeps the statistical moments constant. Time scaling operations using this approach are described. Experiments on artificial sounds (filtered white noise) as well as natural sounds, such as consonants and whispered vowels, show an impressive enhancement in quality. Time stretching of such noises by an arbitrarily large factor can be performed perfectly.
Room equalization is important for delivering high-quality audio in multiple-listener environments and for improving speech recognition rates. Lower-order equalization filters can be designed at perceptually relevant frequencies through warping. However, one of the major factors that affects multi-channel equalization performance is the reverberation of the room. In this paper, we compare the equalization performance of our method (S. Bharitkar et al., Nov. 2002) with the industry-standard root-mean-square (RMS) method, using the image method. It is shown that our method outperforms the RMS method by maintaining a lower spectral deviation across multiple listener positions when the reverberation time is increased.
The paper presents a particle-filtering method for estimating formant frequencies of speech signals from spectrograms. First, frequency bands corresponding to the analyzed formants are extracted via a two-step algorithm based on dynamic programming. A particle-filtering method is then used to accurately locate the formants in each formant area, based on the posterior PDF described by a set of support points with associated weights. Formant trajectories of the voiced frames of a group of 81 utterances were manually tracked and labeled, partly for model training and partly for algorithm evaluation. In the experiments, the proposed method obtains average estimation errors of 72, 115, and 113 Hz for the first three formants, respectively, whereas the LPC-based method produces deviations of 118, 172, and 250 Hz. The experimental results show that the formants estimated by the proposed method are quite reliable and that the trajectories are more accurate than those of the LPC-based method.
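The "support points with associated weights" description can be made concrete with a minimal bootstrap particle filter for a single frequency track. This is a generic sketch under stated assumptions, not the paper's model: the random-walk dynamics, the Gaussian likelihood, and every parameter value and name here are hypothetical.

```python
import numpy as np

def particle_filter_track(observations, n_particles=500, process_std=30.0,
                          obs_std=50.0, init_range=(200.0, 1000.0), seed=0):
    """Track one frequency (Hz) from noisy per-frame observations."""
    rng = np.random.default_rng(seed)
    particles = rng.uniform(init_range[0], init_range[1], n_particles)
    estimates = []
    for z in observations:
        # Predict: assumed random-walk dynamics between frames.
        particles = particles + rng.normal(0.0, process_std, n_particles)
        # Update: weight each support point by an assumed Gaussian likelihood.
        log_w = -0.5 * ((z - particles) / obs_std) ** 2
        weights = np.exp(log_w - log_w.max())
        weights /= weights.sum()
        # The posterior mean serves as the frequency estimate for this frame.
        estimates.append(float(np.sum(weights * particles)))
        # Resample to concentrate particles in high-probability regions.
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        particles = particles[idx]
    return np.array(estimates)
```

In a formant tracker, each observation would come from the spectrogram within the band selected for that formant, and one such filter would run per formant.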
ISBN: (print) 0780376633
General audio and video coding with fine granularity scalability (FGS) has become favored in next-generation multimedia coding standards because of its high flexibility in channel rate adaptation. However, FGS has not yet been fitted into existing speech codecs. We introduce the FGS feature to the code excited linear prediction (CELP) based speech coding algorithm by adjusting the amount of transmitted fixed excitation information. We further improve the algorithm by relaxing the constraints and re-ordering the sequence of pulses. Achieving this requires modifications to the conventional coding algorithm, but the computational overhead is small and few modules are affected. As a consequence, developers can quickly and easily migrate their existing codec to one with the FGS advantage.
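The idea of adjusting the amount of transmitted fixed excitation information can be sketched as building the fixed-codebook excitation from only the first few pulses received. This is an illustrative sketch: the unit-amplitude pulse representation, the 40-sample subframe length, and the function name are assumptions, and the paper's pulse re-ordering rule is not reproduced.

```python
import numpy as np

def build_excitation(positions, signs, n_pulses, subframe_len=40):
    """Build a fixed excitation from the first n_pulses (position, sign) pairs."""
    exc = np.zeros(subframe_len)
    for pos, sign in list(zip(positions, signs))[:n_pulses]:
        exc[pos] += sign  # unit-amplitude pulse at the given sample position
    return exc
```

Each additional pulse the channel can carry refines the excitation, which is why the order in which pulses are placed in the bitstream matters for quality at truncated rates.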
The paper presents a new hand shape representation technique that characterises the finger-only topology of the hand, by adapting an existing technique from speech signal processing. From a moving hand sequence, the tracking algorithm determines the centre of the largest convex subset of the hand, using a combination of pattern matching and condensation algorithms. A hand shape feature represents the topological formation of the finger-only regions of the hand using a linear predictive coding parameter set called cepstral coefficients. Experimental results demonstrate the effectiveness of detecting the shape feature from motion sequences.
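For reference, cepstral coefficients can be derived from linear prediction coefficients with a standard recursion. The sketch below assumes the all-pole convention H(z) = 1 / (1 - sum_k a_k z^-k) and omits the gain term c_0; the function name is illustrative, and how the paper maps hand contours to LPC parameters is not shown.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert LPC coefficients [a_1..a_p] to cepstral coefficients [c_1..c_n_ceps].

    Assumes H(z) = 1 / (1 - sum_k a_k z^-k); the gain term c_0 is omitted.
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        # c_n = a_n + sum_{k=1}^{n-1} (k/n) * c_k * a_{n-k}, with a_j = 0 for j > p
        for k in range(1, n):
            if 1 <= n - k <= p:
                acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c
```

For a first-order predictor with coefficient a, the recursion gives c_n = a^n / n, which matches the series expansion of -ln(1 - a z^-1).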
In this paper, a modification to the group vector quantization (GVQ) discriminative training algorithm is proposed to train VQ codebooks for closed-set speaker identification. The proposed algorithm, referred to as modified GVQ (MGVQ), shifts the decision surfaces between speakers smoothly toward the Bayes limits. This is achieved by varying the learning rate during training iterations. The proposed MGVQ algorithm achieves a higher speaker identification rate compared to the standard GVQ