ISBN (print): 9781424442959
In this paper, bootstrap resampling techniques are applied to assess speech quality and thereby evaluate the performance of different speech enhancement algorithms, under the assumption that speech segments can be approximated by an autoregressive model. A bootstrap-based multiple hypothesis testing procedure is constructed to test the log-likelihood ratio (LLR) distance, a distance measure based on linear predictive coding (LPC). The multiple hypothesis test results correlate well with conventional numerical distance measures, which suggests that the proposed procedure is applicable to assessing both speech quality and speech enhancement algorithms.
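As a rough illustration of the two ingredients, the Python sketch below computes the per-frame LLR distance d = log((a_p^T R_c a_p) / (a_c^T R_c a_c)), where a_c and a_p are the LPC coefficient vectors of the clean and processed frames and R_c is the autocorrelation matrix of the clean frame, and then forms a percentile bootstrap confidence interval for the mean LLR. It is a minimal sketch under stated assumptions: the LPC order (10), the frame length, the number of bootstrap replicates, and the synthetic AR(2) signals are illustrative choices, and plain resampling of per-frame scores with a confidence interval stands in for the paper's model-based multiple hypothesis testing procedure, which is not reproduced here.

```python
import math
import random


def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion: a = [1, a1, ..., ap] with A(z) = 1 + sum_i a_i z^-i."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        k = -(r[m] + sum(a[i] * r[m - i] for i in range(1, m))) / err
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a


def quad_form(a, r):
    """a^T R a, where R is the Toeplitz matrix built from autocorrelation r."""
    p = len(a) - 1
    return sum(a[i] * a[j] * r[abs(i - j)] for i in range(p + 1) for j in range(p + 1))


def llr(clean, processed, order=10):
    """Log-likelihood ratio distance of one processed frame against its clean frame."""
    r_c = autocorr(clean, order)
    a_c = levinson(r_c, order)
    a_p = levinson(autocorr(processed, order), order)
    return math.log(quad_form(a_p, r_c) / quad_form(a_c, r_c))


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of per-frame scores."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choice(values) for _ in range(n)) / n for _ in range(n_boot))
    return means[int(alpha / 2 * n_boot)], means[int((1 - alpha / 2) * n_boot) - 1]


if __name__ == "__main__":
    rng = random.Random(1)
    clean = [0.0, 0.0]
    for _ in range(3200):  # synthetic AR(2) stand-in for clean speech
        clean.append(1.3 * clean[-1] - 0.6 * clean[-2] + rng.gauss(0.0, 1.0))
    enhanced = [c + rng.gauss(0.0, 0.5) for c in clean]  # stand-in for an enhanced signal
    scores = [llr(clean[i:i + 160], enhanced[i:i + 160]) for i in range(0, 3200, 160)]
    print(bootstrap_ci(scores))
```

Because a_c minimizes a^T R_c a among predictors with leading coefficient 1, the LLR is zero for identical frames and positive otherwise; two enhancement algorithms could then be compared by whether their intervals overlap.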
ISBN (print): 9781424464258; 9780769539942
This paper presents two low-complexity tools used in the new ITU-T Recommendation G.711.0, the standard for lossless compression of G.711 (A-law/μ-law logarithmic PCM) speech data. One is an algorithm for quantizing the PARCOR/reflection coefficients, and the other is a method for estimating the optimal prediction order. Both tools are based on a criterion that minimizes the entropy of the prediction residual signal and can be implemented as low-complexity fixed-point algorithms. Because G.711.0 with these practical tools losslessly reduces the data rate of G.711, the prevailing speech-coding technology, it is expected to be widely deployed.
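The Python sketch below illustrates, under explicit assumptions, what quantizing PARCOR coefficients involves: a Levinson-Durbin pass yields the reflection coefficients k_1..k_p, a uniform quantizer maps them to indices, a step-up recursion converts the quantized values back to a direct-form predictor, and the resolution (bits per coefficient) is chosen by minimizing an estimated bit count, namely a Gaussian approximation of the residual entropy plus the coefficient side information. The uniform quantizer, the candidate resolutions, the Gaussian entropy estimate, and all function names are hypothetical; G.711.0's actual quantizer and its fixed-point arithmetic are not reproduced.

```python
import math
import random


def autocorr(x, max_lag):
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def parcor(r, order):
    """Levinson-Durbin returning reflection (PARCOR) coefficients k_1..k_p.

    Convention: A(z) = 1 + sum_i a_i z^-i, so k_m equals the new a_m at step m.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    ks = []
    for m in range(1, order + 1):
        k = -(r[m] + sum(a[i] * r[m - i] for i in range(1, m))) / err
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
        ks.append(k)
    return ks


def quantize_parcor(ks, bits):
    """Hypothetical uniform quantizer on (-1, 1) with 2**bits cells; midpoint reconstruction."""
    levels = 2 ** bits
    step = 2.0 / levels
    idx = [min(levels - 1, max(0, int((k + 1.0) / step))) for k in ks]
    return idx, [-1.0 + (i + 0.5) * step for i in idx]


def step_up(ks):
    """Convert reflection coefficients to a direct-form predictor [1, a1, ..., ap]."""
    a = [1.0]
    for m, k in enumerate(ks, start=1):
        prev = a + [0.0]
        a = [prev[i] + k * prev[m - i] for i in range(m + 1)]
    return a


def estimated_bits(x, a, coef_bits):
    """Gaussian entropy estimate of the residual plus coefficient side information."""
    p = len(a) - 1
    res = [sum(a[i] * x[n - i] for i in range(p + 1)) for n in range(p, len(x))]
    var = max(sum(e * e for e in res) / len(res), 1e-12)
    per_sample = max(0.0, 0.5 * math.log2(2 * math.pi * math.e * var))
    return len(res) * per_sample + p * coef_bits


def best_quantizer(x, order=8, candidate_bits=range(3, 9)):
    """Choose the PARCOR resolution that minimizes the estimated total bits."""
    ks = parcor(autocorr(x, order), order)
    costs = {}
    for b in candidate_bits:
        _, kq = quantize_parcor(ks, b)
        costs[b] = estimated_bits(x, step_up(kq), b)
    return min(costs, key=costs.get), costs


if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.0, 0.0]
    for _ in range(400):  # synthetic stand-in for one frame of samples
        x.append(1.3 * x[-1] - 0.6 * x[-2] + rng.gauss(0.0, 50.0))
    print(best_quantizer(x))
```

Every reconstructed value lies strictly inside (-1, 1), so the synthesis filter built from the quantized coefficients stays stable, which is one common reason to quantize PARCOR rather than direct-form coefficients.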
ISBN (print): 9781424442959
This paper describes two low-complexity tools used in the new ITU-T Recommendation G.711.0, the lossless coding of G.711 (A-law/μ-law logarithmic PCM) speech data. One is an algorithm for quantizing the PARCOR/reflection coefficients, and the other is a method for estimating the optimal prediction order. Both tools are based on a criterion that minimizes the entropy of the prediction residual signal and can be implemented as low-complexity fixed-point algorithms. Because G.711.0 with these practical tools losslessly reduces the data rate of G.711, the prevailing speech-coding technology, it is expected to be widely deployed.
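The abstract does not spell out the order-estimation method, so the Python sketch below only illustrates the general idea under assumptions of our own: a single Levinson-Durbin pass supplies the prediction-error energy E_m for every order m as a by-product, and the order that minimizes an estimated bit count (a Gaussian entropy approximation of the residual plus a hypothetical fixed cost of 6 bits per coefficient) is selected. The cost model, the bits_per_coef parameter, and the function names are illustrative and do not reproduce G.711.0's fixed-point criterion.

```python
import math
import random


def autocorr(x, max_lag):
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_errors(r, max_order):
    """Prediction-error energies E_0..E_P, all obtained from one Levinson-Durbin pass."""
    a = [1.0] + [0.0] * max_order
    errs = [r[0]]
    for m in range(1, max_order + 1):
        k = -(r[m] + sum(a[i] * r[m - i] for i in range(1, m))) / errs[-1]
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        errs.append(errs[-1] * (1.0 - k * k))
    return errs


def estimate_order(x, max_order=12, bits_per_coef=6):
    """Order minimizing (estimated residual bits) + (coefficient side information).

    Residual bits use a Gaussian entropy approximation of the per-sample
    error variance E_m / N, so no residual has to be computed per order.
    """
    n = len(x)
    errs = levinson_errors(autocorr(x, max_order), max_order)

    def cost(m):
        var = max(errs[m] / n, 1e-12)
        return n * max(0.0, 0.5 * math.log2(2 * math.pi * math.e * var)) + m * bits_per_coef

    return min(range(max_order + 1), key=cost)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.0, 0.0]
    for _ in range(320):  # synthetic AR(2) frame used only for illustration
        x.append(1.3 * x[-1] - 0.6 * x[-2] + rng.gauss(0.0, 100.0))
    print(estimate_order(x))
```

Scoring every order from the error energies alone avoids computing a residual signal for each candidate order, which is where the low complexity of such an estimator would come from.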
Sliding a probe over a textured surface generates a rich collection of vibrations that one can easily use to create a mental model of the surface. Haptic virtual environments attempt to mimic these real interactions, but common haptic rendering techniques typically fail to reproduce the sensations that are encountered during texture exploration. Past approaches have focused on building a representation of textures using a priori ideas about surface properties. Instead, this paper describes a process of synthesizing probe-surface interactions from data recorded from real interactions. We explain how to apply the mathematical principles of linear predictive coding (LPC) to develop a discrete transfer function that represents the acceleration response under specific probe-surface interaction conditions. We then use this predictive transfer function to generate unique acceleration signals of arbitrary length. In order to move between transfer functions from different probe-surface interaction conditions, we develop a method for interpolating the variables involved in the texture synthesis process. Finally, we compare the results of this process with real recorded acceleration signals, and we show that the two correlate strongly in the frequency domain.
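The synthesis step can be sketched in Python as follows, assuming that linear predictive coding here means standard autocorrelation-method LPC: an all-pole model A(z) is fitted to a recorded acceleration signal, and white Gaussian noise with the model's residual variance is passed through 1/A(z) to produce a new signal of any length whose spectrum follows the recording. The Gaussian excitation, the model order, and the function names are assumptions, and the interpolation between interaction conditions described above is not shown.

```python
import math
import random


def lpc(x, order):
    """Autocorrelation-method LPC via Levinson-Durbin.

    Returns a = [1, a1, ..., ap] for A(z) = 1 + sum_i a_i z^-i and the
    per-sample prediction-error variance (used as the excitation variance).
    """
    n = len(x)
    r = [sum(x[i] * x[i + m] for i in range(n - m)) for m in range(order + 1)]
    a, err = [1.0] + [0.0] * order, r[0]
    for m in range(1, order + 1):
        k = -(r[m] + sum(a[i] * r[m - i] for i in range(1, m))) / err
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a, err / n


def synthesize(a, noise_var, length, seed=0):
    """Pass white Gaussian noise through the all-pole filter 1/A(z)."""
    rng = random.Random(seed)
    sd = math.sqrt(noise_var)
    y = []
    for n in range(length):
        acc = rng.gauss(0.0, sd)
        for i in range(1, len(a)):
            if n - i >= 0:
                acc -= a[i] * y[n - i]
        y.append(acc)
    return y


if __name__ == "__main__":
    # Stand-in for a recorded acceleration signal (synthetic AR(2) data).
    recorded = synthesize([1.0, -1.3, 0.6], 1.0, 4000, seed=1)
    a, var = lpc(recorded, 2)
    new_signal = synthesize(a, var, 10000, seed=2)  # arbitrary length
    print([round(c, 2) for c in a], round(var, 2), len(new_signal))
```

Changing seed gives a different realization with the same spectral envelope, which is what lets such a model produce unique signals of arbitrary length from one recording.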
Song and music discrimination plays a significant role in multimedia applications such as genre classification and singer identification. This paper addresses the problem of identifying sections of singing voice and of instrumental signals. The system must therefore be able to detect when a singer starts and stops singing. In addition, it must work regardless of whether the singer is a man or a woman, of the singer's vocal register (soprano, alto, tenor, baritone, or bass), of the musical style, and of the number of instruments. Our approach does not assume a priori knowledge of song and music segments; it uses simple and efficient threshold-based distance measures for discrimination. The Linde-Buzo-Gray (LBG) vector quantization algorithm and Gaussian mixture models (GMMs) are used for comparison. The approach is validated on a large experimental dataset from the RWC music genre database, which covers many styles (25 styles and 272 minutes of data).
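The abstract does not specify the features or the distance, so the Python sketch below shows only one generic way a threshold-based distance can locate the points where a singer starts or stops: each frame is scored by the Euclidean distance between the mean feature vectors of the preceding and following windows, and local maxima above a threshold are reported as boundaries. The per-frame features, the window length win, the threshold value, and the function names are all hypothetical.

```python
import math


def euclid(u, v):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))


def mean_vec(frames):
    return [sum(col) / len(frames) for col in zip(*frames)]


def detect_boundaries(features, win=5, threshold=1.0):
    """Frame indices where the distance between the mean feature vectors of the
    `win` frames before and after is a local maximum above `threshold`."""
    dist = {
        t: euclid(mean_vec(features[t - win:t]), mean_vec(features[t:t + win]))
        for t in range(win, len(features) - win + 1)
    }
    return [
        t for t, d in dist.items()
        if d > threshold and d >= dist.get(t - 1, 0.0) and d > dist.get(t + 1, 0.0)
    ]


if __name__ == "__main__":
    # Hypothetical 2-D features: instrumental, then singing voice, then instrumental.
    feats = [[0.0, 0.0]] * 20 + [[3.0, 3.0]] * 20 + [[0.0, 0.0]] * 20
    print(detect_boundaries(feats))  # boundaries at frames 20 and 40
```

Labelling the segments between boundaries as song or music would still require a decision rule, for example comparing each segment with reference statistics, which is where baselines such as LBG codebooks or GMMs come in.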
In this paper, a new nonlinear feature extraction method based on the WTMM (wavelet transform modulus maxima method) is proposed, which greatly facilitates extraction of the multifractal spectrum feature (MSF) from speech signals. Combining the MSF with traditional linear features clearly improves the performance of a speaker recognition system. In short-utterance (2-second) speaker recognition experiments, combining a 6-dimensional MSF with LPC features increases recognition accuracy by 6.4 percentage points, and combining it with both MFCC and LPC features increases accuracy by 1.6 percentage points, reaching 98.8%.
ISBN (print): 9781424464258; 9780769539942
Motivated by the rapid increase in VoIP services that use G.711 for telephone speech, a new ITU-T recommendation, G.711.0 (a frame-wise stateless lossless compression scheme for G.711 log PCM symbols), has been standardized. The standard has several coding parts, each of which is selected adaptively depending on the characteristics of the input. Among them, the mapped-domain prediction part is the one most frequently activated for normal speech signals. This part consists of linear prediction in the mapped domain and variable-length coding of the prediction residual, and it is well suited to log-compressed/expanded signals such as ITU-T G.711. This paper describes three newly devised enhancement tools for coding the prediction residual signals: progressive order prediction, quantized prediction order, and adaptive, sub-frame-based coding of separation parameters. The design criterion is maximization of the FoM (figure of merit) averaged over frame lengths of 40, 80, 160, 240, and 320 samples. The first tool, progressive order prediction combined with adaptive modification of the separation parameter for the first and second samples, improves the compression ratio by 0.5% with a negligible increase in complexity. The second tool, quantized prediction order, improves the compression ratio by 0.2% while even reducing complexity. The third tool, sub-frame-based adaptive coding of separation parameters, improves the compression ratio by 0.2% with comparable complexity. All three tools are consistently and independently effective in improving the compression ratio, although the improvement from each is small, and none of them has a significant impact on computational complexity. Therefore, all three tools improve the FoM and have been adopted in the mapped-domain prediction part of the ITU-T G.711.0 standard.
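One possible reading of progressive order prediction is sketched below in floating-point Python: because a frame-wise stateless coder cannot use samples from the previous frame, sample n of a frame is predicted with order min(n, P), using the lower-order predictors that the step-up recursion derives from the first n reflection coefficients. This interpretation, the floating-point arithmetic, and the function names are assumptions; the adaptive separation-parameter modification and G.711.0's fixed-point details are not reproduced.

```python
def step_up_all(ks):
    """Direct-form predictors of every order 0..P from reflection coefficients.

    preds[m] = [1, a1, ..., am] for A_m(z) = 1 + sum_i a_i z^-i.
    """
    preds = [[1.0]]
    for m, k in enumerate(ks, start=1):
        prev = preds[-1] + [0.0]
        preds.append([prev[i] + k * prev[m - i] for i in range(m + 1)])
    return preds


def progressive_residual(frame, ks):
    """Residual of one frame using order min(n, P) at sample n (no past-frame samples)."""
    preds = step_up_all(ks)
    order = len(ks)
    res = []
    for n in range(len(frame)):
        a = preds[min(n, order)]
        res.append(sum(a[i] * frame[n - i] for i in range(len(a))))
    return res


if __name__ == "__main__":
    # Hypothetical reflection coefficients and frame values, for illustration only.
    print(progressive_residual([1.0, 2.0, 3.0, 4.0], [0.5, 0.2]))
```

With a fixed order P, the first P samples would either have to be sent without prediction or reference samples outside the frame; growing the order sample by sample lets those early samples still benefit from prediction.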
ISBN (print): 9781424463831; 9781424463855
In speech coding, segment vocoders offer good intelligibility at low bit rates. A segment vocoder has four basic components: (1) segmentation of the input speech, (2) segment quantization, (3) residual quantization, and (4) speech synthesis. Most segment vocoders take a recognition-based approach to segment quantization. In this paper, we take a different approach: the segmental unit is a syllable, and the segment codebook stores sequences of LPC vectors. During encoding, each speech segment is quantized using the sequence of LPC vectors that yields the smallest residual energy. PESQ scores indicate that this vocoder achieves better quality than a corresponding vocoder based on a speech recognition framework.
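The selection rule (pick the codebook entry whose LPC vector sequence gives the smallest residual energy) can be sketched in Python as follows. Each codebook entry is a list of LPC vectors [1, a1, ..., ap]; because a stored syllable and the input segment generally differ in length, the sketch maps input frame f to stored vector floor(f * M / F) by linear time warping, which is our assumption, since the abstract does not say how the sequences are aligned. The function names and the toy codebook are hypothetical.

```python
import random


def frame_residual_energy(frame, a):
    """Energy of the inverse-filter output e[n] = sum_i a_i x[n-i] (zero history)."""
    return sum(
        sum(a[i] * frame[n - i] for i in range(len(a)) if n - i >= 0) ** 2
        for n in range(len(frame))
    )


def quantize_segment(frames, codebook):
    """Index and residual energy of the best codebook entry for one segment.

    Each entry is a list of LPC vectors; input frame f is compared with
    stored vector f * M // F (linear time warping, an assumption).
    """
    best_idx, best_energy = -1, float("inf")
    for idx, seq in enumerate(codebook):
        energy = sum(
            frame_residual_energy(frame, seq[f * len(seq) // len(frames)])
            for f, frame in enumerate(frames)
        )
        if energy < best_energy:
            best_idx, best_energy = idx, energy
    return best_idx, best_energy


if __name__ == "__main__":
    rng = random.Random(0)
    sig = [0.0]
    for _ in range(799):  # synthetic AR(1) "syllable": x[n] = 0.9 x[n-1] + w[n]
        sig.append(0.9 * sig[-1] + rng.gauss(0.0, 1.0))
    frames = [sig[i:i + 80] for i in range(0, 800, 80)]
    # Toy codebook: wrong-sign predictor, matching predictor, no prediction.
    book = [[[1.0, 0.9]], [[1.0, -0.9], [1.0, -0.9]], [[1.0]]]
    print(quantize_segment(frames, book))
```

Only the index of the winning entry would need to be transmitted for the segment; the residual it leaves behind is what the separate residual-quantization component would then encode.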
Traditionally, linear prediction is used to predict future values of a signal from past values, with the goal of minimizing prediction errors. In this paper, we propose a novel method that instead uses prediction errors to extract edges in images. In this method, small prediction errors in smooth regions are minimized, while the larger errors caused by steep changes are amplified, so that edge information can be extracted accurately. The proposed method is compared with predominant methods such as the Sobel and Canny detectors. Although there is no mathematical proof that the proposed method outperforms these methods, the examples presented in this paper suggest that it may perform better in certain applications.
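The idea of suppressing small prediction errors and amplifying large ones can be sketched in Python under assumptions of our own: each pixel is predicted as the mean of its left and upper neighbours (a simple causal linear predictor), errors below a threshold are set to zero, and larger ones are multiplied by a gain and clipped to 255. The predictor, threshold, gain, and function name are hypothetical, not the paper's parameters.

```python
def prediction_error_edges(img, threshold=10.0, gain=4.0, max_val=255.0):
    """Edge map from causal prediction errors (hypothetical predictor and mapping).

    Each pixel is predicted as the mean of its left and upper neighbours; errors
    below `threshold` are suppressed, larger ones are amplified by `gain`.
    The first row and column have no causal neighbours and are left at zero.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(1, h):
        for j in range(1, w):
            pred = 0.5 * (img[i][j - 1] + img[i - 1][j])
            err = abs(img[i][j] - pred)
            out[i][j] = 0.0 if err < threshold else min(max_val, gain * err)
    return out


if __name__ == "__main__":
    # 6x6 test image: dark left half, bright right half (vertical step edge).
    image = [[0.0] * 3 + [100.0] * 3 for _ in range(6)]
    for row in prediction_error_edges(image):
        print(row)
```

Because the predictor is causal, an edge is marked on the first pixel past the intensity change in scan order, here column 3 of the test image.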
ISBN (print): 9781424481835
In conventional automatic speech recognition systems, the linguistic information in the speech signal is usually extracted from short-time frames of about 10-30 ms. In this paper, we propose two novel methods for extracting long-term information from the speech signal. Both methods are based on sub-band FDLP (frequency-domain linear prediction), which divides a long-time frame of the signal into several sub-bands. Using the MFCC algorithm, we represent the long-term temporal features of each sub-band. Our results show that the proposed methods improve the recognition rate by 1.73%. The methods were evaluated on the FarsDat database, and their robustness under different noise conditions was tested.
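A rough Python sketch of sub-band FDLP, assuming the usual formulation in which linear prediction is applied to the discrete cosine transform (DCT) of a long segment: since the DCT coefficient sequence of a component at time n oscillates at "frequency" π(n + 0.5)/N, the all-pole model fitted to a slice of DCT coefficients traces the temporal envelope of the corresponding sub-band. The uniform split of the DCT coefficients into bands, the model order, and the function names are assumptions, and the subsequent MFCC-style processing is not shown.

```python
import cmath
import math


def dct2(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] cos(pi * k * (2n + 1) / (2N))."""
    n_len = len(x)
    return [
        sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * n_len)) for n in range(n_len))
        for k in range(n_len)
    ]


def lpc(seq, order):
    """Autocorrelation-method LPC via Levinson-Durbin; returns ([1, a1, ..., ap], error)."""
    r = [sum(seq[i] * seq[i + m] for i in range(len(seq) - m)) for m in range(order + 1)]
    a, err = [1.0] + [0.0] * order, r[0]
    for m in range(1, order + 1):
        k = -(r[m] + sum(a[i] * r[m - i] for i in range(1, m))) / err
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a, err


def subband_fdlp_envelopes(x, n_bands=4, order=8):
    """One temporal envelope per sub-band, from LPC on a slice of DCT coefficients."""
    n_len = len(x)
    coeffs = dct2(x)
    edges = [b * n_len // n_bands for b in range(n_bands + 1)]  # uniform split (assumption)
    envelopes = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        a, err = lpc(coeffs[lo:hi], order)
        env = []
        for n in range(n_len):
            w = math.pi * (n + 0.5) / n_len  # DCT-domain "frequency" matching time n
            resp = sum(a[i] * cmath.exp(-1j * w * i) for i in range(len(a)))
            env.append(err / abs(resp) ** 2)
        envelopes.append(env)
    return envelopes


if __name__ == "__main__":
    impulse = [0.0] * 128
    impulse[32] = 1.0  # a click at sample 32
    for env in subband_fdlp_envelopes(impulse, n_bands=4, order=4):
        print(max(range(len(env)), key=env.__getitem__))
```

For a click at sample 32 of a 128-sample segment, the envelope of each band should peak near index 32, which is the property that lets FDLP capture temporal structure over frames much longer than 10-30 ms.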