Konstantinides and Yao have considered the problem of rank determination by use of effective singular values. In this correspondence, we show how to use the minimum description length criterion of Rissanen to provide an alternative means of estimating the index of the smallest nonzero singular value of a matrix when given estimates of the singular values.
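The abstract does not give the form of the criterion, so the following is only an illustrative sketch of MDL-based order selection. It uses the well-known Wax-Kailath MDL expression for covariance eigenvalues, not the formulation of this correspondence. The function name `mdl_rank` and the parameter `num_snapshots` are hypothetical, and the inputs are assumed to be strictly positive eigenvalue estimates.

```python
import math


def mdl_rank(eigenvalues, num_snapshots):
    """Estimate the number of significant eigenvalues with a Wax-Kailath style MDL score.

    Assumption: `eigenvalues` are strictly positive estimates and
    `num_snapshots` is the number of observations used to form them.
    """
    lam = sorted(eigenvalues, reverse=True)
    p = len(lam)
    best_k, best_score = 0, float("inf")
    for k in range(p):
        tail = lam[k:]  # the p - k smallest eigenvalues
        m = len(tail)
        # Log of the geometric mean and of the arithmetic mean of the tail.
        log_geo = sum(math.log(x) for x in tail) / m
        log_arith = math.log(sum(tail) / m)
        # Data term: zero when the tail is flat (pure noise floor).
        fit = -num_snapshots * m * (log_geo - log_arith)
        # Penalty term: grows with the number of free parameters.
        penalty = 0.5 * k * (2 * p - k) * math.log(num_snapshots)
        score = fit + penalty
        if score < best_score:
            best_k, best_score = k, score
    return best_k
```

In this sketch the estimated rank is the candidate order whose tail of small values is flattest once the complexity penalty is included, so no hand-tuned threshold is needed.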
This paper provides an analytical derivation of a simple noniterative technique for extracting a multiple impulse excitation model for synthesized speech directly from the LPC residual sequence. While suboptimal with respect to "multipulse" techniques, this method is well suited to speech enhancement where processor capability is limited. The results suggest an additional "orthogonality" requirement between the excitation sequence and the resulting prediction error, which aids in the intuitive understanding of the method.
The mixed excitation linear prediction (MELP) algorithm has recently been selected as the new federal standard for 2.4 kbit/s coding of speech signals. The authors exploit the average residual inter-frame correlation and the error sensitivities of the bits in a MELP frame to enhance the robustness of the proposed joint MELP turbo coding schemes for operation over Rayleigh fading channels.
The diagnostic rhyme test (DRT) is widely used to evaluate digital voice systems. Would-be users often have no reference frame for interpreting DRT scores in terms of performance measures that they can understand, e.g., how many operational words are correctly understood. This research was aimed at providing a better understanding of the effects of very poor quality speech on human communication performance. It is especially important to determine how successful communications are likely to be when the speech quality is severely degraded. This paper compares the recognition of ICAO spelling alphabet words (ALFA, BRAVO, CHARLIE, etc.) to DRT scores for the same conditions. Confusions among the spelling alphabet words are also given. The voice conditions included unprocessed speech, speech processed through the DoD standard linear predictive coding algorithm operating at 2400 bits/s with random bit error rates of 0, 2, 5, 8, and 12 percent, and an 800 bit/s pattern matching algorithm. The results suggest that with distinctive vocabularies like the ICAO spelling alphabet, word intelligibility can be expected to remain very high even when DRT scores fall into the poor range; but once the DRT scores fall below about 75, the intelligibility can be expected to fall off rapidly; and at scores below 50, fewer than half the words will be understood.
Linear predictive hidden Markov models have proved to be an efficient way of statistically modeling speech signals. The possible application of such models to statistical characterization of the speaker himself is described and evaluated. The results show that even with a short sequence of only four isolated digits, a speaker can be verified with an average equal-error rate of less than 3%. These results are slightly better than the results obtained using speaker-dependent vector quantizers with comparable numbers of spectral vectors. The small improvement over the vector quantization approach indicates the weakness of the Markovian transition probabilities for characterizing speaker-dependent transitional information.
We address here the classical bearings-only tracking (BOT) problem for a single target, an issue that belongs to the general class of nonlinear filtering problems. Recently, algorithms based on sequential Monte Carlo methods (particle filtering) have been proposed. However, Fearnhead has observed that in practice this algorithm diverges. This problem is investigated further here. We show that this phenomenon is due to the unobservability of the distance between the observer and the target. We propose a new algorithm, named the hierarchical particle filter, which takes this aspect of the BOT problem into account. We demonstrate that this novel filter architecture substantially outperforms the classical one. Moreover, these results are confirmed when considering highly maneuvering target scenarios. Finally, we propose a general architecture based on Monte Carlo methods for filter initialization, able to accommodate poor priors and complex constraints.
The proliferation of the Internet and instant messaging has elevated the use of voice communication, posing challenges in legal and forensic contexts. This study delves into the impact of four social media platforms (WhatsApp, Instagram, Snapchat, and Telegram) on the acoustic properties of vowel sounds. Forty participants, evenly split between 20 males and 20 females, were recorded producing English monophthong vowels under five conditions: using a mobile phone, WhatsApp, Instagram, Telegram, and Snapchat. Utilizing Multispeech and Praat speech acoustic software, we analyzed formant frequencies, employed statistical F tests, and applied machine learning techniques to assess the influence of social media applications on formant frequency (F1, F2, F3, and F4). The study findings demonstrated that F1, F2, and F3 effectively distinguished vowels, with accuracy rates ranging from 100% to 75% when considering formant frequency. However, in the formant listing, F2, F3, and F4 emerged as reliable markers for identifying vowels, achieving similar accuracy rates. Notably, vowel spaces constructed from mean formant frequencies exhibited distinct patterns, thereby accentuating differences between male and female speakers. Random Forest proved to be the top-performing machine learning approach in gender differentiation, consistently delivering high accuracy rates (ranging from 88% to 93%) and Area Under the Curve (AUC) values between 0.92 and 0.96. Remarkably, Random Forest exhibited strong performance across various algorithms, maintaining its effectiveness despite formant variations.
This paper presents an acoustic model-based lossy pole-zero modeling for speech signals, which overcomes the limitation in the existing lossless pole-zero model that forced the numerator part of the pole-zero transfer function to be symmetric. We derive the lossy pole-zero model and its transfer function by employing the wave digital filter (WDF) adaptor formulas and by converting the fixed termination value -1 to a loss factor μ(0)(c) ∈ (-1, 1). Then we discuss how to determine the reflection coefficients of the lossy pole-zero model. For this we first employ a well-performing ARMA modeling algorithm for a pole-zero type estimation of the given speech signal and then fit the transfer function of the lossy pole-zero model to that of the ARMA model under a Euclidean cost function. This procedure is demonstrated by an example using the Steiglitz-McBride ARMA estimation method with a synthetic speech signal. The lossy pole-zero modeling yields a new filter structure, namely a three-branch lattice structure, which consists of three lattice branches with the third branch terminated by the loss factor μ(0)(c) ∈ [-1, 1] and with the three branches connected by a three-port wave adaptor characterized by the area ratio σ ∈ [0, 1]. The three-branch lattice structure is a general filter structure which becomes the lossless pole-zero structure when μ(0)(c) = -1 and becomes the existing all-pole lattice structure when σ = 0.
This paper describes the design of an underwater acoustic diver communication system controlled by a digital signal processor. The speech signal transmission rate is compressed by using linear predictive coding (LPC) and the extracted parameters are transmitted through the water to a synchronized receiver by employing digital pulse position modulation (DPPM). The pulse position in each time frame is estimated by an energy detection and decision algorithm which enables the received LPC parameters to be recovered and used to synthesize the speech signal.
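As a rough illustration of the pulse position idea, the sketch below maps each symbol to a single pulse in one slot of a frame and recovers it by picking the slot with the most energy. This is a toy model under strong assumptions (one sample per slot, perfect frame synchronization, hypothetical function names `dppm_encode` and `dppm_decode`); the abstract does not specify the system's actual detection and decision algorithm.

```python
def dppm_encode(symbols, slots_per_frame, amplitude=1.0):
    """Place one pulse per frame; the slot index carries the symbol value."""
    signal = []
    for s in symbols:
        if not 0 <= s < slots_per_frame:
            raise ValueError("symbol does not fit in the frame")
        frame = [0.0] * slots_per_frame
        frame[s] = amplitude
        signal.extend(frame)
    return signal


def dppm_decode(signal, slots_per_frame):
    """Energy detection: in each frame, choose the slot with maximum energy."""
    symbols = []
    for start in range(0, len(signal), slots_per_frame):
        frame = signal[start:start + slots_per_frame]
        energies = [x * x for x in frame]
        symbols.append(max(range(len(frame)), key=energies.__getitem__))
    return symbols
```

For example, quantized LPC parameters could be sent as integers in the range 0 to `slots_per_frame - 1`; decoding stays correct as long as the noise in the empty slots remains below the pulse energy.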
A VLSI processor is designed for small-scale isolated speech recognition applications. It is a dedicated processor which detects endpoints, extracts LPC (linear predictive coefficient) cepstral coefficients from the speech signal, and computes the spectral distances using a dynamic time warping (DTW) technique. The designed chip can recognize 1000 isolated words per second with an average recognition accuracy of 90.3%. It is designed in a 0.8 μm CMOS technology, includes 66,760 gates, and runs with a 10 MHz clock.
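The DTW step mentioned above can be sketched in software as the textbook dynamic programming recursion. This is a minimal illustration, not the chip's datapath: it assumes scalar features with an absolute-difference local cost, whereas the processor compares LPC cepstral vectors, and the function name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Classic DTW: minimum accumulated cost of aligning sequence a with b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Allowed moves: advance in a only, in b only, or in both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]
```

A recognizer built this way would compute the distance from the input utterance to each stored template and pick the template with the smallest value.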