This article reports the design and implementation of a graphical display that presents an approximation to vocal tract area in real time for voiced vowel articulation. The acoustic signal is digitally sampled by the system. From these data a set of reflection coefficients is derived using linear predictive coding. A matrix of area coefficients is then determined that approximates the vocal tract area of the user. From this information a graphical display is then generated. The complete cycle of analysis and display is repeated at approximately 20 times/s. Synchronised audio and visual sequences can be recorded and used as dynamic targets for articulatory development. Use of the system is illustrated by diagrams of system output for spoken cardinal vowels and for vowels sung in a trained and untrained style.
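The step from reflection coefficients to an area function can be sketched with the standard lossless-tube relation between adjacent sections. This is a minimal illustration, not the authors' implementation: the function name `areas_from_reflection`, the unit lip area, and the sign convention (which differs between textbooks) are all assumptions.

```python
def areas_from_reflection(k, lip_area=1.0):
    """Convert reflection coefficients to relative tube-section areas.

    Assumed convention (hypothetical, conventions vary):
        A[i+1] = A[i] * (1 - k[i]) / (1 + k[i])
    starting from an arbitrary reference area at the lips.
    """
    areas = [lip_area]
    for ki in k:
        if not -1.0 < ki < 1.0:
            # |k| >= 1 would mean an unstable LPC filter and a non-positive area.
            raise ValueError("reflection coefficients must lie in (-1, 1)")
        areas.append(areas[-1] * (1.0 - ki) / (1.0 + ki))
    return areas
```

Only area ratios are meaningful here; a display would scale the result to some chosen reference area.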
In this correspondence, an efficient search algorithm for generating syllable hypotheses in a continuous Mandarin speech recognizer is proposed. The fast syllable hypothesizing algorithm consists of a forward, time-synchronous, modified Viterbi search and a parallel backtracking procedure. This algorithm can generate a lattice of syllable hypotheses with very low time delay. The superiority of the new algorithm over other direct sentence hypothesis algorithms is also demonstrated experimentally in terms of search efficiency and accuracy.
We show that HMM word recognition using Deller and Snider's "any path" procedure makes an assumption of independence that is not made by either the forward or Viterbi algorithms. We also point out that additional savings in execution time can be achieved by precomputation.
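For reference, the two baseline scores mentioned above can be sketched for a discrete HMM as follows. This is an illustrative sketch only: it does not implement the "any path" procedure, and the function names and the list-of-lists model layout (`pi`, `A`, `B`) are assumptions.

```python
def forward_likelihood(pi, A, B, obs):
    """Forward algorithm: probability of obs summed over all state paths."""
    n = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * A[r][s] for r in range(n)) * B[s][o]
                 for s in range(n)]
    return sum(alpha)


def viterbi_score(pi, A, B, obs):
    """Viterbi algorithm: probability of obs along the single best state path."""
    n = len(pi)
    delta = [pi[s] * B[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        delta = [max(delta[r] * A[r][s] for r in range(n)) * B[s][o]
                 for s in range(n)]
    return max(delta)
```

The two recursions differ only in replacing the sum over predecessor states with a maximum, so the Viterbi score can never exceed the forward likelihood for the same model and observations.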
ISBN: 0818628456 (print)
In real-time speech recognition, the predictor coefficients of the speech signal are used as the recognition features and must be computed faster than the sampling rate. Under this performance constraint, the objective is to make the circuit as cheap as possible. The autocorrelation method is adopted for computing the coefficients because its results are stable and its computations are regular. A two-step pipelined functional unit is designed to carry out those regular computations. The divisions needed in the autocorrelation method are computed with a prune-and-search approach. Because this approach can be performed by the same pipelined functional unit, no extra operator is required for the divisions. Although the authors designed the chip with the fewest storage elements and functional units, its performance is still much faster than the real-time requirement.
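The autocorrelation method referred to above is commonly solved with the Levinson-Durbin recursion, sketched below in software. This is an assumption-laden illustration, not the chip's datapath: it uses ordinary floating-point division rather than the prune-and-search scheme, and the function names and the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p are choices made for this example.

```python
def autocorrelation(x, order):
    """Autocorrelation lags r[0..order] of a finite frame x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.

    Returns (a, ks, err): predictor polynomial a = [1, a1, ..., ap],
    reflection coefficients ks, and final prediction error energy.
    Assumes r[0] > 0 so that the error energy never reaches zero.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    ks = []
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err              # the one division needed per order
        ks.append(k)
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= (1.0 - k * k)
    return a, ks, err
```

Each order needs one division and otherwise only multiply-accumulate steps, which is the kind of regularity the abstract points to.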
The control system status features are extracted using a fast recursive least-squares lattice (RLSL) algorithm and a block linear predictive coding algorithm (LPCA), respectively. A short time record of the control system signal is captured, and the future trend of the system is predicted by analyzing the variations of the estimated autoregressive moving average (ARMA) model parameters. The two methods are compared on real-time data cases for system performance monitoring, and their performance is evaluated.
Several notational errors in the paper by Y.-T. Lee (see IEEE Trans. Signal Processing, vol. 39, pp. 330-335, Feb. 1991) are corrected. The energy $E_i$, $i = t, r$, defined in equation (2), when used in equations (6) and (9), should be changed to $E_i = E^*_i$. Similarly, in equation (19), it should be changed to $E_i = \tilde{E}_i$. In addition, the following note will help clarify the above paper. The distortion measures defined in equations (3), (4), and (5) are "marginal" distances (distortions). The property of the marginal distortions is explained in more detail.
Linear predictive coding (LPC), together with its transformations, is currently the most popular way of analysing speech signals. Major limitations of a frame-based technique are that each frame is analysed in isolation from the rest and that the excitation source is assumed to be a white noise process. In order to reduce computation time, an all-pole model is usually employed. In the present project an adaptive algorithm is proposed for speech signal analysis. The algorithm is based on the recursive least squares method with a variable forgetting factor. A pole-zero model is used to estimate the anti-formants present in certain sounds (i.e. nasals and nasalized vowels). This method offers better detection of poles and zeros in stationary environments and faster tracking of pole and zero frequencies in nonstationary signals than other sequential methods. An effective input estimation algorithm eliminates the influence of pitch on the parameter estimates by assuming the input to be a white noise process or a pulse sequence.
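The core recursion behind such a sequential method can be sketched as follows. This is a deliberately simplified illustration, not the proposed algorithm: it uses a fixed forgetting factor `lam` rather than a variable one, an all-pole (AR) model rather than a pole-zero model, and no input estimation. The names `rls_ar`, `demo_signal`, `lam`, and `delta` are assumptions.

```python
import math


def rls_ar(x, order, lam=0.98, delta=100.0):
    """Exponentially weighted RLS estimate of w in x[n] ~ sum_i w[i] * x[n-1-i]."""
    w = [0.0] * order
    # Inverse correlation matrix, initialised to delta * I.
    P = [[delta if i == j else 0.0 for j in range(order)] for i in range(order)]
    for n in range(order, len(x)):
        phi = [x[n - 1 - i] for i in range(order)]          # past samples
        Pphi = [sum(P[i][j] * phi[j] for j in range(order)) for i in range(order)]
        denom = lam + sum(phi[i] * Pphi[i] for i in range(order))
        gain = [v / denom for v in Pphi]
        err = x[n] - sum(w[i] * phi[i] for i in range(order))  # a priori error
        w = [w[i] + gain[i] * err for i in range(order)]
        # P <- (P - gain * phi^T P) / lam, using the symmetry of P.
        P = [[(P[i][j] - gain[i] * Pphi[j]) / lam for j in range(order)]
             for i in range(order)]
    return w


def demo_signal(n_samples, omega=math.pi / 4):
    """Pure cosine, which satisfies x[n] = 2*cos(omega)*x[n-1] - x[n-2]."""
    return [math.cos(omega * n) for n in range(n_samples)]
```

For example, `rls_ar(demo_signal(200), 2)` should approach the coefficients `[2*cos(pi/4), -1]`. A smaller `lam` discounts old samples faster, which is the trade-off a variable forgetting factor adjusts over time.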
ISBN: 0780324404 (print)
A two-level method is proposed in this study for rapidly and accurately computing the line spectrum pair (LSP) frequencies. In the first level, an efficient decimation-in-degree (DID) algorithm is proposed that transforms any symmetric or antisymmetric polynomial with real coefficients into polynomials of lower degree without using any transcendental functions. The DID algorithm not only avoids prior storage or heavy computation of transcendental functions but can also be readily combined with fast root-finding methods. In the second level, the Newton-Raphson method is applied; it can be accelerated by adopting a deflation scheme together with the interlacing property of LSP frequencies to select better initial values. A few conventional numerical methods are also implemented for comparison with the two-level method. Experimental results indicate that the two-level method is the fastest.
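To make the problem concrete, the sketch below builds the symmetric and antisymmetric LSP polynomials from an LPC polynomial and locates their unit-circle roots. It is a baseline of the conventional kind, not the two-level method: it evaluates trigonometric functions on a grid and refines by bisection instead of using the DID transform and Newton-Raphson. All function names and the grid size are assumptions.

```python
import math


def lsp_polynomials(a):
    """a = [1, a1, ..., ap]. Return (P, Q) coefficients in powers of z^-1,
    where P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z)."""
    ext = list(a) + [0.0]
    rev = ext[::-1]
    return ([u + v for u, v in zip(ext, rev)],
            [u - v for u, v in zip(ext, rev)])


def _on_circle(c, omega, antisymmetric):
    """Real function whose zeros in omega match the unit-circle roots of c."""
    half = (len(c) - 1) / 2.0
    f = math.sin if antisymmetric else math.cos
    return sum(ck * f((half - k) * omega) for k, ck in enumerate(c))


def lsp_frequencies(a, grid=512, iters=60):
    """LSP frequencies in (0, pi), found by grid search plus bisection."""
    freqs = []
    for coeffs, anti in zip(lsp_polynomials(a), (False, True)):
        # Half-offset grid keeps the trivial roots at 0 and pi out of the search.
        prev_w = math.pi * 0.5 / grid
        prev_f = _on_circle(coeffs, prev_w, anti)
        for i in range(1, grid):
            w = math.pi * (i + 0.5) / grid
            f = _on_circle(coeffs, w, anti)
            if prev_f * f < 0.0:                 # sign change brackets a root
                lo, hi, f_lo = prev_w, w, prev_f
                for _ in range(iters):
                    mid = 0.5 * (lo + hi)
                    f_mid = _on_circle(coeffs, mid, anti)
                    if f_lo * f_mid <= 0.0:
                        hi = mid
                    else:
                        lo, f_lo = mid, f_mid
                freqs.append(0.5 * (lo + hi))
            prev_w, prev_f = w, f
    return sorted(freqs)
```

For a stable A(z), the roots of P and Q alternate along the unit circle, which is the interlacing property the abstract uses to choose initial values.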
Addresses the application of nonlinear prediction to speech enhancement, considering three common cases: speech degraded by a distinctly nonlinear system (a CELP coder), by the addition of (Gaussian) noise, and by convolution with a linear system. A time-domain nonlinear predictor, in the form of an MLP neural net structure, is applied as an enhancer and its performance is examined in all three cases with respect to the influence of nonlinearity and net topology. Experimental results show that, in the case of low bit-rate CELP coder degradation, nets with multiple outputs give significant improvement over single-output structures. It is also clear that in this case, i.e. nonlinear degradation, the matching nonlinear enhancer is consistently better than the equivalent linear structures. In contrast, when the degradation is from additive noise, the (matching) linear enhancer is superior. Again, the multiple-output cases give the best results. There is less consistency in the case of linear system degradation. It is observed that the most significant factor here is the number of input nodes, which in turn reflects the chosen linear system characteristics.
A system has been developed allowing the simultaneous communication of full-duplex speech and multiple electrocardiograms in real time. A single bandlimited channel is used, a dial-up PSTN line providing a 3 kHz bandwidth. The full-duplex speech is compressed to 2400 b/s using linear predictive coding whilst multiple electrographic signals are processed by a novel ECG compression technique. Extensive use of digital signal processing reduces the combined bit rate to less than 9600 b/s, allowing the use of a cost-effective commercial modem. The communication system allows a hospital-based clinician to provide diagnostic and treatment advice to a remote location, thus improving patient care.