We experimentally evaluated an active speech control scheme that reduces unnecessary speech radiated into the surrounding space. The intended application of this system, typically cellular phones, does not require speech to be radiated into the surrounding space, but only into the microphone. We previously proposed reducing speech by generating phase-inverted predicted speech from a secondary loudspeaker. We used LPC recursively to predict samples far enough ahead to cover the associated processing delay, which could be up to a few milliseconds. First, predicted samples of recorded speech were prepared offline. Then, both the original and the phase-inverted predicted samples were played simultaneously from two loudspeakers. It was found that: 1) speech cancellation of 10 dB is possible, but is highly speaker dependent; 2) for maximum cancellation, the secondary loudspeaker should be oriented in the same direction as the primary source, i.e., the mouth.
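The recursive prediction and phase inversion described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the 8 kHz sampling rate, the 200 Hz test tone, and the 16-sample look-ahead are all assumptions, and the predictor coefficients are taken as given (for a pure sinusoid the exact second-order predictor is known) rather than estimated from speech.

```python
import math

def predict_ahead(history, coeffs, steps):
    """Recursively extrapolate `steps` samples beyond `history`.

    coeffs[k] weights the sample k+1 steps back, so each new sample is
    sum(coeffs[k] * x[n-1-k]). Predicted samples are fed back in as if they
    were observed, which is what lets the predictor look past a processing
    delay. `history` must hold at least len(coeffs) samples.
    """
    buf = list(history[-len(coeffs):])
    out = []
    for _ in range(steps):
        nxt = sum(c * buf[-1 - k] for k, c in enumerate(coeffs))
        out.append(nxt)
        buf.append(nxt)
    return out

def anti_phase(samples):
    """Phase-invert the predicted samples for the secondary loudspeaker."""
    return [-s for s in samples]

# Hypothetical test signal: a 200 Hz tone sampled at 8 kHz.
w = 2.0 * math.pi * 200.0 / 8000.0
x = [math.cos(w * n) for n in range(64)]

# Exact predictor for a sinusoid: x[n] = 2*cos(w)*x[n-1] - x[n-2].
coeffs = [2.0 * math.cos(w), -1.0]

delay = 16  # assumed look-ahead of 2 ms at 8 kHz
secondary = anti_phase(predict_ahead(x[:40], coeffs, delay))

# Idealized acoustic sum of the primary and secondary signals.
residual = [x[40 + i] + secondary[i] for i in range(delay)]
```

With real speech the coefficients would have to be estimated frame by frame and the prediction error grows with the look-ahead, which is consistent with the speaker-dependent cancellation the abstract reports.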
The conventional LPC spectral smoothing algorithm causes an evident degradation in speech quality when the smoothing amount is large. To improve the quality of the smoothed speech, we propose a new spectral smoothing algorithm. The source LPC spectral envelopes are first interpolated to generate the smoothed target spectra. Then a sinusoidal + all-pole modification is performed on the source speech so that the spectra of the modified speech coincide with the target spectra. Experimental results show that this method yields a smooth spectral envelope even when the spectral distance across speech boundaries is large, and that it is effective in avoiding quality degradation in the smoothed speech.
ISBN: 0780388747 (print)
We study a modified version of a computational model of the human peripheral and central auditory system (Wang, K. and Shamma, S.A., 1995; Yang, X. et al., 1992), and examine the validity of its output from two practical perspectives. One considers the well-known Mel-frequency cepstral coefficients (MFCC) as an approximate representation of the physiology-based early auditory processing result. The other allows the derivation of feature vectors from the dimension-expanded cortical response of the central auditory system for use in a conventional phoneme recognition task. In addition to confirming the relevance of the model under an existing statistical speech recognition framework, we conduct a preliminary study of the cortical response in connection with known physiological studies, to find new possibilities in using the auditory model to perform cognitive functions based on a better understanding of the human auditory system. In particular, the cortical response may be a place-coded data set where sounds are categorized according to the regions containing their most distinguishing features. The results of this study encourage us to develop hierarchical, detection-based methods in which this mechanism may be utilized to simulate a variety of human perceptual and cognitive functions.
In packet voice communication systems, silence suppression algorithms are employed to achieve bandwidth efficiency by suppressing the inactive part of the speech. In this paper, we consider a Nortel fixed-point silence suppression algorithm and investigate means of reducing its computational complexity with little or no loss in performance. In view of the compact structure and superior performance of the existing Nortel algorithm, certain modifications are introduced without altering the structure of the original design. Based on these modifications, two algorithms are proposed. Results from objective and subjective quality tests show that the performance of each of the two modified algorithms is comparable to that of the Nortel algorithm. As for the average computational complexity, simulation tests on a TMS320C5402 DSK show that reductions of about 40% and 22%, respectively, can be achieved for the two modified algorithms.
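For readers unfamiliar with silence suppression, the general idea can be illustrated with a generic energy-threshold detector with hangover. This is not the Nortel algorithm, whose details the abstract does not give; the function names, the threshold, and the hangover length are assumptions chosen only for illustration.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)

def transmit_decisions(frames, threshold, hangover=2):
    """Return one boolean per frame: True means the frame is transmitted.

    A frame is sent when its energy reaches `threshold`. After active speech,
    the next `hangover` low-energy frames are still sent so that weak word
    endings are not clipped; later low-energy frames are suppressed.
    """
    decisions = []
    remaining = 0
    for frame in frames:
        if frame_energy(frame) >= threshold:
            remaining = hangover
            decisions.append(True)
        elif remaining > 0:
            remaining -= 1
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions

# Hypothetical input: one loud frame surrounded by near-silent frames.
quiet = [0.001] * 80
loud = [0.5, -0.5] * 40
decisions = transmit_decisions(
    [quiet, loud, quiet, quiet, quiet, quiet], threshold=0.01
)
```

In this sketch only the suppressed frames save bandwidth, so the threshold and hangover trade bandwidth efficiency against clipped speech.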
Intra-operative automated recognition of deep brain stimulation (DBS) targets from microelectrode recordings would improve the safety, efficiency, standardization, and accuracy of the surgical procedure. We approach the cellular classification problem from a speech recognition perspective, in which linear predictive coefficient (LPC) analysis is used to model segments of thalamic and subthalamic nucleus cellular activity. We then cluster the linear prediction coefficients for three Parkinson's disease patients and develop discriminant surfaces with an artificial neural network to generate the target classes. The methods presented here yielded a significant separation of the cell types within a two-dimensional prediction coefficient data space. The results indicate that LPC analysis for DBS targeting warrants additional study for a larger variety of deep brain structures and patients.
Noise mitigation systems for speech coding and recognition have primarily focused on spectral subtraction techniques due to their well-understood behavior and computational simplicity. As computational complexity becomes a smaller constraint, understanding the characteristics of different estimation schemes becomes more important. The merits of two algorithms based on direct estimation of the linear prediction spectrum of a speech signal are explored. These algorithms are maximum likelihood (ML) and minimum mean square error (MMSE) estimation of the autoregressive speech spectrum. The MMSE algorithm is able to improve objective quality effectively at low SNRs while also improving the speech recognition accuracy by 20-30% on the Aurora2 test set, at the cost of requiring two orders of magnitude more operations than the ML method. Because of these improvements, autoregressive-based algorithms should be considered in the future for noise-robust speech processing tasks.
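The quantity both estimators target, the linear prediction (autoregressive) spectrum, can be evaluated from a set of predictor coefficients with the standard all-pole formula P(w) = g^2 / |1 - sum_k a_k e^{-jwk}|^2. The sketch below only evaluates that formula; it does not implement the ML or MMSE estimators, and the function name, sign convention, and example coefficients are assumptions.

```python
import cmath
import math

def ar_power_spectrum(coeffs, gain, n_bins):
    """Evaluate an all-pole (AR) power spectrum on [0, pi].

    Assumes the predictor convention x[n] ~ sum_k coeffs[k-1] * x[n-k], so
    the spectrum is gain^2 / |1 - sum_k a_k * exp(-j*w*k)|^2.
    n_bins must be at least 2; the bins run from w = 0 to w = pi inclusive.
    """
    spectrum = []
    for i in range(n_bins):
        w = math.pi * i / (n_bins - 1)
        denom = 1.0 - sum(
            a * cmath.exp(-1j * w * (k + 1)) for k, a in enumerate(coeffs)
        )
        spectrum.append(gain * gain / abs(denom) ** 2)
    return spectrum

# Hypothetical first-order model with a single coefficient of 0.5:
# energy is concentrated at low frequencies.
spec = ar_power_spectrum([0.5], gain=1.0, n_bins=3)
```

For this first-order example the three bins (w = 0, pi/2, pi) evaluate to 4.0, 0.8, and 1/2.25, showing the low-pass tilt a positive first coefficient produces.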
The objective of this paper is to illustrate the details of the optimization and real-time implementation of ITU's G.728 on C64x DSPs. First, using the pseudocode provided in CCITT's published documents, we implemented the algorithm in C. This implementation covered Annexes G and H (12.8 and 9.6 kbit/s). Next, we optimized the code for implementation on the DSP. In the first stage, using different techniques based on the DSP's hardware characteristics, we rewrote the most time-consuming parts of our code in order to reduce their execution time. In the second stage, we balanced the computational load of the G.728 coder algorithm by splitting Durbin's recursion for the synthesis filter across different input speech vectors. In each stage, we verified the correctness of our implementation by testing our code against the test vectors provided by ITU. Applying the above-mentioned methods enabled us to bring the optimized C code down to 22.7 MIPS in the worst case. Finally, we also ran the optimized code in real time on a DSK6416.
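Durbin's recursion builds the predictor one order at a time, which is what makes it possible to spread the work over several speech vectors. The sketch below writes the textbook Levinson-Durbin recursion as a generator that performs one order step per call. How the authors actually partition the recursion is not stated in the abstract, so the one-step-per-vector schedule, the function name, and the example autocorrelation values are assumptions.

```python
def durbin_steps(r, order):
    """Levinson-Durbin recursion, yielding after every order step.

    r is the autocorrelation sequence r[0..order]. Each yield gives
    (coeffs, err), where coeffs[k-1] weights x[n-k] in the predictor
    x[n] ~ sum_k coeffs[k-1] * x[n-k] and err is the prediction error energy.
    Assumes r[0] > 0 and a valid autocorrelation; no guard against a zero
    error is included in this sketch.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
        yield a[1:i + 1], err

# Hypothetical schedule: advance the recursion by one order per speech vector.
r = [1.0, 0.5, 0.25, 0.125, 0.0625]  # autocorrelation of a first-order process
steps = durbin_steps(r, order=4)
for vector_index in range(4):
    coeffs, err = next(steps)  # one slice of the recursion per input vector
```

Because each step depends only on the state left by the previous one, the per-vector cost stays bounded instead of peaking on the vector where the filter is updated.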
Web services are software systems that interoperate with one another in a platform-independent manner. COM (component object model) enables components to interoperate with each other. Based on the similarities between Web services and COM, we provide a system that helps users who are beginning to learn the concept of Web services by letting them experience and see the process involved in using Web services.
An approach for estimating the perceptually-relevant pole locations is described. These "perceptual poles" are determined by using an auditory excitation pattern-matching method. The estimated perceptual poles are then used to construct a perceptually-motivated all-pole (PMAP) filter for use in speech analysis/synthesis. The proposed PMAP approach is compared against some of the existing perceptually-based linear prediction (LP) methods, i.e., the perceptual LP and the warped LP. The PMAP approach compares well against the perceptual LP and the warped LP in terms of speech reconstruction quality and estimation of the formant frequencies.
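As background for the formant comparison, the conventional mapping from an all-pole filter's pole to a formant estimate uses the pole angle for the frequency and the pole radius for the bandwidth. The sketch below shows that standard mapping for a single second-order section only; it is not the PMAP pole estimation itself, and the function names, the 8 kHz sampling rate, and the example pole are assumptions.

```python
import cmath
import math

def second_order_pole(a1, a2):
    """Upper-half-plane root of z^2 - a1*z - a2 = 0.

    Corresponds to the predictor x[n] ~ a1*x[n-1] + a2*x[n-2].
    """
    disc = cmath.sqrt(a1 * a1 + 4.0 * a2)
    root = (a1 + disc) / 2.0
    return root if root.imag >= 0 else root.conjugate()

def pole_to_formant(pole, fs):
    """Map a pole to (frequency in Hz, bandwidth in Hz) at sampling rate fs."""
    freq = cmath.phase(pole) * fs / (2.0 * math.pi)
    bandwidth = -math.log(abs(pole)) * fs / math.pi
    return freq, bandwidth

# Hypothetical resonance: pole radius 0.95 at 500 Hz, sampled at 8 kHz.
fs = 8000.0
radius = 0.95
theta = 2.0 * math.pi * 500.0 / fs
a1 = 2.0 * radius * math.cos(theta)
a2 = -radius * radius

pole = second_order_pole(a1, a2)
freq, bandwidth = pole_to_formant(pole, fs)
```

A higher-order filter would need a general polynomial root finder, after which the same angle and radius mapping applies to each complex pole pair.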