We report on the synthesis of speech in the context of a phonetic vocoder operating at 100 b/s. With each phoneme, the vocoder transmits the duration and a single pitch value. The synthesizer uses a large inventory of diphone "models" to synthesize a desired phoneme string. The diphone inventory has been selected for four purposes: to differentiate between prevocalic and postvocalic allophones of sonorants, to account for changes in vowel color conditioned by postvocalic liquids, to allow exact specification of voice onset time, and to permit synthesis of glottal stops, alveolar flaps, and syllabic consonants. The diphones are extracted from carefully constructed short utterances and stored as sequences of LPC parameters. During synthesis, the required diphone models are time-warped, abutted, and smoothed to produce a complete sequence of LPC parameters, and that sequence drives the synthesizer. The algorithms used are described and compared with more conventional methods. Examples of the synthesized speech will be played.
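The warp, abut, and smooth steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm. Each frame is assumed to hold LPC-derived parameters such as reflection coefficients or log-area ratios. Time warping is assumed to be linear resampling to the transmitted duration. Smoothing is assumed to be a 3-point moving average applied only to the frames next to each joint. The names `time_warp` and `concatenate` are hypothetical.

```python
from typing import List, Sequence

# One frame of LPC-derived parameters (assumed here to be, e.g., reflection
# coefficients or log-area ratios; the abstract does not say which).
Frame = List[float]


def time_warp(frames: Sequence[Frame], target_len: int) -> List[Frame]:
    """Linearly resample a diphone's parameter track to target_len frames."""
    n = len(frames)
    if target_len <= 0 or n == 0:
        return []
    if n == 1 or target_len == 1:
        return [list(frames[0]) for _ in range(target_len)]
    out: List[Frame] = []
    for k in range(target_len):
        pos = k * (n - 1) / (target_len - 1)  # fractional source frame index
        i = min(int(pos), n - 2)
        frac = pos - i
        out.append([(1 - frac) * a + frac * b
                    for a, b in zip(frames[i], frames[i + 1])])
    return out


def concatenate(diphones: Sequence[Sequence[Frame]],
                durations: Sequence[int], half_width: int = 1) -> List[Frame]:
    """Warp each diphone to its duration, abut the results, and apply a 3-point
    moving average to the half_width frames on each side of every joint."""
    track: List[Frame] = []
    joints: List[int] = []  # index of the first frame of each later diphone
    for frames, dur in zip(diphones, durations):
        if track:
            joints.append(len(track))
        track.extend(time_warp(frames, dur))
    smoothed = [list(f) for f in track]  # read from track, write to smoothed
    for j in joints:
        for t in range(max(1, j - half_width), min(len(track) - 1, j + half_width)):
            smoothed[t] = [(p + c + q) / 3
                           for p, c, q in zip(track[t - 1], track[t], track[t + 1])]
    return smoothed
```

Abutting a flat track at 0.0 with a flat track at 1.0, each warped to three frames, replaces the step at the joint with the ramp 0, 0, 1/3, 2/3, 1, 1. Smoothing only near the joints leaves the interior of each diphone model untouched.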
The goal of this study was to develop an effective and computationally inexpensive method of enhancing the linear prediction analysis/synthesis of noisy speech. To this end, a preprocessing filter is proposed that can perfectly remove the "expected" noise signal when the input speech spectrum is closely approximated by the noisy speech spectrum. The proposed filter has been evaluated with the linear prediction distance measure, perceptual listening tests, and spectrograms. This evaluation demonstrated that the filter is effective at removing broadband noise. The filter has also been implemented as a preprocessing stage in a real-time LPC system. In that system, the filtering takes only 2.6 msec per 22.5 msec frame, and the LPC analysis and synthesis together take 13 msec.
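The timing figures imply the following per-frame budget. The calculation uses only the numbers quoted above and assumes the 2.6 msec filter and the 13 msec of analysis and synthesis run one after the other on the same processor within each 22.5 msec frame.

```latex
\begin{aligned}
t_{\text{filter}} + t_{\text{LPC}} &= 2.6\ \text{ms} + 13\ \text{ms} = 15.6\ \text{ms},\\
\frac{t_{\text{filter}}}{T_{\text{frame}}} &= \frac{2.6\ \text{ms}}{22.5\ \text{ms}} \approx 0.116,\\
\frac{t_{\text{filter}} + t_{\text{LPC}}}{T_{\text{frame}}} &= \frac{15.6\ \text{ms}}{22.5\ \text{ms}} \approx 0.693.
\end{aligned}
```

Under that assumption, the filter uses about 12% of the frame period. The whole chain uses about 69%, which leaves roughly 6.9 ms of slack per frame.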
Recently we described a variable-frame-rate LPC vocoder designed to transmit good-quality speech over 2400 bps fixed-rate noisy channels with bit-error probabilities of up to 5% [3]. The basic idea was to lower the data rate by transmitting LPC parameters only when speech characteristics have changed sufficiently since the last transmission, and to use the resulting bit-rate savings to protect important transmitted data against channel noise. This paper describes our continuing efforts, which have concentrated on minimizing loss of synchronization between the receiver and the transmitter. In one approach, we emphasize heavy protection of the header and rapid resynchronization. In the other, we apply constraints that guarantee synchronization at the cost of some freedom in selecting data for transmission. Results from the first approach are presented; results from both methods will be compared at the conference.
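The transmit-on-change rule can be sketched as a frame selector. This is a minimal illustration, not the vocoder's actual decision logic. It assumes squared Euclidean distance between parameter vectors as the measure of "sufficient change". It also adds a hypothetical cap, `max_gap`, so that the receiver is refreshed at least every `max_gap` frames. The abstract states neither the distance measure nor any cap. Because the receiver must learn which frames were skipped, that side information travels in the header, and this is why header protection matters for synchronization.

```python
from typing import List, Sequence


def select_frames(frames: Sequence[Sequence[float]], threshold: float,
                  max_gap: int) -> List[int]:
    """Return indices of frames to transmit.

    A frame is sent when its squared Euclidean distance from the last *sent*
    frame exceeds threshold, or when max_gap frames have elapsed since the
    last transmission (a hypothetical refresh cap).
    """
    if not frames:
        return []
    sent = [0]            # always send the first frame
    last = frames[0]      # the receiver's current parameter set
    for i in range(1, len(frames)):
        dist = sum((a - b) ** 2 for a, b in zip(frames[i], last))
        if dist > threshold or i - sent[-1] >= max_gap:
            sent.append(i)
            last = frames[i]
    return sent
```

On a track that drifts slowly, jumps, and then settles, only three of eight frames are sent. The bits saved on the other five are what a scheme like this could spend on error protection.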
This paper describes a linear predictive coder (LPC) and its microprocessor implementation. The LPC has an audio bandwidth of 3200 Hz. It uses the autocorrelation formulation of LPC to determine the short-term spectrum and an Average Magnitude Difference Function (AMDF) to extract pitch. At the receiver, a lattice filter with two multipliers per stage recreates the speech. The 2400 b/s full-duplex LPC is implemented in the firmware of a horizontally coded microprocessor with a 48-bit instruction word and a 16-bit data word. The processor architecture uses a 4-bit TTL ALU slice and a hardwired 16 × 16-bit parallel multiplier to process the data rapidly with relatively slow multiplication circuitry. With the chosen architecture, the LPC requires only 60 percent of the processor's capacity, and the processor itself has fewer than 150 integrated components. The low cost of the voice digitizer has resulted in commercial sales in a market that appears to be growing.
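The AMDF pitch extractor named above can be sketched as follows. For a frame $x$ and lag $\tau$, it computes $D(\tau) = \frac{1}{N-\tau}\sum_{n} |x[n] - x[n+\tau]|$ and takes the lag with the deepest null as the pitch period. This is a floating-point illustration of the textbook AMDF, not the coder's fixed-point firmware. The search range `f_min`/`f_max` is an assumed default. A practical tracker would also add a voicing decision and guard against pitch doubling or halving.

```python
import math
from typing import Sequence


def amdf(x: Sequence[float], lag: int) -> float:
    """Average magnitude difference between the frame and itself shifted by lag."""
    n = len(x) - lag
    return sum(abs(x[i] - x[i + lag]) for i in range(n)) / n


def amdf_pitch(x: Sequence[float], fs: float,
               f_min: float = 60.0, f_max: float = 400.0) -> float:
    """Estimate F0 in Hz as fs divided by the lag with the smallest AMDF."""
    lag_min = max(1, int(fs / f_max))
    lag_max = min(len(x) - 1, int(fs / f_min))
    best_lag = min(range(lag_min, lag_max + 1), key=lambda lag: amdf(x, lag))
    return fs / best_lag


if __name__ == "__main__":
    fs = 8000
    # A slowly decaying 200 Hz tone: the period is 40 samples at 8 kHz.
    frame = [0.999 ** n * math.sin(2 * math.pi * 200 * n / fs) for n in range(320)]
    print(amdf_pitch(frame, fs))
```

The AMDF needs only subtractions and absolute values and no multiplications, which makes it cheap to compute on a small processor.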
The performance of narrowband speech communications systems, such as linear predictive coding (LPC), is often severely degraded by ambient acoustic noise in the input speech signal. Spectral subtraction techniques show promise for improving the overall performance of LPC in acoustic noise environments, but they typically produce annoying musical tones at the output. We describe a spectral subtraction technique that uses a biased estimate of the noise and does not produce musical tones at the output. We also describe an automatic speech activity detector that adapts the noise estimate to changing noise environments.
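A spectral subtraction loop with an adaptive noise estimate can be sketched as follows. The abstract does not say how its noise estimate is biased or how its activity detector works. This sketch therefore substitutes two familiar choices. The first is over-subtraction (factor `alpha` > 1) with a spectral floor (`beta`), a common way to suppress musical tones. The second is an energy-ratio speech detector. All constants are assumptions. The functions operate on per-frame magnitude spectra, and the caller is assumed to supply the FFT and reuse the noisy phase.

```python
from typing import List, Sequence


def spectral_subtract(mag: Sequence[float], noise: Sequence[float],
                      alpha: float = 2.0, beta: float = 0.02) -> List[float]:
    """Subtract alpha times the noise power per bin and clamp each bin
    to a floor of (beta * noise)^2 instead of zero."""
    out = []
    for m, n in zip(mag, noise):
        power = m * m - alpha * n * n          # power-domain subtraction
        floor = (beta * n) ** 2                # residual floor hides isolated peaks
        out.append(max(power, floor) ** 0.5)
    return out


def is_speech(mag: Sequence[float], noise: Sequence[float], ratio: float = 3.0) -> bool:
    """Hypothetical detector: flag speech when frame energy exceeds ratio x noise energy."""
    return sum(m * m for m in mag) > ratio * sum(n * n for n in noise)


def update_noise(noise: Sequence[float], mag: Sequence[float], speech: bool,
                 smoothing: float = 0.9) -> List[float]:
    """Track the noise spectrum with exponential averaging during non-speech frames."""
    if speech:
        return list(noise)
    return [smoothing * n + (1 - smoothing) * m for n, m in zip(noise, mag)]


def enhance(frames: Sequence[Sequence[float]], initial_noise: Sequence[float]) -> List[List[float]]:
    """Run detection, noise adaptation, and subtraction frame by frame."""
    noise = list(initial_noise)
    out = []
    for mag in frames:
        noise = update_noise(noise, mag, is_speech(mag, noise))
        out.append(spectral_subtract(mag, noise))
    return out
```

Clamping to a small floor, rather than to zero, leaves a low, steady residual instead of isolated spectral peaks that come and go from frame to frame. Those fluctuating peaks are what the ear hears as musical tones.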
A speech bandwidth compression system is described that uses a hybrid approach, combining a baseband system and a linear predictive coding system to produce high-quality speech at a transmission rate of 7.2 kbps. The system requires the extraction and transmission of excitation parameters, but it is not very sensitive to errors in those parameters, nor is it particularly sensitive to errors in the baseband portion of the processing. Because the system consists of two independent processes, errors in one process do not affect the other, which makes the system remarkably robust. It is essentially a modification of a similar system that operated at 16 kbps (1), using unique coding techniques to reduce the bit rate to 7.2 kbps.
The paper describes the software architecture of an Italian text-to-speech synthesis system based on concatenating LPC-coded diphones. The automatic voice response system is designed for multichannel, real-time operation. For each output channel, the system performs the following operations: preprocessing of the input character string, translation into the corresponding sequence of diphones, generation of prosodic contours, and real-time control of a hardware speech synthesizer.
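The "translation into the proper sequence of diphones" step can be sketched for a phoneme string that has already been produced. Each adjacent phoneme pair becomes one diphone, with a pause symbol at both ends of the utterance. This is an illustration only. The pause symbol `_`, the `a-b` diphone naming, and the `lookup` helper are hypothetical, and the grapheme-to-phoneme and prosody stages are not shown.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")

SILENCE = "_"  # hypothetical symbol for the pause at utterance edges


def to_diphones(phonemes: Sequence[str]) -> List[str]:
    """Map a phoneme string to the diphones that span each transition."""
    padded = [SILENCE, *phonemes, SILENCE]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def lookup(diphones: Sequence[str], inventory: Dict[str, T]) -> List[T]:
    """Fetch the stored (e.g. LPC-coded) unit for each diphone, failing loudly
    if the inventory lacks one."""
    missing = [d for d in diphones if d not in inventory]
    if missing:
        raise KeyError(f"diphones not in inventory: {missing}")
    return [inventory[d] for d in diphones]


if __name__ == "__main__":
    print(to_diphones(["k", "a", "s", "a"]))
```

For the phonemes `k a s a`, this yields `_-k`, `k-a`, `a-s`, `s-a`, `a-_`: one more diphone than there are phonemes. Each diphone stores the transition between two sounds, which is why concatenating diphones joins units at their more stable midpoints.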
A monolithic CCD adaptive filter chip is described that implements the Widrow-Hoff "clipped-data" LMS adaptive algorithm. The chip can be used as a pre-filter noise canceller, an analysis filter, or a pre-whitener for a pitch extractor in linear predictive coding (LPC) voice bandwidth reduction systems.
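The clipped-data LMS update replaces the input sample in the standard LMS update $w \leftarrow w + 2\mu e\,x$ with its sign, giving $w \leftarrow w + 2\mu e\,\operatorname{sgn}(x)$. The sketch below is a digital, floating-point model of that rule. It is not a model of the CCD chip's analog sampled-data circuitry, and the tap count and step size `mu` are illustrative assumptions.

```python
import random
from typing import List, Sequence


def sign(v: float) -> float:
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)


def clipped_lms(x: Sequence[float], d: Sequence[float], taps: int, mu: float) -> List[float]:
    """Adapt FIR weights so that w . [x[n], x[n-1], ...] tracks d[n],
    using the clipped-data (sign-of-input) LMS update."""
    w = [0.0] * taps
    buf = [0.0] * taps  # buf[0] holds the newest input sample
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))
        e = dn - y                                   # error signal
        w = [wi + 2 * mu * e * sign(bi) for wi, bi in zip(w, buf)]
    return w


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
    # Unknown system to identify: d[n] = 0.5 x[n] - 0.3 x[n-1].
    d = [0.5 * x[n] - 0.3 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
    print(clipped_lms(x, d, taps=2, mu=0.01))
```

Clipping the data to ±1 removes the multiplication by the sample value, which is what makes the algorithm attractive in hardware. To use the filter as a one-step predictor or pre-whitener, set `d[n] = x[n]` and feed the delayed samples `x[n-1], x[n-2], ...` as the input. The error `e` is then the whitened output.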
A new version of the Residual Excited Linear Predictive (RELP) vocoder has been simulated. The objective was to reduce the data rate required for good-quality speech to 4.8 kbps. Results indicate that it is possible to remove the hoarseness currently associated with low-data-rate RELP speech. We discuss the development of a pitch-predictive ADPCM residual encoder and preliminary results on new harmonic generation techniques. Taped demonstrations will be played at the conference.
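The pitch-prediction stage of such a residual encoder can be sketched with a single-tap long-term predictor. For each candidate lag $T$, the optimal gain is $\beta = \sum r[n]r[n-T] / \sum r[n-T]^2$. Choosing the lag that maximizes $(\sum r[n]r[n-T])^2 / \sum r[n-T]^2$ minimizes the energy of $e[n] = r[n] - \beta r[n-T]$. This is an assumed structure. The abstract does not describe the predictor, and the ADPCM quantizer that would code `e[n]` is not shown.

```python
from typing import List, Sequence, Tuple


def pitch_predictor(r: Sequence[float], lag_min: int, lag_max: int) -> Tuple[int, float]:
    """Pick the lag and gain of a one-tap long-term predictor for residual r."""
    best_lag, best_gain, best_score = lag_min, 0.0, -1.0
    for lag in range(lag_min, lag_max + 1):
        num = sum(r[n] * r[n - lag] for n in range(lag, len(r)))
        den = sum(r[n - lag] ** 2 for n in range(lag, len(r)))
        if den <= 0.0:
            continue
        score = num * num / den  # drop in residual energy when the optimal gain is used
        if score > best_score:
            best_lag, best_gain, best_score = lag, num / den, score
    return best_lag, best_gain


def pitch_residual(r: Sequence[float], lag: int, gain: float) -> List[float]:
    """e[n] = r[n] - gain * r[n - lag]; samples before the first lag pass through."""
    return [r[n] - gain * (r[n - lag] if n >= lag else 0.0) for n in range(len(r))]
```

For a pulse train with a period of 50 samples, the search returns lag 50 and gain 1.0, and the pitch residual keeps only the first pulse. Removing the periodic component this way leaves a smaller signal for an ADPCM coder to quantize.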
A training sequence of speech data is used to design a two-step speech compression system for either single speakers or multiple speakers. The system is designed to minimize an average spectral distortion over the training sequence. This leads to an identification step using linear prediction techniques followed by a vector quantizer. The system is then used to compress test sequences of speech data and achieves much lower bit rates than scalar quantization at equivalent distortion. For the same numerical distortion, "optimal" scalar bit allocation and quantization required 20 bits/frame, whereas vector quantization required 8 bits/frame. Results are presented as numerical distortion measures and analog tapes of synthesized speech.
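At 8 bits/frame, each frame is sent as an index into a codebook of $2^8 = 256$ stored vectors. The sketch below designs such a codebook from training data with the generalized Lloyd (k-means) iteration, a standard way to minimize average distortion over a training set. Several details are assumptions. The abstract minimizes a spectral distortion and does not name its design algorithm, so this sketch substitutes plain squared Euclidean distance. Random initialization from the training set is also an assumption, and the training set must contain at least `2 ** bits` vectors.

```python
import random
from typing import List, Sequence

Vector = List[float]


def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: Sequence[float], codebook: Sequence[Vector]) -> int:
    """Index of the codeword closest to v (the transmitted index)."""
    return min(range(len(codebook)), key=lambda i: sqdist(v, codebook[i]))


def train_codebook(training: Sequence[Vector], bits: int,
                   iters: int = 20, seed: int = 0) -> List[Vector]:
    """Generalized Lloyd iteration: alternate nearest-codeword partitioning
    and centroid updates over the training set."""
    size = 2 ** bits
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(list(training), size)]
    for _ in range(iters):
        cells: List[List[Vector]] = [[] for _ in range(size)]
        for v in training:
            cells[nearest(v, codebook)].append(list(v))
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codeword
                dim = len(cell[0])
                codebook[i] = [sum(v[k] for v in cell) / len(cell) for k in range(dim)]
    return codebook


def average_distortion(data: Sequence[Vector], codebook: Sequence[Vector]) -> float:
    return sum(sqdist(v, codebook[nearest(v, codebook)]) for v in data) / len(data)
```

A vector quantizer spends its bits on whole spectral shapes that actually occur in speech, rather than on each coefficient independently. This is the intuition behind the reported drop from 20 to 8 bits/frame at equal distortion.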