The effect of filtering the time trajectories of spectral envelopes on speech intelligibility was investigated. Since the LPC cepstrum forms the basis of many automatic speech recognition systems, the authors filtered time trajectories of the LPC cepstrum of speech sounds, and the modified speech was reconstructed after the filtering. For processing, they applied low-pass, high-pass and band-pass filters. The accuracy results from the perceptual experiments for Japanese syllables show that speech intelligibility is not severely impaired as long as the filtered spectral components have 1) a rate of change faster than 1 Hz when high-pass filtered, 2) a rate of change slower than 24 Hz when low-pass filtered, and 3) a rate of change between 1 and 16 Hz when band-pass filtered.
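As a rough illustration of what band-pass filtering a time trajectory means, the sketch below treats one cepstral coefficient as a sequence sampled at the frame rate and zeroes every DFT bin whose modulation frequency lies outside a chosen band. The 100 Hz frame rate, the ideal DFT-domain mask, and the function name `bandpass_trajectory` are assumptions made for this example; the abstract does not state which filter design or frame rate the authors used.

```python
import cmath
import math


def bandpass_trajectory(traj, frame_rate_hz, lo_hz, hi_hz):
    """Keep only modulation frequencies in [lo_hz, hi_hz] of a real trajectory.

    Hypothetical helper: an ideal DFT-domain mask, not the authors' filter.
    """
    n = len(traj)
    # Forward DFT, O(n^2); adequate for a short illustration.
    spec = [
        sum(traj[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bins above n/2 are negative frequencies, so fold them back.
        freq_hz = min(k, n - k) * frame_rate_hz / n
        if not (lo_hz <= freq_hz <= hi_hz):
            spec[k] = 0
    # Inverse DFT. The mask is symmetric, so the imaginary part is rounding noise.
    return [
        (sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


# Synthetic trajectory of one cepstral coefficient: a constant offset plus
# a 4 Hz and a 30 Hz modulation, sampled at an assumed 100 frames per second.
FRAME_RATE_HZ = 100.0
NUM_FRAMES = 200
trajectory = [
    1.0
    + math.sin(2 * math.pi * 4 * t / FRAME_RATE_HZ)
    + math.sin(2 * math.pi * 30 * t / FRAME_RATE_HZ)
    for t in range(NUM_FRAMES)
]

# The 1 to 16 Hz band removes the offset and the 30 Hz component.
filtered = bandpass_trajectory(trajectory, FRAME_RATE_HZ, 1.0, 16.0)
```

In a full system the same operation would be applied to every cepstral coefficient independently before the speech is resynthesized; a practical implementation would more likely use an FIR or IIR filter than an ideal mask.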
The transmission of spectral information consumes a large portion of the total bit rate in medium-to-low bit rate speech coding. Conventional coding methods for LSP parameters produce either redundant spectral information or spectral distortion, because the LSP parameters are updated at a fixed rate that is independent of the order of the coefficients and of the phonetic context. We propose a multiple type frame segmentation (MTFS) method which allows various types of two-dimensional segmentation of speech frames, reducing the transmission rate of the LSP parameters without increasing the spectral distortion. The intra-frame spectral distortion (IFSD) is defined to measure the spectral distortion of the reconstructed spectrum. The proposed method generates a less distorted spectrum with fewer bits than the conventional single type frame segmentation (STFS) method.
ISBN: (print) 0780331923
This paper presents a high quality speech coder based on the multi-band excitation (MBE) model operating at 2.4 kb/s and 1.2 kb/s. The coder has two main features. One is an accurate and reliable pitch estimation and voiced/unvoiced decision algorithm. The other is the efficient quantization of the variable dimension spectral amplitude vector. Besides representing the spectral envelope information using an all-pole model, we also encode error vectors of the spectral amplitude vector at several important positions. Informal listening tests indicate that the speech quality of this coder at 2.4 kb/s is comparable to, or even better than, that of the INMARSAT-M IMBE 4.15 kb/s coder. The coder has been implemented in real time on a single TMS320C31 floating-point DSP.
A flexible software-based approach for implementing secure voice systems is presented. The system architecture is discussed, as well as implementation issues for speech coding and encryption algorithms. The latest advances in Digital Signal Processing (DSP) microprocessor technology are exploited to offer lower-cost and smaller secure voice systems. This approach was successfully adopted by Intracom S.A. in the design and implementation of the SecLine secure voice digital encryptor.
A two-dimensional discrete cosine transform (2-D DCT), often used for image coding, has been applied to sequences of speech spectra produced by the maximum likelihood method (MLM). The coded data was compressed by nearly 90%, reducing it to a size smaller than that needed to store the coefficients of a 10th order linear predictive coding (LPC) model. The DCT-encoded data was then reconstructed and tested for intelligibility. It was found that the two-dimensional DCT method was significantly more intelligible and more natural-sounding than the LPC technique.
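The idea of treating a sequence of spectra like an image can be sketched as follows: a block of frames by frequency bins is transformed with a separable 2-D DCT, only the low-order coefficients are kept, and the block is reconstructed from them. The 8 by 8 block, the rule of keeping a 3 by 3 corner, and all function names are assumptions for illustration; the abstract does not describe how the authors selected or quantized coefficients, and the synthetic block stands in for real MLM spectra.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct2(block):
    """Separable 2-D DCT: along frequency within each frame, then along time."""
    rows = [dct(row) for row in block]
    return transpose([dct(col) for col in transpose(rows)])


def idct2(coefs):
    """Inverse of dct2(): undo the time-axis transform, then the frequency axis."""
    rows = transpose([idct(col) for col in transpose(coefs)])
    return [idct(row) for row in rows]


def keep_low_order(coefs, k_time, k_freq):
    """Zero every coefficient outside the low-order k_time x k_freq corner."""
    return [
        [v if (i < k_time and j < k_freq) else 0.0 for j, v in enumerate(row)]
        for i, row in enumerate(coefs)
    ]


# Synthetic, smooth "spectrogram" block: 8 frames x 8 frequency bins.
block = [
    [1.0 + 0.5 * math.cos(math.pi * (f + 0.5) / 8) + 0.1 * t for f in range(8)]
    for t in range(8)
]
coefs = dct2(block)
kept = keep_low_order(coefs, 3, 3)  # 9 of 64 coefficients survive
approx = idct2(kept)
```

Because the transform is orthonormal, the squared reconstruction error equals the energy of the discarded coefficients, which is why smooth spectral sequences tolerate aggressive truncation.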
Summary form only given. Preliminary work has demonstrated that the temperature range of silicon components may be substantially extended by using SOI devices. However, the variation of the different device parameters in the high temperature range is not yet fully clarified, especially for small devices. We have already discussed the variation of the surface mobility with temperature and back gate bias. In this paper, we investigate the behaviour of the threshold voltage and leakage current, major parameters for reliable circuit operation, as a function of channel length and temperature. The fully depleted n- and p-channel SOI-MOSFETs were fabricated with standard CMOS technology on commercially available SIMOX substrates.
A neural nonlinear predictor for one-dimensional signals is presented. It is based on a combination of linearization and QR decomposition that allows a fast-adapting algorithm. The predictor is used in a speech compression algorithm that has proven to be superior to models based on linear prediction. The compression and training are done simultaneously, allowing the network to continually adapt to the signal. The results presented show that this algorithm outperforms a typical LPC coding algorithm.
Hot carrier effects are thoroughly investigated in deep submicron N- and P-channel SOI MOSFETs using photon emission measurements. A substantial enhancement of the emitted photon number is observed with increasing drain bias in the low gate voltage range, showing the impact of the parasitic bipolar transistor (PBT) action. For Vg close to zero, a significant increase of the photon emission is also obtained when the gate length is reduced to sub-0.1 μm, for both N- and P-channel transistors. The maximal photon number is obtained at the lower gate bias in the case of a sufficiently high drain voltage and/or small gate length. For low Vd and/or long channels, the photon number Nph is maximum around Vg≈Vd/2. These results are in agreement with those obtained for hot-carrier-induced degradation, showing the strong correlation between the emitted photon number and the reliability of the SOI devices in the studied range of gate and drain biases, and highlighting the influence of the PBT action.
The authors present a design and study the performance of a text-dependent speaker verification system using general phrase passwords. The text of the password utterance and its phone transcription are assumed to be available. The problems that are addressed include the appropriate choice of units for building target speaker models and the choice of background models for likelihood-ratio scoring.
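Likelihood-ratio scoring can be sketched in a few lines: an utterance is accepted when its average log-likelihood under the target speaker model exceeds that under the background model by some threshold. The single diagonal-covariance Gaussian per model, the zero threshold, and every name below are assumptions chosen to keep the example small; the abstract does not specify the model family, and a system built on phone units would more plausibly use HMMs.

```python
import math


def gaussian_loglik(x, mean, var):
    """Log-likelihood of one feature vector under a diagonal Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def llr_score(frames, target, background):
    """Average per-frame log-likelihood ratio, target versus background.

    `target` and `background` are hypothetical (mean, variance) pairs.
    """
    total = sum(
        gaussian_loglik(f, *target) - gaussian_loglik(f, *background)
        for f in frames
    )
    return total / len(frames)


def verify(frames, target, background, threshold=0.0):
    """Accept the identity claim when the score reaches the threshold."""
    return llr_score(frames, target, background) >= threshold


# Toy two-dimensional models with made-up parameters.
target_model = ([1.0, 1.0], [1.0, 1.0])
background_model = ([0.0, 0.0], [1.0, 1.0])

claimed_speaker_frames = [[1.0, 1.0], [0.9, 1.1]]
impostor_frames = [[0.0, 0.0], [0.1, -0.1]]

accepted = verify(claimed_speaker_frames, target_model, background_model)
rejected = not verify(impostor_frames, target_model, background_model)
```

Normalizing by the number of frames keeps the score comparable across passwords of different lengths, and the choice of background model determines what the target likelihood is being compared against.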