Speech processing systems are highly complex, and teaching students the underlying technologies can be challenging. Educational software is therefore needed to give students a more convenient way to study, perform experiments, and develop their first speech recognition application without an instructor. The Speech Recognition Virtual Classroom (SRVC) was developed as educational software for teaching speech recognition. It comprises several modules, including signal analysis, dynamic time warping, vector quantization, and hidden Markov models (HMMs). These modules are integrated into a single environment that supports learning through experiments with real speech data.
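The abstract does not show SRVC's internals. As an illustration of the kind of experiment a dynamic time warping module supports, here is a minimal DTW sketch in Python with NumPy. The function name, the Euclidean local cost, and the symmetric step pattern are assumptions, not details taken from SRVC.

```python
import numpy as np


def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences.

    a, b: sequences of frames; each frame is a scalar or a feature vector
    (e.g. one MFCC vector per frame). Local cost is the Euclidean distance.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Treat 1-D input as a sequence of scalar frames.
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    # D[i, j] = minimal accumulated cost of aligning a[:i] with b[:j].
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # Symmetric step pattern: advance in a, in b, or in both.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])


# A template and a slower rendition of the same "word" align perfectly.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
print(dtw_distance([0, 0], [1, 1]))           # 2.0
```

Real isolated-word recognizers usually add slope constraints and path-length normalization. They are omitted here to keep the recursion visible.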
In this work, we propose a partial-correlation (PARCOR) coefficient scheme to model the cross-sectional areas of the concatenated cylinders that approximate the vocal tract. Because the acoustic impedance of each cylinder is proportional to the reciprocal of its cross-sectional area, the area ratios between neighboring cylinders are used to model a speaker's vocal tract. An autoregressive (AR) model is then applied to the speech residual signal, obtained by PARCOR-based inverse vocal-tract filtering, to generate features. These features, together with conventional Mel-frequency cepstral coefficient (MFCC) features, are used in a Gaussian mixture model (GMM) identification engine. In our computer experiments on the TIMIT speech database, the proposed system achieved better identification performance than the conventional approach.
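The abstract does not give its equations. Below is a minimal NumPy sketch of the standard route from a speech frame to PARCOR (reflection) coefficients and neighbouring tube area ratios. It assumes the Levinson-Durbin recursion and the lossless-tube relation between reflection coefficients and adjacent areas. The sign convention for the area ratio differs between texts, and the function names are illustrative.

```python
import numpy as np


def autocorrelation(x, order):
    """Biased autocorrelation r[0..order] of a (windowed) speech frame."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation r.

    Returns (a, k, err): a = [1, a1, ..., ap] for A(z) = 1 + sum a_j z^-j,
    k = reflection (PARCOR) coefficients, err = final prediction error power.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    k = np.zeros(order)
    for i in range(1, order + 1):
        # acc = r[i] + sum_{j=1}^{i-1} a_j r[i-j]
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        ki = -acc / err
        # Order update: a_j <- a_j + k_i a_{i-j} for j = 1..i-1.
        a[1:i] = a[1:i] + ki * a[i - 1:0:-1]
        a[i] = ki
        err *= 1.0 - ki * ki
        k[i - 1] = ki
    return a, k, err


def area_ratios(k):
    """Neighbouring cross-sectional area ratios of a lossless tube model.

    Assumes the convention ratio_i = (1 - k_i) / (1 + k_i); other texts flip
    the sign of k_i or invert the ratio. Requires |k_i| < 1 (stable predictor).
    """
    k = np.asarray(k, dtype=float)
    return (1.0 - k) / (1.0 + k)


# Autocorrelation of an AR(1) process with pole 0.5 (unit variance).
r = np.array([1.0, 0.5, 0.25])
a, k, err = levinson_durbin(r, 2)
print(a, k, err)       # a ~ [1, -0.5, 0], k ~ [-0.5, 0], err = 0.75
print(area_ratios(k))  # ~ [3, 1]
```

The inverse filter A(z) obtained this way is what produces the residual signal on which the abstract's AR model is fitted.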
ISBN (print): 0780381165
In many speech coding systems, the LPC coefficients are transformed into line spectrum pair (LSP) parameters, which are a very effective representation for quantizing the LPC information. The LSP representation consumes a large part of the coder's total bit rate. LSPs are typically highly correlated from one frame to the next, so a considerable bit-rate reduction can be achieved by exploiting this interframe correlation. However, interframe LSP coding can cause error propagation when frame erasures occur. In this paper, we compare the frame-erasure performance of the predictive split vector quantizer used in the ITU-T G.723.1 standard coder with that of a split vector quantizer. Our results show that a 25 bits/frame split vector quantizer achieves a lower average spectral distortion than the 24 bits/frame predictive split vector quantizer of G.723.1 across different loss rates.
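A minimal NumPy sketch of the LPC-to-LSP conversion mentioned above. It computes line spectral frequencies as the root angles of the standard sum and difference polynomials P(z) and Q(z). Real-time coders typically use a grid or Chebyshev-polynomial search instead; general root finding is used here only for brevity, and the function name is illustrative.

```python
import numpy as np


def lpc_to_lsf(a, tol=1e-9):
    """Convert LPC coefficients a = [1, a1, ..., ap] of A(z) to line spectral
    frequencies (radians in (0, pi)) via the sum and difference polynomials

        P(z) = A(z) + z^-(p+1) A(1/z),   Q(z) = A(z) - z^-(p+1) A(1/z).
    """
    a = np.asarray(a, dtype=float)
    a_ext = np.append(a, 0.0)  # A(z) padded to degree p+1
    a_rev = a_ext[::-1]        # coefficients of z^-(p+1) A(1/z)
    p_poly = a_ext + a_rev
    q_poly = a_ext - a_rev
    # For a minimum-phase A(z), the roots of P and Q lie on the unit circle and
    # interlace. Keeping only upper-half-plane roots drops the trivial roots at
    # z = +1 and z = -1 as well as the conjugate duplicates.
    angles = []
    for poly in (p_poly, q_poly):
        roots = np.roots(poly)
        angles.extend(float(np.angle(z)) for z in roots if z.imag > tol)
    return np.sort(np.array(angles))


print(lpc_to_lsf([1.0, -0.5]))       # [pi/3], about 1.047 rad
print(lpc_to_lsf([1.0, -0.9, 0.2]))  # arccos(0.85) and arccos(0.05), about 0.555 and 1.521 rad
```

In a predictive quantizer, the value coded each frame is the difference between such an LSF vector and a prediction formed from the previous frame. This is why an erased frame corrupts the decoder's prediction for the frames that follow.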
We previously presented a multilayered, columnar competitive network built from competitive associative nets (CANs) and adaptive vector quantization nets (AVQNs) for spoken word recognition. Although the network achieved a good recognition rate, the CANs make it require a relatively long calculation time. Here, we present a new network that replaces the CANs with a conventional feature extractor and an additional AVQN. As the feature extractor, we use one of three conventional methods: RLS (recursive least squares) for extracting LPCs, LSL (least-squares lattice) for PARCOR coefficients, and normalized LSL for normalized PARCOR coefficients. In our experiments, the normalized LSL achieved almost the same recognition rate as the original network while reducing the calculation time.
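RLS is the simplest of the three feature extractors listed to show compactly. This is a minimal NumPy sketch of an RLS adaptive linear predictor, assuming the textbook exponentially weighted RLS update. The forgetting factor, the initialization constant, and the function name are illustrative choices, not values from the paper.

```python
import numpy as np


def rls_lpc(signal, order, lam=0.99, delta=0.01):
    """Recursive least-squares (RLS) estimate of linear-prediction coefficients.

    Predicts s[n] ~ w . [s[n-1], ..., s[n-order]] and updates w sample by
    sample. lam is the forgetting factor (lam = 1 gives growing-window least
    squares); the inverse correlation matrix P starts as I / delta.
    Returns the history of w, one row per sample.
    """
    s = np.asarray(signal, dtype=float)
    w = np.zeros(order)
    P = np.eye(order) / delta
    history = np.zeros((len(s), order))
    for n in range(len(s)):
        # Regressor of past samples, zero before the start of the signal.
        u = np.array([s[n - i] if n - i >= 0 else 0.0 for i in range(1, order + 1)])
        pu = P @ u
        gain = pu / (lam + u @ pu)
        err = s[n] - w @ u  # a priori prediction error
        w = w + gain * err
        P = (P - np.outer(gain, pu)) / lam
        history[n] = w
    return history


# Demo on a synthetic AR(2) signal s[n] = 0.9 s[n-1] - 0.2 s[n-2] + e[n].
rng = np.random.default_rng(0)
e = rng.standard_normal(5000)
s = np.zeros_like(e)
for n in range(len(e)):
    s[n] = e[n] + (0.9 * s[n - 1] if n >= 1 else 0.0) - (0.2 * s[n - 2] if n >= 2 else 0.0)
w = rls_lpc(s, order=2, lam=1.0)[-1]
print(w)  # close to [0.9, -0.2]
```

With lam below 1 the estimate tracks changing statistics frame by frame, which suits nonstationary speech. LSL and normalized LSL reach lattice (PARCOR) coefficients by a different recursion and are not shown.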
This paper describes a modified variable-rate coder with an average bit rate of 1200 bps for wireless messaging, including voice paging and voice e-mail delivery to a PC or a handheld device. The coder uses LPC-based analysis/synthesis with zinc-function excitation for voiced frames, together with plosive and pitch detectors, to achieve good synthetic voice quality at this rate. For frames deemed unvoiced, the RMS value of the LPC residual is quantized and sent to the decoder. In a subjective quality test, the proposed system achieved an MOS of 3.1, compared with 3.3 for conventional MELP. The coder has also been simulated on a workstation and on a laptop PC running Windows NT, so it can be used from MS Outlook to send and receive voice e-mails. The performance of the coder under random bit errors is also presented: the degradation becomes objectionable only at bit error rates of 10^-2 and higher.
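For unvoiced frames, the abstract says only the RMS of the LPC residual is quantized and sent. A minimal NumPy sketch of that step follows. It assumes direct-form inverse filtering with zero initial state and a uniform quantizer in the dB domain. The 5-bit, 0 to 60 dB quantizer is a hypothetical choice, because the abstract does not give the bit allocation.

```python
import numpy as np


def lpc_residual(frame, a):
    """Inverse-filter a frame with A(z) = 1 + a1 z^-1 + ... (zero initial state)."""
    frame = np.asarray(frame, dtype=float)
    return np.convolve(frame, a)[: len(frame)]


def residual_rms(frame, a):
    """RMS value of the LPC residual of one frame."""
    e = lpc_residual(frame, a)
    return float(np.sqrt(np.mean(e ** 2)))


def quantize_rms_db(rms, bits=5, lo_db=0.0, hi_db=60.0):
    """Uniform scalar quantizer for the RMS in dB (hypothetical bit allocation).

    Returns (index, reconstructed_rms).
    """
    levels = 2 ** bits
    step = (hi_db - lo_db) / (levels - 1)
    db = 20.0 * np.log10(max(rms, 1e-12))  # guard against log(0) on silence
    index = int(np.clip(round((db - lo_db) / step), 0, levels - 1))
    return index, 10.0 ** ((lo_db + index * step) / 20.0)


print(residual_rms([1.0, 1.0, 1.0, 1.0], [1.0, -1.0]))  # 0.5
print(quantize_rms_db(100.0)[0])                         # 21
```

Quantizing in dB keeps the relative error roughly constant across loud and quiet frames, which is a common choice for gain parameters.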
NATO requires high-quality voice communications for its military missions. The performance of the voice coding algorithms currently in place is unacceptable in the harsh tactical acoustic conditions in which NATO commanders operate, such as tracked vehicles or helicopters. A new generation of 2.4/1.2 kbps speech coding algorithms has been developed that far exceeds the quality of service of the speech coding algorithms NATO currently uses. This has led NATO to decide to standardize a new algorithm suited to its needs through a competition. A candidate coder based on the split-band LPC (SB-LPC) principle has been developed to meet the requirements of the competition and has been entered by Turkey in the NATO STANAG competition. Its details are presented in this paper.
Parametric vocoders are used to achieve very low-rate compression. A new method for segment coding of the speech spectral envelope is proposed, based on coding constant-length segments. It is a computationally efficient method for coding LPC (linear predictive coding) parameters based on TD (temporal decomposition). It attains an average LSD (log-spectral distortion) of less than 1 dB at 11 bits/frame. It thus saves 56% of the LPC parameter bit budget, amounting to 26% of the total NELP2400 vocoder bit budget. The method offers constant delay and relatively low complexity.
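Average LSD is the figure of merit quoted above. Here is a minimal NumPy sketch of one common definition: the RMS difference, in dB, between the reference and quantized all-pole power spectra over uniformly spaced frequencies from 0 to pi, then averaged over frames. The FFT size, ignoring the gain term, and full-band averaging are assumptions; published evaluations differ on these details.

```python
import numpy as np


def lpc_log_spectrum_db(a, nfft=512):
    """10*log10 of the all-pole power spectrum 1/|A(e^jw)|^2 on [0, pi]."""
    A = np.fft.rfft(np.asarray(a, dtype=float), nfft)
    return -10.0 * np.log10(np.abs(A) ** 2)


def log_spectral_distortion(a_ref, a_quant, nfft=512):
    """RMS difference (dB) between two LPC envelopes for one frame, gain ignored."""
    diff = lpc_log_spectrum_db(a_ref, nfft) - lpc_log_spectrum_db(a_quant, nfft)
    return float(np.sqrt(np.mean(diff ** 2)))


def average_lsd(frames_ref, frames_quant, nfft=512):
    """Average LSD over a sequence of frames (lists of LPC coefficient vectors)."""
    return float(np.mean([
        log_spectral_distortion(r, q, nfft) for r, q in zip(frames_ref, frames_quant)
    ]))


print(log_spectral_distortion([1.0, -0.5], [1.0, -0.5]))  # 0.0
print(log_spectral_distortion([1.0], [2.0]))              # about 6.02 dB, i.e. 20*log10(2)
```

A figure such as "below 1 dB" refers to `average_lsd` computed over a test set, with each frame's quantized coefficients compared against the unquantized ones.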
Power consumption has become an important consideration in system implementations. This work presents a methodology for estimating dynamic power consumption from hardware descriptions written in VHDL. A library containing transition and power consumption information for every component of the target library is created. A case study of the KASUMI cryptographic algorithm is reported; this algorithm was chosen as part of the 3rd Generation Partnership Project (3GPP) security functions for mobile systems. The restrictions imposed by 3GPP on the hardware implementation of KASUMI were analyzed and satisfied, using our dynamic power consumption estimation methodology. Only CMOS technologies are discussed in this paper.
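The abstract describes a per-component library of transition and power information combined with VHDL simulation. A minimal Python sketch of how such estimates are commonly aggregated follows. It assumes that each output toggle of a CMOS component dissipates C·Vdd²/2, which is the standard switched-capacitance model behind P = αCVdd²f. The `ComponentModel` record and the function names are hypothetical. Real libraries also account for internal and short-circuit power.

```python
from dataclasses import dataclass


@dataclass
class ComponentModel:
    """Hypothetical library entry: switched capacitance per output transition."""
    name: str
    capacitance_f: float  # farads


def dynamic_energy_j(model, transitions, vdd):
    """Energy of `transitions` output toggles, each dissipating C*Vdd^2/2."""
    return transitions * 0.5 * model.capacitance_f * vdd ** 2


def average_dynamic_power_w(library, transition_counts, vdd, sim_time_s):
    """Sum per-component switching energy over a simulation and divide by time.

    library: dict name -> ComponentModel
    transition_counts: dict name -> toggles counted (e.g. from a VHDL simulation)
    """
    total = sum(
        dynamic_energy_j(library[name], count, vdd)
        for name, count in transition_counts.items()
    )
    return total / sim_time_s


# Illustrative numbers only: femtofarad loads, 1.8 V supply, 1 us simulation.
library = {
    "xor2": ComponentModel("xor2", 8e-15),
    "dff": ComponentModel("dff", 15e-15),
}
counts = {"xor2": 12000, "dff": 3000}
print(average_dynamic_power_w(library, counts, vdd=1.8, sim_time_s=1e-6))
```

Counting transitions in simulation, rather than assuming a fixed activity factor, lets the estimate reflect the actual data flowing through the cipher rounds.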
ISBN (print): 0780375769
We present a speech data hiding technique that exploits the characteristics of multistage vector quantization (MSVQ) and subtractive dithering to maintain high speech reconstruction quality. The last stage of the MSVQ is used to store the data to be embedded. As in subtractive dithering, the noise-like hidden data is subtracted from the signal at the first stage of the encoder and added back at the MSVQ decoder. As a result, the degradation caused by hiding secret data is significantly reduced compared with the traditional simple-substitution method.
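Here is a minimal NumPy sketch of one reading of this scheme. It assumes a generic multistage VQ with sequential (greedy) nearest-neighbour search. The secret index selects a last-stage codeword, which is subtracted from the input before the first stage (the subtractive-dither step), and the ordinary MSVQ decoder adds it back. The codebooks, dimensions, and function names are illustrative; the paper's actual codebooks and search procedure are not given in the abstract.

```python
import numpy as np


def nearest(codebook, v):
    """Index of the codeword closest to v (squared Euclidean distance)."""
    return int(np.argmin(np.sum((codebook - v) ** 2, axis=1)))


def embed(x, stages, secret_index):
    """Encode x with a multistage VQ whose last stage carries hidden data.

    The last-stage codeword chosen by the secret index acts like a subtractive
    dither: it is removed from x before the earlier stages search, so those
    stages compensate for it. Returns (indices, final_residual).
    """
    hidden = stages[-1][secret_index]
    residual = np.asarray(x, dtype=float) - hidden
    indices = []
    for codebook in stages[:-1]:
        i = nearest(codebook, residual)
        indices.append(i)
        residual = residual - codebook[i]
    indices.append(secret_index)
    return indices, residual


def decode(indices, stages):
    """Standard MSVQ decoder: sum the selected codewords of all stages."""
    return sum(codebook[i] for codebook, i in zip(stages, indices))


def extract(indices):
    """The hidden data is simply the last-stage index."""
    return indices[-1]


# Toy random codebooks: 3 stages of shrinking scale, 10-dimensional vectors.
rng = np.random.default_rng(1)
stages = [
    rng.standard_normal((64, 10)),
    0.3 * rng.standard_normal((32, 10)),
    0.1 * rng.standard_normal((16, 10)),
]
x = rng.standard_normal(10)
idx, res = embed(x, stages, secret_index=5)
print(extract(idx), np.linalg.norm(x - decode(idx, stages)))
```

By construction, the reconstruction error equals the residual left after the earlier stages. Simple substitution, by contrast, overwrites a last-stage index that was chosen for x itself, and the resulting error is not compensated by any other stage.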