In this paper, a novel voicing-driven adaptive packet loss recovery algorithm is proposed to reduce voice degradation and error propagation in analysis-by-synthesis speech coders used in Internet applications. After classifying the voicing of the speech, the algorithm adaptively applies random noise generation, multiresolution excitation generation, or a pulse-tracking procedure to recover lost packets. When the algorithm is applied to the G.723.1 coder, subjective evaluations of the simulation results show that it outperforms the recovery algorithm embedded in the G.723.1 standard.
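As an illustration of the voicing-driven idea only, the following Python sketch classifies the last correctly received samples as voiced or unvoiced from their normalized autocorrelation, then either repeats the last pitch period or generates energy-matched random noise. It is a simplified two-way stand-in, not the paper's algorithm: pitch-period repetition replaces the pulse-tracking procedure, the multiresolution excitation branch is omitted, it works on decoded samples rather than on the coder's excitation parameters, and the lag range and threshold (`MIN_LAG`, `MAX_LAG`, `VOICING_THRESHOLD`) are hypothetical values, not taken from G.723.1.

```python
import math
import random

# Hypothetical parameters (not from the paper or G.723.1): 8 kHz sampling,
# pitch lags searched over 20..147 samples, voicing threshold 0.6.
MIN_LAG, MAX_LAG, VOICING_THRESHOLD = 20, 147, 0.6


def best_pitch(history):
    """Return (lag, normalized autocorrelation) of the strongest periodicity."""
    best_lag, best_r = MIN_LAG, 0.0
    for lag in range(MIN_LAG, min(MAX_LAG, len(history) - 1) + 1):
        a, b = history[lag:], history[:-lag]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        r = num / den if den > 0 else 0.0
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


def conceal(history, frame_len, seed=0):
    """Fill one lost frame from the last correctly decoded samples in `history`."""
    lag, r = best_pitch(history)
    if r >= VOICING_THRESHOLD:
        # Voiced: repeat the last pitch period (simple stand-in for pulse tracking).
        period = history[-lag:]
        return [period[i % lag] for i in range(frame_len)]
    # Unvoiced: random noise scaled to the RMS energy of the last frame.
    last = history[-frame_len:]
    target_rms = math.sqrt(sum(x * x for x in last) / len(last))
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(frame_len)]
    noise_rms = math.sqrt(sum(x * x for x in noise) / frame_len)
    return [x * target_rms / noise_rms for x in noise]
```

Repeating whole pitch periods keeps a voiced waveform phase-continuous across the gap, whereas noise substitution only preserves energy, which is why a voicing decision comes before the choice of recovery strategy.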
The function of a speech coding algorithm is to convert an analogue speech signal into digital form for efficient transmission over a digital path or efficient storage on a digital storage medium, and to perform the complementary function of converting a received digital signal back to analogue form. The article reviews the speech coding techniques already in extensive use in telecommunications applications. In addition to explaining the basic principles these algorithms use to achieve efficient digital encoding, it presents examples of telecommunications services that use them.
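To make the analogue-to-digital conversion concrete, here is a minimal Python sketch of mu-law companding followed by 8-bit quantization, the idea behind ITU-T G.711, one of the most widely deployed telecommunications speech codecs. It assumes input samples normalized to [-1, 1] and uses the continuous mu-law curve with mu = 255. G.711 itself specifies a segmented, piecewise-linear approximation and a different code-word layout, so the codes below are not bit-compatible with G.711.

```python
import math

MU = 255  # mu-law constant used in North America and Japan


def compress(x, mu=MU):
    """Continuous mu-law compressor for x in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def expand(y, mu=MU):
    """Inverse of compress()."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def encode(x):
    """Quantize the compressed value to a signed 8-bit code in -127..127."""
    return round(compress(x) * 127)


def decode(code):
    """Map a code back to an amplitude in [-1, 1]."""
    return expand(code / 127)
```

Companding spends more quantization levels on small amplitudes, so the relative error stays roughly constant (a few percent here) across a wide dynamic range.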
The Internet has revolutionized telecommunication systems by supporting new applications and services. Voice over Internet Protocol (VoIP) is one of the most prominent telecommunication services based on the Internet Protocol (IP). The signal quality of a VoIP system depends on several factors, such as network conditions, coding processes, speech content, and error correction schemes. This paper reviews these issues as they relate to providing toll-quality communication service to VoIP users. It also summarizes the evolution of packet-based communication, from the earliest transfer of voice data over packet-switched networks to modern VoIP, together with the advances made to improve VoIP service. (C) 2013 Elsevier Ltd. All rights reserved.
In digital communication networks, speech recognition systems conventionally first reconstruct speech and then extract feature parameters. In this paper, we consider an approach that incorporates speech coding parameters directly into the speech recognizer. Most speech coders employed in digital communication networks use line spectrum pairs (LSPs) as spectral parameters. We introduce two ways to improve the recognition performance of an LSP-based speech recognizer: one is to devise weighted distance measures for LSPs, and the other is to transform LSPs into a new feature set, named pseudo-cepstrum (PCEP). Speaker-independent connected-digit recognition experiments based on a discrete hidden Markov model showed that the weighted distance measures provide better recognition accuracy than unweighted ones. A mel-scale PCEP performs better still than the weighted distance measures. A significance test is introduced to establish whether the improvements of the proposed methods are statistically significant. Overall, the proposed methods achieved higher recognition accuracy than conventional methods employing mel-frequency cepstral coefficients (MFCCs). (C) 2000 Elsevier Science B.V. All rights reserved.
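The abstract does not give the paper's weighting, so the Python sketch below uses a well-known alternative, the inverse-harmonic-mean (IHM) weighting from LSP vector quantization, to show what a weighted LSP distance looks like. LSPs are represented as line spectral frequencies in radians, sorted in (0, pi); the function names are hypothetical.

```python
import math


def ihm_weights(lsf):
    """Inverse-harmonic-mean weights for sorted LSFs in (0, pi).

    Closely spaced LSFs (typically near formant peaks) receive larger weights.
    """
    ext = [0.0] + list(lsf) + [math.pi]
    return [
        1.0 / (ext[i] - ext[i - 1]) + 1.0 / (ext[i + 1] - ext[i])
        for i in range(1, len(ext) - 1)
    ]


def weighted_lsp_distance(ref, test):
    """Weighted squared Euclidean distance, weights computed from the reference."""
    w = ihm_weights(ref)
    return sum(wi * (a - b) ** 2 for wi, a, b in zip(w, ref, test))
```

Because closely spaced LSFs usually mark formant peaks, these weights make the distance more sensitive to errors in the spectral regions that matter most for recognition.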
A new optimisation criterion is presented for computing the excitation sequences typically used in analysis-by-synthesis linear prediction coders. The criterion relies not only on the error over the processed frame but also on the effects the excitation induces in future frames. This technique effectively removes ticks that would otherwise be audible.
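The following Python sketch illustrates the kind of criterion described: a codebook search in which each candidate excitation is filtered through the synthesis filter 1/A(z) and scored on the current frame plus a weighted error over the following samples, where only the filter's ringing is present. It assumes the encoder has a few lookahead samples of the target and uses a hypothetical weight `lam`; the paper's exact criterion is not given in the abstract.

```python
def synthesize(excitation, a, length):
    """Zero-state all-pole filter 1/A(z), with A(z) = 1 + sum_k a[k] z^-(k+1).

    The excitation is zero-padded to `length`, so samples past its end are the
    filter's ringing into the following frame.
    """
    y = []
    for n in range(length):
        e = excitation[n] if n < len(excitation) else 0.0
        y.append(e - sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0))
    return y


def search(codebook, target, a, frame_len, lam):
    """Return (index, gain, error) minimizing the two-part weighted criterion.

    target[:frame_len] is the current frame; target[frame_len:] is assumed lookahead.
    lam weights the error over future samples (lam = 0 gives the usual criterion).
    """
    w = [1.0] * frame_len + [lam] * (len(target) - frame_len)
    best = None
    for idx, codeword in enumerate(codebook):
        y = synthesize(codeword, a, len(target))
        den = sum(wi * yi * yi for wi, yi in zip(w, y))
        # Optimal gain for the weighted least-squares error.
        num = sum(wi * ti * yi for wi, ti, yi in zip(w, target, y))
        gain = num / den if den > 0 else 0.0
        err = sum(wi * (ti - gain * yi) ** 2 for wi, ti, yi in zip(w, target, y))
        if best is None or err < best[2]:
            best = (idx, gain, err)
    return best
```

Raising `lam` penalizes candidates whose ringing spills badly into the next frame, which is the mechanism by which a future-aware criterion can suppress audible ticks at frame boundaries.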
ISBN (print): 1424407281
In this paper, the latest wideband vocoder standard adopted by 3GPP2, the cdma2000 standardization body, is described. Named Enhanced Variable Rate Codec - Wideband (EVRC-WB), the proposed codec encodes wideband speech (16 kHz sampling frequency) at a maximum bit rate of 8.55 kbit/s. EVRC-WB is based on a split-band coding paradigm in which different coding models are used for the signal components in the low-frequency (LF, 0-4 kHz) and high-frequency (HF, 3.5-7 kHz) bands. The LF coding model is based on the EVRC-B narrowband (0-4 kHz) codec, modified to encode the LF band signal at a maximum bit rate of 7.75 kbit/s. The HF coding model is an LPC-based scheme in which the excitation is derived from the coded LF band excitation by non-linear processing. Mean opinion scores from 3GPP2 characterization tests demonstrate that EVRC-WB (8.55 kbit/s max.) performs statistically significantly better than Adaptive Multi-Rate Wideband (AMR-WB, 12.65 kbit/s max.).
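The bit budget implied by the quoted rates can be checked with a few lines of Python. At the maximum rates above, the HF band receives 8.55 - 7.75 = 0.8 kbit/s. To convert rates into bits per frame, the sketch assumes a 20 ms frame, the frame length used across the EVRC family; `bits_per_frame` is a hypothetical helper.

```python
FRAME_MS = 20  # assumed frame duration, as in the other EVRC-family codecs


def bits_per_frame(rate_bps, frame_ms=FRAME_MS):
    """Bits available in one frame at a given bit rate (exact integer arithmetic)."""
    return rate_bps * frame_ms // 1000


total_bits = bits_per_frame(8550)  # whole codec at its maximum rate
lf_bits = bits_per_frame(7750)     # LF (0-4 kHz) band at its maximum rate
hf_bits = total_bits - lf_bits     # remainder for the HF (3.5-7 kHz) band
print(total_bits, lf_bits, hf_bits)  # 171 155 16
```

Only about 16 bits per frame remain for the HF band, which is why its excitation is derived from the already-coded LF excitation rather than transmitted separately.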
ISBN (print): 9781538605691
Data transmitted over Voice over Internet Protocol (VoIP) must be secure and robust so that it cannot easily be attacked by intruders. The main objective of the proposed system is to hide secret information in the silent parts of the speech signal for secure communication. The Voice Activity Detection (VAD) algorithm of the ITU-T G.729B speech coder is used to detect the silent parts of the speech signal, followed by steganographic embedding and extraction of the secret information. To evaluate data hiding capacity and speech quality, parameters such as Perceptual Evaluation of Speech Quality (PESQ), absolute error (ABS), root mean square error (RMSE), mean square error (MSE), and Mean Opinion Score (MOS) are explored. The robustness of the proposed hiding scheme is tested by applying a compression attack and by resampling the speech signal.
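A minimal Python sketch of silence-based hiding, assuming 16-bit integer samples at 8 kHz. G.729B's VAD combines several features; here it is replaced by a single hypothetical energy threshold, and bits are hidden in the least significant bit (LSB) of every sample in a silent frame. The names `is_silent`, `hiding_positions`, `embed`, and `extract` are illustrative, not from the paper.

```python
import math

FRAME = 80  # 10 ms at 8 kHz, the G.729 frame length


def is_silent(frame, threshold):
    # Clear every LSB before measuring energy so embedding cannot flip the decision.
    energy = sum((s & ~1) ** 2 for s in frame) / len(frame)
    return energy < threshold


def hiding_positions(samples, threshold):
    """Sample indices inside frames classified as silence."""
    positions = []
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        if is_silent(samples[start:start + FRAME], threshold):
            positions.extend(range(start, start + FRAME))
    return positions


def embed(samples, bits, threshold):
    positions = hiding_positions(samples, threshold)
    if len(bits) > len(positions):
        raise ValueError("message exceeds hiding capacity")
    stego = list(samples)
    for pos, bit in zip(positions, bits):
        stego[pos] = (stego[pos] & ~1) | bit  # overwrite the LSB with a message bit
    return stego


def extract(stego, n_bits, threshold):
    return [stego[pos] & 1 for pos in hiding_positions(stego, threshold)[:n_bits]]


def rmse(ref, test):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ref, test)) / len(ref))
```

Measuring frame energy with the LSBs cleared makes the silence decision identical before and after embedding, so the receiver can recompute the same hiding positions without side information. Plain LSB hiding like this is unlikely to survive a lossy compression attack; it only illustrates the embed/extract mechanics and the RMSE distortion measure.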
ISBN (print): 9781479949816
Nowadays the number of mobile subscribers is increasing all over the world, so communication systems must be improved. The Mixed Excitation Linear Prediction (MELP) algorithm was developed to reduce signal bandwidth so that more data can be transmitted on a single channel, which increases channel capacity. MELP is a speech coding method built on a speech encoder and a speech decoder. The MELP encoder removes redundancy from the signal and compresses it into the MELP code. The speech decoder includes a Linear Predictive Coding (LPC) filter that outputs synthesized speech in response to voiced and unvoiced excitation. MELP also reduces jitter in the voice. It lowers the bit rate while also reducing codebook storage and computational complexity. This paper shows that the bit rate of the MELP coder can be reduced to as low as 2.4 kbit/s without apparent degradation of the synthetic speech quality.
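The decoder side can be sketched in Python as an excitation generator feeding an all-pole LPC synthesis filter. This is a simplification: real MELP mixes pulse and noise excitation separately in several frequency bands with per-band voicing strengths, whereas the hypothetical `mixed_excitation` below uses a single full-band voicing weight.

```python
import random


def mixed_excitation(length, pitch_period, voicing, seed=0):
    """Full-band mix of a periodic pulse train and white Gaussian noise.

    voicing = 1.0 gives a pure pulse train (voiced), 0.0 pure noise (unvoiced).
    """
    rng = random.Random(seed)
    pulses = [1.0 if n % pitch_period == 0 else 0.0 for n in range(length)]
    return [voicing * p + (1.0 - voicing) * rng.gauss(0.0, 1.0) for p in pulses]


def lpc_synthesis(excitation, a):
    """All-pole LPC synthesis 1/A(z), with A(z) = 1 + sum_k a[k] z^-(k+1)."""
    y = []
    for n, e in enumerate(excitation):
        y.append(e - sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n >= k + 1))
    return y
```

Intermediate voicing values blend periodic and noisy components, which is the basic idea MELP refines band by band to avoid the buzzy quality of a hard voiced/unvoiced switch.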
ISBN (print): 0780374029
Our work introduces a speech enhancement technique that can explicitly incorporate prior information about the time-frequency characteristics of a gender or speaker in its formalism. We approximate the multimodal linear spectral magnitude of clean speech with a mixture of Gaussian pdfs, fitted with the Expectation-Maximization (EM) algorithm. We then apply the Bayesian inference framework to the degraded spectral coefficients and, by employing Minimum Mean Square Error (MMSE) estimation, derive a closed-form solution for the spectral magnitude estimation task that adapts to the spectral characteristics and noise variance of each band. We suggest that 2-3 minutes of phonetically balanced, non-degraded, gender- or speaker-dependent speech is adequate to tune our algorithm. We demonstrate the benefit of an enhancement technique tailored to a specific gender or speaker and propose its use where message ambiguity is of critical importance. We evaluate our algorithm on Lynx helicopter noise and white Gaussian noise, both for improving speech quality and in combination with a speech coder, and demonstrate its robustness at very low SNRs. Implementation code is available at: http://***/potamitis/***.
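Under simplifying assumptions, the MMSE estimate with a Gaussian-mixture prior has a compact closed form, sketched below in Python for a single frequency band. The sketch assumes the clean spectral magnitude in the band follows a one-dimensional GMM whose weights, means, and variances are already trained (for example with EM), and that the noise is additive, zero-mean Gaussian in the same domain with known variance `noise_var`. The paper's derivation on spectral magnitudes may differ in detail, and `gmm_mmse` is a hypothetical name.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gmm_mmse(y, weights, means, variances, noise_var):
    """Posterior mean E[x | y] for y = x + n, x ~ GMM, n ~ N(0, noise_var)."""
    # Likelihood of the noisy observation under each component (variances add).
    lik = [w * gaussian_pdf(y, m, v + noise_var) for w, m, v in zip(weights, means, variances)]
    total = sum(lik)
    estimate = 0.0
    for l, m, v in zip(lik, means, variances):
        # Responsibility times a Wiener-style shrinkage of y towards the component mean.
        estimate += (l / total) * (m + v / (v + noise_var) * (y - m))
    return estimate
```

Each component pulls the noisy value towards its own mean in proportion to how well it explains the observation, so a gender- or speaker-specific GMM biases the estimate towards that speaker's typical spectral levels.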
Voice conversion (VC) consists of modifying a source speaker's voice so that it sounds like a target speaker's voice. In our approach, we modified only the encoder side of the Code-Excited Linear Prediction (CELP) coder, adding a pre-processing stage before it to perform the voice conversion. The CELP decoder was not modified, which preserves the transmission rate. Our conversion approach separates voiced and unvoiced frames and associates a different conversion function with each. Line Spectral Frequency (LSF) parameters represent the vocal tract, and Gaussian Mixture Models (GMMs) are used to compute the conversion functions. The pitch of voiced frames is transformed by a linear conversion. The model was tested on conversions between male and female voices.
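The abstract does not specify the linear pitch conversion. One common choice, sketched here in Python, matches the mean and standard deviation of the source speaker's F0 to the target speaker's. F0 values are assumed to be in Hz with unvoiced frames marked as 0, and `train_pitch_map` is a hypothetical name; the GMM-based LSF conversion is not shown.

```python
from statistics import mean, pstdev


def train_pitch_map(source_f0, target_f0):
    """Fit a mean/standard-deviation matching map from voiced-frame F0 values (Hz)."""
    mu_s, sd_s = mean(source_f0), pstdev(source_f0)
    mu_t, sd_t = mean(target_f0), pstdev(target_f0)

    def convert(f0):
        # Unvoiced frames are marked with f0 == 0 and pass through unchanged.
        if f0 == 0:
            return 0.0
        return mu_t + (sd_t / sd_s) * (f0 - mu_s)

    return convert
```

Applying the same map in the log-F0 domain is a frequently used alternative.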