ISBN (print): 9788388309472
This paper presents an analysis of three sets of feature vectors used in speaker identification (ID) systems for speech signals obtained through encoding and decoding with the AMR, SPEEX and MELP coders. We analyzed the feature sets at various speech coding bit rates using an SVM-based speaker ID system. The results were compared with the identification accuracy obtained with vectors in which the fundamental frequency was included as an additional feature. The experiments show that, in most cases, this feature improves identification accuracy more for coded speech than for uncoded speech.
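The abstract does not say how the fundamental frequency was extracted. The sketch below assumes a plain autocorrelation pitch estimator with a 60-400 Hz search range, which is our choice and not the authors'. F0 is estimated for each frame and appended to an existing feature vector before the vector is passed to the SVM. `append_f0` and the three-element feature vector are hypothetical.

```python
import math

def estimate_f0(frame, fs, f_min=60.0, f_max=400.0):
    """Return the F0 (Hz) whose lag maximizes the frame's autocorrelation (assumed estimator)."""
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(frame) - 1)
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag.
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag

def append_f0(features, frame, fs):
    """Hypothetical helper: extend a spectral feature vector with the frame's F0."""
    return list(features) + [estimate_f0(frame, fs)]

fs = 8000
frame = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]  # 50 ms test tone at 200 Hz
vec = append_f0([0.1, -0.3, 0.7], frame, fs)
print(vec[-1])  # 200.0
```

Production pitch trackers add voicing decisions and octave-error checks. They matter here because coded speech (especially MELP) can distort the periodicity this estimator relies on.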
ISBN (print): 9781479903566
In this paper, we propose an effective speech coder based on a sparse representation of speech that exploits the strong dependencies between adjacent pitch cycles. The coder uses pitch-synchronous processing, consisting of pitch warping and a two-stage transformation, to obtain a compact representation of voiced speech. Power spectral density preserving quantization (PSD-PQ) is used to quantize the transform coefficients. The result is a coder that is efficient over a wide range of bit rates: it approaches perfect reconstruction as the rate increases, and it yields a parametric signal representation at low rates. Both objective PESQ results and subjective A/B listening tests show that the proposed coder outperforms the ITU-T G.722.1 codec.
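The abstract does not describe the pitch warping step in detail. The sketch below shows one plausible form, assuming each pitch cycle is linearly resampled to a common length; the function names and the 64-sample target length are illustrative. After resampling, adjacent cycles of a slowly varying periodic signal become nearly sample-for-sample identical. That is the inter-cycle redundancy a subsequent transform can exploit. The two-stage transformation itself is not reproduced.

```python
import math

def warp_cycle(cycle, length):
    """Linearly resample one pitch cycle to `length` samples (endpoints preserved)."""
    n = len(cycle)
    if n == 1 or length == 1:
        return [float(cycle[0])] * length
    out = []
    for i in range(length):
        t = i * (n - 1) / (length - 1)  # fractional position in the original cycle
        k = min(int(t), n - 2)          # index of the left neighbour
        frac = t - k
        out.append((1 - frac) * cycle[k] + frac * cycle[k + 1])
    return out

def sine_cycle(period):
    """One period of a unit sinusoid sampled at `period` points (toy voiced cycle)."""
    return [math.sin(2 * math.pi * k / period) for k in range(period)]

a = warp_cycle(sine_cycle(40), 64)  # cycle with a 40-sample pitch period
b = warp_cycle(sine_cycle(42), 64)  # next cycle, slightly longer period
print(max(abs(x - y) for x, y in zip(a, b)))  # small inter-cycle residual
```

In this example, the two warped cycles differ by less than 0.05 at every sample, even though their original lengths differ.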
We have experimentally studied the operational rate-distortion performance of very low bit-rate speech coding using linear inter-frame dependencies. We propose an algorithm that efficiently combines quantization and linear interpolation. With a maximum delay of 200 ms for the spectral envelope information, and using line spectrum pair (LSP) parameters as the input space, the proposed algorithm performs best at rates between 200 and 300 b/s. For comparison, several other procedures are simulated, such as the multi-frame encoder (Kemp D., Collura J., Tremain T., Multi-Frame coding of LPC Parameters at 600-800 bps. In: IEEE ICASSP-91, 1991, pp. 609-612) and the matrix quantizer (Tsao C., Gray R., Matrix quantizer design for LPC speech using the generalized Lloyd algorithm. IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-33, 1985, pp. 537-545). Furthermore, a one-dimensional version of the proposed procedure is shown experimentally to provide the best operational rate-distortion trade-off when coding a parametric representation (pitch, gain and voicing information) of the excitation signal. (C) 2001 Elsevier Science B.V. All rights reserved.
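The abstract does not detail how quantization and interpolation are combined. The sketch below shows the basic mechanism under two simplifying assumptions: a fixed spacing between transmitted frames, and a uniform scalar quantizer in place of the authors' LSP quantizer. Only every `hop`-th parameter vector is quantized and sent, and the decoder rebuilds the frames in between by linear interpolation.

The decoder must wait for the next transmitted frame, so the delay grows with `hop`. With 20 ms frames (also an assumption), `hop=10` would correspond to a 200 ms delay. All names here are illustrative.

```python
def quantize(value, step):
    """Uniform scalar quantizer: a stand-in for the authors' LSP quantizer (assumption)."""
    return round(value / step) * step

def encode_decode(frames, hop, step):
    """Send every `hop`-th frame (and the last) quantized; linearly interpolate the rest.
    `frames` is a non-empty list of equal-length parameter vectors (e.g. LSPs)."""
    n = len(frames)
    anchors = list(range(0, n, hop))
    if anchors[-1] != n - 1:
        anchors.append(n - 1)  # always transmit the final frame
    sent = {a: [quantize(x, step) for x in frames[a]] for a in anchors}
    out = []
    for left, right in zip(anchors, anchors[1:]):
        for i in range(left, right):
            w = (i - left) / (right - left)  # interpolation weight in [0, 1)
            out.append([(1 - w) * p + w * q for p, q in zip(sent[left], sent[right])])
    out.append(sent[anchors[-1]])
    return out

frames = [[0.1 * i, 0.2 * i] for i in range(9)]  # toy, linearly varying parameters
rec = encode_decode(frames, hop=4, step=0.001)
print(max(abs(a - b) for f, g in zip(frames, rec) for a, b in zip(f, g)))
```

For a linearly varying trajectory, the only error is the quantization error at the transmitted frames, so the maximum error stays within half a quantizer step. Real LSP tracks are not linear, which is why the choice of transmitted frames matters in practice.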
The addition of noise to speech signals coded by an analogue multichannel cochlear implant has previously been shown in modelling studies to enhance the representation of speech cues by the fine time structure of evoked nerve discharges. The enhancement, however, occurred only for a range of noise levels, and this range was stimulus dependent. Theoretically, fine optimization of the noise levels would be unnecessary if each implant channel stimulated a group of cochlear nerve fibres such that each fibre in the group received an independent noise waveform in addition to the same information-bearing signal. We present results from computer simulations suggesting that current spread in the cochlea may be exploited to obtain a high degree of independence between the noise waveforms that stimulate adjacent fibres. The model simulated monopolar stimulation of a cochlear nerve by 11, 21 or 41 electrodes in the scala tympani. The correlation between the effective stimuli for pairs of nerve fibres, and the correlation between the corresponding evoked discharges, were calculated for two noise strategies. In one strategy, an independent noise current was applied to each electrode. The alternative strategy, which used inhibition between the noise sources, yielded less correlation between effective stimuli. (C) 2000 Elsevier Science Ltd. All rights reserved.
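The sketch below makes the correlation measure concrete using a simple linear model that is not taken from the paper. Each electrode carries an independent, unit-variance noise current, and current spread reaches a fibre with a weight that decays exponentially with distance (space constant `space_const`). The noise at a fibre is then a weighted sum of the electrode noises. The correlation between two fibres therefore follows directly from their weight vectors: Σ w_a·w_b / sqrt(Σ w_a² · Σ w_b²). The 1 mm electrode spacing, the space constant, and the function names are illustrative.

```python
import math

def spread_weights(fibre_pos, electrode_pos, space_const):
    """Exponential current-spread weights from each electrode to one fibre (assumed model)."""
    return [math.exp(-abs(fibre_pos - e) / space_const) for e in electrode_pos]

def stimulus_correlation(pos_a, pos_b, electrode_pos, space_const):
    """Correlation of the noise reaching two fibres when every electrode
    carries an independent, unit-variance noise current."""
    wa = spread_weights(pos_a, electrode_pos, space_const)
    wb = spread_weights(pos_b, electrode_pos, space_const)
    cov = sum(x * y for x, y in zip(wa, wb))  # cross terms vanish for independent sources
    return cov / math.sqrt(sum(x * x for x in wa) * sum(y * y for y in wb))

electrodes = [float(k) for k in range(11)]  # 11 electrodes, 1 mm apart (assumed)
for d in (0.0, 0.25, 1.0, 2.0):
    print(d, round(stimulus_correlation(5.0, 5.0 + d, electrodes, 1.0), 3))
```

In this toy model, correlation falls as the two fibres move apart. The inhibition-based noise strategy is not modelled.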
An excitation signal for a synthesis filter plays an important role in producing high-quality speech at a low bit rate. This paper presents a new, efficient excitation model, Adaptive Density Pulse (ADP), for low bit-rate speech coding. ADP is a pulse train whose density (spacing interval) is constant within a subframe but can vary from subframe to subframe. First, the ADP excitation signal is defined, and a procedure for finding the optimal ADP excitation is presented. Results on the effects of the ADP parameters on synthesized speech quality are then discussed. ADP excitation is introduced into the CELP (Code Excited Linear Prediction) coding method to improve speech quality at bit rates around 4 kbps, and a CELP coder with ADP (ADP-CELP) is described. ADP excitation allows the CELP coder to follow transient portions of speech signals. It also reduces the computational complexity of selecting the best excitation from a codebook, which has been the primary drawback of CELP. By exploiting the sparseness of ADP excitation, the number of multiplications can be reduced to the order of 1/D², where D is the pulse interval. The authors evaluated the speech quality of a 4 kbps ADP-CELP coder by computer simulation; ADP excitation improved the segmental SNR of conventional CELP.
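The ADP structure described above can be sketched as follows. The code builds one subframe of ADP excitation, with pulses every `interval` (D) samples starting at `phase`. It then filters the excitation through a truncated impulse response, skipping the zero samples. The function names and the toy impulse response are assumptions. The sketch shows only the roughly 1/D saving in the filtering step; the further reduction reported for the codebook search is not reproduced.

```python
def adp_excitation(length, interval, phase, amplitudes):
    """One subframe of ADP excitation: pulses every `interval` (D) samples from `phase`."""
    exc = [0.0] * length
    for k, pos in enumerate(range(phase, length, interval)):
        exc[pos] = amplitudes[k]
    return exc

def sparse_filter(exc, h):
    """Convolve exc with impulse response h (output truncated to len(exc)),
    visiting only nonzero pulses; also returns the multiplication count."""
    n = len(exc)
    out = [0.0] * n
    mults = 0
    for pos, a in enumerate(exc):
        if a == 0.0:
            continue  # zero samples contribute nothing, so skip them
        for j in range(min(len(h), n - pos)):
            out[pos + j] += a * h[j]
            mults += 1
    return out, mults

def dense_filter(exc, h):
    """Reference truncated convolution that multiplies every sample, zero or not."""
    n = len(exc)
    return [sum(exc[i - j] * h[j] for j in range(min(len(h), i + 1))) for i in range(n)]

h = [0.9 ** j for j in range(40)]                # toy decaying impulse response
exc = adp_excitation(40, 4, 1, [1.0, -1.0] * 5)  # D = 4, 10 pulses in 40 samples
y, mults = sparse_filter(exc, h)
print(mults)  # 210, versus 820 for the dense loop
```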
This article reviews the state of the art in transport adaptation techniques for mobile networks. It discusses rate adaptation mechanisms that combat the speech quality degradation caused by radio links. It begins with a review of dynamic schemes for adapting speech encoders in cellular networks, where two distinct approaches to rate adaptation are observed: network-controlled and source-controlled. The issues associated with adaptive voice over IP (VoIP) mechanisms are considered next. Here, the encoder detects some form of network congestion and decides how to behave for the good of the network. It is noted that this altruistic behavior will benefit only coordinated IP networks, such as private intranets, and that its application to the public Internet is improbable.
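A source-controlled adaptation loop of the kind described for VoIP can be sketched as follows. The mode table lists the eight AMR-NB bit rates. The loss-rate thresholds and the one-step up/down rule are hypothetical choices for illustration, not a mechanism taken from the article or from a standard.

```python
AMR_NB_RATES_KBPS = [4.75, 5.15, 5.9, 6.7, 7.4, 7.95, 10.2, 12.2]

def adapt_mode(mode, loss_rate, high=0.05, low=0.01):
    """Step the codec mode down on congestion and up when the path looks clear.
    `high` and `low` are hypothetical loss-rate thresholds (hysteresis band)."""
    if loss_rate > high and mode > 0:
        return mode - 1
    if loss_rate < low and mode < len(AMR_NB_RATES_KBPS) - 1:
        return mode + 1
    return mode

mode = len(AMR_NB_RATES_KBPS) - 1  # start at 12.2 kbit/s
for loss in [0.00, 0.08, 0.10, 0.03, 0.00]:  # made-up per-interval loss reports
    mode = adapt_mode(mode, loss)
    print(loss, AMR_NB_RATES_KBPS[mode])
```

Between the two thresholds, the encoder holds its current rate, which avoids oscillating on small fluctuations in the loss reports.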
Alternative methods for digitally transcoding speech for radio transmission in an indoor environment have been investigated and compared with the CCITT standard, adaptive differential pulse code modulation (ADPCM) [1]. These alternative coders are designed to minimize the effects of transmission errors on the quality of the transcoded speech. The coders compared are CCITT standard G.721 ADPCM, adaptive sub-band coding, and two other non-standard versions of ADPCM. In general, when packets of data are lost, the adaptive sub-band coder performs extremely well in maintaining speech quality, because the sub-band synthesis filters fill in the gaps in the speech. However, the sub-band coder has the highest complexity and delay. The other ADPCM systems offer lower complexity and delay at the expense of lower speech quality.
In this article, we have reviewed some of the existing subjective and objective measures used in speech coding. The mean opinion score and the diagnostic acceptability measure are two widely used subjective measures. The most popular class of time-domain measures is the signal-to-noise ratio (SNR) and its variants, such as the segmental SNR and the granular segmental SNR. Among the spectral distortion measures, the log likelihood ratio measure, the log area ratio measure, the log spectral distortion measure, the cepstral distance and the Itakura-Saito distortion measure are well known. Some of the more recently proposed objective measures emphasize perceptually significant aspects. Three such classes of psychoacoustically motivated measures are the information index, the Bark spectral distortion measure and the neural distance measure (e.g., the cochlear discrimination information and the cochlear hidden Markovian measures). The success of these measures shows the merit of accounting for important perceptual events.
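As a worked example of one of the time-domain measures, the sketch below computes a segmental SNR: the SNR of each frame in dB, averaged over frames. The following are common conventions assumed here, not values taken from the article:

- a 160-sample frame (20 ms at 8 kHz);
- clamping each frame to the range [-10, 35] dB;
- skipping silent frames.

```python
import math

def segmental_snr(clean, coded, frame_len=160, floor_db=-10.0, ceil_db=35.0):
    """Mean per-frame SNR in dB, with each frame's value clamped to [floor_db, ceil_db]."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = [a - b for a, b in zip(s, coded[start:start + frame_len])]  # coding error
        sig = sum(x * x for x in s)
        noise = sum(x * x for x in e)
        if sig == 0.0:
            continue  # skip silent frames (one common convention)
        snr = ceil_db if noise == 0.0 else 10.0 * math.log10(sig / noise)
        values.append(min(max(snr, floor_db), ceil_db))
    return sum(values) / len(values) if values else float("nan")

clean = [1.0] * 320
print(segmental_snr(clean, [0.9] * 320))  # about 20 dB: error energy is 1% of signal energy
```

Averaging in the dB domain, rather than computing one global SNR, keeps loud frames from masking poorly coded quiet frames.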
This article presents new speech coding methods for real-time applications (telephone, videophone) and off-line applications (storage). Speech quality is in the classical telephone range, with a 4 kHz bandwidth and 8 kHz sampling. An elementary approach leads to a 16 kbit/s codec and a 24 kbit/s codec that use integer codebooks and fast computations. The speech quality of the two codecs was measured against more complex codecs under realistic conditions, with noisy telecommunication channels. The elementary approach is supplemented by a synthetic model with a systematic generalization of the algorithms (e.g., a generalized VSELP). Some channel protection methods already known to speech coding researchers are summarized in the Appendix. A change of representation for low-density codes (less than 1 bit/sample) is proposed.
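The bit rates can be related to the "bits per sample" scale used for low-density codes by dividing by the stated 8 kHz sampling rate. A minimal sketch:

```python
FS_HZ = 8000  # sampling rate stated in the abstract

def bits_per_sample(bit_rate_bps, fs_hz=FS_HZ):
    """Average number of coded bits spent on each input sample."""
    return bit_rate_bps / fs_hz

for rate in (16000, 24000):
    print(f"{rate // 1000} kbit/s -> {bits_per_sample(rate)} bits/sample")
```

The two codecs therefore spend 2 and 3 bits per sample. At this sampling rate, a low-density code (below 1 bit/sample) means a rate below 8 kbit/s.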
We describe a new signal processing technique for cochlear implants that uses a psychoacoustic masking model. The technique is based on the principle of so-called "NofM" strategies, which stimulate fewer channels (N) per cycle than there are active electrodes (M), i.e., N < M. In NofM strategies such as ACE or SPEAK, only the N channels with the highest amplitudes are stimulated. The new strategy is based on ACE but uses a psychoacoustic masking model to determine the essential components of a given audio signal. It was tested on device users in an acute study, with either 4 or 8 channels stimulated per cycle. With 4 channels, the mean improvement over ACE was 17%; with 8 channels, no significant difference was found between the two strategies.
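The baseline channel selection shared by NofM strategies such as ACE can be sketched as picking the N largest of M channel amplitudes in each stimulation cycle. This minimal sketch covers only that maxima selection. The psychoacoustic masking model of the new strategy is not reproduced, and the envelope values are made up.

```python
def select_n_of_m(amplitudes, n):
    """Indices of the n largest channel amplitudes (ACE/SPEAK-style maxima selection),
    returned in channel order."""
    ranked = sorted(range(len(amplitudes)), key=lambda i: amplitudes[i], reverse=True)
    return sorted(ranked[:n])

envelope = [0.10, 0.52, 0.31, 0.90, 0.05, 0.47, 0.66, 0.20]  # M = 8 channels (made-up values)
print(select_n_of_m(envelope, 4))  # [1, 3, 5, 6]
```

A masking-based variant would replace the pure amplitude ranking with a criterion that discounts channels predicted to be masked by their neighbours.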