In analysis-by-synthesis speech coders, the computational complexity of the search for an optimum innovation is still high, although transformations have been proposed to decrease it. This limits practical codebook sizes and vector dimensions (block lengths). In this contribution, two new structured frequency-domain codebooks are proposed. The first is a pulse-shaped codebook with a reversed search order for the gain and the shape; the second is a unity-magnitude codebook with structured phase. The corresponding algorithms, which are based on new insights, drastically reduce the search in the transformed domain. The computational complexity increases only in proportion to the bit rate and not to the codebook size.
A new speech and audio codec was recently submitted to ITU-T by a consortium of Huawei and ETRI as a candidate proposal for the super-wideband and stereo extensions of ITU-T Rec. G.729.1 and G.718. This hierarchical codec, with bit rates from 8 to 64 kbit/s, relies on subband splitting by means of a quadrature-mirror filter bank (QMF bank). For this purpose, an allpass-based QMF bank is used, whose design and implementation are presented in this contribution. This IIR filter bank achieves a significantly lower signal delay than the traditional FIR QMF-bank solution without compromising speech and audio quality.
In digital mobile communication systems, error concealment techniques are needed to reduce the subjective effects of residual bit errors that have not been eliminated by channel decoding. Because most standards do not specify these algorithms bit-exactly, there is room for new solutions that improve the speech quality. This article develops a new approach for optimum estimation of the speech codec parameters. It can be applied to any speech codec standard if bit reliability information is provided by the demodulator (e.g. DECT) or by the channel decoder (e.g. the soft-output Viterbi algorithm, SOVA, in GSM). The proposed method includes an inherent muting mechanism that leads to a graceful degradation of speech quality under adverse transmission conditions. In particular, the additional exploitation of the residual source redundancy, i.e. some a priori knowledge about the codec parameters, significantly enhances the output speech quality. In the case of an error-free channel, bit exactness as required by the standards can be preserved.
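The estimation idea described above can be illustrated with a minimal sketch. It is not the article's algorithm: it assumes a memoryless channel with a known error probability per bit (standing in for the bit reliability information), a scalar quantizer table, and a zeroth-order prior over the quantizer indices. The function name `mmse_parameter_estimate` and all numbers are hypothetical.

```python
def mmse_parameter_estimate(received_bits, bit_error_probs, codebook, priors):
    """Mean-square estimate of one codec parameter from hard bits plus reliabilities.

    received_bits   : hard-decided bits of the parameter index, MSB first
    bit_error_probs : assumed error probability of each received bit
    codebook        : quantizer reproduction value for every index
    priors          : assumed a priori probability of every index (all > 0)
    """
    num_bits = len(received_bits)
    posteriors = []
    for index, prior in enumerate(priors):
        likelihood = 1.0
        for pos, (rx, pe) in enumerate(zip(received_bits, bit_error_probs)):
            tx = (index >> (num_bits - 1 - pos)) & 1  # bit this index would have sent
            likelihood *= (1.0 - pe) if tx == rx else pe
        posteriors.append(likelihood * prior)
    total = sum(posteriors)
    # Weighted mean of the reproduction values under the a posteriori distribution.
    return sum(p * v for p, v in zip(posteriors, codebook)) / total


# Hypothetical 2-bit quantizer with a non-uniform, zero-mean prior.
codebook = [-1.5, -0.5, 0.5, 1.5]
priors = [0.1, 0.4, 0.4, 0.1]
received = [1, 1]  # hard decision points at index 3

clean = mmse_parameter_estimate(received, [0.0, 0.0], codebook, priors)
noisy = mmse_parameter_estimate(received, [0.1, 0.1], codebook, priors)
muted = mmse_parameter_estimate(received, [0.5, 0.5], codebook, priors)
```

With fully reliable bits the estimate equals the ordinary table lookup, which corresponds to the bit exactness mentioned above. With completely unreliable bits it falls back to the prior mean, a simple form of muting. In between, the estimate moves gradually from one to the other.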
One of the most widely used gradient-based adaptation algorithms is the so-called normalized least mean square (NLMS) algorithm. The rate of convergence, misadjustment, and noise insensitivity of NLMS-type algorithms depend on the proper choice of the step size parameter, which controls the weighting applied to each coefficient update. Different step size methods have been proposed to improve the convergence of NLMS-type filters while preserving the steady-state performance. The step size methods considered here use either a step size parameter that varies with time or a separate, tap-individual step size for each filter tap. The respective step size methods are derived from different optimization criteria. In this paper, a step size parameter is proposed that satisfies a combined optimization criterion, leading to a time-variant and tap-individual step size parameter. The realization aspects of the new concept are discussed using an acoustic echo control application as an example.
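For orientation, the baseline NLMS update with a fixed scalar step size can be sketched as follows. This is the textbook algorithm, not the time-variant, tap-individual step size proposed in the paper. The function name `nlms_identify`, the three-tap "echo path" `h`, and the step size of 0.5 are illustrative assumptions.

```python
import random


def nlms_identify(x, d, num_taps, step_size=0.5, eps=1e-8):
    """Adapt FIR coefficients w so that the filtered input x tracks d.

    Returns the final coefficients and the a priori error at each sample.
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent input samples, newest first
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))    # filter output
        e = dn - y                                    # a priori error
        norm = sum(b * b for b in buf) + eps          # input power, regularized
        g = step_size * e / norm                      # normalized update gain
        w = [wi + g * bi for wi, bi in zip(w, buf)]   # same step size for every tap
        errors.append(e)
    return w, errors


# Hypothetical noise-free system identification with white excitation.
random.seed(0)
h = [0.5, -0.3, 0.1]  # assumed unknown system (e.g. a very short echo path)
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
     for n in range(len(x))]

w, errors = nlms_identify(x, d, num_taps=len(h))
```

A time-variant step size would replace the constant `step_size` with a value recomputed at each sample, and a tap-individual one would replace it with a list holding one entry per coefficient.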
We propose a split-band encoding scheme for 16 kbit/s wideband speech coding (50-7000 Hz), using two unequal subbands from 0-6 kHz and from 6-7 kHz. This approach was motivated by an experimental evaluation of the signal bandwidth of speech frames. The higher subband is simply represented by white noise whose short-term energy is adjusted. For the lower subband, code-excited linear prediction (CELP) is used. The analysis filter bank, which performs the unequal band splitting combined with critical subsampling of the subbands, is described. A bit error concealment technique and the bit allocation are also presented. In informal listening tests, the speech quality was rated higher than that of the CCITT G.722 wideband codec operating at 48 kbit/s.
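The representation of the higher subband by energy-adjusted white noise can be illustrated with a small sketch. It assumes the decoder receives one short-term energy value per frame. The frame length of 80 samples, the energy value, and the name `noise_substitute` are hypothetical and not taken from the paper.

```python
import math
import random


def noise_substitute(frame_energy, frame_len, rng):
    """Return a white-noise frame scaled to the given short-term energy."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(frame_len)]
    energy = sum(v * v for v in noise)
    # Scale so the sum of squares of the frame matches the transmitted energy.
    gain = math.sqrt(frame_energy / energy) if energy > 0.0 else 0.0
    return [gain * v for v in noise]


# Hypothetical decoded energy value for one high-band frame.
hb = noise_substitute(frame_energy=2.0, frame_len=80, rng=random.Random(1))
hb_energy = sum(v * v for v in hb)
```

Only the energy has to be transmitted for this band, which is why the scheme can spend almost all of its bits on the CELP-coded lower subband.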
A beamformer for binaural speech enhancement systems in digital hearing aids is proposed. Its individual modules for the estimation of the time difference of arrival (TDOA) and for time alignment operate in the frequency domain and have a low computational complexity. The TDOA estimation is performed efficiently by a generalized cross-correlation with phase transform weighting. The estimation accuracy for filter banks with a limited number of subbands, which hearing aids need in order to meet tight delay constraints, is improved by a histogram-based TDOA estimation. The subsequent time alignment is accomplished by a simple multiplication with spectral phase factors. A primary application of the proposed system is binaural-cue-preserving speech enhancement based on spectral weighting. The proposed beamformer can be used as a delay-and-sum and/or delay-and-subtract beamformer to provide subband signals from which the power spectral densities of interfering sources can be estimated to drive the spectral weight calculation.
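The two frequency-domain steps named above, generalized cross-correlation with phase transform weighting and time alignment by spectral phase factors, can be sketched on a single block as follows. The sketch makes simplifying assumptions that are not from the paper: an integer, circular delay, a naive DFT in place of a filter bank, and no histogram-based refinement. All function names are hypothetical.

```python
import cmath
import random


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def gcc_phat_delay(left, right):
    """Estimate by how many samples `right` lags `left` (circular, integer)."""
    n = len(left)
    cross = [r * l.conjugate() for l, r in zip(dft(left), dft(right))]
    # Phase transform: discard the magnitude, keep only the phase of each bin.
    weighted = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    corr = [v.real for v in idft(weighted)]
    peak = max(range(n), key=lambda t: corr[t])
    return peak if peak <= n // 2 else peak - n


def align(signal, delay):
    """Advance `signal` by `delay` samples by multiplying with phase factors."""
    n = len(signal)
    spec = dft(signal)
    shifted = [s * cmath.exp(2j * cmath.pi * k * delay / n)
               for k, s in enumerate(spec)]
    return [v.real for v in idft(shifted)]


# Hypothetical test signal: the right channel is the left one delayed by 5 samples.
random.seed(3)
left = [random.gauss(0.0, 1.0) for _ in range(64)]
right = left[-5:] + left[:-5]

delay = gcc_phat_delay(left, right)
aligned = align(right, delay)
```

After alignment, adding the two channels gives a delay-and-sum output and subtracting them gives a delay-and-subtract output for the estimated direction.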
The contribution of this paper is twofold. First, we introduce a modification of the linear statistical signal model in acoustic echo control. In contrast to the traditional approach, the acoustic echo path is characterized as a random process with statistical mean and covariance, while the echo path input is modeled as a deterministic signal. Based on the modified signal model, we then derive the linear MMSE estimator for the near-end speech components in the microphone signal. The result can be seen as a generalized Wiener filter that consists of an acoustic echo canceler and a postfilter for residual echo suppression. The presented theory entails several fundamental advantages: a) the new signal model better matches the practical applications of acoustic echo control, b) it proves that echo canceler and postfilter can in principle coexist in hands-free communication systems, c) the generalized Wiener solution simplifies the realization of acoustic echo controllers, and d) we obtain better insight into the performance bounds of acoustic echo control.
In digital mobile communications, several measures are taken beyond speech coding to enhance the perceived quality in the presence of acoustic background noise and transmission errors. In premium mobile phones, advanced algorithms for noise suppression, error concealment, and, most recently, artificial bandwidth extension are now finding practical application. This contribution shows that these three different concepts of speech enhancement are actually based on the same principle of conditional estimation, which takes statistical a priori knowledge into account. Recent developments in these three areas are presented.