The recently standardized 3GPP codec for Immersive Voice and Audio Services (IVAS) includes a parametric mode for efficiently coding multiple audio objects at low bit rates. In this mode, parametric side information i...
In this paper, methods for improved parametric coding of transients are presented. We propose a signal model for coding of transients consisting of a sum of sinusoids, each amplitude-modulated by a different gamma envelope. These envelopes are characterized by an onset time, an attack parameter, and a decay parameter. An efficient method for estimating these parameters is presented. Further, methods are proposed that combine this transient model with a constant-amplitude sinusoidal model in order to achieve efficient coding of both stationary and transient signal parts. By rate-distortion optimization using a perceptual distortion measure, we combine variable-rate bit allocation and segmentation in an optimal way. Formal and informal listening tests show that, compared to a state-of-the-art sinusoidal coder, significant improvements can be achieved with the proposed model through the combination of optimal segmentation and amplitude-modulated sinusoidal audio coding.
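The signal model described in this abstract can be illustrated with a minimal sketch. The abstract does not give the envelope formula, so the form used here, (n - onset)^attack * exp(-decay * (n - onset)) for n >= onset and zero before, is an assumption based on the usual gamma-shaped envelope; the function names and the dictionary keys are hypothetical, and parameter estimation is not covered.

```python
import math

def gamma_envelope(n, onset, attack, decay):
    """Assumed gamma-shaped envelope: zero before the onset,
    then t**attack * exp(-decay * t) with t = n - onset."""
    t = n - onset
    if t < 0:
        return 0.0
    return (t ** attack) * math.exp(-decay * t)

def synth_transient(num_samples, components):
    """Sum of sinusoids, each amplitude-modulated by its own gamma envelope.
    Each component is a dict with hypothetical keys:
    amp, freq (radians per sample), phase, onset, attack, decay."""
    out = [0.0] * num_samples
    for c in components:
        for n in range(num_samples):
            env = gamma_envelope(n, c["onset"], c["attack"], c["decay"])
            out[n] += c["amp"] * env * math.cos(c["freq"] * n + c["phase"])
    return out

# Usage: one component starting at sample 10.
x = synth_transient(64, [{"amp": 1.0, "freq": 0.3, "phase": 0.0,
                          "onset": 10, "attack": 2.0, "decay": 0.25}])
```

Under this assumed form the envelope peaks at onset + attack / decay samples, so a large decay relative to the attack gives a sharp, early peak.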
In this letter, we present a decomposition for sinusoidal coding of audio, based on an amplitude modulation of sinusoids via a linear combination of arbitrary basis vectors. The proposed method, which incorporates a perceptual distortion measure, is based on a relaxation of a nonlinear least-squares minimization. Rate-distortion curves and listening tests show that, compared to a constant-amplitude sinusoidal coder, the proposed decomposition offers perceptually significant improvements in critical transient signals.
ISBN: 9781538646588 (print)
Traditional parametric coding of speech facilitates low rate but provides poor reconstruction quality because of the inadequacy of the model used. We describe how a WaveNet generative speech model can be used to generate high quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s. We compare this parametric coder with a waveform coder based on the same generative model and show that approximating the signal waveform incurs a large rate penalty. Our experiments confirm the high performance of the WaveNet based coder and show that the speech produced by the system is able to additionally perform implicit bandwidth extension and does not significantly impair recognition of the original speaker for the human listener, even when that speaker has not been used during the training of the generative model.
ISBN: 0780381149 (print)
In this paper, we discuss the basic speech coding techniques, namely waveform coding, parametric coding, and quantization schemes, and review the Enhanced Waveform Interpolative (EWI) coding technique in detail. The EWI coding technique for low bit rates, with several enhancements such as Analysis-by-Synthesis (AbS) optimization of the Slowly Evolving Waveform (SEW), Rapidly Evolving Waveform (REW) parametrization, and REW quantization, proves to be very efficient for mobile communications. An enhanced post-filtering technique and a novel pitch search technique for speech enhancement are also briefly described. Subjective test results indicate that the quality of the 2.8 kb/s EWI coder exceeds that of G.723.1 at 5.3 kb/s. Based on these results, we conclude that low-bit-rate speech coding, especially the EWI coder, has considerable potential in future 4G mobile systems, Internet telephony, LEO systems, etc.
ISBN: 9781424459490 (print)
A parametric stereo coder in the MDCT domain is introduced in this work. Psychoacoustic modeling, parameter estimation, and stereo synthesis are implemented in the MDCT domain. The encoder requires only MDCT-domain computations and hence has lower computational complexity. In addition, a low-complexity bit allocation algorithm is used for adaptive quantization of the MDCT coefficients. Perceptual evaluation shows that the performance of the proposed audio coder is comparable to that of the MPEG-2 AAC coder, at lower complexity.
ISBN: 9781479981311 (print)
Neural speech synthesis models have recently demonstrated the ability to synthesize high quality speech for text-to-speech and compression applications. These new models often require powerful GPUs to achieve real-time operation, so being able to reduce their complexity would open the way for many new applications. We propose LPCNet, a WaveRNN variant that combines linear prediction with recurrent neural networks to significantly improve the efficiency of speech synthesis. We demonstrate that LPCNet can achieve significantly higher quality than WaveRNN for the same network size and that high quality LPCNet speech synthesis is achievable with a complexity under 3 GFLOPS. This makes it easier to deploy neural synthesis applications on lower-power devices, such as embedded systems and mobile phones.
ISBN: 9781479999880 (print)
When coding audio signals at low bitrates with a transform coder, the most prominent artifacts are spectral holes resulting from spectral lines being quantized to zero. State-of-the-art codecs circumvent this by Noise Filling [1] and Bandwidth Extension (BWE) [2]. Both methods have in common that they do not code parts of the waveform itself but instead code a coarse description of the signal. At the decoder side, a synthetic signal is generated and adjusted according to the coded parameters. The presented system, called Intelligent Gap Filling (IGF), is a combination of both methods. Spectral holes are filled with random noise or with copied decoded signal components from lower frequency regions. In the latter case, a control mechanism is required to adjust the tonality of the copied signal components to reach good audio quality. This paper describes the way of controlling the tonality of IGF. The presented approach is of low complexity and allows for selective application without producing additional algorithmic delay. IGF is part of the 3GPP standard Enhanced Voice Services (EVS) as well as MPEG-H, standardized by the Moving Picture Experts Group (MPEG).
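The gap-filling idea in this abstract (zeroed spectral lines replaced by noise or by components copied from lower frequencies, then adjusted to a coded parameter) can be sketched roughly as follows. This is a toy illustration, not the IGF algorithm as standardized in EVS or MPEG-H: the function name, the single `target_energy` parameter standing in for the coded coarse description, and the fixed `shift` are all assumptions, and the tonality control that is the subject of the paper is omitted.

```python
import math
import random

def fill_spectral_holes(spec, start, shift, target_energy,
                        use_noise=False, seed=0):
    """Fill zero-valued bins at index >= start, either with uniform noise
    or with the decoded bin `shift` positions lower, then rescale the
    filled bins so their total energy equals target_energy."""
    if shift > start:
        raise ValueError("shift must not exceed start (source bin would be negative)")
    rng = random.Random(seed)
    out = list(spec)
    holes = [k for k in range(start, len(spec)) if spec[k] == 0.0]
    for k in holes:
        out[k] = rng.uniform(-1.0, 1.0) if use_noise else spec[k - shift]
    energy = sum(out[k] ** 2 for k in holes)
    if energy > 0.0:
        gain = math.sqrt(target_energy / energy)
        for k in holes:
            out[k] *= gain
    return out

# Usage: bins 4, 5 and 7 were quantized to zero and are refilled
# from bins 0, 1 and 3; bin 6 survived quantization and is kept.
decoded = [4.0, -2.0, 1.0, 0.5, 0.0, 0.0, 0.25, 0.0]
filled = fill_spectral_holes(decoded, start=4, shift=4, target_energy=0.81)
```

Only the holes are rescaled, so lines that survived quantization keep their decoded values.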
ISBN: 9781538646595 (print)
Traditional parametric coding of speech facilitates low rate but provides poor reconstruction quality because of the inadequacy of the model used. We describe how a WaveNet generative speech model can be used to generate high quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s. We compare this parametric coder with a waveform coder based on the same generative model and show that approximating the signal waveform incurs a large rate penalty. Our experiments confirm the high performance of the WaveNet based coder and show that the speech produced by the system is able to additionally perform implicit bandwidth extension and does not significantly impair recognition of the original speaker for the human listener, even when that speaker has not been used during the training of the generative model.
ISBN: 9781728106663 (digital)
ISBN: 9781728106670 (print)
Evolutionary algorithms are a sub-discipline of artificial intelligence used to solve various real-world problems. These algorithms are based on the Darwinian principle of evolution, hence the name evolutionary algorithm. The Differential Evolution (DE) algorithm is an evolutionary algorithm used for optimization, mostly of real-valued functions. It starts from random solutions and creates new solutions from the previous or existing ones. This population-based algorithm applies three operators, namely selection, crossover, and mutation. In this work, a new strategy has been developed to improve the performance of the basic DE algorithm. The resulting performance is also compared to that of other optimization algorithms, which shows that the modified DE performs better than the other existing algorithms.
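For orientation, the three operators named in this abstract can be sketched with the classic DE/rand/1/bin scheme. This is the basic algorithm only, not the modified strategy the paper proposes, which the abstract does not describe; the function name, the defaults (`F=0.5`, `CR=0.9`), and the bound clipping are assumptions chosen for illustration.

```python
import random

def differential_evolution(f, bounds, pop_size=20, F=0.5, CR=0.9,
                           generations=200, seed=0):
    """Minimize f over box bounds with basic DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][d] + F * (pop[b][d] - pop[c][d]) for d in range(dim)]
            mutant = [min(max(v, lo), hi) for v, (lo, hi) in zip(mutant, bounds)]
            # Binomial crossover: at least one coordinate comes from the mutant.
            j_rand = rng.randrange(dim)
            trial = [mutant[d] if (rng.random() < CR or d == j_rand) else pop[i][d]
                     for d in range(dim)]
            # Greedy selection: keep the trial only if it is no worse.
            trial_cost = f(trial)
            if trial_cost <= cost[i]:
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]

# Usage: minimize the 3-D sphere function, whose minimum is 0 at the origin.
def sphere(x):
    return sum(v * v for v in x)

best_x, best_cost = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
```

Because selection is greedy per individual, the best cost in the population never increases from one generation to the next.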