This paper proposes a perceptual audio watermarking scheme for MPEG scalable lossless audio that operates directly on the compressed data. The scheme is implemented in the compressed stream by modifying the quantized coefficients to satisfy the requirements of robustness, security, and imperceptibility. The watermarking procedure directly exploits the perceptual frequency masking of the human auditory system (HAS) to guarantee that the watermark is inaudible and robust. The watermark is constructed by selecting and quantizing the perceptually significant audio segments and coefficients. Extensive experimental results show that high robustness and transparency of the watermarked audio can be achieved simultaneously.
We consider the rate allocation problem for multiple-description quantization of a signal described by an adaptive model with a fixed structure. Source modeling in coding generally results in a two-stage description of the data, where one stage describes the model parameters and the other describes the signal. Such a setup implies a trade-off between the rate spent on the parameters and the rate spent on the signal. We optimize this trade-off analytically for the multiple-description case using a method inspired by the Minimum Description Length principle. We also provide an algorithm for optimizing the rate allocation between the components of the model-based multiple-description coder. Finally, we confirm our results experimentally. Our method facilitates rate-adaptive multiple-description coding.
ISBN: 9781617388767 (print)
We propose an audio codec that addresses the low-delay requirements of applications such as network music performance. The codec is based on the modified discrete cosine transform (MDCT) with very short frames and uses gain-shape quantization to preserve the spectral envelope. The short frame sizes required for low delay typically hinder the performance of transform codecs. However, at 96 kbit/s and with only 4 ms of algorithmic delay, the proposed codec outperforms the ULD codec operating at the same rate. The total complexity of the codec is small, at only 17 WMOPS for real-time operation at 48 kHz.
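To illustrate why gain-shape quantization preserves the spectral envelope, here is a minimal sketch. It is not the codec described in the abstract: the function names, the 1.5 dB gain step, and the uniform scalar grid used for the shape are all assumptions made for illustration (a real codec would use a structured shape codebook). The point it shows is that the decoded band energy depends only on the quantized gain, regardless of how coarsely the shape is coded.

```python
import math

def gain_shape_quantize(band, gain_step_db=1.5, shape_levels=8):
    """Split one band of coefficients into a gain index and a shape index vector.

    gain_step_db and shape_levels are illustrative assumptions, not values
    taken from the abstract.
    """
    gain = math.sqrt(sum(c * c for c in band))
    if gain == 0.0:
        return None, [0] * len(band)
    # Quantize the band gain on a uniform grid in the log (dB) domain.
    gain_index = round(20.0 * math.log10(gain) / gain_step_db)
    # Quantize the unit-norm shape on a uniform scalar grid
    # (a stand-in for a real shape codebook).
    shape_index = [round(c / gain * shape_levels) for c in band]
    return gain_index, shape_index

def gain_shape_dequantize(gain_index, shape_index, gain_step_db=1.5):
    """Rebuild the band: renormalize the shape, then scale by the decoded gain."""
    norm = math.sqrt(sum(q * q for q in shape_index))
    if gain_index is None or norm == 0.0:
        return [0.0] * len(shape_index)
    gain = 10.0 ** (gain_index * gain_step_db / 20.0)
    # Renormalizing the shape means the output energy equals the decoded gain.
    return [gain * q / norm for q in shape_index]

def level_db(band):
    """Band level in dB (20*log10 of the Euclidean norm)."""
    return 20.0 * math.log10(math.sqrt(sum(c * c for c in band)))
```

Because the shape is renormalized before scaling, the level error of a decoded band is bounded by half the gain step, however large the shape quantization error is.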
A VLSI design of a complex Quadrature Mirror Filterbank (QMF) for the MPEG-4 High Efficiency Advanced Audio Coding (MPEG-4 HE-AAC) decoder using a resource-sharing technique is proposed. An algorithm that uses the conventional discrete cosine transform of type IV (DCT-IV) to optimize the complex QMF is derived in this paper. With the proposed algorithm, the VLSI design of the complex-valued analysis quadrature mirror filterbank (complex-AQMF) and synthesis quadrature mirror filterbank (complex-SQMF) improves resource efficiency by sharing the same DCT module. Experimental results show that the computational complexity of the complex QMF can be reduced by up to 8.59%, and that the VLSI architecture of the proposed algorithm saves about 53% of the area and 50% of the memory thanks to the shared DCT-IV resources.
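The abstract does not give the derivation that maps the complex QMF onto a DCT-IV. As background only, the sketch below is a direct O(N^2) reference implementation of the orthonormal DCT-IV (the function name `dct_iv` is hypothetical). It shows a general property that makes a DCT-IV kernel convenient to reuse: with this scaling the transform is its own inverse, so the same routine computes both the forward and the inverse transform. This is not the paper's algorithm or hardware architecture.

```python
import math

def dct_iv(x):
    """Orthonormal DCT-IV, computed directly from its definition (O(N^2)).

    y[k] = sqrt(2/N) * sum_n x[n] * cos(pi/N * (n + 1/2) * (k + 1/2))
    The transform matrix is symmetric and orthogonal, hence involutory.
    """
    n = len(x)
    scale = math.sqrt(2.0 / n)
    return [
        scale * sum(x[j] * math.cos(math.pi / n * (j + 0.5) * (k + 0.5))
                    for j in range(n))
        for k in range(n)
    ]

# Applying the same routine twice returns the input (up to rounding error).
signal = [1.0, -2.0, 0.5, 3.0, 0.0, -1.5, 2.25, 4.0]
round_trip = dct_iv(dct_iv(signal))
```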
ISBN: 9781424443970; 9781424443987 (print)
Many existing MP3 encoders employ an FFT-based transform to derive a spectral decomposition of the audio signal into uniform subbands with equal bandwidths. The nonuniform spectral resolution of the auditory system is taken into account. The initial audio signal processing within the psychoacoustic model consists of a spectral decomposition to account for the frequency selectivity of the auditory system. However, the auditory system performs a nonuniform spectral decomposition of the acoustic signal in the cochlea. This paper presents a psychoacoustic model based on an efficient nonuniform cochlear filter bank following the Cambridge model. Results of the proposed psychoacoustic model applied to audio coding show better performance in terms of compression ratio and sound quality than the classical FFT model.
ISBN: 9781424423538 (print)
We study the reuse of bit allocation information in audio transcoding by exploiting the similarity among subband audio coding schemes. We show that important information can be deduced to reduce the encoder complexity even if the two coders employ different psychoacoustic models. We give a case study on MPEG AAC/Dolby AC-3 transcoding. The proposed algorithms can be extended to other audio transcoding schemes.
In digital broadcasting services, differences in audio level among channels and content force users to adjust the audio output level whenever they change the channel or content. To solve this problem, a novel method for controlling the audio level of MPEG-2/4 AAC in the bitstream domain is proposed. The proposed method is verified to perform better than level control in the waveform domain.
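The abstract does not say which bitstream fields the method modifies. One plausible mechanism, shown here purely as an assumption, is to offset the 8-bit AAC `global_gain` field: AAC scales spectral coefficients by a factor of 2^(0.25 * scalefactor), so each unit step changes the level by about 1.505 dB without decoding to the waveform. The helper below (the name `adjust_global_gain` is hypothetical) only does the arithmetic; it does not parse or rewrite a real bitstream.

```python
import math

# One scalefactor step multiplies the amplitude by 2**0.25, about 1.505 dB.
DB_PER_STEP = 20.0 * math.log10(2.0 ** 0.25)

def adjust_global_gain(global_gain, target_change_db):
    """Return (new_global_gain, applied_change_db) for a requested level change.

    Assumption for illustration: the level is changed by offsetting the
    8-bit global_gain field, clamped to its valid range 0..255.
    """
    steps = round(target_change_db / DB_PER_STEP)
    new_gain = max(0, min(255, global_gain + steps))
    applied_db = (new_gain - global_gain) * DB_PER_STEP
    return new_gain, applied_db

# Example: request a 6 dB attenuation for a frame whose global_gain is 120.
new_gain, applied_db = adjust_global_gain(120, -6.0)
```

The achievable resolution of this approach is one step (about 1.5 dB), so the applied change can differ from the requested one by up to roughly 0.75 dB.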
ISBN: 9781424447749 (print)
We investigate the effect of audio coding on speaker identification and verification when training and testing conditions are matched and mismatched. Experiments use popular audio coding algorithms (Windows Media Audio 9.1, Advanced Audio Coding, MPEG Audio Layer III) and a speaker identification and verification system based on Gaussian mixture models. There is some loss in identification and verification performance when the audio coding process does not change the sample rate, and a large loss when the sample rate changes during coding.
Frequency domain linear prediction (FDLP) is an efficient technique for representing the long-term amplitude modulations (AM) of speech/audio signals using autoregressive models. In the proposed analysis technique, relatively long temporal segments (1000 ms) of the input signal are decomposed into a set of sub-bands. FDLP is applied to each sub-band to model the temporal envelopes. The residual of the linear prediction represents the frequency modulations (FM) in the sub-band signal. In this paper, we present several applications of the proposed AM-FM decomposition technique to tasks such as wide-band audio coding, speech recognition in reverberant environments, and robust feature extraction for phoneme recognition.
ISBN: 9781424436767 (print)
This paper presents a novel lifting factorization of the discrete cosine transform of types II and IV (DCT-II and DCT-IV). Although some conventional integer DCT-IIs (IntDCT-IIs) with block size 8 have been proposed, they have not been generalized to an arbitrary block size M. Using block lifting factorization, which has an efficient structure for lossless-to-lossy image coding, we present IntDCT-IIs and IntDCT-IVs with arbitrary block size M, called block lifting-based DCT-IIs and DCT-IVs (BLDCT-IIs and BLDCT-IVs). Finally, the validity of our method is demonstrated by the results of lossless-to-lossy image coding for the most general case of block size 8 and the extended block size 16.
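As background on why lifting yields integer transforms suitable for lossless coding, the sketch below uses the classic three-step lifting factorization of a single plane rotation, not the block lifting structure proposed in the paper. Function names are hypothetical. Each lifting step adds a rounded function of the other, unchanged variable, so the inverse can subtract exactly the same value and reconstruction is bit-exact despite the rounding. The sketch assumes sin(theta) is nonzero.

```python
import math

def lifting_rotate(a, b, theta):
    """Integer-to-integer approximation of a rotation by theta.

    Uses the factorization
      [c -s]   [1 p] [1 0] [1 p]
      [s  c] = [0 1] [u 1] [0 1],  p = (c - 1) / s,  u = s,
    with rounding inside every lifting step. Requires sin(theta) != 0.
    """
    p = (math.cos(theta) - 1.0) / math.sin(theta)
    u = math.sin(theta)
    a = a + round(p * b)
    b = b + round(u * a)
    a = a + round(p * b)
    return a, b

def lifting_rotate_inverse(a, b, theta):
    """Exact inverse: undo the three lifting steps in reverse order."""
    p = (math.cos(theta) - 1.0) / math.sin(theta)
    u = math.sin(theta)
    a = a - round(p * b)
    b = b - round(u * a)
    a = a - round(p * b)
    return a, b

# Rotating (100, 0) by 45 degrees gives an integer approximation of (70.7, 70.7).
rotated = lifting_rotate(100, 0, math.pi / 4)
restored = lifting_rotate_inverse(rotated[0], rotated[1], math.pi / 4)
```

Since DCTs can be decomposed into rotations and butterflies, chaining reversible steps of this kind is the usual route to an integer DCT; the paper's contribution, per the abstract, is a block-level lifting factorization for arbitrary block size M.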