This paper presents a technique for low bit rate compression of three-dimensional (3D) audio produced by multiple loudspeaker channels. The approach is based on the time-frequency analysis of the localization of spatial sound sources within the 3D space as rendered by a multi-channel audio signal (in this case 16 channels). This analysis yields a stereo downmix signal representing the original 16 channels. Alternatively, a mono downmix signal with side information representing the location of sound sources within the 3D spatial scene can be derived. The resulting downmix signals are then compressed with a traditional audio coder, giving a representation of the 3D soundfield at bit rates comparable with existing stereo audio coders while maintaining the perceptual quality obtained by encoding each channel separately.
In this paper, an audio denoising method is proposed for improving the quality of handheld audio recording devices. The proposed method reduces noise differently depending on the block size in the modified discrete cosine transform (MDCT) analysis of an audio coder. Specifically, denoising for a long block is performed by multi-band spectral subtraction (MBSS) with perceptually weighted scale-factor bands, while denoising for a short block is performed by sub-band power scaling to keep its power coherent with the previously denoised long block. To evaluate its performance, the proposed method is first embedded into MPEG-2 Advanced Audio Coding (AAC), which is widely used in audio recording devices. Its performance is then compared with that of a conventional audio denoising method based on block thresholding in terms of cepstral distortion, subjective quality, and computational complexity. The comparison shows that the proposed method outperforms the block thresholding method in both objective and subjective measurements. Moreover, unlike the conventional method, the proposed method has low enough complexity to be implemented on most resource-constrained handheld audio recording devices.
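As a rough illustration of the long-block step described above, the following sketch applies generic per-band power spectral subtraction. It is not the paper's algorithm: the function name, the band edges, the per-band over-subtraction factors, and the spectral floor are all assumptions chosen for illustration, and the perceptual weighting of scale-factor bands is not modelled.

```python
def multiband_spectral_subtraction(noisy_power, noise_power, band_edges,
                                   over_subtraction, floor=0.01):
    """Subtract a scaled noise estimate from each band of a power spectrum.

    noisy_power      -- power of each spectral bin of the noisy block
    noise_power      -- estimated noise power of each bin
    band_edges       -- bin indices delimiting the bands (hypothetical layout)
    over_subtraction -- one subtraction factor per band (illustrative values)
    floor            -- fraction of the noisy power kept as a spectral floor
    """
    cleaned = list(noisy_power)
    bands = zip(band_edges[:-1], band_edges[1:])
    for band, (lo, hi) in enumerate(bands):
        alpha = over_subtraction[band]
        for k in range(lo, hi):
            estimate = noisy_power[k] - alpha * noise_power[k]
            # The floor prevents negative power and limits musical noise.
            cleaned[k] = max(estimate, floor * noisy_power[k])
    return cleaned


if __name__ == "__main__":
    noisy = [4.0, 4.0, 1.0, 1.0]
    noise = [1.0, 1.0, 1.0, 1.0]
    print(multiband_spectral_subtraction(noisy, noise, [0, 2, 4], [1.0, 2.0], 0.1))
```

Using a separate factor per band lets bands with a poor signal-to-noise ratio be attenuated more aggressively than clean ones, which is the general motivation for multi-band variants of spectral subtraction.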
ISBN: 9781424423538 (print)
Traditionally, speech coding and audio coding were separate worlds. Based on different technical approaches and different assumptions about the source signal, neither of the two coding schemes could efficiently represent both speech and music at low bitrates. This paper presents a unified speech and audio codec that efficiently combines techniques from both worlds. The result is a codec that exhibits consistently high quality for speech, music, and mixed audio content. The paper gives an overview of the codec architecture and presents results of formal listening tests comparing this new codec with HE-AAC(v2) and AMR-WB+. The new codec forms the basis of the reference model in the ongoing MPEG standardization activity for Unified Speech and Audio Coding.
ISBN: 9781424423538 (print)
Current audio coding standards employ the modified discrete cosine transform (MDCT), in which overlapped frames of audio are windowed and transformed to the frequency domain. Encoding parameters are chosen so as to minimize a distortion measure subject to a rate constraint. At the decoder, inverse transformation involves additional windowing and overlap-add of frames. An analysis of the time-domain error in the reconstructed frame reveals that distortion metrics based solely on the MDCT-domain error are unable to capture the effects of windowing and overlap-add at the decoder. The main contribution of this paper is a modified distortion metric that does capture these effects via modified discrete sine transform analysis. When incorporated into an Advanced Audio Coder, the proposed distortion metric significantly improves the subjective quality of reconstructed audio.
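The windowing and overlap-add pipeline described above can be sketched as follows. This is a minimal textbook MDCT/IMDCT round trip with a sine window, not the paper's distortion metric; the function names and the zero-padding scheme are assumptions made for illustration. Without quantization the overlap-add cancels the time-domain aliasing, so the output matches the input up to rounding error.

```python
import math


def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + length/2]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(frame):
    """Map 2N windowed samples to N MDCT coefficients."""
    half = len(frame) // 2
    shift = (half + 1) / 2
    return [sum(frame[n] * math.cos(math.pi / half * (n + shift) * (k + 0.5))
                for n in range(2 * half))
            for k in range(half)]


def imdct(coeffs):
    """Map N coefficients back to 2N time-aliased samples (scaled by 2/N)."""
    half = len(coeffs)
    shift = (half + 1) / 2
    return [2.0 / half * sum(coeffs[k] * math.cos(math.pi / half * (n + shift) * (k + 0.5))
                             for k in range(half))
            for n in range(2 * half)]


def mdct_roundtrip(signal, half):
    """Window, transform, inverse transform, window again, and overlap-add."""
    window = sine_window(2 * half)
    padded = [0.0] * half + list(signal) + [0.0] * half
    while len(padded) % half:
        padded.append(0.0)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * half + 1, half):
        frame = [padded[start + n] * window[n] for n in range(2 * half)]
        recon = imdct(mdct(frame))
        for n in range(2 * half):
            # The synthesis window and the overlap-add cancel the aliasing.
            out[start + n] += recon[n] * window[n]
    return out[half:half + len(signal)]


if __name__ == "__main__":
    x = [math.sin(0.3 * i) for i in range(40)]
    y = mdct_roundtrip(x, 8)
    print(max(abs(a - b) for a, b in zip(x, y)))
```

Because each output sample is the sum of two windowed inverse-transformed frames, an error introduced in the coefficients of one frame is shaped by the synthesis window and spread over two hops in the time domain, which is the effect a purely MDCT-domain error measure does not see.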
ISBN: 9780769535227 (print)
Multi-terminal source coding refers to separate lossy encoding and joint decoding of two or more correlated sources. It can effectively reduce encoding complexity while maintaining good output performance. Focusing on the asymmetric case, this paper designs an asymmetric multi-terminal source audio coding algorithm, then analyses and simulates it. The encouraging simulation results show that multi-terminal source audio coding is feasible and simple, and that it can achieve better acoustic quality.
ISBN: 9781479903566 (print)
This paper presents the two new ITU-T Recommendations G.722 Annex D and G.711.1 Annex F, which are stereo extensions of the wideband codecs ITU-T G.722 and G.711.1 and their superwideband extensions (G.722 Annex B and G.711.1 Annex D). An embedded scalable structure is used to add stereo extension layers on top of the wideband or superwideband core coding. Wideband stereo modes are supported at the bit rates of 64/80 and 96/128 kbit/s for G.722 and G.711.1 (respectively), while superwideband stereo modes are supported at 80/96/112/128 and 112/128/144/160 kbit/s. The parametric stereo coding model is based on a frequency domain downmix, wideband inter-channel differences estimation, quantization and synthesis, low complexity coherence analysis and synthesis, stereo transient detection and stereo post-processing. An overview of formal ITU-T characterization listening tests illustrates the performance of these codecs.
Parametric models are of great interest for representing and manipulating sounds. However, the quality of the resulting signals depends on the precision of the parameters. When the signals are available, these parameters can be estimated, but the presence of noise decreases the precision of the estimation. Furthermore, the Cramér-Rao bound gives the minimal error reachable with the best estimator, which can be insufficient for demanding applications. These limitations can be overcome by the coding approach, which consists in directly transmitting the parameters with the best precision using the minimal bitrate. However, this approach does not take advantage of the information provided by estimation from the signal, and it may require a larger bitrate and entail a loss of compatibility with existing file formats. The purpose of this article is to propose a compromise, called the 'informed approach,' which combines analysis with (coded) side information in order to increase the precision of parameter estimation using a lower bitrate than pure coding approaches, the audio signal being known. The analysis problem is thus presented in a coder/decoder configuration where the side information is computed and inaudibly embedded into the mixture signal at the coder. At the decoder, the extra information is extracted and used to assist the analysis process. This study applies the approach to audio spectral analysis using sinusoidal modeling, a well-known model with practical applications for which theoretical bounds have been calculated. This work aims at uncovering new approaches for audio quality-based applications. It provides a solution for challenging problems such as active listening of music, source separation, and realistic sound transformations.
ISBN: 9781424444618 (print)
In this paper, we propose an object audio system on a digital signal processor (DSP) that uses MPEG-4 Audio Lossless Coding (ALS) to provide high-quality audio. Complexity reduction is a critical issue in the designed object audio system because the system requires as many MPEG-4 ALS decoders as there are objects. A method for using the internal memory of the DSP efficiently is suggested to avoid the high complexity incurred by the use of external memory. A low-complexity finite impulse response (FIR) filter is also proposed, because the short-term prediction filter has the highest complexity among the MPEG-4 ALS decoder blocks. The internal-memory method is designed so that the critical data of the MPEG-4 ALS decoders occupy only as much internal memory as the data of a single decoder require, and this internal memory is shared among the decoders. The proposed FIR filter reduces the complexity of the short-term prediction filter by 25% compared to direct convolution. The proposed object audio system, consisting of 12 objects, is evaluated on a DSP. With the two proposed methods applied, the complexity of the audio system is reduced by 83%, and the system operates in real time on the DSP. This means that high-quality object audio can be provided in multimedia products.
ISBN: 9781467313636 (print)
We present a new audio steganographic technique based on empirical mode decomposition and the Hilbert transform. The audio signal is decomposed into several intrinsic mode functions, which serve as the carrier for the payload, a QR code. Our results show that the proposed method is robust against common audio processing attacks.
ISBN: 9781467313636 (print)
This paper proposes a new MPEG-2 AAC Huffman decoding algorithm designed to find multiple symbols in a single search. The analysis and experimental results show that the computational complexity of the proposed method is more than 46% lower than that of recent methods.
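The general idea of finding several symbols in one search can be sketched with a lookup table indexed by a fixed-width window of bits. This is a generic illustration, not the paper's algorithm: the toy prefix code, the function names, and the window width are assumptions, and no AAC codebook is used.

```python
def build_multi_symbol_table(code, width):
    """For every width-bit window, list the symbols that decode completely
    inside it and the number of bits those symbols consume."""
    decode_one = {bits: symbol for symbol, bits in code.items()}
    if width < max(len(bits) for bits in decode_one):
        raise ValueError("width must cover the longest codeword")
    table = []
    for value in range(1 << width):
        window = format(value, "0{}b".format(width))
        symbols, used, current = [], 0, ""
        for position, bit in enumerate(window):
            current += bit
            if current in decode_one:
                symbols.append(decode_one[current])
                used = position + 1
                current = ""
        table.append((tuple(symbols), used))
    return table


def decode(bits, code, width):
    """Decode a string of '0'/'1' characters, several symbols per lookup."""
    table = build_multi_symbol_table(code, width)
    decode_one = {b: s for s, b in code.items()}
    out, pos = [], 0
    while len(bits) - pos >= width:
        symbols, used = table[int(bits[pos:pos + width], 2)]
        if used == 0:
            raise ValueError("window contains no complete codeword")
        out.extend(symbols)
        pos += used
    # Fewer than width bits remain: finish one bit at a time.
    current = ""
    for bit in bits[pos:]:
        current += bit
        if current in decode_one:
            out.append(decode_one[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete codeword")
    return out


if __name__ == "__main__":
    toy_code = {"a": "0", "b": "10", "c": "110", "d": "111"}  # hypothetical code
    stream = "".join(toy_code[s] for s in "abacadabcd")
    print("".join(decode(stream, toy_code, 4)))
```

The table trades memory for speed: a wider window decodes more short codewords per lookup but doubles the table size with every extra bit, so a practical width depends on the codebook and the memory budget.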