ISBN: 9781479981311 (print)
The MPEG-H 3D audio standard applies singular value decomposition (SVD) to the input higher order ambisonics data, then encodes each predominant (foreground) sound component independently using a standard core audio codec. The residual (background) signal is encoded in the ambisonic domain. This paper is motivated by the following observations: i) separate coding of SVD components ignores spatial inter-channel masking effects; ii) compression in both the SVD and ambisonic domains is difficult to optimize perceptually; iii) only a few predominant components are encoded, due to the prohibitive side information cost of specifying SVD basis vectors. The proposed coding architecture overcomes the first two concerns by performing all compression in the SVD domain with a masking threshold that is calculated jointly for all encoded components, thereby accounting for cross-component masking. The third shortcoming is circumvented by a novel method for extending a given set of SVD basis vectors at no side information cost, by computing (at both encoder and decoder) basis vectors that span the null space of the transmitted basis vectors. Experimental results provide evidence for substantial objective and subjective gains.
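The basis-extension idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it only assumes that the transmitted basis vectors are orthonormal columns of a matrix (here called `V_tx`, a hypothetical name) and that encoder and decoder run the same numerical routine on the same quantised vectors, so both obtain the same complement without extra side information.

```python
import numpy as np

def extend_basis(V_tx):
    """Return an orthonormal basis for the orthogonal complement of span(V_tx).

    V_tx: (n, k) array whose columns are the transmitted basis vectors
    (assumed orthonormal, full column rank). The result has shape (n, n - k).
    """
    n, k = V_tx.shape
    # Full SVD: the first k left singular vectors span the column space of
    # V_tx, and the remaining n - k span its orthogonal complement, i.e. the
    # null space of V_tx transposed.
    U, _, _ = np.linalg.svd(V_tx, full_matrices=True)
    return U[:, k:]

# Illustrative use: 9 ambisonic channels (order 2), 3 transmitted vectors.
rng = np.random.default_rng(0)
V_tx, _ = np.linalg.qr(rng.standard_normal((9, 3)))
V_ext = extend_basis(V_tx)
```

Together, `V_tx` and `V_ext` form a complete orthonormal basis of the channel space, so the residual can be projected onto `V_ext` and coded in the same domain as the predominant components.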
Spatial audio coding (SAC) is an extremely compact representation of encoded multi-channel audio material. This paper proposes a multi-channel audio service for the terrestrial digital multimedia broadcasting (T-DMB) system using a novel SAC tool based on virtual source location information (VSLI). Extensive experiments are presented to evaluate the validity of the proposed VSLI-based SAC tool, and prototype systems are presented to demonstrate the reliability of the proposed multi-channel T-DMB system in real applications.
ISBN: 9781538616321 (print)
Advances in virtual reality have generated substantial interest in accurately reproducing and storing spatial audio in the higher order ambisonics (HOA) representation, given its rendering flexibility. Recent standardization for HOA compression adopted a framework wherein HOA data are decomposed into principal components that are then encoded by standard audio coding, i.e., frequency domain quantization and entropy coding to exploit psychoacoustic redundancy. A noted shortcoming of this approach is the occasional mismatch in principal components across blocks, and the resulting suboptimal transitions in the data fed to the audio coder. Instead, we propose a framework where singular value decomposition (SVD) is performed after transformation to the frequency domain via the modified discrete cosine transform (MDCT). This framework not only ensures smooth transitions across blocks, but also enables frequency-dependent SVD for better energy compaction. Moreover, we introduce a novel noise substitution technique to compensate for suppressed ambient energy in discarded higher order ambisonics channels, which significantly enhances the perceptual quality of the reconstructed HOA signal. Objective and subjective evaluation results provide evidence for the effectiveness of the proposed framework in terms of both higher compression gains and better perceptual quality, compared to existing methods.
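A rough sketch of what a frequency-dependent SVD could look like is given below. The abstract does not specify the band layout, the block structure, or the number of retained components, so `band_edges`, `rank`, and the function name are illustrative assumptions, and the input is taken to be a matrix of already-computed transform coefficients rather than the output of a real MDCT.

```python
import numpy as np

def bandwise_svd(X, band_edges, rank):
    """Apply a separate truncated SVD to each frequency band.

    X: (channels, bins) transform-domain coefficients (assumed given).
    band_edges: increasing bin indices delimiting the bands (hypothetical).
    rank: maximum number of components kept per band.
    Returns a list of (basis, component_signals) pairs, one per band.
    """
    components = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        U, s, Vt = np.linalg.svd(X[:, lo:hi], full_matrices=False)
        r = min(rank, len(s))
        # Basis vectors mix the channels; the scaled rows of Vt are the
        # per-band component signals that a core coder would quantise.
        components.append((U[:, :r], s[:r, None] * Vt[:r]))
    return components

def reconstruct(components):
    """Rebuild the (channels, bins) matrix from the per-band components."""
    return np.hstack([U @ S for U, S in components])

# Illustrative data: each band is rank one, so one component per band suffices.
rng = np.random.default_rng(1)
X = np.hstack([np.outer(rng.standard_normal(4), rng.standard_normal(6)),
               np.outer(rng.standard_normal(4), rng.standard_normal(10))])
X_hat = reconstruct(bandwise_svd(X, [0, 6, 16], rank=1))
```

Because each band gets its own basis, a band dominated by one source can be captured with fewer components than a single full-band decomposition would need, which is the energy-compaction argument made in the abstract.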
ISBN: 9781479977116 (print)
A closed-loop configuration has been introduced to spatial audio coding (SAC), which makes it possible to minimise the distortion introduced during the quantisation and encoding processes. Significant performance improvements have been reported in several papers. However, applying a closed-loop system directly to MPEG Surround (MPS) is still problematic because the MPEG standard uses an unbalanced-delay filterbank, which is not appropriate for a closed-loop system that needs synchronisation between the original audio signals and the target audio signals. In this paper, we investigate the delay characteristics of the Quadrature Mirror Filterbank (QMF), which is used in several MPEG standards. Based on this study, a balanced-delay QMF is proposed and tested in a closed-loop MPS system. The experimental results show that an SNR improvement of approximately 8 dB is achieved when applying the balanced-delay filterbank compared to the unbalanced-delay filterbank specified in MPS.
ISBN: 9781479953035 (print)
Closed-loop spatial audio coding is a compression technique, developed on the basis of the MPEG Surround (MPS) standard, that has the advantage of minimising the distortion caused by quantisation of the spatial parameters. Although MPS is built on a filterbank, this closed-loop system performs better with the Modified Discrete Cosine Transform (MDCT). Given its high performance relative to the open-loop system, this paper presents a further investigation of the objective performance of closed-loop spatial audio coding with various quantisers of the spatial parameters. Experiments have been conducted to measure the signal to noise ratio (SNR) across different types of uniform spatial quantisers at various operating bitrates. The results show that the SNR achieved by the open-loop approach is strongly affected by the type of quantiser while, in contrast, the SNR achieved by the closed-loop approach is relatively constant regardless of the number of bits used in the quantisers. Moreover, the results also show that the closed-loop configuration consistently improves the SNR in every quantisation scheme tested.
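The kind of measurement described in this abstract, SNR as a function of quantiser resolution, can be sketched as follows. This is a generic mid-tread uniform quantiser and a textbook SNR formula, not the quantisers or test signals used in the paper; the parameter range and the sample values are made up for illustration.

```python
import math

def uniform_quantise(x, lo, hi, bits):
    """Quantise x to the nearest of 2**bits evenly spaced levels in [lo, hi]."""
    levels = 2 ** bits
    step = (hi - lo) / (levels - 1)
    clipped = min(max(x, lo), hi)
    return lo + round((clipped - lo) / step) * step

def snr_db(reference, test):
    """Signal to noise ratio in dB between a reference and a test sequence."""
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, test))
    return float("inf") if noise == 0 else 10 * math.log10(signal / noise)

# Hypothetical parameter track in [0, 1], quantised at two resolutions.
params = [0.1 * i for i in range(11)]
snr_3bit = snr_db(params, [uniform_quantise(p, 0.0, 1.0, 3) for p in params])
snr_6bit = snr_db(params, [uniform_quantise(p, 0.0, 1.0, 6) for p in params])
```

In an open-loop coder the error measured this way propagates directly into the reconstructed channels, which is consistent with the abstract's observation that open-loop SNR depends strongly on the quantiser.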
ISBN: 9781424442966 (print)
A binaural audio synthesis system based on sinusoidal modeling is proposed for spatial, low-bitrate audio coding, for example in teleconferencing applications. The system transmits monaural sinusoidal parameters of a downmix signal, from which the left and right binaural signals are synthesized at the receiver according to the directional metadata. Typical sinusoidal synthesis methods, as well as the effectiveness of a monaural frequency masking model, are evaluated in a binaural context. Furthermore, a method for binaural noise residual synthesis and efficiency improvements for HRTF parameter acquisition are suggested. Tests using speech signals indicate that sinusoidal modeling is an attractive technique for applications such as the proposed one.
ISBN: 9780769540184 (print)
Spatial audio coding (SAC) is an emerging technology whose distinguishing feature is that it delivers good, even excellent, audio quality at the monophonic or stereo bitrates of conventional perceptual transform coders. Through a systematic exploitation of spatial hearing, Binaural Cue Coding illustrates the power and potential of SAC for future intelligent multimedia services. MPEG Surround, which builds on cumulative efforts from industry and academia, strives to provide a SAC system with great versatility and high quality. The initial test results of MPEG Surround show its performance advantage over conventional state-of-the-art coders in a wide range of coding configurations.
ISBN: 9781538663189 (print)
Parametric spatial audio coding schemes, such as advanced joint channel coding in Dolby's next-generation audio coding system AC-4, achieve a higher data compression ratio as a result of a lower-dimensional intermediate signal representation, known as the downmix. During the inverse process, the upmix, which is guided by side information, the covariance between the source signals is reconstructed to preserve perceptually important cues such as ambience or source width. In this manuscript, a systematic approach for the construction of ambience bases from weighing matrices is presented. Furthermore, the basis vectors are generalized to accommodate nonunitary mixing weights, and a new basis is derived. Round figures from internal listening tests are shared to underpin the utility of the approach.
ISBN: 9781424407286 (print)
Spatial audio coding and enhancement address the growing commercial need to store and distribute multichannel audio and to render content optimally on arbitrary reproduction systems. In this paper, we discuss a spatial analysis-synthesis scheme which applies principal component analysis to an STFT-domain representation of the original audio to separate it into primary and ambient components. These components are then analyzed separately for cues that describe the spatial percept of the audio scene on a per-tile basis; the cues are used by the synthesis to render the audio appropriately on the available playback system. The proposed framework can be tailored for robust spatial audio coding, or it can be applied directly to enhancement scenarios where there are no rate constraints on the intermediate spatial data and audio representation.
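The primary-ambient separation mentioned in this abstract can be sketched with a plain PCA projection. This is a simplified illustration under stated assumptions, not the paper's scheme: it treats one frequency band at a time, takes the STFT coefficients as given in a matrix `X`, and assumes a single dominant direction per band; the function and variable names are hypothetical.

```python
import numpy as np

def primary_ambient(X):
    """Split multichannel STFT data into primary and ambient parts via PCA.

    X: (channels, frames) complex STFT coefficients for one frequency band.
    Returns (primary, ambient) with primary + ambient == X.
    """
    # Channel covariance matrix (Hermitian, positive semidefinite).
    R = X @ X.conj().T / X.shape[1]
    # eigh returns eigenvalues in ascending order, so the last eigenvector
    # is the principal direction.
    _, V = np.linalg.eigh(R)
    v = V[:, -1:]
    # Primary: projection onto the principal direction; ambient: the rest.
    primary = v @ (v.conj().T @ X)
    ambient = X - primary
    return primary, ambient

# Illustrative band: one panned source plus weak uncorrelated noise.
rng = np.random.default_rng(2)
source = rng.standard_normal(200) + 1j * rng.standard_normal(200)
noise = 0.05 * (rng.standard_normal((2, 200)) + 1j * rng.standard_normal((2, 200)))
X = np.outer([1.0, 0.5], source) + noise
primary, ambient = primary_ambient(X)
```

Spatial cues such as a panning direction could then be estimated from `primary`, while `ambient` would be characterised by its level and diffuseness.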
ISBN: 9781424442959; 9781424442966 (print)
A binaural audio synthesis system based on sinusoidal modeling is proposed for spatial, low-bitrate audio coding, for example in teleconferencing applications. The system transmits monaural sinusoidal parameters of a downmix signal, from which the left and right binaural signals are synthesized at the receiver according to the directional metadata. Typical sinusoidal synthesis methods, as well as the effectiveness of a monaural frequency masking model, are evaluated in a binaural context. Furthermore, a method for binaural noise residual synthesis and efficiency improvements for HRTF parameter acquisition are suggested. Tests using speech signals indicate that sinusoidal modeling is an attractive technique for applications such as the proposed one.