In interactive audio services, users can render audio objects freely to suit their preferences, and the spatial audio object coding (SAOC) scheme performs fairly well in terms of both bitrate and audio quality. However, perceptible audio quality degradation can occur when an object is suppressed or played alone. To address this, the SAOC scheme with two-step coding (SAOC-TSC) was proposed. Its side-information bitrate, however, increases twofold compared with that of the original SAOC because of the bitrate needed for the residual coding used to enhance the audio quality. In this paper, an efficient residual coding method for SAOC-TSC is proposed to reduce the side-information bitrate without degrading audio quality or increasing complexity.
ISBN: (print) 9781479906048; 9781479906024
This paper presents mastering signal processing with a residual coding scheme in spatial audio object coding. The proposed method successfully eliminates the difference between the original down-mix signal and the compensated down-mix signal and thereby enhances the sound quality. Experimental results show that the proposed method greatly improves the performance of the original mastering signal processing, and that its sound quality is almost the same as that of the output signal decoded with the original down-mix signal.
An interactive audio service is a new concept of audio service that offers users a variety of experiences with alternative and advanced audio services. In an interactive audio service, users can freely control various audio objects to create their own sound. Spatial audio object coding (SAOC) is a useful technology that can support most parts of the interactive audio service at a relatively low bit rate, but it performs poorly when perfect gain control of a particular audio object, i.e., the target audio object, is required. In this paper, an SAOC with a two-step coding structure is proposed to handle the target audio object efficiently alongside the normal audio objects. A transform coded excitation (TCX) based residual coding scheme is presented to enhance the sound quality. The experimental results show that, with the proposed two-step coding structure, the various audio objects can be handled successfully with respect to both bit rate and sound quality.
ISBN: (print) 9789082797015
Parametric audio object coding employs principles of informed source separation to obtain object reconstructions from the mixture signal used in the transport, enabling flexible rendering of the output signal into output scenes unknown at the encoder. Information about the object level in the rendered output is important for loudness and dynamic range control applications, e.g., in broadcast. This paper proposes a method for estimating the object level in an arbitrary output scene based on the downmix signal level, which is then projected through the combined un-mixing and rendering matrix. This avoids explicitly reconstructing the objects solely for level estimation, which saves computational complexity. In the evaluations, the proposed method shows high estimation accuracy, with a root-mean-square error of 0.26 LUFS (loudness units relative to full scale) compared to 3.7 LUFS for the baseline with object reconstructions.
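The projection idea in this abstract can be illustrated with a minimal NumPy sketch. It is not the paper's algorithm: it assumes a single band with a known downmix covariance, and the names `object_level_db`, `G` (un-mixing matrix) and `R` (rendering matrix) are hypothetical. It also reports plain energy in dB rather than LUFS, since no K-weighting or gating is applied.

```python
import numpy as np

def object_level_db(downmix_cov, unmix, render, obj):
    """Energy (dB) of one object in the rendered output, without reconstructing it.

    downmix_cov: (M, M) covariance of the M downmix channels
    unmix:       (N, M) parametric un-mixing matrix for N objects (assumed given)
    render:      (K, N) rendering matrix for K output channels (assumed given)
    """
    # Combined path for this object only: downmix -> object estimate -> outputs.
    path = np.outer(render[:, obj], unmix[obj, :])        # shape (K, M)
    out_cov = path @ downmix_cov @ path.conj().T          # shape (K, K)
    energy = np.real(np.trace(out_cov))                   # summed over outputs
    return 10.0 * np.log10(energy + 1e-12)

rng = np.random.default_rng(0)
M, N, K, T = 2, 4, 5, 1000
x = rng.standard_normal((M, T))     # toy downmix signal, T samples
G = rng.standard_normal((N, M))     # hypothetical un-mixing matrix
R = rng.standard_normal((K, N))     # hypothetical rendering matrix
C = x @ x.T / T                     # downmix covariance

est = object_level_db(C, G, R, obj=1)

# Reference path: explicitly reconstruct object 1, render it, measure its energy.
y = np.outer(R[:, 1], G[1, :] @ x)  # shape (K, T)
ref = 10.0 * np.log10(np.sum(y ** 2) / T + 1e-12)
```

In this toy setting the two values agree to numerical precision, because the covariance is computed from the same signal. The saving comes from working with small matrices instead of per-sample object signals.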
ISBN: (print) 9781509041176
In this paper, we show that tensor compression techniques based on randomization and partial observations are very useful for spatial audio object coding. In this application, we aim to transmit several audio signals, called objects, from a coder to a decoder. A common strategy is to transmit only the downmix of the objects along with a small amount of side information that permits reconstruction at the decoder. In practice, this is done by transmitting compressed versions of the object spectrograms and separating the mix with Wiener filters. Previous research used nonnegative tensor factorizations in this context, with bitrates as low as 1 kbps per object. Building on recent advances in tensor compression, we show that the computation time for encoding can be drastically reduced. We then demonstrate how the mixture can be exploited at the decoder to avoid transmitting many parameters, permitting bitrates as low as 0.1 kbps per object for comparable performance.
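The Wiener-filter separation step mentioned above can be sketched as follows. This is a generic single-channel Wiener mask, not the authors' implementation. The function name `wiener_separate` is hypothetical, and the object power spectrograms are taken from the true sources here purely for illustration, whereas a codec would use decoded, compressed versions.

```python
import numpy as np

def wiener_separate(mix_stft, object_psd, eps=1e-12):
    """Split a mono mixture STFT into object estimates with Wiener gains.

    mix_stft:   (F, T) complex STFT of the downmix
    object_psd: (J, F, T) nonnegative power spectrograms, one per object
    returns:    (J, F, T) complex STFT estimates of the objects
    """
    total = object_psd.sum(axis=0) + eps   # mixture power under the model
    gains = object_psd / total             # per-object gains in [0, 1]
    return gains * mix_stft[None, :, :]

rng = np.random.default_rng(0)
J, F, T = 3, 8, 10
sources = rng.standard_normal((J, F, T)) + 1j * rng.standard_normal((J, F, T))
mix = sources.sum(axis=0)
psd = np.abs(sources) ** 2                 # stand-in for the transmitted model
estimates = wiener_separate(mix, psd)
```

Because the gains sum to approximately one in every time-frequency bin, the object estimates add back up to the mixture, which is the usual consistency property of this kind of filter.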
ISBN: (print) 9781617821233
An interactive audio service provides audio editing functionality to users. In such a service, users can control the desired audio objects to create their own sound using a spatial audio object coding (SAOC) scheme. However, the vocal object cannot be removed perfectly from the down-mix signal in the Karaoke mode of SAOC. In this paper, therefore, a modified SAOC scheme with harmonic extraction and elimination structures is proposed. The proposed scheme perfectly removes the vocal object using the harmonic information of the vocal object. Subjective and objective evaluation results show that the proposed scheme is superior to the conventional ones.
ISBN: (print) 9781509041183
In this paper, we show that tensor compression techniques based on randomization and partial observations are very useful for spatial audio object coding. In this application, we aim to transmit several audio signals, called objects, from a coder to a decoder. A common strategy is to transmit only the downmix of the objects along with a small amount of side information that permits reconstruction at the decoder. In practice, this is done by transmitting compressed versions of the object spectrograms and separating the mix with Wiener filters. Previous research used nonnegative tensor factorizations in this context, with bitrates as low as 1 kbps per object. Building on recent advances in tensor compression, we show that the computation time for encoding can be drastically reduced. We then demonstrate how the mixture can be exploited at the decoder to avoid transmitting many parameters, permitting bitrates as low as 0.1 kbps per object for comparable performance.
Informed source separation (ISS) aims at reliably recovering sources from a mixture. To this end, it relies on the assumption that the original sources are available during an encoding stage. Given both the sources and the mixture, side information can be computed and transmitted along with the mixture, after which the original sources are no longer available. During a decoding stage, both the mixture and the side information are processed to recover the sources. ISS is motivated by a number of specific applications, including active listening and remixing of music, karaoke, and audio gaming. Most ISS techniques proposed so far rely on a source separation strategy and cannot achieve better results than oracle estimators. In this study, we introduce coding-based ISS (CISS) and draw the connection between ISS and source coding. CISS amounts to encoding the sources using not only a model, as in source coding, but also the observation of the mixture. This strategy has several advantages over conventional ISS methods. First, it can reach any quality, provided sufficient bandwidth is available, as in source coding. Second, it makes use of the mixture to reduce the bitrate required to transmit the sources, as in classical ISS. Furthermore, we introduce Nonnegative Tensor Factorization as a very efficient model for CISS and report rate-distortion results that strongly outperform the state of the art.
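To give a sense of why a Nonnegative Tensor Factorization (NTF) is a compact model, the sketch below builds source power spectrograms from three small nonnegative factor matrices and compares parameter counts. The dimensions, the factor names `W`, `H`, `Q` and the function `ntf_psd` are illustrative assumptions. The abstract specifies neither the factorization layout nor how the factors are fitted or quantized, so random factors stand in for fitted ones.

```python
import numpy as np

def ntf_psd(W, H, Q):
    """Model power spectrograms as P[j, f, t] = sum_k Q[j, k] * W[f, k] * H[t, k].

    W: (F, K) spectral patterns, H: (T, K) activations, Q: (J, K) source weights.
    """
    return np.einsum("jk,fk,tk->jft", Q, W, H)

# Assumed toy dimensions: frequency bins, frames, sources, components.
F, T, J, K = 513, 200, 4, 16
rng = np.random.default_rng(0)
W = rng.random((F, K))
H = rng.random((T, K))
Q = rng.random((J, K))

P = ntf_psd(W, H, Q)                    # (J, F, T) nonnegative model

n_model = W.size + H.size + Q.size      # parameters to describe the model
n_full = P.size                         # entries in the full spectrogram tensor
```

With these assumed sizes the factors hold 11,472 values against 410,400 entries in the full tensor, which is the kind of reduction that makes such a model attractive as side information.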
Interactive audio services (IASs) usually provide users with audio editing functionality so that they can render their own sounds according to their preferences. For IASs, spatial audio object coding (SAOC) is an appropriate multichannel coding tool that satisfies most of the required functionality at a relatively low bit rate. Nevertheless, SAOC usually fails to remove a specific object successfully, especially the vocal object in the case of the Karaoke service. In addition, expanding the service to mobile environments requires a lower bit rate and lower complexity. We therefore propose a new SAOC vocal harmonic coding technique to improve the background music quality in the Karaoke service. Specifically, using the harmonic information of the vocal object, we remove the harmonics of the vocal object that remain in the background music. Our experimental results confirm that the proposed algorithm improves the background music quality even at a low bit rate and low complexity.
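As a rough illustration of removing vocal harmonics from a background signal, the sketch below zeroes the spectral bins around integer multiples of a known fundamental frequency in a single frame. It is a simplified stand-in and not the coding technique of the paper: the helper `suppress_harmonics`, the assumption that the vocal fundamental is available as side information, and the toy signals are all hypothetical. Zeroing bins also discards any music energy that falls in those bins.

```python
import numpy as np

def suppress_harmonics(frame_spec, f0_hz, sample_rate, n_fft,
                       width_bins=1, n_harmonics=10):
    """Zero the rfft bins around each harmonic of f0 in one frame."""
    out = frame_spec.copy()
    for h in range(1, n_harmonics + 1):
        center = int(round(h * f0_hz * n_fft / sample_rate))
        if center >= len(out):
            break                        # harmonic is above the Nyquist bin
        lo = max(center - width_bins, 0)
        hi = min(center + width_bins + 1, len(out))
        out[lo:hi] = 0.0
    return out

sr, n_fft = 16000, 1024
t = np.arange(n_fft) / sr
f0 = 250.0                               # falls exactly on bin 16 in this setup
# Toy "vocal": three harmonics of f0. Toy "music": one tone at 625 Hz (bin 40).
vocal = sum(np.sin(2 * np.pi * h * f0 * t) / h for h in range(1, 4))
music = 0.5 * np.sin(2 * np.pi * 625.0 * t)

spec = np.fft.rfft(vocal + music)
cleaned = suppress_harmonics(spec, f0, sr, n_fft)
background = np.fft.irfft(cleaned, n=n_fft)
```

The frequencies are chosen to sit exactly on FFT bins so that no leakage occurs. With real signals, windowing and overlap-add would be needed, and the suppression width would have to be tuned.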
ISBN: (print) 9781479903566
Demixing consists of recovering the sounds that compose a multichannel mix. Important applications include karaoke and respatialization. Several approaches to this problem have been proposed in a coding/decoding framework, denoted either as spatial audio object coding or as informed source separation. They assume that the constituent sounds are available at an encoding stage and are used to compute side information that is transmitted to the end user. At a decoding stage, only the mixtures and the side information are used to recover the sources. Here, we propose an advanced model that encompasses many practical scenarios and makes it possible to reach bitrates as low as 0.5 kbps/source. First, the sources may be mono or multichannel. Second, the mixing process is assumed to be diffuse, which generalizes the usual linear-instantaneous or convolutive cases and allows professional mixes to be processed. Third, the signals to be recovered may be either the original sources or their spatial images.