The application of a high-efficiency voice communication system based on broadband over power line-power line communication (BPL-PLC) technology in medium voltage networks, including hazardous areas (such as the oil and mining industries), as a redundant means of wired communication (alongside traditional fiber optics and electrical wires) can be beneficial. Because it can use existing electrical infrastructure, it can significantly reduce deployment costs. Additionally, it can be applied under difficult conditions, thanks to battery-powered devices. During an emergency situation (e.g., after a coal dust explosion), the medium voltage cables are resistant to mechanical damage, providing a potentially life-saving communication link between the supervisor, rescue team, paramedics, and the trapped personnel. The assessment of such a system requires a comprehensive and accurate examination that takes a number of factors into account. Therefore, various models were tested, considering different transmission paths and types of coupling (inductive and capacitive), as well as various lengths of transmitted data packets. Next, a subjective quality evaluation study was carried out, considering speech signals from a number of languages (English, German, and Polish). Based on the obtained results, including both simulations and measurements, appropriate practical conclusions were formulated. The results confirmed the applicability of BPL-PLC technology as an efficient voice communication system for the oil and mining industry.
ISBN: 9781450368896 (print)
Generative audio models based on neural networks have led to considerable improvements across fields including speech enhancement, source separation, and text-to-speech synthesis. These systems are typically trained in a supervised fashion using simple element-wise L1 or L2 losses. However, because they do not capture properties of the human auditory system, such losses encourage modelling perceptually meaningless aspects of the output, wasting capacity and limiting performance. Additionally, while adversarial models have been employed to encourage outputs that are statistically indistinguishable from ground truth and have resulted in improvements in this regard, adversarial losses are not required to model perception explicitly; furthermore, training adversarial networks remains an unstable and slow process. In this work, we investigate an idea fundamentally rooted in psychoacoustics. We train a neural network to emulate an MP3 codec as a differentiable function. By feeding the output of a generative model through this MP3 function, we remove signal components that are perceptually irrelevant before computing a loss. To further stabilize gradient propagation, we use intermediate layer outputs to define our loss, as has been found useful in image-domain methods. Our experiments using an autoencoding task show an improvement over standard losses in listening tests, indicating the potential of psychoacoustically motivated models for audio generation.
This paper compares the coding efficiency of the range coder in the Opus codec with that of the Huffman coder used in MP3 (MPEG-1 Layer 3) and MPEG-2 AAC. The results show that the range coder has an efficiency advantage of about 9% at a rate of 128 kbps. The simulation also indicates that transcoding from the Opus format to the MP3 or AAC format will lead to quality degradation.
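One reason an arithmetic-style coder such as a range coder can beat a Huffman coder is that Huffman codewords must have a whole number of bits, while a range coder can approach the source entropy. The sketch below illustrates that gap on a small, hypothetical symbol distribution; it uses the entropy as an idealized stand-in for a range coder, does not model Opus, MP3, or AAC, and is not meant to reproduce the 9% figure reported above.

```python
import heapq
import math


def huffman_code_lengths(probs):
    """Return the Huffman codeword length (in bits) for each symbol."""
    # Heap items: (probability, unique tie-breaker, symbols in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # each merge adds one bit to every symbol below it
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths


def entropy_bits(probs):
    """Shannon entropy in bits per symbol (lower bound for any lossless code)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical skewed distribution, chosen only for illustration.
probs = [0.7, 0.15, 0.1, 0.05]
lengths = huffman_code_lengths(probs)
avg_huffman = sum(p * n for p, n in zip(probs, lengths))
ideal = entropy_bits(probs)

print("Huffman lengths:", lengths)
print("Huffman average: %.3f bits/symbol" % avg_huffman)
print("Entropy bound:   %.3f bits/symbol" % ideal)
print("Huffman overhead: %.1f%%" % (100.0 * (avg_huffman / ideal - 1.0)))
```

For this distribution the Huffman code lengths are 1, 2, 3, and 3 bits, giving 1.45 bits per symbol against an entropy of about 1.32 bits. The size of the overhead depends entirely on the symbol statistics, so it should not be read as a general property of either codec.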
ISBN: 9781538646595 (print)
Spectral envelope modelling is a central part of speech and audio codecs and is traditionally based on either vector quantization or scalar quantization followed by entropy coding. To combine the coding performance of vector quantization with the low complexity of the scalar case, we propose an iterative approach for entropy coding the spectral envelope parameters. For each parameter, a univariate probability distribution is derived from a Gaussian mixture model of the joint distribution, with the previously quantized parameters used as a priori information. Parameters are then iteratively and individually scalar quantized and entropy coded. Unlike vector quantization, the complexity of the proposed method does not increase exponentially with dimension and bitrate. Moreover, the coding resolution and dimension can be adaptively modified without retraining the model. Experimental results show that these important advantages do not impair coding efficiency compared to a state-of-the-art vector quantization scheme.
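The step of deriving a univariate distribution from a joint model can be illustrated with the standard conditioning formula for a Gaussian. The sketch below is a simplification under stated assumptions: it uses a single bivariate Gaussian rather than a mixture, and the function name and all numeric values are hypothetical. With a full Gaussian mixture model, each component would be conditioned in this way and the results weighted by the component posteriors.

```python
def conditional_gaussian(mean, cov, x1):
    """Mean and variance of x2 given x1 for a bivariate Gaussian.

    mean = (mu1, mu2); cov = ((var1, cov12), (cov12, var2)).
    """
    mu1, mu2 = mean
    (var1, cov12), (_, var2) = cov
    cond_mean = mu2 + cov12 / var1 * (x1 - mu1)
    cond_var = var2 - cov12 * cov12 / var1
    return cond_mean, cond_var


# Hypothetical joint model of two correlated envelope parameters.
mean = (0.0, 0.0)
cov = ((4.0, 3.0), (3.0, 4.0))

# The first parameter has already been quantized; condition the second on it.
quantized_x1 = 2.0
cond_mean, cond_var = conditional_gaussian(mean, cov, quantized_x1)

print("marginal variance of x2:   ", cov[1][1])
print("conditional mean of x2:    ", cond_mean)
print("conditional variance of x2:", cond_var)
```

Because the conditional variance (1.75 here) is smaller than the marginal variance (4.0), an entropy coder driven by the conditional distribution needs fewer bits for the second parameter at the same quantization step size.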
ISBN: 9781728162898 (digital)
ISBN: 9781728162904 (print)
The Predictive Vector Quantized Variational AutoEncoder is proposed to reduce the reconstruction error of the conventional VQ-VAE. The proposed model can predict the current data from the previous data. Its performance is evaluated on the quantized spectral envelope parameters of the high-quality 48 kHz WORLD vocoder. The results indicate that the Predictive Vector Quantized Variational AutoEncoder achieves lower distortion than the conventional VQ-VAE at four target bitrates in terms of log-spectral distortion.
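Log-spectral distortion is commonly computed as the root-mean-square difference, in dB, between a reference and a reconstructed power spectrum. The sketch below shows that common per-frame form; the exact definition used in the evaluation above (frequency range, averaging across frames) is not stated, so the function name, the `eps` guard, and the sample values are assumptions.

```python
import math


def log_spectral_distortion(ref_power, est_power, eps=1e-12):
    """RMS difference in dB between two power spectra of one frame."""
    if len(ref_power) != len(est_power):
        raise ValueError("spectra must have the same number of bins")
    squared_db = [
        (10.0 * math.log10((r + eps) / (e + eps))) ** 2
        for r, e in zip(ref_power, est_power)
    ]
    return math.sqrt(sum(squared_db) / len(squared_db))


# Hypothetical four-bin power spectra, for illustration only.
reference = [1.0, 0.5, 0.25, 0.125]
reconstructed = [0.9, 0.55, 0.2, 0.15]

lsd_db = log_spectral_distortion(reference, reconstructed)
print("LSD: %.2f dB" % lsd_db)
```

A value of 0 dB means the two spectra are identical, and a uniform factor-of-ten power error in every bin gives 10 dB.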
ISBN: 9781509041176 (print)
High Resolution Envelope Processing (HREP) is a new tool for improved perceptual coding of audio signals that predominantly consist of many dense transient events, such as applause or raindrop sounds. These signals have traditionally been very difficult for perceptual audio codecs to code, particularly at low bit rates. Based on the gain control principle, HREP acts as a pre-/post-processor pair to perceptual audio codecs and preserves the temporal fine structure and subjective quality of applause-like signals. Subjective tests have shown a significant improvement in audio quality of around 12 MUSHRA points from HREP processing at 48 kbps stereo when used together with an MPEG-H 3D Audio codec. The new coding tool has been adopted as part of MPEG-H 3D Audio Second Edition.
ISBN: 9781538646588 (print)
Current methods for immersive playback of spatial sound content aim at flexibility in terms of encoding and decoding, abstracting the two from the recording or playback setup. Ambisonics is one such method; however, it is signal-independent and, at low spatial resolutions, fails to provide appropriate spatialization cues to the listener, potentially causing severe colouration effects and localization ambiguity. We present a new signal-dependent method for parametric analysis and synthesis of ambisonic sound scenes that takes advantage of the flexibility of Ambisonics as a spatial audio format while improving reproduction. The proposed approach considers a more general acoustic model than previous proposals, with multiple source signals and a non-isotropic ambient component. According to a listening test using headphones, the method is perceived as closer to binaural reference sound scenes than ambisonic playback is.
A low-delay audio coding scheme with good perceptual audio quality at a given limited bit rate is presented. The proposed scheme is based on differential pulse code modulation (DPCM) and block companded (BC) quantization. Prediction is realized as an FIR filter in lattice structure. The DPCM operates in a feedback manner, so no transmission of prediction filter coefficients is needed. The incorporation of BC quantization into the DPCM relies on a prediction error recalculation scheme. Using BC quantization in the DPCM makes it possible to follow the prediction error signal accurately, which improves the perceptual audio quality significantly compared to a plain DPCM with an adaptive quantizer. Because of the short fixed block length of the BC quantizer, the scheme introduces an algorithmic delay below half a millisecond and an overhead of less than half a bit per sample. Therefore, real-time bidirectional audio applications are achievable.
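The feedback principle can be illustrated with a deliberately simplified DPCM loop. The sketch below is not the scheme described above: it assumes a fixed first-order predictor instead of an adaptive lattice FIR filter and a plain uniform quantizer instead of block companded quantization, and the function names, step size, and coefficient are hypothetical. What it does show is that the encoder predicts from its own reconstructed samples, so the decoder can run the same predictor from the transmitted indices alone.

```python
import math


def dpcm_encode(samples, step=0.05, coeff=0.9):
    """Quantize the prediction error; predict from reconstructed samples."""
    indices = []
    prev_recon = 0.0
    for x in samples:
        prediction = coeff * prev_recon
        index = round((x - prediction) / step)  # uniform quantizer index
        indices.append(index)
        # Track what the decoder will reconstruct, not the clean input.
        prev_recon = prediction + index * step
    return indices


def dpcm_decode(indices, step=0.05, coeff=0.9):
    """Rebuild the signal using the same predictor as the encoder."""
    recon = []
    prev_recon = 0.0
    for index in indices:
        prev_recon = coeff * prev_recon + index * step
        recon.append(prev_recon)
    return recon


# Hypothetical test signal: five cycles of a sine over 100 samples.
signal = [math.sin(2.0 * math.pi * 5.0 * n / 100.0) for n in range(100)]
indices = dpcm_encode(signal)
decoded = dpcm_decode(indices)
max_error = max(abs(x - y) for x, y in zip(signal, decoded))
print("max reconstruction error: %.4f" % max_error)
```

Because the quantization error is fed back through the predictor rather than accumulated, the reconstruction error in this sketch stays within half a quantizer step for every sample.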
ISBN: 9781479903573 (print)
A new scalable audio coding scheme is introduced in this paper. Its core idea is to create one additional scalability dimension during the encoding process in order to generate a plurality of scalable sub-bitstreams. Based on these sub-bitstreams, a smart truncator is designed that can truncate them with an optimal rate-distortion (R-D) trade-off. Benefiting from this flexible R-D trade-off, the proposed scheme can, within a wide bitrate range, outperform traditional scalable coding schemes, which usually provide a fixed R-D relationship designed for a specified bitrate. To verify its performance, the proposed scheme is implemented on top of a prior-art scalable audio codec. A series of subjective listening tests shows a significant quality improvement for the new codec.
ISBN: 9781479900145 (print)
A scalable audio coding method is proposed that uses Quantization Index Modulation, a technique borrowed from watermarking. Some of the information in each layer's output is embedded (watermarked) in the previous layer. This approach saves bitrate while keeping the distortion almost unchanged, which makes the scalable coding system more efficient in rate-distortion terms. The results show that the proposed method outperforms scalable audio coding based on reconstruction error quantization, which is used in practical systems such as MPEG-4 AAC.
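The basic mechanism of Quantization Index Modulation can be shown in its scalar form: the hidden bit selects which of two interleaved quantizer lattices a value is rounded to, and the receiver recovers the bit by checking which lattice is nearest. The sketch below is generic scalar QIM with hypothetical function names and step size; how the method above maps layer information onto the previous layer's quantizer is not specified here and is not modelled.

```python
def qim_embed(value, bit, step=1.0):
    """Quantize value onto the lattice selected by bit (0 or 1)."""
    offset = bit * step / 2.0  # lattice 1 is shifted by half a step
    return round((value - offset) / step) * step + offset


def qim_extract(value, step=1.0):
    """Return the bit whose lattice has the point nearest to value."""
    dist0 = abs(value - qim_embed(value, 0, step))
    dist1 = abs(value - qim_embed(value, 1, step))
    return 0 if dist0 <= dist1 else 1


# Hypothetical coefficient values carrying one hidden bit each.
coefficients = [-2.7, -0.3, 0.3, 1.49]
hidden_bits = [1, 0, 1, 0]

marked = [qim_embed(c, b) for c, b in zip(coefficients, hidden_bits)]
recovered = [qim_extract(m) for m in marked]
print("marked values: ", marked)
print("recovered bits:", recovered)
```

Each embedded value stays within half a step of the original, so the extra information costs no additional bits and only the distortion of a quantizer with that step size, which is the trade-off the abstract refers to.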