Search Results - Inner Mongolia University Library

IEEE Spoken Language Technology Workshop

Authors: Xiao-Hang Jiang, Yang Ai, Rui-Chen Zheng, Hui-Peng Du, Ye-Xin Lu, Zhen-Hua Ling; National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, Hefei, P. R. China

ISBN: (electronic) 9798350392258

ISBN: (print) 9798350392265

In this paper, we propose MDCTCodec, an efficient, lightweight, end-to-end neural audio codec based on the modified discrete cosine transform (MDCT). The encoder takes the MDCT spectrum of the audio as input and encodes it into a continuous latent code, which is then discretized by a residual vector quantizer (RVQ). The decoder then decodes the MDCT spectrum from the quantized latent code and reconstructs the audio via the inverse MDCT. During training, a novel multi-resolution MDCT-based discriminator (MR-MDCTD) is adopted to discriminate between natural and decoded MDCT spectra for adversarial training. Experimental results confirm that, in scenarios with high sampling rates and low bitrates, MDCTCodec exhibits high decoded audio quality, improved training and generation efficiency, and a compact model size compared to baseline codecs. Specifically, MDCTCodec achieves a ViSQOL score of 4.18 at a sampling rate of 48 kHz and a bitrate of 6 kbps on the public VCTK corpus.

Keywords: Training; Codecs; Codes; Quantization (signal); audio coding; Conferences; Bit rate; Vectors; Decoding; Discrete cosine transforms
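
To make the MDCT analysis/synthesis path around the codec concrete, here is a minimal pure-Python sketch of a sine-windowed MDCT and inverse MDCT with 50% overlap-add. It assumes a sine window, a frame length of 2N, a hop of N and a 2/N inverse scaling; the abstract does not state MDCTCodec's actual frame size, window or normalization, and the neural encoder, RVQ and decoder are omitted. The helper names (`mdct`, `imdct`, `analyze`, `synthesize`) are hypothetical.

```python
import math

# Assumed setup (not taken from the paper): sine window, frame 2N, hop N.


def sine_window(two_n):
    """Sine window of length 2N; satisfies w[n]**2 + w[n + N]**2 == 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]


def mdct(frame):
    """Direct O(N^2) MDCT: 2N windowed samples -> N coefficients."""
    two_n = len(frame)
    half = two_n // 2
    n0 = 0.5 + half / 2
    return [
        sum(frame[n] * math.cos(math.pi / half * (n + n0) * (k + 0.5))
            for n in range(two_n))
        for k in range(half)
    ]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples (2/N scaling assumed)."""
    half = len(coeffs)
    n0 = 0.5 + half / 2
    return [
        (2.0 / half) * sum(coeffs[k] * math.cos(math.pi / half * (n + n0) * (k + 0.5))
                           for k in range(half))
        for n in range(2 * half)
    ]


def analyze(signal, half):
    """Frame the signal (window 2N, hop N) and return one MDCT spectrum per frame."""
    w = sine_window(2 * half)
    spectra = []
    for start in range(0, len(signal) - 2 * half + 1, half):
        seg = signal[start:start + 2 * half]
        spectra.append(mdct([s * wi for s, wi in zip(seg, w)]))
    return spectra


def synthesize(spectra, half, length):
    """Window each IMDCT output and overlap-add; time-domain aliasing cancels between frames."""
    w = sine_window(2 * half)
    out = [0.0] * length
    for i, coeffs in enumerate(spectra):
        frame = imdct(coeffs)
        for n in range(2 * half):
            out[i * half + n] += w[n] * frame[n]
    return out


# Usage: N = 8 coefficients per frame, 64-sample test signal.
N = 8
x = [math.sin(0.37 * n) + 0.5 * math.cos(1.3 * n) for n in range(8 * N)]
spectra = analyze(x, N)
y = synthesize(spectra, N, len(x))
# Only samples covered by two overlapping frames (x[N:-N]) are fully reconstructed.
print(max(abs(a - b) for a, b in zip(x[N:-N], y[N:-N])))
```

All samples except the first and last N are reconstructed to floating-point precision, because the aliasing terms of neighbouring frames cancel. The direct O(N^2) sums are for readability only; practical implementations compute the MDCT through an FFT.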

Design and Implementation of Spike Coding Sound Sensor

International Conference on Sensing, Measurement & Data Analytics in the Era of Artificial Intelligence (ICSMD)

Authors: Fanxing Yang, Youdong Zhang, Yue Wu, Lingfei Mo; School of Instrument Science and Engineering, Southeast University, Nanjing, China

Compared with existing speech recognition technology, speech recognition based on spiking neural networks offers higher robustness, lower cost, lower power consumption, and a stronger biological basis. Because spiking neural networks depend heavily on spike signals as input, an audio coding method tailored to spiking neural networks is necessary. The audio coding circuit in this paper is based on the principle of the human cochlea and is built entirely from analog circuitry, which gives audio coding a faster response and higher robustness. Drawing on the principle of the human cochlea and on experience with existing audio coding methods, this paper designs a new spike coding circuit for audio signals. When an audio signal is fed into the circuit, each frequency channel outputs its own spike signal.

Keywords: Wiring; Costs; Power demand; audio coding; Welding; Neural networks; Speech recognition
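
The circuit itself is analog, but the behaviour described above (frequency-selective channels that each emit spikes) can be approximated in discrete time. The sketch below is a software analogue built on assumptions, not the authors' circuit: each channel is a band-pass biquad in the RBJ cookbook form, followed by half-wave rectification and a leaky integrate-and-fire neuron. The centre frequencies, Q, leak and threshold values, and all function names, are hypothetical.

```python
import math

# Software analogue (assumption), not the paper's analog circuit.


def bandpass_coeffs(f0, fs, q):
    """Band-pass biquad (RBJ audio-EQ-cookbook, 0 dB peak gain), normalised so a0 == 1."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)  # a1, a2
    return b, a


def biquad(x, b, a):
    """Direct-form-I biquad filter."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        y.append(yn)
        x1, x2 = xn, x1
        y1, y2 = yn, y1
    return y


def lif_spikes(drive, leak, threshold):
    """Leaky integrate-and-fire: return the sample indices at which a spike fires."""
    v, spikes = 0.0, []
    for n, d in enumerate(drive):
        v = leak * v + d
        if v >= threshold:
            spikes.append(n)
            v = 0.0  # reset the membrane after each spike
    return spikes


def spike_encode(signal, fs, centre_freqs, q=4.0, leak=0.99, threshold=5.0):
    """Cochlea-like spike coding: band-pass + rectifier + LIF neuron per channel."""
    channels = {}
    for f0 in centre_freqs:
        b, a = bandpass_coeffs(f0, fs, q)
        band = biquad(signal, b, a)
        drive = [max(v, 0.0) for v in band]  # half-wave rectification
        channels[f0] = lif_spikes(drive, leak, threshold)
    return channels


# Usage: 100 ms of a 1 kHz tone sampled at 16 kHz.
fs = 16000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs // 10)]
spikes = spike_encode(tone, fs, [250, 1000, 4000])
for f0, idx in spikes.items():
    print(f0, "Hz channel:", len(idx), "spikes")
```

With these settings the 1 kHz channel fires a dense spike train, while channels tuned far from the tone stay mostly silent, which is the per-channel behaviour the abstract describes.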

LOW-COMPLEXITY SEMI-PARAMETRIC JOINT-STEREO AUDIO TRANSFORM CODING

European Signal Processing Conference

Authors: Christian R. Helmrich, Andreas Niedermeier, Stefan Bayer, Bernd Edler; International Audio Laboratories Erlangen; Fraunhofer Institut für Integrierte Schaltungen (IIS)

ISBN: (print) 9781479988518

Traditional audio codecs based on real-valued transforms use separate and largely independent algorithmic schemes for the parametric coding of noise-like or high-frequency spectral components and of channel pairs. It is shown that, in the frequency-domain part of coders such as Extended HE-AAC, these schemes can be unified into a single algorithmic block located at the core of the modified discrete cosine transform (MDCT) path, enabling greater flexibility, such as semi-parametric coding, and large savings in codec delay and complexity. This paper focuses on the stereo coding aspect of this block and demonstrates that, by using specially chosen spectral configurations when deriving the parametric side information in the encoder, perceptual artifacts can be reduced and the spatial processing in the decoder can remain real-valued. Listening tests confirm the benefit of our proposal at intermediate bit-rates.

Keywords: audio coding; Decorrelation; MDCT; Stereo
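
As a rough illustration of parametric stereo side information that stays real-valued in the MDCT domain, the sketch below predicts the side signal from the mid signal with one least-squares coefficient per band and keeps a prediction residual only for selected bands (a semi-parametric split). This generic mid/side prediction scheme is an assumption for illustration; the paper's "specially chosen spectral configurations" and its actual side-information format are not reproduced, and the band edges and function names are hypothetical.

```python
# Generic mid/side prediction per band (illustrative assumption, not the paper's scheme).
# Band edges are assumed to start at 0 and end at the number of coefficients.


def band_ranges(band_edges):
    return list(zip(band_edges[:-1], band_edges[1:]))


def encode_stereo(left, right, band_edges, residual_bands):
    """Mid/side with one real-valued coefficient per band (S ~ alpha * M).

    A residual S - alpha * M is kept only for band indices in `residual_bands`;
    every other band is coded parametrically (alpha only).
    """
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    alphas, residuals = [], []
    for b, (lo, hi) in enumerate(band_ranges(band_edges)):
        m, s = mid[lo:hi], side[lo:hi]
        energy = sum(v * v for v in m)
        alpha = sum(sv * mv for sv, mv in zip(s, m)) / energy if energy > 0 else 0.0
        alphas.append(alpha)
        if b in residual_bands:
            residuals.append([sv - alpha * mv for sv, mv in zip(s, m)])
        else:
            residuals.append(None)
    return mid, alphas, residuals


def decode_stereo(mid, alphas, residuals, band_edges):
    """Rebuild S = alpha * M (+ residual where sent), then L = M + S and R = M - S."""
    left, right = [], []
    for (lo, hi), alpha, res in zip(band_ranges(band_edges), alphas, residuals):
        for i, mv in enumerate(mid[lo:hi]):
            sv = alpha * mv + (res[i] if res is not None else 0.0)
            left.append(mv + sv)
            right.append(mv - sv)
    return left, right


# Usage: a hypothetical 16-coefficient frame, two bands, residual sent for band 0 only.
left = [float(k % 5) - 2.0 for k in range(16)]
right = [0.5 * v for v in left]
mid, alphas, residuals = encode_stereo(left, right, [0, 8, 16], {0})
print(alphas)
```

Bands without a residual are reconstructed purely parametrically, so they are exact only when the side signal really is a scaled copy of the mid signal; bands with a residual are reconstructed exactly.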

An overview of machine learning and other data-based methods for spatial audio capture, processing, and reproduction

EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2022, Issue 1, Vol. 2022, p. 10

Authors: Cobos, Maximo; Ahrens, Jens; Kowalczyk, Konrad; Politis, Archontis; Univ Valencia, Dept Comp Sci, Burjassot 46100, Spain; Chalmers Univ Technol, Div Appl Acoust, S-41296 Gothenburg, Sweden; AGH Univ Sci & Technol, Inst Elect, PL-30059 Krakow, Poland; Tampere Univ, Dept Informat Technol & Commun Sci, FI-33720 Tampere, Finland

The domain of spatial audio comprises methods for capturing, processing, and reproducing audio content that contains spatial information. Data-based methods are those that operate directly on the spatial information carried by audio signals. This is in contrast to model-based methods, which impose spatial information from, for example, metadata like the intended position of a source onto signals that are otherwise free of spatial information. Signal processing has traditionally been at the core of spatial audio systems, and it continues to play a very important role. The irruption of deep learning in many closely related fields has put the focus on the potential of learning-based approaches for the development of data-based spatial audio applications. This article reviews the most important application domains of data-based spatial audio, including well-established methods that employ conventional signal processing, while paying special attention to the most recent achievements that make use of machine learning. Our review is organized according to the topology of the spatial audio pipeline, which consists of capture, processing/manipulation, and reproduction. The literature on the three stages of the pipeline is discussed, as well as the literature on the spatial audio representations used to transmit content between them, highlighting the key references and elaborating on the underlying concepts. We reflect on the literature by juxtaposing the prerequisites that made machine learning successful in domains other than spatial audio with those found in the domain of spatial audio today. Based on this, we identify routes that may facilitate future advancement.

Keywords: Spatial audio; Machine learning; Deep learning; Array processing; Ambisonics; Virtual reality; Binaural audio; audio coding; Scene analysis

ASSD: Synthetic Speech Detection in the AAC Compressed Domain

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Amit Kumar Singh Yadav, Ziyue Xiang, Emily R. Bartusiak, Paolo Bestagini, Stefano Tubaro, Edward J. Delp; Video and Image Processing Lab (VIPER), School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana, USA; Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milano, Italy

Synthetic human speech signals have become very easy to generate with modern text-to-speech methods. When these signals are shared on social media, they are often compressed using the Advanced Audio Coding (AAC) standard. Our goal is to study whether a small set of coding metadata contained in the AAC compressed bit stream is sufficient to detect synthetic speech. This would avoid decompressing the speech signals before analysis. We call our proposed method AAC Synthetic Speech Detection (ASSD). ASSD extracts information from the AAC compressed bit stream without decompressing the speech signal and analyzes this information using a transformer neural network. In our experiments, we compressed the ASVspoof2019 dataset according to the AAC standard at different data rates. We compared the performance of ASSD to a time-domain-based and a spectrogram-based synthetic speech detection method. We evaluated ASSD on approximately 71k compressed speech signals. The results show that our proposed method typically requires only 1000 bits per speech block/frame from the AAC compressed bit stream to detect synthetic speech. This is much lower than other reported methods. Our method also achieved a detection accuracy 9.7 percentage points higher than existing methods.

Keywords: Social networking (online); audio coding; Neural networks; Metadata; Transformers; Speech synthesis; Data mining

Fast Randomization for Distributed Low-Bitrate Coding of Speech and Audio

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, Issue 1, Vol. 26, pp. 19-30

Authors: Backstrom, Tom; Fischer, Johannes; Aalto Univ, Dept Signal Proc & Acoust, Espoo 02150, Finland; Int Audio Labs Erlangen, D-91058 Erlangen, Germany

Efficient coding of speech and audio in a distributed system requires that quantization errors across nodes are uncorrelated. Yet, with conventional methods at low bitrates, quantization levels become increasingly sparse, which does not correspond to the distribution of the input signal and, importantly, also reduces coding efficiency in a distributed system. We have recently proposed a distributed speech and audio codec design, which applies quantization in a randomized domain such that quantization errors are randomly rotated in the output domain. Similar to dithering, this ensures that quantization errors across nodes are uncorrelated and coding efficiency is retained. In this paper, we improve this approach by proposing faster randomization methods, with a computational complexity of O(N log N). The presented experiments demonstrate that the proposed randomizations yield uncorrelated signals, that perceptual quality is competitive, and that the complexity of the proposed methods is feasible for practical applications.

Keywords: Orthonormal matrix; superfast algorithm; randomization; distributed coding; speech coding; audio coding
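
One common way to obtain an O(N log N) random orthonormal rotation is a randomized Hadamard transform: flip signs with a seeded pseudo-random +/-1 diagonal, then apply a fast Walsh-Hadamard transform scaled by 1/sqrt(N). The sketch below uses that construction as an assumption; it is not necessarily the randomization proposed in the paper. Quantization happens in the rotated domain, and because the rotation is orthonormal the quantization-error energy is the same in both domains. The function names and the seed-per-node convention are hypothetical.

```python
import math
import random

# Randomized Hadamard rotation (assumed construction, not necessarily the paper's).


def fwht(vec):
    """Unnormalised fast Walsh-Hadamard transform, O(N log N); N must be a power of two."""
    v = list(vec)
    n = len(v)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v


def random_signs(n, seed):
    """Seeded +/-1 diagonal D; encoder and decoder of one node must share the seed."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def randomize(x, seed):
    """Orthonormal rotation y = H D x / sqrt(N)."""
    signs = random_signs(len(x), seed)
    scale = 1.0 / math.sqrt(len(x))
    return [scale * v for v in fwht([s * xi for s, xi in zip(signs, x)])]


def derandomize(y, seed):
    """Inverse rotation x = D H y / sqrt(N), since H is symmetric and H H = N I."""
    signs = random_signs(len(y), seed)
    scale = 1.0 / math.sqrt(len(y))
    return [s * scale * v for s, v in zip(signs, fwht(y))]


def quantize(y, step):
    """Uniform scalar quantiser applied in the randomized domain."""
    return [step * round(v / step) for v in y]


# One node: rotate, quantise, rotate back.
x = [math.sin(0.7 * k) for k in range(16)]
seed = 1  # hypothetical per-node seed
decoded = derandomize(quantize(randomize(x, seed), 0.5), seed)
```

In a distributed setting each node would use its own seed, so the quantization errors, once rotated back, differ pseudo-randomly from node to node, which is the dithering-like effect the abstract describes. This particular transform requires a power-of-two block length.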

Secure echo-hiding audio watermarking method based on improved PN sequence and robust principal component analysis

IET SIGNAL PROCESSING, 2020, Issue 4, Vol. 14, pp. 229-242

Authors: Wang, Shengbei; Wang, Chao; Yuan, Weitao; Wang, Lin; Wang, Jianming; Tianjin Polytech Univ, Tianjin Key Lab Autonomous Intelligence Technol S, Tianjin 300387, Peoples R China; Techfantasy Co Ltd, Tianjin 300387, Peoples R China

Echo hiding has been widely studied for audio watermarking. This study proposes a more secure echo-hiding method based on a modified pseudo-noise (PN) sequence and robust principal component analysis (RPCA). In the proposed method, RPCA is used to decompose the original audio signal into low-rank and sparse parts, and a pair of opposite modified PN sequences is then employed to embed watermarks. The modified PN sequence improves the robustness of watermark detection by providing additional correlation peaks. Meanwhile, benefiting from RPCA and the opposite PN sequences, the proposed method is more secure: watermarks cannot be detected from the whole signal even if the PN sequence is known, which is a clear improvement over previous PN-based echo-hiding methods. In the watermark detection process, the authors exploit the low-rank and sparse characteristics of the watermarked signal to detect watermarks from the low-rank and sparse parts, respectively. Based on this framework, they also propose a multi-bit embedding scheme, which doubles the embedding capacity compared with previous PN-based echo-hiding methods. The proposed method was evaluated with respect to inaudibility, security, and robustness. The experimental results verified the effectiveness of the proposed method.

Keywords: principal component analysis; image coding; audio watermarking; audio coding; watermark detection process; watermarked signal; secure echo-hiding audio watermarking method; improved PN sequence; robust principal component analysis; pseudonoise sequence; RPCA; audio signal; opposite modified PN sequences; PN-based echo-hiding methods; echo-hiding method
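
For context, the sketch below shows conventional PN-based (time-spread) echo hiding, the kind of baseline the paper improves on, not the proposed RPCA method. Each segment is convolved with a kernel delta[n] + s * alpha * p[n - d], where p is a +/-1 PN sequence and the bit selects s = +1 or s = -1 (a pair of opposite sequences). Detection correlates the segment autocorrelation at lags d to d + L - 1 with p; this is a simplification of the cepstrum-based detectors usually used and assumes a noise-like host. The delay, gain, segment length, seeds and function names are all hypothetical.

```python
import random

# Baseline time-spread echo hiding (illustrative assumption; no RPCA decomposition).


def pn_sequence(length, seed):
    """Hypothetical +/-1 pseudo-noise key shared by embedder and detector."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def embed(host, bits, seg_len, delay, pn, alpha):
    """y[n] = x[n] + s * alpha * sum_j pn[j] * x[n - delay - j], with s = +1 (bit 1) or -1 (bit 0).

    Each segment only echoes its own samples, so segments can be decoded independently.
    """
    out = list(host)
    for b, bit in enumerate(bits):
        start = b * seg_len
        sign = 1.0 if bit else -1.0
        for n in range(start, start + seg_len):
            echo = 0.0
            for j, p in enumerate(pn):
                m = n - delay - j
                if m >= start:
                    echo += p * host[m]
            out[n] = host[n] + sign * alpha * echo
    return out


def detect(signal, n_bits, seg_len, delay, pn):
    """Correlate the segment autocorrelation at lags delay..delay+len(pn)-1 with pn."""
    bits = []
    for b in range(n_bits):
        seg = signal[b * seg_len:(b + 1) * seg_len]
        score = 0.0
        for j, p in enumerate(pn):
            lag = delay + j
            score += p * sum(seg[n] * seg[n - lag] for n in range(lag, len(seg)))
        bits.append(1 if score > 0 else 0)
    return bits


# Usage with a noise-like host and hypothetical parameters.
rng = random.Random(7)
host = [rng.gauss(0.0, 1.0) for _ in range(4 * 4096)]
pn = pn_sequence(31, seed=42)
marked = embed(host, [1, 0, 0, 1], 4096, 100, pn, 0.1)
print(detect(marked, 4, 4096, 100, pn))
```

Because the echo energy is spread over L lags, the detection score grows roughly with alpha * L while, for a noise-like host, the interference grows only with sqrt(L), so a small alpha can still be detected. The security property the paper targets (no detection from the whole signal even when p is known) is not addressed by this baseline.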

On the Consumption of Multimedia Content Using Mobile Devices: a Year to Year User Case Study

ARCHIVES OF ACOUSTICS, 2020, Issue 2, Vol. 45, pp. 321-328

Authors: Falkowski-Gilski, Przemyslaw; Gdansk Univ Technol, Fac Elect Telecommun & Informat, Narutowicza 11-12, PL-80233 Gdansk, Poland

In the early days, consumption of multimedia content related to audio signals was only possible in a stationary manner. The music player was located at home and required a physical drive. The alternative was to attend a live performance at a concert hall or to host a private concert at home. In short, audio-visual experiences were reserved for a narrow group of recipients. Today, thanks to portable players, vision and sound are at last available to everyone. Moreover, thanks to multimedia streaming platforms, every music piece or video, e.g. from one's favourite artist or band, can be viewed anytime and anywhere. The background or status of an individual is no longer an issue: anyone connected to the global network can access the same resources. This paper focuses on the consumption of multimedia content using mobile devices. It describes a year-to-year user case study carried out between 2015 and 2019 and traces the development of current trends related to the expectations of modern users. The goal of this study is to aid policymakers, as well as providers, in designing and evaluating systems and services.

Keywords: audio coding; broadcasting; mobile devices; multimedia signal processing; streaming services

Temporal Tile Shaping for spectral gap filling in audio transform coding in EVS

IEEE International Conference on Acoustics, Speech and Signal Processing

Authors: S. Disch, C. Neukam, K. Schmidt; Fraunhofer Inst. for Integrated Circuits (IIS), Erlangen, Germany

ISBN: (print) 9781467369985

At low bitrates, next-generation audio coders apply waveform-preserving transform coding only to the perceptually most relevant parts of the signal. The resulting spectral gaps are filled in the decoder through techniques like Intelligent Gap Filling (IGF). IGF is currently being standardized in MPEG-H 3D Audio and also in 3GPP Enhanced Voice Service (EVS). In IGF processing, spectral tiles are copied from a spectral source location into a target location and subsequently adapted by parameter-steered post-processing to best match relevant properties of the original signal. Important properties include the spectral and temporal envelope. Since IGF operates on Modified Discrete Cosine Transform (MDCT) spectra of rather long time blocks, temporal envelope shaping is not trivial. In this paper, Temporal Tile Shaping (TTS) is presented. TTS is based on linear prediction in the MDCT domain for shaping the temporal structure of the gap-filling signal in the target tiles with sub-block granularity. A listening test demonstrates the advantage of the proposed method.

Keywords: audio coding; Enhanced Voice Service; Intelligent Gap Filling; Noise Filling; Temporal Noise Shaping
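
TTS is described as linear prediction across MDCT coefficients, in the spirit of Temporal Noise Shaping. The sketch below shows that general mechanism under stated assumptions: an order-p predictor is fitted across the coefficients of one tile with the autocorrelation method and the Levinson-Durbin recursion; an FIR prediction-error filter produces the residual, and the matching all-pole filter, run along frequency, restores it. Running the all-pole filter over a tile is what lets a temporal envelope be imposed on gap-filling content. Sub-block granularity, filter quantization and IGF tile copying are not modelled, and the function names are hypothetical.

```python
# TNS-style linear prediction along frequency (illustrative assumption, not the EVS code).


def autocorr(c, max_lag):
    """Autocorrelation across frequency (autocorrelation method, no lag window)."""
    return [sum(c[n] * c[n + k] for n in range(len(c) - k)) for k in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin: predictor a with c[k] ~ sum_i a[i] * c[k - 1 - i]; returns (a, error)."""
    a = []
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate autocorrelation; stop at the current order
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(len(a)))
        k = acc / err  # reflection coefficient
        a = [a[j] - k * a[len(a) - 1 - j] for j in range(len(a))] + [k]
        err *= 1.0 - k * k
    return a, err


def analysis_filter(c, a):
    """FIR prediction-error filter along frequency: e[k] = c[k] - prediction."""
    return [
        c[k] - sum(a[i] * c[k - 1 - i] for i in range(len(a)) if k - 1 - i >= 0)
        for k in range(len(c))
    ]


def synthesis_filter(e, a):
    """All-pole inverse filter along frequency; exactly undoes analysis_filter."""
    c = []
    for k in range(len(e)):
        c.append(e[k] + sum(a[i] * c[k - 1 - i] for i in range(len(a)) if k - 1 - i >= 0))
    return c


# Usage: a hypothetical 64-coefficient tile with strong correlation across frequency.
tile = [0.9 ** k for k in range(64)]
coeffs, _ = levinson(autocorr(tile, 2), 2)
residual = analysis_filter(tile, coeffs)
restored = synthesis_filter(residual, coeffs)
```

The residual carries much less energy than the tile whenever neighbouring MDCT coefficients are correlated, which in the time domain corresponds to a pronounced temporal envelope within the block.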

Advancement of 22.2 Multichannel Sound Broadcasting Based on MPEG-H 3D Audio

IEEE TRANSACTIONS ON BROADCASTING, 2020, Issue 2, Vol. 66, pp. 365-371

Authors: Sugimoto, Takehiro; Aoki, Shuichi; Hasegawa, Tomomi; Komori, Tomoyasu; NHK Japan Broadcasting Corp, Sci & Technol Res Labs, Tokyo 1578510, Japan

This study proposes improvements to the 22.2 multichannel (22.2 ch) sound broadcasting service. 22.2 ch sound is currently used in 8K satellite broadcasting in Japan. In this study, the audio system is migrated from channel-based audio to object-based audio. Object-based audio equips 22.2 ch sound with alternative and adaptive functionalities: the alternative functionality relates to dialogue controls such as multilingual services, while the adaptive functionality enables 22.2 ch sound to be adapted to the audio format of the playback equipment. Moving Picture Experts Group (MPEG)-H 3D Audio (3DA), the latest audio coding standard, is used as the audio coding scheme. A real-time encoder and decoder based on 3DA were developed to verify the practicability of the proposed system. The encoded audio data are packetized and transmitted by MPEG-H MPEG Media Transport (MMT) to be multiplexed with video data. A transmission experiment with 8K video was carried out, which showed that the proposed system operates as designed.

Keywords: Transform coding; audio systems; audio coding; Satellite broadcasting; Rendering (computer graphics); Real-time systems; 22.2 multichannel sound system; terrestrial broadcasting; object-based audio; MPEG-H 3D audio; MPEG-H MMT
