Search Results - Inner Mongolia University Library

VOICED SPEECH CODING AT VERY-LOW BIT RATES BASED ON FORWARD-BACKWARD WAVEFORM PREDICTION

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1995, Vol. 3, No. 1, pp. 40-47

Authors: YANG, G; LEICH, H; BOITE, R. Affiliations: Lernout and Hauspie Speech Products N. V., Belgium; Faculté Polytechnique de Mons, Laboratory T. C. T. S., Mons, Belgium

Techniques for coding voiced speech at very low bit rates are investigated and a new algorithm, designed to produce high quality speech with low complexity, is proposed. This algorithm encodes and transmits partial representative waveforms (RW's) from which the complete speech waveforms are reconstructed by using a method called forward-backward waveform prediction (FBWP). The RW is encoded at 20-30 ms intervals with a low complexity approach, taking into account the special initial conditions of short- and long-term filters. The basic idea of FBWP is essentially consistent with that of the PWI algorithm, which was reported to be capable of producing high-quality voiced speech at a bit rate of between 3.0 and 4.0 kb/s. By implementing the FBWP in the time domain, fast computation is made possible while high-quality speech can be obtained at a bit rate of about 3 kb/s. As in the PWI method, the proposed algorithm may be combined with an LP-based speech coder which uses a noise-like excitation to reproduce unvoiced speech.

Keywords: Speech coding; Bit rate; Vocoders; Distortion; Algorithm design and analysis; Filters; Reverberation; Quantization; Signal generators

Scalable and Efficient Neural Speech Coding: A Hybrid Design

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, Vol. 30, pp. 12-25

Authors: Zhen, Kai; Sung, Jongmo; Lee, Mi Suk; Beack, Seungkwon; Kim, Minje. Affiliations: Indiana Univ, Dept Comp Sci, Bloomington, IN 47408 USA; Indiana Univ, Cognit Sci Program, Bloomington, IN 47408 USA; Elect & Telecommun Res Inst, Daejeon 34129, South Korea; Indiana Univ, Dept Intelligent Syst Engn, Bloomington, IN 47408 USA

We present a scalable and efficient neural waveform coding system for speech compression. We formulate the speech coding problem as an autoencoding task, where a convolutional neural network (CNN) performs encoding and decoding as a neural waveform codec (NWC) during its feedforward routine. The proposed NWC also defines quantization and entropy coding as a trainable module, so the coding artifacts and bitrate control are handled during the optimization process. We achieve efficiency by introducing compact model components to NWC, such as gated residual networks and depthwise separable convolution. Furthermore, the proposed models have a scalable architecture, cross-module residual learning (CMRL), to cover a wide range of bitrates. To this end, we employ the residual coding concept to concatenate multiple NWC autoencoding modules, where each NWC module performs residual coding to restore any reconstruction loss that its preceding modules have created. CMRL can scale down to cover lower bitrates as well, for which it employs a linear predictive coding (LPC) module as its first autoencoder. The hybrid design integrates LPC and NWC by redefining LPC's quantization as a differentiable process, so that the system can be trained end to end. The decoder of the proposed system has either one NWC (0.12 million parameters) in the low to medium bitrate range (12 to 20 kbps) or two NWCs at the high bitrate (32 kbps). Although the decoding complexity is not yet as low as that of conventional speech codecs, it is significantly reduced from that of other neural speech coders, such as a WaveNet-based vocoder. For wide-band speech coding quality, our system yields comparable or superior performance to AMR-WB and Opus on TIMIT test utterances at low and medium bitrates. The proposed system can scale up to higher bitrates to achieve near transparent performance.

Keywords: Speech coding; Bit rate; Encoding; Decoding; Vocoders; Complexity theory; Speech codecs; Neural speech coding; Waveform coding; Representation learning; Model complexity
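
The residual coding idea in this abstract, where each module codes whatever its predecessors failed to reconstruct, can be sketched in a few lines. This is only an illustration of the cascade structure: the uniform quantizers below are hypothetical stand-ins for the trained NWC autoencoders, and the names `uniform_codec` and `cascade_reconstruct` are assumptions, not taken from the paper.

```python
import numpy as np

def uniform_codec(step):
    """Hypothetical stand-in for one coding module: round to multiples of `step`."""
    def codec(signal):
        return np.round(signal / step) * step
    return codec

def cascade_reconstruct(x, modules):
    """Run the modules in sequence; each one codes the residual left so far."""
    x = np.asarray(x, dtype=float)
    recon = np.zeros_like(x)
    for codec in modules:
        residual = x - recon             # what earlier modules did not capture
        recon = recon + codec(residual)  # add this module's coded residual
    return recon

x = np.array([0.37, -1.24, 0.90])
one_stage = cascade_reconstruct(x, [uniform_codec(0.5)])
two_stage = cascade_reconstruct(x, [uniform_codec(0.5), uniform_codec(0.1)])
# The second module only has to code the small error left by the first one.
print(np.max(np.abs(x - one_stage)), np.max(np.abs(x - two_stage)))
```

Adding a module spends more bits to shrink the remaining error, which is the sense in which such a cascade scales across bitrates.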

A NEW EFFICIENT ALGORITHM TO COMPUTE THE LSP PARAMETERS FOR SPEECH CODING

SIGNAL PROCESSING, 1992, Vol. 28, No. 2, pp. 201-212

Authors: SAOUDI, S; BOUCHER, JM; LEGUYADER, A. Affiliations: Département Mathématiques et Systèmes de Communications, ENST-Br, BP 832, 29285 Brest, France; Département Codage et Modèles de Communications, CNET, Route de Trégastel, BP 40, 22301 Lannion, France

In this paper, the split Levinson algorithm is used to develop an efficient algorithm to compute the Line Spectrum Pairs (LSP) in Linear Predictive Coding (LPC) of speech. We propose two new real functions defined from the reciprocal and antireciprocal parts of the predictor polynomials obtained from the split Levinson algorithm. These functions are shown to obey three-term recurrence relations. Thus the LSP parameters are directly available from the eigenvalues of tridiagonal matrices, the entries of which are computed from only one version of the split Levinson algorithm. When compared with other existing methods, this algorithm is better in terms of complexity.

Keywords: SPLIT LEVINSON ALGORITHM; LSP; PARCOR; LPC; LOW BIT-RATE; EIGENVALUE; TRIDIAGONAL MATRIX; SPEECH ANALYSIS; SPEECH CODING
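
For orientation, the LSP parameters named in this abstract can be illustrated with the textbook construction, which is not the split Levinson method of the paper. The predictor polynomial A(z) is split into a reciprocal part P(z) and an antireciprocal part Q(z), and the LSP frequencies are the angles of their roots on the unit circle. The sketch below assumes NumPy and a hypothetical helper name `lpc_to_lsf`; it relies on generic polynomial root finding, which is simpler but costlier than the tridiagonal eigenvalue route the abstract describes.

```python
import numpy as np

def lpc_to_lsf(a):
    """Line spectral frequencies (radians, ascending) of A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    `a` holds [1, a1, ..., ap] and is assumed to describe a minimum-phase predictor.
    """
    a = np.asarray(a, dtype=float)
    ext = np.concatenate([a, [0.0]])   # pad to degree p + 1
    p_poly = ext + ext[::-1]           # reciprocal (symmetric) part P(z)
    q_poly = ext - ext[::-1]           # antireciprocal (antisymmetric) part Q(z)
    lsf = []
    for poly in (p_poly, q_poly):
        angles = np.angle(np.roots(poly))
        # Keep one root per conjugate pair and drop the trivial roots at z = 1 and z = -1.
        lsf.extend(w for w in angles if 1e-6 < w < np.pi - 1e-6)
    return np.sort(np.array(lsf))

# Second-order example: P(z) and Q(z) factor by hand, giving cos(w) = 0.63 and 0.27.
lsf = lpc_to_lsf([1.0, -0.9, 0.64])
print(np.cos(lsf))
```

For a stable predictor the roots of P(z) and Q(z) interlace on the unit circle, which is why the sorted frequencies alternate between the two polynomials.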

An I-phone system design and implementation with a portable speech coding coprocessor

IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 1997, Vol. 43, No. 4, pp. 1262-1269

Authors: Chen, RX; Chen, LG; Chen, MJ; Tsai, TH. Affiliation: Natl Taiwan Univ, Dept Elect Engn, Taipei 10764, Taiwan

This paper presents a high quality, low bit rate, and portable Internet-phone system. The system consists of a mixed implementation of software and hardware. The hardware includes a portable box that can be plugged into the conventional parallel port. Three major parts are considered in this box: the speech compression unit, the host interface, and the speakerphone module. A low-cost non-delicate speech coprocessor is embedded to handle the heavy job of speech coding, a CPLD device is employed to control the host access timing, and a 16-bit PCM CODEC and an audio amplifier with acoustic echo cancellation features are introduced to optimize the speakerphone module. The experimental coding rate is 8.5 kbps. At this rate, popular modems can offer full-duplex speech in real time. Applications of this system target digital simultaneous voice and data (DSVD), such as voice chat in networked games and videoconferencing.

Keywords: Videoconferencing; Computer software; Audio amplifiers; Computer hardware; Coprocessors; Speech coding; Host interface; Speech; Computer utility; System design

A novel hybrid feature method based on Caelen auditory model and gammatone filterbank for robust speaker recognition under noisy environment and speech coding distortion

MULTIMEDIA TOOLS AND APPLICATIONS, 2023, Vol. 82, No. 11, pp. 16195-16212

Authors: Krobba, Ahmed; Debyeche, Mohamed; Selouani, Sid Ahmed. Affiliations: Univ USTHB, Fac Elect Engn, Speech Commun & Signal Proc Lab, Algiers, Algeria; Univ Moncton, LARIHS Lab, Campus Shappaing, Moncton, NB, Canada

Currently, the majority of state-of-the-art speaker recognition systems use short-term cepstral feature extraction approaches to parameterize the speech signals. In this paper, we propose new auditory features for speaker recognition, based on the Caelen auditory model, which simulates the external, middle and inner parts of the ear, and on a Gammatone filter. The features are called Caelen Auditory Model Gammatone Cepstral Coefficients (CAMGTCC). The proposed features are evaluated on the TIMIT and NIST 2008 corpora. Speech coding distortion is represented by the Adaptive Multi-Rate Wideband (AMR-WB) codec, and noisy conditions are created at various SNR levels using noises taken from NOISEX-92. The speaker recognition systems use GMM-UBM and i-vector-GPLDA modelling. The experimental results demonstrate that the proposed feature extraction method performs better than the Gammatone Cepstral Coefficients (GTCC) and Mel Frequency Cepstral Coefficients (MFCC) features. For speech coding distortion, the proposed feature extraction improves robustness to codec-degraded speech at different bit rates. In addition, when the test speech signals are corrupted with noise at SNRs ranging from 0 dB to 15 dB, we observe that CAMGTCC achieves an overall equal error rate (EER) reduction of 10.88% to 6.8% relative, compared to the baselines.

Keywords: Speaker recognition; Caelen auditory model; Gammatone filter; Speech coding; Noise environment; GMM-UBM; I-vector; G-PLDA

Latent-Domain Predictive Neural Speech Coding

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, Vol. 31, pp. 2111-2123

Authors: Jiang, Xue; Peng, Xiulian; Xue, Huaying; Zhang, Yuan; Lu, Yan. Affiliations: Commun Univ China, Sch Informat & Commun Engn, Beijing 100024, Peoples R China; Microsoft Res Asia, Beijing 100080, Peoples R China; Commun Univ China, State Key Lab Media Convergence & Commun, Beijing 100024, Peoples R China

Neural audio/speech coding has recently demonstrated its capability to deliver high quality at much lower bitrates than traditional methods. However, existing neural audio/speech codecs employ either acoustic features or learned blind features with a convolutional neural network for encoding, by which there are still temporal redundancies within encoded features. This article introduces latent-domain predictive coding into the VQ-VAE framework to fully remove such redundancies and proposes the TF-Codec for low-latency neural speech coding in an end-to-end manner. Specifically, the extracted features are encoded conditioned on a prediction from past quantized latent frames so that temporal correlations are further removed. Moreover, we introduce a learnable compression on the time-frequency input to adaptively adjust the attention paid to main frequencies and details at different bitrates. A differentiable vector quantization scheme based on distance-to-soft mapping and Gumbel-Softmax is proposed to better model the latent distributions with rate constraint. Subjective results on multilingual speech datasets show that, with low latency, the proposed TF-Codec at 1 kbps achieves significantly better quality than Opus at 9 kbps, and TF-Codec at 3 kbps outperforms both EVS at 9.6 kbps and Opus at 12 kbps. Numerous studies are conducted to demonstrate the effectiveness of these techniques.

Keywords: Speech coding; Predictive coding; Decoding; Bit rate; Codecs; Termination of employment; Audio coding; Neural audio/speech coding; Auto-encoder; Predictive coding

Low bit-rate speech coding based on an improved sinusoidal model

SPEECH COMMUNICATION, 2001, Vol. 34, No. 4, pp. 369-390

Authors: Ahmadi, S; Spanias, AS. Affiliations: Arizona State Univ, Dept Elect Engn, Ctr Telecommun Res, Tempe, AZ 85287 USA; Nokia Mobile Phones Inc, San Diego, CA 92131 USA

This paper addresses the design, implementation and evaluation of efficient low bit-rate speech coding algorithms based on an improved sinusoidal model. A series of algorithms were developed for speech classification and pitch frequency determination, modeling of sinusoidal amplitudes and phases, and frame interpolation. An improved paradigm for sinusoidal phase coding is presented, where short-time sinusoidal phases are modeled using a combination of linear prediction, spectral sampling, linear phase alignment and all-pass phase error correction components. A class-dependent split vector quantization scheme is used to encode the sinusoidal amplitudes. The masking properties of the human auditory system are effectively exploited in the algorithms. The algorithms were successfully integrated into a 2.4 kbps sinusoidal coder. The performance of the 2.4 kbps coder was evaluated in terms of informal subjective tests such as the mean opinion score (MOS) and the diagnostic rhyme test (DRT), as well as some perceptually motivated objective distortion measures. Performance analysis on a large speech database indicates considerable improvement in short-time signal matching both in the time and the spectral domains. In addition, subjective quality of the reproduced speech is considerably improved. (C) 2001 Elsevier Science B.V. All rights reserved.

Keywords: Speech coding; Sinusoidal model; Phase modeling; Speech classification; Linear prediction; Frame interpolation

Efficient Search and Design Procedures for Robust Multi-Stage VQ of LPC Parameters for 4 kb/s Speech Coding

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1993, Vol. 1, No. 4, pp. 373-385

Authors: LeBlanc, W. P.; Bhattacharya, B.; Mahmoud, S. A.; Cuperman, V. Affiliations: Carleton Univ, Dept Syst & Comp Engn, Ottawa, ON K1S 5B6, Canada; Simon Fraser Univ, Sch Engn Sci, Burnaby, BC V5A 1S6, Canada

This paper presents a tree-searched multi-stage vector quantization scheme for LPC parameters which achieves spectral distortion lower than 1 dB with low complexity and good robustness using rates as low as 22 bits/frame. The M-L search is used and it is shown that it achieves performance close to that of the optimal search for a relatively small M. A new joint codebook design strategy for multi-stage VQ is presented which improves convergence speed and the VQ performance measures. The best performance/complexity trade-offs are obtained with relatively small size codebooks cascaded in a 3-4 stage configuration. It is shown experimentally that as the number of stages is increased above the optimal performance/complexity trade-off, the quantizer robustness and outlier performance can be improved at the expense of a slight increase in rate. Results for LAR and LSP parameters are presented. A training technique that reduces outliers at the expense of a slight average performance degradation is introduced. The robustness across different languages, input spectral shapings, and in the presence of independent random channel errors is studied. Experimental results show that tree-searched multi-stage VQ significantly outperforms the split codebook approach.

Keywords: Robustness; Linear predictive coding; Speech coding; Vector quantization; Computational complexity; Convergence; Velocity measurement; Noise shaping; Communication channels; Councils
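
The tree search mentioned in this abstract can be sketched as follows: at every stage, each surviving partial path is extended by every codeword of that stage, and only the M lowest-distortion paths are kept. The code is a minimal illustration assuming NumPy and plain squared error; the function name `ml_search` and the toy one-dimensional codebooks are hypothetical and are not the codebooks or distortion measure of the paper.

```python
import numpy as np

def ml_search(x, codebooks, m):
    """Multi-stage VQ search keeping the m best partial paths per stage.

    Returns (index path, reconstruction, squared error) of the best path found.
    """
    x = np.asarray(x, dtype=float)
    survivors = [([], np.zeros_like(x))]          # (index path, partial reconstruction)
    scored = []
    for codebook in codebooks:
        scored = []
        for path, recon in survivors:
            for index, codeword in enumerate(codebook):
                candidate = recon + codeword      # stage outputs are summed
                error = float(np.sum((x - candidate) ** 2))
                scored.append((error, path + [index], candidate))
        scored.sort(key=lambda item: item[0])     # rank by distortion only
        scored = scored[:m]
        survivors = [(path, recon) for _, path, recon in scored]
    error, path, recon = scored[0]
    return path, recon, error

# Toy example where the greedy choice at stage 1 is not the best overall.
x = np.array([1.0])
codebooks = [np.array([[0.6], [1.5]]), np.array([[-0.5], [0.5]])]
greedy_path, _, greedy_err = ml_search(x, codebooks, m=1)
wider_path, _, wider_err = ml_search(x, codebooks, m=2)
print(greedy_path, greedy_err, wider_path, wider_err)
```

With m=1 the search is the purely sequential one. Keeping two survivors lets the second stage rescue a first-stage choice that looked worse in isolation, which is the effect behind the abstract's claim that a small M comes close to the optimal search.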

On memoryless quantization in speech coding

IEEE SIGNAL PROCESSING LETTERS, 1996, Vol. 3, No. 8, pp. 228-230

Authors: Kleijn, WB; Hagen, R. Affiliation: ERICSSON RADIO SYST, SPEECH CODING RES, S-16480 STOCKHOLM, SWEDEN

In memoryless quantization, neither the encoder nor the decoder has memory, and quantization noise shaping is not used. We show that, by constraining the parameter dynamics during quantization at the encoder, the performance of speech coders can be enhanced significantly without adding to the delay. The proposed method retains the advantages of memoryless quantization, including channel-error robustness.

Keywords: Quantization; Speech coding; Decoding; Noise shaping; Bit rate; Distortion; Steady-state; Speech enhancement; Added delay; Noise robustness

REDUCING THE COMPLEXITY AND STORAGE OF CELP SPEECH CODING USING A SELF-ORTHOGONAL CODEBOOK

ELECTRONICS LETTERS, 1993, Vol. 29, No. 10, pp. 928-930

Authors: LAW, KW; LEUNG, WF; CHAN, CF. Affiliation: Department of Electrical Engineering, City Polytechnic of Hong Kong, Kowloon, Hong Kong

An algorithm is proposed to reduce the complexity and memory requirement of code-excited linear prediction (CELP) speech coding. The new algorithm is based on the concept of designing a special codebook such that each codeword is orthogonal to its shifted versions. With this orthogonal property, the algorithm significantly reduces the codeword search complexity of CELP coding. In addition, by rearranging the codewords, only 12.5% of the conventional codebook storage is required. Both segmental SNR and informal listening showed that the performance of the algorithm is equivalent to that of the original CELP algorithm.

Keywords: Speech coding; Signal processing
