Writer identification is carried out using handwritten text. The feature vector is derived by means of morphologically processing the horizontal profiles (projection functions) of the words. The projections are derived and processed in segments in order to increase the discrimination efficiency of the feature vector. An extensive study of the statistical properties of the feature space is provided. Both Bayesian classifiers and neural networks are employed to test the efficiency of the proposed feature. The achieved identification success using a long word exceeds 95%. (C) 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
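As a rough illustration of the idea of a segmented, morphologically processed projection profile, the following minimal sketch computes a column-wise ink count for a binary word image, splits it into segments, and measures how much of each segment a 1-D morphological opening removes. The abstract does not specify the morphological operator, the structuring element, the segmentation scheme, or even the projection direction, so the opening, the width of 3, the two equal segments, and the column-wise projection are all assumptions, and every function name is hypothetical.

```python
def horizontal_profile(image):
    """Column-wise ink count of a binary image given as a list of rows (1 = ink).

    Assumption: "horizontal profile" is read here as the projection onto
    the horizontal axis; the abstract does not define it.
    """
    return [sum(column) for column in zip(*image)]


def erode(profile, width):
    """1-D erosion with a flat structuring element (sliding minimum)."""
    half = width // 2
    n = len(profile)
    return [min(profile[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def dilate(profile, width):
    """1-D dilation with a flat structuring element (sliding maximum)."""
    half = width // 2
    n = len(profile)
    return [max(profile[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def opening(profile, width=3):
    """Morphological opening: erosion followed by dilation."""
    return dilate(erode(profile, width), width)


def segment_features(profile, n_segments=2, width=3):
    """One feature per segment: the profile area removed by the opening.

    This particular feature is an illustrative choice, not the paper's.
    """
    size = len(profile) // n_segments
    features = []
    for k in range(n_segments):
        start = k * size
        end = len(profile) if k == n_segments - 1 else start + size
        segment = profile[start:end]
        features.append(sum(segment) - sum(opening(segment, width)))
    return features


# Toy 3x6 binary "word" image (hypothetical data).
word = [
    [0, 1, 0, 0, 1, 1],
    [1, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
profile = horizontal_profile(word)
features = segment_features(profile, n_segments=2, width=3)
print(profile, features)
```

Processing the profile per segment keeps local stroke structure that a single global statistic would average away, which is one plausible reading of why segmentation helps discrimination.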
ISBN: (print) 9781728197944
This paper proposes a new complex autoencoder suitable for learning spectrally efficient, constant envelope waveform coding. In contrast to prior work, we model the encoder output layer as a phase modulation layer with a complex exponential activation function. In addition, we model the decoder with a complex-valued feature detection layer that may be coherent or noncoherent. The complex topology leads to noncoherent waveform coding methods not obtained in prior studies. The paper provides a mathematical framework for training the proposed autoencoder along with illustrative examples that demonstrate its ability to learn improved spectral efficiency relative to traditional orthogonal and biorthogonal modulations.
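To make the two modelling ideas concrete, here is a minimal sketch of a phase modulation output with a complex exponential activation, followed by a correlation detector that can operate coherently or noncoherently. It is not the paper's trained autoencoder: the two-entry codebook, the phase values, the 2.0 rad channel rotation, and all function names are assumptions chosen for illustration, and the coherent/noncoherent rules shown are the textbook ones (real part versus magnitude of the correlation).

```python
import cmath
import math


def phase_modulate(phases):
    """Complex exponential activation: every output sample has unit magnitude."""
    return [cmath.exp(1j * p) for p in phases]


def correlate(template, received):
    """Complex inner product of the received samples with a template."""
    return sum(r * t.conjugate() for t, r in zip(template, received))


def detect(codebook, received, coherent=True):
    """Return the index of the best-matching template.

    Coherent detection scores by the real part of the correlation (it
    assumes the carrier phase is known); noncoherent detection scores
    by the magnitude, which ignores a common phase rotation.
    """
    scores = []
    for template in codebook:
        c = correlate(template, received)
        scores.append(c.real if coherent else abs(c))
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical codebook: two orthogonal constant-envelope waveforms.
codebook = [
    phase_modulate([0.0, 0.0, 0.0, 0.0]),
    phase_modulate([0.0, math.pi / 2, math.pi, 3 * math.pi / 2]),
]

# Message 1 sent through a channel with an unknown phase rotation.
rotated = [s * cmath.exp(1j * 2.0) for s in codebook[1]]

print(detect(codebook, rotated, coherent=False))  # magnitude-based decision
print(detect(codebook, rotated, coherent=True))   # real-part decision
```

Because the activation is a pure phase term, the envelope is constant by construction rather than by a training penalty; the noncoherent score is the natural choice when the receiver cannot track the carrier phase.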
This paper presents an overview of research activities in Japan in the field of very low bit-rate video coding. Related research based on the concept of "intelligent image coding" started in the mid-1980s. Although this concept originated from the consideration of a new type of image coding, it can also be applied to other interesting applications such as human interfaces and psychology. On the other hand, since the beginning of the 1990s, research on the improvement of waveform coding has been actively pursued to realize very low bit-rate video coding. The key techniques employed here are improved motion compensation and the adoption of region segmentation. In addition, we propose new concepts of image coding that have the potential to open up new aspects of the field, e.g. interactive image coding, integrated 3-D visual communication, and coding of multimedia information that considers the mutual relationships among various media.
We present a scalable and efficient neural waveform coding system for speech compression. We formulate the speech coding problem as an autoencoding task, where a convolutional neural network (CNN) performs encoding and decoding as a neural waveform codec (NWC) during its feedforward routine. The proposed NWC also defines quantization and entropy coding as a trainable module, so the coding artifacts and bitrate control are handled during the optimization process. We achieve efficiency by introducing compact model components to NWC, such as gated residual networks and depthwise separable convolution. Furthermore, the proposed models have a scalable architecture, cross-module residual learning (CMRL), to cover a wide range of bitrates. To this end, we employ the residual coding concept to concatenate multiple NWC autoencoding modules, where each NWC module performs residual coding to restore any reconstruction loss that its preceding modules have created. CMRL can scale down to cover lower bitrates as well, for which it employs a linear predictive coding (LPC) module as its first autoencoder. The hybrid design integrates LPC and NWC by redefining LPC's quantization as a differentiable process, so that the system can be trained end to end. The decoder of the proposed system uses either one NWC (0.12 million parameters) in the low to medium bitrate range (12 to 20 kbps) or two NWCs at the high bitrate (32 kbps). Although the decoding complexity is not yet as low as that of conventional speech codecs, it is significantly reduced from that of other neural speech coders, such as a WaveNet-based vocoder. For wide-band speech coding quality, our system yields comparable or superior performance to AMR-WB and Opus on TIMIT test utterances at low and medium bitrates. The proposed system can scale up to higher bitrates to achieve near-transparent performance.
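The cascaded residual coding idea behind CMRL can be sketched without any neural network: each module codes whatever error the preceding modules left behind, and the decoder sums the module outputs. In the minimal sketch below, uniform scalar quantizers with decreasing step sizes stand in for the NWC modules; that substitution, the step sizes, the toy signal, and all function names are assumptions for illustration only.

```python
def make_quantizer(step):
    """Stand-in for one codec module: a uniform scalar quantizer (assumption)."""
    def codec(signal):
        return [step * round(x / step) for x in signal]
    return codec


def cascade_reconstruct(signal, codecs):
    """Each module codes the residual left by the modules before it."""
    reconstruction = [0.0] * len(signal)
    residual = list(signal)
    for codec in codecs:
        decoded = codec(residual)
        reconstruction = [r + d for r, d in zip(reconstruction, decoded)]
        residual = [x - r for x, r in zip(signal, reconstruction)]
    return reconstruction


def max_error(signal, reconstruction):
    """Largest absolute reconstruction error."""
    return max(abs(x - r) for x, r in zip(signal, reconstruction))


signal = [0.83, -0.41, 0.27, 0.05]  # toy samples (hypothetical)
coarse = make_quantizer(0.5)
fine = make_quantizer(0.1)

err_one = max_error(signal, cascade_reconstruct(signal, [coarse]))
err_two = max_error(signal, cascade_reconstruct(signal, [coarse, fine]))
print(err_one, err_two)
```

Adding a module spends extra bits only on the remaining error, which is why a cascade of this kind can scale across bitrates by switching modules on or off.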
In this work, we develop a new method for quantization in multistage audio coding. Given a (perceptual) distortion measure and a bit-rate constraint, we analytically derive the optimal rate distribution between subcoders (stages) and the corresponding optimal quantizers using high-rate theory. The analytical solutions for optimal quantizers allow a coder to easily adapt to changes in bit-rate requirements. As an illustration of the new method, we consider quantization in a two-stage sinusoidal/waveform coder, a widely used combination in audio coding. We show that at low total rates most of the rate should be assigned to the sinusoidal (model-based, subspace) subcoder, while at high total rates most of the rate should be assigned to the waveform (full-space) subcoder. We compare the new method to a reference quantization method that does not use rate-distortion optimization. A listening test shows significantly higher performance for the new method.
ISBN: (print) 0780343654
The primary motivation of the paper is to investigate waveform coding of speech signals. The paper presents a new signal analysis tool, the nonlinear discrete Fourier transform (NDFT), which offers improved signal analysis performance. By virtue of the NDFT, waveform coding of speech signals with long segments (e.g., segments of 512 or 1024 samples) is studied. The new coding method provides improved speech coding performance at rates as low as 4 kbit/s, and the features of the reproduced signal are preserved better than with linear predictive coding.
ISBN: (print) 0780362934
In this paper, an encoding technique called Hi-BIN (High Band Injection), which can be combined with any narrowband coder to achieve good-quality wideband speech, is described. The principle behind this technique is to model frequencies above 4 kHz by noise with an appropriate spectral shape. This simple way of injecting synthetic noise in the higher frequencies gives surprisingly good quality when compared to widely used, computationally intensive waveform coding techniques such as CELP. We show that Hi-BIN offers a low bit-rate representation of the higher band and is backwards compatible with existing narrowband speech coding systems.
An iterative descent algorithm based on a Lagrangian formulation for designing vector quantizers having minimum distortion subject to an entropy constraint is discussed. These entropy-constrained vector quantizers (ECVQs) can be used in tandem with variable-rate noiseless coding systems to provide locally optimal variable-rate block source coding with respect to a fidelity criterion. Experiments on sampled speech and on synthetic sources with memory indicate that for waveform coding at low rates (about 1 bit/sample) under the squared error distortion measure, about 1.6 dB improvement in the signal-to-noise ratio can be expected over the best scalar and lattice quantizers when block entropy-coded with block length 4. Even greater gains are made over other forms of entropy-coded vector quantizers. For pattern recognition, it is shown that the ECVQ algorithm is a generalization of the k-means and related algorithms for estimating cluster means, in that the ECVQ algorithm estimates the prior cluster probabilities as well. Experiments on multivariate Gaussian distributions show that for clustering problems involving classes with widely different priors, the ECVQ outperforms the k-means algorithm in both likelihood and probability of error.
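The Lagrangian design loop described above can be sketched as follows. Each iteration assigns every sample to the codeword minimizing squared error plus lam times the codeword's ideal code length (minus the base-2 logarithm of its probability), then re-estimates both centroids and cluster probabilities. This is a minimal sketch under stated assumptions: the function names, the fixed iteration count, the uniform initial probabilities, the handling of empty cells, and the toy data are all illustrative choices, not details taken from the paper.

```python
import math


def sq_dist(x, c):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def lagrangian_cost(x, codeword, prob, lam):
    """Distortion plus lam times the ideal code length -log2(prob)."""
    if prob <= 0.0:
        return math.inf  # a codeword with zero probability is never chosen
    return sq_dist(x, codeword) - lam * math.log2(prob)


def ecvq(data, codebook, lam, iterations=20):
    """Iterative descent on distortion + lam * entropy (illustrative sketch)."""
    k = len(codebook)
    codebook = [tuple(c) for c in codebook]
    probs = [1.0 / k] * k  # assumption: start from uniform priors
    for _ in range(iterations):
        # Assignment step: minimize the Lagrangian cost per sample.
        cells = [[] for _ in range(k)]
        for x in data:
            costs = [lagrangian_cost(x, codebook[i], probs[i], lam) for i in range(k)]
            cells[costs.index(min(costs))].append(x)
        # Update step: re-estimate priors and centroids.
        for i, cell in enumerate(cells):
            probs[i] = len(cell) / len(data)
            if cell:  # an empty cell keeps its old codeword
                codebook[i] = tuple(sum(col) / len(cell) for col in zip(*cell))
    return codebook, probs


# Toy 1-D data with unequal cluster sizes (hypothetical).
data = [(0.0,), (0.1,), (-0.1,), (0.05,), (-0.05,), (0.0,), (1.0,), (1.1,)]
init = [(0.2,), (0.8,)]

cb_kmeans, p_kmeans = ecvq(data, init, lam=0.0)   # lam = 0: nearest-neighbor rule
cb_entropy, p_entropy = ecvq(data, init, lam=10.0)  # heavy rate penalty
print(cb_kmeans, p_kmeans)
print(cb_entropy, p_entropy)
```

With lam set to 0 the assignment rule reduces to the nearest-neighbor rule of k-means, while the loop still estimates the cluster priors; a large lam trades distortion for rate and, on this toy data, drives the rarer codeword out of use.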
Failures of rolling bearings usually cause the breakdown of rotating machinery. Bearing fault diagnosis is therefore receiving increasing attention. In this paper, a new coding-statistic feature is proposed for bearing fault diagnosis. First, a waveform coding matrix (WCM) is drawn from each signal using a coding algorithm, and a statistical feature is then extracted from the WCM with a pre-defined dictionary. Second, all statistical features are processed using two-dimensional principal component analysis (2DPCA) to reduce redundant information and dimensionality. Finally, a nearest neighbor classifier (NNC) is employed to classify the bearing faults. Two bearing fault classification problems are used to demonstrate the effectiveness of the proposed scheme. Experimental results show that the proposed scheme achieves excellent performance.
Neural fields, also known as coordinate-based representations, are an emerging signal representation framework. This approach has also been used to represent audio signals, but the generated audio often contains noise...