New refinement schemes for voice conversion are proposed in this paper. We take mel-frequency cepstral coefficients (MFCC) as the basic feature and adopt cepstral mean subtraction to compensate for channel effects. We propose an S/U/V (silence/unvoiced/voiced) decision rule so that two sets of codebooks capture the difference between the unvoiced and voiced segments of the source speaker. Moreover, we apply three schemes to refine the synthesized voice: pitch refinement with PSOLA, energy equalization, and frame concatenation based on synchronized pitch marks. An ABX listening test and MOS grades demonstrate the satisfactory performance of the voice conversion system.
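As an illustration of the cepstral mean subtraction step mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes the MFCC frames of one utterance are available as a list of equal-length coefficient lists; the function name and the toy values are hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean taken over all frames of one utterance.

    A stationary channel adds a roughly constant offset to each cepstral
    coefficient, so removing the utterance-level mean removes that offset.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each coefficient index across the whole utterance.
    means = [sum(frame[k] for frame in frames) / n_frames for k in range(n_coeffs)]
    return [[frame[k] - means[k] for k in range(n_coeffs)] for frame in frames]


# Toy data (hypothetical): 3 frames with 2 coefficients each.
mfcc = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
# The same frames with a constant offset standing in for a channel effect.
mfcc_shifted = [[c0 + 5.0, c1 - 3.0] for c0, c1 in mfcc]

normalized = cepstral_mean_subtraction(mfcc)
normalized_shifted = cepstral_mean_subtraction(mfcc_shifted)
```

In this toy case both inputs normalize to the same frames, which is the property the compensation relies on.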
In this paper, new theoretical expressions are derived for the reference shear LPC and for all reference NLPC related to shear stresses, which allow us to characterize the influence of temperature and of doping. A simulator, PIEZOSIM, was developed in which computations of the primed LPC and NLPC involve reference piezoresistance coefficients related to uniaxial and/or shear stresses. This simulator appears to be a convenient tool for investigating the advantages of piezoresistive elements with special orientations.
This work presents a two-tier approach that applies intensity contours and formant tracks sequentially for accurate Arabic phoneme identification. The recognition system is based on data sets of 40 speakers for each Arabic phonetic sound. As a first step towards phoneme recognition, the sound is sampled and then preprocessed to obtain formant frequencies and intensity contours. To automate the intensity- and formant-based feature extraction, a generalized regression neural network has been implemented, trained, and validated on 21 input features.
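For readers unfamiliar with generalized regression neural networks, the sketch below shows the standard GRNN prediction rule (a Gaussian-kernel weighted average of the training targets). It is only an illustration under stated assumptions: the abstract does not give the network's configuration, so the smoothing parameter `sigma`, the one-hot targets, and the two-feature toy data are all hypothetical (the paper uses 21 input features).

```python
import math


def grnn_predict(x, train_x, train_y, sigma=0.5):
    """Predict the output for x as a kernel-weighted average of training targets."""
    # Pattern layer: one Gaussian weight per stored training sample.
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)) / (2.0 * sigma ** 2))
        for xi in train_x
    ]
    total = sum(weights)
    n_out = len(train_y[0])
    if total == 0.0:
        # All weights underflowed (x is far from every sample): fall back
        # to the plain average of the targets.
        return [sum(y[j] for y in train_y) / len(train_y) for j in range(n_out)]
    # Summation and output layers: normalized weighted sum of the targets.
    return [sum(w * y[j] for w, y in zip(weights, train_y)) / total for j in range(n_out)]


# Toy data (hypothetical): two feature vectors with one-hot class targets.
train_x = [[0.0, 0.0], [1.0, 1.0]]
train_y = [[1.0, 0.0], [0.0, 1.0]]

scores = grnn_predict([0.1, 0.1], train_x, train_y)
predicted_class = scores.index(max(scores))
```

With one-hot targets the outputs sum to one and can be read as class scores; a smaller `sigma` makes the prediction follow the nearest training sample more closely.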
This paper presents the recognition of speech commands using a modified neural-fuzzy network. An improved genetic algorithm is proposed to train the parameters of the network. As an application example, the proposed speech recognition approach is implemented experimentally in an electronic book to illustrate the design and its merits.
In this paper we present a neural network for detecting fish in light detection and ranging (LIDAR) data and describe a classification method for distinguishing between the water layer, the bottom, and fish. Four multi-layer perceptrons (MLPs) were developed to classify samples into these three classes. The LIDAR data give a sequence of backscatter intensities obtained from laser shots at various heights above the Earth's surface. The data are preprocessed to remove high-frequency noise, and a window of the sample is then selected for feature extraction. We use linear predictive coding (LPC) analysis to extract the features. The results show that the detection technique is effective and performs the required classification with a high degree of accuracy. We tested our approach with four different MLPs and present the results obtained from each of them.
A general packet loss correction/concealment signal recovery framework is proposed for parametric speech coders. Both redundancy-based forward error correction (FEC) and receiver-only (RO) techniques are considered in conjunction with the adaptive multi-rate (AMR) coder. The robust, low-bit-rate, high communications speech quality Manchester pitch synchronous (MPS) coder is employed in the proposed systems as a secondary coding process. Thus the performance of AMR/MPS-FEC/RO packetised speech transmission systems is considered. Subjective and objective computer simulation results clearly indicate the superiority of the proposed schemes over conventional AMR-based systems, particularly at relatively high (>5%) packet loss rates.
The use of clusters in unstructured peer-to-peer systems such as Gnutella is effective for reducing the traffic generated by broadcasts. Introducing semi-structured peer-to-peer systems, this paper first details the modification required for Gnutella nodes to form locality-proximate clusters, which group nodes by geographical closeness. It then presents generic general clusters, which support multiple clustering criteria and a non-fixed number of clusters. Two clustering topologies are proposed for general clusters, and the characteristics of the topology formation are investigated through simulation. The use of caching in topology formation is also investigated.
When distributed generation (e.g., PV) is installed for feeder imbalance, it is difficult to maintain a proper voltage range. In such cases, a loop distribution system has an advantage with respect to voltage fluctuation. A loop power flow controller (LPC) can be expected to control loop distribution systems without any increase in short-circuit current. In this paper, we describe the relationship between loop power flow control and voltage characteristics for distributed generation, and propose a simple control method that uses local voltage information. Our simulation results show that the proposed control method for the LPC balances the power flow and regulates the voltage with stable operation.
The residual excited linear prediction (RELP) vocoder is a speech codec with good voice quality and a moderate bit rate of 9.6 kbps for digital communication. However, the RELP vocoder does not actively use the parameterized speech information to identify speech content and determine what word was spoken. This paper proposes a method for classifying the vowels in human speech with the RELP vocoder. The method analyzes the frequency response of the LPC filter whose parameters are obtained from a segment of the vowel in the speech signal. Using the average frequency-response vectors for each vowel sound, the mutual Euclidean distances among the vowels were studied. The clear separation observed among the vowels suggests that speech recognition capability could be added to the RELP vocoder.
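The comparison described above (frequency response of the LPC filter, per-vowel average vectors, Euclidean distance) can be sketched as follows. This is a rough illustration rather than the paper's procedure: the sign convention of the all-pole filter, the number of frequency points, and the order-2 coefficient sets for the two vowels are all assumptions made for the example.

```python
import cmath
import math


def lpc_magnitude_response(a, n_points=64):
    """Magnitude of H(z) = 1 / (1 - sum_k a[k] z^-(k+1)) sampled on [0, pi).

    The sign convention is an assumption; some texts define A(z) with plus signs.
    """
    response = []
    for i in range(n_points):
        w = math.pi * i / n_points
        denom = 1.0 - sum(ak * cmath.exp(-1j * w * (k + 1)) for k, ak in enumerate(a))
        response.append(1.0 / abs(denom))
    return response


def mean_vector(vectors):
    """Element-wise average of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Hypothetical LPC coefficient sets from segments of two vowels (stable filters).
vowel_a_segments = [[1.2, -0.8], [1.1, -0.75]]
vowel_i_segments = [[0.4, -0.7], [0.5, -0.72]]

avg_a = mean_vector([lpc_magnitude_response(a) for a in vowel_a_segments])
avg_i = mean_vector([lpc_magnitude_response(a) for a in vowel_i_segments])
distance_a_i = euclidean(avg_a, avg_i)
```

A large distance between the averaged responses of two vowels, relative to the spread within each vowel, is what would indicate that the vowels are separable.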
Feature extraction from a speech representation is one of the processes in speech recognition. Parametric modeling is a dominant approach to modeling speech signals. Within a localized interval, the speech representation is equivalent to the noise-driven output of an all-pole system that can be estimated using linear prediction. Besides the characteristics of speech itself, the temporal variability of the speech signal model is also due to the computation of the linear prediction coefficients. Thus, an alternative representation based on the Gabor coefficients is proposed. In this paper, a comparison is made with the linear prediction coefficients to show the consistency of the parameters generated for use in the speech recognition system.
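To make the all-pole estimation step concrete, here is a minimal sketch of one common way to compute linear prediction coefficients: the autocorrelation method solved with the Levinson-Durbin recursion. The abstract does not say which estimation method the paper uses, so this choice, the function names, and the toy autocorrelation values are assumptions.

```python
def autocorrelation(signal, max_lag):
    """Biased (unnormalized) autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(signal)
    return [
        sum(signal[t] * signal[t - lag] for t in range(lag, n))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the normal equations for the predictor x[t] ~ sum_k a[k] * x[t-1-k].

    Returns (a, err), where err is the final prediction error energy.
    Assumes r[0] > 0 and that the error stays positive (no perfectly
    predictable input), otherwise the division below fails.
    """
    a = []
    err = r[0]
    for i in range(order):
        # Reflection coefficient for model order i + 1.
        acc = r[i + 1] - sum(a[k] * r[i - k] for k in range(i))
        k_i = acc / err
        # Update the lower-order coefficients, then append the new one.
        a = [a[k] - k_i * a[i - 1 - k] for k in range(i)] + [k_i]
        err *= 1.0 - k_i * k_i
    return a, err


# Toy autocorrelation of a first-order process with coefficient 0.5 (hypothetical).
r = [1.0, 0.5, 0.25]
coeffs, error = levinson_durbin(r, order=2)
```

For this toy sequence the recursion recovers a first-order model: the second coefficient comes out as zero because the data carry no second-order structure.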