Recently various techniques to improve the correlation model of feature vector elements in speech recognition systems have been proposed. Such techniques include semi-tied covariance hidden Markov models (HMMs) and systems based on factor analysis. All these schemes have been shown to improve the speech recognition performance without dramatically increasing the number of model parameters compared to standard diagonal covariance Gaussian mixture HMMs. This paper introduces a general form of acoustic model, the factor analysed HMM. A variety of configurations of this model and parameter sharing schemes, some of which correspond to standard systems, were examined. An EM algorithm for the parameter optimisation is presented along with a number of methods to increase the efficiency of training. The performance of FAHMMs on medium to large vocabulary continuous speech recognition tasks was investigated. The experiments show that without elaborate complexity control an equivalent or better performance compared to a standard diagonal covariance Gaussian mixture HMM system can be achieved with considerably fewer parameters. (C) 2003 Elsevier Ltd. All rights reserved.
The test of independence for 2 x 2 contingency tables with nonignorable nonresponses is discussed. A dependency assumption between the two observed outcomes is required to achieve identification in many models with nonignorable nonresponses in the analysis of 2 x 2 contingency tables (e.g., [Ma, W.-Q., Geng, Z., Li, X.-T., 2003. Identification of nonresponse mechanisms for two-way contingency tables. Behaviormetrika 30, 125-144]). The assumption is, however, violated under the null hypothesis when implementing the test of independence. In this article, we introduce a new, simple assumption to achieve identification. The assumption involves pre-specified parameters. EM algorithms for finding the MLE are numerically unstable when there are nonlinear constraints, which are created by models treating nonignorable nonresponses. In the analysis of contingency tables, estimated values often fall outside the admissible region. We propose a new EM-type algorithm to stably calculate the constrained MLE, and apply it to the test of independence for a real data set (crime data). We compare the empirical performance of several testing procedures for independence. It turns out that the new EM-type algorithm works well for calculating the MLE, and that the nonignorable model with correctly specified parameters performs best, while the conventional chi-square test of independence works fairly well. (c) 2008 Elsevier B.V. All rights reserved.
A vital research concern for a personalised recommender system is to target items in the long tail. Studies have shown that sales on e-commerce platforms have a long-tail character, and niche items in the long tail are difficult to include in the recommendation list. Niche items are defined by the niche market, which is a small market segment, whereas traditional recommendation algorithms focus on promoting popular items and therefore do not apply to the niche market. In this article, we aim to find the best users for each niche item and propose a topic-based hierarchical Bayesian linear regression model for niche item recommendation. We first identify niche items and build niche item subgroups based on descriptive information of items. We then learn a hierarchical Bayesian linear regression model for each niche item subgroup. Finally, we predict the relevance between users and niche items to provide recommendations. We perform a series of validation experiments on the Yahoo Movies dataset and compare the performance of our approach with a set of representative baseline recommender algorithms. The results demonstrate the superior performance of our recommendation approach for niche items.
Both autofocusing performance and imaging resolution degrade when the traditional autofocusing and range-Doppler algorithms are applied to bistatic inverse synthetic aperture radar (Bi-ISAR) with sparse apertures. A Bi-ISAR sparse imaging algorithm based on a complex Gaussian scale mixture (CGSM) prior is proposed to jointly achieve high-resolution imaging and autofocusing. First, a sparse basis matrix with the time-varying bistatic angle is constructed to represent the sparse echo data, and a joint Bi-ISAR imaging and autofocusing model is established based on compressed sensing from sparse apertures. Second, the elements of the target image and the noise are assumed to follow a CGSM prior and a Gaussian distribution, respectively. Finally, the sparse image reconstruction and phase autofocusing are accomplished by the variational Bayesian expectation-maximisation method. The proposed algorithm, with full Bayesian inference, can obtain a well-focused image without manual adjustment of regularisation parameters. Meanwhile, it can avoid local minima and structural errors by utilising the statistical information of the posterior. Simulated results on electromagnetic numerical data verify the superiority of the algorithm in autofocusing, sparse imaging and noise suppression performance.
Remote sensing images are widely used in areas ranging from mineral exploration to agricultural applications, and poor quality of hyperspectral (HS) images directly has an adverse effect on these applications. In this study, a method is proposed to restore degraded HS images. To achieve this aim, another multispectral (MS) observation of the same scene is assumed to be available, and restoration is carried out by fusing the HS and MS images. The proposed method obtains a maximum a posteriori estimate and is based on the expectation-maximisation algorithm. Deblurring and denoising are performed separately. Deblurring is done in the spatial domain via non-overlapping blocks, whereas denoising is implemented in the wavelet domain. To represent the coefficients in the wavelet domain, a Gaussian scale mixture is exploited instead of a multinormal model. The proposed method is validated on the airborne visible/infrared imaging spectrometer (AVIRIS) and HS digital imagery collection experiment (HYDICE) databases. Experimental results indicate that the proposed method outperforms state-of-the-art techniques cited in the literature, and that the signal-to-noise ratio is improved by as much as 15.71 dB for the Moffett database and 16.26 dB for the HYDICE database.
This study addresses the single-cell achievable rate and the multi-cell sum rate of orthogonal frequency division multiplexing (OFDM) index modulation (IM). The single-cell achievable rate of OFDM-IM with Gaussian input is calculated using a multi-ary symmetric channel. Then, the cumulative distribution function of multi-cell OFDM-IM is investigated by stochastic geometry. Furthermore, it is proved in this study that the probability density function of noise plus inter-cell interference in multi-cell OFDM-IM with quadrature-amplitude modulation follows a mixture of Gaussians (MoG) distribution. Next, the parameters of the MoG distribution are estimated using a simplified expectation-maximisation algorithm. An upper bound on the sum rate of multi-cell OFDM-IM is derived. Finally, analytic and simulated results are compared and discussed.
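As an illustration of estimating mixture-of-Gaussians parameters with expectation-maximisation, the following is a minimal sketch of the textbook EM updates for a one-dimensional mixture. It is not the simplified variant used in the study above, whose details the abstract does not give; the function names `gaussian_pdf` and `em_gmm_1d`, the initial values and the variance floor are all assumptions made for this example.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(samples, means, variances, weights, n_iter=50):
    """Textbook EM for a 1-D Gaussian mixture (hypothetical helper).

    `means`, `variances` and `weights` are the initial guesses, one entry
    per mixture component. Returns the updated (means, variances, weights).
    """
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample.
        resp = []
        for x in samples:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(joint) or 1e-300  # guard against numerical underflow
            resp.append([p / total for p in joint])
        # M-step: re-estimate parameters from the weighted samples.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component received no mass; keep old parameters
            weights[j] = nj / len(samples)
            means[j] = sum(r[j] * x for r, x in zip(resp, samples)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, samples)) / nj
            variances[j] = max(var, 1e-6)  # assumed floor to avoid collapse
    return means, variances, weights
```

In practice the result depends on the initial guesses, so several restarts or a clustering-based initialisation are commonly used.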
This study proposes a novel data-based approach for estimating the parameters of a stochastic hybrid model describing the traffic flow in an urban traffic network with signalized intersections. The model represents the evolution of the traffic flow rate, measured as the number of vehicles passing a given location per time unit. This traffic flow rate is described using a mode-dependent first-order autoregressive (AR) stochastic process. The parameters of the AR process take different values depending on the mode of traffic operation - free flowing, congested or faulty - making this a hybrid stochastic process. Mode switching occurs according to a first-order Markov chain. This study proposes an expectation-maximization (EM) technique for estimating the transition matrix of this Markovian mode process and the parameters of the AR models for each mode. The technique is applied to actual traffic flow data from the city of Jakarta, Indonesia. The model thus obtained is validated using smoothed inference algorithms and an online particle filter. The authors also develop an EM parameter estimation scheme that, in combination with a time-window shift technique, can be useful and practical for periodically updating the parameters of the hybrid model, leading to an adaptive traffic flow state estimator.
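To make the model structure concrete, here is a minimal sketch of a mode-dependent AR(1) process whose mode follows a first-order Markov chain, together with a counting estimate of the transition matrix. The counting step assumes the mode sequence is observed, which is a simplification: in the study the modes are hidden and EM is used instead. The names `simulate_switching_ar1` and `estimate_transition_matrix`, and the parameter layout `(c, phi, sigma)`, are assumptions for illustration only.

```python
import random


def simulate_switching_ar1(n_steps, trans, ar_params, rng, x0=0.0):
    """Simulate x_t = c_m + phi_m * x_{t-1} + sigma_m * noise, mode m Markovian.

    `trans[i][j]` is the probability of switching from mode i to mode j and
    `ar_params[m]` is a (c, phi, sigma) tuple for mode m (hypothetical layout).
    The chain is assumed to start from mode 0.
    """
    mode, x = 0, x0
    modes, flows = [], []
    for _ in range(n_steps):
        # Draw the next mode from the row of the current mode.
        mode = rng.choices(range(len(trans)), weights=trans[mode])[0]
        c, phi, sigma = ar_params[mode]
        x = c + phi * x + rng.gauss(0.0, sigma)
        modes.append(mode)
        flows.append(x)
    return modes, flows


def estimate_transition_matrix(modes, n_modes):
    """Row-normalised transition counts from an observed mode sequence."""
    counts = [[0] * n_modes for _ in range(n_modes)]
    for a, b in zip(modes, modes[1:]):
        counts[a][b] += 1
    estimate = []
    for row in counts:
        total = sum(row)
        # Fall back to a uniform row if a mode was never left.
        estimate.append([c / total if total else 1.0 / n_modes for c in row])
    return estimate
```

With hidden modes, the counts above would be replaced by expected counts computed from smoothed mode posteriors, which is the role of the E-step.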
Head pose estimation is a key task for visual surveillance, HCI and face recognition applications. In this paper, a new approach is proposed for estimating 3D head pose from a monocular image. The approach assumes the full perspective projection camera model. Our approach employs general prior knowledge of face structure and the corresponding geometrical constraints provided by the location of a certain vanishing point to determine the pose of human faces. To achieve this, the eye-lines, formed from the far and near eye corners, and the mouth-line through the mouth corners are assumed parallel in 3D space. The vanishing point of these parallel lines, found as the intersection of the eye-line and mouth-line in the image, can then be used to infer the 3D orientation and location of the human face. In order to deal with the variance of the facial model parameters, e.g. the ratio between the eye-line and the mouth-line, an EM framework is applied to update the parameters. We first compute the 3D pose using some initially learnt parameters (such as ratio and length) and then adapt the parameters statistically for individual persons and their facial expressions by minimizing the residual errors between the projection of the model feature points and the actual features in the image. In doing so, we assume every facial feature point can be associated with each of the feature points in the 3D model with some a posteriori probability. The expectation step of the EM algorithm provides an iterative framework for computing the a posteriori probabilities using Gaussian mixtures defined over the parameters. A robustness analysis of the algorithm on synthetic data and on some real images with known ground truth is included. (C) 2006 Elsevier B.V. All rights reserved.
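The geometric step described above, finding the vanishing point as the intersection of the eye-line and the mouth-line in the image, can be sketched with homogeneous coordinates. This is only an illustration of the line-intersection computation, not the paper's pose estimator; the function names `line_through` and `intersect` and the example pixel coordinates are hypothetical.

```python
def line_through(p, q):
    """Homogeneous line (a, b, c) with a*x + b*y + c = 0 through points p and q.

    This is the cross product of (x1, y1, 1) and (x2, y2, 1).
    """
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)


def intersect(l1, l2, eps=1e-12):
    """Intersection of two homogeneous lines, or None if they are parallel.

    Image lines that stay parallel have their vanishing point at infinity,
    which this sketch reports as None.
    """
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    w = a1 * b2 - a2 * b1
    if abs(w) < eps:
        return None
    return ((b1 * c2 - b2 * c1) / w, (c1 * a2 - c2 * a1) / w)


# Made-up image coordinates for the eye corners and mouth corners.
eye_line = line_through((0.0, 0.0), (4.0, 1.0))
mouth_line = line_through((0.0, 4.0), (4.0, 4.0))
vanishing_point = intersect(eye_line, mouth_line)
```

For a near-frontal face the two image lines are almost parallel, so the intersection is numerically ill-conditioned and the `eps` threshold matters.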
Multimodal image-to-image translation is a class of vision and graphics problems where the goal is to learn a one-to-many mapping between the source domain and target domain. Given an image in the source domain, the model aims to produce as many diverse results as possible. It is an important and challenging problem in the task of image translation. To this end, recent works utilise Gaussian vectors to produce diverse results, but the results differ only slightly. This is because of the special probabilistic nature of the Gaussian distribution. In this work, the authors propose linearly distributed latent codes, instead of conventional Gaussian vectors, to control the style of generated images. Taking advantage of the linear distribution, their model can produce much more diverse results and outperform the state-of-the-art baselines in terms of diversity. Qualitative and quantitative comparisons against baselines demonstrate the effectiveness and superiority of their method.
The two-dimensional (2D) geometry of extended objects is modelled as a discrete constellation of a small number of scattering centres. Each scattering centre is characterised by its 2D location and extension, as well as by its observability from aspect angles spanning a wide angular interval. The input information for the modelling consists of high range resolution (HRR) data sets, which are measured by a network of scanning surveillance radars from a limited set of aspect angles of the object. During the short time-on-target of the antennas within each radar scan, inverse synthetic aperture radar (ISAR) imaging is performed. The expectation-maximisation (EM) algorithm is applied in order to fit each ISAR image to a Gaussian mixture model (GMM), pertaining to the aspect angle applicable to the respective radar. A merging algorithm for clustering all the partial GMMs into a unique 2D model of the extended object is designed. Cluster validation criteria are applied for correct alignment of the multi-radar multi-scan object model estimates. A Mixture of Gaussians classifier (MOGC), which is based on representation of each object by the estimated 2D model, is discussed. Results from application of the algorithms to two measured HRR radar data sets are presented.