In this paper, we study estimation and inference for a class of semiparametric mixtures of partially linear models. We prove that the proposed models are identifiable under mild conditions, and then give an estimation procedure, the PL-EM algorithm, based on profile likelihood. The asymptotic properties of the resulting estimators and the ascent property of the PL-EM algorithm are investigated. Furthermore, we develop a test statistic for testing whether the nonparametric component has a linear structure. Monte Carlo simulations and a real data application illustrate the proposed procedures.
Global maps of total-column carbon dioxide (CO2) mole fraction (in units of parts per million) are important tools for climate research since they provide insights into the spatial distribution of carbon intake and emissions as well as their seasonal and annual evolution. Currently, the two main remote sensing instruments for total-column CO2 are the Orbiting Carbon Observatory-2 (OCO-2) and the Greenhouse gases Observing SATellite (GOSAT), both of which produce estimates of CO2 concentration, called profiles, at 20 different pressure levels. Operationally, each profile estimate is then convolved into a single estimate of column-averaged CO2 using a linear pressure weighting function. This total-column CO2 is then used for subsequent analyses such as Level 3 map generation and colocation for validation. In principle, total-column CO2 in these applications may be more efficiently estimated by making optimal estimates of the vector-valued CO2 profiles and applying the pressure weighting function afterwards. These estimates will be more efficient if there is multivariate dependence between CO2 values in the profile. In this article, we describe a methodology that uses a modified Spatial Random Effects model to account for the multivariate nature of the data fusion of OCO-2 and GOSAT. We show that multivariate fusion of the profiles has improved mean squared error relative to scalar fusion of the column-averaged CO2 values from OCO-2 and GOSAT. The computations scale linearly with the number of data points, making the methodology suitable for the typically massive remote sensing datasets. Furthermore, the methodology properly accounts for differences in instrument footprint, measurement-error characteristics, and data coverage.
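As a rough illustration of the step in which a 20-level profile is collapsed into one column-averaged value by a linear pressure weighting function, the following minimal Python sketch computes a normalized weighted sum. The function name `column_average`, the uniform weights, and the synthetic profile are assumptions made for illustration only; they are not the operational OCO-2 or GOSAT weighting functions or data.

```python
import numpy as np

def column_average(profile_ppm, pressure_weights):
    """Collapse a CO2 profile into one value with a linear weighting.

    Both arguments are 1-D sequences of equal length (one entry per
    pressure level). The weights are normalized to sum to one, so the
    result is a weighted mean in the same units as the profile (ppm).
    """
    profile = np.asarray(profile_ppm, dtype=float)
    weights = np.asarray(pressure_weights, dtype=float)
    if profile.shape != weights.shape:
        raise ValueError("profile and weights must have the same shape")
    weights = weights / weights.sum()
    return float(weights @ profile)

# Hypothetical 20-level profile and uniform weights (illustrative only).
profile = np.linspace(395.0, 405.0, 20)
weights = np.ones(20)
xco2 = column_average(profile, weights)
print(f"column-averaged CO2: {xco2:.2f} ppm")
```

Because the weighting is linear, applying it to a fused profile estimate is a single dot product, which is why the abstract can consider fusing the vector-valued profiles first and weighting afterwards.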
Cutaneous melanoma is thought to be triggered by intense, occasional exposure to ultraviolet radiation, either from the sun or tanning beds, especially in people who are genetically predisposed to the disease. When skin cells are damaged by ultraviolet light in this way, often showing up as a sunburn, they are more prone to genetic defects that cause them to rapidly multiply and form potentially fatal (malignant) tumors. Melanoma originates in a type of skin cell called a melanocyte; such cells help produce the pigments of our skin, hair, and eyes. We propose a new cure rate survival regression model for predicting cutaneous melanoma. We assume that the unknown number of competing causes that can influence the survival time is governed by a power series distribution and that the time until the tumor cells are activated follows the Pareto IV distribution. The parameter estimation is based on the EM algorithm, which for this model can be implemented in a computationally simple way. Simulation studies are presented, showing the good performance of the proposed estimation procedure. Finally, two real applications related to cutaneous melanoma and melanoma data sets are presented.
The Hidden Markov Model (HMM) is one of the mainstays of statistical modeling of discrete time series, with applications including speech recognition, computational biology, computer vision and econometrics. Estimating an HMM from its observation process is often addressed via the Baum-Welch algorithm, which is known to be susceptible to local optima. In this paper, we first give a general characterization of the basin of attraction associated with any global optimum of the population likelihood. By exploiting this characterization, we provide non-asymptotic finite sample guarantees on the Baum-Welch updates and show geometric convergence to a small ball of radius on the order of the minimax rate around a global optimum. As a concrete example, we prove a linear rate of convergence for a hidden Markov mixture of two isotropic Gaussians given a suitable mean separation and an initialization within a ball of large radius around (one of) the true parameters. To our knowledge, these are the first rigorous local convergence guarantees to global optima for the Baum-Welch algorithm in a setting where the likelihood function is nonconvex. We complement our theoretical results with thorough numerical simulations studying the convergence of the Baum-Welch algorithm and illustrating the accuracy of our predictions.
We consider the problem of estimating the lifetime distributions of survival times subject to a general censoring scheme called "middle censoring". The lifetimes are assumed to follow a parametric family of distributions, such as the Gamma or Weibull distributions, and the approach is applied to cases in which the lifetimes come with covariates affecting them. For any individual in the sample, there is an independent, random censoring interval. We observe the actual lifetime if it falls outside this censoring interval; otherwise, we observe only the interval of censoring. This censoring mechanism, which includes both right- and left-censoring, has been called "middle censoring" (see Jammalamadaka and Mangalam, 2003). Maximum-likelihood estimation of the parameters, as well as the large-sample properties of the estimators, is studied under this censoring scheme, including the case when covariates are available. We conclude with an application to a dataset from Environmental Economics dealing with Contingent Valuation of natural resources.
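The observation mechanism described above can be sketched in a few lines of Python. This is only an illustration of the censoring rule, not the estimation method of the paper: the function name `middle_censor`, the closed-interval convention, and the Weibull and exponential distributions used to simulate lifetimes and intervals are all assumptions chosen for the example.

```python
import numpy as np

def middle_censor(lifetimes, left, right):
    """Apply middle censoring to a sample of lifetimes.

    A lifetime is observed exactly when it falls outside its censoring
    interval [left, right]; otherwise only the interval is observed.
    Returns (exact, censored), where `exact` holds the observed
    lifetimes (NaN where censored) and `censored` is a boolean mask.
    """
    t = np.asarray(lifetimes, dtype=float)
    lo = np.asarray(left, dtype=float)
    hi = np.asarray(right, dtype=float)
    # Assumption: the interval is treated as closed; for continuous
    # lifetimes the endpoints have probability zero either way.
    censored = (t >= lo) & (t <= hi)
    exact = np.where(censored, np.nan, t)
    return exact, censored

# Hypothetical simulation: Weibull lifetimes, random censoring intervals.
rng = np.random.default_rng(0)
n = 1000
t = rng.weibull(1.5, size=n)           # shape 1.5, unit scale
lo = rng.exponential(1.0, size=n)      # left endpoint of each interval
hi = lo + rng.exponential(0.5, size=n) # right endpoint, always > left
exact, censored = middle_censor(t, lo, hi)
print(f"fraction censored: {censored.mean():.3f}")
```

In a likelihood for such data, exactly observed lifetimes would contribute a density term and censored ones the probability of the interval, which is the ingredient a maximum-likelihood fit would build on.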
This article considers inference for the log-normal distribution based on progressive Type I interval censored data by both frequentist and Bayesian methods. First, the maximum likelihood estimates (MLEs) of the unknown model parameters are computed by the expectation-maximization (EM) algorithm. The asymptotic standard errors (ASEs) of the MLEs are obtained by applying the missing information principle. Next, the Bayes estimates of the model parameters are obtained by the Gibbs sampling method under both symmetric and asymmetric loss functions. The Gibbs sampling scheme is facilitated by adopting a data augmentation scheme similar to that of the EM algorithm. The performance of the MLEs and various Bayesian point estimates is judged via a simulation study. A real dataset is analyzed for the purpose of illustration.
Allopolyploids are a group of polyploids with more than two sets of chromosomes derived from different species. Previous linkage analysis of allopolyploids has been based on the assumption that different chromosomes pair randomly during meiosis. A more sophisticated model that relaxes this assumption has been developed for allotetraploids by incorporating the preferential pairing behavior of homologous over homoeologous chromosomes. Here, we show that the basic principle of this model can be extended to perform linkage analysis of higher-ploidy allohexaploids, where multiple preferential pairing factors are used to characterize chromosomal-pairing meiotic features between different constituent species. We implemented the extended model in an R package, called AlloMap6, allowing the recombination fractions and preferential pairing factors to be estimated simultaneously. AlloMap6 has two major functionalities: computer simulation and real-data analysis. By analyzing real data from a full-sib family of allohexaploid persimmon, we tested and validated the usefulness of this package. AlloMap6 lays a foundation for allohexaploid genetic mapping and provides a new horizon to explore the chromosomal kinship of allohexaploids.
In this paper it is shown that orthogonal deviation increases the ranging error of navigation signals, and a method for the estimation of orthogonal and phase deviation of the I and Q components of a constant-envelope signal is proposed. A measurement process is introduced and the measurement accuracy for different signal-to-noise ratios and data lengths is provided. Moreover, in the process of evaluation, a very important conclusion is reached: when a digital signal is filtered, the filter bandwidth does not affect measurement accuracy. The corresponding proof is given. The accuracy of the proposed method and measurement is verified by simulations. In simulations, an expectation-maximization (EM) algorithm was used to estimate the constellation coordinates, and the shortcomings of the EM algorithm in high-precision parameter estimation were identified and the corresponding corrections made. Finally, the proposed method was used to estimate both orthogonal deviation and phase deviation of a real satellite signal.
Microarray technologies and related methods, coupled with appropriate mathematical and statistical models, have made it possible to identify dynamic regulatory networks by measuring time course expression levels of many genes simultaneously. However, one of the challenges is the high-dimensional nature of such data, coupled with the fact that these gene expression data are known not to include various biological processes. As genomic interactions are highly structured, the aim was to derive a method for inferring a sparse dynamic network in a high-dimensional data setting. The paper assumes that the observations are noisy measurements of gene expression in the form of mRNAs, whose dynamics can be described by some partially observed process.
Voiceprint is an important component of creating a user portrait. Voiceprint Recognition can determine a user's identity. However, speech signals in the customer service system are encoded with compression for effective transmission and storage. The low-bit-rate codec dramatically reduces the performance of the Voiceprint Recognition system. What is more, the number of utterances available for each customer is inadequate. In order to solve these problems, this paper proposes a model compensation method. The method uses a test utterance with the expectation-maximization (EM) algorithm to estimate the distortion model, and the UBM is adjusted to match the codec type of the test utterance. Voiceprint Recognition experiments are conducted. The results show that the proposed method is able to dramatically improve the performance of the system.