Previously, a method was proposed for calculating a reconstructed coefficient of determination in the case of right-censored regression using the expectation-maximization (EM) algorithm. This measure is assessed via a simulation study to evaluate its utility as a measure of model fit. Further, several reconstructed adjusted coefficients of determination are proposed and compared via a simulation study for the purpose of model selection. The application of these proposed measures is illustrated on a real dataset.
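As a rough illustration of the EM machinery this abstract relies on, the sketch below fits a normal linear regression with right-censored responses. It assumes Gaussian errors and a known censoring indicator; the function names are hypothetical, and the reconstructed coefficient of determination itself is not reproduced because the abstract does not give its definition.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_pdf(z):
    """Standard normal density."""
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)

def norm_sf(z):
    """Standard normal survival function 1 - Phi(z), floored to avoid division by zero."""
    return np.maximum(0.5 * (1.0 - _erf(z / math.sqrt(2.0))), 1e-12)

def em_right_censored(X, y, censored, n_iter=200):
    """EM for y = X @ beta + N(0, sigma^2) noise, where censored rows only record y_true > y."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # start from naive least squares
    sigma2 = np.mean((y - X @ beta) ** 2)
    for _ in range(n_iter):
        mu = X @ beta
        sigma = math.sqrt(sigma2)
        z = (y - mu) / sigma                      # standardized censoring point
        lam = norm_pdf(z) / norm_sf(z)            # inverse Mills ratio
        # E-step: first and second moments of the latent response given y_true > y.
        ey = np.where(censored, mu + sigma * lam, y)
        ey2 = np.where(censored, mu**2 + sigma2 + sigma * (y + mu) * lam, y**2)
        # M-step: least squares on reconstructed responses, then the variance update.
        beta = np.linalg.lstsq(X, ey, rcond=None)[0]
        fit = X @ beta
        sigma2 = np.mean(ey2 - 2.0 * fit * ey + fit**2)
    return beta, math.sqrt(sigma2)
```

The E-step replaces each censored response by its conditional mean under the current fit, which is the "reconstruction" idea in its simplest form; any fit measure built on those reconstructed values would be a further assumption.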
In this article, a non-iterative posterior sampling algorithm for the linear quantile regression model based on the asymmetric Laplace distribution is proposed. The algorithm combines the inverse Bayes formulae, sampling/importance resampling, and the expectation-maximization (EM) algorithm to obtain approximately independent and identically distributed samples from the observed posterior distribution, which eliminates the convergence problems of iterative Gibbs sampling and overcomes the difficulty in evaluating the standard deviance in the EM algorithm. The numerical results in simulations and an application to the classical Engel data show that the non-iterative sampling algorithm is more effective than Gibbs sampling and the EM algorithm.
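Of the three ingredients named above, sampling/importance resampling is the easiest to show in isolation. The sketch below is not the authors' algorithm: it only draws from a standard asymmetric Laplace density by reweighting Cauchy proposals. The function names, the Cauchy proposal and its scale are all assumptions made for illustration.

```python
import numpy as np

def ald_density(u, tau):
    """Standard asymmetric Laplace density tau * (1 - tau) * exp(-rho_tau(u)),
    with check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return tau * (1.0 - tau) * np.exp(-u * (tau - (u < 0)))

def sir_sample(target_pdf, n_proposals, n_keep, scale, rng):
    """Sampling/importance resampling with a heavy-tailed Cauchy(0, scale) proposal."""
    draws = scale * rng.standard_cauchy(n_proposals)
    proposal_pdf = 1.0 / (np.pi * scale * (1.0 + (draws / scale) ** 2))
    weights = target_pdf(draws) / proposal_pdf    # importance weights
    weights /= weights.sum()
    # Resample in proportion to the weights to get approximate target draws.
    return rng.choice(draws, size=n_keep, replace=True, p=weights)
```

Because the resampled values are drawn without any Markov chain, they are approximately independent, which is the property the abstract contrasts with Gibbs sampling.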
Train integrity whilst in service establishes the foundation for railway safety. This study investigates train integrity detection, which reliably deduces whether the train consists remain intact. A switching linear dynamic system (SLDS) based train integrity detection method is proposed for a Global Navigation Satellite System (GNSS) based Train Integrity Monitoring System (TIMS), using the relative distance, velocity and acceleration of the locomotive and the last van. The Expectation Maximisation (EM) algorithm estimates the parameters of the SLDS model, while the Gaussian Sum Filter infers the train integrity state. To cope with false detection and misdetection, a verification procedure and train parting time estimation are then designed. The approach is evaluated with both field trials and simulated data. Results show that the false alarm rate and misdetection rate of the SLDS-based integrity detection approach are 0 and 0.09% respectively, which is better than the estimated train length based detection model and the Hidden Markov Model (HMM).
In order to effectively extract the hidden information from patent texts and to provide this information to support the product innovation design process, this paper proposes an automatic patent classification method based on the functional basis and Naive Bayes theory. The functions of products are regarded as the innovation attributes, and the function co-reference relations of patents in different areas are established. Patent classification methods based on the functions of products are proposed, and the general steps of the patent classification process are set out. In addition, three training methods are studied in the experiments: multi-classification fully supervised training, multiple dichotomous supervised training and semi-supervised training. By comparing and analyzing the experimental results, a patent text classifier is developed. In summary, this paper provides a general idea and the relevant technologies for building a patent knowledge space by automatically extracting and expanding the patent texts. (C) 2017 Published by Elsevier Ltd.
Common diseases including cancer are heterogeneous. It is important to discover disease subtypes and identify both shared and unique risk factors for different disease subtypes. The advent of high-throughput technologies enriches the data available to achieve this goal, provided the necessary statistical methods are developed. Existing methods can accommodate both heterogeneity identification and variable selection under parametric models, but for survival analysis, the commonly used Cox model is semiparametric. Although a finite-mixture Cox model has been proposed to address heterogeneity in survival analysis, variable selection has not been incorporated into such semiparametric models. Using regularized regression, we propose a variable selection method for the finite-mixture Cox model and select important, subtype-specific risk factors from high-dimensional predictors. Our estimators have oracle properties with proper choices of penalty parameters under the regularized regression. An expectation-maximization algorithm is developed for numerical calculation. Simulations demonstrate that our proposed method performs well in revealing the heterogeneity and selecting important risk factors for each subtype, and its performance is compared to alternatives with other regularizers. Finally, we apply our method to analyze a gene expression dataset for ovarian cancer DNA repair pathways. Based on our selected risk factors, the prognosis model accounting for heterogeneity consistently improves the prediction of the survival probability in both training and test datasets.
In this paper, a novel active contours method, which is combined with the Student's-t mixture model via the Expectation-Maximization (EM) algorithm, is proposed to segment complicated two-phase images. Firstly, we rewrite the cost function and derive a novel update of the level set function based on probabilistic principles. Secondly, we put forward two novel geometric priors from the level-set-based curve evolution; both have advantages, and the suitable one is selected according to the user's needs to obtain the level set function in the EM framework with the aim of reducing the computational cost. Therefore, the level set function is derived from latent variables and serves as feedback to the estimation of the latent variables in the next iteration. Finally, in order to enhance the robustness to outliers, the heavy-tailed Student's-t mixture model is applied in our algorithm. Experimental results obtained by applying the proposed method to many synthetic, medical and real-world images demonstrate its robustness, accuracy and effectiveness. (C) 2016 Elsevier Ltd. All rights reserved.
With the huge influx of various data nowadays, extracting knowledge from them has become an interesting but tedious task for data scientists, particularly when the data come in heterogeneous form and have missing information. Many data completion techniques have been introduced, especially since the advent of kernel methods, which provide a way to represent heterogeneous data sets in a single form: as kernel matrices. However, among the many data completion techniques available in the literature, the mutual completion of several incomplete kernel matrices has not yet been given much attention. In this paper, we present a new method, called the Mutual Kernel Matrix Completion (MKMC) algorithm, that tackles this problem of mutually inferring the missing entries of multiple kernel matrices by combining the notions of data fusion and kernel matrix completion, applied to biological data sets to be used for a classification task. We first introduce an objective function that is minimized by exploiting the EM algorithm, which in turn results in an estimate of the missing entries of the kernel matrices involved. The completed kernel matrices are then combined to produce a model matrix that can be used to further improve the obtained estimates. An interesting result of our study is that the E-step and the M-step are given in closed form, which makes our algorithm efficient in terms of time and memory. After completion, the (completed) kernel matrices are used to train an SVM classifier to test how well the relationships among the entries are preserved. Our empirical results show that the proposed algorithm outperforms the traditional completion techniques in preserving the relationships among the data points and in accurately recovering the missing kernel matrix entries. Overall, MKMC offers a promising solution to the problem of mutual estimation of a number of relevant incomplete kernel matrices.
Effectively solving the label switching problem is critical for both Bayesian and frequentist mixture model analyses. In this article, a new relabeling method is proposed by extending a recently developed modal clustering algorithm. First, the posterior distribution is estimated by a kernel density estimate (KDE) from permuted MCMC or bootstrap samples of the parameters. Second, a modal EM algorithm is used to find the m! symmetric modes of the KDE. Finally, samples that ascend to the same mode are assigned the same label. Simulations and real data applications demonstrate that the new method provides more accurate estimates than many existing relabeling methods.
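The three steps above can be sketched on toy data. The code below is a minimal illustration under stated assumptions, not the authors' implementation: it uses an isotropic Gaussian kernel with a fixed bandwidth `h`, for which the modal EM update reduces to a weighted average of the sample points, and it merges converged points closer than `h / 2` into one mode. The function names and the merge rule are hypothetical.

```python
import numpy as np

def modal_em(start, samples, h, n_iter=200, tol=1e-8):
    """Ascend from `start` to a local mode of a Gaussian KDE built on `samples`."""
    x = np.asarray(start, dtype=float)
    for _ in range(n_iter):
        d2 = np.sum((samples - x) ** 2, axis=1)
        # E-step: posterior weight of each kernel at the current point
        # (shifted by d2.min() for numerical stability).
        w = np.exp(-0.5 * (d2 - d2.min()) / h**2)
        w /= w.sum()
        # M-step: with Gaussian kernels the maximizer is the weighted mean.
        x_new = w @ samples
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

def relabel_by_mode(samples, h):
    """Give the same label to samples that ascend to the same KDE mode."""
    modes, labels = [], []
    for s in samples:
        m = modal_em(s, samples, h)
        for k, known in enumerate(modes):
            if np.linalg.norm(m - known) < h / 2:
                labels.append(k)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return np.array(modes), np.array(labels)
```

For a two-component mixture, parameter draws that have switched labels cluster around two mirrored points, so the two recovered modes correspond to the 2! permutations mentioned in the abstract.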
In this article, the profile maximum likelihood estimate (PMLE) is proposed for nonlinear mixed models (NLMMs) with longitudinal data, where the variance components are estimated by the expectation-maximization (EM) algorithm. Strong consistency and asymptotic normality of the estimators are derived. A simulation study is conducted in which the performance of the PMLE is compared with that of the Fisher scoring estimate (FSE) from the literature. Moreover, a real dataset is analyzed to investigate the empirical performance of the procedure.
Estimation of link delay densities in a computer network, from source-destination delay measurements, is of great importance in analyzing and improving the operation of the network. In this paper, we develop a general approach for estimating the density of the delay in any link of the network, based on continuous-time bivariate Markov chain modeling. The proposed approach also provides estimates of the packet routing probability at each node and of the probability of each source-destination path in the network. In this approach, the states of one process of the bivariate Markov chain are associated with nodes of the network, while the other process serves as an underlying process that affects the statistical properties of the node process. The node process is not Markov, and the sojourn time in each of its states is phase-type. Phase-type densities are dense in the set of densities with non-negative support. Hence, they can be used to approximate any sojourn time distribution arbitrarily well. Furthermore, the class of phase-type densities is closed under convolution and mixture operations. We adopt the expectation-maximization (EM) algorithm of Asmussen, Nerman, and Olsson for estimating the parameter of the bivariate Markov chain. We demonstrate the performance of the approach in a numerical study.
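For readers unfamiliar with phase-type densities, the standard representation and the two closure properties mentioned above can be written out as follows. The symbols (initial vectors $\boldsymbol{\alpha}$, $\boldsymbol{\beta}$, sub-generators $\mathbf{T}$, $\mathbf{S}$, mixing weight $p$) are notation assumed here for illustration and do not come from the abstract; both initial vectors are assumed to sum to one.

```latex
% Phase-type density with initial row vector \alpha and sub-generator T
f(t) = \boldsymbol{\alpha}\, e^{\mathbf{T} t}\, \mathbf{t}^{0},
\qquad \mathbf{t}^{0} = -\mathbf{T}\mathbf{1}, \qquad t \ge 0 .

% Convolution of PH(\alpha, T) and PH(\beta, S) is again phase-type:
\bigl(\boldsymbol{\alpha},\, \mathbf{0}\bigr),
\qquad
\begin{pmatrix}
\mathbf{T} & \mathbf{t}^{0}\boldsymbol{\beta} \\
\mathbf{0} & \mathbf{S}
\end{pmatrix} .

% Mixture with weights p and 1 - p is again phase-type:
\bigl(p\,\boldsymbol{\alpha},\, (1-p)\,\boldsymbol{\beta}\bigr),
\qquad
\begin{pmatrix}
\mathbf{T} & \mathbf{0} \\
\mathbf{0} & \mathbf{S}
\end{pmatrix} .
```

The convolution case is the one that matters for path delays: the delay along a path is a sum of link delays, so if each link delay is phase-type, the path delay is as well.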