Due to the lack of a gold-standard objective marker, the current practice for diagnosing a neurological disorder is mostly based on clinical symptoms, which may occur only in the late stage of the disease. Clinical diagnosis is also subject to high variance due to between- and within-subject variability of patient symptomatology and between-clinician variability. Effectively modeling the disease course and making early predictions using biomarkers and subtle clinical signs are critical and challenging, both for improving diagnostic accuracy and for designing preventive clinical trials for neurological disorders. Leveraging the domain knowledge that certain biological characteristics (i.e., causal genetic mutations) are part of the disease mechanism, and that certain markers (e.g., neuroimaging measures, motor and cognitive ability measures) reflect the pathological process, we propose a nonlinear model with random inflection points depending on subject-specific characteristics to jointly estimate the changing trajectories of the markers in the same disease domain. The model scales different markers into comparable progression curves with a temporal order based on the mean inflection point and establishes the relationship between the progression of markers and the underlying disease mechanism. The model also assesses how subject-specific characteristics affect the dynamic trajectory of different markers, which offers information for designing preventive therapeutics and personalized disease management strategies. We perform extensive simulation studies and apply our method to markers in the neuroimaging, cognitive, and motor domains of Huntington's disease using data collected from a large multisite natural history study of Huntington's disease, where we assess the temporal ordering of disease impairment between domains. We show that atrophy in certain brain areas occurs first, followed by impairment in the motor and cognitive domains, and show that an average patient has already experienced substantial regional brain
In this paper, we discuss parameter estimation for the generalized gamma distribution based on left-truncated and right-censored data. A stochastic version of the expectation-maximization (EM) algorithm is proposed as an alternative method to compute approximate maximum likelihood estimates. Two different methods to obtain reliable initial estimates of the parameters required for the iterative algorithms are also proposed. Interval estimation based on a parametric bootstrap method is discussed. The proposed methodologies are illustrated with a numerical example. Then, a Monte Carlo simulation study is used to evaluate the performance of the proposed estimation procedures and to compare them with the direct optimization method and the conventional EM algorithm. Based on the simulation results, we show that the proposed stochastic EM algorithm is a useful alternative estimation method for fitting the generalized gamma distribution.
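To make the idea of a stochastic EM step concrete, here is a minimal sketch in Python. It is not the authors' procedure: it swaps the generalized gamma distribution for a plain exponential distribution, handles right censoring only (no left truncation), and uses a hypothetical function name, `stochastic_em_exponential`. The S-step replaces each censored time with a random draw from its conditional distribution, the M-step refits the rate on the completed data, and the final estimate averages the iterates after a burn-in.

```python
import numpy as np

def stochastic_em_exponential(times, censored, n_iter=200, burn_in=50, seed=0):
    """Stochastic EM for the rate of an exponential model with right censoring.

    Illustrative assumption: exponential lifetimes, right censoring only.
    times    -- observed times (event time, or censoring time if censored)
    censored -- boolean flags, True where the observation is right-censored
    """
    rng = np.random.default_rng(seed)
    times = np.asarray(times, dtype=float)
    censored = np.asarray(censored, dtype=bool)
    n_cens = int(censored.sum())

    rate = 1.0 / times.mean()  # crude initial estimate that ignores censoring
    kept = []
    for it in range(n_iter):
        # S-step: impute each censored lifetime by simulation. By memorylessness,
        # the residual life beyond the censoring time is again exponential.
        completed = times.copy()
        completed[censored] += rng.exponential(1.0 / rate, size=n_cens)

        # M-step: complete-data maximum likelihood estimate of the rate.
        rate = len(completed) / completed.sum()

        if it >= burn_in:
            kept.append(rate)

    # Average the post-burn-in iterates to smooth out the simulation noise.
    return float(np.mean(kept))
```

In this simplified setting the censored-data maximum likelihood estimate has a closed form (number of events divided by total observed time), which gives a convenient check on the simulated iterates. No such closed form is available for the generalized gamma case discussed in the abstract.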
For analyzing current status data, a flexible partially linear proportional hazards model is proposed. Modeling flexibility is attained by using monotone splines to approximate the baseline cumulative hazard function, as well as B-splines to accommodate nonlinear covariate effects. To facilitate model fitting, a computationally efficient and easy-to-implement expectation-maximization algorithm is developed through a two-stage data augmentation process involving carefully structured latent Poisson random variables. Asymptotic normality and the efficiency of the spline estimator of the regression coefficients are established, and the spline estimators of the nonparametric components are shown to possess the optimal rate of convergence under suitable regularity conditions. The finite-sample performance of the proposed approach is evaluated through Monte Carlo simulation, and it is further illustrated using uterine fibroid data arising from a prospective cohort study on early pregnancy.
ISBN: 9781479936878 (print)
We consider the FASST framework for audio source separation, which models the sources by full-rank spatial covariance matrices and multilevel nonnegative matrix factorization (NMF) spectra. The computational cost of the expectation-maximization (EM) algorithm in [1] greatly increases with the number of channels. We present alternative EM updates using discrete hidden variables which exhibit a smaller cost. We evaluate the results on mixtures of speech and real-world environmental noise taken from our DEMAND database. The proposed algorithm is several orders of magnitude faster and provides better separation quality for two-channel mixtures in low input signal-to-noise ratio (iSNR) conditions.
Common diseases, including cancer, are heterogeneous. It is important to discover disease subtypes and identify both shared and unique risk factors for different disease subtypes. The advent of high-throughput technologies enriches the data available to achieve this goal, provided that the necessary statistical methods are developed. Existing methods can accommodate both heterogeneity identification and variable selection under parametric models, but for survival analysis, the commonly used Cox model is semiparametric. Although a finite-mixture Cox model has been proposed to address heterogeneity in survival analysis, variable selection has not been incorporated into such semiparametric models. Using regularized regression, we propose a variable selection method for the finite-mixture Cox model and select important, subtype-specific risk factors from high-dimensional predictors. Our estimators have oracle properties with proper choices of the penalty parameters. An expectation-maximization algorithm is developed for numerical calculation. Simulations demonstrate that our proposed method performs well in revealing the heterogeneity and selecting important risk factors for each subtype, and its performance is compared to alternatives with other regularizers. Finally, we apply our method to analyze a gene expression dataset for ovarian cancer DNA repair pathways. Based on our selected risk factors, the prognosis model accounting for heterogeneity consistently improves the prediction of the survival probability in both training and test datasets.
In many chemical industries, a production line usually produces various products with different grades to meet the demands of the worldwide market. A process with multiple grades is not well described by a traditional single model. In this paper, a multi-grade principal component analysis (MGPCA) model is proposed for multi-grade process modeling and fault detection. The proposed MGPCA can use measurements from different grades with unequal sample sizes to extract the essential information from the multi-grade process. The model is derived in a probabilistic framework and the corresponding parameters are estimated by the expectation-maximization algorithm. Finally, a simulated case and a real industrial polyethylene process with multiple grades are tested to evaluate the properties of the proposed method.
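As a rough illustration of estimating a probabilistic PCA model with EM, the sketch below fits ordinary single-grade probabilistic PCA (in the Tipping and Bishop formulation). It does not implement the multi-grade MGPCA model itself, whose structure the abstract does not spell out. The function name `ppca_em` and all parameter names are hypothetical.

```python
import numpy as np

def ppca_em(X, n_components, n_iter=200, seed=0):
    """EM for single-grade probabilistic PCA: x = W z + mean + noise.

    Illustrative assumption: isotropic Gaussian noise with variance sigma2
    and a standard normal latent vector z. Returns (mean, W, sigma2).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    mean = X.mean(axis=0)          # the ML estimate of the mean is the sample mean
    Xc = X - mean

    W = rng.normal(size=(d, n_components))  # random initial loading matrix
    sigma2 = 1.0
    for _ in range(n_iter):
        # E-step: posterior moments of the latent scores given the current W, sigma2.
        M = W.T @ W + sigma2 * np.eye(n_components)
        M_inv = np.linalg.inv(M)
        Ez = Xc @ W @ M_inv                       # row i holds E[z_i | x_i]
        sum_Ezz = n * sigma2 * M_inv + Ez.T @ Ez  # sum over i of E[z_i z_i^T | x_i]

        # M-step: closed-form updates of the loadings and the noise variance.
        W = (Xc.T @ Ez) @ np.linalg.inv(sum_Ezz)
        sigma2 = (
            np.sum(Xc ** 2)
            - 2.0 * np.sum((Xc @ W) * Ez)
            + np.trace(sum_Ezz @ W.T @ W)
        ) / (n * d)
    return mean, W, sigma2
```

For fault detection, one common choice (again an assumption, not taken from the abstract) is to monitor statistics derived from the fitted model, such as the reconstruction error of new samples relative to `sigma2`.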
This article studies the dependence of spatial linear models using a slash distribution with a finite second moment. The parameters of the model are estimated by maximum likelihood using the EM algorithm. To avoid identifiability problems, cross-validation, the Trace, and the maximum log-likelihood value are used to choose the parameter that adjusts the kurtosis of the slash distribution and to select the model that explains the spatial dependence. We present diagnostic techniques of global and local influence for exploring the sensitivity of the estimators and the presence of possible influential observations. A simulation study is developed to determine the performance of the methodology. The results show the effectiveness of the criteria for choosing the kurtosis parameter and for selecting the spatial dependence model. They also show that the slash distribution provides increased robustness to the presence of influential observations. As an illustration, the proposed model and its diagnostics are used to analyze aquifer data. The spatial predictions with and without the influential observations are compared. The results show that the contours of the interpolation maps and prediction standard error maps change little when the influential observations are removed. Thus, this model is a robust alternative in spatial linear modeling for dependent random variables. Supplementary materials accompanying this paper appear online.
Nowadays, online product reviews play a crucial role in the purchase decisions of consumers. A high proportion of positive reviews brings substantial sales growth, while negative reviews cause sales loss. Driven by the immense financial profits, many spammers try to promote their products or demote their competitors' products by posting fake and biased online reviews. By registering a number of accounts or releasing tasks on crowdsourcing platforms, many individual spammers can be organized into spammer groups that manipulate product reviews together and can be more damaging. Existing works on spammer group detection extract spammer group candidates from review data and identify the real spammer groups using unsupervised spamicity ranking methods. According to previous research, labeling a small number of spammer groups is easier than one might assume; however, few methods make good use of these important labeled data. In this paper, we propose a partially supervised learning model (PSGD) to detect spammer groups. By labeling some spammer groups as positive instances, PSGD applies positive unlabeled learning (PU-Learning) to learn a classifier as a spammer group detector from positive instances (labeled spammer groups) and unlabeled instances (unlabeled groups). Specifically, we extract a reliable negative set based on the positive instances and the distinctive features. By combining the positive instances, extracted negative instances, and unlabeled instances, we convert the PU-Learning problem into the well-known semi-supervised learning problem, and then use a Naive Bayesian model and an EM algorithm to train a classifier for spammer group detection. Experiments on real-life *** data set show that the proposed PSGD is effective and outperforms the state-of-the-art spammer group detection methods.
In this letter, we exploit the data redundancy associated with alternate-relaying cooperative systems to develop an iterative channel estimation algorithm in the context of orthogonal frequency division multiplexing (OFDM) transmission. We also address the problem of in-phase/quadrature-phase (IQ) imbalance, which is typically associated with OFDM transmission. Our analysis indicates that instead of estimating a family of parameters, including the IQ imbalance occurring at the source, relays, and destination, and the channel impulse responses (CIRs) of the source-destination and relay-destination links, we can estimate a single parameter called the equivalent CIR. In addition, we illustrate how to perform data detection using the estimated parameter. By employing the expectation-maximization algorithm, we show that soft information provided by the detector can be combined with pilot symbols in an efficient way to enhance the estimation process. Simulation experiments confirm the efficiency of the proposed approach.
The simplex distribution has proved useful for directly modelling double-bounded variables. Yet, it is not sufficient for multimodal distributions. This article addresses the problem of estimating a density when data are restricted to the (0,1) interval and contain several modes. In particular, we propose a simplex mixture model approach to model this kind of data. To estimate the parameters of the model, an Expectation Maximization (EM) algorithm is developed. The parameter estimation performance is evaluated through simulation studies. Models are explored using two real datasets: i) gene expression data on patients' survival times and their relation to adenocarcinoma and ii) magnetic resonance images (MRI) with a view to segmentation. In the latter case, given that the data contain zeros, the main model is modified to consider the zero-inflated setting.
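The E-step of an EM algorithm for such a mixture can be sketched as follows, assuming the standard simplex density with mean mu in (0,1) and dispersion sigma2. The abstract does not give the authors' M-step, so only the density and the responsibility computation are shown. The function names `simplex_logpdf` and `e_step` are hypothetical.

```python
import numpy as np

def simplex_logpdf(y, mu, sigma2):
    """Log-density of the simplex distribution on (0, 1).

    Assumed form: f(y) = [2*pi*sigma2*(y*(1-y))**3]**(-1/2) * exp(-d(y; mu) / (2*sigma2)),
    with unit deviance d(y; mu) = (y - mu)**2 / (y*(1-y) * mu**2 * (1-mu)**2).
    """
    y = np.asarray(y, dtype=float)
    deviance = (y - mu) ** 2 / (y * (1.0 - y) * mu ** 2 * (1.0 - mu) ** 2)
    return (
        -0.5 * np.log(2.0 * np.pi * sigma2)
        - 1.5 * np.log(y * (1.0 - y))
        - deviance / (2.0 * sigma2)
    )

def e_step(y, weights, mus, sigma2s):
    """Posterior probability that each observation belongs to each component."""
    log_joint = np.stack(
        [np.log(w) + simplex_logpdf(y, m, s) for w, m, s in zip(weights, mus, sigma2s)],
        axis=1,
    )
    # Subtract the row-wise maximum before exponentiating for numerical stability.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    resp = np.exp(log_joint)
    return resp / resp.sum(axis=1, keepdims=True)
```

Given these responsibilities, the mixing weights update in closed form as the column means of the responsibility matrix. Each component mean generally needs a numerical optimizer, because the deviance is not quadratic in mu.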