In recent years, subgroup analysis has emerged as an important tool to identify unknown subgroup memberships. However, subgroup analysis is still under-studied for longitudinal data. In this paper, we propose a structured mixed-effects approach for longitudinal data to model subgroup distribution and identify subgroup membership simultaneously. In the proposed structured mixed-effects model, the heterogeneous treatment effect is modeled as a random effect from a two-component mixture model, while the membership of the mixture model is incorporated using a logistic model with respect to some covariates. One advantage of our approach is that we are able to derive the estimation of the treatment effects through an EM-type algorithm that keeps the subgroup membership unchanged over time. Our numerical studies and real data example demonstrate that the proposed model outperforms other competing methods.
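As a rough illustration of the core computation, the sketch below runs an EM-type algorithm for a two-component Gaussian mixture whose membership probability follows a logistic model in a covariate. All names, simulation settings, and the simple gradient-ascent update of the logistic gate are illustrative assumptions, not the authors' implementation (which handles longitudinal data and keeps membership fixed over time).

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated cross-sectional data: covariate x drives subgroup membership
# via a logistic gate; the response mean differs by subgroup (illustrative).
n = 500
x = rng.normal(size=n)
p_true = 1.0 / (1.0 + np.exp(-2.0 * x))          # gate: a=0, b=2 (assumed)
z = rng.random(n) < p_true                       # latent membership
y = np.where(z, 3.0, 0.0) + rng.normal(scale=0.5, size=n)

def normal_pdf(v, mu, sigma):
    return np.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Starting values: split the response at its quartiles
mu0, mu1 = np.quantile(y, 0.25), np.quantile(y, 0.75)
sigma, a, b = y.std(), 0.0, 0.0

for _ in range(100):                             # EM iterations
    p = 1.0 / (1.0 + np.exp(-(a + b * x)))
    # E-step: responsibility of component 1 for each observation
    f1 = p * normal_pdf(y, mu1, sigma)
    f0 = (1 - p) * normal_pdf(y, mu0, sigma)
    r = f1 / (f1 + f0)
    # M-step: weighted component means and a pooled scale
    mu1 = (r * y).sum() / r.sum()
    mu0 = ((1 - r) * y).sum() / (1 - r).sum()
    sigma = np.sqrt((r * (y - mu1) ** 2 + (1 - r) * (y - mu0) ** 2).mean())
    # M-step for the logistic gate: a few gradient-ascent steps on the
    # expected complete-data log-likelihood (soft labels r)
    for _ in range(20):
        p = 1.0 / (1.0 + np.exp(-(a + b * x)))
        grad = r - p
        a += 0.5 * grad.mean()
        b += 0.5 * (grad * x).mean()

print(round(mu0, 2), round(mu1, 2))
```

With well-separated components the responsibilities quickly become near-binary, which is why even this crude gate update suffices in the sketch.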
A new empirical Bayes approach to variable selection in the context of generalized linear models is developed. The proposed algorithm scales to situations in which the number of putative explanatory variables is very large, possibly much larger than the number of responses. The coefficients in the linear predictor are modeled as a three-component mixture allowing the explanatory variables to have a random positive effect on the response, a random negative effect, or no effect. A key assumption is that only a small (but unknown) fraction of the candidate variables have a nonzero effect. This assumption, in addition to treating the coefficients as random effects, facilitates an approach that is computationally efficient. In particular, the number of parameters that have to be estimated is small, and remains constant regardless of the number of explanatory variables. The model parameters are estimated using a generalized alternating maximization algorithm which is scalable, and leads to significantly faster convergence compared with simulation-based fully Bayesian methods. Supplementary materials for this article are available online.
The heteroscedastic nonlinear regression model (HNLM) is an important tool in data modeling. In this paper we propose a HNLM considering skew scale mixtures of normal (SSMN) distributions, which allows fitting asymmetric and heavy-tailed data simultaneously. Maximum likelihood (ML) estimation is performed via the expectation-maximization (EM) algorithm. The observed information matrix is derived analytically to obtain standard errors. In addition, diagnostic analysis is developed using case-deletion measures and the local influence approach. A simulation study is conducted to examine the empirical distribution of the likelihood ratio statistic, the power of the test for homogeneity of variances, and the effects of misspecifying the structure function. The proposed method is also illustrated by analyzing a real dataset.
In this paper, we consider joint modeling of repeated measurements and competing risks failure time data to allow for more than one distinct failure type in the survival endpoint. Hence, we can fit a cause-specific hazards submodel to allow for competing risks, with a separate latent association between longitudinal measurements and each cause of failure. We also consider the possible masked causes of failure in joint modeling of repeated measurements and competing risks failure time data. We further derive a score test to identify longitudinal biomarkers or surrogates for a time-to-event outcome in competing risks data which contain masked causes of failure. With a carefully chosen definition of complete data, the maximum likelihood estimation of the cause-specific hazard functions and of the masking probabilities is performed via an expectation-maximization (EM) algorithm. Simulations are used to explore how the number of individuals, the number of time points per individual, and the functional form of the random effects from the longitudinal biomarkers, allowing for heterogeneous baseline hazards across individuals, influence the power to detect the association between a longitudinal biomarker and the survival time.
Phylogenetic trees are types of networks that describe the temporal relationship between individuals, species, or other units that are subject to evolutionary diversification. Many phylogenetic trees are constructed from molecular data that is often only available for extant species, and hence they lack all or some of the branches that did not make it into the present. This feature makes inference on the diversification process challenging. For relatively simple diversification models, analytical or numerical methods to compute the likelihood exist, but these do not work for more realistic models in which the likelihood depends on properties of the missing lineages. In this article, we study a general class of species diversification models, and we provide an expectation-maximization framework in combination with a uniform sampling scheme to perform maximum likelihood estimation of the parameters of the diversification process.
Background: Associations between haplotypes and quantitative traits provide valuable information about the genetic basis of complex human diseases. Haplotypes also provide an effective way to deal with untyped SNPs. Two major challenges arise in haplotype-based association analysis of family data. First, haplotypes may not be inferred with certainty from genotype data. Second, the trait values within a family tend to be correlated because of common genetic and environmental factors.
Results: To address these challenges, we present an efficient likelihood-based approach to analyzing associations of quantitative traits with haplotypes or untyped SNPs. This approach properly accounts for within-family trait correlations and can handle general pedigrees with arbitrary patterns of missing genotypes. We characterize the genetic effects on the quantitative trait by a linear regression model with random effects and develop efficient likelihood-based inference procedures. Extensive simulation studies are conducted to examine the performance of the proposed methods. An application to family data from the Childhood Asthma Management Program Ancillary Genetic Study is provided. A computer program is freely available.
Conclusions: Results from extensive simulation studies show that the proposed methods for testing the haplotype effects on quantitative traits have correct type I error rates and are more powerful than some existing methods.
Principal component analysis (PCA) is a popular statistical tool. However, despite numerous advantages, the good practice of imputing missing data before PCA is not common. In the present work, we evaluated the hypothesis that the expectation-maximization (EM) algorithm for missing data imputation is a reliable and advantageous procedure when using PCA to derive biomarker profiles and dietary patterns. To this end, we used numerical simulations designed to mimic real data commonly observed in nutritional research. Finally, we showed the advantages and pitfalls of the EM algorithm for missing data imputation applied to plasma fatty acid concentrations and nutrient intakes from real data sets derived from the US National Health and Nutrition Examination Survey. PCA applied to simulated data having missing values resulted in biased eigenvalues with respect to the original data set without missing values. The bias between the eigenvalues from the original data set and from the data set with missing values increased with the number of missing values and appeared independent of the correlation structure among variables. On the other hand, when data were imputed, the mean of the eigenvalues over the 10 imputation runs overlapped with those derived from the PCA applied to the original data set. These results were confirmed when real data sets from the National Health and Nutrition Examination Survey were analyzed. We accept the hypothesis that the EM algorithm for missing data imputation applied before PCA to derive biochemical profiles and dietary patterns is an effective technique, especially for relatively small sample sizes. (C) 2020 Elsevier Inc. All rights reserved.
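The imputation step can be sketched as iterated conditional-mean (regression) imputation under a multivariate-normal model, i.e. the E-step of an EM algorithm, followed by a comparison of eigenvalues. The simulated "nutrient-like" data, the one-factor structure, and all settings below are illustrative assumptions, not the NHANES analysis itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated correlated data with a strong common factor (illustrative).
n, p = 300, 5
factor = rng.normal(size=n)
loadings = np.array([1.0, 0.9, 0.8, 0.7, 0.6])
X_full = np.outer(factor, loadings) + 0.3 * rng.normal(size=(n, p))

# Introduce ~10% missing values completely at random.
mask = rng.random((n, p)) < 0.10                 # True = missing
X_miss = X_full.copy()
X_miss[mask] = np.nan

def em_impute(X, n_iter=25):
    """Fill with column means, then iterate conditional-mean imputation
    under a multivariate-normal model until it stabilizes."""
    Z = X.copy()
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    Z[miss] = np.take(col_means, np.where(miss)[1])
    for _ in range(n_iter):
        mu = Z.mean(axis=0)
        S = np.cov(Z, rowvar=False)
        for i in range(Z.shape[0]):
            m = miss[i]
            if not m.any() or m.all():
                continue
            o = ~m
            # E[x_m | x_o] = mu_m + S_mo S_oo^{-1} (x_o - mu_o)
            Z[i, m] = mu[m] + S[np.ix_(m, o)] @ np.linalg.solve(
                S[np.ix_(o, o)], Z[i, o] - mu[o])
    return Z

X_imp = em_impute(X_miss)

# Baseline: naive column-mean imputation, for comparison.
X_mean = X_miss.copy()
X_mean[mask] = np.take(np.nanmean(X_miss, axis=0), np.where(mask)[1])

rmse_em = np.sqrt(np.mean((X_imp[mask] - X_full[mask]) ** 2))
rmse_mean = np.sqrt(np.mean((X_mean[mask] - X_full[mask]) ** 2))

eig_full = np.linalg.eigvalsh(np.cov(X_full, rowvar=False))[::-1]
eig_imp = np.linalg.eigvalsh(np.cov(X_imp, rowvar=False))[::-1]
print(rmse_em, rmse_mean, eig_full[0], eig_imp[0])
```

With correlated variables the conditional-mean fill recovers missing entries far better than column means, so the leading eigenvalues of the imputed data stay close to those of the complete data, matching the pattern the abstract reports.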
Mixture regression models have been widely used in business, marketing and social sciences to model mixed regression relationships arising from a clustered and thus heterogeneous population. The unknown mixture regression parameters are usually estimated by maximum likelihood estimators using the expectation-maximisation algorithm based on the normality assumption of component error density. However, it is well known that the normality-based maximum likelihood estimation is very sensitive to outliers or heavy-tailed error distributions. This paper aims to give a selective overview of the recently proposed robust mixture regression methods and compare their performance using simulation studies.
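One common robustification replaces the normal component errors with Student-t errors, whose EM scale weights automatically downweight outlying residuals. The sketch below is a minimal version of that idea for two regression lines with fixed degrees of freedom; the simulated data, the residual-sign initialization, and all settings are illustrative assumptions rather than any specific method from the overview.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

# Two illustrative regression lines with heavy-tailed (Student-t3) errors.
n = 400
x = rng.uniform(0.0, 2.0, size=n)
z = rng.random(n) < 0.5
noise = 0.3 * rng.standard_t(3, size=n)
y = np.where(z, 2.0 + 2.0 * x, -2.0 - 1.0 * x) + noise

X = np.column_stack([np.ones(n), x])
nu = 3.0                                   # fixed t degrees of freedom

def t_pdf(e, sigma, nu):
    c = math.gamma((nu + 1) / 2) / (
        math.gamma(nu / 2) * math.sqrt(nu * math.pi) * sigma)
    return c * (1 + (e / sigma) ** 2 / nu) ** (-(nu + 1) / 2)

def wls(X, y, w):
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)

# Initialize by splitting on the sign of residuals from a single global fit.
hi = (y - X @ wls(X, y, np.ones(n))) > 0
betas = [wls(X, y, hi + 1e-6), wls(X, y, ~hi + 1e-6)]
sigmas, pis = [1.0, 1.0], [0.5, 0.5]

for _ in range(100):                       # EM iterations
    res = [y - X @ b for b in betas]
    dens = np.array([pis[k] * t_pdf(res[k], sigmas[k], nu) for k in range(2)])
    r = dens / dens.sum(axis=0)            # E-step: responsibilities
    for k in range(2):
        # Latent t scale weights: large residuals get small weight (robustness).
        u = (nu + 1) / (nu + (res[k] / sigmas[k]) ** 2)
        w = r[k] * u
        betas[k] = wls(X, y, w)
        res_k = y - X @ betas[k]
        sigmas[k] = math.sqrt((w * res_k ** 2).sum() / r[k].sum())
        pis[k] = r[k].mean()

slopes = sorted(b[1] for b in betas)
print(slopes)
```

Setting the scale weights `u` to 1 recovers the normality-based EM, which is exactly the estimator the overview describes as sensitive to outliers.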
In the discipline of software development, effort estimation plays a pivotal role. For the successful development of a project, an accurate estimate is essential. However, there is no standard estimation method applicable to all projects. Hence, finding the best way to estimate effort becomes an indispensable need of the project manager. Mathematical models alone achieve only mediocre estimation accuracy. On that account, we opt for analogy-based effort estimation by means of soft computing techniques, which rely on historical effort data from successfully completed projects to estimate the effort. To improve accuracy, models are generated for clusters of the datasets, on the premise that data within a cluster share similar properties. This paper focuses mainly on the analysis of techniques to improve effort prediction accuracy. The research starts by analyzing the correlation coefficients of the selected datasets. The process then moves through the analysis of classification accuracy, clustering accuracy, mean magnitude of relative error, and prediction accuracy based on several machine learning methods. Finally, a bio-inspired firefly algorithm with fuzzy analogy is applied to the datasets to produce good estimation accuracy.
The behaviour of ecological systems relies mainly on the interactions between the species involved. We consider the problem of inferring the species interaction network from abundance data. To be relevant, any network inference methodology needs to handle count data and to account for possible environmental effects. It also needs to distinguish between direct interactions and indirect associations, and graphical models provide a convenient framework for this purpose. A simulation study shows that the proposed methodology compares well with state-of-the-art approaches, even when the underlying graph differs strongly from a tree. The analysis of two datasets highlights the influence of covariates on the inferred network. Accounting for covariates is critical to avoid spurious edges. The proposed approach could be extended to perform network comparison or to look for missing species.
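The distinction between direct interactions and indirect associations can be illustrated with a Gaussian graphical model, where direct links correspond to nonzero partial correlations read off the precision matrix. This is only a simplified Gaussian stand-in (the abstract's setting involves count data and covariates), and the three-variable chain is an assumption chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# A chain A -> B -> C (illustrative "species"): A and C interact only
# through B, so their marginal correlation is high while their partial
# correlation (the direct edge in a Gaussian graphical model) is near zero.
n = 2000
a = rng.normal(size=n)
b = a + 0.5 * rng.normal(size=n)
c = b + 0.5 * rng.normal(size=n)
X = np.column_stack([a, b, c])

P = np.linalg.inv(np.cov(X, rowvar=False))   # precision matrix
d = np.sqrt(np.diag(P))
partial = -P / np.outer(d, d)                # off-diagonal: partial correlations

marg_ac = np.corrcoef(a, c)[0, 1]
print(round(marg_ac, 2), round(partial[0, 2], 2), round(partial[0, 1], 2))
```

The strong marginal A-C correlation vanishes once B is conditioned on, which is exactly the spurious-edge phenomenon graphical models are designed to avoid.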