Search Results - Inner Mongolia University Library

Author: Solomon, Nicole Chanel (Duke University)

Degree level: Doctoral

Probabilistic record linkage is the task of combining multiple data sources for statistical analysis by identifying records pertaining to the same individual in different databases. The need to perform probabilistic record linkage arises in comparative effectiveness research and other clinical research scenarios when records in different databases do not share an error-free unique patient identifier. This dissertation seeks to develop new methodology for probabilistic record linkage to address two highly practical and recurring challenges: how to implement record linkage in a manner that optimizes downstream statistical analyses of the linked data, and how to efficiently link databases having a clustered or multi-level data structure. In Chapter 2 we propose a new framework for balancing the tradeoff between false positive and false negative linkage errors when linked data are analyzed in a generalized linear model framework and non-linked records lead to missing data for the study outcome variable. Our method seeks to maximize the probability that the point estimate of the parameter of interest will have the correct sign and that the confidence interval around this estimate will correctly exclude the null value of zero. Using large sample approximations and a model for linkage errors, we derive expressions relating bias and hypothesis testing power to the user's choice of threshold that determines how many records will be linked. We use these results to propose three data-driven threshold selection rules. Under one set of simplifying assumptions we prove that maximizing asymptotic power requires that the threshold be relaxed at least until the point where all pairs with >50% probability of being a true match are linked. In Chapter 3 we explore the consequences of linkage errors when the study outcome variable is determined by linkage status and so linkage errors may cause outcome misclassification. This scenario arises when the outcome is disease status and those lin

Keywords: Biostatistics; data-driven; EM algorithm; identifiability; mixture model; record linkage

Clustering in linear mixed models with approximate Dirichlet process mixtures using em algorithm

STATISTICAL MODELLING, 2013, Vol. 13, No. 1, pp. 41-67

Authors: Heinzl, Felix; Tutz, Gerhard (Univ Munich, Dept Stat, Munich, Germany)

In linear mixed models, the assumption of normally distributed random effects is often inappropriate and unnecessarily restrictive. The proposed approximate Dirichlet process mixture assumes a hierarchical Gaussian mixture that is based on the truncated version of the stick breaking presentation of the Dirichlet process. In addition to the weakening of distributional assumptions, the specification allows to identify clusters of observations with a similar random effects structure. An Expectation-Maximization algorithm is given that solves the estimation problem and that, in certain respects, may exhibit advantages over Markov chain Monte Carlo approaches when modelling with Dirichlet processes. The method is evaluated in a simulation study and applied to the dynamics of unemployment in Germany as well as lung function growth data.

Keywords: approximate Dirichlet process mixture; EM algorithm; likelihood inference; linear mixed models; stick breaking
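
The truncated stick-breaking construction named in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `truncated_stick_breaking`, the concentration parameter `alpha`, and the truncation level `num_components` are assumptions chosen for illustration. Each break takes a Beta(1, alpha) fraction of the stick that remains, and the final break takes everything left so that the weights sum to one.

```python
import random


def truncated_stick_breaking(alpha, num_components, rng):
    """Draw mixture weights from a stick-breaking construction truncated at num_components."""
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a component
    for k in range(num_components):
        if k == num_components - 1:
            v = 1.0  # truncation: the last component takes the rest of the stick
        else:
            v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick to break off
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


rng = random.Random(0)
weights = truncated_stick_breaking(alpha=2.0, num_components=10, rng=rng)
print([round(w, 3) for w in weights])
print("sum of weights:", sum(weights))
```

Smaller values of `alpha` put most of the mass on the first few components, which is what lets such a mixture group observations into a small number of clusters.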

Inference in the Growth Curve Model under Multivariate Skew Normal Distribution

SANKHYA-SERIES B-APPLIED AND INTERDISCIPLINARY STATISTICS, 2020, Vol. 82, No. 1, pp. 34-69

Authors: Jana, Sayantee; Balakrishnan, Narayanaswamy; Hamid, Jemila S. (McMaster Univ, Dept Math & Stat, Hamilton, ON, Canada; Childrens Hosp Eastern Ontario, Ottawa, ON, Canada)

Existing methods for estimating the parameters of the Growth Curve Model (GCM) rely on the assumption that the underlying distribution for the error terms is multivariate normal. However, we often come across skewed data in practical applications; and estimators developed under the normality assumption may not be valid in such situations. Simulation studies conducted in this paper, in fact, show that existing methods are sensitive to skewness, where normal based estimators are associated with increased bias and mean squared error (MSE), when the normality assumption is violated. Methods appropriate for skewed distributions are, therefore, required. In this paper, estimators for the mean and covariance matrices of the GCM under multivariate skew normal (MSN) distribution are proposed. An estimator for the additional skewness parameter of the MSN distribution is also provided. The estimators are derived using the expectation maximization (EM) algorithm and extensive simulations are performed to examine the performance of the estimators. Comparisons with existing estimators show that our estimators perform better than the existing estimators, when the underlying distribution is multivariate skew normal. Illustration using real data set is also provided.

Keywords: Growth curve model; Multivariate skew normal distribution; EM algorithm; Longitudinal analysis; Matrix estimation

Mixtures of Gaussian Graphical Models with Constraints: Mélanges de modèles graphiques gaussiens sous contraintes

Author: Lartigue, Thomas (Institut polytechnique de Paris)

Degree level: Doctoral

Describing the co-variations between several observed random variables is a delicate problem. Dependency networks are popular tools that describe the relationships between variables through the presence or absence of edges between the nodes of a graph. In particular, conditional correlation graphs are used to represent the "direct" correlations between the nodes of the graph. They are often studied under the Gaussian assumption and are therefore called "Gaussian graphical models" (GGM). A single network can be used to represent the overall trends identified in a data sample. However, when the observed data are sampled from a heterogeneous population, there are different sub-populations that must each be described by their own graph. Moreover, if the labels of the sub-populations (or "classes") are not available, unsupervised approaches must be used to identify the classes correctly and to describe each of them with its own graph. In this work, we address the relatively new problem of hierarchical estimation of GGMs for unlabelled heterogeneous populations. We explore several key directions for improving the estimation of the model parameters as well as the unsupervised identification of the sub-populations. Our goal is to ensure that the inferred conditional correlation graphs are as relevant and interpretable as possible. First, in the case of a simple, homogeneous population, we develop a composite method that combines the strengths of the two main state-of-the-art paradigms in order to correct their weaknesses. For the unlabelled heterogeneous case, we propose to estimate a mixture of GGMs with an expectation-maximization (EM) algorithm. In order to improve the solutions of this EM algorithm, and to avoid falling into sub-optimal local extrema ...

Keywords: Graphical model; Conditional correlations; Unsupervised estimation; EM algorithm; 519.5

A spatio-temporal model for detecting the effect of cocaine use disorder on functional connectivity

SPATIAL STATISTICS, 2021, Vol. 45, Article 100530

Authors: Zhao, Jifang; Zhang, Qiong; Fuentes, Montserrat; Qian, Yanjun; Ma, Liangsuo; Moeller, Gerard (Virginia Commonwealth Univ, Dept Stat Sci & Operat Res, Richmond, VA 23284, USA; Clemson Univ, Sch Math & Stat Sci, Clemson, SC, USA; Univ Iowa, Off Execut Vice President & Provost, Iowa City, IA, USA; Virginia Commonwealth Univ, Inst Drug & Alcohol Studies, Richmond, VA, USA)

Drug addiction can lead to many health-related problems and social concerns. Researchers are interested in the association between long-term drug usage and abnormal functional connectivity. Functional connectivity obtained from functional magnetic resonance imaging data promotes a variety of fundamental understandings in such association. Due to the complex correlation structure and large dimensionality, the modeling and analysis of the functional connectivity from neuroimage are challenging. By proposing a spatio-temporal model for multi-subject neuroimage data, we incorporate voxel-level spatio-temporal dependencies of whole-brain measurements to improve the accuracy of statistical inference. To tackle large-scale spatio-temporal neuroimage data, we develop a computationally efficient algorithm to estimate the parameters. Our method is used to first identify functional connectivity, and then detect the effect of cocaine use disorder (CUD) on functional connectivity between different brain regions. The functional connectivity identified by our spatio-temporal model matches existing studies on brain networks, and further indicates that CUD may alter the functional connectivity in the medial orbitofrontal cortex subregions and the supplementary motor areas. (c) 2021 Elsevier B.V. All rights reserved.

Keywords: Functional connectivity; Spatio-temporal dependency; EM algorithm; Functional magnetic resonance imaging (fMRI)

Optimizing information using the EM algorithm in item response theory

ANNALS OF OPERATIONS RESEARCH, 2013, Vol. 206, No. 1, pp. 627-646

Author: Weissman, Alexander (Law Sch Admiss Council, Newtown, PA 18940, USA)

Latent trait models such as item response theory (IRT) hypothesize a functional relationship between an unobservable, or latent, variable and an observable outcome variable. In educational measurement, a discrete item response is usually the observable outcome variable, and the latent variable is associated with an examinee's trait level (e.g., skill, proficiency). The link between the two variables is called an item response function. This function, defined by a set of item parameters, models the probability of observing a given item response, conditional on a specific trait level. Typically in a measurement setting, neither the item parameters nor the trait levels are known, and so must be estimated from the pattern of observed item responses. Although a maximum likelihood approach can be taken in estimating these parameters, it usually cannot be employed directly. Instead, a method of marginal maximum likelihood (MML) is utilized, via the expectation-maximization (EM) algorithm. Alternating between an expectation (E) step and a maximization (M) step, the EM algorithm assures that the marginal log likelihood function will not decrease after each EM cycle, and will converge to a local maximum. Interestingly, the negative of this marginal log likelihood function is equal to the relative entropy, or Kullback-Leibler divergence, between the conditional distribution of the latent variables given the observable variables and the joint likelihood of the latent and observable variables. With an unconstrained optimization for the M-step proposed here, the EM algorithm as minimization of Kullback-Leibler divergence admits the convergence results due to Csiszar and Tusnady (Statistics & Decisions, 1:205-237, 1984), a consequence of the binomial likelihood common to latent trait models with dichotomous response variables. For this unconstrained optimization, the EM algorithm converges to a global maximum of the marginal log likelihood function, yielding an information bound t

Keywords: EM algorithm; Item response theory; Latent trait theory; Statistical estimation; Marginal maximum likelihood; Kullback-Leibler divergence; Relative entropy; Information Theory; Model selection
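
The alternation between E and M steps described in the abstract above, and the property that the marginal log likelihood does not decrease from one EM cycle to the next, can be sketched on a much simpler model. The example below is an assumption made purely for illustration: it fits a two-component one-dimensional Gaussian mixture to simulated data, not an IRT model, and all names (`em_step`, `log_likelihood`, the starting values) are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weight, means, variances):
    """Marginal log likelihood of a two-component Gaussian mixture."""
    return sum(
        math.log(weight * normal_pdf(x, means[0], variances[0])
                 + (1.0 - weight) * normal_pdf(x, means[1], variances[1]))
        for x in data
    )


def em_step(data, weight, means, variances):
    # E step: posterior probability that each point belongs to component 0.
    resp = []
    for x in data:
        p0 = weight * normal_pdf(x, means[0], variances[0])
        p1 = (1.0 - weight) * normal_pdf(x, means[1], variances[1])
        resp.append(p0 / (p0 + p1))
    # M step: closed-form weighted maximum likelihood updates.
    n0 = sum(resp)
    n1 = len(data) - n0
    mean0 = sum(r * x for r, x in zip(resp, data)) / n0
    mean1 = sum((1.0 - r) * x for r, x in zip(resp, data)) / n1
    var0 = sum(r * (x - mean0) ** 2 for r, x in zip(resp, data)) / n0
    var1 = sum((1.0 - r) * (x - mean1) ** 2 for r, x in zip(resp, data)) / n1
    return n0 / len(data), (mean0, mean1), (var0, var1)


# Simulated data from two well-separated groups (illustrative values only).
rng = random.Random(1)
data = [rng.gauss(-2.0, 1.0) for _ in range(100)] + [rng.gauss(3.0, 1.0) for _ in range(100)]

weight, means, variances = 0.5, (-1.0, 1.0), (1.0, 1.0)
history = [log_likelihood(data, weight, means, variances)]
for _ in range(20):
    weight, means, variances = em_step(data, weight, means, variances)
    history.append(log_likelihood(data, weight, means, variances))

print("log likelihood, first and last cycle:", history[0], history[-1])
```

Recording the log likelihood after every cycle, as `history` does here, is a cheap way to check an EM implementation: a visible decrease usually points to a bug in one of the two steps.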

Family of mean-mixtures of multivariate normal distributions: Properties, inference and assessment of multivariate skewness

JOURNAL OF MULTIVARIATE ANALYSIS, 2021, Vol. 181, Article 104679

Authors: Abdi, Me'raj; Madadi, Mohsen; Balakrishnan, Narayanaswamy; Jamalizadeh, Ahad (Shahid Bahonar Univ Kerman, Dept Stat, Fac Math & Comp, Kerman, Iran; McMaster Univ, Dept Math & Stat, Hamilton, ON, Canada)

In this paper, a new mixture family of multivariate normal distributions, formed by mixing multivariate normal distribution and a skewed distribution, is constructed. Some properties of this family, such as characteristic function, moment generating function, and the first four moments are derived. The distributions of affine transformations and canonical forms of the model are also derived. An EM-type algorithm is developed for the maximum likelihood estimation of model parameters. Some special cases of the family, using standard gamma and standard exponential mixture distributions, denoted by MMNG and MMNE, respectively, are considered. For the proposed family of distributions, different multivariate measures of skewness are computed. In order to examine the performance of the developed estimation method, some simulation studies are carried out to show that the maximum likelihood estimates do provide a good performance. For different choices of parameters of MMNE distribution, several multivariate measures of skewness are computed and compared. Because some measures of skewness are scalar and some are vectors, in order to evaluate them properly, a simulation study is carried out to determine the power of tests, based on sample versions of skewness measures as test statistics for testing the fit of the MMNE distribution. Finally, two real data sets are used to illustrate the usefulness of the proposed model and the associated inferential methods. (C) 2020 Elsevier Inc. All rights reserved.

Keywords: Canonical form; EM algorithm; Mean mixtures of normal distribution; Moments; Multivariate measures of skewness

MRCIP: a robust Mendelian randomization method accounting for correlated and idiosyncratic pleiotropy

BRIEFINGS IN BIOINFORMATICS, 2021, Vol. 22, No. 5, Article bbab019

Authors: Xu, Siqi; Fung, Wing Kam; Liu, Zhonghua (Univ Hong Kong, Dept Stat & Actuarial Sci, Pokfulam Rd, Hong Kong, Peoples R China)

Mendelian randomization (MR) is a powerful instrumental variable (IV) method for estimating the causal effect of an exposure on an outcome of interest even in the presence of unmeasured confounding by using genetic variants as IVs. However, the correlated and idiosyncratic pleiotropy phenomena in the human genome will lead to biased estimation of causal effects if they are not properly accounted for. In this article, we develop a novel MR approach named MRCIP to account for correlated and idiosyncratic pleiotropy simultaneously. We first propose a random-effect model to explicitly model the correlated pleiotropy and then propose a novel weighting scheme to handle the presence of idiosyncratic pleiotropy. The model parameters are estimated by maximizing a weighted likelihood function with our proposed PRW-EM algorithm. Moreover, we can also estimate the degree of the correlated pleiotropy and perform a likelihood ratio test for its presence. Extensive simulation studies show that the proposed MRCIP has improved performance over competing methods. We also illustrate the usefulness of MRCIP on two real datasets. The R package for MRCIP is publicly available at https://***/siqixu/MRCIP.

Keywords: Mendelian randomization; invalid instruments; correlated pleiotropy; idiosyncratic pleiotropy; random effects; weighting; EM algorithm

CADEM: A conditional augmented data EM algorithm for fitting one parameter probit models

BRAZILIAN JOURNAL OF PROBABILITY AND STATISTICS, 2013, Vol. 27, No. 2, pp. 245-262

Authors: Azevedo, C. L. N.; Andrade, D. F. (Univ Estadual Campinas, Inst Math Stat & Comp Sci, Dept Stat, BR-13083859 Campinas, SP, Brazil; Univ Fed Santa Catarina, Dept Informat & Stat, BR-57072970 Florianopolis, SC, Brazil)

In this article we develop an estimation method based on the augmented data scheme and EM/SEM (Stochastic EM) algorithms for fitting one-parameter probit (Rasch) IRT (Item Response Theory) models. Instead of using the S steps of the SEM algorithm, that is, instead of simulating values for the unobserved variables (augmented data and the latent traits), we consider the conditional expectations of a set of unobserved variables on the other set of unobserved variables, the current estimates of the parameters and the observed data, based on the full conditional distributions from the Gibbs sampling algorithm. Our method, named the CADEM algorithm (conditional augmented data EM), presents straightforward E steps, which avoid the need to evaluate the usual integrals, also facilitating the M steps, without the need to use numerical methods of optimization. We use the CADEM algorithm to obtain both maximum likelihood estimates and maximum a posteriori estimates of the difficulty parameters for the one-parameter probit (Rasch) model. Also, we obtain estimates for the latent traits, based on conditional expectations. In addition, we show how to calculate the associated standard errors. Some directions are provided to extend our approach to other IRT models. In this respect, we perform a simulation study to compare the estimation methods. The results indicated that our approach is quite comparable to the usual marginal maximum likelihood (MML) and Gibbs sampling methods (GS) in terms of parameter recovery. However, CADEM is as fast as MML and as flexible as GS.

Keywords: Item response models; maximum likelihood; Bayesian estimates; augmented data; EM algorithm

An EM algorithm for the estimation of parameters of bivariate generalized exponential distribution under random left censoring

JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2013, Vol. 83, No. 9, pp. 1648-1660

Authors: Dewan, Isha; Nandi, Swagata (Indian Stat Inst, Theoret Stat & Math Unit, New Delhi 110016, India)

In this paper, we consider the four-parameter bivariate generalized exponential distribution proposed by Kundu and Gupta [Bivariate generalized exponential distribution, J. Multivariate Anal. 100 (2009), pp. 581-593] and propose an expectation-maximization algorithm to find the maximum-likelihood estimators of the four parameters under random left censoring. A numerical experiment is carried out to discuss the properties of the estimators obtained iteratively.

Keywords: bivariate generalized exponential distribution; random left censoring; EM algorithm; pseudo-likelihood; 62N01; 62N02
