We first propose a two-step procedure for combining epidemiological data obtained from diverse sources, with the aim of quantifying risk factors that affect the probability that an individual develops a certain disease, such as cancer. In the first step, we derive all possible unbiased estimating functions, each based on one group of cases and one group of controls. In the second step, we combine these estimating functions efficiently to make full use of the information contained in the data. For the more extreme situation in which some of the risk factors for cases are entirely missing, we extend our approach to make reasonable data imputations and then combine the estimated regression coefficients. We optimize the combination by using a bootstrap procedure. The effectiveness of our approaches is illustrated through several simulations and real data examples.
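To make the idea of combining estimated regression coefficients concrete, the following minimal sketch uses plain inverse-variance (generalized least squares) weighting of independent estimates. This is a standard textbook technique substituted here for illustration; it is not the estimating-function or bootstrap-optimized combination described in the abstract. The function name `combine_estimates` and the numbers in the usage lines are hypothetical.

```python
import numpy as np


def combine_estimates(betas, covs):
    """Inverse-variance (GLS) combination of independent estimates.

    betas: list of coefficient vectors estimating the same parameter.
    covs:  list of their covariance matrices (assumed known and invertible).
    Returns the combined estimate and its covariance matrix.
    """
    # Precision (inverse covariance) of each source acts as its weight.
    precisions = [np.linalg.inv(np.asarray(c, dtype=float)) for c in covs]
    total_precision = sum(precisions)
    combined_cov = np.linalg.inv(total_precision)
    weighted_sum = sum(
        p @ np.asarray(b, dtype=float) for p, b in zip(precisions, betas)
    )
    return combined_cov @ weighted_sum, combined_cov


# Hypothetical example: two sources estimating one log odds ratio.
beta, cov = combine_estimates(
    betas=[[1.0], [3.0]],
    covs=[[[1.0]], [[1.0]]],
)
```

With equal variances the combined estimate is the simple average, and its variance is half that of either source. Sources with larger variances receive proportionally less weight.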
Information identities based on the product multinomial likelihood are proposed to illustrate the association between categorical variables. These identities are built upon the Pythagorean law of decomposed mutual information and are examined to yield valid inference for log-linear and logistic models. For practical contingency tables, an optimal selection scheme for the information identity is formulated to yield proper log-linear and logistic models, giving proper log odds ratios as maximum likelihood parameter estimates. The proposed geometric information analysis is compared with classical AIC model selection in an empirical study of a medical data set.
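As a rough illustration of what an additive decomposition of mutual information looks like on a contingency table, the sketch below checks the standard chain rule I(X; (Y, Z)) = I(X; Z) + I(X; Y | Z) numerically on a three-way table of counts. It assumes this chain rule is the kind of identity the abstract has in mind; the specific identities and selection scheme of the paper are not reproduced. The function names and the counts are made up for the example.

```python
import numpy as np


def mutual_information(table):
    """Mutual information (in nats) between the rows and columns of a
    two-way table of counts, using empirical proportions."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    expected = np.outer(p.sum(axis=1), p.sum(axis=0))
    mask = p > 0  # empty cells contribute zero
    return float(np.sum(p[mask] * np.log(p[mask] / expected[mask])))


def conditional_mutual_information(counts):
    """I(X; Y | Z) for a three-way array of counts indexed as [x, y, z]."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    total = 0.0
    for k in range(counts.shape[2]):
        layer = counts[:, :, k]  # X-by-Y table within stratum Z = k
        if layer.sum() > 0:
            total += (layer.sum() / n) * mutual_information(layer)
    return total


# Hypothetical 2 x 2 x 2 table of counts, indexed as [x, y, z].
counts = np.array([[[10, 4], [6, 12]],
                   [[3, 9], [8, 5]]])

x_by_yz = counts.reshape(counts.shape[0], -1)  # X against the joint (Y, Z)
x_by_z = counts.sum(axis=1)                    # X against Z, Y collapsed

lhs = mutual_information(x_by_yz)
rhs = mutual_information(x_by_z) + conditional_mutual_information(counts)
```

Here `lhs` and `rhs` agree up to floating-point error. For a two-way table with N observations, 2N times the empirical mutual information equals the likelihood-ratio (G-squared) statistic for independence, which is what links such decompositions to inference in log-linear models.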