Author affiliations: Xiamen Univ, Sch Econ, Dept Stat & Data Sci, Xiamen, Fujian, Peoples R China; Xiamen Univ, Wang Yanan Inst Studies Econ, Xiamen, Peoples R China; Yale Sch Publ Hlth, Dept Biostat, New Haven, CT, USA; Shanghai Jiao Tong Univ, Sch Med, Sch Publ Hlth, Shanghai, Peoples R China
Publication: STATISTICS IN MEDICINE (Stat. Med.)
Year, volume, issue: 2025, Vol. 44, Issue 3-4
Pages: e10330
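Subject classification: 0710 [Science - Biology]; 1004 [Medicine - Public Health and Preventive Medicine (degrees in Medicine or Science may be awarded)]; 1001 [Medicine - Basic Medicine (degrees in Medicine or Science may be awarded)]; 0714 [Science - Statistics (degrees in Science or Economics may be awarded)]; 10 [Medicine]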
Funding: National Natural Science Foundation of China [CA204120, CA121974, CA196530]; NIH [72071169, 82204153, 71988101]; National Science Foundation of China [22JJD910001]; MOE Project of Key Research Institute of Humanities and Social Sciences
Keywords: G-E interactions; hierarchical multi-label classification; high-dimensional data; semi-supervised
Abstract: In biomedical studies, gene-environment (G-E) interactions have been demonstrated to have important implications for analyzing disease outcomes beyond the main G and main E effects. Many approaches have been developed for G-E interaction analysis, yielding important findings. However, hierarchical multi-label classification, which provides insightful information on disease outcomes, remains unexplored in the G-E analysis literature. Moreover, unlabeled data are commonly observed in practical settings but are omitted by many existing methods for hierarchical multi-label classification. In this study, we consider a semi-supervised scenario and develop a novel approach for a two-layer hierarchical response with G-E interactions. A two-step penalized estimation is then proposed using an efficient expectation-maximization (EM) algorithm. Simulations show that it has superior performance in classification and feature selection. The analysis of The Cancer Genome Atlas (TCGA) data on lung cancer demonstrates the practical utility of the proposed method. Overall, this study can fill an important knowledge gap in G-E interaction analysis by providing a widely applicable framework for hierarchical multi-label classification of complex disease outcomes.