Limit theorems for empirical Rényi entropy and divergence with applications to molecular diversity analysis

Authors: Pietrzak, Maciej; Rempala, Grzegorz A.; Seweryn, Michal; Wesolowski, Jacek

Affiliations: Ohio State Univ, Coll Publ Hlth, Div Biostat, Columbus, OH 43210, USA; Ohio State Univ, Math Biosci Inst, Columbus, OH 43210, USA; Univ Lodz, Dept Math & Comp Sci, Lodz, Poland; Warsaw Univ Sci & Technol, Wydzial Matemat & Nauk Informacyjnych, Warsaw, Poland

Publication: TEST (journal of the Spanish Society of Statistics and Operations Research)

Year, volume, issue: 2016, Vol. 25, No. 4

Pages: 654-673

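Subject classification: 0202 [Economics - Applied Economics]; 02 [Economics]; 020208 [Economics - Statistics]; 07 [Science]; 0714 [Science - Statistics (degrees in Science or Economics may be conferred)]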

Funding: US NIH [R01CA-152158, U01-GM092655]; US NSF [DMS-1318886]; Direct For Mathematical & Physical Scien, Division Of Mathematical Sciences; Funding Source: National Science Foundation

Keywords: Hill number; Central limit theorem; Next-generation sequencing; Triangular arrays; T-cell receptors

Abstract: Quantitative methods for studying biodiversity have been traditionally rooted in the classical theory of finite frequency table analysis. However, with the help of modern experimental tools, like high-throughput sequencing, we now begin to unlock the outstanding diversity of genomic data in plants and animals reflective of the long evolutionary history of our planet. This molecular data often defies the classical frequency/contingency table assumptions and seems to require sparse tables with a very large number of categories and highly unbalanced cell counts, e.g., following heavy-tailed distributions (for instance, power laws). Motivated by molecular diversity studies, we propose here a frequency-based framework for biodiversity analysis in the asymptotic regime where the number of categories grows with sample size (an infinite contingency table). Our approach is rooted in information theory and based on the Gaussian limit results for the effective number of species (the Hill numbers) and the empirical Rényi entropy and divergence. We argue that when applied to molecular biodiversity analysis, our methods can properly account for the complicated data frequency patterns on one hand and the practical sample size limitations on the other. We illustrate this principle with two specific RNA sequencing examples: a comparative study of T-cell receptor populations and a validation of some preselected molecular hepatocellular carcinoma (HCC) markers.