Search Results - Inner Mongolia University Library

18th IEEE International Conference on Data Mining Workshops (ICDMW)

Authors: Kocayusufoglu, Furkan; Hoang, Minh X.; Singh, Ambuj K. (Univ Calif Santa Barbara, Comp Sci Dept, Santa Barbara, CA 93106, USA)

ISBN: (print) 9781538691595

Understanding and modeling complex network processes is an important task in many real-world applications. The first challenge is to discover patterns in such complex data. In this work, our goal is to summarize different processes in a network by a small yet interpretable set of network patterns, each of which represents a local community of connected nodes frequently participating in the same network processes. We formulate this problem as a Boolean Matrix Factorization with a network constraint, which we prove to be NP-hard. We then propose an efficient algorithm that incrementally adds the best patterns and achieves scalability with two further improvements. First, to decide which network processes contain which network patterns, we introduce two mapping algorithms with linear costs. Second, to systematically mine the exponential subgraph search space for good patterns, we devise two sampling algorithms based on Markov chain Monte Carlo. Experimental results on both synthetic and real-world datasets show that our solutions are scalable and find network patterns that effectively summarize network processes.

Keywords: Boolean Matrix Factorization; network-constraint; network processes
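
The formulation in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: it only shows what a Boolean matrix product is, how a reconstruction error could be counted, and one possible reading of the network constraint (each pattern must induce a connected subgraph). The function names, the toy graph, and the toy matrices are all assumptions made for illustration.

```python
def boolean_product(usage, patterns):
    """Boolean matrix product: entry (i, j) is 1 if at least one
    pattern used by process i contains node j."""
    n_nodes = len(patterns[0])
    return [
        [int(any(u and p[j] for u, p in zip(row, patterns))) for j in range(n_nodes)]
        for row in usage
    ]


def reconstruction_error(data, usage, patterns):
    """Number of cells where the Boolean product differs from the data."""
    approx = boolean_product(usage, patterns)
    return sum(
        a != b
        for data_row, approx_row in zip(data, approx)
        for a, b in zip(data_row, approx_row)
    )


def is_connected(pattern, edges):
    """Assumed network constraint: the nodes selected by the pattern
    must form a connected subgraph of the (undirected) network."""
    nodes = {j for j, bit in enumerate(pattern) if bit}
    if not nodes:
        return False
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for neighbor in adjacency[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen == nodes


# Toy example (hypothetical data): a path graph 0-1-2-3,
# three processes (rows) observed over four nodes (columns).
edges = [(0, 1), (1, 2), (2, 3)]
data = [[1, 1, 0, 0], [1, 1, 1, 1], [0, 0, 1, 0]]
patterns = [[1, 1, 0, 0], [0, 0, 1, 1]]   # pattern-by-node matrix
usage = [[1, 0], [1, 1], [0, 1]]          # process-by-pattern matrix
```

With these toy inputs, both patterns satisfy the connectivity check, and the Boolean product differs from `data` in a single cell (the last process does not activate node 3).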


Network-Constrained Group Lasso for High-Dimensional Multinomial Classification with Application to Cancer Subtype Prediction


CANCER INFORMATICS, 2014, Vol. 13, Suppl. 6, pp. 25-33

Authors: Tian, Xinyu; Wang, Xuefeng; Chen, Jun (SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY, USA; SUNY Stony Brook, Dept Prevent Med, Stony Brook, NY, USA; Mayo Clin, Div Biomed Stat & Informat, Rochester, MN 55905, USA)

The classic multinomial logit model, commonly used in multiclass regression problems, is restricted to a few predictors and does not take into account the relationships among variables. It has limited use for genomic data, where the number of genomic features far exceeds the sample size. Genomic features such as gene expressions are usually related by an underlying biological network. Efficient use of the network information is important to improve classification performance as well as biological interpretability. We proposed a multinomial logit model that is capable of addressing both the high dimensionality of predictors and the underlying network information. Group lasso was used to induce model sparsity, and a network constraint was imposed to induce smoothness of the coefficients with respect to the underlying network structure. To deal with the non-smoothness of the objective function in optimization, we developed a proximal gradient algorithm for efficient computation. The proposed model was compared to models with no prior structure information in both simulations and a problem of cancer subtype prediction with real TCGA (The Cancer Genome Atlas) gene expression data. The network-constrained model outperformed the traditional ones in both cases.

Keywords: cancer subtype prediction; multinomial logit model; group lasso; network-constraint; proximal gradient algorithm
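
As a rough illustration of the proximal gradient idea mentioned in the abstract above, the sketch below shows the standard proximal operator of a group lasso penalty (block soft-thresholding) and one generic proximal gradient step. It is not the authors' implementation. It assumes that each predictor's coefficients across classes form one group, and that the gradient of the smooth part of the objective (log-likelihood plus any smooth network penalty) is supplied by the caller; the function names are hypothetical.

```python
import math


def group_soft_threshold(v, threshold):
    """Proximal operator of threshold * ||v||_2: shrink the whole group
    toward zero, and set it exactly to zero if its norm is small enough."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= threshold:
        return [0.0] * len(v)
    scale = 1.0 - threshold / norm
    return [scale * x for x in v]


def proximal_gradient_step(coef_rows, grad_rows, step, lam):
    """One proximal gradient update.

    coef_rows: one list of per-class coefficients per predictor (assumed grouping).
    grad_rows: gradient of the smooth part of the objective, same shape.
    step:      step size.
    lam:       group lasso penalty weight.
    """
    updated = []
    for row, grad in zip(coef_rows, grad_rows):
        # Gradient step on the smooth part, then prox of the group penalty.
        moved = [b - step * g for b, g in zip(row, grad)]
        updated.append(group_soft_threshold(moved, step * lam))
    return updated
```

For example, a group `[3.0, 4.0]` has norm 5, so a threshold of 1 scales it by 0.8, while a group whose norm is below the threshold is zeroed out entirely, which is how the penalty removes whole predictors.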

