检索结果-内蒙古大学图书馆

Bayesian clustering of multiple zero-inflated outcomes

PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES 2023年第2247期381卷 20220145-20220145页

作者： Franzolini, Beatrice Cremaschi, Andrea van den Boom, Willem De Iorio, Maria ASTAR Singapore Inst Clin Sci SICS Singapore Singapore Natl Univ Singapore Yong Loo Lin Sch Med Singapore Singapore UCL Dept Stat Sci London England

Several applications involving counts present a large proportion of zeros (excess-of-zeros data). A popular model for such data is the hurdle model, which explicitly models the probability of a zero count, while assuming a sampling distribution on the positive integers. We consider data from multiple count processes. In this context, it is of interest to study the patterns of counts and cluster the subjects accordingly. We introduce a novel Bayesian approach to cluster multiple, possibly related, zero-inflated processes. We propose a joint model for zero-inflated counts, specifying a hurdle model for each process with a shifted Negative Binomial sampling distribution. Conditionally on the model parameters, the different processes are assumed independent, leading to a substantial reduction in the number of parameters as compared with traditional multivariate approaches. The subject-specific probabilities of zero-inflation and the parameters of the sampling distribution are flexibly modelled via an enriched finite mixture with random number of components. This induces a two-level clustering of the subjects based on the zero/non-zero patterns (outer clustering) and on the sampling distribution (inner clustering). Posterior inference is performed through tailored Markov chain Monte Carlo schemes. We demonstrate the proposed approach on an application involving the use of the messaging service *** article is part of the theme issue 'Bayesian inference: challenges, perspectives, and prospects'.

关键词： conditional algorithm excess-of-zeros data enriched priors hurdle model finite mixtures nested clustering

来源：评论

学校读者我要写书评

暂无评论

Applying clustering Techniques for Refining Large Data Set: Case Study on Malware

Applying Clustering Techniques for Refining Large Data Set: ...

引用

International Conference on Advanced Information Technologies (ICAIT)

作者： Thwe, Yoon Myet Ogawa, Mizuhito Pham Ngoc Dung Japan Adv Inst Sci & Technol JAIST Nomi Japan Le Quy Don Tech Univ Hanoi Vietnam

ISBN: (纸本)9781728151731

Malware databases have been unintentionally collecting garbage (incomplete malware) together with malware through the Internet. This paper focuses on finding garbage (incomplete malware) from large malware datasets using binary pattern matching and speed up the matching by using nested clustering as a preprocessing. To verify the effectiveness of our method, we conduct experiments on various malware datasets. The results show that our method works efficiently while maintaining high accuracy.

关键词： binary pattern matching nested clustering malware incomplete malware

来源：评论

学校读者我要写书评

暂无评论

COMPUTING WEIGHTED STRENGTH AND APPLICATIONS TO PARTITIONING

引用

SIAM JOURNAL ON DISCRETE MATHEMATICS 2018年第4期32卷 2747-2782页

作者： Galtier, Jerome Orange Labs 44 AV Republ F-92320 Chatillon France

We study the mathematical concept of the strength of a graph as defined by Cunningham to partition very large graphs. The strength is the objective value of an attack problem on a graph where the striker aims at dividing the graph into a maximum number of connected components removing as few edges as possible per component uncoupled. The strength is the optimal number of edges per component uncoupled or, in the weighted version, the sum of the weights of the edges removed per component uncoupled. Given a connected graph with n vertices and m weighted edges, we denote by W the total sum of the integer weights, and we describe an algorithm that approximates within a factor of 1 + epsilon the weighted strength in time O(m log(W) log(3)(n)/epsilon(2)), and that can be implemented with O(m) memory use if we allow a complexity of O(m log(2)(W) log(3)(n)/epsilon(2)). This result has several important applications, in particular for the detection of dense areas of a graph, in terms of weighted edges, which in turn can be used for spam detection, community identification, and partitioning.

关键词： strength of a graph nested clustering approximation algorithm graphic matroid dendogram Multiplicative-Weights-Update Method Kruskal algorithm community detection small-world ranking classification

来源：评论

学校读者我要写书评

暂无评论

Fast approximation for computing the fractional arboricity and extraction of communities of a graph

引用

DISCRETE APPLIED MATHEMATICS 2016年第0期213卷 179-195页

作者： Worou, Bio Mikaila Toko Galtier, Jerome Orange Labs Paris France INRIA CNRS UNS Mascotte I3S Paris France

In this paper, we develop some algorithmic aspects of the fractional arboricity of a graph in order to study some new approaches for graph clustering. For a given undirected graph G(V, E) with m edges and n vertices, the fractional arboricity gamma(G) measures the maximum edge density of the subgraphs of G(V, E). It is the fractional covering number of the corresponding graphic matroid. The fractional arboricity, for applications in networks reliability or the approximation of the chromatic number of the graphs, has been studied for many years. There are some algorithms in polynomial time to compute the fractional arboricity and its integer part. But, for large graphs such as the graph of the Web or graphs of social networks, the exact algorithms are not fast enough for practical use. That is why we describe a new FPTAS to compute an epsilon-approximation of the fractional arboricity (with epsilon > 0 as small as desired). Our algorithm uses the principle of the multiplicative weights update method, needs a memory of size O(m) and has a complexity of O(m log(2)(m) log(m/n)/epsilon(2)). We also give a 2-approximation of gamma(G) with computation time O(m), which is a quick preprocessing for the main algorithm. Finally, we present a fast algorithm to extract a subgraph which achieves the value of the approximation of the fractional arboricity. (C) 2014 Elsevier B.V. All rights reserved.

关键词： Fractional arboricity Community detection FPTAS Social networks nested clustering

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：