Search Results - Inner Mongolia University Library

Stochastic Block Models for Complex Network Analysis: A Survey

ACM Transactions on Knowledge Discovery from Data, 2025, Vol. 19, No. 3, pp. 1-35

Authors: Liu, Xueyan; Song, Wenzhuo; Musial, Katarzyna; Li, Yang; Zhao, Xuehua; Yang, Bo. Affiliations: College of Computer Science and Technology, Jilin University, Changchun, China; School of Information Science and Technology, Northeast Normal University, Changchun, China; Complex Adaptive Systems Lab, Data Science Institute, University of Technology Sydney, Sydney, Australia; Aviation University of the Air Force, Changchun, China; School of Digital Media, Shenzhen Institute of Information Technology, Shenzhen, China; Key Laboratory of Symbolic Computation and Knowledge Engineering, Jilin University, Ministry of Education, Changchun, China

Complex networks make it possible to represent and characterize the interactions between entities in various complex systems, which exist widely in the real world and usually generate vast amounts of data about their elements, behaviors, and interactions over time. Studies of new network analysis approaches and methodologies are vital because of the diversity and ubiquity of complex networks. The stochastic block model (SBM), based on Bayesian theory, is a statistical network model. SBMs are essential tools for analyzing complex networks because they offer interpretability, expressiveness, flexibility, and generalization. Thus, designing diverse SBMs and their learning algorithms for various networks has become an intensively researched topic in network analysis and data mining. In this article, we review, in a comprehensive and in-depth manner, SBMs for different types of networks (i.e., model extensions), existing methods (including parameter estimation and model selection) for learning optimal SBMs for given networks, and SBMs combined with deep learning. Finally, we provide an outlook on the future research directions of SBMs. © 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM.

Keywords: Complex networks; structural pattern discovery; stochastic block models; learning methods of SBMs

Multiple graph alignment for the structural analysis of protein active sites

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2007, Vol. 4, No. 2, pp. 310-320

Authors: Weskamp, Nils; Huellermeier, Eyke; Kuhn, Daniel; Klebe, Gerhard. Affiliations: Univ Marburg, Dept Math & Comp Sci, D-35032 Marburg, Germany; Univ Marburg, Inst Pharmaceut Chem, D-35032 Marburg, Germany

Graphs are frequently used to describe the geometry and also the physicochemical composition of protein active sites. Here, the concept of graph alignment as a novel method for the structural analysis of protein binding pockets is presented. Using inexact graph-matching techniques, one is able to identify both conserved areas and regions of difference among different binding pockets. Thus, using multiple graph alignments, it is possible to characterize functional protein families and to examine differences among related protein families independent of sequence or fold homology. Optimized algorithms are described for the efficient calculation of multiple graph alignments for the analysis of physicochemical descriptors representing protein binding pockets. Additionally, it is shown how the calculated graph alignments can be analyzed to identify structural features that are characteristic for a given protein family and also features that are discriminative among related families. The methods are applied to a substantial high-quality subset of the PDB database and their ability to successfully characterize and classify 10 highly populated functional protein families is shown. Additionally, two related protein families from the group of serine proteases are examined and important structural differences are detected automatically and efficiently.

Keywords: knowledge discovery in databases; structural pattern discovery; fuzzy patterns; graph mining; drug design

Finding patterns in three-dimensional graphs: Algorithms and applications to scientific data mining

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2002, Vol. 14, No. 4, pp. 731-749

Authors: Wang, X; Wang, JTL; Shasha, D; Shapiro, BA; Rigoutsos, I; Zhang, KZ. Affiliations: Calif State Univ Fullerton, Dept Comp Sci, Fullerton, CA 92834, USA; New Jersey Inst Technol, Dept Comp & Informat Sci, Newark, NJ 07102, USA; NYU, Courant Inst Math Sci, Dept Comp Sci, New York, NY 10012, USA; Natl Canc Inst, Lab Expt & Computat Biol, Div Basic Sci, Frederick, MD 21702, USA; IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598, USA; Univ Western Ontario, Dept Comp Sci, London, ON N6A 5B7, Canada

This paper presents a method for finding patterns in 3D graphs. Each node in a graph is an undecomposable or atomic unit and has a label. Edges are links between the atomic units. Patterns are rigid substructures that may occur in a graph after allowing for an arbitrary number of whole-structure rotations and translations as well as a small number (specified by the user) of edit operations in the patterns or in the graph. (When a pattern appears in a graph only after the graph has been modified, we call that appearance "approximate occurrence.") The edit operations include relabeling a node, deleting a node and inserting a node. The proposed method is based on the geometric hashing technique, which hashes node-triplets of the graphs into a 3D table and compresses the label-triplets in the table. To demonstrate the utility of our algorithms, we discuss two applications of them in scientific data mining. First, we apply the method to locating frequently occurring motifs in two families of proteins pertaining to RNA-directed DNA Polymerase and Thymidylate Synthase and use the motifs to classify the proteins. Then, we apply the method to clustering chemical compounds pertaining to aromatic, bicyclicalkanes, and photosynthesis. Experimental results indicate the good performance of our algorithms and high recall and precision rates for both classification and clustering.

Keywords: KDD; classification and clustering; data mining; geometric hashing; structural pattern discovery; biochemistry; medicine

Discovering frequent induced subgraphs from directed networks

INTELLIGENT DATA ANALYSIS, 2018, Vol. 22, No. 6, pp. 1279-1296

Authors: Zhang, Sen; Du, Zhihui; Wang, Jason T. L.; Jiang, Haodi. Affiliations: SUNY Coll Oneonta, Dept Math Comp Sci & Stat, New York, NY 13820, USA; Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China; New Jersey Inst Technol, Dept Comp Sci, Newark, NJ 07102, USA

Directed networks find many applications in computer science, social science and biomedicine, among others. In this paper we propose a new graph mining algorithm that is capable of locating all frequent induced subgraphs in a given set of directed networks. We present an incremental coding scheme for representing the canonical form of a graph, study its properties, and develop new techniques for pattern generation suitable for directed networks. We prove that our algorithm is complete, meaning that no qualified pattern is missed by the algorithm. Furthermore, our algorithm is correct in the sense that all patterns found by the algorithm are frequent induced subgraphs in the given networks. Experimental results based on synthetic data and gene regulatory networks show the good performance of our algorithm, and its application in network inference.

Keywords: Apriori algorithm; graph mining; network inference; structural pattern discovery

SchemaDecrypt++: Parallel On-line Versioned Schema Inference for Large Semantic Web Data Sources

INFORMATION SYSTEMS, 2020, Vol. 93, Article 101551

Authors: Kellou-Menouer, Kenza; Kedad, Zoubida. Affiliations: Versailles St Quentin En Yvelines Univ, DAVID Lab, Versailles, France; Paris Nanterre Univ, Nanterre, France

A growing number of linked data sources are published on the Web. They form a single huge data space referred to as the Web of data. These data sources contain both the data and the schema describing them, but the data is not constrained by this schema. Indeed, two instances of the same class may be described by different properties. This flexibility for describing the data eases their evolution, but it comes at the cost of losing the description of the data, which can be useful in many contexts. The different structures of a class represent its versions. These versions provide useful information on property co-occurrence for a class, but their discovery can be very costly, and even impossible because the data sources are remote. Furthermore, they may have some access limitations, either on the query execution time, or on the number of queries, or on the size of the results. In this paper, we present SchemaDecrypt++, a novel approach for the parallel discovery of a versioned schema for a remote data source. Our approach discovers the versions on-line, without uploading or browsing the data source. Broadly speaking, SchemaDecrypt++ allows the discovery of co-occurrences between properties from any set of properties: (i) specified by the user; (ii) describing the instances of a class; or (iii) specified in the schema. SchemaDecrypt++ relies on our previous approach for schema discovery, SchemaDecrypt; in the present work we introduce a new strategy for parallelizing class version exploration, based on the discovery of a set of occurrence rules between the properties of the class. This strategy makes it possible to overcome the source querying restrictions and the combinatorial explosion of the candidate versions, and it improves performance. We present some experimental evaluations on DBpedia to demonstrate the effectiveness of our approach. © 2020 Elsevier Ltd. All rights reserved.

Keywords: RDF; Property co-occurrence; Source restrictions; structural pattern discovery

Finding patterns on protein surfaces: Algorithms and applications to protein classification

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, Vol. 17, No. 8, pp. 1065-1078

Author: Wang, X. Affiliation: Calif State Univ Fullerton, Dept Comp Sci, Fullerton, CA 92834, USA

A successful application of data mining to bioinformatics is protein classification. A number of techniques have been developed to classify proteins according to important features in their sequences, secondary structures, or three-dimensional structures. In this paper, we introduce a novel approach to protein classification based on significant patterns discovered on the surface of a protein. We define a notion called alpha-surface. We discuss the geometric properties of alpha-surface and present an algorithm that calculates the alpha-surface from a finite set of points in R^3. We apply the algorithm to extracting the alpha-surface of a protein and use a pattern discovery algorithm to discover frequently occurring patterns on the surfaces. The pattern discovery algorithm utilizes a new index structure called the Delta B+ tree. We use these patterns to classify the proteins. While most existing techniques focus on the binary classification problem, we apply our approach to classifying three families of proteins. Experimental results show the good performance of the proposed approach.

Keywords: KDD; classification; data mining; structural pattern discovery; biochemistry; medicine
