Many practical problems can be modeled as complex networks, and community discovery in complex networks has become a hot research topic in various fields. The classic label propagation algorithm (LPA) produces a community partition very quickly, but it is unstable because labels are propagated randomly. To solve this problem, a community leader principle is established and a transition probability is introduced, and a label propagation algorithm based on community leaders and transition probability (CTLPA) is proposed. CTLPA selects threatened leaders and their communities according to the community leader principle, and uses the transition probability together with the degree of the leader to control the order of community mergers, so that a threatened leader successively devours the communities that threaten it until a preliminary community partition is formed. To further reduce the number of communities, CTLPA exploits a characteristic of community structure, namely close relationships within a community and sparse relationships between communities, and merges the closest communities until the final community partition is obtained. CTLPA is compared with five other classic algorithms on LFR artificially generated networks and several real data sets. The experimental results show that CTLPA is robust: it always gives the same community partition, whereas LPA gives different results across multiple independent runs. In most cases, CTLPA achieves the best number of communities and the best normalized mutual information (NMI).
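To illustrate why classic LPA can be unstable, the following is a minimal Python sketch of asynchronous label propagation. It is not CTLPA and not the authors' code. The graph representation, the function names, the `max_sweeps` cap, and the rule "keep the current label when it is already among the most frequent" are assumptions chosen for illustration. The two random steps (visiting order and tie-breaking) are marked in the comments.

```python
import random
from collections import Counter


def label_propagation(adjacency, seed=None, max_sweeps=100):
    """Classic asynchronous LPA on an undirected graph {node: set_of_neighbors}."""
    rng = random.Random(seed)
    labels = {node: node for node in adjacency}  # each node starts in its own community
    nodes = list(adjacency)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)  # random visiting order: first source of instability
        changed = False
        for node in nodes:
            counts = Counter(labels[nb] for nb in adjacency[node])
            if not counts:
                continue  # an isolated node keeps its own label
            best = max(counts.values())
            candidates = sorted(lab for lab, c in counts.items() if c == best)
            if labels[node] not in candidates:
                # random tie-break among the most frequent neighbor labels:
                # second source of instability
                labels[node] = rng.choice(candidates)
                changed = True
        if not changed:
            break  # every node already holds a most-frequent neighbor label
    return labels


def communities(labels):
    """Group nodes by label and return a sorted list of sorted node lists."""
    groups = {}
    for node, lab in labels.items():
        groups.setdefault(lab, set()).add(node)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Hypothetical toy graph: two triangles joined by the bridge edge 2-3.
    graph = {
        0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
        3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
    }
    for s in range(5):
        print(s, communities(label_propagation(graph, seed=s)))
```

Running the sketch with different seeds changes both the visiting order and the tie-breaks, which is the behavior the abstract contrasts with CTLPA's repeatable partition.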
Background: Many large-scale studies have analyzed high-throughput genomic data to identify altered pathways essential to the development and progression of specific types of cancer. However, no previous study has provided a comprehensive analysis of pathways disrupted by copy number alterations across different human cancers. Towards this goal, we propose a network-based method that integrates copy number alteration data with human protein-protein interaction networks and pathway databases to identify pathways that are commonly disrupted in many different types of cancer. Results: We applied our approach to a data set of 2,172 cancer patients across 16 different types of cancer, and discovered a set of commonly disrupted pathways, which are likely essential for tumor formation in the majority of the cancers. We also identified pathways that are disrupted only in specific cancer types, providing molecular markers for different human cancers. Analysis with independent microarray gene expression datasets confirms that the commonly disrupted pathways can be used to identify patient subgroups with significantly different survival outcomes. We also provide a network view of disrupted pathways to explain how copy number alterations affect pathways that regulate cell growth, cell cycle, and differentiation in tumorigenesis. Conclusions: In this work, we demonstrated that network-based integrative analysis can help to identify pathways disrupted by copy number alterations across 16 types of human cancer, which are not readily identifiable by conventional overrepresentation-based and other pathway-based methods. All the results and source code are available at http://***/NetPathID/.
Science and technology are highly cumulative undertakings, and no scientific or technological worker can make good progress without the experience and achievements of predecessors or others. In the face of an ever-expanding pool of literature, efficiently and accurately searching for similar works is a major challenge in current research. This paper uses the Latent Dirichlet Allocation (LDA) topic model to construct feature vectors for the title and abstract, and the bag-of-words model to construct feature vectors for publication type. The similarity between feature vectors is measured by their cosine values. The experiment showed that the precision, recall and WSS95 scores of the proposed algorithm were 90.55%, 98.74% and 52.45% under the literature title element, and 91.78%, 99.58% and 62.47% under the literature abstract element, respectively. Under the literature publication type element, the precision, recall and WSS95 scores of the proposed algorithm were 90.77%, 98.05% and 40.14%, respectively. Under the combination of literature title, abstract and publication type elements, the WSS95 score of the proposed algorithm was 79.03%. In summary, the study proposes a literature screening (LS) algorithm based on the LDA topic model that performs robustly, and an LS system designed on this basis can effectively improve the efficiency of LS.
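The cosine comparison of feature vectors described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vectors stand in for LDA topic distributions or bag-of-words counts, the function names and the example data are hypothetical, and returning 0.0 for an all-zero vector is a convention chosen here rather than something the abstract specifies.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an all-zero vector is treated as dissimilar
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, candidates):
    """Rank {doc_id: vector} by cosine similarity to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical 3-topic distributions for a query paper and two candidates.
    query = [0.7, 0.2, 0.1]
    docs = {"paper_a": [0.6, 0.3, 0.1], "paper_b": [0.1, 0.1, 0.8]}
    for doc_id, score in rank_by_similarity(query, docs):
        print(doc_id, round(score, 3))
```

In a screening setting, the ranked list would be cut at a similarity threshold or a fixed number of candidates; how the study chose that cut-off is not stated in the abstract.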
ISBN: 9781479903016 (print)
How communities form can depend on the geospatial location of people within a social network. Here, we investigated the implementation of the label propagation algorithm (LPA) and the labelRankT community detection algorithm in Gephi, a graph visualization tool. We explored extending these community detection algorithms to incorporate the geospatial distance between nodes in a network as a limiting factor in the automatic detection of community formation.
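One simple way to treat geospatial distance as a limiting factor is to drop edges whose endpoints are farther apart than a threshold before running community detection. The Python sketch below shows that idea only. It is not the Gephi implementation or its API, and the function names, the `max_km` threshold, and the use of the haversine great-circle distance with a mean Earth radius of 6371 km are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def filter_edges_by_distance(edges, coords, max_km):
    """Keep only edges whose endpoints lie within max_km of each other.

    edges:  iterable of (u, v) pairs
    coords: {node: (lat, lon)} in degrees
    """
    return [
        (u, v) for u, v in edges
        if haversine_km(*coords[u], *coords[v]) <= max_km
    ]


if __name__ == "__main__":
    # Hypothetical nodes placed along the equator.
    coords = {"a": (0.0, 0.0), "b": (0.0, 1.0), "c": (0.0, 10.0)}
    edges = [("a", "b"), ("a", "c"), ("b", "c")]
    print(filter_edges_by_distance(edges, coords, max_km=200))
```

The filtered edge list can then be passed to any community detection routine. A softer alternative, also an assumption here, would be to weight edges by inverse distance instead of removing them outright.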
ISBN: 9781450398541 (print)
The proliferation of data-generating devices, including IoT and edge computing, has led to the big data paradigm, which has placed considerable pressure on well-established relational databases during the last decade. Researchers have proposed several alternative database models in order to model the captured data more efficiently. Among these approaches, graph databases seem the most promising candidate to supplement relational schemes. This study compares Neo4j, one of the leading graph databases, with Apache Spark, a unified engine for distributed large-scale data processing, in terms of processing limits. More specifically, the two frameworks are compared on their capacity to execute community detection algorithms.