Biclustering, the simultaneous clustering of rows and columns of a data matrix, has proved its effectiveness in bioinformatics due to its capacity to produce local instead of global models, evolving from a key technique used in gene expression data analysis into one of the most used approaches for pattern discovery and identification of biological modules, used in both descriptive and predictive learning tasks. This survey presents a comprehensive overview of biclustering. It proposes an updated taxonomy for its fundamental components (bicluster, biclustering solution, biclustering algorithms, and evaluation measures) and applications. We unify scattered concepts in the literature with new definitions to accommodate the diversity of data types (such as tabular, network, and time series data) and the specificities of biological and biomedical data domains. We further propose a pipeline for biclustering data analysis and discuss practical aspects of incorporating biclustering in real-world applications. We highlight prominent application domains, particularly in bioinformatics, and identify typical biclusters to illustrate the analysis output. Moreover, we discuss important aspects to consider when choosing, applying, and evaluating a biclustering algorithm. We also relate biclustering to other data mining tasks (clustering, pattern mining, classification, triclustering, N-way clustering, and graph mining). Thus, this survey provides theoretical and practical guidance on biclustering data analysis, demonstrating its potential to uncover actionable insights from complex datasets.
Discovering the risks posed by software vulnerabilities is a challenge. Software vulnerabilities are often not listed and studies have shown 50.3% of the reports do not include the list of vulnerable libraries. Thus, ...
Biclustering of biologically meaningful binary information is essential in many applications related to drug discovery, such as protein-protein interactions and gene expression. However, for robust performance on recently emerging large health datasets, new biclustering algorithms must be scalable and fast. We present a rapid unsupervised biclustering (RUBic) algorithm that achieves this objective with a novel encoding and search strategy. RUBic significantly reduces computational overhead on both synthetic and experimental datasets relative to several state-of-the-art biclustering algorithms. On 100 synthetic binary datasets, our method took approximately 71.1 s to extract 494,872 biclusters. On the human PPI database of size 4085 x 4085, our method generates 1840 biclusters in approximately 48.6 s. On a central nervous system embryonic tumor gene expression dataset of size 712,940, our algorithm takes 101 min to produce 747,069 biclusters, while recent competing algorithms take significantly more time to produce the same result. RUBic is also evaluated on five different gene expression datasets and shows a significant speed-up in execution time over existing approaches when extracting significant KEGG-enriched biclusters. RUBic can operate in two modes, base and flex: base mode generates maximal biclusters, while flex mode generates fewer clusters, and does so faster, based on their biological significance with respect to KEGG pathways. The code is available at (https://***/CMATERJU-BIOINFO/RUBic) for academic use only.
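To make the notion of a maximal bicluster in binary data concrete, the following minimal sketch assumes that a binary bicluster is an all-ones submatrix and that "maximal" means no further row or column can be added without introducing a zero. This is an illustrative assumption, not RUBic's encoding or search strategy; the function names `is_all_ones` and `is_maximal` and the toy matrix are hypothetical.

```python
def is_all_ones(matrix, rows, cols):
    """Return True if every selected cell of the binary matrix is 1."""
    return all(matrix[r][c] == 1 for r in rows for c in cols)


def is_maximal(matrix, rows, cols):
    """Return True if (rows, cols) is an all-ones submatrix that cannot be
    extended by any single extra row or column (assumed definition)."""
    if not is_all_ones(matrix, rows, cols):
        return False
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Try to extend by each row that is not yet in the bicluster.
    for r in range(n_rows):
        if r not in rows and is_all_ones(matrix, rows | {r}, cols):
            return False
    # Try to extend by each column that is not yet in the bicluster.
    for c in range(n_cols):
        if c not in cols and is_all_ones(matrix, rows, cols | {c}):
            return False
    return True


# Hypothetical toy binary matrix (rows and columns are arbitrary entities).
toy = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]

full_block = is_maximal(toy, {0, 1}, {0, 1})   # cannot be extended
partial_block = is_maximal(toy, {0}, {0, 1})   # row 1 could still be added
```

This brute-force check costs time proportional to the matrix size for every candidate, which is why scalable enumeration strategies matter on large datasets.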
Microarray technology enables the monitoring of the expression patterns of a huge number of genes across different experimental conditions or time points simultaneously. Biclustering of microarray data is an important technique for discovering a group of genes that are co-regulated in a subset of experimental conditions. Traditional clustering algorithms find groups of genes or conditions over the complete feature space. They may therefore fail to discover local patterns in which a subset of genes behaves similarly over a subset of conditions. Biclustering algorithms aim to discover such local patterns from the gene expression matrix, and can thus be thought of as simultaneous clustering of genes and conditions. In recent years, a large number of biclustering algorithms have been proposed in the literature. This article studies various issues regarding the biclustering problem and presents a comprehensive survey of available biclustering algorithms. It also surveys freely available biclustering software.
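The difference between global and local similarity can be illustrated with a small sketch. The expression values below are hypothetical toy data, and Euclidean distance is used only as an assumed, simple similarity measure; it is not taken from the surveyed algorithms.

```python
# Toy expression profiles: one list per gene, one entry per condition.
# Genes g0 and g1 agree only under conditions 0-2 (hypothetical data).
expr = {
    "g0": [1.0, 5.0, 3.0, 9.0, 0.0, 7.0],
    "g1": [1.0, 5.0, 3.0, 0.0, 8.0, 1.0],
}


def euclidean(a, b, cols):
    """Euclidean distance between two profiles restricted to the given columns."""
    return sum((a[c] - b[c]) ** 2 for c in cols) ** 0.5


all_conditions = range(6)       # the complete feature space
local_conditions = [0, 1, 2]    # a subset of conditions

# Over all conditions the two genes look dissimilar ...
global_dist = euclidean(expr["g0"], expr["g1"], all_conditions)
# ... but over the subset they are identical.
local_dist = euclidean(expr["g0"], expr["g1"], local_conditions)
```

A clustering method that compares genes over all six conditions would likely separate g0 and g1, whereas a biclustering method is free to group them over conditions 0-2 only.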
ISBN: (print) 0769523447
A good number of biclustering algorithms have been proposed for grouping gene expression data. Many of them have adopted matrix norms to define the similarity score of a bicluster. We show that almost all matrix metrics can be converted into vector norms while preserving rank equivalence. Vector norms provide a much more efficient vehicle for biclustering analysis and computation. The advantages are twofold: ease of analysis and savings in computation.
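One well-known instance of a matrix score that reduces to a vector norm is the Frobenius norm, which equals the Euclidean norm of the flattened matrix. The sketch below checks this on two hypothetical residual matrices and shows that the ordering of their scores is the same under both computations. It is only an illustration of the general idea and is not claimed to be the conversion the paper derives; all function names are assumptions.

```python
import math


def frobenius_via_trace(matrix):
    """Matrix-style definition: sqrt(trace(A^T A))."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Diagonal entry j of A^T A is the sum over rows of a[i][j] squared.
    trace = sum(
        sum(matrix[i][j] * matrix[i][j] for i in range(n_rows))
        for j in range(n_cols)
    )
    return math.sqrt(trace)


def flatten(matrix):
    """Stack the rows of a matrix into a single vector."""
    return [x for row in matrix for x in row]


def l2_norm(vector):
    """Vector-style definition: Euclidean norm."""
    return math.sqrt(sum(x * x for x in vector))


# Hypothetical residual matrices of two candidate biclusters.
residual_a = [[3.0, 0.0], [0.0, 4.0]]
residual_b = [[1.0, 1.0], [1.0, 1.0]]

matrix_scores = [frobenius_via_trace(residual_a), frobenius_via_trace(residual_b)]
vector_scores = [l2_norm(flatten(residual_a)), l2_norm(flatten(residual_b))]

# The candidate with the smaller score is the same under both views.
best_by_matrix = matrix_scores.index(min(matrix_scores))
best_by_vector = vector_scores.index(min(vector_scores))
```

Working on the flattened vector avoids forming any matrix product, which is the kind of computational saving the abstract refers to.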