ISBN: 9781467358743 (print)
In bioinformatics, planted (l, d)-motif finding is an important and challenging problem with many applications. In general, the task is to locate recurring patterns in the promoter regions of co-expressed or co-regulated genes. Because biological mutations mean the pattern cannot be expected to occur as exact copies, motif finding becomes an NP-complete problem. Many solutions that approximate the problem in different respects have been proposed in the literature; they are either "exact" or "approximate". All proposed exact solutions take exponential time, so they need more time as the parameters l and d grow. Problems in bioinformatics seldom need the exact optimum; what they need are robust, fast, near-optimal solutions. It is therefore impractical to use an exact algorithm to search for motifs with large parameters in real biological data sets. In this paper, we combine Particle Swarm Optimization (PSO) with the k-nearest neighbor algorithm to solve the planted (l, d)-motif finding problem. PSO is a global approximate optimization technique with wide applications. It searches for the global best solution by adjusting the trajectory of each individual towards its own best location and towards the best particle of the swarm at each generation. We performed experiments on synthetic data, increasing the number and the length of the sequences, for the following (l, d) instances. General instances: (10, 2), (11, 2), (12, 3), (15, 4), (16, 5), (18, 6), (20, 7), (30, 11) and (40, 15). Challenging instances: (9, 2), (11, 3), (13, 4), (15, 5), (20, 7), (30, 11) and (40, 15). Finally, we applied the proposed method to real biological sequences. The experimental results show that the proposed algorithm is more efficient and accurate than existing approximation algorithms, and that it performs better even for larger motif instances.
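To make the problem statement concrete, the following is a minimal sketch of the standard planted (l, d)-motif condition: a candidate string of length l qualifies if every sequence contains some window within Hamming distance d of it. This is not the PSO and k-nearest neighbor method of the paper; the function names and the toy sequences are hypothetical, and the total-distance score is only one plausible fitness function that a swarm-based search could minimize.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(motif, seq):
    """Smallest Hamming distance between motif and any window of seq.

    Assumes len(seq) >= len(motif).
    """
    l = len(motif)
    return min(hamming(motif, seq[i:i + l]) for i in range(len(seq) - l + 1))


def is_planted_motif(motif, sequences, d):
    """True if every sequence has a window within d mismatches of motif."""
    return all(min_distance(motif, s) <= d for s in sequences)


def total_distance(motif, sequences):
    """Sum of per-sequence minimum distances (lower is better)."""
    return sum(min_distance(motif, s) for s in sequences)


# Toy data (hypothetical): "ACGT" occurs exactly once and with one
# mismatch in each of the other two sequences.
sequences = ["ACGTACGT", "TTACCTAA", "GGACGAGG"]
print(is_planted_motif("ACGT", sequences, d=1))
print(total_distance("ACGT", sequences))
```

An exact algorithm would have to test this condition over a candidate space that grows exponentially with l and d, which is the cost the abstract refers to.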
Gene expression data clustering is one of the important tasks of functional genomics, as it provides a powerful tool for studying functional relationships of genes in a biological process. Identifying coexpressed groups of genes is the basic challenge in the gene clustering problem. In this regard, a gene clustering algorithm, termed robust rough-fuzzy c-means, is proposed that judiciously integrates the merits of rough sets and fuzzy sets. While the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in cluster definition, the integration of probabilistic and possibilistic memberships of fuzzy sets enables efficient handling of overlapping partitions in noisy environments. The concept of the possibilistic lower bound and probabilistic boundary of a cluster, introduced in robust rough-fuzzy c-means, enables efficient selection of gene clusters. An efficient method is proposed to select initial prototypes of different gene clusters, which enables the proposed c-means algorithm to converge to an optimum or near-optimum solution and helps to discover coexpressed gene clusters. The effectiveness of the algorithm, along with a comparison with other algorithms, is demonstrated both qualitatively and quantitatively on 14 yeast microarray data sets.
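For readers unfamiliar with probabilistic fuzzy memberships, the sketch below shows the membership formula of standard fuzzy c-means only. It is not the robust rough-fuzzy c-means algorithm of the abstract, which additionally uses possibilistic memberships and rough lower and upper approximations. The function name, the fuzzifier default m = 2 and the toy centers are assumptions made for illustration.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means memberships of one point in each cluster.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the
    Euclidean distance from the point to center i and m > 1 is the fuzzifier.
    The memberships always sum to 1.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    zeros = [i for i, dist in enumerate(dists) if dist == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
            for d_i in dists]


# Toy example (hypothetical): a point three times closer to the first center.
u = fcm_memberships((0.0, 0.0), [(1.0, 0.0), (3.0, 0.0)])
print(u)
```

Because these memberships are forced to sum to 1, an outlier far from every center still receives non-trivial membership somewhere, which is one common motivation for combining them with possibilistic memberships in noisy data.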
We present r..eal, a library that integrates the R statistical environment with Prolog. Due to R's functional programming affinity, the interface introduced has a minimalistic feel. Programs utilising the library ...
ISBN: 9781467311892 (print)
The proceedings contain 58 papers. The topics discussed include: protein model assessment using extended fuzzy decision tree with spatial neighborhood features; quantitative analysis of redundancy in evolution of developmental systems; developing a novel integrated model of p38 MAPK and glucocorticoid signaling pathways; idock: a multithreaded virtual screening tool for flexible ligand docking; multiple worlds model for motif discovery; data mining techniques for AFM-based tumor classification; robust integrated framework for effective feature selection and sample classification and its application to gene expression data analysis; towards predictive structure-based models of evolved drug resistance; learning to predict health status of geriatric patients from observational data; multicenter study design in survival analysis; and hybrid feature selection method for biomedical datasets.
We propose an interactive framework where artists can design simple story characters using an agent-based interactive system. The characters' personalities are designed by modifying the dynamics of their internal states while observing their interactions in a virtual environment. This can provide the artist with interesting characters that can be used for further story generation. We also argue that cartoon characters are particularly suited to this kind of generative approach.
We develop a theory of algebraic operations over linear grammars that makes it possible to combine simple "atomic" grammars operating on single sequences into complex, multi-dimensional grammars. We demonstr...
The following topics are dealt with: computational intelligence applications; CIASG; optimization; cyber security; demand side management; distributed energy resources; FACTS; power system planning; power markets; power system economics; power system control; plug-in vehicles; renewable energy; smart micro-grids; smart grid education; smart sensing; synchrophasors; wide area monitoring and power system protection.
Recovering software system operations from a state of extensive damage without human intervention is a challenging problem, as the recovery may need to be based on a different infrastructure (i.e., computational and communication devices) from the one that the system was originally designed for and deployed on, and may require significant reorganization of system functionalities. In this paper, we introduce a bio-inspired approach for reconstructing nearly extinct complex software systems. Our approach is based on encoding a computational DNA (co-DNA) of a system, together with computational analogues of biological processes that enable the transmission of co-DNA over computational devices and, through it, the transformation of these devices into system cells that can realise chunks of the system functionality and further spread its reconstruction process.
Bioinformatics has been emerging as a new research dimension since the last century, combining computer science and biology techniques for the automatic analysis of biological sequence data. The volume of biological data gathered under different sequencing projects is increasing exponentially. These sequences contain extremely important information about genes, their structure and function. Computational techniques that involve machine learning and pattern recognition are becoming very useful on bioinformatics data such as DNA and protein sequences. Classifying proteins into different groups can be used to determine the structure or function of an unknown protein sequence. Classifying protein amino acid sequences into a family/superfamily is a very complex problem. Among the major issues in protein classification, the critical one is an accurate representation of the amino acid sequence during feature extraction. In this work, we propose a distance-based feature-encoding method; the proposed technique has been tested with different classifiers, which have shown better results than the previously available techniques for superfamily classification of protein sequences. The maximum average classification accuracy obtained was 91.2%. The dataset used in the experiments was taken from the well-known UniProtKB protein database.