Lingras et al. proposed a rough cluster algorithm and successfully applied it to web mining. In this paper we analyze their algorithm with respect to its objective function, numerical stability, the stability of the clusters, and other properties. Based on this analysis, a refined rough cluster algorithm is presented. The refined algorithm is applied to synthetic, forest, and microarray gene expression data. (c) 2006 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
ISBN: 9781424496365 (print)
It is commonly recognized that using the same dataset for training and testing a classifier introduces optimistic bias in estimating classifier performance. However, bias of the same kind may still exist even when independent datasets are used for training and testing. This problem is especially important in the setting of a high-dimensional feature space and limited data. Bioinformatics data are typically characterized by a tremendous amount of data per patient but from a limited number of patients. Often the entire data set is utilized in a "pre-training" stage during which the feature set is winnowed to a manageable number and the parameters of the training algorithm are established. Subsequently the data are split into training and test sets; however, bias has already been introduced into the classifier development process. We investigate the significance of this bias by performing simulated gene expression experiments. We find that, for data with moderate intrinsic separability and modest sample size, any observed separation is due to selection bias introduced in the aforementioned pre-training process. For greater intrinsic separability, correct data hygiene, i.e., complete separation of development and validation data, yields a positive result, but one far less impressive than that mistakenly obtained using incomplete data separation.
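The selection bias described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' experimental setup: it assumes pure-noise features (zero intrinsic separability), a mean-difference feature filter, and a nearest-centroid classifier, and all function names and parameter values are hypothetical choices made for illustration. It compares selecting features on the entire data set before splitting ("biased") with selecting them on the training half only ("clean").

```python
import random
import statistics


def make_noise_data(n_samples, n_features, rng):
    """Gaussian noise features; labels alternate and carry no signal."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


def top_features(X, y, k):
    """Keep the k features with the largest absolute class-mean difference."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((abs(statistics.fmean(a) - statistics.fmean(b)), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te, features):
    """Fit class centroids on the training half, score on the test half."""
    centroids = {}
    for label in (0, 1):
        rows = [r for r, l in zip(X_tr, y_tr) if l == label]
        centroids[label] = [statistics.fmean(r[j] for r in rows)
                            for j in features]
    correct = 0
    for row, label in zip(X_te, y_te):
        dist = {c: sum((row[j] - m) ** 2
                       for j, m in zip(features, centroids[c]))
                for c in (0, 1)}
        correct += min(dist, key=dist.get) == label
    return correct / len(X_te)


def simulate(trials=20, n_samples=40, n_features=500, k=10, seed=0):
    """Return mean test accuracy for (biased, clean) feature selection."""
    rng = random.Random(seed)
    half = n_samples // 2
    biased, clean = [], []
    for _ in range(trials):
        X, y = make_noise_data(n_samples, n_features, rng)
        X_tr, y_tr, X_te, y_te = X[:half], y[:half], X[half:], y[half:]
        # Biased: features chosen using ALL samples, including the test half.
        f_all = top_features(X, y, k)
        biased.append(nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te, f_all))
        # Clean: features chosen from the training half only.
        f_tr = top_features(X_tr, y_tr, k)
        clean.append(nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te, f_tr))
    return statistics.fmean(biased), statistics.fmean(clean)


if __name__ == "__main__":
    b, c = simulate()
    print(f"biased selection: {b:.2f}  clean selection: {c:.2f}")
```

Because the labels are independent of the features, an honest estimate should sit near chance. Under these assumptions, the biased protocol tends to report apparent separation anyway, since the test samples have already influenced which features were kept.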
ISBN: 9781424492701 (print)
Bioinformatics data tend to be highly dimensional in nature and thus impose significant computational demands. To overcome the limitations of conventional computing methods, several alternative high performance computing solutions have been proposed, such as Graphical Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs). The latter have been shown to be efficient and high in performance. In recent years, FPGAs have benefited from the dynamic partial reconfiguration (DPR) feature, which adds the flexibility to alter specific regions within the chip. This work proposes combining the use of FPGAs and DPR to build a dynamic multi-classifier architecture that can be used in processing bioinformatics data. In bioinformatics, applying different classification algorithms to the same dataset is desirable in order to obtain a comparable, more reliable, consensus decision, but it can take a long time when performed on a conventional PC. The DPR implementations of two common classifiers, namely support vector machines (SVMs) and K-nearest neighbor (KNN), are combined to form a multi-classifier FPGA architecture that can use a specific region of the FPGA to work as either an SVM or a KNN classifier. This multi-classifier DPR implementation achieved at least an approximately 8x reduction in reconfiguration time over the single non-DPR classifier implementation, and occupied less space and fewer hardware resources than having both classifiers. The proposed architecture can be extended to work as an ensemble classifier.