ISBN: 9780769530697 (print)
With microarray data accumulating rapidly, integrating data from related studies is a natural way to increase sample size so that more reliable statistical analysis can be performed. However, inherent variation among microarray platforms makes data integration a nontrivial task. In this paper we present a simple and effective integration scheme, called normalized linear transform (NLT), to combine data from different microarray platforms. The NLT scheme is compared with three other integration schemes on two tasks: classification analysis and gene marker selection. Our experiments demonstrate that the NLT scheme achieves the best classification accuracy under various classification settings and yields more biologically significant marker genes.
ISBN: 9781424409723 (print)
We study a new high-dimensional data problem in this paper. In pattern classification, if many dimensions of two groups share a similar distribution, the classification error rate will be 50%. We propose a new clustering algorithm to deal with this problem. Its basic idea is to confine the support of the optimization equation so that the data points in one group contribute only slightly to the estimated cluster center of the other group. Experiments show that the proposed method yields good results on eight real-world data sets and outperforms 10 existing methods.
Data preprocessing is important in machine learning, data mining, and pattern recognition. In particular, selecting relevant features in high-dimensional data is often necessary to efficiently construct models that ac...
ISBN: 9781424409723 (print)
Two known SVM-based approaches to ranking learning (ordinal regression estimation, supervised pattern recognition with ordered classes) are scrutinized as different generalizations of the classical principle of finding the optimal discriminant hyperplane in a linear space. Easily verifiable natural conditions are found under which the training result obtained by the computationally much more attractive truncated technique is completely equivalent to the hypothetical strict solution. The numerical procedures are substantially simplified for both techniques.
ISBN: 9781424409723 (print)
A decision support system that uses data mining to find decision knowledge is called an Intelligent Decision Support System (IDSS). Neural networks are a data mining method commonly used to find classification knowledge in an IDSS. However, classic neural-network-based data mining is weak at handling data with blank values or data that is fuzzy and random in character. Such data is called imperfect data. To overcome this shortcoming, a method that combines the cloud model with a neural network to find knowledge from imperfect data in an IDSS is proposed. First, the cloud is used to depict the imperfect data by group decision. Next, attribution generation based on the cloud model or grey cloud model is used to generate the upper concept layer. In this step, the cloud model depicting the imperfect data is assigned to the concept layer closest to it, according to the distance between two cloud models. Then a classic neural network method is used to gain knowledge: the data is input into the neural network for training to obtain the classification knowledge. Finally, an experiment is given to verify the validity of the method.
ISBN: 9783540762973 (print)
The number of ontologies and the amount of metadata available on the Web are constantly growing. The successful application of machine learning techniques for learning ontologies from textual data, i.e. mining for the Semantic Web, contributes to this trend. However, no principled approaches exist so far for mining from the Semantic Web. We investigate how machine learning algorithms can be made amenable to directly taking advantage of the rich knowledge expressed in ontologies and associated instance data. Kernel methods have been successfully employed in various learning tasks and provide a clean framework for interfacing between non-vectorial data and machine learning algorithms. In this spirit, we express the problem of mining instances in ontologies as the problem of defining valid corresponding kernels. We present a principled framework for designing such kernels by decomposing the kernel computation into specialized kernels for selected characteristics of an ontology, which can be flexibly assembled and tuned. Initial experiments on real-world Semantic Web data yield promising results and show the usefulness of our approach.
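The idea of assembling a kernel from specialized kernels can be illustrated with a minimal sketch. It relies only on two standard facts: a set-intersection count is a valid kernel, and a weighted sum of valid kernels with non-negative weights is again a valid kernel. The `Instance` type, the two component kernels, and the weights are hypothetical and chosen for illustration; they are not the kernels defined in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Instance:
    """Hypothetical ontology instance: class memberships and outgoing relations."""
    classes: FrozenSet[str] = field(default_factory=frozenset)
    relations: FrozenSet[Tuple[str, str]] = field(default_factory=frozenset)


Kernel = Callable[[Instance, Instance], float]


def class_kernel(a: Instance, b: Instance) -> float:
    """Number of classes shared by both instances (set-intersection kernel)."""
    return float(len(a.classes & b.classes))


def relation_kernel(a: Instance, b: Instance) -> float:
    """Number of (property, value) pairs shared by both instances."""
    return float(len(a.relations & b.relations))


def weighted_sum(parts: List[Tuple[float, Kernel]]) -> Kernel:
    """Combine component kernels; weights must be non-negative so that the
    result stays a valid (positive semi-definite) kernel."""
    for weight, _ in parts:
        if weight < 0:
            raise ValueError("kernel weights must be non-negative")

    def combined(a: Instance, b: Instance) -> float:
        return sum(weight * kernel(a, b) for weight, kernel in parts)

    return combined
```

Tuning then amounts to changing the weights or swapping component kernels, for example `weighted_sum([(1.0, class_kernel), (0.5, relation_kernel)])`.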
ISBN: 9781424409723 (print)
In this paper, a new human face recognition method based on the anti-symmetrical biorthogonal wavelet transformation (ASBWT) and eigenfaces is proposed. First, the anti-symmetrical biorthogonal wavelet is used to reduce the dimension of the face image while also completing face location and segmentation. Then the human face is reconstructed through the eigenface face space, and the traditional average human face is replaced in the within-class scatter matrix. This within-class scatter matrix is used to calculate the ratio of within-class to between-class distance as a rule function, the twice eigenface is calculated through the Discrete Karhunen-Loeve Transform (DKLT), and the eigenvectors are calculated using Singular Value Decomposition (SVD). Finally, we compute the weights and classify the face images. The results show that the proposed method has a higher recognition rate and is more robust than the traditional eigenface analysis method.
ISBN: 9783540734994 (digital)
ISBN: 9783540734987 (print)
One of the most important algorithms for mining data streams is VFDT. It uses the Hoeffding inequality to achieve a probabilistic bound on the accuracy of the constructed tree. Gama et al. have extended VFDT in two directions: their system VFDTc can deal with continuous data and use more powerful classification techniques at tree leaves. In this paper, we revisit this problem and implement a system, fVFDT, on top of VFDT and VFDTc. We make the following four contributions: 1) We present a threaded binary search tree (TBST) approach for efficiently handling continuous attributes. It builds a threaded binary search tree whose processing time for inserting values is O(n log n), while VFDT's processing time is O(n^2). When a new example arrives, VFDTc needs to update O(log n) attribute tree nodes, but fVFDT needs to update only one necessary node. 2) We improve the method of obtaining the best split-test point of a given continuous attribute. Compared with the method used in VFDTc, the processing time improves from O(n log n) to O(n). 3) Compared with VFDTc, fVFDT's number of candidate split tests decreases from O(n) to O(log n). 4) We improve the soft discretization method for use in data stream mining; it overcomes the problem of noisy data and improves classification accuracy.
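For reference, the Hoeffding inequality states that after n independent observations of a variable with range R, the true mean lies within ε = sqrt(R² ln(1/δ) / (2n)) of the sample mean with probability at least 1 − δ. The sketch below shows how a Hoeffding-tree learner can use this bound to decide when enough examples have been seen to split a leaf. The function names and the simplified split rule (no tie-breaking threshold) are assumptions made for illustration, not code from VFDT, VFDTc, or fVFDT.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that the true mean is within epsilon of the
    sample mean of n observations with probability at least 1 - delta."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Hypothetical split rule: split once the observed gap between the two
    best attributes exceeds the Hoeffding bound for n examples."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)
```

Because ε shrinks as 1/sqrt(n), a small but real difference between the two best attributes is eventually confirmed as more stream examples arrive.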
ISBN: 9783540729020 (print)
In this paper, a graph classification approach based on a multi-objective genetic algorithm is presented. The method consists of learning sets of synthetic graph prototypes that are used in a classification step. These learning graphs are generated by simultaneously maximizing the recognition rate and minimizing the confusion rate. With this approach, the algorithm provides a range of solutions, the (confusion, recognition) couples, from which one can be chosen to suit the needs of the system. Experiments are performed on real data sets representing 10 symbols. These tests demonstrate the value of producing prototypes instead of finding representatives that simply belong to the data set.
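The "range of solutions" produced by a multi-objective optimizer is usually the set of non-dominated trade-offs. As a minimal sketch, assuming each candidate prototype set is summarized by a (confusion, recognition) couple, the following filter keeps only couples that no other couple beats on both objectives. The function names and the sample values are hypothetical, and the genetic algorithm itself is not reproduced here.

```python
from typing import List, Tuple

# (confusion rate, recognition rate): minimize the first, maximize the second.
Couple = Tuple[float, float]


def dominates(a: Couple, b: Couple) -> bool:
    """True if a is at least as good as b on both objectives and strictly
    better on at least one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(couples: List[Couple]) -> List[Couple]:
    """Return the couples not dominated by any other couple, in input order."""
    return [c for c in couples
            if not any(dominates(other, c) for other in couples)]
```

A system that must rarely confuse symbols would pick a low-confusion couple from the front, while one that tolerates rejections less would pick a high-recognition couple.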
ISBN: 9781424409723 (print)
Scale-free networks contain a few nodes of very high degree and many nodes of low degree, and the highly connected nodes play an important role as hubs in communication and networking. This characteristic can be exploited in designing efficient search algorithms. This paper proposes an algorithm that changes the way each new node connects to the network from high-degree probability to equal-degree probability, after the initial model has been constituted by choosing high-degree-probability nodes. We use an association rule search strategy that utilizes high-degree nodes in scale-free networks and whose cost scales with the size of the graph. We also demonstrate the utility of the CSCCNU network. It can improve the robustness of networks.
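To illustrate how hubs can be exploited for search, the sketch below implements a generic high-degree-seeking walk: at each step the query checks whether the target is a direct neighbor and otherwise moves to the unvisited neighbor with the most connections. This is a common strategy for scale-free graphs and is offered only as an assumption-laden illustration; it is not the association rule search strategy or the CSCCNU construction from the paper, and the adjacency-dictionary representation and function name are hypothetical.

```python
from typing import Dict, List, Optional, Set

Graph = Dict[str, Set[str]]  # node -> set of neighboring nodes


def high_degree_search(graph: Graph, start: str, target: str,
                       max_steps: int = 100) -> Optional[List[str]]:
    """Walk from start toward hubs until target is found.

    Returns the path taken, or None if the walk gets stuck or runs out
    of steps. Ties between equal-degree neighbors are broken by node name
    so that the walk is deterministic.
    """
    path = [start]
    visited = {start}
    current = start
    for _ in range(max_steps):
        if current == target:
            return path
        if target in graph[current]:
            path.append(target)
            return path
        candidates = [n for n in graph[current] if n not in visited]
        if not candidates:
            return None  # dead end: every neighbor was already visited
        current = max(candidates, key=lambda n: (len(graph[n]), n))
        visited.add(current)
        path.append(current)
    return None
```

Because a hub is adjacent to a large share of the network, reaching one early tends to shorten the search, which is the property the abstract refers to.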