ISBN: 9783540884347 (print)
Robustly identifying cancer molecular patterns from high-dimensional protein expression data not only has a significant impact on clinical ontology, but also presents a challenge for statistical learning. Principal component analysis (PCA) is a widely used feature selection algorithm that is generally integrated with classic classification algorithms for cancer molecular pattern discovery. However, its holistic mechanism prevents it from capturing local data characteristics during feature selection. This may increase misclassification rates and reduce the robustness of cancer molecular diagnostics. In this study, we develop a nonnegative principal component analysis (NPCA) algorithm and propose an NPCA-based SVM algorithm with sparse coding for the cancer molecular pattern analysis of proteomics data. We report leading classification results from this novel algorithm in predicting cancer molecular patterns on three benchmark proteomics datasets, under 100 trials of 50% hold-out and leave-one-out cross-validation, by directly comparing its performance with that of the PCA-SVM, NMF-SVM, SVM, k-NN and PCA-LDA classification algorithms with respect to classification rates, sensitivities and specificities. Our algorithm also overcomes the overfitting problem in the SVM and PCA-SVM classifications and provides exceptional sensitivities and specificities.
Many researchers have recently used independent component analysis (ICA) to generate codebooks or features for a single channel of data. We examine the nature of these codebooks and identify when such features can be used to extract independent components from a stationary scalar time series. This question is motivated by empirical work that suggests that single channel ICA can sometimes be used to separate out important components from a time series. Here we show that as long as the sources are reasonably spectrally disjoint then we can identify and approximately separate out individual sources. However, the linear nature of the separation equations means that when the sources have substantially overlapping spectra both identification using standard ICA and linear separation are no longer possible. (C) 2007 Elsevier B.V. All rights reserved.
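The role of spectral disjointness can be illustrated with a small sketch. This is not the ICA procedure studied in the paper; it only shows, under assumed toy signals, that a fixed linear filter recovers one source from a single-channel mixture when the two sources occupy separate frequency bands. The signal length, the sinusoid frequencies, and the helper names `dft`, `idft` and `lowpass` are all assumptions made for illustration.

```python
import cmath
import math


def dft(x):
    """Discrete Fourier transform of a real sequence (direct O(n^2) form)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT, returning the real part of the reconstructed sequence."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def lowpass(x, cutoff_bin):
    """Linear filter: keep only bins whose frequency index is <= cutoff_bin."""
    n = len(x)
    spectrum = dft(x)
    # min(k, n - k) maps each bin and its mirrored negative-frequency bin
    # to the same frequency index, so the filtered signal stays real.
    kept = [c if min(k, n - k) <= cutoff_bin else 0.0 for k, c in enumerate(spectrum)]
    return idft(kept)


N = 64  # assumed toy signal length
low = [math.sin(2 * math.pi * 3 * t / N) for t in range(N)]          # source in bin 3
high = [0.5 * math.sin(2 * math.pi * 10 * t / N) for t in range(N)]  # source in bin 10
mixture = [a + b for a, b in zip(low, high)]                          # single channel

recovered = lowpass(mixture, cutoff_bin=6)
max_error = max(abs(r - s) for r, s in zip(recovered, low))
```

If both sources occupied the same bins, no choice of `cutoff_bin` (or any other linear filter) could pull them apart, which is the failure case the abstract describes for substantially overlapping spectra.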
Images of natural scenes contain a rich variety of visual patterns. To learn and recognize these patterns from natural images, it is necessary to construct statistical models for these patterns. In this review article we describe three statistical principles for modeling image patterns: the sparse coding principle, the minimax entropy principle, and the meaningful alignment principle. We explain these three principles and their relationships in the context of modeling images as compositions of Gabor wavelets. These three principles correspond to three regimes of composition patterns of Gabor wavelets, and these three regimes are connected by changes in scale or resolution.
Generative and discriminative models are best defined by the structure of their graphical representation. This paper introduces such a definition and uses it to argue that, in some practical cases, generative models need to be formulated in order to be implemented within generate-and-test algorithms. This argument is inspired mainly by the ideas of the late Donald MacKay and by considerations of computational complexity. (C) 2006 Elsevier Inc. All rights reserved.
Computational models of primary visual cortex have demonstrated that principles of efficient coding and neuronal sparseness can explain the emergence of neurones with localised oriented receptive fields. Yet, existing models have failed to predict the diverse shapes of receptive fields that occur in nature. The existing models used a particular "soft" form of sparseness that limits average neuronal activity. Here we study models of efficient coding in a broader context by comparing soft and "hard" forms of neuronal sparseness. As a result of our analyses, we propose a novel network model for visual cortex. The model forms efficient visual representations in which the number of active neurones, rather than mean neuronal activity, is limited. This form of hard sparseness also economises cortical resources like synaptic memory and metabolic energy. Furthermore, our model accurately predicts the distribution of receptive field shapes found in the primary visual cortex of cat and monkey.
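The distinction between the two forms of sparseness can be sketched in a few lines. The following is only an illustration, not the network model proposed in the paper: it assumes that "soft" sparseness is approximated by shrinking every response toward zero (which lowers mean activity), and that "hard" sparseness is approximated by keeping a fixed number of the largest responses. The function names, the threshold, and the sample responses are hypothetical.

```python
import math


def soft_sparsify(responses, threshold):
    """Soft sparseness (assumed form): shrink each magnitude by `threshold`,
    clipping at zero. Mean activity drops, but the number of active units
    is not fixed in advance."""
    return [math.copysign(max(abs(r) - threshold, 0.0), r) for r in responses]


def hard_sparsify(responses, max_active):
    """Hard sparseness (assumed form): keep the `max_active` largest-magnitude
    responses unchanged and silence all other units."""
    ranked = sorted(range(len(responses)), key=lambda i: abs(responses[i]), reverse=True)
    keep = set(ranked[:max_active])
    return [r if i in keep else 0.0 for i, r in enumerate(responses)]


def count_active(responses):
    """Number of units with nonzero activity."""
    return sum(1 for r in responses if r != 0.0)


responses = [0.9, -0.1, 0.4, -0.7, 0.05]  # hypothetical unit responses

soft = soft_sparsify(responses, threshold=0.2)
hard = hard_sparsify(responses, max_active=2)
```

In this toy case the soft rule leaves three units active with reduced amplitudes, while the hard rule leaves exactly two units active at their original amplitudes, which mirrors the contrast between limiting mean activity and limiting the number of active neurones.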
Animals routinely rely on their eyes to localize fixed and moving targets. Such a localization process might include predicting the future target location, recalling a sequence of previously visited places or, for the motor control circuit, actuating a successful movement. Typically, target localization is carried out by fusing images from two eyes, in the case of binocular vision, wherein the challenge is to have the images calibrated before fusion. In the field of machine vision, a typical problem of interest is to localize the position and orientation of a network of mobile cameras (a sensor network) that are distributed in space and are simultaneously tracking a target. Inspired by the animal visual circuit, we study the problem of binocular image fusion for the purpose of localizing an unknown target in space. Guided by the dynamics of "eye rotation," we introduce control strategies that could be used to build machines with multiple sensors. In particular, we address the problem of how a group of visual sensors can be optimally controlled in a formation. We also address how images from multiple sensors are encoded using a set of basis functions, choosing a "larger than minimum" number of basis functions so that the resulting code that represents the image is sparse. We address the problem of how a sparsely encoded visual data stream is internally represented by a pattern of neural activity. In addition to the control mechanism, the synaptic interaction between cells is also subjected to "adaptation" that enables the activity waves to respond with greater sensitivity to visual input. We study how the rat hippocampal place cells are used to form a cognitive map of the environment so that the animal's location can be determined from its place cell activity. Finally, we study the problem of "decoding" the location of moving targets from the neural activity wave in the cortex.
This paper deals with coding of natural scenes in order to extract semantic information. We present a new scheme to project natural scenes onto a basis in which each dimension encodes statistically independent information. Basis extraction is performed by independent component analysis (ICA) applied to image patches culled from natural scenes. The study of the resulting coding units (coding filters) extracted from well-chosen categories of images shows that they adapt and respond selectively to discriminant features in natural scenes. Given this basis, we define global and local image signatures relying on the maximal activity of filters on the input image. Locally, the construction of the signature takes into account the spatial distribution of the maximal responses within the image. We propose a criterion to reduce the size of the space of representation for faster computation. The proposed approach is tested in the context of texture classification (111 classes), as well as natural scenes classification (11 categories, 2037 images). Using a common protocol, the other commonly used descriptors have at most 47.7% accuracy on average while our method obtains performances of up to 63.8%. We show that this advantage does not depend on the size of the signature and demonstrate the efficiency of the proposed criterion to select ICA filters and reduce the dimension.
An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented. The algorithm is based on factorizing the magnitude spectrogram of an input signal into a sum of components, each of which has a fixed magnitude spectrum and a time-varying gain. Each sound source, in turn, is modeled as a sum of one or more components. The parameters of the components are estimated by minimizing the reconstruction error between the input spectrogram and the model, while restricting the component spectrograms to be nonnegative and favoring components whose gains are slowly varying and sparse. Temporal continuity is favored by using a cost term which is the sum of squared differences between the gains in adjacent frames, and sparseness is favored by penalizing nonzero gains. The proposed iterative estimation algorithm is initialized with random values, and the gains and the spectra are then alternately updated using multiplicative update rules until the values converge. Simulation experiments were carried out using generated mixtures of pitched musical instrument samples and drum sounds. The performance of the proposed method was compared with independent subspace analysis and basic nonnegative matrix factorization, which are based on the same linear model. According to these simulations, the proposed method enables a better separation quality than the previous algorithms. In particular, the temporal continuity criterion improved the detection of pitched musical sounds. The sparseness criterion did not produce significant improvements.
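The shared linear model and the alternating multiplicative updates can be sketched as follows. This is a minimal version of basic nonnegative matrix factorization with a squared-error cost, that is, the baseline the abstract names, and not the proposed method: the temporal continuity and sparseness cost terms are omitted. The toy spectrogram, the rank, the iteration count, and all function names are assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def squared_error(v, w, h):
    """Sum of squared differences between V and the model W @ H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor V (frequency bins x frames) into spectra W and gains H.

    W and H start from random positive values and are updated alternately
    with multiplicative rules, so they stay nonnegative throughout.
    """
    rng = random.Random(seed)
    n_bins, n_frames = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_bins)]
    h = [[rng.random() + eps for _ in range(n_frames)] for _ in range(rank)]
    initial_error = squared_error(v, w, h)
    for _ in range(iterations):
        # Gain update: H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][t] * num[k][t] / (den[k][t] + eps) for t in range(n_frames)]
             for k in range(rank)]
        # Spectrum update: W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[f][k] * num[f][k] / (den[f][k] + eps) for k in range(rank)]
             for f in range(n_bins)]
    return w, h, initial_error


# Hypothetical toy spectrogram: two components with fixed spectra and varying gains.
spectra = [[1.0, 0.0], [0.5, 0.2], [0.0, 1.0]]          # 3 bins x 2 components
gains = [[1.0, 0.8, 0.0, 0.0], [0.0, 0.3, 0.9, 1.0]]    # 2 components x 4 frames
V = matmul(spectra, gains)

W, H, initial_error = nmf(V, rank=2)
final_error = squared_error(V, W, H)
```

Adding the continuity and sparseness terms described in the abstract would change the numerator and denominator of the gain update; the exact form of those terms is not given here, so they are left out rather than guessed.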
This article proposes a generative image model, which is called "primal sketch," following Marr's insight and terminology. This model combines two prominent classes of generative models, namely, sparse coding model and Markov random field model, for representing geometric structures and stochastic textures, respectively. Specifically, the image lattice is divided into structure domain and texture domain. The sparse coding model is used to represent image intensities on the structure domain, where edge and ridge segments are modeled by image coding functions with explicit geometric and photometric parameters. The edge and ridge segments form a sketch graph whose nodes are corners and junctions. The sketch graph is governed by a simple spatial prior model. The Markov random field model is used to summarize image intensities on the texture domain, where the texture patterns are characterized by feature statistics in the form of marginal histograms of responses from a set of linear filters. The Markov random fields in-paint the texture domain while interpolating the structure domain seamlessly. A sketch pursuit algorithm is proposed for model fitting. A number of experiments on real images are shown to demonstrate the model and the algorithm. (C) 2006 Elsevier Inc. All rights reserved.
We closely examined the cerebellar circuit model that we proposed previously. The model granular layer generates a finite but very long sequence of active neuron populations without recurrence, which is able to represent the passage of time. For all the possible binary patterns fed into mossy fibres, the circuit generates the same number of different sequences of active neuron populations. Model Purkinje cells that receive parallel fibre inputs from neurons in the granular layer learn to stop eliciting spikes at the timing instructed by the arrival of signals from the inferior olive. These functional roles of the granular layer and Purkinje cells are regarded as those of a liquid state generator and readout neurons, respectively. Thus, the cerebellum, which has to date been considered a biological counterpart of a perceptron, is reinterpreted as a liquid state machine with more powerful information processing capability than a perceptron. (c) 2007 Published by Elsevier Ltd.