Background: High-throughput methodologies such as microarrays and next-generation sequencing are routinely used in cancer research, generating complex data at different omics layers. The effective integration of omics data could provide broader insight into the mechanisms of cancer biology, helping researchers and clinicians develop personalized therapies. Results: In the context of the CAMDA 2017 Neuroblastoma Data Integration challenge, we explore the use of Integrative Network Fusion (INF), a bioinformatics framework that combines similarity network fusion with machine learning to integrate multiple omics data types. We apply the INF framework to predict neuroblastoma patient outcome, integrating RNA-Seq, microarray and array comparative genomic hybridization data. We additionally explore the use of autoencoders as a method to integrate microarray expression and copy number data. Conclusions: The INF method is effective for integrating multiple data sources, providing compact feature signatures for patient classification with performance comparable to other methods. The latent space representation of the integrated data provided by the autoencoder approach gives promising results, both by improving classification on survival endpoints and by providing a means to discover two groups of patients characterized by distinct overall survival (OS) curves.
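The first stage of INF is a similarity network fusion step. A minimal sketch of the two-view case is given below, following the general SNF procedure in simplified form: Gaussian affinities with a fixed bandwidth `sigma`, a k-nearest-neighbour local kernel, and alternating cross-diffusion between views. All function names and parameter values are hypothetical, the machine learning stage of INF is not shown, and the original pipeline's settings are not reproduced.

```python
import math

def affinity(X, sigma=1.0):
    """Gaussian-kernel affinity between samples (rows of X)."""
    n = len(X)
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(X[i], X[j])) / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]

def full_kernel(W):
    """P: diagonal fixed at 1/2, off-diagonal entries of each row sum to 1/2."""
    n = len(W)
    P = []
    for i in range(n):
        off = sum(W[i][k] for k in range(n) if k != i)
        P.append([0.5 if i == j else W[i][j] / (2 * off) for j in range(n)])
    return P

def local_kernel(W, k):
    """S: keep each row's k largest affinities (self included), row-normalised."""
    n = len(W)
    S = []
    for i in range(n):
        nbrs = sorted(range(n), key=lambda j: W[i][j], reverse=True)[:k]
        total = sum(W[i][j] for j in nbrs)
        S.append([W[i][j] / total if j in nbrs else 0.0 for j in range(n)])
    return S

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def fuse_two_views(X1, X2, k=2, iters=10, sigma=1.0):
    """Hypothetical two-view fusion: X1, X2 hold the same patients in the same row order."""
    W1, W2 = affinity(X1, sigma), affinity(X2, sigma)
    P1, P2 = full_kernel(W1), full_kernel(W2)
    S1, S2 = local_kernel(W1, k), local_kernel(W2, k)
    for _ in range(iters):
        # diffuse each view's global kernel through the other view's local kernel
        P1, P2 = (matmul(matmul(S1, P2), transpose(S1)),
                  matmul(matmul(S2, P1), transpose(S2)))
    # fused patient-by-patient similarity: average of the two diffused kernels
    return [[(a + b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(P1, P2)]
```

Patients who are close in both omics views end up with high fused similarity; in a full pipeline the fused matrix would then feed clustering or a classifier.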
A novel framework for the classification of lung nodules using computed tomography scans is proposed in this article. To obtain an accurate diagnosis of the detected lung nodules, the proposed framework integrates two groups of features: (1) appearance features modeled using a higher-order Markov Gibbs random field model that can describe the spatial inhomogeneities inside the lung nodule, and (2) geometric features that describe the shape of the lung nodules. The novelty of this article lies in accurately modeling the appearance of the detected lung nodules with a newly developed seventh-order Markov Gibbs random field model that can capture the spatial inhomogeneities of both small and large nodules, combined with the extracted geometric features. Finally, the two feature groups are fed to a deep autoencoder classifier to distinguish between malignant and benign nodules. To evaluate the proposed framework, we used publicly available data from the Lung Image Database Consortium: a total of 727 nodules collected from 467 patients. The proposed system shows promise as a valuable tool for the detection of lung cancer, as evidenced by a nodule classification accuracy of 91.20%.
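The appearance features above come from a seventh-order Markov Gibbs random field, which is beyond a short example. As a much simpler second-order stand-in (an assumption, not the article's model), the sketch below computes the empirical co-occurrence frequencies of quantized intensity pairs at fixed pixel offsets, the kind of statistic from which pairwise Gibbs potentials are commonly estimated. The image is assumed to be already quantized to integer levels `0..levels-1`; the function names are hypothetical.

```python
from collections import Counter

def cooccurrence(image, offset, levels):
    """Normalised frequencies f(q, r) of level pairs (image[y][x], image[y+dy][x+dx])."""
    dy, dx = offset
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:  # skip pairs that leave the image
                counts[(image[y][x], image[y2][x2])] += 1
    total = sum(counts.values())
    return [[counts[(q, r)] / total for r in range(levels)] for q in range(levels)]

def appearance_features(image, offsets, levels):
    """Concatenate co-occurrence frequencies over several clique families (offsets)."""
    features = []
    for offset in offsets:
        for row in cooccurrence(image, offset, levels):
            features.extend(row)
    return features
```

In the article's framework, appearance features like these would be concatenated with the geometric shape features before being passed to the autoencoder classifier.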
Circadian rhythms modulate many aspects of physiology. Knowledge of the molecular basis of these rhythms has exploded in the last 20 years. However, most of these data are from model organisms, and translation to clinical practice has been limited. Here, we present an approach to identify molecular rhythms in humans from thousands of unordered expression measurements. Our algorithm, cyclic ordering by periodic structure (CYCLOPS), uses evolutionary conservation and machine learning to identify elliptical structure in high-dimensional data. From this structure, CYCLOPS estimates the phase of each sample. We validated CYCLOPS using temporally ordered mouse and human data and demonstrated its consistency on human data from two independent research sites. We used this approach to identify rhythmic transcripts in human liver and lung, including hundreds of drug targets and disease genes. Importantly, for many genes, the circadian variation in expression exceeded the variation from genetic and other environmental factors. We also analyzed hepatocellular carcinoma samples and showed that these solid tumors maintain circadian function but with aberrant output. Finally, to illustrate how this method can catalyze medical translation, we show that dosage time can temporally segregate the efficacy of streptozocin, a chemotherapeutic drug, from its dose-limiting toxicity. In sum, these data show the power of CYCLOPS and temporal reconstruction in bridging basic circadian research and clinical medicine.
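CYCLOPS reads each sample's phase from elliptical structure in the data. The sketch below shows only that final step under a simplifying assumption: each sample has already been projected to two dimensions (for example, two leading components), and the ellipse is turned into a circle by centering and rescaling each axis before taking `atan2`. This is a stand-in for, not a reproduction of, the learned model in CYCLOPS; `estimate_phases` is a hypothetical name.

```python
import math

def estimate_phases(points):
    """Assign each 2D-projected sample a phase in [0, 2*pi) from its angle around the ellipse."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # population standard deviation of each axis, used to turn the ellipse into a circle
    sx = math.sqrt(sum((p[0] - mx) ** 2 for p in points) / n)
    sy = math.sqrt(sum((p[1] - my) ** 2 for p in points) / n)
    phases = []
    for x, y in points:
        angle = math.atan2((y - my) / sy, (x - mx) / sx)  # in (-pi, pi]
        phases.append(angle % (2 * math.pi))              # wrap into [0, 2*pi)
    return phases
```

Sorting samples by the returned phases gives a cyclic ordering; rhythmic transcripts could then be sought by fitting expression against phase.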
ISBN: 9781538646595 (print)
We present a feature engineering pipeline for constructing musical signal characteristics to be used in the design of a supervised model for musical genre identification. The key idea is to extend the traditional two-step process of extraction and classification with additional stand-alone phases that are no longer organized in a waterfall scheme. The whole system is realized by traversing backtracking arrows and cycles between the various stages. To give a compact and effective representation of the features, standard early temporal integration is combined with further selection and extraction phases: on the one hand, the selection of the most meaningful characteristics based on information gain, and on the other hand, the inclusion of the nonlinear correlation between this subset of features, as determined by an autoencoder. Experiments conducted on the GTZAN dataset reveal a noticeable contribution of this methodology to the model's performance in the classification task.
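The information-gain selection step can be sketched as follows, assuming each feature column has already been discretized (for example, binned), since information gain is computed here over discrete values. The function names and the top-k ranking rule are illustrative assumptions, not the paper's implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y) in bits of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discretized feature X."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

def top_k_features(columns, labels, k):
    """Indices of the k feature columns with the highest information gain."""
    scores = [information_gain(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)[:k]
```

A feature that perfectly separates two equally frequent genres scores 1 bit; one that is independent of the genre scores 0.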
ISBN: 9781538674482; 9781538674475 (print)
WeChat is a social network application that connects people widely. Large amounts of data are generated when users hold conversations, and this data can be used to enhance their lives. This paper describes how this data is collected and how to develop a personalized chatbot from personal conversation records. Our system has a cognitive map based on the word2vec model, which learns and stores the relationships between the words that appear in the chat records; each word is mapped to a continuous high-dimensional vector space. We then adopt the sequence-to-sequence (seq2seq) framework to learn chatting styles from all pairs of chat sentences, replacing the traditional one-hot embedding layer of the seq2seq model with our word2vec embedding layer. Furthermore, we trained an autoencoder with a seq2seq architecture to learn a vector representation of each sentence, so that we can evaluate the cosine similarity between a model-generated response and the pre-existing response in the test set, and visualize the distances using a principal component analysis (PCA) projection. As a result, our word2vec-embedded seq2seq model significantly outperforms the one-hot-embedded one.
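The evaluation step compares sentence vectors with cosine similarity. A minimal sketch, assuming the sentence vectors have already been produced by the seq2seq autoencoder (not shown) and that generated and reference responses are paired by position; the helper names and the zero-vector convention are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (|u| |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for zero vectors
    return dot / (norm_u * norm_v)

def mean_response_similarity(generated, references):
    """Average cosine similarity between each generated sentence vector and its test-set reference."""
    scores = [cosine_similarity(g, r) for g, r in zip(generated, references)]
    return sum(scores) / len(scores)
```

A higher average indicates that generated replies land closer to the real replies in the autoencoder's sentence space.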
ISBN: 9781509006199 (print)
To reduce data dimensionality, Hinton et al. proposed autoencoders based on neural networks. Autoencoders are composed of an input layer, one hidden layer and an output layer, and tune their weights and biases by backpropagation to minimize the error between inputs and outputs. The learned weights capture features of the input and can be applied to pretraining deep neural networks. However, these autoencoders have been developed for real-valued neural networks. In this study, we propose complex and quaternion autoencoders for complex and quaternion neural networks, respectively. In the complex-valued autoencoder, the inputs, weights, biases and outputs of the real-valued autoencoder are extended to complex numbers. In the quaternion autoencoder, these parameters are extended to quaternions. We demonstrate the learning abilities of the proposed methods using handwritten digit images. The results show that the proposed methods can recognize the images as well as the real-valued methods.
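Python's built-in `complex` type makes the complex-valued extension easy to illustrate. The sketch below trains a tiny linear complex autoencoder (one hidden layer, no activation function, so the gradients stay exact) by stochastic gradient descent, using the derivative with respect to the conjugate of each parameter, which for a real-valued loss gives the steepest-descent direction. The linear layers, initialization, learning rate and names are assumptions for illustration; the paper's activation functions and its quaternion variant are not shown.

```python
import random

def forward(x, W1, b1, W2, b2):
    """Linear complex encoder and decoder (no activation)."""
    h = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    y = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(W2, b2)]
    return h, y

def loss(data, params):
    """Sum over samples of squared moduli |y - x|^2 of the reconstruction error."""
    total = 0.0
    for x in data:
        _, y = forward(x, *params)
        total += sum(abs(yi - xi) ** 2 for yi, xi in zip(y, x))
    return total

def train(data, n_hidden, lr=0.01, epochs=200, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0])

    def rand_c():
        return complex(rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5))

    W1 = [[rand_c() for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0j] * n_hidden
    W2 = [[rand_c() for _ in range(n_hidden)] for _ in range(n_in)]
    b2 = [0j] * n_in
    for _ in range(epochs):
        for x in data:
            h, y = forward(x, W1, b1, W2, b2)
            e = [yi - xi for yi, xi in zip(y, x)]
            # error at the hidden layer, computed before W2 is updated: dL/dconj(h_j)
            dh = [sum(W2[i][j].conjugate() * e[i] for i in range(n_in)) for j in range(n_hidden)]
            # dL/dconj(W) = error * conj(layer input); step against it
            for i in range(n_in):
                for j in range(n_hidden):
                    W2[i][j] -= lr * e[i] * h[j].conjugate()
                b2[i] -= lr * e[i]
            for j in range(n_hidden):
                for k in range(n_in):
                    W1[j][k] -= lr * dh[j] * x[k].conjugate()
                b1[j] -= lr * dh[j]
    return W1, b1, W2, b2
```

Calling `train(..., epochs=0)` with the same seed returns the untrained parameters, which is convenient for comparing the reconstruction loss before and after training.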
ISBN: 9781450359528 (print)
Recent work has shown that it is possible for two wearable devices worn by the same user to generate a common key for secure pairing by exploiting gait as a common secret. A key challenge for such device pairing lies in matching the bits of the keys generated by two independent devices despite the noisy on-board sensor measurements. We propose a novel machine learning framework that uses an autoencoder to help one device predict the sensor observations at another device and generate the key using the predicted sensor data. We prototype the proposed method and evaluate it using real subjects. Our results show that the proposed method achieves a 10% increase in bit agreement rate between two keys generated independently by two different wearable devices.
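Bit agreement rate is the fraction of key positions on which the two devices agree. The sketch below pairs it with a hypothetical median-threshold quantizer that turns a window of sensor readings into bits; the paper's actual quantization scheme and the autoencoder that predicts the other device's signal are not reproduced.

```python
import statistics

def quantize(signal):
    """Hypothetical quantizer: 1 if a reading is above the window's median, else 0."""
    m = statistics.median(signal)
    return [1 if s > m else 0 for s in signal]

def bit_agreement_rate(key_a, key_b):
    """Fraction of positions where two equal-length bit keys agree."""
    if len(key_a) != len(key_b):
        raise ValueError("keys must have the same length")
    return sum(a == b for a, b in zip(key_a, key_b)) / len(key_a)
```

In the proposed framework, one device would quantize the autoencoder's prediction of the other device's observations rather than its own raw readings, the idea being that the two keys then agree on more bits.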
ISBN: 9781538646595 (print)
While Word2Vec represents words (in text) as vectors carrying semantic information, audio Word2Vec has been shown to represent signal segments of spoken words as vectors carrying phonetic structure information. Audio Word2Vec can be trained in an unsupervised way from an unlabeled corpus, except that word boundaries are needed. In this paper, we extend audio Word2Vec from the word level to the utterance level by proposing a new segmental audio Word2Vec, in which unsupervised spoken word boundary segmentation and audio Word2Vec are jointly learned and mutually enhanced, so that an utterance can be directly represented as a sequence of vectors carrying phonetic structure information. This is achieved by a segmental sequence-to-sequence autoencoder (SSAE), in which a segmentation gate trained with reinforcement learning is inserted in the encoder. Experiments on English, Czech, French and German show strong performance in both unsupervised spoken word segmentation and spoken term detection (significantly better than frame-based DTW).
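Once the segmentation gate has decided where word boundaries fall, each segment is encoded into one vector, so an utterance becomes a sequence of vectors. The sketch below illustrates only that bookkeeping: the gate decisions are given as input rather than learned with reinforcement learning, and mean pooling stands in (as an assumption) for the recurrent encoder of the SSAE. `segment_vectors` is a hypothetical name.

```python
def segment_vectors(frames, boundary_gate):
    """Split frames where the gate fires and encode each segment as its mean vector.

    frames: list of equal-length acoustic feature vectors, one per frame
    boundary_gate: list of 0/1 per frame; 1 means a segment ends at this frame
    """
    segments, current = [], []
    for frame, gate in zip(frames, boundary_gate):
        current.append(frame)
        if gate == 1:
            segments.append(current)
            current = []
    if current:  # close a trailing segment that has no explicit boundary
        segments.append(current)
    # mean pooling over each segment's frames (stand-in for the SSAE encoder)
    return [[sum(col) / len(seg) for col in zip(*seg)] for seg in segments]
```

The output has one vector per hypothesized word, which is the utterance-level representation that spoken term detection then compares.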
ISBN: 9781509066216 (print)
The focus of this work is to detect a person's psychological (emotional) state and to determine their personality trait from speech samples. A cross-corpus technique is employed for validation. Various spectral features of speech are considered, along with Domain-Adaptive Least Squares Regression (DaLSR) and an autoencoder classifier. Voice samples from the publicly available Berlin and Enterface databases are used. The work is extended to classify a person's personality as introvert or extrovert using the detected psychological state. Using the cross-corpus technique, a 15% improvement in psychological state classification is obtained compared to previously reported work. The personality classification technique is an initial attempt and needs further improvement for better recognition.
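In a cross-corpus setup, the model is trained on one database and tested on another, so the score reflects how well it generalizes across recording conditions and speakers. The sketch below shows the protocol with a nearest-centroid classifier, an assumed stand-in for DaLSR and the autoencoder classifier, on feature vectors presumed to be precomputed spectral features. All names are hypothetical.

```python
import math

def centroids(features, labels):
    """Mean feature vector per class label."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(model, x):
    """Label of the closest class centroid (Euclidean distance)."""
    return min(model, key=lambda y: math.dist(model[y], x))

def cross_corpus_accuracy(train_corpus, test_corpus):
    """Fit on one corpus (e.g. Berlin) and score on a different one (e.g. Enterface)."""
    model = centroids(*train_corpus)
    feats, labels = test_corpus
    correct = sum(predict(model, x) == y for x, y in zip(feats, labels))
    return correct / len(labels)
```

Each corpus is passed as a `(features, labels)` pair; swapping the two arguments gives the reverse train/test direction.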
ISBN: 9783319422978; 9783319422961 (print)
The autoencoder has been successfully used as an unsupervised learning framework to learn useful representations in deep learning tasks. Building on it, a wide variety of regularization techniques have been proposed, such as early stopping, weight decay and contraction. This paper presents a new training principle for autoencoders based on the denoising autoencoder and the dropout training method. We extend the denoising autoencoder by both partially corrupting the input pattern and adding noise to its hidden units. This kind of noisy autoencoder can be stacked to initialize deep learning architectures. Moreover, we show that in the fully noisy network the activations of the hidden units are sparser. Furthermore, the method significantly improves learning accuracy in classification experiments on benchmark data sets.
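The two noise sources described above can be sketched in the encoder pass: masking corruption on the input (as in a denoising autoencoder) and dropout-style zeroing of hidden units. The exact form of hidden-unit noise used in the paper is not stated here, so the dropout form, the sigmoid activation and the names are assumptions; the decoder and training loop are omitted. A small helper measures the fraction of near-zero activations, one simple way to quantify sparsity.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def noisy_encode(x, W, b, input_drop=0.0, hidden_drop=0.0, rng=None):
    """Encoder pass with noise at the input and at the hidden layer.

    input_drop: probability of zeroing each input component (masking corruption)
    hidden_drop: probability of zeroing each hidden activation (dropout-style noise)
    """
    rng = rng or random.Random()
    x_tilde = [0.0 if rng.random() < input_drop else v for v in x]
    h = [sigmoid(sum(w * v for w, v in zip(row, x_tilde)) + bj) for row, bj in zip(W, b)]
    return [0.0 if rng.random() < hidden_drop else hj for hj in h]

def sparsity(h, threshold=0.1):
    """Fraction of hidden activations whose magnitude is below the threshold."""
    return sum(abs(v) < threshold for v in h) / len(h)
```

With both drop probabilities at zero, the pass reduces to an ordinary sigmoid encoder, so the noise can be switched off at evaluation time.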