Traditional recommendation methods based on matrix factorization techniques have been highly successful because of their good scalability. However, they still face the problem of data sparsity, which may reduce recommendation performance because it is hard to learn good latent features from a sparse user-item rating matrix. In recent years, deep learning has become very appealing for learning effective representations, and its non-linear characteristics remedy the shortcomings of matrix factorization. In this paper, a novel method, deep variational matrix factorization recommendation (DVMF), is proposed for large-scale sparse datasets. DVMF predicts ratings based on latent factors. The latent features of the users and items are obtained separately through a deep nonlinear structure. Based on the latent factors and combined with the matrix factorization method, the paper presents an optimization algorithm for DVMF. Experiments on three real-world datasets from different domains show that DVMF provides higher accuracy on large-scale sparse datasets than recommendation algorithms based on matrix factorization or deep learning alone. (C) 2019 Elsevier B.V. All rights reserved.
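As a rough illustration of the factorization step mentioned in the abstract above, the sketch below predicts a rating as the inner product of a user vector and an item vector, and measures squared error over the observed entries only. It is not the DVMF algorithm: in DVMF the latent vectors come from a deep nonlinear structure, which is not shown here. The function names and the toy data are hypothetical.

```python
def predict_rating(user_factors, item_factors):
    """Predicted rating as the inner product of two latent vectors."""
    if len(user_factors) != len(item_factors):
        raise ValueError("latent vectors must have the same dimension")
    return sum(u * v for u, v in zip(user_factors, item_factors))


def squared_error_on_observed(ratings, users, items):
    """Sum of squared errors over the observed (user, item) entries only."""
    return sum(
        (r - predict_rating(users[u], items[i])) ** 2
        for (u, i), r in ratings.items()
    )


# Hypothetical toy data: two users, two items, latent dimension 2.
users = {"u1": [1.0, 0.5], "u2": [0.0, 2.0]}
items = {"i1": [2.0, 2.0], "i2": [1.0, 0.0]}
# Sparse rating matrix: only 2 of the 4 possible entries are observed.
ratings = {("u1", "i1"): 3.0, ("u2", "i2"): 1.0}

loss = squared_error_on_observed(ratings, users, items)
```

Restricting the loss to observed entries is what makes sparsity a problem: a user or item with few ratings contributes few terms, so its latent vector is poorly constrained.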
System logs are useful for understanding the status of, and detecting faults in, large-scale networks. However, because of the diversity and volume of these logs, log analysis requires much time and effort. In this paper, we propose a log event anomaly detection method for large-scale networks without pre-processing and feature extraction. The key idea is to embed a large amount of diverse data into hidden states by using latent variables. We evaluate our method with 12 months of system logs obtained from a nation-wide academic network in Japan. Through comparisons with Kleinberg's univariate burst detection and a traditional multivariate analysis (i.e., PCA), we demonstrate that our proposed method achieves 14.5% higher recall and 3% higher precision than PCA. A case study shows that the detected anomalies provide useful information for troubleshooting network system faults.
In the past few decades, measuring and recording the brain's electrical activity using the electroencephalogram (EEG) has become one of the standout tools for diagnosing neurological disorders, especially for seizure detection. In this letter, we propose a novel epileptic seizure detection system that classifies raw EEG recordings, eliminating the overhead of engineered feature extraction. The system combines unsupervised and supervised deep learning using a one-dimensional convolutional variational autoencoder. To assess the robustness of the system when classifying unseen data, the proposed system is evaluated using k-fold cross-validation. Classification between normal and ictal cases achieved 100% accuracy, while classification among normal, inter-ictal, and ictal cases achieved 99% overall accuracy, which makes our system one of the most efficient among state-of-the-art systems.
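The k-fold cross-validation mentioned in the abstract above can be sketched as follows. This is a minimal, generic splitter, not the authors' evaluation code: the abstract does not state the value of k or whether the data were shuffled, so `k_fold_indices` and its contiguous, unshuffled folds are assumptions made for illustration.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Hypothetical example: 10 recordings split into 3 folds.
folds = list(k_fold_indices(10, 3))
```

Each sample appears in exactly one test fold, so every reported prediction is made on data the model did not see during training for that fold. In practice the indices would usually be shuffled (and often stratified by class) before splitting.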
ISBN: 9781538654880 (print)
Lung cancer causes over one million deaths each year worldwide. DNA methylation is a well-defined epigenetic factor in genome data analyses for model training. In this article, we explore the application of an unsupervised deep learning method, the variational autoencoder, to DNA methylation data of lung cancer samples downloaded from the GDC TCGA project, and perform further work with the latent features. We show that a logistic regression classifier on the encoded latent features accurately classifies cancer subtypes.
Collaborative filtering (CF) is one of the most widely applied models for recommender systems. Despite their success, CF-based methods suffer from rating sparsity and the cold-start problem, which lead to poor-quality recommendations. Previous studies have given great attention to constructing hybrid methods by incorporating side information and user ratings. The variational autoencoder (VAE) has been confirmed to be highly effective in CF tasks, due to its Bayesian nature and non-linearity. However, rating sparsity remains a great challenge for most VAE models and leads to poor latent user/item representations. In addition, most existing VAE-based methods model either latent user factors or latent item factors, leaving them unable to recommend items to a new user or suggest a new item to existing users. To address these problems, we design a novel deep hybrid framework for top-k recommendation, neural variational collaborative filtering (NVCF), and propose three NVCF-based instantiations. In the generative process, the side information of users and items is incorporated to alleviate rating sparsity and learn better latent user/item representations. In the inference process, a Stochastic Gradient Variational Bayes approach is employed to approximate the intractable distributions of latent user/item factors. Experiments performed on four public datasets indicate that our methods significantly outperform state-of-the-art hybrid CF models and VAE-based methods.
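Two standard ingredients of Stochastic Gradient Variational Bayes, as named in the abstract above, are the reparameterized latent sample and the closed-form KL term for a diagonal Gaussian posterior. The sketch below shows both under stated assumptions: a standard normal prior, hypothetical encoder outputs `mu` and `log_var`, and no neural networks or side information, so it is not the NVCF model itself.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


rng = random.Random(0)
# Hypothetical encoder outputs for one user (latent dimension 2).
mu, log_var = [0.5, -1.0], [0.0, 0.0]
z = sample_latent(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
```

Writing the sample as a deterministic function of `mu`, `log_var`, and independent noise is what lets gradients flow back to the encoder parameters; the KL term penalizes posteriors that drift far from the prior.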
This paper proposes a new approach for solving ill-posed nonlinear inverse problems. For ease of explanation, we use the example of lung electrical impedance tomography (EIT), which is known to be a nonlinear and ill-posed inverse problem. Conventionally, penalty-based regularization methods have been used to deal with the ill-posed problem. However, experience over the last three decades has shown methodological limitations in utilizing prior knowledge about tracking expected imaging features for medical diagnosis. The proposed method's paradigm is completely different from conventional approaches; the proposed reconstruction uses a variety of training data sets to generate a low-dimensional manifold of approximate solutions, which allows conversion of the ill-posed problem to a well-posed one. A variational autoencoder was used to produce a compact and dense representation of lung EIT images with a low-dimensional latent space. Then, we learn a robust connection between the EIT data and the low-dimensional latent data. Numerical simulations validate the effectiveness and feasibility of the proposed approach.
ISBN: 9789813235533; 9789813235526 (print)
The Cancer Genome Atlas (TCGA) has profiled over 10,000 tumors across 33 different cancer types for many genomic features, including gene expression levels. Gene expression measurements capture substantial information about the state of each tumor. Certain classes of deep neural network models are capable of learning a meaningful latent space. Such a latent space could be used to explore and generate hypothetical gene expression profiles under various types of molecular and genetic perturbation. For example, one might wish to use such a model to predict a tumor's response to specific therapies or to characterize complex gene expression activations existing in differential proportions in different tumors. Variational autoencoders (VAEs) are a deep neural network approach capable of generating meaningful latent spaces for image and text data. In this work, we sought to determine the extent to which a VAE can be trained to model cancer gene expression, and whether such a VAE would capture biologically relevant features. In the following report, we introduce a VAE trained on TCGA pan-cancer RNA-seq data, identify specific patterns in the VAE-encoded features, and discuss potential merits of the approach. We name our method "Tybalt" after an instigative, cat-like character who sets a cascading chain of events in motion in Shakespeare's "Romeo and Juliet". From a systems biology perspective, Tybalt could one day aid in cancer stratification or predict specific activated expression patterns that would result from genetic changes or treatment effects.
ISBN: 9789897583063 (print)
Since the early days of neural networks, different mechanisms have been required to provide a sufficient number of examples to avoid overfitting. Data augmentation, the most common one, generates new instances by applying different distortions to the real samples. Usually, these transformations are problem-dependent, and they result in a synthetic set of likely unseen examples. In this work, we have studied a generative model, based on the encoder-decoder paradigm, that works directly in the data space, that is, with images. This model encodes the input in a latent space where different transformations are applied. After this, we can reconstruct the latent vectors to get new samples. We have analysed various procedures according to the distortions that we could carry out, as well as the effectiveness of this process in improving the accuracy of different classification systems. To do this, we could use both the latent space and the original space after reconstructing the altered version of these vectors. Our results have shown that using this pipeline (encoding-altering-decoding) helps the generalisation of the selected classifiers.
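The encoding-altering-decoding pipeline described in the abstract above can be sketched in a few lines. The abstract does not say which latent-space transformations were used, so the Gaussian noise and linear interpolation below are assumptions, and `toy_encode` and `toy_decode` are hypothetical stand-ins for a trained encoder and decoder (here an exactly invertible linear map on two-element inputs).

```python
import random


def interpolate(z_a, z_b, alpha):
    """Linear interpolation between two latent vectors (alpha in [0, 1])."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]


def add_noise(z, scale, rng):
    """Perturb a latent vector with zero-mean Gaussian noise."""
    return [v + rng.gauss(0.0, scale) for v in z]


def augment(samples, encode, decode, scale, rng):
    """Encode each sample, alter it in latent space, and decode a new sample."""
    return [decode(add_noise(encode(x), scale, rng)) for x in samples]


# Hypothetical stand-ins for a trained encoder/decoder pair.
def toy_encode(x):
    return [x[0] + x[1], x[0] - x[1]]


def toy_decode(z):
    return [(z[0] + z[1]) / 2.0, (z[0] - z[1]) / 2.0]


rng = random.Random(0)
originals = [[1.0, 3.0], [2.0, 2.0]]
synthetic = augment(originals, toy_encode, toy_decode, 0.1, rng)
```

The synthetic samples can be added to the training set in the original space, or the altered latent vectors can be used directly as classifier inputs, which corresponds to the two options the abstract mentions.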
ISBN: 9781450356183 (print)
In the past, evolutionary algorithms (EAs) that use probabilistic modeling of the best solutions incorporated latent or hidden variables into the models as a more accurate way to represent the search distributions. Recently, a number of neural-network models that compute approximations of posterior (latent variable) distributions have been introduced. In this paper, we investigate the use of the variational autoencoder (VAE), a class of neural-network-based generative model, for modeling and sampling search distributions as part of an estimation of distribution algorithm (EDA). We show that the VAE can capture dependencies between decision variables and objectives. This feature is proven to improve the sampling capacity of model-based EAs. Furthermore, we extend the original VAE model by adding a new, fitness-approximating network component. We show that it is possible to adapt the architecture of these models, and we present evidence of how to extend VAEs to better fulfill the requirements of probabilistic modeling in EAs. While our results are not yet competitive with state-of-the-art probabilistic-based optimizers, they represent a promising direction for the application of generative models within EDAs.
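For readers unfamiliar with estimation of distribution algorithms, the loop that the abstract above builds on can be sketched as sample, select, re-estimate. The sketch deliberately swaps the VAE for a much simpler model, an independent Gaussian per decision variable, so it illustrates only the surrounding loop and not the paper's method. The `sphere` objective, the initial distribution, and all parameter values are hypothetical.

```python
import random
import statistics


def sphere(x):
    """Toy objective to minimize: sum of squares, with optimum at the origin."""
    return sum(v * v for v in x)


def gaussian_eda(objective, dim, pop_size, n_select, generations, rng):
    """Minimize `objective` with a univariate Gaussian search distribution."""
    mean, std = [3.0] * dim, [2.0] * dim  # arbitrary initial search distribution
    best = None
    for _ in range(generations):
        # Sample a population from the current model.
        population = [
            [rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(pop_size)
        ]
        # Keep the best solutions (truncation selection).
        population.sort(key=objective)
        selected = population[:n_select]
        if best is None or objective(selected[0]) < objective(best):
            best = selected[0]
        # Re-estimate the model from the selected solutions.
        mean = [statistics.mean(col) for col in zip(*selected)]
        std = [statistics.pstdev(col) + 1e-9 for col in zip(*selected)]
    return best


best = gaussian_eda(sphere, dim=3, pop_size=50, n_select=10, generations=30,
                    rng=random.Random(1))
```

In a VAE-based variant, the re-estimation step would train the autoencoder on the selected solutions and the sampling step would decode draws from the latent prior; an independent Gaussian cannot represent dependencies between variables, which is the limitation richer models aim to address.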
This paper describes a semi-supervised multichannel speech enhancement method that uses clean speech data for prior training. Although multichannel nonnegative matrix factorization (MNMF) and its constrained variant called independent low-rank matrix analysis (ILRMA) have successfully been used for unsupervised speech enhancement, the low-rank assumption on the power spectral densities (PSDs) of all sources (speech and noise) does not hold in reality. To solve this problem, we replace a low-rank speech model with a deep generative speech model, i.e., formulate a probabilistic model of noisy speech by integrating a deep speech model, a low-rank noise model, and a full-rank or rank-1 model of spatial characteristics of speech and noise. The deep speech model is trained from clean speech data in an unsupervised auto-encoding variational Bayesian manner. Given multichannel noisy speech spectra, the full-rank or rank-1 spatial covariance matrices and PSDs of speech and noise are estimated in an unsupervised maximum-likelihood manner. Experimental results showed that the full-rank version of the proposed method was significantly better than MNMF, ILRMA, and the rank-1 version. We confirmed that the initialization-sensitivity and local-optimum problems of MNMF with many spatial parameters can be solved by incorporating the precise speech model.