Android security incidents have occurred frequently in recent years. To improve the accuracy and efficiency of large-scale Android malware detection, we propose a hybrid model based on a deep autoencoder (DAE) and a convolutional neural network (CNN). First, to improve the accuracy of malware detection, we reconstruct the high-dimensional features of Android applications (apps) and employ multiple CNNs to detect Android malware. In the serial convolutional neural network architecture (CNN-S), we use ReLU, a non-linear function, as the activation function to increase sparseness, and dropout to prevent over-fitting. The convolutional and pooling layers are combined with the fully connected layer to enhance feature extraction capability. Under these conditions, CNN-S shows strong feature extraction and malware detection ability. Second, to reduce the training time, we use the deep autoencoder as a pre-training method for the CNN. With this combination, the deep autoencoder and CNN model (DAE-CNN) can learn more flexible patterns in a short time. We conduct experiments on 10,000 benign apps and 13,000 malicious apps. CNN-S demonstrates a significant improvement over traditional machine learning methods in Android malware detection. In detail, compared with SVM, the accuracy of the CNN-S model is improved by 5%, while the training time of the DAE-CNN model is reduced by 83% compared with the CNN-S model.
Single-cell RNA sequencing (scRNA-seq) technology has become an effective tool for high-throughput transcriptomic study, which circumvents the averaging artifacts of bulk RNA-seq technology, yielding new perspectives on the cellular diversity of potential superficially homogeneous *** various sequencing techniques have decreased the amplification bias and improved capture efficiency caused by the low amount of starting material, technical noise and biological variation are inevitably introduced into the experimental process, resulting in high dropout events, which greatly hinder the downstream *** the bimodal expression pattern and the right-skewed characteristic present in normalized scRNA-seq data, we propose a customized autoencoder based on a two-part generalized gamma distribution (AE-TPGG) for scRNA-seq data analysis, which takes the mixed discrete-continuous random variables of scRNA-seq data into account using a two-part model and uses the generalized gamma (GG) distribution for fitting the positive and right-skewed continuous *** adopted autoencoder enables AE-TPGG to capture the inherent relationship between *** addition to the ability to achieve a low-dimensional representation, the AE-TPGG model also provides a denoised imputation according to the statistical characteristics of gene *** on real datasets demonstrate that our proposed model is competitive with current imputation methods and improves a diverse set of typical scRNA-seq data analyses.
Dynamic community detection is significant for controlling and capturing the temporal features of networks. The evolutionary clustering framework provides a temporal smoothness constraint for simultaneously maximizing the clustering quality at the current time step and minimizing the clustering deviation between two successive time steps. Based on this framework, some existing methods, such as evolutionary spectral clustering and evolutionary nonnegative matrix factorization, look for a low-dimensional representation by mapping reconstruction. However, such reconstruction does not address the nonlinear characteristics of networks. In this paper, we propose a semi-supervised algorithm (sE-autoencoder) to overcome the effects of nonlinear properties on the low-dimensional representation. Our proposed method extends the typical nonlinear reconstruction model to the dynamic network by constructing a temporal matrix. More specifically, the potential community characteristics and the previous clustering, as prior information, are incorporated into the loss function as a regularization term. Experimental results on synthetic and real-world datasets demonstrate that the proposed method is effective and superior to other methods for dynamic community detection.
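The temporal smoothness trade-off described in this abstract can be illustrated with a small sketch. It is not the sE-autoencoder loss: it only shows the generic evolutionary clustering idea of weighting a snapshot cost against a deviation from the previous clustering. The function names, the pair-counting deviation measure, and the weight `alpha` are all assumptions made for illustration.

```python
from itertools import combinations


def temporal_cost(prev_labels, curr_labels):
    """Fraction of node pairs whose co-membership differs between two steps.

    Illustrative deviation measure (an assumption, not taken from the paper).
    It is invariant to renaming of community labels.
    """
    pairs = list(combinations(range(len(curr_labels)), 2))
    if not pairs:
        return 0.0
    changed = sum(
        (prev_labels[i] == prev_labels[j]) != (curr_labels[i] == curr_labels[j])
        for i, j in pairs
    )
    return changed / len(pairs)


def evolutionary_cost(snapshot_cost, prev_labels, curr_labels, alpha=0.8):
    """Weighted sum of the current snapshot cost and the temporal deviation."""
    return alpha * snapshot_cost + (1 - alpha) * temporal_cost(prev_labels, curr_labels)
```

A larger `alpha` favors fitting the current snapshot, while a smaller one favors staying close to the previous clustering.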
Due to increasing cyber-attacks, various Intrusion Detection Systems (IDSs) have been proposed to identify network *** existing machine learning-based IDSs learn patterns from the features extracted from network traffic flows, and the deep learning-based approaches can learn data distribution features from the raw data to differentiate normal and anomalous network *** having been used widely in the real world, the above methods are vulnerable to some types of *** this paper, we propose a novel attack framework, Anti-Intrusion Detection autoencoder (AIDAE), to generate features to disable the *** the proposed framework, an encoder transforms features into a latent space, and multiple decoders reconstruct the continuous and discrete features, ***, a generative adversarial network is used to learn the flexible prior distribution of the latent *** correlation between continuous and discrete features can be kept by using the proposed training *** conducted on the NSL-KDD, UNSW-NB15, and CICIDS2017 datasets show that the generated features indeed degrade the detection performance of existing IDSs dramatically.
Feature selection is a dimensionality reduction technique that selects a subset of representative features from high-dimensional data by eliminating irrelevant and redundant features. Recently, feature selection combined with sparse learning has attracted significant attention due to its outstanding performance compared with traditional feature selection methods that ignore the correlation between features. These works first map data onto a low-dimensional subspace and then select features by imposing a sparsity constraint on the transformation matrix. However, they are restricted by design to linear data transformations, a potential drawback given that the underlying correlation structures of data are often non-linear. To leverage a more sophisticated embedding, we propose an autoencoder-based unsupervised feature selection approach that uses a single-layer autoencoder for a joint framework of feature selection and manifold learning. More specifically, we enforce column sparsity on the weight matrix connecting the input layer and the hidden layer, as in previous work. Additionally, we include spectral graph analysis on the projected data in the learning process to preserve local data geometry from the original data space to the low-dimensional feature space. Extensive experiments are conducted on image, audio, text, and biological data. The promising experimental results validate the superiority of the proposed method. (C) 2018 Elsevier B.V. All rights reserved.
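The column-sparsity idea mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the weight matrix `W` is stored as hidden units by input features, so that column j collects every weight leaving input feature j, and it uses the sum of column L2 norms as the sparsity penalty. The function names are hypothetical.

```python
import math


def column_norms(W):
    """L2 norm of each column of W (one column per input feature, by assumption)."""
    n_cols = len(W[0])
    return [math.sqrt(sum(row[j] ** 2 for row in W)) for j in range(n_cols)]


def column_sparsity_penalty(W):
    """Sum of column L2 norms; minimizing it drives whole columns toward zero."""
    return sum(column_norms(W))


def select_features(W, k):
    """Indices of the k input features with the largest column norms."""
    norms = column_norms(W)
    ranked = sorted(range(len(norms)), key=lambda j: norms[j], reverse=True)
    return sorted(ranked[:k])
```

A feature whose column is entirely zero contributes nothing to any hidden unit, which is why the column norm can serve as a feature score after training.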
Cross-modal retrieval has gained much attention in recent years. As the research mainstream, most existing approaches learn projections for data from different modalities into a common space where data can be compared directly. However, they neglect the preservation of feature and semantic information, so they are unable to obtain satisfactory results. In this paper, we propose a two-stage learning method to learn multi-modal mappings that project multi-modal data to low-dimensional embeddings that preserve both feature and semantic information. In the first stage, we combine both low-level feature and high-level semantic information to learn feature-aware semantic code vectors. In the second stage, we use an encoder-decoder paradigm to learn projections. The encoder projects feature vectors to code vectors, and the decoder projects code vectors back to feature vectors. The encoder-decoder paradigm guarantees that the embeddings preserve both feature and semantic information. An alternating minimization procedure is developed to solve the multi-modal semantic autoencoder optimization problem. Extensive experiments on three benchmark datasets demonstrate that the proposed method outperforms state-of-the-art cross-modal retrieval methods. (C) 2018 Elsevier B.V. All rights reserved.
This paper presents an IC implementation of an on-chip learning neuromorphic autoencoder unit in the form of a rate-based spiking neural network. With a current-mode signaling scheme embedded in a 500 x 500 6b SRAM-based memory, the proposed architecture achieves simultaneous processing of multiplications and accumulations. In addition, a transposable memory read for both forward and backward propagation and a virtual lookup table are proposed to perform unsupervised learning of a restricted Boltzmann machine. The IC is fabricated using a 28-nm CMOS process and is verified in a three-layer network of an encoder-decoder pair for training and recovery of two-dimensional images of 16 x 16 pixels. With a dataset of 50 digits, the IC shows a normalized root mean square error of 0.078. Measured energy efficiencies are 4.46 pJ per synaptic operation for inference and 19.26 pJ per synaptic weight update for learning. The learning performance is also estimated by simulation for the case in which the proposed hardware architecture is extended to batch training of 60 000 MNIST datasets.
As a powerful soft computing tool, fuzzy cognitive maps (FCMs) have been successfully employed for time-series modeling and forecasting problems. However, both rapid time variation and trends remain open problems when handling univariate non-stationary time-series forecasting with FCM-based models. In this paper, we propose a time-series forecasting model that combines FCMs, a gated recurrent unit network (GRU), and an autoencoder network (AE). The model is termed GAE-FCM. First, a scheme based on gated recurrent unit networks and autoencoder networks is designed to learn the potential representations and capture the long-term trend of non-stationary time series while decomposing these univariate time series into a group of multivariate feature vectors. Then, the obtained multivariate feature vectors are modeled as a fuzzy cognitive map, in which quantifying the connection matrix is treated as a convex optimization problem. Finally, the time-series trend is predicted by the optimized fuzzy cognitive map and the corresponding modeling mechanism. The performance of the proposed model has been validated by comparison with several representative methods on five non-stationary time-series datasets.
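For readers unfamiliar with how a fuzzy cognitive map produces a prediction from its connection matrix, the sketch below shows one commonly used state update with a sigmoid squashing function. It is a generic illustration and not the GAE-FCM model: the function names, the steepness parameter `lam`, and the convention that `W[j][i]` is the influence of concept j on concept i are assumptions.

```python
import math


def sigmoid(x, lam=1.0):
    """Squashing function that keeps concept activations in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-lam * x))


def fcm_step(state, W, lam=1.0):
    """One FCM update: each concept aggregates weighted influences, then squashes.

    W[j][i] is taken to be the influence of concept j on concept i (assumed convention).
    """
    n = len(state)
    return [
        sigmoid(sum(W[j][i] * state[j] for j in range(n)), lam)
        for i in range(n)
    ]
```

Calling `fcm_step` repeatedly on its own output rolls the map forward in time; learning the map amounts to choosing `W` so that these rollouts match the observed series.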
To enhance the accuracy of identifying water sources in mine inrush incidents, this study, taking the Shengquan coal mine in Shandong, China, as a case study, proposes a novel water source identification model based on an improved autoencoder: the "Masked autoencoder-based Classifier" model. Through a unique autoencoder framework and a custom 'masked_loss' loss function, this model achieves semi-supervised learning and dimensionality reduction of groundwater sample ionic data. By configuring the hidden layers, the classifier component of the model directly receives data processed by the encoder component. This not only improves the model's performance but also optimizes its complexity. In an evaluation of the model's fitting effectiveness, our model achieved an average accuracy of 88.8% across 20 runs, with precision, recall, F1-score, and MCC reaching 88.1%, 80.6%, 0.827, and 0.833, respectively, significantly outperforming other classic models. The model successfully identified the sources of three sets of inrush water samples, with a high number of successful runs and clear average probabilities. This work contributes not only to the field of mine water inrush source identification but also offers a new perspective for the broader field of machine learning.
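The abstract does not define its 'masked_loss' function. One common way to obtain semi-supervised behavior from a classification loss is to average the cross-entropy over labeled samples only and skip unlabeled ones; the sketch below shows that general idea under this assumption. The function name `masked_cross_entropy` and the use of `None` to mark an unlabeled sample are hypothetical and are not taken from the paper.

```python
import math


def masked_cross_entropy(probs, labels):
    """Mean cross-entropy over labeled samples only.

    probs  -- list of per-class probability lists, one per sample
    labels -- list of class indices, with None marking an unlabeled sample
              (a hypothetical convention chosen for this sketch)
    """
    losses = [-math.log(p[y]) for p, y in zip(probs, labels) if y is not None]
    # With no labeled samples there is nothing to supervise, so return zero.
    return sum(losses) / len(losses) if losses else 0.0
```

Unlabeled samples still pass through the encoder and can contribute to a reconstruction term, while only labeled samples drive the classifier term.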
Domain adaptation uses labeled source-domain data to train a classifier for use in a target domain with little or no labeled data. There is usually a discrepancy between the marginal and conditional distributions of the source and target domains, so minimizing the distribution discrepancy between domains is of critical importance. As a classical model in deep learning, the autoencoder is capable of realizing distribution matching and enhancing classification accuracy by extracting more abstract and effective features from data. A domain adaptation network based on autoencoder (DANA) is proposed. The DANA structure consists of two encoding layers: a feature extraction layer and a classification layer. For the feature extraction layer, the marginal distributions of the source and target domains are matched by using the nonparametric maximum mean discrepancy measurement. For the classification layer, the softmax regression model is applied to encode the label information of the source domain while also matching the conditional distribution. Experimental results on the ImageNet, Corel and Leaves datasets show that the proposed algorithm achieves higher classification accuracy than the classical methods.
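The maximum mean discrepancy used to match marginal distributions can be estimated directly from two samples. The sketch below computes the biased estimate of the squared MMD with a Gaussian kernel. It is a generic illustration, not the DANA training code: the kernel choice, the bandwidth `sigma`, and the function names are assumptions.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def mmd2(X, Y, sigma=1.0):
    """Biased estimate of squared MMD between samples X (source) and Y (target)."""
    kxx = sum(gaussian_kernel(a, b, sigma) for a in X for b in X) / len(X) ** 2
    kyy = sum(gaussian_kernel(a, b, sigma) for a in Y for b in Y) / len(Y) ** 2
    kxy = sum(gaussian_kernel(a, b, sigma) for a in X for b in Y) / (len(X) * len(Y))
    return kxx + kyy - 2 * kxy
```

The estimate is close to zero when the two samples come from the same distribution and grows as they move apart, which is what makes it usable as a penalty on the features of the extraction layer.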