In recent years, insider attacks have posed a serious threat to the security of individuals, companies, and even nations. Machine learning is currently a common approach to insider threat detection. However, it requires complex feature engineering, which limits its practical application. This paper jointly considers users' business operation behavior data and psychological data, and establishes an insider threat detection model to analyze their potential associations. The main contributions are as follows. To capture fine-grained features of heterogeneous behavior log data and accurately reflect user behavior attributes, a session-based full feature extraction method is proposed. Building on this method and the variational autoencoder, a long short-term memory variational autoencoder (LVE) model is proposed. To account for the temporal characteristics of user behavior, long short-term memory networks are used in the encoder and decoder: the encoder maps the input data to latent variables, and the decoder reconstructs the output data from those latent variables. The results show that this method improves recall compared with other algorithms. Finally, the main work and prospects for improvement are summarized.
Studying the outcomes of genetic perturbation based on single-cell RNA-seq data is crucial for understanding the genetic regulation of cells. However, the high cost of cellular experiments and single-cell sequencing prevents us from measuring the full combination space of genetic perturbations and cell types. Consequently, a number of computational models have been proposed to predict unseen combinations from existing data. Among them, generative models, e.g., variational autoencoders and diffusion models, excel at capturing the perturbed data distribution but lack a biologically interpretable foundation for generalization. At the other end of the spectrum, gene regulatory networks and gene pathway knowledge have been exploited to make generalization more biologically plausible. Unfortunately, these approaches do not process the two data modalities in a balanced way, which degrades their fitting ability. Hence, we propose a dual-stream architecture. Before the information from the two modalities is merged, the sequencing data are learned with a generative model, while three types of knowledge data are processed with graph networks and a masked transformer, so that each modality is modeled in depth on its own. The benchmark results show a reduction of approximately 20% in mean squared error, demonstrating the effectiveness of the model.
Recent studies indicate that differences in cognition among individuals may be partially attributed to unique brain wiring patterns. While functional connectivity (FC)-based fingerprinting has demonstrated high accuracy in identifying adults, early studies on neonates suggest that individualized FC signatures are absent. We posit that individual uniqueness is present in neonatal FC data and that conventional linear models fail to capture the rapid developmental trajectories characteristic of newborn brains. To explore this hypothesis, we employed a deep generative model, known as a variational autoencoder (VAE), leveraging two extensive public datasets: one comprising resting-state functional MRI (rs-fMRI) scans from 100 adults and the other from 464 neonates. VAE models trained on rs-fMRI from both adults and newborns produced superior age prediction performance (with r between predicted and actual age of approximately 0.7) and individual identification accuracy (approximately 45%) compared to models trained solely on adult or neonatal data. The VAE model also showed significantly higher individual identification accuracy than linear models (= 10-30%). Importantly, the VAE differentiated connections reflecting age-related changes from those indicative of individual uniqueness, a distinction not possible with linear models. Moreover, we derived 20 latent variables, each corresponding to distinct patterns of cortical functional networks (CFNs). These CFNs varied in their representation of brain maturation and individual signatures; notably, certain CFNs that failed to capture neurodevelopmental traits in fact exhibited individual signatures. CFNs associated with neonatal neurodevelopment predominantly encompassed unimodal regions such as visual and sensorimotor areas, whereas those linked to individual uniqueness spanned multimodal and transmodal brain regions. The VAE's capacity to extract features from rs-fMRI data beyond the capabilities of linear models posit
Cavitation is a dominant failure mode that accelerates the wear and deterioration of pumps. Cavitation can lead to pump malfunction and, eventually, catastrophic failure of the whole system. Therefore, it is important to avoid cavitation in the pump. This paper proposes a semi-supervised learning method that detects cavitation in centrifugal pumps. One-dimensional (1D) vibration signals are converted into two-dimensional (2D) images by the short-time Fourier transform. The severity of the cavitation is determined using a variational autoencoder and the Mahalanobis distance. The effectiveness of the proposed method is evaluated using data collected from a 0.75 kW hydraulic pump testbed. The results confirm that the proposed method can detect cavitation of different severities and help avoid the cavitation phenomenon.
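The severity score mentioned in this abstract rests on the Mahalanobis distance. The abstract does not say how the authors compute it, so the following is only a minimal sketch under stated assumptions: latent vectors of healthy (non-cavitating) signals are assumed to be available as plain lists, and the helper names `mean_vector`, `covariance`, `solve`, and `mahalanobis` are hypothetical. No VAE is trained here.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance(rows, mean):
    """Sample covariance matrix (divides by n - 1)."""
    n, d = len(rows), len(mean)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [x - m for x, m in zip(r, mean)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    return [[v / (n - 1) for v in row] for row in cov]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bv] for row, bv in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def mahalanobis(point, mean, cov):
    """sqrt((x - mu)^T cov^{-1} (x - mu)), without forming the inverse."""
    diff = [p - m for p, m in zip(point, mean)]
    return math.sqrt(sum(d * y for d, y in zip(diff, solve(cov, diff))))

# Hypothetical 2-D latent codes of healthy signals (illustrative values only).
healthy = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mu = mean_vector(healthy)
sigma = covariance(healthy, mu)
score = mahalanobis([2.5, 0.5], mu, sigma)  # larger score = further from healthy
```

A larger distance from the healthy cluster would then be read as more severe cavitation; where to place the thresholds is a design choice the abstract does not specify.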
Classification is among the core tasks in machine learning. Existing classification algorithms typically assume that the data classes are at least roughly balanced. When applied to imbalanced data, such classifiers tend to neglect the minority class in favor of overall accuracy. This makes their performance insufficient, because minority-class samples, such as positive samples in disease diagnosis, are often more important than the others. In this study, we propose a cost-sensitive variational autoencoding classifier that combines data-level and algorithm-level methods to solve the problem of imbalanced data classification. Cost-sensitive factors are introduced to assign a high cost to the misclassification of minority data, which biases the classifier toward minority data. We also design task-specific misclassification costs by embedding domain knowledge. Experimental results show that the proposed method performs well in the classification of bulk amorphous materials.
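The idea of assigning a higher cost to minority-class errors can be illustrated with a class-weighted binary cross-entropy. This is a generic sketch, not the authors' classifier: the abstract gives neither the loss nor the cost values, so the function name `cost_sensitive_bce` and the default costs (5.0 and 1.0) are assumptions chosen only for illustration, with label 1 taken to be the minority class.

```python
import math

def cost_sensitive_bce(y_true, p_pred, minority_cost=5.0, majority_cost=1.0):
    """Mean binary cross-entropy with a per-class misclassification cost.

    y_true: labels, 1 = minority class, 0 = majority class (assumption).
    p_pred: predicted probabilities of class 1.
    """
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        if y == 1:
            total += -minority_cost * math.log(p)
        else:
            total += -majority_cost * math.log(1.0 - p)
    return total / len(y_true)

# The same confidence error costs more on a minority sample.
missed_minority = cost_sensitive_bce([1], [0.1])
missed_majority = cost_sensitive_bce([0], [0.9])
```

With equal costs the two errors are penalized identically; raising `minority_cost` shifts the optimum toward fewer missed minority samples at the price of more false alarms.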
We propose a training method for a heterogeneous multi-agent system to improve learning efficiency in sparse-reward environments. Although multi-agent deep reinforcement learning is being actively researched, these studies often assume that all agents are homogeneous so that learning parameters can be shared or reused across their networks. Unfortunately, this is not always the case in real-world applications, where heterogeneous autonomous agents, i.e., those with different capabilities and perspectives, must properly cooperate and coordinate with each other. In our learning method, which extends shared experience actor-critic (SEAC) to heterogeneous agent environments, agents are clustered according to their features (such as trajectories of observations, actions, and received rewards) using a variational autoencoder, and each agent shares its experience only with the other agents in its cluster, improving learning efficiency in a sparse-reward environment. Our experimental evaluation shows that the proposed method achieves more efficient cooperative and coordinated behaviors than the baselines while retaining the advantages of SEAC.
Network architecture search has recently been gaining popularity. For subsequent architecture optimization, the neural network is represented as a directed acyclic graph. Currently, most existing encoders rely only on the model layer properties and do not take into account the attributes of layers. This work proposes an algorithm for mapping a CNN to a vector space that considers the layer attributes, such as the different dimensions of a particular layer. The proposed algorithm was compared with D-VAE and DVAE-EMB and showed less information loss caused by mapping a network to a vector space. The results report the performance of the model after direct conversion to an embedding and reverse conversion to an architecture. The method allows more accurate mapping of neural network architectures into vector form, which will improve the search for the best architecture. The method implementation is publicly available at https://***/Turukmokto/GraphEmbedding-dev .
Accurate 1-day global total electron content (TEC) forecasting is essential for ionospheric monitoring and satellite communications. However, it faces challenges due to limited data and the difficulty of modeling long-term dependencies. This study develops a highly accurate model for 1-day global TEC forecasting. We use generative TEC data augmentation based on the International GNSS Service (IGS) data set from 1998 to 2017 to enhance the model's prediction ability. Our model takes the TEC sequence of the previous 2 days as input and predicts the global TEC value for each hourly step of the next day. We compare the performance of our model with the 1-day predicted ionospheric products provided by the Center for Orbit Determination in Europe (C1PG) and Beihang University (B1PG). We propose a two-step framework: (a) a time series generative model to produce realistic synthetic TEC data for training, and (b) an auto-correlation-based transformer model designed to capture long-range dependencies in the TEC sequence. Experiments demonstrate that our model significantly improves 1-day forecast accuracy over prior approaches. On the 2018 benchmark data set, the global root mean squared error (RMSE) of our model is reduced to 1.17 TEC units (TECU), while the RMSE of the C1PG model is 2.07 TECU. Reliability is higher in middle and high latitudes but lower in low latitudes (RMSE < 2.5 TECU), indicating room for improvement. This study highlights the potential of using data augmentation and auto-correlation-based transformer models trained on synthetic data to achieve high-quality 1-day global TEC forecasting.
Successful applications of brain-computer interface (BCI) approaches to motor imagery (MI) are still limited. In this paper, we propose a classification framework for MI electroencephalogram (EEG) signals that combines a convolutional neural network (CNN) architecture with a variational autoencoder (VAE). The decoder of the VAE generates a Gaussian distribution, so it can be used to fit the Gaussian distribution of EEG signals. A new input representation was developed by combining the time, frequency, and channel information from the EEG signal, and the CNN-VAE method was designed and optimized accordingly for this form of input. In this network, the features extracted by the CNN are classified by the deep VAE network. Our framework, with an average kappa value of 0.564, outperforms the best classification method in the literature for BCI Competition IV dataset 2b with a 3% improvement. Furthermore, on our own dataset, the CNN-VAE framework also yields the best performance for both three-electrode and five-electrode EEGs, achieving the best average kappa values of 0.568 and 0.603, respectively. Our results show that the proposed CNN-VAE method raises performance to the current state of the art.
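The kappa values reported in this abstract are presumably Cohen's kappa, the chance-corrected agreement measure commonly used for BCI classification; the abstract does not define it, so that reading is an assumption. A minimal sketch of the standard formula, with a hypothetical function name and made-up labels:

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (plain accuracy) and p_e is the agreement
    expected by chance from the marginal label frequencies.
    """
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one class occurs")
    return (observed - expected) / (1.0 - expected)

# Illustrative two-class labels (e.g. left-hand vs right-hand imagery).
truth = [0, 0, 1, 1]
kappa_perfect = cohen_kappa(truth, [0, 0, 1, 1])
kappa_chance = cohen_kappa(truth, [0, 1, 0, 1])
kappa_partial = cohen_kappa(truth, [0, 0, 1, 0])
```

For a balanced two-class problem, kappa equals 2 × accuracy − 1 when the predictions are balanced as well, so a kappa near 0.56 corresponds to roughly 78% accuracy under that assumption.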
Collaborative filtering (CF) is a widely used method in recommendation systems. Linear models remain the mainstream of CF research, but non-linear probabilistic models can go beyond the capacity limits of linear models. For example, variational autoencoders (VAEs) have been extensively used in CF and have achieved excellent results. However, the prior distribution over the latent codes of VAEs in traditional CF is too simple, which makes the latent representations of users and items too poor. To address this problem, this paper proposes GVAE-CF, a variational autoencoder for CF that uses a Gaussian mixture model as the distribution of the latent factors. On this basis, an optimization function suitable for GVAE-CF is proposed. Our experimental evaluation shows that GVAE-CF outperforms previously proposed VAE-based models on several popular benchmark datasets in terms of recall and normalized discounted cumulative gain (NDCG), demonstrating the effectiveness of the algorithm.
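The two ranking metrics named in this abstract can be sketched as follows. The abstract does not state the cutoff k or the exact metric variant, so this uses the textbook binary-relevance definitions; the function names and the toy ranking are hypothetical.

```python
import math

def recall_at_k(ranked_items, relevant, k):
    """Fraction of the relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked_items, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    if not relevant:
        return 0.0
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 has discount log2(2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant
    )
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(k, len(relevant))))
    return dcg / ideal

# Toy example: items "a" and "c" are the held-out interactions of one user.
ranking = ["a", "b", "c", "d"]
held_out = {"a", "c"}
r2 = recall_at_k(ranking, held_out, 2)
r4 = recall_at_k(ranking, held_out, 4)
n4 = ndcg_at_k(ranking, held_out, 4)
```

Some VAE-based CF papers normalize recall by min(k, number of relevant items) instead of the number of relevant items; which variant GVAE-CF uses is not stated in the abstract.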