Sharing electronic health record data is essential for advanced analysis, but may put sensitive information at risk. Several studies have attempted to address this risk using contextual embedding, but when many hospitals are involved, these approaches are often inefficient and inflexible. We therefore propose a bilingual autoencoder-based model to harmonize local embeddings that lie in different spaces. Cross-hospital reconstruction of embeddings makes the encoders map embeddings from different hospitals to a shared space and align them spontaneously. We also suggest two-phase training to prevent distortion of embeddings during harmonization with hospitals that have biased information. In experiments, we used medical event sequences from the Medical Information Mart for Intensive Care-III dataset and simulated a multi-hospital setting. For evaluation, we measured the alignment of events from different hospitals and the accuracy of predicting a patient's diagnosis at the next admission in three scenarios in which local embeddings do not work. The proposed method efficiently harmonizes embeddings in different spaces, increases prediction accuracy, and offers the flexibility to include new hospitals, making it superior to previous methods in most cases. It will be useful in predictive tasks that utilize distributed data while preserving private information. (c) 2021 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://***/licenses/by-nc-nd/4.0/).
Anomaly detection for hydropower turbine units is a requirement for the safety of a hydropower system. An unsupervised anomaly detection method employing variational modal decomposition (VMD) and a deep autoencoder is proposed. VMD is applied to the data collected by multiple sensors to obtain the sub-signals of each data stream. The sub-signals in each time period constitute two-dimensional data. An autoencoder based on a convolutional neural network performs the unsupervised learning, and the reconstruction residual of the autoencoder is used for anomaly detection. The experimental results show that the deep autoencoder can increase the separation between the abnormal and normal data distributions, and that VMD can effectively reduce the number of samples in the overlapping area. Compared with the traditional autoencoder method, the proposed method improves the recall, precision and F1 scores by 0.140, 0.205 and 0.175, respectively. The proposed method achieves better anomaly detection performance than other methods. (C) 2021 The Author(s). Published by Elsevier Ltd.
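The idea of using a reconstruction residual as an anomaly score can be illustrated with a minimal sketch. It does not reproduce the method in the abstract above: the autoencoder is replaced by hand-written reconstructions, and the threshold rule (mean plus k standard deviations of residuals on normal data), the function names and all numeric values are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def residual(original, reconstruction):
    """Mean squared reconstruction residual for one sample (flat list of floats)."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstruction)) / len(original)


def fit_threshold(normal_residuals, k=3.0):
    """Hypothetical rule: threshold = mean + k * sample std of residuals on normal data."""
    return mean(normal_residuals) + k * stdev(normal_residuals)


def is_anomaly(score, threshold):
    """Flag a sample whose residual exceeds the threshold."""
    return score > threshold


# Made-up residuals that a trained autoencoder might produce on normal data.
normal_residuals = [0.010, 0.012, 0.009, 0.011, 0.013]
threshold = fit_threshold(normal_residuals)

# Made-up sample with a close and a poor reconstruction.
sample = [1.0, 2.0, 3.0, 4.0]
good_reconstruction = [1.0, 2.1, 2.9, 4.0]
poor_reconstruction = [0.0, 3.0, 2.0, 5.0]

print(is_anomaly(residual(sample, good_reconstruction), threshold))
print(is_anomaly(residual(sample, poor_reconstruction), threshold))
```

In practice the threshold would be tuned on held-out data; a fixed multiple of the standard deviation is only one simple option.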
The proliferation of the Internet of Things (IoT) has led to the emergence of crowdsensing applications, where a multitude of interconnected devices collaboratively collect and analyze data. Ensuring the authenticity and integrity of the data collected by these devices is crucial for reliable decision-making and maintaining trust in the system. Traditional authentication methods are often vulnerable to attacks or can be easily duplicated, posing challenges to securing crowdsensing applications. Moreover, current solutions leveraging device behavior mostly focus on device identification, which is a simpler task than authentication. To address these issues, this work proposes an individual IoT device authentication framework based on hardware behavior fingerprinting and Transformer autoencoders. To support the design, a threat model details the security problems faced when performing hardware-based authentication in IoT. The solution leverages the inherent imperfections and variations in IoT device hardware to differentiate between devices with identical specifications. By monitoring and analyzing the behavior of key hardware components, such as the CPU, GPU, RAM, and storage, a unique fingerprint is created for each device. The performance samples are treated as time series data and used to train outlier detection Transformer models, one per device, each aiming to model that device's normal data distribution. The framework is then validated within a spectrum crowdsensing system built on Raspberry Pi devices. After a series of experiments, the model of each device is able to individually authenticate that device among the 45 devices employed for validation. An average True Positive Rate (TPR) of 0.74 +/- 0.13 and an average maximum False Positive Rate (FPR) of 0.06 +/- 0.09 demonstrate the effectiveness of this approach in enhancing authentication, security, and trust in crowdsensing applications.
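For readers unfamiliar with the two rates reported above, the following minimal sketch shows how TPR and FPR are computed from accept/reject decisions. The labelling convention (a positive is a sample that truly comes from the device being authenticated), the function name and the example data are assumptions for illustration, not the evaluation protocol of the paper.

```python
def tpr_fpr(labels, predictions):
    """Return (TPR, FPR) for boolean ground-truth labels and boolean accept decisions.

    labels[i] is True when sample i truly comes from the target device;
    predictions[i] is True when the model accepts sample i as that device.
    """
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    tn = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Made-up decisions: 4 samples from the target device, 5 from other devices.
labels = [True, True, True, True, False, False, False, False, False]
predictions = [True, True, True, False, False, False, True, False, False]

tpr, fpr = tpr_fpr(labels, predictions)
print(tpr, fpr)
```

A high TPR means the legitimate device is rarely rejected, while a low FPR means other devices are rarely accepted in its place.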
The task of anomaly detection is to separate anomalous data from normal data in a dataset. Models such as the deep Convolutional autoencoder (CAE) and deep support vector data description (Deep SVDD) have been widely used and have demonstrated significant success in detecting anomalies. However, the ability of the CAE network to over-reconstruct anomalous data can easily lead to a high false-negative rate in detecting anomalies. The Deep SVDD model, on the other hand, suffers from feature collapse, which decreases its detection accuracy for anomalies. To address these problems, we propose the Improved autoencoder with LSTM module and Kullback-Leibler divergence (IAE-LSTM-KL) model in this article. An LSTM network is added after the encoder to memorize feature representations of normal data. Meanwhile, feature collapse is mitigated by penalizing the features input to the SVDD module via KL divergence. The efficacy of the IAE-LSTM-KL model is validated through experiments on both synthetic and real-world datasets. Experimental results show that the IAE-LSTM-KL model yields higher detection accuracy for anomalies. In addition, the IAE-LSTM-KL model demonstrates enhanced robustness to outliers contaminating the dataset.
ISBN: 9789082797091 (print)
In this paper, we propose a method of missing data recovery using an autoencoder for multi-channel signals. Recently, many deep neural network-based classification methods using multi-channel signals have been proposed. The advantage of using multi-channel signals is that both frequency and spatial information can be used. However, systems using such signals are vulnerable to missing data because of the mismatch it creates between the training and testing data. To minimize the mismatch, some techniques that add simulated missing data to the training data have been proposed. However, it is difficult to prepare missing data covering all possible mismatch situations. Therefore, we focus on using an autoencoder to recover the missing data without any assumptions about how the data are missing. When multi-channel data are input into an autoencoder, the channel relationships are compressed into a low-dimensional hidden layer, and the autoencoder then reconstructs the input data from that layer. Consequently, when the multi-channel input has some missing channels, the output of the autoencoder is expected to recover some of the missing channel information by using the hidden layer. Since the autoencoder is regarded as a preprocessing step for acoustic tasks, we evaluated the proposed method on an acoustic classification task. From the experimental results, we confirmed that the proposed method can recover missing data and improve classification performance.
ISBN: 9781665426329 (print)
The internet is formed by thousands of interconnected Autonomous Systems (ASes). The Border Gateway Protocol (BGP) exchanges routing information between autonomous domains. Anomalies in BGP are exceptional events (misconfigurations, outages, and attacks), but when they happen, the consequences can be widely harmful. An anomaly in internet routing is an element that does not follow normal behavior. It is therefore critical to detect internet routing anomalies quickly and accurately using BGP update messages. This paper proposes a Long Short-Term Memory-based autoencoder network (LSTM-AE) method for internet routing anomaly detection that trains itself to reproduce clean datasets effectively. LSTM units, a type of artificial Recurrent Neural Network (RNN) architecture used in deep learning, are well suited to time series modeling and anomaly detection over historical data; here they replace ordinary neurons in building the coder. Using data collected as time series, LSTM-AE detects anomalies in 11 events spanning four kinds of anomalies.
ISBN: 9781665450928 (print)
Machinery maintenance is a crucial task in industrial production sectors. The most important aspect of maintenance is timing. Performing maintenance more or less frequently than necessary causes separate problems, resulting in unnecessary expenses or halts in production. To prevent these problems, a smart system that decides the timing of maintenance must be established. In this study, we develop an auto-encoder extension of a previously proposed deep convolutional network that was trained successfully to model electroencephalogram (EEG) signals with high performance. The auto-encoder extracts features from the vibration signals collected from the machinery. This method allows us to synthesize multi-channel vibration data, which we use to classify the type of failure that the machinery bearing is going to face, without expert field knowledge and with high accuracy. The performance of the proposed network is tested on the publicly available Case Western Reserve University (CWRU) bearing dataset, using classification accuracy as the metric. The proposed network showed better classification performance, allowed smaller bottleneck feature sizes, and trained faster than the Normalized Sparse Auto-Encoder - Locally Connected Network (NSAE-LCN), which is one of the best performing networks on the same dataset.
Since scientific investigations have demonstrated that aberrant expression of miRNAs brings about numerous complex diseases, precise determination of miRNA-disease relationships greatly contributes to medical progress. To address the inefficiency of conventional experimental approaches, numerous computational methods have been proposed to predict miRNA-disease associations with enhanced accuracy. However, constructing a miRNA-gene-disease heterogeneous network by incorporating gene information has been relatively under-explored in existing computational techniques. Accordingly, this paper puts forward a technique that predicts miRNA-disease associations by applying an autoencoder and a random walk on a miRNA-gene-disease heterogeneous network (AE-RW). First, we integrate association information and similarities between miRNAs, genes, and diseases to construct a miRNA-gene-disease heterogeneous network. Subsequently, we consolidate two network feature representations extracted independently via an autoencoder and a random walk procedure. Finally, a deep neural network (DNN) is used to conduct association prediction. The experimental results demonstrate that the AE-RW model achieved an AUC of 0.9478 through 5-fold CV on the HMDD v3.2 dataset, outperforming the five most advanced existing models. Additionally, case studies on breast and lung cancer further validated the superior predictive capabilities of our model.
Imbalanced data classification problems are widespread in commercial activities and social production. They arise when there is a considerable gap in sample counts among classes, which significantly degrades the performance of traditional classification algorithms. Previous methods often focus on resampling and algorithm adjustment, but neglect enhancing the ability of feature learning. In this study, we propose a novel algorithm for imbalanced data classification from the perspective of feature learning: the Maximum Mean Discrepancy-Encouraging Convolutional autoencoder (MMD-CAE). The algorithm adopts a two-phase target training process. The cross-entropy loss is employed to calculate the reconstruction loss of the data, and the Maximum Mean Discrepancy (MMD) with an intra-variance constraint is used to stimulate feature discrepancy in the bottleneck layer. By encouraging maximization of the MMD between samples of the two classes, and mapping the original space to a higher-dimensional space via the kernel trick, the features can be learned to form a more effective feature space. The proposed algorithm is tested on ten groups of samples with different imbalance ratios. The performance metrics of recall, F1 score, G-mean and AUC verify that the proposed algorithm surpasses the existing state-of-the-art methods in this field and has stronger generalization ability. This study could shed new light on related studies in terms of constituting a more effective feature space via the proposed MMD with intra-variance constraint, and via the holistic MMD-CAE algorithm for imbalanced data classification.
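As a rough illustration of the Maximum Mean Discrepancy mentioned above, the sketch below computes the standard biased estimate of squared MMD between two sets of feature vectors. The RBF kernel, the gamma value, the function names and the toy feature vectors are assumptions; the intra-variance constraint and the autoencoder training described in the abstract are not shown.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel exp(-gamma * ||x - y||^2); the kernel choice is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between two samples of feature vectors."""
    k_xx = sum(rbf_kernel(a, b, gamma) for a in xs for b in xs) / (len(xs) ** 2)
    k_yy = sum(rbf_kernel(a, b, gamma) for a in ys for b in ys) / (len(ys) ** 2)
    k_xy = sum(rbf_kernel(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


# Made-up bottleneck features for a majority class and two candidate minority classes.
majority = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1]]
minority_near = [[0.1, 0.1], [0.2, 0.3]]
minority_far = [[3.0, 3.0], [3.1, 3.2]]

print(mmd_squared(majority, minority_near))
print(mmd_squared(majority, minority_far))
```

A larger value indicates that the two classes are better separated in feature space, which is the quantity a training objective of this kind would try to increase.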
Deep autoencoder-based methods make up the majority of deep anomaly detection methods. An autoencoder trained on the training data is assumed to produce a higher reconstruction error for anomalous samples than for normal samples, and thus to distinguish anomalies from normal data. However, this assumption does not always hold in practice, especially in unsupervised anomaly detection, where the training data is contaminated with anomalies. We observe that the autoencoder generalizes so well on the training data that it can reconstruct both the normal data and the anomalous data well, leading to poor anomaly detection performance. In addition, we find that anomaly detection performance is not stable when reconstruction error is used as the anomaly score, which is unacceptable in the unsupervised scenario because there are no labels to guide the selection of a proper model. To mitigate these drawbacks of autoencoder-based anomaly detection methods, we propose an Improved autoencoder for unsupervised Anomaly Detection (IAEAD). Specifically, we manipulate the feature space to bring normal data points closer together, using an anomaly detection-based loss as guidance. Unlike previous methods, by integrating the anomaly detection-based loss with the autoencoder's reconstruction loss, IAEAD can jointly optimize for the anomaly detection task and learn representations that preserve the local data structure to avoid feature distortion. Experiments on five image data sets empirically validate the effectiveness and stability of our method.