Cancer subtyping (or cancer subtype identification) based on multi-omics data has played an important role in advancing diagnosis, prognosis and treatment, which has triggered the development of advanced multi-view clustering algorithms. However, the high dimensionality and heterogeneity of multi-omics data strongly affect the performance of these methods. In this paper, we propose to learn an informative latent representation with an autoencoder (AE) that naturally captures nonlinear omic features in lower dimensions, which is helpful for identifying the similarity of patients. Moreover, to take advantage of survival or clinical information, a multi-omic survival analysis approach is embedded when integrating the similarity graphs of heterogeneous data at the multi-omics level. Then, a clustering method is applied to the integrated similarity to generate subtype groups. In the experimental part, the effectiveness of the proposed framework is confirmed by evaluating five different multi-omics datasets taken from The Cancer Genome Atlas. The results show that the AE-assisted multi-omics clustering method can identify clinically significant cancer subtypes.
Part-in-whole retrieval (PWR) is an important problem in the field of computer-aided design (CAD), with applications in design reuse, feature recognition and suppression, and so on. We first present a nonparametric (and hence threshold-independent) algorithm for segmenting CAD models (represented as meshes) that does not require any user intervention. As no labelled segmented dataset is available for part clustering, we propose the use of autoencoders, one of the approaches used in deep networks, along with hierarchical clustering. The features for the autoencoder are derived from the Gauss map of the segments. The autoencoder network is then trained and validated using a hierarchical clustering-based approach that generates a dictionary of labels for each segment. PWR is then done by testing a query model with the network, which retrieves models having the query as their subset. Comparison of the segmentation algorithm with state-of-the-art approaches indicates that it performs better or on par. The algorithm was also tested on noisy models. Results of the part clustering and PWR are also presented for models from a CAD dataset, along with a discussion. (C) 2019 Elsevier Ltd. All rights reserved.
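The Gauss map sends each point of a surface to its unit normal on the unit sphere; for a triangle mesh this reduces to one unit normal per face. The following is a minimal sketch of that step only, assuming a segment is given as a list of triangles with 3D vertex coordinates. The function name `face_normals` and the input layout are illustrative assumptions, and the abstract does not say how the normals are turned into autoencoder features.

```python
import math

def face_normals(triangles):
    """Return one unit normal per triangle (discrete Gauss map image).

    `triangles` is an assumed input layout: a list of (p0, p1, p2) vertex
    triples, each vertex being an (x, y, z) tuple.
    """
    normals = []
    for p0, p1, p2 in triangles:
        # Two edge vectors spanning the triangle.
        u = [p1[i] - p0[i] for i in range(3)]
        v = [p2[i] - p0[i] for i in range(3)]
        # Cross product u x v gives a vector orthogonal to the face.
        n = [
            u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0],
        ]
        length = math.sqrt(sum(c * c for c in n))
        if length == 0.0:
            continue  # skip degenerate (zero-area) triangles
        normals.append(tuple(c / length for c in n))
    return normals

# Two faces of a corner: one in the xy-plane, one in the yz-plane.
segment = [
    ((0, 0, 0), (1, 0, 0), (0, 1, 0)),
    ((0, 0, 0), (0, 1, 0), (0, 0, 1)),
]
normals = face_normals(segment)
```

A fixed-length feature vector could then be built from these normals, for example as a histogram over bins on the sphere, but that choice is not specified in the abstract.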
Variations in commands executed as part of the attack process can be used to determine the behavioural patterns of IoT attacks. Existing approaches rely on the domain knowledge of security experts to identify the behavioural patterns and to categorise and classify cyber attacks. We proposed an autoencoder (AE)-based feature construction approach to remove the dependency on manually correlating commands and to generate an efficient representation by automatically learning the semantic similarity between input features extracted from command data. We applied three clustering algorithms, i.e., K-means, Gaussian Mixture Models and Density-based spatial clustering of applications with noise, to our data set of AE features. We discussed the clustering arrangements to understand the impact of changes in commands on the behavioural patterns of attacks and how attacks are grouped in the same or different clusters. Evaluation of our feature construction approach shows that the clustering algorithms grouped attacks with more common feature values than clustering with the original features. Moreover, we performed a comparative analysis of two existing feature extraction approaches on our data set, considering the type of analysis in the process, the generalisability of applying features, coverage of the data set and clustering arrangements. We found that the challenges identified in applying existing approaches can be addressed with our proposed approach, and that improving features with an AE resulted in meaningful clustering interpretations. (c) 2021 Elsevier B.V. All rights reserved.
The idea of employing deep autoencoders (AEs) has been recently proposed to capture the end-to-end performance in the physical layer of communication systems. However, most of the current methods for applying AEs are developed based on the assumption that there exists an explicit channel model for training that matches the actual channel model in the online transmission. The variation of the actual channel indeed imposes a major limitation on employing AE-based systems. In this paper, without relying on an explicit channel model, we propose an adaptive scheme to increase the reliability of an AE-based communication system over different channel conditions. Specifically, we partition channel coefficient values into sub-intervals, train an AE for each partition in the offline phase, and constitute a bank of AEs. Then, based on the actual channel condition in the online phase and the average block error rate (BLER), the optimal pair of encoder and decoder is selected for data transmission. To gain knowledge about the actual channel conditions, we assume a realistic scenario in which the instantaneous channel is not known, and propose to blindly estimate it at the receiver (Rx), i.e., without any pilot symbols. Our simulation results confirm the superiority of the proposed adaptive scheme over existing methods in terms of the average power consumption. For instance, when the target average BLER is equal to 10^-4, our proposed algorithm with 5 pairs of AEs can achieve a performance gain of over 1.2 dB compared with a non-adaptive scheme.
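The lookup step of such a bank of AEs can be sketched as an interval search over the estimated channel coefficient. This is only an illustration under stated assumptions: the interval edges in `EDGES`, the placeholder entries in `AE_BANK` and the function name `select_autoencoder` are hypothetical, and the sketch omits both the blind channel estimation and the BLER-based part of the selection described in the abstract.

```python
from bisect import bisect_right

# Hypothetical interval edges for the channel coefficient magnitude |h|.
# Four edges give five partitions, one per trained encoder/decoder pair.
EDGES = [0.5, 1.0, 1.5, 2.0]

# Placeholders standing in for the five AEs trained offline.
AE_BANK = ["ae_0", "ae_1", "ae_2", "ae_3", "ae_4"]

def select_autoencoder(h_estimate):
    """Pick the AE whose partition contains the estimated |h|."""
    magnitude = abs(h_estimate)  # works for real or complex estimates
    # bisect_right returns how many edges are <= magnitude,
    # which is exactly the index of the enclosing partition.
    index = bisect_right(EDGES, magnitude)
    return AE_BANK[index]
```

For example, `select_autoencoder(0.3)` falls in the first partition and `select_autoencoder(5.0)` in the last.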
Advances in electron microscopy and data processing techniques are leading to increasingly large and complete microscale connectomes. At the same time, advances in artificial neural networks have produced model systems that perform comparably rich computations with perfectly specified connectivity. This raises an exciting scientific opportunity for the study of both biological and artificial neural networks: to infer the underlying circuit function from the structure of its connectivity. A potential roadblock, however, is that - even with well-constrained neural dynamics - there are in principle many different connectomes that could support a given computation. Here, we define a tractable setting in which the problem of inferring circuit function from circuit connectivity can be analyzed in detail: the function of input compression and reconstruction, in an autoencoder network with a single hidden layer. In this setting, there is in general substantial ambiguity in the weights that can produce the same circuit function, because largely arbitrary changes to input weights can be undone by applying the inverse modifications to the output weights. However, we use mathematical arguments and simulations to show that adding simple, biologically motivated regularization of connectivity resolves this ambiguity in an interesting way: weights are constrained such that the latent variable structure underlying the inputs can be extracted from the weights by using nonlinear dimensionality reduction methods. (C) 2021 Elsevier Ltd. All rights reserved.
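The weight ambiguity is easiest to see in the purely linear case, which is an assumption made here for illustration and not necessarily the setting analyzed in the paper. If the hidden layer is linear, then for any invertible matrix A the pair (A·W_in, W_out·A⁻¹) computes the same function as (W_in, W_out). The weights and the matrix A below are arbitrary illustrative values.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def inverse_2x2(m):
    """Invert a 2x2 matrix; raises if it is singular."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

# Linear autoencoder: 3 inputs -> 2 hidden units -> 3 outputs.
w_in = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]    # 2x3 input weights
w_out = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3x2 output weights

# Any invertible 2x2 matrix remixes the hidden units.
a = [[2.0, 1.0], [1.0, 1.0]]
w_in_alt = matmul(a, w_in)                 # modified input weights
w_out_alt = matmul(w_out, inverse_2x2(a))  # inverse change on the output side

x = [[1.0], [2.0], [3.0]]  # one input as a column vector
y = matmul(w_out, matmul(w_in, x))
y_alt = matmul(w_out_alt, matmul(w_in_alt, x))
```

The two weight sets differ, yet `y` and `y_alt` agree, so the connectivity alone does not pin down the function without a further constraint such as the regularization discussed in the abstract.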
This paper explores the application of autoencoder algorithms in Automated Fault Detection (AFD) for Heating, Ventilation, and Air Conditioning (HVAC) systems, specifically focusing on Fan Coil Units (FCUs). The paper begins by reviewing the current state of Fault Detection and Diagnostics (FDD), emphasizing its limitations and the potential of unsupervised learning techniques like autoencoders and transfer learning to fill these gaps. Using data from a full-scale building case study featuring five FCUs, the research develops and evaluates autoencoder-based AFD models. These models effectively compress multivariate inputs into a reduced space, enabling accurate and efficient fault detection. The paper makes two novel contributions: (1) it introduces a methodology to distinguish between equipment-level and system-level faults; and (2) it demonstrates the generalizability of the approach across different types of FCUs through cross-testing and transfer learning. The results indicate that autoencoders outperform other dimensionality reduction algorithms and separate predictors in fault detection accuracy and efficiency. The paper concludes by discussing the implications of these findings for future research and practical applications in building management.
This paper introduces an algorithm for the detection of change-points and the identification of the corresponding subsequences in transient multivariate time-series data (MTSD). The analysis of such data has become increasingly important due to its growing availability in many industrial fields. Labeling, sorting or filtering highly transient measurement data for training Condition-based Maintenance (CbM) models is cumbersome and error-prone. For some applications it can be sufficient to filter measurements by simple thresholds or to find change-points based on changes in mean value and variation. But for a robust diagnosis of, for example, a component within a component group that has a complex non-linear correlation between multiple sensor values, such a simple approach would not be feasible: no meaningful and coherent measurement data that could be used for training a CbM model would emerge. Therefore, we introduce an algorithm that uses a recurrent neural network (RNN) based autoencoder (AE) which is iteratively trained on incoming data. The scoring function uses the reconstruction error and latent space information. A model of the identified subsequence is saved and used for recognition of repeating subsequences as well as fast offline clustering. For evaluation, we propose a new similarity measure based on the curvature for a more intuitive time-series subsequence clustering metric. A comparison with seven other state-of-the-art algorithms on eight datasets shows the capability and the increased performance of our algorithm in clustering MTSD online and offline in conjunction with mechatronic systems.
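One way a score could combine reconstruction error with latent space information is sketched below. The weighted sum, the `alpha` parameter, the reference latent vector `z_ref` and the fixed threshold are all assumptions made for illustration; the abstract does not specify the paper's actual scoring function.

```python
def mean_squared_error(a, b):
    """Mean squared difference between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def change_score(x, x_hat, z, z_ref, alpha=0.5):
    """Hypothetical score: weighted sum of two discrepancy terms.

    x, x_hat : input window and its AE reconstruction
    z, z_ref : latent code of the window and a reference code for the
               current subsequence (an assumed quantity)
    alpha    : assumed trade-off weight between the two terms
    """
    reconstruction_term = mean_squared_error(x, x_hat)
    latent_term = mean_squared_error(z, z_ref)
    return alpha * reconstruction_term + (1 - alpha) * latent_term

def detect_change_points(scores, threshold):
    """Return the indices whose score exceeds the (assumed) threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]
```

A window that the AE reconstructs well and whose latent code stays near the reference gets a low score, while a window from a new subsequence scores high on one or both terms.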
The recent evolution of machine learning (ML) algorithms and the high level of expertise required to use them have fuelled the demand for solutions aimed at non-experts. The selection of an appropriate algorithm and the configuration of its hyperparameters are among the most complicated tasks when applying ML to new problems, and they require thorough knowledge of ML algorithms. The algorithm selection problem (ASP) is defined as the process of identifying the algorithm(s) that can deliver top performance for a particular problem, task, and evaluation measure. In this context, meta-learning is one of the approaches to achieve this objective by using prior learning experiences to assist the learning process on unseen problems and tasks. As meta-learning is a data-driven approach, appropriate data characterization is of vital importance. The recent literature presents a variety of data characterization techniques, including simple, statistical and information-theoretic measures. However, their quality still needs to be improved. In this paper, a new autoencoder-kNN (AeKNN) based meta-model with built-in latent feature extraction is proposed. The approach aims to extract new characterizations of the data, with lower dimensionality but more significant and meaningful features. AeKNN internally uses a deep autoencoder as a latent feature extractor from a set of existing meta-features induced from the dataset. Distances computed from these new feature vectors are more significant, thus providing a way to accurately recommend top-performing pipelines for previously unseen datasets. In an application to a large-scale hyperparameter optimization task on 400 real-world datasets with varying schemas as a meta-learning task, we show that AeKNN offers considerable improvements over the classical kNN as well as traditional meta-models in terms of performance.
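The nearest-neighbour step on latent features can be sketched as follows. The sketch assumes the autoencoder has already mapped each dataset's meta-features to a latent vector; the function name `recommend_pipeline`, the Euclidean distance and the majority vote among the k neighbours are illustrative assumptions, not details given in the abstract.

```python
import math
from collections import Counter

def recommend_pipeline(query_latent, known_latents, known_pipelines, k=3):
    """Recommend a pipeline by majority vote among the k nearest datasets.

    query_latent    : latent vector of the unseen dataset (assumed to come
                      from an already trained encoder)
    known_latents   : latent vectors of previously seen datasets
    known_pipelines : best-known pipeline label for each of those datasets
    """
    # Pair each known dataset's distance to the query with its pipeline.
    distances = sorted(
        (math.dist(query_latent, z), pipeline)
        for z, pipeline in zip(known_latents, known_pipelines)
    )
    # Vote among the k closest datasets in latent space.
    votes = Counter(pipeline for _, pipeline in distances[:k])
    return votes.most_common(1)[0][0]

# Toy latent vectors and pipeline labels, purely illustrative.
latents = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0)]
pipelines = ["rf", "rf", "svm", "knn", "knn"]
choice = recommend_pipeline((0.05, 0.05), latents, pipelines, k=3)
```

Here the three nearest datasets carry the labels `rf`, `rf` and `svm`, so the vote selects `rf`.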
Clustering is performed to partition samples into disjoint groups to facilitate the discovery of hidden patterns in the data. Many real-world applications involve various clustering methods, most of which produce only a single clustering. In response to this issue, multiple clustering, which aims to generate diverse and high-quality clusterings, has emerged recently. This study proposes a novel autoencoder-like semi-nonnegative matrix factorization (NMF) multiple clustering (ASNMFMC) model that generates multiple non-redundant, high-quality clusterings. The nonnegative property of the semi-NMF is utilized by the algorithm to enforce non-redundancy. Extensive experimental results demonstrate that ASNMFMC is superior to existing multiple clustering methods and can explore diverse high-quality clusterings. (c) 2021 Elsevier Inc. All rights reserved.
Textual emotion detection is a challenge in computational linguistics and affective computing, as it involves the discovery of all associated emotions expressed within a given piece of text. It becomes an even more difficult problem when applied to conversation transcripts, as we need to model the spoken utterances between speakers while keeping in mind the context of the entire conversation. In this paper, we propose a semi-supervised multi-label method for predicting emotions from conversation transcripts. The corpus contains conversational quotes extracted from movies. A small number of them are annotated, while the rest are used for unsupervised training. We use the word2vec word-embedding method to build an emotion lexicon from the corpus and to embed the utterances into vector representations. A deep-learning autoencoder is then used to discover the underlying structure of the unlabeled data. We fine-tune the learned model on labeled training data and measure its performance on a test set. The experimental results suggest that the method is effective and is only slightly behind human annotators.