ISBN: (print) 9781450399449
In the era of Big Data, more and more IoT devices are generating huge amounts of high-dimensional, real-time and dynamic data streams. As a result, there is growing interest in how to cluster this data effectively and efficiently. Although a number of popular two-stage data stream clustering algorithms have been proposed, they still struggle with real-world data streams in three ways: they handle high-dimensional data streams poorly and have difficulty reducing dimensionality effectively; their clustering process is slow, which makes it difficult to meet real-time requirements; and they rely on too many manually defined parameters, which makes it difficult to cope with evolving data streams. This paper proposes an autoencoder-based fast online clustering algorithm for evolving data stream (AFOCEDS). The algorithm uses a stacked denoising autoencoder to reduce the dimensionality of the data, a multi-threaded approach to improve response speed, and a mechanism that automatically updates parameters to cope with evolving data streams. Experiments on several realistic data streams show that AFOCEDS outperforms other algorithms in terms of effectiveness and speed.
Dimensionality reduction is a crucial first step for many unsupervised learning tasks, including anomaly detection and clustering. The autoencoder is a popular mechanism for accomplishing dimensionality reduction. To make dimensionality reduction effective for high-dimensional data that embed a nonlinear low-dimensional manifold, it is understood that some sort of geodesic distance metric should be used to discriminate between data samples. Inspired by the success of geodesic distance approximators such as ISOMAP, we propose to use a minimum spanning tree (MST), a graph-based algorithm, to approximate the local neighborhood structure and generate structure-preserving distances among data points. We use this MST-based distance metric to replace the Euclidean distance metric in the embedding function of autoencoders and develop a new graph-regularized autoencoder, which outperforms a wide range of alternative methods on 20 benchmark anomaly detection datasets. We further incorporate the MST regularizer into two generative adversarial networks and find that it substantially improves anomaly detection performance for both. We also test our MST-regularized autoencoder on two datasets in a clustering application and observe superior performance there as well.
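The idea of an MST-based distance can be illustrated with a minimal sketch. It assumes that the distance between two points is the path length along a minimum spanning tree built over the complete Euclidean graph; the abstract does not specify the exact construction, so the function names `mst_edges` and `mst_distances` and this choice of path-length distance are illustrative assumptions, not the authors' implementation.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j, weight) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def mst_distances(points):
    """All-pairs path length along the MST (assumed stand-in for a geodesic distance)."""
    n = len(points)
    adj = [[] for _ in range(n)]
    for i, j, w in mst_edges(points):
        adj[i].append((j, w))
        adj[j].append((i, w))
    dist = [[0.0] * n for _ in range(n)]
    for src in range(n):
        stack = [(src, -1, 0.0)]  # depth-first walk; a tree has exactly one path per pair
        while stack:
            node, prev, d = stack.pop()
            dist[src][node] = d
            for nxt, w in adj[node]:
                if nxt != prev:
                    stack.append((nxt, node, d + w))
    return dist

# Points along an L-shaped curve: the tree distance follows the bend,
# while the Euclidean distance cuts across it.
curve = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
tree_dist = mst_distances(curve)
```

On the L-shaped example, the tree distance between the two end points is the length of the curve, which is larger than the straight-line distance; this is the kind of structure-preserving behavior the abstract attributes to geodesic approximators.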
Outlier detection technologies play an important role in various application domains. Most existing outlier detection algorithms have difficulty detecting outliers that are mixed within normal object regions or around dense clusters. To address this problem, we propose a novel graph neural network structure called the graph autoencoder (GAE), which is capable of handling the task of outlier detection in Euclidean structured data. The GAE can perform feature value propagation in the form of a neural network that changes the distribution pattern of the original dataset, which can accurately detect outliers with low deviation. This method first converts the Euclidean structured dataset into a graph using the graph generation module, then inputs the dataset together with its corresponding graph into the GAE for training, and finally determines the top-n objects that are difficult to reconstruct in the output layer of the GAE as outliers. The results of comparing eight state-of-the-art algorithms on eight real-world datasets showed that GAE achieved the highest area under the receiver operating characteristic curve (ROC AUC) on six datasets. By comparing GAE with the autoencoder-based outlier detection algorithm, it was discovered that the proposed method improved the AUC by 16.9% on average for eight datasets. (C) 2022 Elsevier Inc. All rights reserved.
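The three steps described above (build a graph, reconstruct, rank by reconstruction difficulty) can be sketched as follows. This is not the GAE: no network is trained here. As a loudly labeled stand-in, each point is "reconstructed" as the mean of its k nearest neighbours, and the top-n points with the largest reconstruction error are reported. The function names and the parameter `k` are hypothetical.

```python
import math

def knn_graph(points, k):
    """Graph generation step (assumed k-nearest-neighbour graph, Euclidean distance)."""
    graph = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        graph.append(others[:k])
    return graph

def neighbour_mean_reconstruction(points, graph):
    """Stand-in for the trained network: rebuild each point from its neighbours' mean."""
    dim = len(points[0])
    return [tuple(sum(points[j][d] for j in nbrs) / len(nbrs) for d in range(dim))
            for nbrs in graph]

def top_n_outliers(points, recon, n):
    """Rank points by reconstruction error and return the indices of the n largest."""
    errors = [math.dist(p, r) for p, r in zip(points, recon)]
    ranked = sorted(range(len(points)), key=lambda i: errors[i], reverse=True)
    return ranked[:n]

data = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
graph = knn_graph(data, k=2)
recon = neighbour_mean_reconstruction(data, graph)
outliers = top_n_outliers(data, recon, n=1)
```

Only the final ranking step mirrors the abstract directly; the graph construction and the reconstruction are simplified placeholders chosen to keep the example self-contained.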
The capability of deep learning (DL) techniques for dealing with non-linear, dynamic and correlated data has paved the way for DL-based fault detection and diagnosis (FDD). Among them, autoencoders (AEs) have shown their potential to serve as the fault detection network. However, misclassifying faulty samples that share similar patterns with normal samples is a common drawback of AEs. In this work, a source-aware autoencoder (SAAE) is proposed as an extension of AEs that incorporates faulty samples in the training stage. SAAE offers flexibility in tuning the recall-precision trade-off, the ability to detect unseen faults, and applicability to imbalanced data sets. An SAAE built on bidirectional long short-term memory (BiLSTM) with skip connections is designed as the structure of the fault detection network. Further, a deep network with BiLSTM and a residual neural network (ResNet) is proposed for the subsequent fault diagnosis step to avoid randomness imposed by the order of the input features. A framework for combining the fault detection and fault diagnosis networks is also presented, without assuming a perfect fault detection network. A comprehensive comparison between SAAE-ResNet and relevant existing techniques in the literature is also conducted on the Tennessee-Eastman process, which shows the superiority of the proposed FDD method. (C) 2021 Elsevier B.V. All rights reserved.
Recently, with advances in information technology, purely data-driven approaches such as machine learning have been widely applied in status diagnosis. However, the accuracy of their predictions relies strongly on the original data, which largely depends on the selected sensors and signal features. Furthermore, although unsupervised machine learning schemes avoid the need for labeling during training, they lack a quantified evaluation of the prediction results. These concerns significantly limit the effectiveness of modern machine learning and thus should be investigated. Meanwhile, ball bearings are fundamental machine elements in rotating machinery, and their condition monitoring is critical for both quality control and longevity assessment. In this paper, taking ball bearing failure diagnosis as the main theme, a flow for feature selection and evaluation, as well as an evaluation flow for multiple failure diagnosis, is developed for assessing the status of bearings in terms of their imbalance, lubrication, and grease contamination levels based on unsupervised machine learning. The experimental results indicate that, with proper feature selection, failure identification can be more definite. Finally, a novel model based on the second norm is proposed to quantify the classification level of each cluster in hyperspace. It serves as a basis for the performance evaluation and optimization of unsupervised machine learning schemes and should benefit related machine reliability evaluation studies and applications.
Depth map estimation from a single RGB image is a fundamental computer vision and image processing task for various applications. Deep learning based depth map estimation has improved prediction accuracy compared with traditional approaches by learning from huge numbers of RGB-D images, but distorted and blurry reconstruction at object boundaries remains a challenge because the features are not enforced during training. This paper presents a multi-view attention autoencoder embedded in a deep neural network to emphasize self-representative features, which provide robust depth maps by simultaneously accentuating useful features and reducing redundant features to improve depth map estimation performance. Qualitative and quantitative experiments were conducted to verify the effectiveness of the proposed network, which can be used for three-dimensional scene reconstruction and understanding.
Despite its great success, deep learning severely suffers from a lack of robustness; i.e., deep neural networks are very vulnerable to adversarial attacks, even the simplest *** by recent advances in brain science, we propose the denoised internal models (DIM), a novel generative autoencoder-based model to tackle this *** the pipeline in the human brain for visual signal processing, DIM adopts a two-stage *** the first stage, DIM uses a denoiser to reduce the noise and the dimensions of inputs, reflecting the information pre-processing in the *** by the sparse coding of memory-related traces in the primary visual cortex, the second stage produces a set of internal models, one for each *** evaluate DIM over 42 adversarial attacks, showing that DIM effectively defends against all the attacks and outperforms the SOTA on the overall robustness on the MNIST (Modified National Institute of Standards and Technology) dataset.
Network embedding plays a critical role in many applications, such as node classification, link prediction, and network visualization. Attributed network embedding aims to learn a low-dimensional representation of network nodes by integrating network architecture and attribute information. The network architectures of many real-world applications are complex, and the relations between network architectures and their attributed nodes are opaque. Thus, shallow models fail to capture deep nonlinear information when an attributed network is embedded, leading to unreliable embeddings. In the present paper, a Deep Attributed Network Embedding via Weisfeiler-Lehman and autoencoder (DANE-WLA) is proposed in order to capture high nonlinearity and preserve the many proximities in the network attribute information of nodes and structures. A Weisfeiler-Lehman proximity schema is used to capture the node dependency between both node edges and node attributes based on information sequences. Then, a deep autoencoder is applied to investigate complex nonlinear information. Extensive experiments were conducted on benchmark datasets to verify that DANE-WLA is computationally efficient for various tasks requiring network embedding. The experimental results show that our model outperforms state-of-the-art network embedding models.
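For readers unfamiliar with the Weisfeiler-Lehman idea, the sketch below shows the classical 1-dimensional WL label refinement, in which each node's label is repeatedly combined with the sorted labels of its neighbours. It is the textbook procedure, not the DANE-WLA proximity schema, whose details the abstract does not give. The function name `wl_refine` and the use of node attributes as initial labels are assumptions.

```python
def wl_refine(adjacency, labels, iterations):
    """Classical 1-WL refinement.

    adjacency: dict mapping node -> list of neighbour nodes
    labels:    dict mapping node -> initial label (e.g. derived from node attributes);
               all labels must be mutually comparable so they can be sorted
    """
    current = dict(labels)
    for _ in range(iterations):
        # A node's signature is its own label plus the multiset of neighbour labels.
        signatures = {
            node: (current[node], tuple(sorted(current[n] for n in adjacency[node])))
            for node in adjacency
        }
        # Compress each distinct signature into a small integer label.
        palette = {sig: idx for idx, sig in enumerate(sorted(set(signatures.values())))}
        current = {node: palette[signatures[node]] for node in adjacency}
    return current

# Path graph 0 - 1 - 2 - 3 with identical starting labels.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
refined = wl_refine(path, {n: "x" for n in path}, iterations=2)
```

After refinement, the two end nodes of the path share one label and the two inner nodes share another, which shows how structurally equivalent nodes end up with matching labels.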
Semiconductor manufacturers use the wafer bin map recognition (WBMR) system to identify failure modes in processing. This study proposes a WBMR system embedded with three modules: data preprocessing, region classification, and systematic pattern recognition. After using a revised Jaccard index to separate random patterns from systematic patterns, we compare three data augmentation techniques, particularly an autoencoder-based one, to find the best augmentation method for addressing data imbalance between the defect classes. We propose an adaptive algorithm to determine the amount of generated data. We describe the two tools, t-distributed stochastic neighbor embedding (t-SNE) and earth mover's distance (EMD), that we use to quantify and visualize the information content of the augmented dataset. Finally, we use an inception architecture of a convolutional neural network (CNN) to improve the WBMR system's recognition accuracy. An empirical study of a semiconductor assembly manufacturer and a public dataset validate that our proposed WBMR system effectively recognizes different types of defective patterns.
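As background for the Jaccard-based step, here is a minimal sketch of the standard Jaccard index between two binary wafer maps. The abstract does not define its revised index or how the index separates random from systematic patterns, so this shows only the unrevised textbook measure. The function name `jaccard_index` and the encoding (1 for a defective die, 0 otherwise) are assumptions.

```python
def jaccard_index(map_a, map_b):
    """Standard Jaccard index |A intersect B| / |A union B| over defective dies.

    map_a, map_b: equal-shape 2D lists of 0/1, where 1 marks a defective die
    (assumed encoding). Two maps with no defects at all are treated as identical.
    """
    intersection = 0
    union = 0
    for row_a, row_b in zip(map_a, map_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    return intersection / union if union else 1.0

wafer_1 = [[1, 1, 0],
           [0, 0, 0]]
wafer_2 = [[1, 0, 0],
           [0, 1, 0]]
similarity = jaccard_index(wafer_1, wafer_2)  # 1 shared defect out of 3 defective positions
```

A value near 1 means the two maps fail at largely the same die positions, while a value near 0 means their defects barely overlap.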
Breast cancer (BC) is the most widely recognized cancer in women *** 2018, 627,000 women had died of breast cancer (World Health Organization Report 2018). To diagnose BC, the evaluation of tumours is achieved by analysis of histological *** present, the Nottingham Bloom Richardson framework is the least expensive approach used to grade BC *** contemplate three elements, *** count, *** formation, and *** atypia, which is a laborious process that witnesses variations in expert's ***, some algorithms have been proposed for the detection of mitotic cells, but nuclear atypia in breast cancer histopathology has not received much *** atypia analysis is performed not only to grade BC but also to provide critical information in the discrimination of normal breast, non-invasive breast (usual ductal hyperplasia, atypical ductal hyperplasia) and pre-invasive breast (ductal carcinoma in situ) and invasive breast *** proposed a deep-stacked multi-layer autoencoder ensemble with a softmax layer for the feature extraction and classification *** classification results show the value of the multilayer autoencoder model in the evaluation of nuclear *** proposed method has indicated promising results, making them more fit in breast cancer grading.