Instance selection plays a crucial role in improving the efficiency of machine learning models, especially when dealing with large datasets. Traditional instance selection methods often struggle to balance data reduction with preserving essential information, particularly in high-dimensional and complex datasets. This paper introduces a novel approach, instance selection by combining clustering and autoencoders (CAIR), designed specifically for large-scale data. CAIR addresses key gaps in the literature by integrating clustering techniques to group similar data points and using autoencoders to reduce dimensionality while retaining critical boundary instances. Unlike conventional methods that focus primarily on either boundary or inner instances, CAIR effectively balances the removal of redundant data with the preservation of instances crucial for classification. Experimental results on 24 large datasets from the KEEL repository demonstrate that CAIR achieves superior data reduction while maintaining high classification accuracy compared to state-of-the-art methods, including k-nearest neighbor (KNN), edited nearest neighbors (ENN), DROP3, ATISA1, and RIS. CAIR fills a significant gap by providing an effective solution for large-scale data reduction without compromising performance.
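A minimal sketch of the selection idea this abstract describes, not the authors' exact CAIR algorithm: embed instances in a low-dimensional space (an autoencoder bottleneck in the paper; PCA stands in here for brevity), cluster the embeddings, then keep instances near cluster boundaries while thinning redundant interior ones. The function name and all parameters below are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def cair_style_select(X, n_clusters=10, latent_dim=8, margin=0.2, keep_interior=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # Low-dimensional embedding (the paper uses an autoencoder bottleneck here).
    Z = PCA(n_components=latent_dim).fit_transform(X)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(Z)
    # Distance from every instance to every cluster center.
    d = np.linalg.norm(Z[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
    d.sort(axis=1)                              # nearest and 2nd-nearest centers
    ambiguity = d[:, 0] / (d[:, 1] + 1e-12)     # close to 1 => near a cluster boundary
    boundary = ambiguity > (1.0 - margin)
    keep = boundary.copy()
    # Retain all boundary instances plus a small random sample of interior ones.
    idx = np.flatnonzero(~boundary)
    if idx.size:
        keep[rng.choice(idx, size=max(1, int(keep_interior * idx.size)), replace=False)] = True
    return np.flatnonzero(keep)

X = np.random.default_rng(1).normal(size=(2000, 30))
selected = cair_style_select(X)
print(f"kept {selected.size} of {X.shape[0]} instances")
```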
The Internet of Things (IoT) is an emerging paradigm that integrates devices and services to collect real-time data from the surroundings and process the information at very high speed to make a decision. Despite several advantages, the resource-constrained and heterogeneous nature of IoT networks makes them a favorite target for cybercriminals. A single successful attempt of network intrusion can compromise the complete IoT network, which can lead to unauthorized access to the valuable information of consumers and industry. To overcome the security challenges of IoT networks, this article proposes a lightweight deep autoencoder (DAE) based cyberattack detection framework. The proposed approach learns the normal and anomalous data patterns to identify the various types of network intrusions. The most significant feature of the proposed technique is its lower complexity, attained by reducing the number of trainable parameters. To optimally train the proposed DAE, a range of hyperparameters was determined through extensive experiments to ensure higher attack detection accuracy. The efficacy of the suggested framework is evaluated on two standard, open-source datasets. The proposed DAE achieved accuracies of 98.86% and 98.26% for NSL-KDD, and 99.32% and 98.79% for the UNSW-NB15 dataset, in the binary-class and multi-class settings respectively. The performance of the suggested attack detection framework is also compared with several state-of-the-art intrusion detection schemes, and the outcomes prove the promising performance of the proposed scheme for cyberattack detection in IoT networks.
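A minimal sketch of the reconstruction-error detection pattern the abstract describes, assuming training on normal traffic only; the layer sizes, threshold rule, and feature dimension (41, roughly matching NSL-KDD after encoding) are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LightDAE(nn.Module):
    """Small, low-parameter autoencoder for tabular network-flow features."""
    def __init__(self, n_features=41):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU(),
                                     nn.Linear(16, 8), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(8, 16), nn.ReLU(),
                                     nn.Linear(16, n_features))
    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_and_threshold(normal_x, epochs=20, lr=1e-3, quantile=0.99):
    model, loss_fn = LightDAE(normal_x.shape[1]), nn.MSELoss()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):                 # fit the normal data pattern
        opt.zero_grad()
        loss = loss_fn(model(normal_x), normal_x)
        loss.backward()
        opt.step()
    with torch.no_grad():
        err = ((model(normal_x) - normal_x) ** 2).mean(dim=1)
    # Flows whose reconstruction error exceeds this quantile are flagged as attacks.
    return model, torch.quantile(err, quantile)

normal_x = torch.randn(512, 41)             # stand-in for preprocessed normal flows
model, thr = train_and_threshold(normal_x)
print(f"attack threshold on reconstruction error: {thr:.4f}")
```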
ISBN (print): 9789819755714; 9789819755721
Graph anomaly detection, aimed at identifying anomalous patterns that differ significantly from other nodes, has drawn widespread attention in recent years. Due to the complex topological structures and attribute information inherent in graphs, conventional methods often struggle to identify anomalies effectively. Deep anomaly detection methods based on Graph Neural Networks (GNNs) have achieved significant success; however, they suffer not only from limited neighborhood information but also from over-smoothing. Over-smoothing is the phenomenon in which node representations gradually become similar and flattened across multiple convolutional layers, limiting the comprehensive learning of neighborhood information. We therefore propose a novel anomaly detection framework, TransGAD, to address these challenges. Inspired by the Graph Transformer, we introduce a Transformer-based autoencoder that treats each node as a sequence and its neighborhood as the tokens of that sequence, capturing both local and global information. We incorporate cosine positional encoding and a masking strategy to obtain more informative node representations, and leverage the reconstruction error for improved anomaly detection. Experimental results on seven datasets demonstrate that our approach outperforms state-of-the-art methods.
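A rough sketch of the "node as a sequence, neighbors as tokens" idea behind TransGAD: each node's sampled neighborhood is fed through a Transformer autoencoder and the attribute reconstruction error serves as the anomaly score. The cosine positional encoding and masking strategy are omitted, and all sizes are simplified assumptions.

```python
import torch
import torch.nn as nn

class NeighborSeqAE(nn.Module):
    def __init__(self, feat_dim, d_model=64, nhead=4, nlayers=2):
        super().__init__()
        self.embed = nn.Linear(feat_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, nlayers)
        self.decode = nn.Linear(d_model, feat_dim)

    def forward(self, seq):              # seq: (batch, 1 + n_neighbors, feat_dim)
        h = self.encoder(self.embed(seq))
        return self.decode(h[:, 0])      # reconstruct the center node's attributes

feat_dim, n_neighbors = 32, 8
model = NeighborSeqAE(feat_dim)
seq = torch.randn(16, 1 + n_neighbors, feat_dim)   # token 0 = center node
recon = model(seq)
score = ((recon - seq[:, 0]) ** 2).mean(dim=1)     # higher error => more anomalous
print(score.shape)                                 # torch.Size([16])
```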
Images captured under wrong exposure conditions inevitably produce unsatisfactory visual effects. Multiple exposure correction, which must restore images degraded by various wrong exposure conditions, has therefore drawn much attention. However, the different nature of underexposed and overexposed images makes this task challenging. In this work, we introduce a novel multiple exposure correction transformer, named MECFormer, to tackle this problem. MECFormer consists of an autoencoder, an encoder, and a dual-path aggregation decoder. First, the autoencoder extracts multi-scale exposure features representing the exposure level of the input. Second, the encoder embeds input images into multi-scale image features. Third, the dual-path aggregation decoder sequentially restores exposure by effectively aggregating the multi-scale exposure features and image features. MECFormer achieves state-of-the-art performance on two multi-exposure correction datasets. We also provide extensive ablation studies to show the effectiveness of the proposed components.
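A schematic toy version of the three-part pipeline described above; the actual model is transformer-based with multi-scale features, while this sketch uses plain single-scale convolutions just to show how exposure features and image features are extracted separately and then aggregated in the decoder. Everything here is an illustrative assumption.

```python
import torch
import torch.nn as nn

class ToyExposureCorrector(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        # Path 1: features describing the exposure level of the input.
        self.exposure_enc = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
                                          nn.Conv2d(ch, ch, 3, padding=1))
        # Path 2: features describing the image content itself.
        self.image_enc = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
                                       nn.Conv2d(ch, ch, 3, padding=1))
        # Decoder aggregates both paths (simple concatenation here; the paper
        # uses a dual-path aggregation decoder).
        self.decoder = nn.Sequential(nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(ch, 3, 3, padding=1))

    def forward(self, x):
        fused = torch.cat([self.exposure_enc(x), self.image_enc(x)], dim=1)
        return self.decoder(fused)

x = torch.randn(2, 3, 64, 64)            # wrongly exposed input images
print(ToyExposureCorrector()(x).shape)   # corrected output, same shape
```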
Zero-shot learning aims to learn a visual classifier for a category that has no training samples by leveraging its semantic information and its relationship to other categories. It is common, yet vital, in practical visual scenarios, and particularly prominent in the uncharted ocean field. Phytoplankton plays an important part in the marine ecological environment, and the zero-shot recognition problem is commonly encountered during in situ observation. Therefore, we propose a dual autoencoder model, which contains two similar encoder-decoder structures, to tackle the zero-shot recognition problem. The first projects from the visual feature space to a latent space and then to the semantic space; inversely, the second projects from the semantic space to another latent space and then back to the visual feature space. This structure makes the projection from the visual feature space to the semantic space more effective through a stable mutual mapping. Experimental results on four benchmarks demonstrate that the proposed dual autoencoder model achieves competitive performance compared with six recent state-of-the-art methods. Furthermore, we apply our algorithm to phytoplankton classification. We manually annotated phytoplankton attributes to develop a practical dataset for this real and special domain application, i.e., the Zero-shot learning dataset for PHYtoplankton (ZeroPHY). Experimental results show that our method achieves the best performance on this real-world application.
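A minimal sketch of the dual mapping described above: one branch maps visual features to the semantic space through a latent layer, the other maps semantic vectors back to visual features, and both are trained jointly. The dimensions, loss, and training details are assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

def mlp(dims):
    layers = []
    for a, b in zip(dims[:-1], dims[1:]):
        layers += [nn.Linear(a, b), nn.ReLU()]
    return nn.Sequential(*layers[:-1])   # drop the activation on the output layer

vis_dim, lat_dim, sem_dim = 2048, 256, 85
v2s = mlp([vis_dim, lat_dim, sem_dim])   # visual -> latent -> semantic
s2v = mlp([sem_dim, lat_dim, vis_dim])   # semantic -> latent -> visual

v = torch.randn(32, vis_dim)             # image features (e.g., CNN embeddings)
s = torch.randn(32, sem_dim)             # class attribute vectors
# Joint loss ties the two projections together for a stable mutual mapping.
loss = nn.functional.mse_loss(v2s(v), s) + nn.functional.mse_loss(s2v(s), v)
loss.backward()
# At test time, an unseen image is assigned the class whose attribute vector
# is nearest to v2s(v).
```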
As regional economies develop rapidly, the ecological environment has exposed a series of problems, and the characteristics of ecological resistance surfaces have become markedly differentiated. Against the backdrop of continuing advances in computer science, this research introduced a deep learning algorithm, the autoencoder, to analyze the ecological resistance surfaces over a 20-year period in 23 counties and districts of the Three Gorges Reservoir Area. The main results are as follows: (1) the overall resistance surfaces were at a low level, with high values concentrated in the core urban areas and along the Yangtze River. (2) The spatial distributions of the autoencoder results were smoother and more focused than those of the analytic hierarchy process. (3) The XGBoost detection results indicated land cover type as the factor contributing most to the resistance. (4) Habitat quality analysis revealed a high degree of spatial heterogeneity, with the northeastern regions exhibiting more favorable conditions and the densely populated urban centers in the southwest displaying degraded quality. (5) The regionalization analysis outlined a key conservation area, a key restoration area, and a moderate development area, proposing a strategic framework of "One Axis, Two Zones and Four Protection Screens" to guide long-term ecological development in the study region.
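One plausible way to realize the autoencoder step described above, not necessarily the study's exact procedure: train on per-pixel vectors of normalized resistance factors and read the one-dimensional bottleneck as the integrated resistance value. The factor count and network sizes are invented for illustration.

```python
import torch
import torch.nn as nn

factors = torch.rand(10000, 7)   # per-pixel normalized factors (land cover, slope, ...)
# Sigmoid on the 1-D bottleneck keeps the integrated resistance in [0, 1].
enc = nn.Sequential(nn.Linear(7, 4), nn.ReLU(), nn.Linear(4, 1), nn.Sigmoid())
dec = nn.Sequential(nn.Linear(1, 4), nn.ReLU(), nn.Linear(4, 7))
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

for _ in range(200):                        # reconstruct factors through the bottleneck
    opt.zero_grad()
    z = enc(factors)
    loss = nn.functional.mse_loss(dec(z), factors)
    loss.backward()
    opt.step()

resistance = enc(factors).detach().squeeze()   # integrated resistance per pixel
print(resistance.min().item(), resistance.max().item())
```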
In the satellite operation domain, the accurate prediction of the Remaining Useful Life (RUL) of satellite subsystems and components is fundamental for an effective management of the mission. The accuracy of the RUL ...
In this paper, we analyze the capabilities of several multiscale convolutional autoencoder architectures for reduced-order modeling of two-dimensional unsteady turbulent flow over a cylinder and the collapse of a water column. The results demonstrate the significance of the multiscale convolution design for precision. Multiscale convolution, however, creates millions of trainable parameters that require a large amount of memory, which increases the computational cost of the system. As a solution to this problem, we propose using modified convolutions to reduce the number of trainable parameters in the model. In terms of both accuracy and computational efficiency, separable convolution yields the most efficient results. Moreover, the encoder component of the architecture encodes the high-dimensional data into a low-dimensional latent space. The latent variables are passed to a recurrent network and to the decoder for, respectively, the temporal evolution of the latent space and the reconstruction of the flow field. In addition, the results demonstrate that the GRU has fewer parameters than the LSTM while maintaining the same accuracy.
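A quick check of why separable convolution shrinks the parameter count, which is the core of the modification proposed above: a standard k x k convolution with C_in -> C_out channels costs k*k*C_in*C_out weights, while a depthwise-separable one costs k*k*C_in + C_in*C_out. The channel sizes below are arbitrary examples.

```python
import torch.nn as nn

c_in, c_out, k = 64, 128, 3
standard = nn.Conv2d(c_in, c_out, k, padding=1)
separable = nn.Sequential(
    nn.Conv2d(c_in, c_in, k, padding=1, groups=c_in),  # depthwise: one filter per channel
    nn.Conv2d(c_in, c_out, 1),                         # pointwise: 1x1 channel mixing
)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), count(separable))   # 73856 vs 8960 (including biases)
```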
ISBN (print): 9783031234798; 9783031234804
In this paper, a neural network model is presented to identify fraudulent visits to a website, which differ significantly from the visits of human users. Such unusual visits are most often made by automated software, i.e., bots. Bots are used to perform advertising fraud or scraping, i.e., automated scanning of website content, frequently not in line with the intentions of website authors. The model proposed in this paper works on data extracted directly from the web browser, using JavaScript, when a user or a bot visits a website. When bots appear on the website, the collected parameter values differ significantly from the values collected during ordinary visits by human users. However, knowing these parameter values alone is not enough to identify bots, as bots are constantly modified and new values that have not yet been accounted for keep appearing; it is therefore impossible to know all the data bots may generate. This paper thus proposes a neural network with an autoencoder structure that detects deviations of parameter values from the data learned from ordinary users. This enables the detection of anomalies, i.e., data generated by bots. The effectiveness of the presented model is demonstrated on authentic data extracted from several online stores.
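A toy version of the detection rule described above: an autoencoder is trained only on feature vectors from ordinary human visits, and a visit whose reconstruction error exceeds a threshold calibrated on held-out human traffic is flagged as a bot. Feature encoding, sizes, and the 99th-percentile threshold are invented for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
human = rng.normal(size=(5000, 24))          # encoded browser parameters of human visits
# Fitting input = output turns the MLP into an autoencoder.
ae = MLPRegressor(hidden_layer_sizes=(12, 4, 12), max_iter=500, random_state=0)
ae.fit(human[:4000], human[:4000])

# Calibrate the threshold on held-out human traffic: tolerate ~1% false alarms.
err = ((ae.predict(human[4000:]) - human[4000:]) ** 2).mean(axis=1)
threshold = np.quantile(err, 0.99)

def is_bot(visit_features):
    e = ((ae.predict(visit_features[None]) - visit_features) ** 2).mean()
    return e > threshold

print(is_bot(rng.normal(size=24)),            # human-like visit
      is_bot(rng.normal(loc=5.0, size=24)))   # visit with unseen parameter values
```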
Image anomaly detection (AD) is widely researched in computer vision. For high-dimensional data such as images, with noise and complex backgrounds, detecting anomalies remains challenging when only imbalanced or incomplete data are available. Some deep learning methods can be trained in an unsupervised way, mapping the original input into low-dimensional manifolds so that, after dimension reduction, anomalies show larger differences from normal samples. However, a single low-dimensional latent space is of limited use for representing the low-dimensional features, because noise and irrelevant features are mapped into the same space, so the manifolds are not discriminative for detecting anomalies. To address this problem, this study proposes a new autoencoder framework, named LSP-CAE, with two trainable, mutually orthogonal complementary subspaces in the latent space obtained through a latent subspace projection (LSP) mechanism. Specifically, latent subspace projection is used to train the latent image subspace (LIS) and the latent kernel subspace (LKS) in the latent space of the autoencoder-like model, which enhances the ability to learn different features from the input instance. The features of normal data are projected into the latent image subspace, while the latent kernel subspace is trained end-to-end to extract the information irrelevant to the normal features. To verify the generality and effectiveness of the proposed method, we replace the convolutional network with a fully-connected network on real-world medical datasets. An anomaly score based on the projection norms in the two subspaces is used to evaluate anomalies at test time. Consequently, the proposed method achieves the best performance on four public datasets in comparison with state-of-the-art methods.
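A sketch of the latent subspace projection idea in isolation: split the latent code across two mutually orthogonal complementary subspaces and score anomalies by how much energy falls outside the "image" subspace. Here the basis is a fixed orthonormal one from a QR decomposition; in LSP-CAE both subspaces are trainable, and the dimensions below are arbitrary assumptions.

```python
import torch

latent_dim, image_dim = 32, 24
Q, _ = torch.linalg.qr(torch.randn(latent_dim, latent_dim))
B_img, B_ker = Q[:, :image_dim], Q[:, image_dim:]   # orthogonal complements

def anomaly_score(z):
    z_img = z @ B_img      # coordinates in the image subspace (normal features)
    z_ker = z @ B_ker      # coordinates in the kernel subspace (irrelevant info)
    # Normal samples should concentrate in the image subspace, so a large
    # kernel-to-image norm ratio indicates an anomaly.
    return z_ker.norm(dim=1) / (z_img.norm(dim=1) + 1e-12)

z = torch.randn(8, latent_dim)   # latent codes from an encoder
print(anomaly_score(z))
```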