Cross-modal retrieval has gained considerable attention in the era of the multimedia data explosion. Owing to their low storage cost and fast retrieval speed, hash learning-based methods have become increasingly popular in this field. The crucial bottlenecks of cross-modal retrieval are twofold: the heterogeneous gap between different modalities and the semantic gap among similar data from various modalities. To address these issues, we adopt a self-supervised approach to bridge the heterogeneous gap by generating cohesive features for different instances. To mitigate the semantic gap, we use triplet sampling to optimize the inter-modal and intra-modal semantic losses, which increases the discriminability of our approach. Experiments on two benchmark datasets show the efficiency and robustness of our method, and extended experiments show its scalability.
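The triplet-based semantic loss mentioned in this abstract can be illustrated with a generic triplet margin loss. This is only a minimal sketch: the abstract does not give the exact loss, so the squared Euclidean distance, the margin value, and the toy code vectors below are all assumptions made for illustration.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Mean hinge loss max(0, d(a, p) - d(a, n) + margin).

    d is the squared Euclidean distance between rows; the margin and the
    distance are illustrative choices, not taken from the paper.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))

# Hypothetical relaxed hash codes in [-1, 1] for an image and two texts.
img = np.array([[1.0, -1.0, 1.0, 1.0]])
txt_same = np.array([[1.0, -1.0, 1.0, -1.0]])   # same semantic label as img
txt_diff = np.array([[-1.0, 1.0, -1.0, -1.0]])  # different semantic label

# Inter-modal triplet: image anchor, text positive, text negative.
inter_modal = triplet_margin_loss(img, txt_same, txt_diff)
# Swapping positive and negative violates the margin and is penalized.
violated = triplet_margin_loss(img, txt_diff, txt_same)
```

An intra-modal term would use the same function with anchor, positive, and negative all drawn from one modality.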
Speech emotion recognition (SER) is an important part of the human-computer interaction process and has been receiving more attention in recent years. However, although a wide diversity of methods has been proposed for SER, their performance remains limited. A key issue behind the low performance of SER systems is how to effectively extract emotion-oriented features. In this paper, we propose a novel algorithm, an autoencoder with emotion embedding, to extract deep emotion features. Unlike many previous works, we introduce instance normalization, a common technique in the style transfer field, into our model rather than batch normalization. Furthermore, the emotion embedding path in our method leads the autoencoder to efficiently learn a priori knowledge from the label, which enables the model to distinguish which features are most related to human emotion. We concatenate the latent representation learned by the autoencoder with acoustic features obtained by the openSMILE toolkit. Finally, the concatenated feature vector is used for emotion classification. To improve the generalization of our method, a simple data augmentation approach is applied. Two publicly available and highly popular databases, IEMOCAP and EMODB, are chosen to evaluate our method. Experimental results demonstrate that the proposed model achieves significant performance improvement compared to other speech emotion recognition systems.
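The difference between instance normalization and batch normalization, and the final feature concatenation, can be sketched as follows. This is an illustrative NumPy sketch only: the tensor layout (batch, channels, frames), the feature dimensions, and the omission of learnable scale and shift parameters are assumptions, not details from the abstract.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each (sample, channel) pair over its own time frames."""
    mean = x.mean(axis=2, keepdims=True)
    var = x.var(axis=2, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm(x, eps=1e-5):
    """Normalize each channel using statistics pooled over batch and frames."""
    mean = x.mean(axis=(0, 2), keepdims=True)
    var = x.var(axis=(0, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
# Hypothetical feature maps: 4 utterances, 8 channels, 50 frames.
features = rng.normal(loc=2.0, scale=3.0, size=(4, 8, 50))
normalized = instance_norm(features)

# Hypothetical sizes: a 16-dim latent code and 32 hand-crafted acoustic features.
latent = rng.normal(size=(4, 16))
acoustic = rng.normal(size=(4, 32))
fused = np.concatenate([latent, acoustic], axis=1)  # classifier input
```

Because instance normalization uses per-utterance statistics, its output for one utterance does not depend on the other utterances in the batch.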
Image-based sensing of jellyfish is important because jellyfish can cause great damage to fisheries and seaside facilities and need to be properly controlled. In this paper, we present a deep-learning-based technique that easily generates synthetic images of jellyfish with autoencoder-combined generative adversarial networks. The proposed system can easily generate simple images with a smaller number of data sets compared with other generative networks. The generated output showed high similarity to the real-image data set. An application that uses a fully convolutional network and a regression network to estimate the size of a jellyfish swarm was also demonstrated and showed high accuracy in the estimation test.
In this study, we propose a novel autoencoder framework based on an orthogonal projection constraint (OPC) for anomaly detection (AD) on both complex image and vector datasets. Orthogonal projection is useful for capturing the null subspace that consists of noisy information for AD, which is explicitly ignored in existing approaches. The exploration of two subspaces, called the normal space (NS) and the abnormal space (AS), can improve the discriminative manifold information. Therefore, in this study, an autoencoder framework based on the OPC learning method is proposed that combines the orthogonal subspace score and the reconstruction error score in the target tasks for AD. To the best of our knowledge, this is the first study that introduces an autoencoder-based model with two orthogonal subspaces for AD. Through the orthogonality, the anomaly-free data and the abnormal or noisy information are projected into the NS and the AS, respectively. Thus, it potentially addresses the problem of the distribution of the generative model by combining the abilities of the two subspaces, which can appropriately learn the features and establish strict boundaries around the normal data. For image datasets, we propose a convolutional autoencoder based on OPC. Additionally, the generalization and adaptability of the proposed method in AD were investigated using vector datasets by implementing a fully-connected layer-based OPC in the encoder-decoder structure. The effectiveness of the proposed framework for AD was evaluated through comparison with state-of-the-art approaches. (c) 2021 Elsevier B.V. All rights reserved.
The extreme learning machine (ELM), which was originally proposed for "generalized" single-hidden layer feed-forward neural networks, provides efficient unified learning solutions for the applications of regression and classification. Although it provides promising performance and robustness and has been used for various applications, the single-layer architecture may lack effectiveness when applied to natural signals. In order to overcome this shortcoming, this work introduces a new architecture based on a multilayer network framework. The significant contributions of this paper are as follows: 1) unlike existing multilayer ELM, in which hidden nodes are obtained randomly, in this paper all hidden layers with invertible functions are calculated by pulling the network output back and putting it into hidden layers. Thus, the feature learning is enriched by additional information, which results in better performance; 2) in contrast to the existing multilayer network methods, which are usually efficient for classification applications, the proposed architecture is implemented for dimension reduction and image reconstruction; and 3) unlike other iterative learning-based deep networks (DL), the hidden layers of the proposed method are obtained via four steps. Therefore, it has much better learning efficiency than DL. Experimental results on 33 datasets indicate that, in comparison to the other existing dimension reduction techniques, the proposed method performs competitively better with fast training speeds.
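For context, the basic single-hidden-layer ELM that this abstract builds on can be sketched in a few lines: hidden weights are drawn at random and left untrained, and only the output weights are solved in closed form by least squares. This is a sketch of the standard ELM under assumed settings (sigmoid activation, 50 hidden nodes, a toy sine regression task), not the multilayer method the paper proposes.

```python
import numpy as np

def _hidden(X, W, b):
    """Sigmoid hidden-layer activations."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def elm_fit(X, T, n_hidden, rng):
    """Basic ELM: random hidden layer, least-squares output weights."""
    W = rng.standard_normal((X.shape[1], n_hidden))  # random, never trained
    b = rng.standard_normal(n_hidden)
    H = _hidden(X, W, b)
    beta = np.linalg.pinv(H) @ T  # minimum-norm least-squares solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    return _hidden(X, W, b) @ beta

# Toy regression task (hypothetical data): fit sin(x) on [-3, 3].
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 200).reshape(-1, 1)
T = np.sin(X)
W, b, beta = elm_fit(X, T, n_hidden=50, rng=rng)
mse = float(np.mean((elm_predict(X, W, b, beta) - T) ** 2))
```

Training involves no iterative gradient updates, which is the source of the fast training speed that ELM variants typically report.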
Breaks or cracks in eggshells pose substantial food safety issues. Bacteria and viruses, in particular, are more likely to enter the egg through breaks and cracks, increasing the risk of food poisoning. Furthermore, deformations in the shell may compromise the integrity of the protective shell, exposing the egg to more external variables and causing it to lose freshness and decay faster. To reduce such hazards, this research created an innovative crack detection system based on an autoencoder (AE) that uses acoustic signals from eggshells. A system that creates an acoustic effect by hitting the eggshell without damaging it was designed, and these effects were recorded through a microphone. Acoustic signal data of size 1 x 1000 was fed into k-nearest neighbor (kNN), decision tree (DT), and support vector machine (SVM) classifiers. The AE was employed to reduce data size in order to accommodate the raw data's unique features. This AE model, which reduces data size, was used with several classifiers and was able to accurately distinguish between intact and cracked eggs. The resulting AE-based classifier model completed the classification procedure with 100% accuracy, including microcracks that are invisible to the naked eye.
Collaborative filtering (CF) is a widely used technique in recommender systems that automatically predicts a user's latent interests based on many users' historical rating data. To improve the performance of CF-based recommender systems, users' rating data should be pre-processed to avoid noise and enhance data reliability. Many researchers have studied anomaly detection to remove malicious noise caused by shilling attacks, but anomalies can still exist in non-attacked real user data. These anomalies are called natural noise, as the ratings of users can be affected by unpredictable factors such as other users' ratings and anchoring bias. In this paper, we propose an autoencoder-based recommendation system that exploits the abilities of both anomaly detection and CF. The proposed system detects the natural noise in the rating data based on the reconstruction errors after training. By removing the detected natural noise, CF can predict the unrated ratings with noise-free data. Our experiments show that the proposed model outperforms the traditional methods, reducing the error by up to 5% compared to the method that does not consider natural noise detection and by up to 4% compared to conventional rating-classification-based natural noise detection methods.
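The idea of flagging natural noise by reconstruction error can be sketched as below. This is an illustrative sketch, not the paper's model: a rank-limited SVD reconstruction stands in for a trained autoencoder, and the threshold rule (mean plus two standard deviations of the error), the toy rating matrix, and all function names are assumptions.

```python
import numpy as np

def reconstruct_low_rank(R, rank):
    """Stand-in for a trained autoencoder: best rank-`rank` reconstruction."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def flag_natural_noise(R, rank=1, n_std=2.0):
    """Flag ratings whose reconstruction error is unusually large."""
    err = np.abs(R - reconstruct_low_rank(R, rank))
    threshold = err.mean() + n_std * err.std()  # assumed threshold rule
    return err > threshold, err

# Hypothetical data: 6 users who each rate all 5 items consistently...
user_level = np.array([2.0, 3.0, 4.0, 3.0, 2.0, 4.0])
ratings = np.outer(user_level, np.ones(5))
# ...except one out-of-character rating by user 0 on item 2.
ratings[0, 2] = 5.0

flags, errors = flag_natural_noise(ratings)
worst = np.unravel_index(errors.argmax(), errors.shape)
```

Flagged entries would then be removed or masked before the collaborative filtering step predicts the unrated items.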
This paper investigates an autoencoder-based quantize-forward (QF) relay system that includes a source, a destination, and a relay, each equipped with multiple antennas. The existing phase quantization (PQ) algorithm at the relay has limitations in capturing the amplitude differences of received signals, leading to performance saturation with increasing quantization bits. To address these limitations, we propose a novel relay algorithm, amplitude-phase quantization (APQ), which quantizes both the phase and the amplitude. Moreover, we introduce neural networks into the relay process, resulting in PQ with neural networks (PQNN) and APQ with neural networks (APQNN), which are expected to further improve system performance at the expense of additional computational load at the relay. We also propose a sub-message one-hot encoding method and a retraining approach for the worst-performing sub-message to reduce computational complexity and improve performance in autoencoder-based systems. Simulation results demonstrate that the autoencoder-based QF relay system, with various relay algorithms and the sub-message one-hot encoding method, achieves excellent performance with reduced memory usage at the relay and significantly reduced complexity at the source and destination.
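The limitation of phase-only quantization described here can be illustrated with a small sketch. The uniform quantizers, the bit allocation, and the `max_amp` clipping level below are assumptions for illustration; the abstract does not specify how the paper's PQ and APQ quantizers are designed.

```python
import numpy as np

def phase_quantize(y, bits):
    """Keep only the phase, rounded to one of 2**bits uniform levels."""
    levels = 2 ** bits
    step = 2 * np.pi / levels
    idx = np.round(np.angle(y) / step) % levels
    return np.exp(1j * idx * step)  # unit amplitude: magnitude is discarded

def amplitude_phase_quantize(y, phase_bits, amp_bits, max_amp):
    """Quantize the phase as above and the amplitude to 2**amp_bits levels."""
    amp_levels = 2 ** amp_bits
    amp_step = max_amp / amp_levels
    amp_idx = np.clip(np.floor(np.abs(y) / amp_step), 0, amp_levels - 1)
    amp = (amp_idx + 0.5) * amp_step  # midpoint of the amplitude bin
    return amp * phase_quantize(y, phase_bits)

# Two hypothetical received samples: same phase, very different amplitude.
y = np.array([0.2, 1.8]) * np.exp(1j * 0.1)
pq = phase_quantize(y, bits=2)
apq = amplitude_phase_quantize(y, phase_bits=2, amp_bits=1, max_amp=2.0)
```

Under PQ the two samples map to the same output, so adding phase bits cannot recover the amplitude difference; APQ spends some bits on amplitude and keeps the samples distinct.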
We propose a new latent factor conditional asset pricing model. Like Kelly, Pruitt, and Su (KPS, 2019), our model allows for latent factors and factor exposures that depend on covariates such as asset characteristics. But, unlike the linearity assumption of KPS, we model factor exposures as a flexible nonlinear function of covariates. Our model retrofits the workhorse unsupervised dimension reduction device from the machine learning literature - autoencoder neural networks - to incorporate information from covariates along with returns themselves. This delivers estimates of nonlinear conditional exposures and the associated latent factors. Furthermore, our machine learning framework imposes the economic restriction of no-arbitrage. Our autoencoder asset pricing model delivers out-of-sample pricing errors that are far smaller (and generally insignificant) compared to other leading factor models. (c) 2020 Elsevier B.V. All rights reserved.
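The contrast with KPS can be written compactly. The notation below is an assumption chosen for illustration (it does not appear in the abstract): $r_{i,t+1}$ is the excess return of asset $i$, $z_{i,t}$ its vector of characteristics, $f_{t+1}$ the latent factors, and $g$ a neural network.

```latex
% Conditional factor structure with no intercept (no-arbitrage restriction)
r_{i,t+1} = \beta(z_{i,t})^{\top} f_{t+1} + \epsilon_{i,t+1}

% KPS (2019): exposures are linear in the characteristics
\beta(z_{i,t})^{\top} = z_{i,t}^{\top} \Gamma_{\beta}

% Autoencoder model: exposures are a flexible nonlinear function
\beta(z_{i,t}) = g(z_{i,t}; \theta)
```

Setting the intercept to zero means that expected returns are explained entirely by factor exposures, which is how the no-arbitrage restriction enters this kind of model.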
Industrial processes usually exhibit great nonlinearity generated from the effects of complex mechanisms, system integrations and multiple working conditions. Although a variety of dictionary learning algorithms have been proposed in recent years for industrial process fault diagnosis, most of them only model the process data via a linear combination of a few dictionary atoms, which cannot effectively characterize the nonlinear relationships among variables and may lead to limited diagnosis performance. Recent improvements in multilayer neural networks, especially the autoencoders, offer opportunities to tackle the nonlinear problem. However, the overall limited availability of fault samples poses great challenges in achieving satisfactory performance. To address the mentioned issues simultaneously, the present study proposes an autoencoder Embedded Dictionary Learning approach (AEDL) for nonlinear industrial process fault diagnosis. First, an autoencoder is employed to learn a nonlinear mapping that maps the linearly inseparable industrial process data to a high-dimensional space, where a desired dictionary is learned according to the basic dictionary learning algorithm. Next, two supervised graphs, leveraging the priors of industrial process data, are introduced into the learning process to make the proposed approach robust to training samples. After obtaining the dictionary, the coding coefficients of the process data over the dictionary can be used for fault diagnosis via a simple classifier. As revealed from the encouraging experimental results on the Tennessee Eastman process, the developed approach outperforms several dictionary learning approaches and some other nonlinear fault diagnosis methods. (C) 2021 Published by Elsevier Ltd.