ISBN: 9780791885079 (print)
The frequency response function (FRF), defined as the ratio between the Fourier transform of the time-domain output and the Fourier transform of the time-domain input, is a common tool for analyzing the relationships between inputs and outputs of a mechanical system. Learning the FRF of a mechanical system can facilitate system identification and condition-based health monitoring, and can improve performance metrics, by providing an input-output model that describes the system dynamics. Existing FRF identification assumes a one-to-one mapping between each input frequency component and output frequency component. However, during dynamic operations, the FRF can present complex dependencies with frequency cross-correlations due to modulation effects, nonlinearities, and mechanical noise. Furthermore, existing FRFs assume linearity between input-output spectra with varying mechanical loads, while in practice FRFs can depend on the operating conditions and show high nonlinearities. Outputs of existing neural networks are typically low-dimensional labels rather than real-time high-dimensional measurements. This paper proposes a vector regression method based on deep neural networks for the learning of runtime FRFs from measurement data under different operating conditions. More specifically, a neural network based on an encoder-decoder with a symmetric compression structure is proposed. The deep encoder-decoder network features simultaneous learning of the regression relationship between input and output embeddings, as well as a discriminative model for output spectrum classification under different operating conditions. The learning model is validated using experimental data from a high-pressure hydraulic test rig. The results show that the proposed model can learn the FRF between sensor measurements under different operating conditions with high accuracy and denoising capability. The learned FRF model provides an estimation for sensor measurements when a physical sensor
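The FRF definition quoted in the abstract above (output spectrum divided by input spectrum) can be illustrated with a minimal NumPy sketch. This is the textbook single-record ratio, not the learned encoder-decoder model the paper proposes. The function name estimate_frf, the sampling rate, and the toy system (a gain of 2 with a one-sample circular delay) are assumptions made only for illustration.

```python
import numpy as np

def estimate_frf(u, y, fs):
    """Return (freqs, H), where H = FFT(y) / FFT(u) on the one-sided spectrum."""
    U = np.fft.rfft(u)
    Y = np.fft.rfft(y)
    freqs = np.fft.rfftfreq(len(u), d=1.0 / fs)
    # Bins where the input has (almost) no energy are left as NaN,
    # because the ratio is not meaningful there.
    H = np.full(U.shape, np.nan, dtype=complex)
    mask = np.abs(U) > 1e-12 * np.max(np.abs(U))
    H[mask] = Y[mask] / U[mask]
    return freqs, H

# Hypothetical toy system: gain of 2 and a one-sample circular delay.
rng = np.random.default_rng(0)
fs = 100.0                      # assumed sampling rate in Hz
u = rng.standard_normal(256)    # broadband input record
y = 2.0 * np.roll(u, 1)         # output record
freqs, H = estimate_frf(u, y, fs)

print(np.abs(H[:3]))            # magnitude of the first few bins
```

For this toy system the magnitude of H is 2 at every bin and the phase falls linearly with frequency, as expected for a pure delay. With noisy measurements, a single-record ratio is usually replaced by averaged spectral estimators; that choice is outside what the abstract states.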
ISBN: 9781728176055 (print)
In this paper, we present a simple but novel framework to train a non-parallel many-to-many voice conversion (VC) model based on the encoder-decoder architecture. It is observed that an encoder-decoder text-to-speech (TTS) model and an encoder-decoder VC model have the same structure. Thus, we propose to pre-train a multi-speaker encoder-decoder TTS model and transfer knowledge from the TTS model to a VC model by (1) adopting the TTS acoustic decoder as the VC acoustic decoder, and (2) forcing the VC speech encoder to learn the same speaker-agnostic linguistic features from the TTS text encoder so as to achieve speaker disentanglement in the VC encoder output. We further control the conversion of the pitch contour from source speech to target speech, and condition the VC decoder on the converted pitch contour during inference. Subjective evaluation shows that our proposed model is able to handle VC between any pair of speakers in the training speech corpus of over 200 speakers with high naturalness and speaker similarity.
ISBN: 9789897585128 (print)
This paper presents a methodology to control greenhouse operations based on deep learning. The proposed methodology employs Artificial Intelligence algorithms working on edge devices, allowing the detection of anomalies in plant growth and greenhouse control equipment, with a view to taking possible corrective actions. Edge Intelligence allows the greenhouse to work independently of the network to which it is connected. It also guarantees privacy of the processed data and contributes to fast and efficient decision-making. In this work, a Long Short-Term Memory (LSTM) encoder-decoder architecture is used for greenhouse anomaly detection. The best performance is achieved when using one LSTM layer and 64 LSTM units.
Movie reviews have always been a popular and enduring subject of interest among researchers. Sentiment analysis plays a significant role in this domain. Machine learning and natural language processing techniques can provide valuable insights into the emotional responses of audiences towards movies, and can facilitate the appraisal of their reputation and market potential. This is achieved through the analysis of sentiment expressed in movie reviews. Furthermore, this approach is highly valuable in various application domains such as data mining, web mining, and social media analysis. This paper aims to conduct a comparative analysis using typical models based on machine learning and neural networks, along with the integration of natural language processing techniques. The IMDB database, which contains 50,000 reviews, will be used, and data preprocessing will be performed before applying these models. By comparing the accuracy of each model, insights regarding movie reviews can be derived.
Table structure recognition is an essential part of making machines understand tables. Its main task is to recognize the internal structure of a table. However, due to the complexity and diversity of table structure and style, it is very difficult to parse tabular data into a structured format that machines can understand, especially for complex tables. In this paper, we introduce Split, Embed and Merge (SEM), an accurate table structure recognizer. SEM is mainly composed of three parts: splitter, embedder and merger. In the first stage, we apply the splitter to predict the potential regions of the table row/column separators, and obtain the fine grid structure of the table. In the second stage, by taking full account of the textual information in the table, we fuse the output features for each table grid from both vision and text modalities. Moreover, we achieve a higher precision in our experiments by providing additional textual features. Finally, we process the merging of these basic table grids in a self-regression manner. The corresponding merging results are learned through the attention mechanism. In our experiments, SEM achieves an average F1-Measure of 97.11% on the SciTSR dataset, which outperforms other methods by a large margin. We also won first place for complex tables and third place for all tables in Task-B of the ICDAR 2021 Competition on Scientific Literature Parsing. Extensive experiments on other publicly available datasets further demonstrate the effectiveness of our proposed approach. (c) 2022 Elsevier Ltd. All rights reserved.
Partial differential equations (PDEs) play a fundamental role in modeling and simulating problems across a wide range of disciplines. Recent advances in deep learning have shown the great potential of physics-informed neural networks (PINNs) to solve PDEs as a basis for data-driven modeling and inverse analysis. However, the majority of existing PINN methods, based on fully-connected NNs, pose intrinsic limitations to low-dimensional spatiotemporal parameterizations. Moreover, since the initial/boundary conditions (I/BCs) are softly imposed via penalty, the solution quality heavily relies on hyperparameter tuning. To this end, we propose the novel physics-informed convolutional-recurrent learning architectures (PhyCRNet and PhyCRNet-s) for solving PDEs without any labeled data. Specifically, an encoder-decoder convolutional long short-term memory network is proposed for low-dimensional spatial feature extraction and temporal evolution learning. The loss function is defined as the aggregated discretized PDE residuals, while the I/BCs are hard-encoded in the network to ensure forcible satisfaction (e.g., periodic boundary padding). The networks are further enhanced by autoregressive and residual connections that explicitly simulate time marching. The performance of our proposed methods has been assessed by solving three nonlinear PDEs (e.g., 2D Burgers' equations, the lambda-omega and FitzHugh-Nagumo reaction-diffusion equations), and compared against the state-of-the-art baseline algorithms. The numerical results demonstrate the superiority of our proposed methodology in the context of solution accuracy, extrapolability and generalizability. (C) 2021 Elsevier B.V. All rights reserved.
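Two ideas named in the abstract above, a loss built from discretized PDE residuals and boundary conditions hard-encoded through periodic padding, can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: it uses the 1D viscous Burgers' equation with central differences and a forward-Euler time difference, whereas the paper works with 2D problems and a convolutional-recurrent network. All function names and parameter values below are hypothetical.

```python
import numpy as np

def periodic_pad(u, width=1):
    """Wrap the field around so the stencil sees periodic neighbours."""
    return np.pad(u, width, mode="wrap")

def spatial_terms(u, dx, nu):
    """Return u * u_x - nu * u_xx using central differences on a periodic grid."""
    p = periodic_pad(u)
    u_x = (p[2:] - p[:-2]) / (2.0 * dx)
    u_xx = (p[2:] - 2.0 * p[1:-1] + p[:-2]) / dx**2
    return u * u_x - nu * u_xx

def burgers_residual(u_now, u_next, dx, dt, nu):
    """Discretized residual of u_t + u * u_x - nu * u_xx = 0."""
    u_t = (u_next - u_now) / dt
    return u_t + spatial_terms(u_now, dx, nu)

def physics_loss(u_now, u_next, dx, dt, nu):
    """Mean squared PDE residual; no labeled solution data is needed."""
    return float(np.mean(burgers_residual(u_now, u_next, dx, dt, nu) ** 2))

# Assumed toy setup: 64 grid points on [0, 2*pi), small viscosity and time step.
n, nu, dt = 64, 0.01, 1e-3
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
dx = x[1] - x[0]
u0 = np.sin(x)
u1 = u0 - dt * spatial_terms(u0, dx, nu)   # one explicit Euler step

print(physics_loss(u0, u1, dx, dt, nu))    # consistent step: near zero
print(physics_loss(u0, u0, dx, dt, nu))    # frozen field: clearly nonzero
```

Because the periodic condition is built into the padding, it holds exactly by construction and needs no penalty term, which is the contrast with softly imposed I/BCs that the abstract draws.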
Remote sensing image captioning has attracted widespread attention in the remote sensing field due to its application potential. However, most existing approaches model limited interactions between image content and sentence and fail to exploit special characteristics of remote sensing images. In this article, we introduce a novel recurrent attention and semantic gate (RASG) framework to facilitate remote sensing image captioning, which integrates competitive visual features and a recurrent attention mechanism to generate a better context vector for the images at every step and enhances the representations of the current word state. Specifically, we first project each image into competitive visual features by taking advantage of both static visual features and multiscale features. Then, a novel recurrent attention mechanism is developed to extract the high-level attentive maps from encoded features and nonvisual features, which can help the decoder recognize and focus on the effective information for understanding the complex content of the remote sensing images. Finally, the hidden states from the long short-term memory (LSTM) and other semantic references are incorporated into a semantic gate, which contributes to more comprehensive and precise semantic understanding. Comprehensive experiments on three widely used datasets, Sydney-Captions, UCM-Captions, and Remote Sensing Image Captioning Dataset, have demonstrated the superiority of the proposed RASG over a series of attentive models based on image captioning methods.
Aspect-based sentiment analysis (ABSA) is a fine-grained task that detects the sentiment polarities of particular aspect words in a sentence. With the rise of graph convolution networks (GCNs), current ABSA models mostly use graph-based methods. These methods construct a dependency tree for each sentence, and regard each word as a unique node. To be more specific, they conduct classification using aspect representations instead of sentence representations, and update them with GCNs. However, this kind of method relies too much on the quality of the dependency tree and may lose the global sentence information, which is also helpful for classification. To address these issues, we design a new ABSA model, AG-VSR. Two kinds of representations are proposed to perform the final classification: Attention-assisted Graph-based Representation (A2GR) and Variational Sentence Representation (VSR). A2GR is produced by the GCN module, which takes as input a dependency tree modified by the attention mechanism. Furthermore, VSR is sampled from a distribution learned by a VAE-like encoder-decoder structure. Extensive experiments show that our model AG-VSR achieves competitive results. Our code and data have been released at https://***/wangbing1416/VAGR. (c) 2022 Elsevier B.V. All rights reserved.
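The abstract above says the sentence representation is sampled from a distribution learned by a VAE-like structure, without giving details. A common way to do this in VAEs is the reparameterization trick with a diagonal Gaussian; the NumPy sketch below shows that standard technique. Whether AG-VSR uses exactly this form is an assumption, and the function names, shapes, and values are hypothetical.

```python
import numpy as np

def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I), per example."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

# Assumed encoder outputs for a batch of 2 sentences with a 4-dim latent space.
rng = np.random.default_rng(0)
mu = np.array([[0.5, -1.0, 0.0, 2.0],
               [0.0, 0.0, 0.0, 0.0]])
log_var = np.array([[0.0, -1.0, 0.5, 0.0],
                    [0.0, 0.0, 0.0, 0.0]])

z = sample_latent(mu, log_var, rng)       # sampled sentence representation
kl = kl_to_standard_normal(mu, log_var)   # regularization term, one per sentence
print(z.shape, kl)
```

Writing the sample as a deterministic function of mu, log_var, and external noise is what lets gradients flow through the sampling step when training with an automatic differentiation framework.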
In the sintering process, it is difficult to obtain the key quality variables in real time, so there is a lack of real-time information to guide the production process. Furthermore, labeled data are scarce, resulting in poor performance of conventional soft sensor models. Therefore, a novel semi-supervised dynamic feature extraction framework (SS-DTFEE) based on sequence pre-training and fine-tuning is proposed in this paper. Firstly, based on the DTFEE model, the time features of the sequences are extended and extracted. Secondly, a novel weighted bidirectional LSTM unit (BiLSTM) is designed to extract the latent variables of the original sequence data. Based on the improved BiLSTM, an encoder-decoder model is designed as a pre-training model with unsupervised learning to obtain the hidden information in the process. Next, through a model migration and fine-tuning strategy, the prediction performance on labeled datasets is improved. The proposed method is applied in an actual sintering process to estimate the FeO content, and shows a significant improvement in prediction accuracy compared to traditional methods.