ISBN: 9781728169262 (print)
Conversational systems are a prime example of human-machine interaction. When interacting with humans, conversational agents lack the ability to express emotions and behave inconsistently, which makes conversations boring and non-interactive. In this work, we propose the task of persona-aware emotional response generation, in which the system generates specific and consistent responses in accordance with the provided personality information and the conversational history. To make the responses more interactive and interesting, we aim to infuse emotions into them, which helps make them more human-like. We propose a persona-aware attention framework based on an encoder-decoder approach and investigate different ways to include the desired emotions in the responses. Experimental results on the PersonaChat dataset show that our proposed framework outperforms the baseline models and can generate interactive and emotional responses.
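The abstract does not give the attention formulation. The following is a minimal sketch under assumed choices. The decoder state attends separately over encoded persona facts and over the encoded dialogue history using scaled dot-product attention. The two context vectors are then concatenated with an emotion embedding. Appending an emotion vector is only one simple option and not necessarily one of the variants the authors compared. All names and vectors are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]

def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query: Vector, memory: Sequence[Vector]) -> Vector:
    """Scaled dot-product attention; memory vectors serve as both keys and values."""
    scale = math.sqrt(len(query))
    weights = softmax([sum(q * k for q, k in zip(query, m)) / scale for m in memory])
    return [sum(w * m[i] for w, m in zip(weights, memory)) for i in range(len(memory[0]))]

def persona_aware_context(decoder_state: Vector,
                          persona: Sequence[Vector],
                          history: Sequence[Vector],
                          emotion: Vector) -> Vector:
    """Hypothetical decoder input: [persona context | history context | emotion embedding]."""
    return attend(decoder_state, persona) + attend(decoder_state, history) + list(emotion)

state = [1.0, 0.0]                   # current decoder hidden state (toy values)
persona = [[1.0, 0.0], [0.0, 1.0]]   # two encoded persona facts
history = [[0.5, 0.5]]               # one encoded dialogue turn
happy = [0.0, 1.0]                   # hypothetical one-hot emotion label
ctx = persona_aware_context(state, persona, history, happy)
```

Here the persona fact most similar to the decoder state receives the larger attention weight. In a trained model the encodings and the emotion embedding would be learned rather than hand-set.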
ISBN: 9781509066315 (print)
This paper proposes and evaluates the hybrid autoregressive transducer (HAT) model, a time-synchronous encoder-decoder model that preserves the modularity of conventional automatic speech recognition systems. The HAT model provides a way to measure the quality of its internal language model, which can be used to decide whether inference with an external language model is beneficial. We evaluate the proposed model on a large-scale voice search task. Our experiments show significant improvements in word error rate (WER) compared to state-of-the-art approaches (1).
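The abstract reports WER improvements without defining the metric. WER is the word-level edit (Levenshtein) distance between the recognized hypothesis and the reference transcript, divided by the number of reference words. A minimal sketch follows for reference. It is not code from the paper and says nothing about the HAT model itself.

```python
from typing import List

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # substitution, or match when cost == 0
        prev = curr
    return prev[len(hyp)] / len(ref)

wer = word_error_rate("play some jazz music", "play sam jazz")
```

In this example there is one substitution ("some" to "sam") and one deletion ("music") against four reference words, so the WER is 0.5.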
Polarization image fusion aims to integrate intensity and degree-of-linear-polarization images into a single image with more detail, which helps improve target detection against complex backgrounds. The fusion strategies in conventional methods are hand-crafted and are not robust across different fusion tasks. In this paper, we propose a novel deep network that addresses polarization image fusion with a self-learned strategy. The network consists of encoder, fusion, and decoder layers: feature maps extracted by the encoder are fused and then fed into the decoder to generate the fused image. In addition, a novel loss function is adopted to train the network in an unsupervised way, without ground-truth fused images. To verify its advantages, the network trained on polarization images is also applied to infrared and visible image fusion and to multi-focus image fusion. Experimental results show that our method outperforms several state-of-the-art methods in terms of visual quality and quantitative measures. The proposed fusion method can be applied in military and civilian fields such as camouflaged and hidden target detection, medical diagnosis, and environmental monitoring. (c) 2021 Elsevier Ltd. All rights reserved.
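The fusion network's inputs are an intensity image and a degree-of-linear-polarization (DoLP) image. The abstract does not say how these are obtained. A common approach, assumed here, computes the linear Stokes parameters S0, S1 and S2 from images captured behind polarizers at 0, 45, 90 and 135 degrees. The sketch below works per pixel on nested lists; the function name is hypothetical.

```python
import math
from typing import List, Tuple

Image = List[List[float]]

def intensity_and_dolp(i0: Image, i45: Image, i90: Image, i135: Image) -> Tuple[Image, Image]:
    """Return (S0 intensity image, DoLP image) from four polarizer-angle images."""
    s0_img: Image = []
    dolp_img: Image = []
    for r0, r45, r90, r135 in zip(i0, i45, i90, i135):
        s0_row, dolp_row = [], []
        for a0, a45, a90, a135 in zip(r0, r45, r90, r135):
            s0 = a0 + a90      # total intensity
            s1 = a0 - a90      # horizontal minus vertical linear polarization
            s2 = a45 - a135    # +45 degree minus -45 degree linear polarization
            # DoLP = sqrt(S1^2 + S2^2) / S0, defined as 0 for dark pixels
            dolp = math.hypot(s1, s2) / s0 if s0 > 0 else 0.0
            s0_row.append(s0)
            dolp_row.append(dolp)
        s0_img.append(s0_row)
        dolp_img.append(dolp_row)
    return s0_img, dolp_img

# Pixel 1: fully horizontally polarized light; pixel 2: unpolarized light.
s0, dolp = intensity_and_dolp([[1.0, 0.5]], [[0.5, 0.5]], [[0.0, 0.5]], [[0.5, 0.5]])
```

Both pixels have the same intensity S0, but their DoLP is 1.0 and 0.0 respectively. This extra contrast is what the fusion network is meant to combine with the intensity image.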
Accurate and up-to-date road network information is very important for Geographic Information System (GIS) databases, traffic management and planning, automatic vehicle navigation, emergency response, and investigation of urban pollution sources. In this paper, we use vector field learning to extract roads from high-resolution remote sensing images. This method is commonly used for skeleton extraction in natural images but has seldom been applied to road extraction. To improve the accuracy of road extraction, three vector fields are constructed, and each is combined with conventional road-mask learning in a two-task network. The results show that all three vector fields significantly improve road extraction accuracy, regardless of whether the field is constructed inside the road area or entirely outside the road. The highest F1 score is 0.7618, an increase of 0.053 over mask learning alone.
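The F1 score reported above is the harmonic mean of precision and recall. By the abstract's own figures, mask learning alone therefore scored about 0.7088. The abstract does not state how F1 is computed, and road-extraction studies sometimes use a relaxed, buffer-based variant. The minimal sketch below assumes plain pixel-wise F1 on binary road masks.

```python
from typing import List

Mask = List[List[int]]  # 1 = road pixel, 0 = background

def f1_score(pred: Mask, truth: Mask) -> float:
    """Pixel-wise F1 between a predicted and a ground-truth binary road mask."""
    tp = fp = fn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1          # road correctly detected
            elif p:
                fp += 1          # background predicted as road
            elif t:
                fn += 1          # road pixel missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

score = f1_score([[1, 1, 0], [0, 1, 0]], [[1, 0, 0], [0, 1, 1]])
```

The example has two true positives, one false positive and one false negative, which gives a precision and recall of 2/3 each and an F1 of about 0.667.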
The reconstruction of seismic data with missing traces has been a long-standing issue in seismic data processing. Deep learning (DL) has emerged as a popular tool for seismic interpolation; it learns priors from training data sets of incomplete/complete data pairs. However, because these DL methods are supervised, they are restricted by their training data: when the features of the testing and training data differ, recovery performance decreases, which hinders practical application. We introduce a "deep-seismic-prior-based" approach via a convolutional neural network (CNN), which captures priors from the particular structure of the CNN and does not need any training data set. The ill-posed inverse problem in seismic interpolation is thus solved using the CNN structure as a prior, and the learned network weights are the parameters that represent the seismic data. Because the convolutional filter weights are shared to achieve spatial invariance, the CNN structure can function as a regularizer that guides network learning. In our method, corrupted seismic data are reconstructed during an iterative process that minimizes the mean square error between the network output and the original data. We applied our method to interpolating irregularly and regularly missing traces in prestack and poststack seismic data. The experimental results indicate that our approach outperforms the traditional singular spectrum analysis and dealiased Cadzow methods commonly used to reconstruct such data.
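Written out, the iterative reconstruction described above amounts to a masked least-squares fit over the network weights. This sketch rests on assumptions the abstract does not state:

- M is a binary sampling mask, 1 at recorded traces and 0 at missing ones.
- ⊙ is element-wise multiplication.
- d_obs is the recorded data, with zeros at missing traces.
- z is a fixed random input, as in deep-image-prior methods.
- f_θ is the untrained CNN whose weights θ are optimized.
- N is the number of recorded samples.

```latex
\theta^{*} = \arg\min_{\theta} \frac{1}{N}
  \bigl\lVert M \odot \bigl( f_{\theta}(z) - d_{\mathrm{obs}} \bigr) \bigr\rVert_{2}^{2},
\qquad
\hat{d} = f_{\theta^{*}}(z)
```

Missing traces contribute nothing to this loss. Their values in the reconstruction d̂ are therefore determined entirely by what the CNN structure tends to produce, which is the sense in which the architecture acts as the prior.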
Although attention mechanisms are widely exploited in encoder-decoder neural-network-based image captioning frameworks, the relation between the selection of salient image regions and the supervision of spatial information in local and global representation learning has been overlooked, which degrades captioning performance. We therefore propose an image captioning scheme based on adaptive spatial information attention (ASIA), which extracts a sequence of spatial information about salient objects in a local image region or in an entire image. Specifically, in the encoding stage, we extract object-level visual features of salient objects together with their spatial bounding boxes. We also obtain global feature maps of the entire image, which are fused with the local features; the fused features are fed into an LSTM-based language decoder. In the decoding stage, our adaptive attention mechanism dynamically selects the image regions that correspond to the image description. Extensive experiments on two datasets demonstrate the effectiveness of the proposed method.
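The abstract states that object-level features are paired with their bounding boxes and fused with global image features before decoding, but not how. The sketch below is illustrative only. It assumes a common normalized box encoding (corner coordinates divided by image size, plus relative box area) and plain concatenation as the fusion step. Both function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

def encode_box(box: Box, width: float, height: float) -> List[float]:
    """Assumed spatial encoding: normalized corners plus the box's share of the image area."""
    x1, y1, x2, y2 = box
    area = (x2 - x1) * (y2 - y1) / (width * height)
    return [x1 / width, y1 / height, x2 / width, y2 / height, area]

def fuse_features(object_feats: Sequence[List[float]], boxes: Sequence[Box],
                  global_feat: List[float], width: float, height: float) -> List[List[float]]:
    """One fused vector per object: [object feature | box encoding | global feature]."""
    return [list(feat) + encode_box(box, width, height) + list(global_feat)
            for feat, box in zip(object_feats, boxes)]

fused = fuse_features([[1.0, 2.0]], [(0, 0, 50, 100)], [9.0], width=100, height=200)
```

Concatenation keeps the sketch simple. A learned projection or gating layer would be an equally plausible fusion choice.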
Feedback mechanisms that simulate the biological vision system have not been widely used in deep-learning dehazing algorithms. To alleviate the difficulty of feature interaction, we combine the feedback mechanism with dense skip connections to fuse features of different levels in a dehazing network. Inspired by feedback networks, in which earlier layers can access the rich information processed by later layers, we propose an end-to-end dense feedback network (DFBDehazeNet) for single-image dehazing that implements the feedback mechanism using the hidden states of a constrained RNN. The low-level hazy feature information is continuously corrected by the high-level feature information obtained from the dense feedback block via the recurrent feedback connection. This top-down feedback mechanism refines the low-level hazy features and thereby achieves strong image restoration. Ablation experiments show that the iterative structure of DFBDehazeNet and the projection unit play an important role in removing haze from images. Experimental results show that our method outperforms the great majority of existing methods both qualitatively and quantitatively. (c) 2021 SPIE and IS&T [DOI: 10.1117/***.30.3.033004]
Under severe weather, the quality of images captured by camera systems drops sharply. These degraded images not only affect people's subjective judgment but also lead to poor results in high-level computer vision tasks such as object detection [1], semantic segmentation [2], and action recognition [3]. Restoring images captured in severe weather [4] is therefore of great significance to computer vision. Image dehazing algorithms are designed to recover high-quality clear images from low-quality hazy images for such high-level tasks. The image dehazing task assists in the driving technology of unmanned vehicles and the unmanned toll sys…
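The abstract describes low-level features being repeatedly corrected by high-level information through a recurrent feedback connection. The toy loop below illustrates only that control flow and is not DFBDehazeNet. The hypothetical `toy_block` stands in for the learned dense feedback block, and features are plain lists. Starting from an all-zero feedback state and using three iterations are arbitrary assumptions.

```python
from typing import Callable, List

Features = List[float]
Block = Callable[[Features, Features], Features]

def feedback_refine(low_level: Features, block: Block, iterations: int) -> List[Features]:
    """Unroll a recurrent feedback loop and return the output of every iteration."""
    hidden = [0.0] * len(low_level)  # assumed: empty feedback state before the first pass
    outputs: List[Features] = []
    for _ in range(iterations):
        # Each pass sees the same low-level input plus the state fed back from the previous pass.
        hidden = block(low_level, hidden)
        outputs.append(hidden)
    return outputs

def toy_block(x: Features, h: Features) -> Features:
    """Hypothetical stand-in for the dense feedback block: average input and fed-back state."""
    return [0.5 * a + 0.5 * b for a, b in zip(x, h)]

steps = feedback_refine([8.0], toy_block, iterations=3)
```

Each iteration's output moves closer to a fixed point, which mimics progressive refinement. In the actual network the block is learned, and the abstract's ablation indicates that this iterative structure matters.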
Predicting pedestrians' future trajectories has great potential for many practical applications. Most existing methods focus on social interaction among pedestrians but ignore the fact that, in addition to pedestrians, other kinds of objects (cars, dogs, bicycles, motorcycles, etc.) strongly influence the subject pedestrian's future trajectory. Most existing methods also neglect the pedestrian's intentions, which can be inferred from key points on the subject pedestrian's face. Rich category information about the subject pedestrian's surroundings, together with facial key points, therefore plays an important role in modeling pedestrian movement. Motivated by this idea, this paper predicts a pedestrian's future trajectory by jointly using the categories and relative positions of the objects around the subject pedestrian and the key points on his or her face. We propose a data modeling method that unifies rich visual features about categories, interactions and face key points into a multi-channel tensor, and we build an end-to-end fully convolutional encoder-decoder attention model based on convolutional long short-term memory that uses this tensor. We evaluate and compare our method with several existing methods on 5 crowded video sequences from the public multi-object tracking dataset MOT-16. Experimental results show that our method outperforms state-of-the-art approaches, with lower prediction error. (C) 2021 Elsevier B.V. All rights reserved.
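The abstract says categories, interactions and face key points are unified into a multi-channel tensor but does not give its layout. This is a minimal sketch under assumed choices:

- A square grid is centred on the subject pedestrian.
- There is one channel per object category, marking where neighbours of that category sit relative to the subject.
- A final channel marks the face key points relative to the face centre, on its own scale.

Category names, grid size, cell sizes and function names are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Grid = List[List[float]]

def rasterize(points: Sequence[Point], centre: Point, grid_size: int, cell: float) -> Grid:
    """Mark each point's position relative to `centre` on a grid_size x grid_size grid."""
    grid = [[0.0] * grid_size for _ in range(grid_size)]
    half = grid_size // 2
    for x, y in points:
        # Round the relative offset to the nearest cell; the centre maps to (half, half).
        col = half + math.floor((x - centre[0]) / cell + 0.5)
        row = half + math.floor((y - centre[1]) / cell + 0.5)
        if 0 <= row < grid_size and 0 <= col < grid_size:  # drop points outside the window
            grid[row][col] = 1.0
    return grid

def build_tensor(subject: Point,
                 neighbours: Sequence[Tuple[str, Point]],
                 categories: Sequence[str],
                 face_centre: Point,
                 face_keypoints: Sequence[Point],
                 grid_size: int = 5,
                 cell: float = 1.0,
                 face_cell: float = 1.0) -> List[Grid]:
    """Return len(categories) + 1 channels: one per object category, then one for face key points."""
    channels = [rasterize([p for c, p in neighbours if c == category], subject, grid_size, cell)
                for category in categories]
    channels.append(rasterize(face_keypoints, face_centre, grid_size, face_cell))
    return channels

tensor = build_tensor(
    subject=(0.0, 0.0),
    neighbours=[("car", (1.0, 0.0)), ("dog", (0.0, -2.0)), ("car", (10.0, 0.0))],
    categories=["car", "dog", "bicycle"],
    face_centre=(100.0, 50.0),
    face_keypoints=[(100.0, 50.0), (101.0, 50.0)],
)
```

The distant car falls outside the 5 x 5 window and is dropped. A sequence of such tensors, one per frame, is the kind of input a convolutional LSTM encoder-decoder can consume.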
The growth of machine learning (ML) in environmental science can be divided into a slow phase lasting until the mid-2010s and a fast phase thereafter. The rapid transition was brought about by the emergence of powerful new ML methods, which allowed ML to successfully tackle many problems on which numerical and statistical models had struggled. Deep convolutional neural networks greatly advanced the use of ML on 2D and 3D data. Transfer learning has allowed ML to progress in climate science, where data records are generally too short for ML. ML and physics are also merging in new areas, for example: (a) using ML for general circulation model parametrization, (b) adding physics constraints to ML models, and (c) using ML in data assimilation.
Impact Statement: This perspective paper reviews the evolution and growth of machine learning (ML) models in environmental science. The opaque nature of ML models led to decades of slow growth, but exponential growth began around the mid-2010s. Novel ML models that have contributed to this exponential growth (e.g., deep convolutional neural networks, encoder-decoder networks, and generative adversarial networks) are reviewed, as well as approaches to merging ML models with physics-based models.