ISBN: (print) 9798350300673
This paper presents a vision-based indoor obstacle avoidance method for unmanned aerial vehicles (UAVs) using deep reinforcement learning (DRL). The system consists of two parts: depth map compression and UAV control. For the depth map compression part, a pre-trained variational autoencoder model is used to improve obstacle avoidance and reduce the training time. The states include the UAV information, gray images and compressed features. A dueling double deep recurrent Q network model is used to control the UAV. The network is trained in the AirSim simulation. To validate the performance of the proposed algorithm, some simulations are conducted. The results show that the proposed algorithm can avoid obstacles at a fast speed in a narrow space, and fly through a difficult L-shaped corner in an indoor simulation.
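The abstract does not describe the network heads, so the following is only a minimal sketch of the standard dueling aggregation and double Q-learning target that the model's name refers to. The function names, the list-based inputs and the example numbers are illustrative assumptions, not the authors' implementation; the recurrent part of the network is omitted.

```python
from typing import List, Sequence


def dueling_q_values(value: float, advantages: Sequence[float]) -> List[float]:
    """Combine a state value V(s) and advantages A(s, a) into Q(s, a).

    Uses the common mean-subtracted form: Q = V + A - mean(A).
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_q_target(
    reward: float,
    gamma: float,
    done: bool,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
) -> float:
    """Double Q-learning target for one transition.

    The online network picks the next action; the target network scores it.
    """
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda i: next_q_online[i])
    return reward + gamma * next_q_target[best_action]


if __name__ == "__main__":
    # Toy numbers chosen only for illustration.
    print(dueling_q_values(1.0, [0.0, 2.0, 4.0]))
    print(double_q_target(1.0, 0.9, False, [0.1, 0.5], [2.0, 3.0]))
```

Separating action selection (online network) from action evaluation (target network) is what distinguishes the double variant from a plain Q-learning target.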
ISBN: (digital) 9781510661653
ISBN: (print) 9781510661646; 9781510661653
Monitoring of air pollutants across space and time is critical in understanding pollution trends and reporting air quality. The Air Quality Index (AQI) is a tool used to communicate air quality that incorporates atmospheric concentrations of five major pollution indicators: ground-level ozone, particulate matter, carbon monoxide, sulfur dioxide, and nitrogen dioxide. The ability to accurately forecast these concentrations and identify unusual levels is of particular importance. In this work, we develop a generative time series model for air quality indicators and use it for long- and short-term probabilistic forecasts. Air quality data are multivariate and exhibit high variability across indicators in both space and time. Marginal indicator distributions are typically skewed and contain substantial zeros, while indicator-wise cross-correlations can be highly non-linear. We find that hourly measurements additionally exhibit substantial temporal cross-correlation, long-term dependence, and daily periodicity. To capture these complexities, we employ a recurrent extension of the variational autoencoder (VAE) to sequential data. The VAE is a generative neural network architecture capable of learning complex, high-dimensional manifolds on which data are distributed. Furthermore, recurrent architectures can capture non-linear and long-term temporal qualities of time series data. We train the proposed time series model on historical air quality measurements at multiple locations and demonstrate its ability to capture observed indicator-wise and temporal complexities. We additionally use the trained model to compute probabilistic forecasts and credible intervals of air quality indicators.
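As background for the VAE mentioned in this abstract, here is a minimal sketch of two generic building blocks: the reparameterized sample from a diagonal Gaussian posterior and its KL divergence to a standard normal prior. It assumes a diagonal Gaussian latent, which the abstract does not state, and it leaves out the recurrent encoder and decoder; all names are illustrative.

```python
import math
import random
from typing import List, Sequence


def reparameterize(
    mu: Sequence[float], log_var: Sequence[float], rng: random.Random
) -> List[float]:
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), where sigma = exp(0.5 * log_var)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def gaussian_kl(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.5, -0.2], [0.0, -1.0]  # toy posterior parameters
    print(reparameterize(mu, log_var, rng))
    print(gaussian_kl(mu, log_var))
```

In a sequential model, one such sample and KL term would typically be computed per time step and the KL terms summed into the training objective.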
ISBN: (print) 9781450394079
Recent research has employed reinforcement learning (RL) algorithms to optimize long-term user engagement in recommender systems, thereby avoiding common pitfalls such as user boredom and filter bubbles. They capture the sequential and interactive nature of recommendations, and thus offer a principled way to deal with long-term rewards and avoid myopic behaviors. However, RL approaches are intractable in the slate recommendation scenario, where a list of items is recommended at each interaction turn, due to the combinatorial action space. In that setting, an action corresponds to a slate that may contain any combination of items. While previous work has proposed well-chosen decompositions of actions so as to ensure tractability, these rely on restrictive and sometimes unrealistic assumptions. Instead, in this work we propose to encode slates in a continuous, low-dimensional latent space learned by a variational auto-encoder. Then, the RL agent selects continuous actions in this latent space, which are ultimately decoded into the corresponding slates. By doing so, we are able to (i) relax assumptions required by previous work, and (ii) improve the quality of the action selection by modeling full slates instead of independent items, in particular by enabling diversity. Our experiments performed on a wide array of simulated environments confirm the effectiveness of our generative modeling of slates over baselines in practical scenarios where the restrictive assumptions underlying the baselines are lifted. Our findings suggest that representation learning using generative models is a promising direction towards generalizable RL-based slate recommendation.
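To make the "combinatorial action space" concrete, the sketch below counts the possible slates for a given catalogue size. It assumes items are not repeated within a slate; the catalogue size and slate length in the example are illustrative and not taken from the paper.

```python
import math


def num_ordered_slates(num_items: int, slate_size: int) -> int:
    """Number of slates when position matters and items are not repeated."""
    return math.perm(num_items, slate_size)


def num_unordered_slates(num_items: int, slate_size: int) -> int:
    """Number of slates when only the set of items matters."""
    return math.comb(num_items, slate_size)


if __name__ == "__main__":
    # Hypothetical catalogue of 1000 items and slates of 10 items.
    print(num_ordered_slates(1000, 10))
    print(num_unordered_slates(1000, 10))
```

Even for this modest catalogue the ordered count exceeds 10^29, which is why enumerating slates as discrete actions is impractical and a low-dimensional continuous latent action is attractive.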
ISBN: (print) 9781665488679
Human-robot interaction (HRI) research is progressively addressing multi-party scenarios, where a robot interacts with more than one human user at the same time. Conversely, research is still at an early stage for human-robot collaboration (HRC). The use of machine learning techniques to handle this type of collaboration requires data that are less feasible to produce than in a typical HRC setup. This work outlines scenarios of concurrent tasks for non-dyadic HRC applications. Based upon these concepts, this study also proposes an alternative way of gathering data regarding multi-user activity, by collecting data related to single users and merging them in post-processing, to reduce the effort involved in producing recordings of pair settings. To validate this statement, 3D skeleton poses of activity of single users were collected and merged in pairs. These data points were then used to separately train a long short-term memory (LSTM) network and a variational autoencoder (VAE) composed of spatio-temporal graph convolutional networks (STGCN) to recognise the joint activities of the pairs of people. The results showed that data collected in this way can be used for pair HRC settings with performance similar to that obtained with training data from groups of users recorded under the same settings, avoiding the technical difficulties involved in producing such data. The related code and collected data are publicly available(1).
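The abstract does not say how the single-user recordings are merged, so the following is only one plausible sketch: two skeleton sequences are aligned frame by frame, truncated to the shorter one, and their joint coordinates concatenated. The data layout (one flat list of coordinates per frame) and the function name are assumptions made for illustration.

```python
from typing import List, Sequence

Frame = List[float]  # flattened 3D joint coordinates of one person in one frame


def merge_single_user_recordings(
    seq_a: Sequence[Sequence[float]], seq_b: Sequence[Sequence[float]]
) -> List[Frame]:
    """Build a synthetic two-person sequence from two single-user sequences.

    Frames are paired by index; the longer sequence is truncated (an assumption).
    """
    length = min(len(seq_a), len(seq_b))
    return [list(seq_a[t]) + list(seq_b[t]) for t in range(length)]


if __name__ == "__main__":
    # Toy sequences with a single 3D joint per person.
    user_a = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
    user_b = [[9.0, 9.0, 9.0], [8.0, 8.0, 8.0]]
    print(merge_single_user_recordings(user_a, user_b))
```

Each merged frame can then be labelled with the pair of single-user activities it was built from and fed to a sequence classifier.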
ISBN: (digital) 9789819706013
ISBN: (print) 9789819706006; 9789819706013
Accent transfer aims to transfer an accent from a source speaker to synthetic speech in the target speaker's voice. The main challenge is how to effectively disentangle speaker timbre and accent, which are entangled in speech. This paper presents a VITS-based [7] end-to-end accent transfer model named Accent-VITS. Based on the main structure of VITS, Accent-VITS makes substantial improvements to enable effective and stable accent transfer. We leverage a hierarchical CVAE structure to model accent pronunciation information and acoustic features, respectively, using bottleneck features and mel spectrums as constraints. Moreover, the text-to-wave mapping in VITS is decomposed into text-to-accent and accent-to-wave mappings in Accent-VITS. In this way, the disentanglement of accent and speaker timbre becomes more stable and effective. Experiments on multi-accent and Mandarin datasets show that Accent-VITS achieves higher speaker similarity, accent similarity and speech naturalness compared with a strong baseline (Demos: https://***/AccentVITS/).
Ensuring the safety and reliability of complex industrial processes is very important. Therefore, extracting multiple features of data effectively is of great significance for improving the accuracy of modeling for faul...
Recently, the real-time audio variational autoencoder (RAVE) method was developed for high-quality audio waveform synthesis. The RAVE method is based on a variational autoencoder and employs a two-stage training strat...
Bathrooms can be slippery, increasing the risk of falling. In addition, because people enter the bathroom alone, it is difficult to detect accidents immediately when they occur. Therefore, a system is required to quic...
Recognition and expression of emotion are key factors in the success of multi-turn conversations. Emotion recognition, which can help model the relationship between query and response, has been employed in single-turn conversation models. However, little work so far has focused on infusing the emotional factor into multi-turn conversation generation. To alleviate this problem, we propose the Multi-turn Emotional Conversation Model (MECM), which uses multi-task learning to improve the ability to represent emotions in multi-turn conversations. MECM is based on a hierarchical latent variable model that uses the context hidden states to share common information. In addition, it contains an emotion classifier to help the model recognize the emotion in the conversation, and a conversation generator to maintain consistency of content and transformation of emotion. Experimental results show that our model significantly improves the quality of responses in terms of diversity and empathy, and keeps better performance on semantic similarity compared with baseline methods.
The demand for probabilistic time series forecasting has recently risen in various dynamic system scenarios, for example, system identification and prognostics and health management of machines. To this end, we combine the advances in both deep generative models and the state space model (SSM) to come up with a novel, data-driven deep probabilistic sequence model. Specifically, we follow the popular encoder-decoder generative structure to build a recurrent neural network (RNN) assisted variational sequence model on an augmented recurrent input space, which could induce rich stochastic sequence dependency. Besides, in order to alleviate the inconsistency of the posterior between training and predicting and to improve the mining of dynamic patterns, we (i) propose using a lagged hybrid output as input for the posterior at the next time step, which brings training and predicting into alignment; and (ii) further devise a generalized auto-regressive strategy that encodes all the historical dependencies for the posterior. Thereafter, we first investigate the methodological characteristics of the proposed deep probabilistic sequence model on toy cases, and then comprehensively demonstrate the superiority of our model against existing deep probabilistic SSM models through extensive numerical experiments on eight system identification benchmarks from various dynamic systems. Finally, we apply our sequence model to a real-world centrifugal compressor forecasting problem, and again verify its outstanding performance by quantifying the time series predictive distribution.
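One common way to summarize a predictive distribution at a single forecast step is an empirical interval over Monte Carlo samples drawn from the model. The sketch below assumes such samples are already available; it is not the paper's evaluation procedure, and the function name and the 90% level are illustrative choices.

```python
import random
import statistics
from typing import Sequence, Tuple


def central_90_interval(samples: Sequence[float]) -> Tuple[float, float]:
    """Empirical central 90% interval from forecast samples at one time step.

    statistics.quantiles with n=20 returns 19 cut points at 5%, 10%, ..., 95%;
    the first and last bound the central 90% of the samples.
    """
    cuts = statistics.quantiles(samples, n=20)
    return cuts[0], cuts[-1]


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for samples from a trained model's predictive distribution.
    draws = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    print(central_90_interval(draws))
```

Repeating this at every forecast step yields a band around the predicted trajectory that can be compared against held-out observations.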