This dissertation presents the development of sensorimotor primitives as a means of constructing a language-agnostic model of speech communication. Insights from major theories in speech science and linguistics are used to develop a conceptual framework for sensorimotor primitives in the context of control and information theory. Within this conceptual framework, sensorimotor primitives are defined as a system transformation that simplifies the interface to some high-dimensional and/or nonlinear system. In the context of feedback control, sensorimotor primitives take the form of a feedback transformation. In the context of communication, sensorimotor primitives are represented as a channel encoder and decoder pair. Using a high-fidelity simulation of articulatory speech synthesis, these realizations of sensorimotor primitives are applied, respectively, to feedback control of the articulators and to communication via the acoustic speech signal. Experimental results demonstrate the construction of a model of speech communication that is capable of both transmitting and receiving information, and of imitating simple utterances.
ISBN: (print) 9781728131405
Visual perception is, by and large, the main source of information used by humans when driving. It is therefore natural and appropriate to rely heavily on vision analysis for autonomous driving, as most projects do. However, there is a significant difference between the common approach to vision in autonomous driving and visual perception in humans when driving. Essentially, image analysis is often regarded as an isolated and autonomous module whose high-level output drives the control modules of the vehicle. The direction presented here is different: we take inspiration from the brain architecture that makes humans so effective at learning tasks as complex as driving. Two key theories about biological perception ground our development. The first is the view of thinking as a simulation of perception and action, as theorized by Hesslow. The second is the Convergence-Divergence Zones (CDZs) mechanism of mental simulation, which connects the process of extracting features from a visual scene to the inverse process of imagining a scene's content by decoding features stored in memory. We show how our model, based on a semi-supervised variational autoencoder, is a rather faithful implementation of these two basic neurocognitive theories.
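The CDZ-style convergence/divergence cycle described above maps naturally onto a VAE's encode-sample-decode loop. The following is a minimal illustrative sketch in numpy with made-up dimensions and plain linear maps; it is not the authors' architecture, only the shape of the mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions; the paper's actual layer sizes are not given here.
x_dim, z_dim = 16, 4
W_enc = rng.normal(scale=0.1, size=(x_dim, 2 * z_dim))  # convergence: scene -> latent features
W_dec = rng.normal(scale=0.1, size=(z_dim, x_dim))      # divergence: features -> imagined scene

def encode(x):
    # Convergence zone: compress the perceived scene into mean and log-variance.
    h = x @ W_enc
    return h[:z_dim], h[z_dim:]

def reparameterize(mu, logvar):
    # Sample z = mu + sigma * eps; in a trained VAE this keeps sampling differentiable.
    return mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)

def decode(z):
    # Divergence zone: re-imagine a scene from the stored latent features.
    return z @ W_dec

x = rng.normal(size=x_dim)        # "perceived" scene
mu, logvar = encode(x)
z = reparameterize(mu, logvar)
x_imagined = decode(z)

assert x_imagined.shape == x.shape
assert z.shape == (z_dim,)
```

The same decoder that reconstructs a perceived scene can, in Hesslow's sense, "simulate" perception from memory alone by decoding a latent code sampled without any input image.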
Gene-expression profiling enables researchers to quantify transcription levels in cells, thus providing insight into functional mechanisms of diseases and other biological processes. However, because of the high dimensionality of these data and the sensitivity of measuring equipment, expression data often contain unwanted confounding effects that can skew analysis. For example, collecting data in multiple runs causes nontrivial differences in the data (known as batch effects), known covariates that are not of interest to the study may have strong effects, and there may be large systemic effects when integrating multiple expression datasets. Additionally, many of these confounding effects represent higher-order interactions that may not be removable using existing techniques that identify linear patterns. We created Confounded to remove these effects from expression data. Confounded is an adversarial variational autoencoder that removes confounding effects while minimizing the amount of change to the input data. We tested the model on artificially constructed data and commonly used gene expression datasets and compared against other common batch adjustment algorithms. We also applied the model to remove cancer-type-specific signal from a pan-cancer expression dataset. Our software is publicly available at https://***/jdayton3/Confounded.
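The adversarial trade-off described above (erase batch identity while changing the expression values as little as possible) can be sketched as two competing losses. This is an illustrative numpy sketch; the function names and the weighting parameter `lam` are hypothetical, not Confounded's actual API:

```python
import numpy as np

def reconstruction_loss(x, x_hat):
    # Penalize changing the expression values more than necessary.
    return np.mean((x - x_hat) ** 2)

def discriminator_loss(batch_probs, batch_labels):
    # Cross-entropy of a discriminator trying to predict which batch
    # (run, dataset, cancer type, ...) each sample came from.
    eps = 1e-9
    return -np.mean(np.log(batch_probs[np.arange(len(batch_labels)), batch_labels] + eps))

def autoencoder_objective(x, x_hat, batch_probs, batch_labels, lam=1.0):
    # The autoencoder minimizes reconstruction error while *maximizing* the
    # discriminator's error, so batch identity becomes unrecoverable from its output.
    return reconstruction_loss(x, x_hat) - lam * discriminator_loss(batch_probs, batch_labels)

# With perfect reconstruction and a chance-level (50/50) discriminator over
# two batches, the objective reduces to -log 2.
x = np.zeros((2, 3))
probs = np.full((2, 2), 0.5)
labels = np.array([0, 1])
assert np.isclose(autoencoder_objective(x, x, probs, labels), -np.log(2), atol=1e-6)
```

In training, the discriminator and the autoencoder are updated in alternation, each against the other's current state, which is what lets the model remove nonlinear, higher-order batch signal rather than only linear patterns.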
The development of data-driven approaches, such as deep learning, has led to the emergence of systems that achieve human-like performance in a wide variety of tasks. For robotic tasks, deep data-driven models make it possible to create adaptive systems without having to program them explicitly. Such adaptive systems are needed in situations where changes in the task and environment cannot be foreseen.
Convolutional neural networks (CNNs) have become the standard way to process visual data in robotics. End-to-end neural network models that cover the entire control task can perform various complex tasks with little feature engineering. However, the adaptivity of these systems goes hand in hand with the level of variation in the training data, and training end-to-end deep robotic systems requires large amounts of domain-, task-, and hardware-specific data, which are often costly to obtain.
In this work, we propose to tackle this issue by employing a deep neural network with a modular architecture, consisting of separate perception, policy, and trajectory parts. Each part of the system is trained fully on synthetic data or in simulation. The data is exchanged between parts of the system as low-dimensional representations of affordances and trajectories. The performance is then evaluated in a zero-shot transfer scenario using the Franka Panda robotic arm. Results demonstrate that a low-dimensional representation of scene affordances extracted from an RGB image is sufficient to successfully train manipulator policies.
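The modular split described above means only a small affordance vector crosses the boundary between the separately trained parts. A minimal numpy sketch of that interface, with stand-in functions and hypothetical dimensions in place of the trained networks:

```python
import numpy as np

rng = np.random.default_rng(1)

AFFORDANCE_DIM = 8   # hypothetical size of the low-dimensional affordance code
TRAJ_PARAMS = 5      # hypothetical number of trajectory parameters

def perception(rgb_image):
    # Stand-in for a CNN trained purely on synthetic images: maps an RGB
    # observation to a low-dimensional affordance representation.
    return rng.normal(size=AFFORDANCE_DIM)

def policy(affordance):
    # Stand-in for a policy trained in simulation: maps affordances to
    # trajectory parameters executed on the real arm (zero-shot transfer).
    W = np.ones((AFFORDANCE_DIM, TRAJ_PARAMS))
    return affordance @ W

image = rng.random((64, 64, 3))
affordance = perception(image)   # only this small vector crosses the module boundary
trajectory = policy(affordance)

assert affordance.shape == (AFFORDANCE_DIM,)
assert trajectory.shape == (TRAJ_PARAMS,)
```

Because each module sees only the low-dimensional interface, the perception part can be retrained for a new camera or scene without touching the policy, which is what makes the per-module synthetic training tractable.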
Over the past decade, bottleneck features within an i-Vector framework have been used for state-of-the-art language/dialect identification (LID/DID). However, traditional bottleneck feature extraction requires additional transcribed speech. To address this limitation, two types of unsupervised deep learning methods are introduced in this study. First, an unsupervised bottleneck feature extraction approach is proposed, derived from the traditional bottleneck structure but trained with estimated phonetic labels. Second, based on generative autoencoder modeling, two latent-variable learning algorithms previously considered for image processing/reconstruction are introduced for speech feature processing: a variational autoencoder and an adversarial autoencoder, each utilized at an alternative stage of speech processing. To demonstrate the effectiveness of the proposed methods, three corpora are evaluated: 1) a four-dialect Chinese dataset, 2) a five-dialect Arabic corpus, and 3) the multi-genre broadcast challenge corpus (MGB-3) for Arabic DID. The proposed features consistently outperform traditional MFCC acoustic features across all three corpora. Taken collectively, the proposed features achieve up to a relative +58% improvement in Cavg for LID/DID without the need for any secondary speech corpora.
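The idea of a bottleneck feature extractor is simply a network with a narrow middle layer whose activations replace the raw acoustic features. A forward-pass sketch in numpy, with hypothetical layer sizes and untrained random weights (in the proposed approach the network would be trained against estimated, not transcribed, phonetic labels):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sizes: wide input (e.g. a 39-dim MFCC-like frame), narrow bottleneck.
in_dim, hid, bottleneck = 39, 64, 13

W1 = rng.normal(scale=0.1, size=(in_dim, hid))
W2 = rng.normal(scale=0.1, size=(hid, bottleneck))

def bottleneck_features(frames):
    # Forward pass up to the narrow layer; its activations are the features
    # fed to the downstream i-Vector LID/DID system. The output layers that
    # predict (estimated) phonetic labels during training are omitted here.
    h = np.tanh(frames @ W1)
    return np.tanh(h @ W2)

frames = rng.normal(size=(100, in_dim))   # 100 frames of an utterance
feats = bottleneck_features(frames)
assert feats.shape == (100, bottleneck)
```

The bottleneck forces the network to compress each frame, so the 13-dimensional activations carry phonetically discriminative structure rather than raw spectral detail.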
Deep learning is usually applied to static datasets. When it is used for classification on data streams, non-stationarity is not easy to take into account. This thesis presents work in progress on a new method for online deep classification learning in data streams with slow or moderate drift, highly relevant for the application domain of malware detection. The method uses a combination of a multilayer perceptron and a variational autoencoder to achieve constant memory consumption by encoding past data into a generative model. This can make online learning of neural networks more accessible for independent adaptive systems with limited memory. First results on real-world malware stream data are presented, and they look promising.
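The constant-memory trick above is generative replay: instead of storing the stream, a generative model summarizes the past and is sampled to produce pseudo-data that is mixed into each training batch. A numpy sketch under a simplifying assumption, using a running diagonal Gaussian as a stand-in for the VAE (the class and function names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)

class GaussianReplay:
    # Stand-in for the VAE: past data are summarized by a diagonal Gaussian,
    # so memory use stays constant regardless of stream length.
    def __init__(self, dim):
        self.n = 0
        self.s = np.zeros(dim)    # running sum
        self.ss = np.zeros(dim)   # running sum of squares

    def update(self, batch):
        self.n += len(batch)
        self.s += batch.sum(axis=0)
        self.ss += (batch ** 2).sum(axis=0)

    def sample(self, k):
        # Generate k pseudo-samples resembling the data seen so far.
        mu = self.s / self.n
        var = np.maximum(self.ss / self.n - mu ** 2, 1e-9)
        return mu + np.sqrt(var) * rng.normal(size=(k, len(self.s)))

def training_batch(new_batch, replay, k):
    # Mix fresh stream data with generated pseudo-samples of the past,
    # so the online classifier does not forget earlier concepts.
    return np.vstack([new_batch, replay.sample(k)])

replay = GaussianReplay(3)
replay.update(rng.normal(size=(50, 3)))
mixed = training_batch(rng.normal(size=(10, 3)), replay, k=20)
assert mixed.shape == (30, 3)
```

A VAE plays the same role as the Gaussian here but can represent multimodal past data; under slow or moderate drift, periodically refreshing the generator keeps the replayed distribution close to the recent past.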
We present SAGNet, a structure-aware generative model for 3D shapes. Given a set of segmented objects of a certain class, the geometry of their parts and the pairwise relationships between them (the structure) are jointly learned and embedded in a latent space by an autoencoder. The encoder intertwines the geometry and structure features into a single latent code, while the decoder disentangles the features and reconstructs the geometry and structure of the 3D model. Our autoencoder consists of two branches, one for the structure and one for the geometry. The key idea is that during the analysis, the two branches exchange information between them, thereby learning the dependencies between structure and geometry and encoding two augmented features, which are then fused into a single latent code. This explicit intertwining of information enables separately controlling the geometry and the structure of the generated models. We evaluate the performance of our method and conduct an ablation study. We explicitly show that encoding of shapes accounts for both similarities in structure and geometry. A variety of quality results generated by SAGNet are presented.
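The exchange-then-fuse step in the two-branch encoder can be sketched with plain linear maps: each branch augments its own feature with a projection of the other branch's feature before the two are fused into one latent code. All dimensions and weight matrices below are illustrative stand-ins, not SAGNet's trained layers:

```python
import numpy as np

rng = np.random.default_rng(4)

g_dim = s_dim = 6    # hypothetical geometry / structure feature sizes
fused_dim = 8        # hypothetical size of the single latent code

W_gs = rng.normal(scale=0.1, size=(g_dim, s_dim))          # geometry -> structure branch
W_sg = rng.normal(scale=0.1, size=(s_dim, g_dim))          # structure -> geometry branch
W_fuse = rng.normal(scale=0.1, size=(g_dim + s_dim, fused_dim))

def exchange_and_fuse(geometry_feat, structure_feat):
    # Each branch exchanges information with the other, producing two
    # augmented features that encode geometry/structure dependencies...
    g_aug = geometry_feat + structure_feat @ W_sg
    s_aug = structure_feat + geometry_feat @ W_gs
    # ...which are then fused into a single latent code.
    return np.concatenate([g_aug, s_aug]) @ W_fuse

z = exchange_and_fuse(rng.normal(size=g_dim), rng.normal(size=s_dim))
assert z.shape == (fused_dim,)
```

Because the dependencies are baked into the augmented features before fusion, the decoder can later disentangle the code back into geometry and structure, which is what allows them to be controlled separately at generation time.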
We introduce SDM-NET, a deep generative neural network which produces structured deformable meshes. Specifically, the network is trained to generate a spatial arrangement of closed, deformable mesh parts, which respects the global part structure of a shape collection, e.g., chairs, airplanes, etc. Our key observation is that while the overall structure of a 3D shape can be complex, the shape can usually be decomposed into a set of parts, each homeomorphic to a box, and the finer-scale geometry of the part can be recovered by deforming the box. The architecture of SDM-NET is that of a two-level variational autoencoder (VAE). At the part level, a PartVAE learns a deformable model of part geometries. At the structural level, we train a Structured Parts VAE (SP-VAE), which jointly learns the part structure of a shape collection and the part geometries, ensuring the coherence between global shape structure and surface details. Through extensive experiments and comparisons with the state-of-the-art deep generative models of shapes, we demonstrate the superiority of SDM-NET in generating meshes with visual quality, flexible topology, and meaningful structures, benefiting shape interpolation and other subsequent modeling tasks.
Background: We examine the problem of clustering biomolecular simulations using deep learning techniques. Since biomolecular simulation datasets are inherently high dimensional, it is often necessary to build low dimensional representations that can be used to extract quantitative insights into the atomistic mechanisms that underlie complex biological phenomena. We use a convolutional variational autoencoder (CVAE) to learn low dimensional, biophysically relevant latent features from long time-scale protein folding simulations in an unsupervised manner. We demonstrate our approach on three model protein folding systems, namely the Fs-peptide (14 μs aggregate sampling), the villin headpiece (single trajectory of 125 μs) and the β-β-α (BBA) protein (223 + 102 μs sampling across two independent trajectories). In these systems, we show that the learned CVAE latent features correspond to distinct conformational substates along the protein folding pathways. The CVAE model predicts, on average, nearly 89% of all contacts within the folding trajectories correctly, while being able to extract folded, unfolded and potentially misfolded states in an unsupervised manner. Further, the CVAE model can be used to learn latent features of protein folding that can be applied to other independent trajectories, making it particularly attractive for identifying intrinsic features that correspond to conformational substates sharing similar structural characteristics. Taken together, we show that the CVAE model can quantitatively describe complex biophysical processes such as protein folding.
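The "~89% of contacts predicted correctly" figure corresponds to binarizing the CVAE's reconstructed contact map and comparing it entry-wise with the ground truth. A minimal sketch of that metric in numpy (the function name and the 0.5 threshold are illustrative assumptions, not the paper's exact evaluation code):

```python
import numpy as np

def contact_accuracy(true_map, reconstructed, threshold=0.5):
    # Binarize the CVAE reconstruction and report the fraction of
    # contact-map entries (contact / no contact) it gets right.
    predicted = reconstructed >= threshold
    return np.mean(predicted == true_map.astype(bool))

# Tiny worked example: every entry is on the correct side of the threshold.
true_map = np.array([[1, 0],
                     [0, 1]])
recon = np.array([[0.9, 0.2],
                  [0.4, 0.7]])
assert contact_accuracy(true_map, recon) == 1.0
```

In practice this is averaged over all frames of a trajectory; because the latent space is learned only from reconstruction, clusters of frames with similar codes correspond to conformational substates without any supervision.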
We present a generative neural network that enables us to generate plausible 3D indoor scenes in large quantities and varieties, easily and highly efficiently. Our key observation is that indoor scene structures are inherently hierarchical. Hence, our network is not convolutional; it is a recursive neural network, or RvNN. Using a dataset of annotated scene hierarchies, we train a variational recursive autoencoder, or RvNN-VAE, which performs scene object grouping during its encoding phase and scene generation during decoding. Specifically, a set of encoders are recursively applied to group 3D objects based on support, surround, and co-occurrence relations in a scene, encoding information about objects' spatial properties, semantics, and relative positioning with respect to other objects in the hierarchy. By training a variational autoencoder (VAE), the resulting fixed-length codes roughly follow a Gaussian distribution. A novel 3D scene can be generated hierarchically by the decoder from a randomly sampled code from the learned distribution. We coin our method GRAINS, for Generative Recursive autoencoders for INdoor Scenes. We demonstrate the capability of GRAINS to generate plausible and diverse 3D indoor scenes and compare with existing methods for 3D scene synthesis. We show applications of GRAINS including 3D scene modeling from 2D layouts, scene editing, and semantic scene segmentation via PointNet whose performance is boosted by the large quantity and variety of 3D scenes generated by our method.
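The recursive encoding described above folds a scene hierarchy bottom-up: one shared merger repeatedly maps two child codes to a parent code of the same size, so any hierarchy collapses to a single fixed-length root code. A numpy sketch with illustrative dimensions and a random, untrained merge weight (GRAINS additionally distinguishes support/surround/co-occurrence mergers, which this sketch collapses into one):

```python
import numpy as np

rng = np.random.default_rng(5)

code_dim = 4
W_merge = rng.normal(scale=0.1, size=(2 * code_dim, code_dim))

def merge(a, b):
    # One recursive encoder step: two child codes -> one parent code
    # of the same size, so the operation can be applied at every level.
    return np.tanh(np.concatenate([a, b]) @ W_merge)

def encode_hierarchy(node):
    # A node is either a leaf object code or a (left, right) grouping.
    if isinstance(node, tuple):
        return merge(encode_hierarchy(node[0]), encode_hierarchy(node[1]))
    return node

# Hypothetical bedroom: a bed grouped with a nightstand, next to a wardrobe.
bed, nightstand, wardrobe = (rng.normal(size=code_dim) for _ in range(3))
root = encode_hierarchy(((bed, nightstand), wardrobe))
assert root.shape == (code_dim,)
```

Decoding runs the same recursion in reverse, splitting a sampled root code into child codes until leaf object codes are reached, which is how a whole scene is generated from one Gaussian sample.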