ISBN (digital): 9783030012649
ISBN (print): 9783030012649; 9783030012632
Video summarization is a challenging, under-constrained problem because the summary of a video depends strongly on users' subjective understanding. Data-driven approaches such as deep neural networks can deal with the ambiguity inherent in this task to some extent, but acquiring temporal annotations for a large-scale video dataset is extremely expensive. To leverage plentiful web-crawled videos to improve video summarization, we present a generative modelling framework that learns latent semantic video representations to bridge benchmark data and web data. Specifically, our framework couples two components: a variational autoencoder for learning latent semantics from web videos, and an encoder-attention-decoder for saliency estimation of the raw video and summary generation. We present a loss term that learns the semantic matching between generated summaries and web videos, and formulate the overall framework as a unified conditional variational encoder-decoder, called the variational encoder-summarizer-decoder (VESD). Experiments on the challenging CoSum and TVSum datasets demonstrate the superior performance of VESD over existing state-of-the-art methods. The source code of this work can be found at https://***/cssjcai/vesd.
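Both components of VESD sample latent codes during training, which any VAE makes differentiable via the reparameterization trick. A minimal numpy sketch of that trick (illustrative names, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I), keeping the sample
    differentiable with respect to the encoder outputs mu and log_var."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

mu = np.array([1.0, -2.0])
log_var = np.array([-100.0, -100.0])  # near-zero variance: z collapses to mu
z = reparameterize(mu, log_var, rng)
```

With non-degenerate variances the same call yields stochastic codes whose randomness is isolated in `eps`, which is what lets gradients flow through the sampling step.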
Authors: Wang, Heng; Qin, Zengchang; Wan, Tao
Affiliations: Beihang Univ, Sch ASEE, Intelligent Comp & Machine Learning Lab, Beijing 100191, Peoples R China; Beihang Univ, Beijing Adv Innovat Ctr Biomed Engn, Sch Biol Sci & Med Engn, Beijing 100191, Peoples R China
ISBN (print): 9783319930374; 9783319930367
In this paper, we propose a model that uses a generative adversarial net (GAN) to generate realistic text. Instead of a standard GAN, we combine a variational autoencoder (VAE) with the adversarial setup. The high-level latent random variables help the model learn the data distribution and mitigate the tendency of GANs to emit very similar samples. We propose the VGAN model, in which the generative model is composed of a recurrent neural network and a VAE, and the discriminative model is a convolutional neural network. We train the model via policy gradient. We apply the proposed model to the task of text generation and compare it to other recent neural-network-based models, such as a recurrent neural network language model and Seq-GAN. We evaluate performance by negative log-likelihood and BLEU score. Experiments on three benchmark datasets show that our model outperforms the previous models.
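The BLEU metric used above combines clipped n-gram precisions with a brevity penalty. A self-contained sentence-level sketch (the paper's exact BLEU configuration is not stated; `max_n=2` here is an illustrative choice):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=2):
    """Sentence-level BLEU: geometric mean of clipped n-gram precisions,
    scaled by a brevity penalty for short candidates."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        clipped = sum(min(c, ref[g]) for g, c in cand.items())
        precisions.append(clipped / max(1, sum(cand.values())))
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_avg)

score = bleu("the cat sat on the mat".split(), "the cat sat on the mat".split())
```

Production evaluations typically use corpus-level BLEU with smoothing over multiple references; this per-sentence form just makes the clipping and brevity-penalty mechanics explicit.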
ISBN (print): 9781510872219
We study cross-lingual voice conversion with non-parallel speech corpora in a one-shot learning setting. Most prior work requires either parallel speech corpora or a sufficient amount of training data from the target speaker. In contrast, we convert arbitrary sentences from an arbitrary source speaker into the target speaker's voice given only one training utterance from the target speaker. To achieve this, we formulate the problem as learning disentangled speaker-specific and context-specific representations, following the idea of [1], which uses a factorized hierarchical variational autoencoder (FHVAE). After training the FHVAE on multi-speaker data, given utterances from arbitrary source and target speakers, we estimate their latent representations and reconstruct the desired utterance in the target speaker's voice. We investigate the effectiveness of the approach through voice conversion experiments with varying numbers of training utterances; it achieves reasonable performance with even a single training utterance. We also examine the speech representation and show that the World vocoder outperforms the short-time Fourier transform (STFT) used in [1]. Finally, in subjective tests of both same-language and cross-lingual voice conversion, our approach achieved significantly better or comparable results relative to the VAE-STFT and GMM baselines in speech quality and similarity.
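The conversion step can be pictured as recombining latents: keep the source utterance's segment-level (content) codes and swap in the target's sequence-level (speaker) code. A deliberately toy numpy illustration, in which the per-utterance mean stands in for the FHVAE's sequential latent (this is not the actual FHVAE inference):

```python
import numpy as np

rng = np.random.default_rng(1)

def encode(frames):
    # Toy disentanglement: the utterance mean plays the role of the
    # sequence-level (speaker) latent, the residual the segment-level
    # (content) latent.
    z_speaker = frames.mean(axis=0)
    z_content = frames - z_speaker
    return z_speaker, z_content

def decode(z_speaker, z_content):
    return z_content + z_speaker

content = rng.standard_normal((20, 4))
source_utt = content + 1.0                       # source speaker: constant offset 1
target_utt = rng.standard_normal((20, 4)) + 5.0  # target speaker: constant offset 5

z_spk_src, z_cnt_src = encode(source_utt)
z_spk_tgt, _ = encode(target_utt)
converted = decode(z_spk_tgt, z_cnt_src)         # source content, target "voice"
```

In the real system both latents are posterior means from trained inference networks and the decoder is a neural network over vocoder features, but the recombination logic is the same.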
ISBN (print): 9781538643341
In this paper, we explore the use of a factorized hierarchical variational autoencoder (FHVAE) model to learn an unsupervised latent representation for dialect identification (DID). An FHVAE can learn a latent space that separates the more static attributes within an utterance from the more dynamic attributes by encoding them into two different sets of latent variables. Useful factors for dialect identification, such as phonetic or linguistic content, are encoded by a segmental latent variable, while irrelevant factors that are relatively constant within a sequence, such as channel or speaker information, are encoded by a sequential latent variable. This disentanglement makes the segmental latent variable less susceptible to channel and speaker variation, and thus reduces degradation from channel-domain mismatch. We demonstrate that on fully supervised DID tasks, an end-to-end model trained on features extracted from the FHVAE achieves the best performance, compared to the same model trained on conventional acoustic features and to an i-vector based system. Moreover, we show that the proposed approach can leverage a large amount of unlabeled data for FHVAE training to learn domain-invariant features for DID, significantly improving performance in a low-resource condition where labels for the in-domain data are not available.
In this paper, we present an incremental learning framework for efficient and accurate facial performance tracking. Our approach alternates between the modeling step, which takes tracked meshes and texture maps to train our deep learning-based statistical model, and the tracking step, which takes the geometry and texture predictions our model infers from measured images and optimizes the predicted geometry by minimizing image, geometry, and facial landmark errors. Our Geo-Tex VAE model extends the convolutional variational autoencoder to face tracking, and jointly learns and represents deformations and variations in geometry and texture from tracked meshes and texture maps. To accurately model variations in facial geometry and texture, we introduce a decomposition layer in the Geo-Tex VAE architecture that decomposes the facial deformation into global and local components. We train the global deformation with a fully-connected network and the local deformations with convolutional layers. Although this model runs on each frame independently, thereby enabling a high degree of parallelization, we validate that our framework achieves sub-millimeter accuracy on synthetic data and outperforms existing methods. We also qualitatively demonstrate high-fidelity, long-duration facial performance tracking on several actors.
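The global/local split of a deformation has a simple geometric analogue: fit the best global affine map from rest to deformed vertices and treat the residual as the local part. The paper's decomposition layer is learned (fully-connected plus convolutional branches), so the least-squares version below is only an illustrative stand-in:

```python
import numpy as np

rng = np.random.default_rng(4)

def decompose_deformation(rest, deformed):
    """Split vertex positions into a global affine component (least-squares
    fit) and a local residual component."""
    X = np.hstack([rest, np.ones((len(rest), 1))])    # homogeneous rest coords
    A, *_ = np.linalg.lstsq(X, deformed, rcond=None)  # best-fit affine map
    global_part = X @ A
    local_part = deformed - global_part
    return global_part, local_part

rest = rng.standard_normal((100, 3))
true_affine = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
deformed = rest @ true_affine + 0.5   # a purely global (affine) deformation
g_part, l_part = decompose_deformation(rest, deformed)
```

For a purely affine deformation the local residual vanishes; real facial expressions leave a nonzero residual, which is what the convolutional branch is meant to capture.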
Deep generative adversarial networks (GANs) are an emerging technology in drug discovery and biomarker development. In our recent work, we demonstrated a proof of concept of implementing a deep generative adversarial autoencoder (AAE) to identify new molecular fingerprints with predefined anticancer properties. Another popular generative model is the variational autoencoder (VAE), which is also based on deep neural architectures. In this work, we developed an advanced AAE model for molecular feature extraction and demonstrated its advantages over the VAE in terms of (a) adjustability in generating molecular fingerprints; (b) capacity to process very large molecular data sets; and (c) efficiency in unsupervised pretraining for regression models. Our results suggest that the proposed AAE model significantly enhances the capacity and efficiency of developing new molecules with specific anticancer properties using deep generative models.
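The structural difference between the two models lies in how the latent distribution is regularised: a VAE uses a closed-form KL term, while an AAE trains a discriminator to make the aggregated posterior indistinguishable from the prior. A minimal numpy sketch of the AAE latent losses, using a fixed linear discriminator as a placeholder (the paper's actual architecture is not specified here):

```python
import numpy as np

rng = np.random.default_rng(5)

def bce(logits, label):
    """Binary cross-entropy of a sigmoid discriminator vs. a constant label."""
    p = 1.0 / (1.0 + np.exp(-logits))
    return -np.mean(label * np.log(p + 1e-9) + (1 - label) * np.log(1 - p + 1e-9))

def aae_latent_losses(z_posterior, z_prior, w, b):
    # Discriminator: tell prior samples (label 1) from encoder samples (label 0).
    d_loss = bce(z_prior @ w + b, 1.0) + bce(z_posterior @ w + b, 0.0)
    # Encoder regulariser: make posterior samples fool the discriminator.
    g_loss = bce(z_posterior @ w + b, 1.0)
    return d_loss, g_loss

z_prior = rng.standard_normal((64, 2))       # samples from p(z) = N(0, I)
z_post = rng.standard_normal((64, 2)) + 3.0  # aggregated posterior, off-prior
d_loss, g_loss = aae_latent_losses(z_post, z_prior, np.array([0.5, 0.5]), 0.0)
```

In training, the two losses are minimised alternately with respect to discriminator and encoder parameters; that adversarial matching is what gives the AAE its extra adjustability over a fixed KL penalty.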
Electrical distribution networks are constantly ageing worldwide, so the probability of cable faults increases over time. Fast recovery of damaged networks is of vital importance, and quick, automatic identification of the failure source may help to promptly restore the functionality of the network. The scenario we consider is a vast number of recording devices spread across a network, constantly monitoring low-voltage cables. When the current in a cable reaches a very high value, the data is sent to a central server, which analyses it with a variant of a variational autoencoder (VAE), a deep neural network. This VAE has been trained on historical data from several hundred recorded faults, of which only a handful have been labelled by on-site analysis. The training data is simply the recorded voltage and current levels after a simple pre-processing step. The final goal is to let the network distinguish whether the fault occurred at a point along the cable, at a joint, or at the pot-end located at the termination. A preliminary evaluation of its ability to generalise to the non-labelled samples shows encouraging results.
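With only a handful of labelled faults, one simple way to exploit a trained VAE is to classify unlabelled recordings by proximity in latent space. A toy numpy sketch (the authors' actual classifier is not specified; the three clusters, labels, and 2-D latent space here are synthetic):

```python
import numpy as np

rng = np.random.default_rng(3)

# Pretend latent codes from a trained VAE: three fault types cluster apart.
centres = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
z = np.vstack([c + 0.3 * rng.standard_normal((10, 2)) for c in centres])

# One labelled example per class, mirroring the scarce on-site labels.
labelled_z = centres
fault_names = ["cable", "joint", "pot-end"]

def classify(zi):
    """Assign the label of the nearest labelled latent code: a hedged
    stand-in for whatever classifier is actually run on the VAE latents."""
    return fault_names[int(np.argmin(np.linalg.norm(labelled_z - zi, axis=1)))]

preds = [classify(zi) for zi in z]
```

The point of the sketch is only that a well-shaped latent space lets a near-trivial classifier separate the three fault locations from very few labels.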
ISBN (print): 9781510833135
Latent generative models can learn higher-level underlying factors from complex data in an unsupervised manner. Such models can be used in a wide range of speech processing applications, including synthesis, transformation and classification. While there have been many advances in this field in recent years, the application of the resulting models to speech processing tasks is generally not explicitly considered. In this paper we apply the variational autoencoder (VAE) to the task of modeling frame-wise spectral envelopes. The VAE model has many attractive properties, such as continuous latent variables, a prior probability over these latent variables, a tractable lower bound on the marginal log likelihood, both generative and recognition models, and end-to-end training of deep models. We consider different aspects of training such models for speech data and compare them to more conventional models such as the restricted Boltzmann machine (RBM). While evaluating generative models is difficult, we try to obtain a balanced picture by considering both reconstruction error and performance on a series of modeling and transformation tasks, to get an idea of the quality of the learned features.
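The "tractable lower bound on the marginal log likelihood" can be checked exactly on a toy linear-Gaussian model, where both the ELBO and the true marginal have closed forms. A sketch (the model is chosen purely for tractability, not taken from the paper):

```python
import math

def elbo(x, mu_q, var_q):
    """ELBO for the toy model p(z) = N(0,1), p(x|z) = N(z,1), with
    approximate posterior q(z|x) = N(mu_q, var_q); both terms are exact."""
    expected_loglik = -0.5 * math.log(2 * math.pi) - 0.5 * ((x - mu_q) ** 2 + var_q)
    kl = 0.5 * (var_q + mu_q ** 2 - 1.0 - math.log(var_q))
    return expected_loglik - kl

def true_log_marginal(x):
    # Marginalising z gives x ~ N(0, 2).
    return -0.5 * math.log(2 * math.pi * 2.0) - x * x / 4.0

x = 1.3
tight = elbo(x, x / 2.0, 0.5)   # exact posterior N(x/2, 1/2): bound is tight
loose = elbo(x, 0.0, 1.0)       # prior used as posterior: strictly looser
```

With the exact posterior the bound touches the true log marginal; any other choice of `q` leaves a gap equal to KL(q || p(z|x)), which is what the recognition network is trained to shrink.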
ISBN (print): 9781538646595
Modeling the speech generation process can provide flexible and interpretable ways to generate intended synthetic speech. In this paper, we present a deep generative model of fundamental frequency (F_0) contours of normal speech and singing voices. The proposed model 1) accurately decomposes an F_0 contour into the sum of the phrase and accent components of the Fujisaki model, a mathematical model describing the control mechanism of vocal fold vibration, without an iterative algorithm, and 2) can represent and generate F_0 contours of both normal speech and singing voices reasonably well.
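The phrase/accent decomposition follows the standard Fujisaki formulation: in the log-F_0 domain, the contour is a baseline plus second-order filter responses to phrase and accent commands. A numpy sketch with illustrative command timings, amplitudes, and filter constants (not values from the paper):

```python
import numpy as np

def phrase_component(t, alpha=3.0):
    """Phrase command response Gp(t) = alpha^2 * t * exp(-alpha*t), t >= 0."""
    return np.where(t >= 0, alpha**2 * t * np.exp(-alpha * np.clip(t, 0, None)), 0.0)

def accent_component(t, beta=20.0, gamma=0.9):
    """Accent command response Ga(t) = min(1 - (1 + beta*t) exp(-beta*t), gamma)."""
    g = 1.0 - (1.0 + beta * t) * np.exp(-beta * np.clip(t, 0, None))
    return np.where(t >= 0, np.minimum(g, gamma), 0.0)

def log_f0(t, fb=120.0, phrases=((0.0, 0.5),), accents=((0.3, 0.8, 0.4),)):
    """ln F0(t) = ln Fb + sum_i Ap_i * Gp(t - T0_i)
                        + sum_j Aa_j * [Ga(t - T1_j) - Ga(t - T2_j)]"""
    y = np.full_like(t, np.log(fb))
    for t0, ap in phrases:
        y += ap * phrase_component(t - t0)
    for t1, t2, aa in accents:
        y += aa * (accent_component(t - t1) - accent_component(t - t2))
    return y

t = np.linspace(0, 2, 200)
f0 = np.exp(log_f0(t))
```

The proposed generative model infers the command amplitudes and timings from an observed contour; this forward sketch only shows what the decomposition it targets looks like.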
Modeling musical timbre is critical for various music information retrieval (MIR) tasks. This work addresses the classification of playing techniques, which involves extremely subtle variations in timbre between categories. A deep collaborative learning framework is proposed to represent music with greater discriminative power than previously achieved. First, a novel variational autoencoder (VAE) is developed to eliminate the variation of acoustic features within a class. Second, a Gaussian process classifier is jointly learned to distinguish the timbre variations between classes, which increases the discriminative power of the learned representations. We derive a new lower bound that guides the VAE-based representation learning. Experiments were conducted on a database of seven classes of guitar playing techniques; the results demonstrate that the proposed method outperforms the baselines in terms of F1-score and accuracy.
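The F1-score reported above is the harmonic mean of precision and recall per class; for a seven-class task it is commonly averaged over classes (the paper's exact averaging convention is not stated, so macro averaging and the class names below are assumptions). A minimal sketch:

```python
def f1_score(y_true, y_pred, positive):
    """Per-class F1: harmonic mean of precision and recall for one class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    classes = sorted(set(y_true))
    return sum(f1_score(y_true, y_pred, c) for c in classes) / len(classes)

# Hypothetical labels for two guitar techniques, for illustration only.
score = macro_f1(["bend", "bend", "slide", "slide"],
                 ["bend", "slide", "slide", "slide"])
```

Macro averaging weights all seven technique classes equally, which matters when the database is imbalanced across techniques.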