ISBN: 9780738133669 (print)
When designing variational autoencoders (VAEs) or other types of latent space models, the dimensionality of the latent space is typically defined upfront. As a result, the number of dimensions may be under- or overprovisioned for the application at hand. When the dimensionality is not predefined, it is usually determined using time- and resource-consuming cross-validation. For these reasons we have developed a technique to shrink the latent space dimensionality of VAEs automatically and on the fly during training using Generalized ELBO with Constrained Optimization (GECO) and the L0-Augment-REINFORCE-Merge (L0-ARM) gradient estimator. The GECO optimizer ensures that a predefined upper bound on the reconstruction error is not violated. This paper presents the algorithmic details of our method along with experimental results on five different datasets. We find that our training procedure is stable and that the latent space can be pruned effectively without violating the GECO constraints.
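The abstract does not give the update rule, so the following is only a minimal sketch of how a GECO-style constraint is commonly handled: the KL term is minimised subject to the reconstruction error staying below a tolerance, and a Lagrange multiplier is raised while the bound is violated and lowered once it is satisfied. The helper names `geco_step` and `geco_loss`, the moving-average factor `alpha`, and the step size `lr` are assumptions for illustration, not the authors' code.

```python
import math


def geco_step(lam, c_ma, recon_error, tolerance, alpha=0.99, lr=0.01):
    """One multiplier update (hypothetical helper, assumed hyperparameters).

    The constraint is recon_error - tolerance; a positive value means the
    upper bound on the reconstruction error is currently violated.
    """
    constraint = recon_error - tolerance
    # An exponential moving average smooths the noisy per-batch constraint.
    if c_ma is None:
        c_ma = constraint
    else:
        c_ma = alpha * c_ma + (1.0 - alpha) * constraint
    # A multiplicative update keeps the multiplier strictly positive.
    lam = lam * math.exp(lr * c_ma)
    return lam, c_ma


def geco_loss(kl, recon_error, tolerance, lam):
    """Lagrangian: KL term plus the weighted constraint."""
    return kl + lam * (recon_error - tolerance)


if __name__ == "__main__":
    lam, c_ma = 1.0, None
    for err in [0.5, 0.4, 0.3]:  # made-up errors above a tolerance of 0.1
        lam, c_ma = geco_step(lam, c_ma, err, tolerance=0.1)
    print(lam > 1.0)  # the multiplier grows while the bound is violated
```

In such a scheme, pruning latent dimensions would only be rewarded while the multiplier stays small, that is, while the reconstruction bound holds.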
ISBN: 9781665449588 (print)
To enable fishing to be enjoyed indoors, we study a hardware-type fishing simulator that employs a real fishing rod. In this paper, as the first step of our research, we develop a pull force acquisition system and a winding system that consists of a motor, reel, and controller. The pull force acquisition system records the pull force of an actual fish. The time variation of the pull force represents the pull pattern of the fish. We show that the winding system can reproduce a pull pattern similar to the original pull pattern recorded by the pull force acquisition system. Many pull patterns must be acquired to represent the pull pattern specific to a fish species, and obtaining them through fieldwork is inefficient. We therefore use a variational autoencoder (VAE) to generate multiple pull patterns similar to the original pull pattern. Here we assume that a species-specific pull pattern maintains the rough shape of its movements. Simulation results show that the VAE generates multiple pull patterns that roughly maintain the original shape.
ISBN: 9798350344868; 9798350344851 (print)
The use of AI has led to the era of pervasive intelligence, marked by a proliferation of smart devices in our daily lives. Federated Learning (FL) enables machine learning at the edge without having to share user-specific private data with an untrusted third party. Conventional FL techniques are supervised learning methods, where a fundamental challenge is to ensure that data is reliably annotated at the edge. Another approach is to obtain rich and informative representations of unlabeled data that are suitable for downstream tasks. We propose a novel IS-FedVAE framework where we use importance sampling to federate a global VAE framework, allowing us to learn the global latent space distribution using local latent space distributions at the edge. We evaluate the representation in a stand-alone manner using a linear probe, where we freeze the backbone representation and measure the accuracy of a downstream classifier. Furthermore, we demonstrate that IS-FedVAE outperforms state-of-the-art unsupervised FL baselines. We also show that IS-FedVAE is insensitive to varying levels of heterogeneity, scalable to varying numbers of clients, and robust to changes in the number of local epochs.
ISBN: 9798350397444 (print)
Video Salient Document Detection (VSDD) is an essential task of practical computer vision, which aims to highlight visually salient document regions in video frames. Previous techniques for VSDD focus on learning features without considering the cooperation among and across the appearance and motion cues and thus fail to perform well in practical scenarios. Moreover, most of the previous techniques demand high computational resources, which limits the usage of such systems in resource-constrained settings. To handle these issues, we propose VS-Net, which captures multi-scale spatiotemporal information with the help of dilated depth-wise separable convolution and Approximation Rank Pooling. VS-Net extracts the key features locally from each frame across embedding sub-spaces and forwards the features between adjacent and parallel nodes, enhancing model performance globally. Our model generates saliency maps considering both the background and foreground simultaneously, making it perform better in challenging scenarios. Extensive experiments conducted on the benchmark MIDV-500 dataset show that the VS-Net model outperforms state-of-the-art approaches in both time and robustness measures.
ISBN: 9781450350228 (print)
As the amount of textual data has been rapidly increasing over the past decade, efficient similarity search methods have become a crucial component of large-scale information retrieval systems. A popular strategy is to represent original data samples by compact binary codes through hashing. A spectrum of machine learning methods has been utilized, but they often lack the expressiveness and modeling flexibility needed to learn effective representations. The recent advances of deep learning in a wide range of applications have demonstrated its capability to learn robust and powerful feature representations for complex data. In particular, deep generative models naturally combine the expressiveness of probabilistic generative models with the high capacity of deep neural networks, which makes them very suitable for text modeling. However, little work has leveraged the recent progress in deep learning for text hashing. In this paper, we propose a series of novel deep document generative models for text hashing. The first proposed model is unsupervised, while the second is supervised and utilizes document labels/tags for hashing. The third model further considers document-specific factors that affect the generation of words. The probabilistic generative formulation of the proposed models provides a principled framework for model extension, uncertainty estimation, simulation, and interpretability. Based on variational inference and reparameterization, the proposed models can be interpreted as encoder-decoder deep neural networks, and thus they are capable of learning complex nonlinear distributed representations of the original documents. We conduct a comprehensive set of experiments on four public testbeds. The experimental results demonstrate the effectiveness of the proposed supervised learning models for text hashing.
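To make the idea of similarity search over compact binary codes concrete, here is a minimal sketch of the retrieval side only. It assumes each document already has a real-valued latent vector, binarizes it with a fixed threshold, and ranks stored codes by Hamming distance. The threshold rule and the helper names `binarize`, `hamming`, and `nearest` are illustrative assumptions, not the models proposed in the paper.

```python
def binarize(vector, threshold=0.0):
    """Map a real-valued code to bits by thresholding (assumed rule)."""
    return [1 if v > threshold else 0 for v in vector]


def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest(query, codes, k=1):
    """Indices of the k stored codes closest to the query in Hamming distance."""
    order = sorted(range(len(codes)), key=lambda i: hamming(query, codes[i]))
    return order[:k]


if __name__ == "__main__":
    stored = [[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 1, 1]]
    query = binarize([0.3, -1.2, 0.0, 2.0])
    print(nearest(query, stored, k=2))
```

Because the codes are short bit strings, the distance computation reduces to counting differing bits, which is what makes hashing attractive for large collections.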
Text-based speech editing systems are developed to enable users to modify speech based on the transcript. Existing state-of-the-art editing systems based on neural networks perform partial inference without exception; that is, they generate only the new words that need to be replaced or inserted. This usually makes the prosody of the edited part inconsistent with the surrounding speech and fails to handle changes in intonation. To address these problems, we propose a cross-utterance conditioned coherent speech editing system, which is the first to perform inference over the entire utterance at inference time. Our proposed system can generate speech by utilizing speaker information, context, acoustic features, and the mel-spectrogram from the original audio. Experiments with subjective and objective metrics demonstrate that our approach outperforms the baseline on various editing operations in terms of naturalness and prosody consistency.
ISBN: 9783030874445; 9783030874438 (print)
A statistical appearance model of blood vessels based on a variational autoencoder (VAE) is well adapted to image intensity variations. However, images reconstructed with such a statistical model may have topological defects, such as the loss of bifurcations and the creation of undesired holes. In order to build a 3D anatomical model of blood vessels, we incorporate a topological prior into the statistical modeling. Qualitative and quantitative results on 2567 real CT volume patches and on 10000 artificial ones show the efficiency of the proposed framework.
ISBN: 9781450380379 (print)
In modern clinical medicine, the electrocardiogram (ECG) is a common diagnostic technique for cardiovascular diseases. The purpose of this paper is to propose a novel model-based clustering approach for analyzing ECG data. Our approach is composed of two modules: representation learning and ECG data clustering. In the representation learning module, a deep generative model referred to as the hyperspherical variational recurrent autoencoder (HVRAE) is developed to extract the representation of observed ECG data, based on the variational autoencoder (VAE) with long short-term memory (LSTM) networks. In the ECG data clustering module, we develop a nonparametric hidden Markov model (NHMM) based on the Dirichlet process, in which the number of hidden states is inferred automatically during the learning process. Moreover, the emission density of each hidden state of our NHMM follows a mixture of von Mises-Fisher (VMF) distributions, which are better suited to modeling ECG representations than other commonly used distributions (such as the Gaussian distribution). To learn the proposed VMF-based NHMM, we develop an effective learning algorithm based on variational Bayes. The merits of our model-based clustering approach for analyzing ECG data are verified through experiments on publicly available ECG data sets.
ISBN: 9798350384567; 9798350384550 (print)
Wafer manufacturing is a complex, expensive, and time-consuming process that involves multiple steps. By closely monitoring the corresponding variations in process parameters, potential anomalies and faults in the process can be detected and identified in a timely manner, effectively improving equipment utilization and product yield. However, the anomaly detection methods currently in use rely on extracting feature values from sensor data and analyzing the statistical information of these features. This approach fails to identify complex anomaly patterns, and constructing meaningful feature values can be challenging. In this paper, we propose an unsupervised anomaly detection solution based on the variational autoencoder (VAE) model. We optimize the training strategy of the VAE model to address the issue of posterior collapse. By reconstructing the sensor data using the VAE model, we can effectively convert anomalous patterns into normal ones. The experimental results confirm the effectiveness of the model across various sensor datasets, demonstrating the capability of the proposed solution to accurately identify abnormal patterns within the sensor data.
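If a model reconstructs anomalous sensor traces as normal-looking ones, the gap between input and reconstruction can serve as an anomaly score. The sketch below illustrates that scoring step only, under stated assumptions: reconstructions are taken as given, the score is the mean squared error, and the threshold is the mean plus `k` standard deviations of errors on normal data. The thresholding rule and all function names are hypothetical; the abstract does not specify how anomalies are flagged.

```python
from statistics import mean, pstdev


def mse(x, x_hat):
    """Mean squared error between a trace and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def fit_threshold(normal_errors, k=3.0):
    """Assumed rule: mean + k * population std of errors on normal data."""
    return mean(normal_errors) + k * pstdev(normal_errors)


def flag_anomalies(errors, threshold):
    """True where the reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]


if __name__ == "__main__":
    # Made-up reconstruction errors from traces assumed to be normal.
    normal_errors = [0.10, 0.12, 0.09, 0.11]
    threshold = fit_threshold(normal_errors)
    new_errors = [0.10, 0.95]
    print(flag_anomalies(new_errors, threshold))
```

A percentile of the normal-data errors is an equally plausible threshold; the choice depends on how many false alarms the process can tolerate.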
ISBN: 9781665468916 (print)
Personalized federated learning (PFL) jointly trains a variety of local models by balancing knowledge sharing across clients against model personalization per client. This paper addresses PFL by explicitly disentangling latent representations into two parts that capture the shared knowledge and the client-specific personalization, which leads to more reliable and effective PFL. The disentanglement is achieved by a novel Federated Dual variational autoencoder (FedDVA), which employs two encoders to infer the two types of representations. FedDVA can produce a better understanding of the trade-off between global knowledge sharing and local personalization in PFL. Moreover, it can be integrated with existing FL methods and turn them into personalized models for heterogeneous downstream tasks. Extensive experiments validate the advantages brought by disentanglement and show that models trained with disentangled representations substantially outperform vanilla methods.