Stacking multiple layers of attention networks can significantly improve a model's performance. However, it also increases the model's time and space complexity and makes it difficult for the model to capture detailed information in the underlying features. We propose a novel sentence matching model (VSCA) with a new attention mechanism based on variational autoencoders (VAE). The mechanism exploits the contextual information in sentences to construct a basic attention feature map, then combines it with a VAE to generate multiple sets of related attention feature maps, which are fused. Furthermore, VSCA introduces a spatial attention mechanism that draws on visual perception to capture multilevel semantic information. Our work consists of two parts. The first part compares the proposed sentence matching model with state-of-the-art pretrained models such as BERT. The second part investigates how VAE and spatial attention mechanisms can be applied in NLP. Experimental results show that VSCA outperforms pretrained models such as BERT on the LCQMC dataset and performs well on the PAWS-X dataset. It captures rich attentional and detailed information with lower time and space complexity. This work provides insights into the application of VAE and spatial attention mechanisms in NLP.
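The abstract describes generating several related attention maps from one basic map with a VAE and then fusing them. The sketch below shows one way that idea could work, using the standard reparameterization trick z = mu + sigma * eps. It treats a hypothetical basic score map as the mean `mu`, draws several noisy score maps, row-normalizes each one with softmax, and fuses them by element-wise averaging. The names `sample_attention_maps` and `fuse_by_mean`, the fixed `log_var`, and mean fusion are all illustrative assumptions, not VSCA's actual architecture.

```python
import math
import random

def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def sample_attention_maps(mu, log_var, n_samples, rng):
    """Draw n_samples attention maps via the reparameterization trick.

    Hypothetical setup: mu is a basic attention score map and log_var holds
    per-entry log-variances (both square lists of lists). Each sample is
    z = mu + exp(0.5 * log_var) * eps with eps ~ N(0, 1), row-softmaxed.
    """
    maps = []
    for _ in range(n_samples):
        sampled = []
        for mu_row, lv_row in zip(mu, log_var):
            z_row = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
                     for m, lv in zip(mu_row, lv_row)]
            sampled.append(softmax(z_row))
        maps.append(sampled)
    return maps

def fuse_by_mean(maps):
    """Fuse several attention maps by element-wise averaging (an assumption)."""
    n = len(maps)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[i][j] for m in maps) / n for j in range(cols)]
            for i in range(rows)]

if __name__ == "__main__":
    rng = random.Random(0)
    mu = [[2.0, 0.5, 0.1], [0.3, 1.5, 0.2], [0.1, 0.4, 2.5]]
    log_var = [[-2.0] * 3 for _ in range(3)]  # small variance around mu
    fused = fuse_by_mean(sample_attention_maps(mu, log_var, 8, rng))
    for row in fused:
        print([round(x, 3) for x in row])
```

Each sampled row is a probability distribution, so their average is one as well. This keeps the fused map directly usable as attention weights.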
Parametric Modeling, Generative Design, and Performance-Based Design have gained increasing attention in the AEC field as ways to create a wide range of design variants while focusing on performance attributes rather than building codes. However, the relationships between design parameters and performance attributes are often very complex, resulting in a highly iterative and unguided process. In this paper, we argue that an inverse formulation, which starts with performance attributes instead of design parameters, enables a more goal-oriented design process. We propose a Deep Conditional Generative Design workflow that takes a set of performance attributes and partially defined design features as input and produces a complete set of design parameters as output. We present a model architecture based on a conditional variational autoencoder, along with different approximate posteriors, and evaluate it on four case studies. Compared with genetic algorithms, our method proves superior when it uses a pre-trained model.
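A conditional VAE is trained by minimizing a reconstruction term plus a KL divergence between the approximate posterior q(z | x, c) and the prior. Here the condition c would hold the performance attributes and partial design features. The sketch below covers only the diagonal-Gaussian case, where the KL term has a closed form. It defaults to a standard-normal prior and also accepts a conditional Gaussian prior. The squared-error reconstruction term and the `beta` weight are assumptions for illustration. The paper compares several approximate posteriors, and this sketch does not reproduce them.

```python
import math

def gaussian_kl(mu_q, log_var_q, mu_p=None, log_var_p=None):
    """Closed-form KL( N(mu_q, diag(exp(log_var_q))) || N(mu_p, diag(exp(log_var_p))) ).

    Uses a standard-normal prior N(0, I) when mu_p / log_var_p are omitted.
    """
    d = len(mu_q)
    mu_p = mu_p if mu_p is not None else [0.0] * d
    log_var_p = log_var_p if log_var_p is not None else [0.0] * d
    kl = 0.0
    for mq, lq, mp, lp in zip(mu_q, log_var_q, mu_p, log_var_p):
        kl += lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0
    return 0.5 * kl

def cvae_loss(x, x_recon, mu_q, log_var_q, beta=1.0):
    """Negative ELBO up to constants and scaling: squared-error
    reconstruction of the design parameters plus beta * KL."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + beta * gaussian_kl(mu_q, log_var_q)

if __name__ == "__main__":
    print(gaussian_kl([1.0], [0.0]))  # KL(N(1,1) || N(0,1)) = 0.5
```

At generation time, a CVAE decodes a latent z drawn from the prior together with the condition c. This is what makes the inverse, attribute-first formulation possible.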
Patterns in charts are interesting visual features or forms. Identifying patterns not only helps analysts understand the 'shape' of the data but also supports better and faster decision-making. Existing solutions for identifying patterns in charts require a large number of labeled data instances, which makes the task intractable without user supervision. In this paper, we propose ChartNavigator, an interactive pattern identification and annotation framework for unlabeled visualization charts. ChartNavigator leverages a novel chart-sensitive deep factor model to map patterns into a low-dimensional factor representation space and supports rich analysis with the derived representations. We design and implement a visual interface that supports efficient identification and annotation of potential patterns in charts. Evaluations on multiple datasets show that our approach outperforms the baseline models in identifying and annotating patterns.
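One kind of analysis that a low-dimensional factor space enables is similarity search. For example, a user could label a few charts and then get suggestions for unlabeled charts whose factor vectors lie close to a labeled one. The sketch below makes that idea concrete with cosine similarity and a threshold. The function `suggest_annotations`, the threshold value, and the example labels are hypothetical. ChartNavigator's actual deep factor model and interface logic are not specified in the abstract.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def suggest_annotations(labeled, unlabeled, threshold=0.9):
    """Hypothetical label suggestion in a factor representation space.

    labeled: list of (factor_vector, label) pairs annotated by the user.
    unlabeled: dict mapping chart name -> factor_vector.
    Returns chart name -> (suggested label or None, best similarity).
    """
    suggestions = {}
    for name, vec in unlabeled.items():
        best_label, best_sim = None, -1.0
        for ref_vec, label in labeled:
            sim = cosine(vec, ref_vec)
            if sim > best_sim:
                best_label, best_sim = label, sim
        # Only suggest a label when the nearest labeled pattern is close enough.
        suggestions[name] = (best_label if best_sim >= threshold else None, best_sim)
    return suggestions
```

Keeping a threshold rather than always taking the nearest label leaves ambiguous charts for the user to annotate manually.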
ISBN (print): 9781510670136; 9781510670129
Laser powder bed fusion (LPBF) is at the forefront of manufacturing metallic objects, particularly those with complex geometries or those produced in limited quantities. However, this 3D printing method is susceptible to several printing defects due to the complexities of using a high-power laser with ultra-fast actuation. Accurate online print defect detection is therefore in high demand, and it must have a low computational cost to enable low-latency process intervention. In this work, we propose a low-latency LPBF defect detection algorithm based on the fusion of images from high-speed cameras in the visible and short-wave infrared (SWIR) spectral ranges. First, we design an experiment in which we print an object while both imposing porosity defects on the print and recording the laser's melt pool with the high-speed cameras. We then train variational autoencoders on images from both cameras to extract and fuse two sets of corresponding features. The melt pool recordings are annotated with pore densities extracted from a CT scan of the printed object. These annotations are then used to train and evaluate a fast neural network model that predicts the occurrence of porosity from the fused features. We compare the prediction performance of our sensor-fused model with models trained on image features from each camera separately. We observe that SWIR imaging is sensitive to keyhole porosity, while the visible-range optical camera is sensitive to lack-of-fusion porosity. By fusing features from both cameras, we accurately predict both pore types and outperform both single-camera systems.
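A simple form of feature-level sensor fusion concatenates the per-camera latent features and trains one classifier on the joint vector. The sketch below illustrates that idea under stated assumptions. The features stand in for per-frame VAE latents, and a logistic-regression classifier fitted by batch gradient descent replaces the paper's fast neural network. In the toy data, one visible-range feature flags lack-of-fusion pores and one SWIR feature flags keyhole pores. This mirrors the finding that each camera sees a different pore type, but the data is made up for illustration.

```python
import math

def fuse(visible_feats, swir_feats):
    """Feature-level fusion by concatenating the two latent vectors."""
    return list(visible_feats) + list(swir_feats)

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit logistic regression with batch gradient descent on cross-entropy.
    (A stand-in for the paper's fast neural network model.)"""
    w = [0.0] * len(xs[0])
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i, xi in enumerate(x):
                gw[i] += err * xi
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict_proba(w, b, x):
    """Predicted probability that the frame contains porosity."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

if __name__ == "__main__":
    # Toy (visible, SWIR) features; label 1 = some porosity present.
    pairs = [([0.0], [0.0], 0), ([1.0], [0.0], 1), ([0.0], [1.0], 1), ([1.0], [1.0], 1)]
    xs = [fuse(v, s) for v, s, _ in pairs]
    ys = [y for _, _, y in pairs]
    w, b = train_logistic(xs, ys)
    print([round(predict_proba(w, b, x), 3) for x in xs])
```

A classifier trained on either camera's feature alone cannot separate both pore types in this toy setup. That is the intuition behind fusing the two feature sets.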
ISBN (print): 9798400704314
Recent neural information retrieval models that use dense text representations generated by pre-trained models commonly face two issues. First, a pre-trained model (e.g., BERT) usually truncates a long document before representing it, which may lose important semantic information. Second, pre-trained models like BERT are widely used to generate sentence embeddings. However, a substantial body of literature has shown that these models often place sentence embeddings in a homogeneous and narrow region of the space. This problem, known as representation anisotropy, hurts the quality of dense vector retrieval. In this paper, we split the query and the document into two sets of natural sentences and generate their sentence embeddings with BERT, the most popular pre-trained model. Before aggregating the sentence embeddings into overall representations of the query and the document, we alleviate the usual representation degeneration problem of BERT sentence embeddings in two ways. First, we sample from the latent space distribution of a variational autoencoder to obtain isotropic sentence embeddings. Second, we use supervised contrastive learning to make the distribution of these embeddings uniform in the representation space. During training, our model optimizes both the query and the document in these two respects. It performs well on three extensively studied neural information retrieval datasets.
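The abstract names two ingredients that can be shown compactly: aggregating sentence embeddings into one query or document vector, and a supervised contrastive loss that pulls same-label embeddings together. The sketch below assumes mean pooling for aggregation, which the abstract does not specify. It implements the supervised contrastive (SupCon) loss on L2-normalized embeddings with a temperature. For each anchor, the loss averages -log(exp(s_ip) / sum over a != i of exp(s_ia)) over its positives p. The VAE latent sampling step is not reproduced here, and the embeddings and labels are toy values.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_pool(sentence_embeddings):
    """Aggregate sentence embeddings into one vector by averaging (assumed)."""
    n = len(sentence_embeddings)
    dim = len(sentence_embeddings[0])
    return [sum(e[k] for e in sentence_embeddings) / n for k in range(dim)]

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """SupCon loss averaged over anchors that have at least one positive."""
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                for a in range(n) if a != i}
        # log-sum-exp over all non-anchor samples, for numerical stability
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

Normalizing before the dot product makes the similarities cosine similarities. The temperature then controls how sharply the loss separates positives from the other samples.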
Traditionally, research on three-dimensional (3D) facial reconstruction has focused heavily on methods that use 3D Morphable Models (3DMMs) based on principal component analysis (PCA). Because such methods are linear, they are robust to external noise. However, PCA-based methods have limitations when restoring faces that deviate from the training data distribution, particularly when recovering fine details. By contrast, restoration methods based on Graph Convolutional Networks (GCNs) offer the advantages of non-linearity and direct regression of vertex coordinates and colors. However, GCN-based approaches can be prone to overfitting, which makes them less stable. This study presents a face restoration approach that directly regresses the vertex coordinates and colors of a 3D face model from a single in-the-wild 2D facial image. The method demonstrates greater stability and higher accuracy than conventional techniques. In addition, Graph Attention Networks (GATs) enhance restoration performance. Separating the networks responsible for facial shape and color reduces noise caused by interference between different data attributes. Through experiments, we identify the best-performing network structures and training methods and demonstrate improved performance over existing approaches.
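To make the role of graph attention concrete, the sketch below implements one single-head layer in the standard GAT formulation. It computes e_ij = LeakyReLU(a^T [W h_i || W h_j]), normalizes these scores with a softmax over each vertex's neighbors, and outputs h'_i = sum_j alpha_ij W h_j (here without a final nonlinearity). The vertex features, mesh adjacency, and weights are hypothetical toy values. Under the abstract's design, separate shape and color networks would be separate instances of layers like this, but the paper's actual layer sizes and structure are not given.

```python
import math

def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def gat_layer(features, neighbors, W, a):
    """One single-head graph attention layer (no output nonlinearity).

    features: list of per-vertex input vectors h_i.
    neighbors: dict vertex -> list of neighbor indices (include the vertex
               itself to get a self-loop).
    W: weight matrix (out_dim x in_dim); a: attention vector of length 2*out_dim.
    """
    wh = [matvec(W, h) for h in features]
    out = []
    for i in range(len(features)):
        # e_ij = LeakyReLU(a^T [W h_i || W h_j]); list + list concatenates.
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, wh[i] + wh[j])))
                  for j in neighbors[i]]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        out.append([sum(al * wh[j][k] for al, j in zip(alphas, neighbors[i]))
                    for k in range(len(W))])
    return out
```

With an all-zero attention vector every neighbor receives equal weight, and the layer reduces to neighborhood averaging. The learned vector `a` is what lets the layer weight neighboring vertices unequally.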
ISBN (print): 9789819609130; 9789819609147
With the spread of environmental awareness, people increasingly value a low-carbon lifestyle and economy. Electric vehicles play a crucial role in this transition to lower carbon emissions. However, integrating electric vehicles into the power grid poses challenges, especially the possibility of destructive load peaks that may endanger the stability and safety of the grid. Accurately predicting electric vehicle load and managing grid scheduling are crucial for solving this problem. Current solutions fall into two main categories: statistics-based methods and machine learning-based methods. Statistical methods require modeling large amounts of long-term data, which makes data collection a significant challenge. Machine learning-based methods perform well in long-term prediction on high-quality data, but their short-term prediction accuracy is poor. To overcome these obstacles, we propose a comprehensive electric vehicle charging load prediction framework. It uses an innovative variational autoencoder generative adversarial network (VAE-GAN) for data processing, principal component analysis (PCA) for feature extraction, and an improved CNN-GRU model for prediction. Experimental results show that the framework significantly improves short-term load prediction accuracy. This verifies its effectiveness in processing small-sample load data and provides advanced tools for the intelligent management of electric vehicle charging stations.
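PCA, the feature-extraction step named in this framework, projects centered feature vectors onto the directions of greatest variance. To stay dependency-free, the sketch below finds only the leading principal component, using power iteration on the sample covariance matrix. In practice a linear-algebra library would normally be used. The input rows stand in for hypothetical charging-load feature vectors. The sketch assumes the top eigenvalue is distinct and that the all-ones starting vector is not orthogonal to the top eigenvector, which holds for typical data.

```python
import math

def first_principal_component(rows, iters=200):
    """Return (feature means, leading principal component) via power iteration
    on the sample covariance matrix of the rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - means[k] for k in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # starting vector; assumed not orthogonal to the top eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def project(row, means, component):
    """Score of one feature vector on the principal component."""
    return sum((x - m) * c for x, m, c in zip(row, means, component))

if __name__ == "__main__":
    data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
    means, pc1 = first_principal_component(data)
    print(pc1)  # direction proportional to (1, 2)
```

Keeping only the top few component scores reduces the input dimension for the downstream predictor. This is the usual motivation for a PCA step before a CNN-GRU model.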
Money laundering hides the origin of illegal money by making it appear legal. Detecting suspicious activity quickly in financial data is key to stopping fraud and money laundering. Real-time detection is a popular approach fo...
Most vision-based 3D pose estimation approaches rely on knowledge of the object's 3D model or on depth measurements, and they often require time-consuming iterative refinement to improve accuracy. These requirements can limit broader real-world applications, and addressing them is the main motivation for this paper. To this end, we propose CVML-Pose, a novel Convolutional Variational Auto-Encoder based Multi-Level Network for object 3D pose estimation. Unlike most other methods, CVML-Pose implicitly learns an object's 3D pose from RGB images alone, encoded in its latent space. It needs neither the object's 3D model nor depth information, and it performs no post-refinement. CVML-Pose consists of two main modules: (i) CVML-AE, a convolutional variational autoencoder that extracts features from RGB images; and (ii) Multi-Layer Perceptron and K-Nearest Neighbor regressors that map the latent variables to the object's 3D rotation and translation, respectively. CVML-Pose has been evaluated on the LineMod and LineMod-Occlusion benchmark datasets. It outperforms other methods based on latent representations and achieves results comparable to the state of the art without using a 3D model or depth measurements. Visualization with the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm shows that the CVML-Pose latent space successfully represents objects' categories and topology. This opens the prospect of jointly estimating pose and other attributes, possibly including surface finish or shape variations. Because no iterative refinement is needed, such estimation can run in real time and can facilitate various robotic applications. Code available: https://***/JZhao12/CVML-Pose.
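The abstract states that CVML-Pose uses a K-Nearest Neighbor regressor to map latent variables to translation. A KNN regressor of this kind predicts the mean target of the k training latents closest to the query. The minimal sketch below assumes Euclidean distance in latent space, an unweighted mean, and toy latent vectors paired with 3D translations. The paper's actual distance measure, k, and latent dimensionality are not given in the abstract.

```python
import math

def knn_regress(train_latents, train_targets, query, k=3):
    """Predict a target vector (e.g. a 3D translation) as the unweighted mean
    of the targets of the k training latents nearest to query (Euclidean)."""
    ranked = sorted(zip(train_latents, train_targets),
                    key=lambda pair: math.dist(pair[0], query))
    nearest = [target for _, target in ranked[:k]]
    dim = len(nearest[0])
    return [sum(t[j] for t in nearest) / len(nearest) for j in range(dim)]

if __name__ == "__main__":
    latents = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    translations = [[0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [9.0, 9.0, 9.0]]
    print(knn_regress(latents, translations, [0.1, 0.1], k=3))
```

A non-parametric regressor like this needs no iterative refinement at inference time, which fits the paper's emphasis on avoiding post-refinement. Its cost does grow with the size of the stored training set, however.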
Activity pattern prediction is a critical part of urban computing, urban planning, intelligent transportation, and related fields. Based on a dataset of more than 10 million GPS trajectory records collected by mobile sensors, this research proposes a CNN-BiLSTM-VAE-ATT-based encoder-decoder model for fine-grained prediction of individual activity sequences. The model combines long-term and short-term dependencies crosswise and also accounts for the randomness, diversity, and uncertainty of individual activity patterns. It achieves higher accuracy than ten baselines and can generate highly diverse results while approximating the original distribution of activity patterns. Moreover, the model is interpretable: it reveals the relative importance of time dependencies in activity pattern prediction.
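The interpretability claim rests on attention weights over time steps. A larger weight means that time step contributes more to the prediction. The sketch below uses scaled dot-product attention as a stand-in, since the abstract does not specify the form of the ATT component. It scores each encoder state against a decoder query, normalizes the scores with softmax, and returns both the weights and the weighted context vector. The query and encoder states are hypothetical toy vectors, not outputs of the CNN-BiLSTM encoder.

```python
import math

def temporal_attention(query, encoder_states):
    """Scaled dot-product attention over time steps.

    Returns (weights, context): weights[t] is the importance of time step t,
    and context is the attention-weighted sum of the encoder states.
    """
    d = len(query)
    scores = [sum(q * h for q, h in zip(query, state)) / math.sqrt(d)
              for state in encoder_states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * state[k] for w, state in zip(weights, encoder_states))
               for k in range(len(encoder_states[0]))]
    return weights, context

if __name__ == "__main__":
    states = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # one state per time step
    weights, context = temporal_attention([1.0, 0.0], states)
    print([round(w, 3) for w in weights])
```

Inspecting the weight vector per prediction, or averaging it across many predictions, shows which time steps the model relies on.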