Estimating three-dimensional (3D) human poses from a single camera is usually implemented by searching pose candidates with image descriptors. Existing methods usually suppose that the mapping from feature space to pose space is linear, but in fact the mapping is highly nonlinear, which heavily degrades the performance of 3D pose estimation. We propose a method to recover 3D pose from a silhouette image, based on multiview feature embedding (MFE) and locality-sensitive autoencoders (LSAEs). We first describe a manifold-regularized sparse low-rank approximation for MFE, so that the input image is characterized by a fused feature descriptor; the fused feature and its corresponding 3D pose are then separately encoded by LSAEs. A two-layer back-propagation neural network, trained by parameter fine-tuning, maps the encoded 2D features to encoded 3D poses. Our LSAE ensures a good preservation of the local topology of data points. Experimental results demonstrate the effectiveness of the proposed method. (C) 2017 SPIE and IS&T
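The pipeline here is essentially latent-to-latent regression: encode image features and 3D poses separately, then learn a small network between the two codes. Below is a minimal sketch of that idea, using plain autoencoders in place of the paper's locality-sensitive variant; all layer sizes, the 17-joint pose dimension, and the module names are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: latent-to-latent regression for 3D pose recovery.
# Two plain autoencoders stand in for the paper's locality-sensitive
# autoencoders (LSAEs); dimensions below are assumptions.
import torch
import torch.nn as nn

FEAT_DIM, POSE_DIM, LATENT = 512, 51, 64  # assumed sizes (51 = 17 joints x 3)

def autoencoder(in_dim, latent):
    enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, latent))
    dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, in_dim))
    return enc, dec

feat_enc, feat_dec = autoencoder(FEAT_DIM, LATENT)   # encodes fused image features
pose_enc, pose_dec = autoencoder(POSE_DIM, LATENT)   # encodes 3D joint coordinates

# Two-layer network mapping encoded features to encoded poses.
mapper = nn.Sequential(nn.Linear(LATENT, 128), nn.ReLU(), nn.Linear(128, LATENT))

def estimate_pose(fused_feature):
    """Encode the 2D descriptor, map it to the pose latent space, decode a 3D pose."""
    with torch.no_grad():
        return pose_dec(mapper(feat_enc(fused_feature)))
```

At test time only the feature encoder, the mapper, and the pose decoder are used; the pose encoder exists solely to learn the target latent space during training.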
ISBN (Print): 9781509041183
It has been almost seventy years since the publication of Claude Shannon's "A Mathematical Theory of Communication" [1] and Norbert Wiener's "Extrapolation, Interpolation and Smoothing of Stationary Time Series" [2]. The pioneering works of Shannon and Wiener laid the foundation of communication, data storage, control, and other information technologies. This paper briefly reviews Shannon's and Wiener's perspectives on the problem of message transmission over a noisy channel and experimentally evaluates the feasibility of integrating these two perspectives to train autoencoders close to the information limit. To this end, the principle of relevant information (PRI) is used and validated to optimally encode input imagery in the presence of noise.
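For readers unfamiliar with the PRI, it trades off the redundancy of a representation against its divergence from the observed data, commonly written with Renyi quadratic entropy and the Cauchy-Schwarz divergence estimated from Parzen information potentials. The sketch below writes that cost for a point-set representation and descends it directly; the kernel bandwidth, the trade-off weight, and the toy data are assumptions, and this is not the authors' image-encoding setup.

```python
import torch

def gauss_gram(a, b, sigma):
    # Pairwise Gaussian kernel values between point sets a (N, d) and b (M, d).
    return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))

def pri_cost(x, g, lam=1.0, sigma=1.0):
    # Principle of relevant information: Renyi quadratic entropy of the
    # representation x plus lam times its Cauchy-Schwarz divergence from the
    # observed data g, both estimated from Parzen information potentials.
    v_x = gauss_gram(x, x, sigma).mean()   # information potential of x
    v_g = gauss_gram(g, g, sigma).mean()   # information potential of g
    v_c = gauss_gram(x, g, sigma).mean()   # cross information potential
    entropy = -torch.log(v_x)
    divergence = torch.log(v_x) + torch.log(v_g) - 2 * torch.log(v_c)
    return entropy + lam * divergence

# Toy usage: start the representation at the noisy data and descend the cost.
g = torch.randn(200, 2)                       # stand-in for noisy observations
x = g.clone().requires_grad_(True)
optimizer = torch.optim.Adam([x], lr=0.05)
for _ in range(200):
    optimizer.zero_grad()
    loss = pri_cost(x, g, lam=2.0, sigma=0.5)
    loss.backward()
    optimizer.step()
```

Larger values of lam keep the representation closer to the data; smaller values compress it more aggressively, which is the sense in which the PRI operates "close to the information limit."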
Insufficient and imbalanced data samples often prevent the development of accurate deep learning models for manufacturing defect detection. By applying data augmentation methods, including VAE latent-space oversampling, random data generation, and GAN-based multi-modal complementary data generation, we overcome these dataset limitations and achieve Pass/No-Pass accuracies of over 90%.
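As a rough illustration of the latent-space oversampling idea, the sketch below jitters the latent codes of the scarce defect class and decodes the perturbed codes into synthetic samples. The `vae.encode`/`vae.decode` interface, the noise scale, and the sampling scheme are assumptions for illustration, not the authors' pipeline.

```python
import torch

def oversample_minority(vae, minority_images, n_new, noise_scale=0.5):
    """Hedged sketch of VAE latent-space oversampling: encode the scarce
    defect images, jitter their latent codes, and decode synthetic samples.
    `vae` is assumed to expose encode() -> (mu, logvar) and decode(z)."""
    with torch.no_grad():
        mu, logvar = vae.encode(minority_images)
        idx = torch.randint(0, mu.shape[0], (n_new,))            # pick seed codes
        std = torch.exp(0.5 * logvar[idx])
        z = mu[idx] + noise_scale * std * torch.randn_like(std)  # perturb in latent space
        return vae.decode(z)                                     # synthetic defect images
```

The synthetic images would then be mixed into the training set until the Pass/No-Pass classes are roughly balanced.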
Electronic noses (e-noses) are instruments that can be used to measure gas samples conveniently. Based on the measured signal, the type and concentration of the gas can be predicted by pattern recognition algorithms. However, e-noses are often affected by influential factors such as instrumental variation and time-varying drift. From the viewpoint of pattern recognition, these factors make the posterior distribution of the test data drift from that of the training data and thus degrade the accuracy of the prediction models. In this paper, we propose the drift correction autoencoder (DCAE) to address this problem. DCAE learns to model and correct the influential factors explicitly with the help of transfer samples. It generates a drift-corrected and discriminative representation of the original data, which can then be applied to various prediction algorithms. We evaluate DCAE on data sets with instrumental variation and complex time-varying drift. Prediction models are trained on samples collected with one device or in the initial time period, then tested on other devices or time periods. Experimental results show that DCAE outperforms typical drift correction algorithms and autoencoder-based transfer learning methods. It can improve the robustness of e-nose systems and greatly enhance their performance in real-world applications.
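One simple way to realize "correcting drift with transfer samples" is to add a code-alignment term for the same gas measured on the reference device and on the drifted device. The sketch below does exactly that on top of a vanilla autoencoder; the layer sizes, Tanh activations, and loss weighting are assumptions rather than the published DCAE architecture.

```python
import torch
import torch.nn as nn

class DriftCorrectingAE(nn.Module):
    """Hedged sketch inspired by the DCAE idea: an autoencoder whose hidden
    code is pushed, via transfer samples, to be invariant to the device or
    time period. Layer sizes and the loss weighting are assumptions."""
    def __init__(self, in_dim=128, hid=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 64), nn.Tanh(), nn.Linear(64, hid))
        self.dec = nn.Sequential(nn.Linear(hid, 64), nn.Tanh(), nn.Linear(64, in_dim))

    def forward(self, x):
        h = self.enc(x)
        return h, self.dec(h)

def dcae_style_loss(model, x, x_master, x_slave, alpha=1.0):
    # Reconstruction on ordinary samples plus alignment of the codes of
    # transfer samples (same gas measured on master vs. drifted device).
    _, x_rec = model(x)
    h_m, _ = model(x_master)
    h_s, _ = model(x_slave)
    return nn.functional.mse_loss(x_rec, x) + alpha * nn.functional.mse_loss(h_s, h_m)
```

After training, the encoder output serves as the drift-corrected representation fed to whatever classifier or regressor predicts gas type and concentration.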
We demonstrate a new deep learning autoencoder network, trained by a nonnegativity constraint algorithm (nonnegativity-constrained autoencoder), that learns features showing a part-based representation of data. The learning algorithm is based on constraining negative weights. The performance of the algorithm is assessed based on decomposing data into parts, and its prediction performance is tested on three standard image data sets and one text data set. The results indicate that the nonnegativity constraint forces the autoencoder to learn features that amount to a part-based representation of data, while improving sparsity and reconstruction quality in comparison with the traditional sparse autoencoder and nonnegative matrix factorization. It is also shown that this newly acquired representation improves the prediction performance of a deep neural network.
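Constraining negative weights can be implemented as a penalty on the negative part of every weight matrix, added to the reconstruction loss. The sketch below shows that penalty for a single-hidden-layer autoencoder; the 784/196 sizes, sigmoid activations, and penalty weight are assumptions, and the paper's additional sparsity term is omitted here.

```python
import torch
import torch.nn as nn

enc = nn.Linear(784, 196)   # assumed sizes for a MNIST-like image dataset
dec = nn.Linear(196, 784)

def negativity_penalty(modules):
    # Penalize only the negative part of each weight, steering the learned
    # weights toward nonnegative, part-like features.
    return sum((m.weight.clamp(max=0.0) ** 2).sum() for m in modules)

def nc_ae_loss(x, alpha=1e-3):
    h = torch.sigmoid(enc(x))
    x_rec = torch.sigmoid(dec(h))
    return nn.functional.mse_loss(x_rec, x) + alpha * negativity_penalty([enc, dec])
```

With a strong enough alpha the decoder weights become mostly nonnegative, so each reconstruction is an additive combination of parts, which is what gives the learned features their interpretability.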
Cross-media analysis exploits social data with different modalities from multiple sources simultaneously and synergistically to discover knowledge and better understand the world. There are two levels of cross-media social data. One is the element, which is made up of text, images, voice, or any combination of modalities. Elements from the same data source can have different modalities. The other level of cross-media social data is the new notion of the aggregative subject (AS), a collection of time-series social elements sharing the same semantics (e.g., a collection of tweets, photos, blogs, and news of emergency events). While traditional feature learning methods focus on dealing with single-modality data or data fused across multiple modalities, in this study we systematically analyze the problem of feature learning for cross-media social data at the two levels mentioned previously. The general purpose is to obtain a robust and uniform representation from the social data in time series and across different modalities. We propose a novel unsupervised method for cross-modality element-level feature learning called the cross autoencoder (CAE). CAE can capture the cross-modality correlations in element samples. Furthermore, we extend it to the AS level using the convolutional neural network (CNN), namely the convolutional cross autoencoder (CCAE). We use CAEs as filters in the CCAE to handle cross-modality elements and the CNN framework to handle the time sequence and reduce the impact of outliers in AS. We finally apply the proposed method to classification tasks to evaluate the quality of the generated representations against several real-world social media datasets. In terms of accuracy, CAE gets 7.33% and 14.31% overall incremental rates on two element-level datasets. CCAE gets 11.2% and 60.5% overall incremental rates on two AS-level datasets. Experimental results show that the proposed CAE and CCAE work well with all tested classifiers and perform better than several other baseline methods.
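The element-level idea, a shared code from which every modality can be reconstructed, can be sketched as below for a two-modality (text/image) case. The fused-code averaging, feature dimensions, and loss are illustrative assumptions, not the published CAE, and the convolutional AS-level extension is not shown.

```python
import torch
import torch.nn as nn

class CrossAE(nn.Module):
    """Hedged sketch of a cross-modality autoencoder: each modality is encoded
    to a shared code from which both modalities are reconstructed, so the
    code must capture cross-modality correlations. Sizes are assumptions."""
    def __init__(self, txt_dim=300, img_dim=512, code=128):
        super().__init__()
        self.enc_txt = nn.Sequential(nn.Linear(txt_dim, code), nn.ReLU())
        self.enc_img = nn.Sequential(nn.Linear(img_dim, code), nn.ReLU())
        self.dec_txt = nn.Linear(code, txt_dim)
        self.dec_img = nn.Linear(code, img_dim)

    def forward(self, txt, img):
        # Average the two modality codes; an element missing a modality
        # could use the remaining branch alone.
        h = 0.5 * (self.enc_txt(txt) + self.enc_img(img))
        return self.dec_txt(h), self.dec_img(h)

def cae_style_loss(model, txt, img):
    txt_rec, img_rec = model(txt, img)
    return nn.functional.mse_loss(txt_rec, txt) + nn.functional.mse_loss(img_rec, img)
```

The shared code `h` is the element-level representation that a CNN over the time axis could then aggregate into an AS-level representation.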
We introduce SCORES, a recursive neural network for shape composition. Our network takes as input sets of parts from two or more source 3D shapes and a rough initial placement of the parts. It outputs an optimized part structure for the composed shape, leading to high-quality geometry construction. A unique feature of our composition network is that it is not merely learning how to connect parts. Our goal is to produce a coherent and plausible 3D shape, despite large incompatibilities among the input parts. The network may significantly alter the geometry and structure of the input parts and synthesize a novel shape structure based on the inputs, while adding or removing parts to minimize a structure plausibility loss. We design SCORES as a recursive autoencoder network. During encoding, the input parts are recursively grouped to generate a root code. During synthesis, the root code is decoded, recursively, to produce a new, coherent part assembly. Assembled shape structures may be novel, with little global resemblance to training exemplars, yet have plausible substructures. SCORES therefore learns a hierarchical substructure shape prior based on per-node losses. It is trained on structured shapes from ShapeNet, and is applied iteratively to reduce the plausibility loss. We show results of shape composition from multiple sources over different categories of man-made shapes and compare with state-of-the-art alternatives, demonstrating that our network can significantly expand the range of composable shapes for assembly-based modeling.
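The recursive encode/decode machinery can be illustrated with two tiny networks: one that merges two child codes into a parent code, and one that splits a code back into children. The sketch below merges parts in input order and splits to a fixed depth purely for illustration; in SCORES the grouping, the per-node plausibility losses, and the structure synthesis are learned, and none of the names or sizes below come from the paper.

```python
import torch
import torch.nn as nn

CODE = 128  # assumed per-part code size

# Small networks that merge two child codes into a parent and split it back.
merge = nn.Sequential(nn.Linear(2 * CODE, CODE), nn.Tanh())
split = nn.Sequential(nn.Linear(CODE, 2 * CODE), nn.Tanh())

def encode_parts(part_codes):
    """Recursively merge part codes (here in input order, as a simplification)
    until a single root code summarizes the whole part assembly."""
    codes = list(part_codes)
    while len(codes) > 1:
        a, b = codes.pop(), codes.pop()
        codes.append(merge(torch.cat([a, b], dim=-1)))
    return codes[0]

def decode_root(root, depth):
    """Mirror of the encoder: recursively split the root back into leaf codes."""
    if depth == 0:
        return [root]
    a, b = split(root).chunk(2, dim=-1)
    return decode_root(a, depth - 1) + decode_root(b, depth - 1)
```

Iterating encode, decode, and loss reduction on the decoded leaves is the rough analogue of the paper's iterative refinement of the composed shape structure.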
In this paper, we study self-taught learning for hyperspectral image (HSI) classification. Supervised deep learning methods are currently state of the art for many machine learning problems, but these methods require large quantities of labeled data to be effective. Unfortunately, existing labeled HSI benchmarks are too small to directly train a deep supervised network. Alternatively, we used self-taught learning, which is an unsupervised method for learning feature-extracting frameworks from unlabeled hyperspectral imagery. These models learn how to extract generalizable features by training on sufficiently large quantities of unlabeled data that are distinct from the target data set. Once trained, these models can extract features from smaller labeled target data sets. We studied two self-taught learning frameworks for HSI classification. The first is a shallow approach that uses independent component analysis, and the second is a three-layer stacked convolutional autoencoder. Our models are applied to the Indian Pines, Salinas Valley, and Pavia University data sets, which were captured by two separate sensors at different altitudes. Despite large variation in scene type, our algorithms achieve state-of-the-art results across all three data sets.
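The self-taught recipe amounts to training a convolutional autoencoder on plentiful unlabeled imagery and reusing its encoder as a fixed feature extractor for the small labeled scene. A single-stage sketch is below (the paper stacks three layers); the band count, channel widths, and strides are assumptions.

```python
import torch
import torch.nn as nn

class ConvAE(nn.Module):
    """Hedged single-stage sketch of a convolutional autoencoder for
    self-taught feature learning; band count and widths are assumptions."""
    def __init__(self, bands=102):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(bands, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 32, 3, stride=2, padding=1), nn.ReLU(),  # downsample
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 64, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.Conv2d(64, bands, 3, padding=1),
        )

    def forward(self, patch):
        return self.decoder(self.encoder(patch))

# Train with an MSE reconstruction loss on abundant unlabeled patches, then
# freeze `encoder` and feed its outputs to a small classifier trained on the
# limited labeled target scene.
```

The key point of self-taught learning is that the unlabeled pre-training imagery does not need to come from the same scene, or even the same sensor, as the labeled target data.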
The vast data collected since the enforcement of building energy labelling in Italy has provided valuable information that is useful for planning the future of building energy efficiency. However, the indicators provided through energy certificates are not suitable for supporting decisions that target building energy retrofits at a regional scale. Because the energy performance index is biased toward a building's shape, decisions based on this index will favor buildings with specific geometric characteristics. This study aims to overcome this issue by introducing a new indicator, tailored to rank buildings based on retrofittable characteristics. The proposed framework is validated by a case study in which a large dataset of office buildings is assigned the new index. Results indicate that the proposed indicator succeeds in extracting a single index that is representative of all building characteristics subject to energy retrofit. A new labeling procedure is also compared with the conventional classification of buildings. It is observed that the proposed labels properly partition the dataset according to buildings' potential to undergo energy retrofit.