Egocentric activity recognition has recently gained great popularity in computer vision owing to its widespread applications in egocentric video analysis. However, compared with conventional third-person activity recognition tasks, it poses new challenges caused by significant body shaking, varied lengths, poor recording quality, etc. To handle these challenges, in this paper we propose deep appearance and motion learning (DAML) for egocentric activity recognition, which leverages the strength of deep learning networks in feature learning. In contrast to hand-crafted visual features or pre-trained convolutional neural network (CNN) features, which have limited generality to new egocentric videos, the proposed DAML is built on the deep autoencoder (DAE) and directly extracts appearance and motion features, the main cues of activities, from egocentric videos. The DAML takes advantage of the effectiveness and efficiency of the DAE in unsupervised feature learning, which provides a new representation learning framework for egocentric videos. The appearance and motion features learned by the DAML are fused into a rich, informative egocentric activity representation that can be readily fed into any supervised learning model for activity recognition. Experimental results on two challenging benchmark datasets show that the DAML achieves high performance on both short- and long-term egocentric activity recognition tasks, comparable to or even better than that of state-of-the-art counterparts. (C) 2017 Elsevier B.V. All rights reserved.
Deep learning, in particular deep convolutional neural networks, has received increasing interest in face recognition recently, and a number of deep learning methods have been proposed. This paper summarizes about 330 contributions in this area. It reviews major deep learning concepts pertinent to face image analysis and face recognition, and provides a concise overview of studies on specific face recognition problems, such as handling variations in pose, age, illumination, and expression, and heterogeneous face matching. A summary of databases used for deep face recognition is given as well. Finally, some open challenges and directions for future research are discussed.
To automate the pollutant emission tests of vehicles, a pedal robot is designed to take the place of a human driver. Sometimes, the actual time-speed curve of the vehicle deviates beyond the upper or lower limit of the worldwide light-duty test cycle (WLTC) target curve, which causes a fault. In this paper, a new fault diagnosis method is proposed and applied to the pedal robot. Since principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and the autoencoder cannot extract feature information adequately when used alone, three types of feature components extracted by PCA, t-SNE, and the autoencoder are fused to form a nine-dimensional feature set. Then, the feature set is reduced to a three-dimensional space via the Treelet Transform. Finally, the fault samples are classified by a Gaussian process classifier. Compared with methods that use only one algorithm to extract features, the proposed method has the minimum standard deviation, 0.0078, and almost the maximum accuracy, 98.17%. The accuracy of the proposed method is only 0.24% lower than that without the Treelet Transform, but its processing time is 6.73% shorter. These results indicate that the multi-feature fusion model and the Treelet Transform are effective. Therefore, the proposed method is helpful for fault diagnosis of the pedal robot.
Online social networks, the World Wide Web, media and technological networks, and other types of so-called information networks are ubiquitous nowadays. These information networks are inherently heterogeneous and dynamic. They are heterogeneous because they consist of multi-typed objects and relations, and they are dynamic because they constantly evolve over time. One of the challenging issues in such heterogeneous and dynamic environments is to forecast the relationships in the network that will appear in the future. In this article, we address the problem of continuous-time relationship prediction in dynamic and heterogeneous information networks. This means predicting the time it takes for a relationship to appear in the future, given its features, which have been extracted by considering both the heterogeneity and the temporal dynamics of the underlying network. To this end, we first introduce a feature extraction framework that combines the power of meta-path-based modeling and recurrent neural networks to extract features suitable for relationship prediction with respect to the heterogeneity and dynamicity of the networks. Next, we propose a supervised non-parametric approach, called the Non-Parametric Generalized Linear Model (NP-GLM), which infers the hidden underlying probability distribution of the relationship building time given its features. We then present a learning algorithm to train NP-GLM and an inference method to answer time-related queries. Extensive experiments conducted on synthetic data and three real-world datasets, namely Delicious, MovieLens, and DBLP, demonstrate the effectiveness of NP-GLM in solving the continuous-time relationship prediction problem vis-a-vis competitive baselines.
Purpose: Convolutional neural networks have rapidly become popular for image recognition and image analysis because of their powerful potential. In this paper, we developed a method for classifying subtypes of lung adenocarcinoma from pathological images using a neural network that can evaluate phenotypic features from a wider area to consider cellular *** order to recognize the types of tumors, we need not only to capture detailed features of cells but also to incorporate the statistical distribution of the different types of cells. Variants of autoencoders are implemented as building blocks of the pre-trained convolutional layers of neural networks. A sparse deep autoencoder that minimizes local information entropy on the encoding layer is then proposed and applied to images of size . We applied this model to feature extraction from pathological images of lung adenocarcinoma, which comprises three transcriptome subtypes previously defined by the Cancer Genome Atlas network. Since the tumor tissue is composed of heterogeneous cell populations, recognition of tumor transcriptome subtypes requires more information than the local pattern of cells. The parameters extracted using this approach are then used in multiple reduction stages to perform classification on larger *** were able to demonstrate that these networks successfully recognize morphological features of lung adenocarcinoma. We also performed classification and reconstruction experiments to compare the outputs of the variants. The results showed that a larger input image covering a certain area of the tissue is required to recognize transcriptome subtypes. The sparse autoencoder network with input provides a 98.9% classification *** study shows the potential of autoencoders as a feature extraction paradigm and paves the way for a whole slide image analysis tool to predict molecular subtypes of tumors from pathological features.
This paper presents a novel automatic facial expression recognition system (AFERS) using a deep network framework. The proposed AFERS consists of four steps: 1) geometric feature extraction; 2) regional local binary pattern (LBP) feature extraction; 3) fusion of both sets of features using autoencoders; and 4) classification using a Kohonen self-organizing map (SOM)-based classifier. This paper makes three distinct contributions. The proposed deep network, consisting of autoencoders and the SOM-based classifier, is computationally more efficient and more accurate. The fusion of geometric features with LBP features using autoencoders provides a better representation of facial expression. The SOM-based classifier proposed in this paper has been improved by using soft-threshold logic and a better learning algorithm. The performance of the proposed approach is validated on two widely used databases (DBs): 1) MMI and 2) extended Cohn-Kanade (CK+). Average recognition accuracies of 97.55% on the MMI DB and 98.95% on the CK+ DB are obtained using the proposed algorithm. The recognition results obtained from fused features are found to be distinctly superior both to recognition using individual features and to recognition with a direct concatenation of the individual feature vectors. Simulation results confirm that the proposed AFERS is more efficient than the existing approaches.
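As a rough illustration of the LBP step named in the abstract above, the following is a minimal sketch of the basic 8-neighbour local binary pattern operator and its histogram. It is not the authors' regional LBP variant, whose details the abstract does not give; the function names `lbp_code` and `lbp_histogram`, the clockwise bit order, and the `>=` comparison are assumptions chosen for illustration.

```python
def lbp_code(img, r, c):
    """Basic 8-neighbour LBP code for the interior pixel (r, c).

    Assumption: a neighbour whose value is >= the centre contributes a 1 bit,
    and bits are assigned clockwise starting from the top-left neighbour.
    """
    center = img[r][c]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels of a 2-D list."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

A regional descriptor could plausibly be built by computing such a histogram per face region and concatenating the results, but how the paper defines its regions is not stated here.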
This paper aims to design and implement a system capable of distinguishing between different activities carried out during a tennis match. The goal is to correctly classify a set of tennis strokes. The system must be robust to variability in the height, age, or sex of any subject who performs the actions. A new database is developed to meet this objective. The system is based on two sensor nodes that use Bluetooth Low Energy (BLE) wireless technology to communicate with a PC, which acts as a central device and collects the information sent by the sensors. The data provided by these sensors are processed to calculate their spectrograms. By applying deep learning techniques with semi-supervised training, it is possible to extract features and classify the activities. Preliminary results obtained with a data set of eight players, four women and four men, show that our approach is able to address the diversity of build, weight, and sex among players, providing an accuracy greater than 96.5% in recognizing the tennis strokes of a new player never seen before by the system.
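The spectrogram computation mentioned in the abstract above can be sketched as a short-time discrete Fourier transform over a one-dimensional sensor signal. This is a minimal standard-library sketch, not the authors' pipeline: the function name `spectrogram`, the Hann window, and the `frame_len` and `hop` parameters are assumptions, since the abstract does not state the framing used.

```python
import cmath
import math


def spectrogram(signal, frame_len, hop):
    """Power spectrogram: one row per frame, frame_len // 2 + 1 bins per row.

    Assumptions: a symmetric Hann window and a naive DFT (fine for short
    frames; an FFT would be used in practice).
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            x = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len))
            row.append(abs(x) ** 2)  # power in frequency bin k
        rows.append(row)
    return rows
```

For a pure sinusoid whose frequency falls exactly on bin 4 of a 32-sample frame, every row of the result peaks at index 4, which gives a quick sanity check before feeding such time-frequency images to a classifier.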
A high peak-to-average power ratio (PAPR) has been one of the major drawbacks of orthogonal frequency division multiplexing (OFDM) systems. In this letter, we propose a novel PAPR reduction scheme, known as the PAPR reducing network (PRNet), based on the autoencoder architecture of deep learning. In the PRNet, the constellation mapping and demapping of symbols on each subcarrier are determined adaptively through a deep learning technique, such that the bit error rate (BER) and the PAPR of the OFDM system are jointly minimized. We use simulations to show that the proposed scheme outperforms conventional schemes in terms of BER and PAPR.
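To make the quantity being minimized concrete, the PAPR of one OFDM symbol is the ratio of its peak instantaneous power to its mean power in the time domain, usually quoted in dB. The sketch below only computes that ratio and is not the PRNet itself; the helper names `idft` and `papr_db` are hypothetical, and a naive inverse DFT without oversampling is assumed.

```python
import cmath
import math


def idft(symbols):
    """Naive inverse DFT: subcarrier symbols -> time-domain OFDM samples."""
    n = len(symbols)
    return [
        sum(s * cmath.exp(2j * math.pi * k * t / n)
            for k, s in enumerate(symbols)) / n
        for t in range(n)
    ]


def papr_db(samples):
    """PAPR in dB: 10 * log10(max |x|^2 / mean |x|^2)."""
    powers = [abs(x) ** 2 for x in samples]
    mean_power = sum(powers) / len(powers)
    return 10 * math.log10(max(powers) / mean_power)
```

Placing the same symbol on all N subcarriers concentrates the energy in a single time sample, so the ratio reaches its maximum of N (about 9.03 dB for N = 8), whereas a single active subcarrier yields a constant envelope and 0 dB. A learned constellation mapping would aim to keep typical symbols away from the former extreme.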
ISBN: (print) 9781629934433
We have previously applied the deep autoencoder (DAE) to noise reduction and speech enhancement. However, the DAE was trained using only clean speech. In this study, by using noisy-clean training pairs, we further introduce a denoising process in learning the DAE. In training the DAE, we still adopt a strategy of greedy layer-wise pretraining plus fine tuning. In pretraining, each layer is trained as a one-hidden-layer neural autoencoder (AE) using noisy-clean speech pairs as input and output (or noisy-clean speech pairs transformed by the preceding AEs). Fine tuning is done by stacking all AEs, with the pretrained parameters used for initialization. The trained DAE is used as a filter for speech estimation when noisy speech is given. Speech enhancement experiments were carried out to examine the performance of the trained denoising DAE. Noise reduction, speech distortion, and perceptual evaluation of speech quality (PESQ) criteria are used in the performance evaluations. Experimental results show that increasing the depth of the DAE consistently increases performance when a large training data set is given. In addition, compared with a minimum mean square error based speech enhancement algorithm, our proposed denoising DAE provided superior performance on the three objective evaluations.
Sparse code multiple access (SCMA) is a promising code-based non-orthogonal multiple-access technique that can provide improved spectral efficiency and massive connectivity, meeting the requirements of 5G wireless communication systems. We propose a deep learning-aided SCMA (D-SCMA) in which the codebook that minimizes the bit error rate (BER) is adaptively constructed, and a decoding strategy is learned using a deep neural network-based encoder and decoder. One benefit of D-SCMA is that an efficient codebook can be constructed in an automated manner, which is generally difficult because of the non-orthogonality and multi-dimensional traits of SCMA. We use simulations to show that our proposed scheme provides a lower BER with a shorter computation time than conventional schemes.