Since the proposal of a fast learning algorithm for deep belief networks in 2006, the deep learning techniques have drawn ever-increasing research interests because of their inherent capability of overcoming the drawb...
详细信息
Since the proposal of a fast learning algorithm for deep belief networks in 2006, the deep learning techniques have drawn ever-increasing research interests because of their inherent capability of overcoming the drawback of traditional algorithms dependent on hand-designed features. Deep learning approaches have also been found to be suitable for big data analysis with successful applications to computer vision, pattern recognition, speech recognition, natural language processing, and recommendation systems. In this paper, we discuss some widely used deep learning architectures and their practical applications. An up-to-date overview is provided on four deep learning architectures, namely, autoencoder, convolutional neural network, deep belief network, and restricted Boltzmann machine. Different types of deep neural networks are surveyed and recent progresses are summarized. Applications of deep learning techniques on some selected areas (speech recognition, pattern recognition and computer vision) are highlighted. A list of future research topics are finally given with clear justifications.
Different modalities have been proved to carry various information. This paper aims to study how the multiple face regions/channels and multiple models (e.g., hand-crafted and unsupervised learning methods) answer to ...
详细信息
Different modalities have been proved to carry various information. This paper aims to study how the multiple face regions/channels and multiple models (e.g., hand-crafted and unsupervised learning methods) answer to the face recognition problem. Hand crafted and deep feature learning techniques have been proposed and applied to estimate discriminative features in object recognition problems. In our Multi-Channel Multi-Model feature learning (McMmFL) system, we propose a new autoencoder (AE) optimization that integrates the alternating direction method of multipliers (ADMM). One of the advantages of our AE is dividing the energy formulation into several sub-units that can be used to paralyze/distribute the optimization tasks. Furthermore, the proposed method uses the advantage of K-means clustering and histogram of gradients (HOG) to boost the recognition rates. McMmFL outperforms the best results reported on the literature on three benchmark facial data sets that include AR, Yale, and PubFig83 with 95.04%, 98.97%, 95.85% rates, respectively. (C) 2016 Published by Elsevier B.V.
Multilayer feedforward neural networks (MFNNs) have been widely used for classification or approximation of nonlinear mappings described by a data set consisting of input and output samples. In many MFNN applications,...
详细信息
Multilayer feedforward neural networks (MFNNs) have been widely used for classification or approximation of nonlinear mappings described by a data set consisting of input and output samples. In many MFNN applications, a common compressive sensing task is to find the redundant dimensions of the input data. The aim of a regularization technique presented in this paper is to eliminate the redundant dimensions and to achieve compression of the input layer. It is achieved by introducing an L-1 or L-1/2 regularizer to the input layer weights training. As a comparison, in the existing references, a regularization method is usually applied to the hidden layer for a better representation of the dataset and sparsification of the network. Gradient-descent method is used for solving the resulting optimization problem. Numerical experiments including a simulated approximation problem and three classification problems (Monk, Sonar, and the MNIST data set) have been used to illustrate the algorithm.
In this paper, we propose a new supervised monaural source separation based on autoencoders. We employ the autoencoder for the dictionary training such that the nonlinear network can encode the target source with high...
详细信息
ISBN:
(纸本)9781509041183
In this paper, we propose a new supervised monaural source separation based on autoencoders. We employ the autoencoder for the dictionary training such that the nonlinear network can encode the target source with high expressiveness. The dictionary is trained by each target source without the mixture signal, which makes the system independent from the context where the dictionaries will be used. In separation process, the decoder portions of the trained autoencoders are used as dictionaries to find the activations in a iterative manner such that a summation of the decoder outputs approximates the original mixture. The results of the instruments source separation experiments revealed that the separation performance of the proposed method was superior to that of the NMF.
Abundance and availability of video capture devices, such as mobile phones and surveillance cameras, have instigated research in video face recognition, which is highly pertinent in law enforcement applications. While...
详细信息
Abundance and availability of video capture devices, such as mobile phones and surveillance cameras, have instigated research in video face recognition, which is highly pertinent in law enforcement applications. While the current approaches have reported high accuracies at equal error rates, performance at lower false accept rates requires significant improvement. In this paper, we propose a novel face verification algorithm, which starts with selecting feature-rich frames from a video sequence using discrete wavelet transform and entropy computation. Frame selection is followed by representation learning-based feature extraction, where three contributions are presented: 1) deep learning architecture, which is a combination of stacked denoising sparse autoencoder (SDAE) and deep Boltzmann machine (DBM);2) formulation for joint representation in an autoencoder;and 3) updating the loss function of DBM by including sparse and low rank regularization. Finally, a multilayer neural network is used as the classifier to obtain the verification decision. The results are demonstrated on two publicly available databases, YouTube Faces and Point and Shoot Challenge. Experimental analysis suggests that: 1) the proposed featurerichness-based frame selection offers noticeable and consistent performance improvement compared with frontal only frames, random frames, or frame selection using perceptual no-reference image quality measures and 2) joint feature learning in SDAE and sparse and low rank regularization in DBM helps in improving face verification performance. On the benchmark Point and Shoot Challenge database, the algorithm yields the verification accuracy of over 97% at 1% false accept rate whereas, on the YouTube Faces database, over 95% verification accuracy is observed at equal error rate.
Scene classification of high-resolution remote sensing (HRRS) image is an important research topic and has been applied broadly in many fields. Deep learning method has shown its high potential to in this domain, owin...
详细信息
Scene classification of high-resolution remote sensing (HRRS) image is an important research topic and has been applied broadly in many fields. Deep learning method has shown its high potential to in this domain, owing to its powerful learning ability of characterizing complex patterns. However the deep learning methods omit some global and local information of the HRRS image. To this end, in this article we show efforts to adopt explicit global and local information to provide complementary information to deep models. Specifically, we use a patch based MS-CLBP method to acquire global and local representations, and then we consider a pretrained CNN model as a feature extractor and extract deep hierarchical features from full-connection layers. After fisher vector (FV) encoding, we obtain the holistic visual representation of the scene image. We view the scene classification as a reconstruction procedure and train several class-specific stack denoising autoencoders (SDAEs) of corresponding class, i.e., one SDAE per class, and classify the test image according to the reconstruction error. Experimental results show that our combination method outperforms the state-of-the-art deep learning classification methods without employing fine-tuning.
In this letter, we show that reconstruction of an image passed through a neural network is possible, using only the locations of the max pool activations. This was demonstrated with an architecture consisting of an en...
详细信息
In this letter, we show that reconstruction of an image passed through a neural network is possible, using only the locations of the max pool activations. This was demonstrated with an architecture consisting of an encoder and a decoder. The decoder is amirrored version of the encoder, where convolutions are replaced with deconvolutions and poolings are replaced with unpooling layers. The locations of the max pool switches are transmitted to the corresponding unpooling layer. The reconstruction is computed only from these switches without the use of feature values. Using only the max switch location information of the pool layers, a surprisingly good image reconstruction can be achieved. We examine this effect in various architectures, as well as how the quality of the reconstruction is affected by the number of features. We also compare the reconstruction with an encoder with randomly initialized weights with an encoder pretrained for classification. Finally, we give recommendations for future architecture decisions.
The phonological features are the most basic unit of a speech knowledge hierarchy. This paper reports about detection and classification of phonological features from Bengali continuous speech. The phonological featur...
详细信息
The phonological features are the most basic unit of a speech knowledge hierarchy. This paper reports about detection and classification of phonological features from Bengali continuous speech. The phonological features are based on place and manner of articulation. All the experiments are performed by a deep neural network based framework. Two different models are designed for the detection and classification task. The deep-structured models are pre-trained by stacked autoencoder. The C-DAC speech corpus is used for continuous spoken Bengali speech data. Frame wise cepstral representation is provided in the input layer of the deep-structured model. Speech data from multiple speakers has been used to confirm speaker-independency. In detection task, the system achieved 86.19% average overall accuracy. In the classification task, accuracy for the classification of place of articulation remains low with 50.2% while in manner-based classification, the system delivered an improved performance with 98.9% accuracy.
Secondary structure and solvent accessibility prediction provide valuable information for estimating the three dimensional structure of a protein. As new feature extraction methods are developed the dimensionality of ...
详细信息
Secondary structure and solvent accessibility prediction provide valuable information for estimating the three dimensional structure of a protein. As new feature extraction methods are developed the dimensionality of the input feature space increases steadily. Reducing the number of dimensions provides several advantages such as faster model training, faster prediction and noise elimination. In this work, several dimensionality reduction techniques have been employed including various feature selection methods, autoencoders and PCA for protein secondary structure and solvent accessibility prediction. The reduced feature set is used to train a support vector machine at the second stage of a hybrid classifier. Cross-validation experiments on two difficult benchmarks demonstrate that the dimension of the input space can be reduced substantially while maintaining the prediction accuracy. This will enable the incorporation of additional informative features derived for predicting the structural properties of proteins without reducing the accuracy due to overfitting.
Community detection has long been a fascinating topic in complex networks since the community structure usually unveils valuable information of interest. The prevalence and evolution of deep learning and neural networ...
详细信息
Community detection has long been a fascinating topic in complex networks since the community structure usually unveils valuable information of interest. The prevalence and evolution of deep learning and neural networks have been pushing forward the advancement in various research fields and also provide us numerous useful and off the shelf techniques. In this paper, we put the cascaded stacked autoencoders and the unsupervised extreme learning machine (ELM) together in a two-level embedding process and propose a novel community detection algorithm. Extensive comparison experiments in circumstances of both synthetic and real-world networks manifest the advantages of the proposed algorithm. On one hand, it outperforms the k-means clustering in terms of the accuracy and stability thus benefiting from the determinate dimensions of the ELM block and the integration of sparsity restrictions. On the other hand, it endures smaller complexity than the spectral clustering method on account of the shrinkage in time spent on the eigenvalue decomposition procedure.
暂无评论