ISBN (print): 9781629935201
In this paper we present several descriptors for feature-based matching based on autoencoders, and we evaluate the performance of these descriptors. In a training phase, we learn autoencoders from image patches extracted in local windows surrounding keypoints determined by the Difference of Gaussian extractor. In the matching phase, we construct keypoint descriptors based on the learned autoencoders and use these descriptors as the basis for local keypoint descriptor matching. Three types of descriptors based on autoencoders are presented. To evaluate their performance, recall and 1-precision curves are generated for different kinds of transformations, e.g. zoom and rotation or viewpoint change, using a standard benchmark data set. We compare the performance of these descriptors with that of SIFT. Early results presented in this paper show that, whereas SIFT in general performs better than the new descriptors, the descriptors based on autoencoders show some potential for feature-based matching.
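As a rough illustration of the approach this abstract describes, here is a minimal sketch: an autoencoder is trained on flattened patches around keypoints and its hidden activations are used as descriptors, matched with a nearest-neighbour ratio test. The patch size (32x32), layer sizes, and ratio threshold are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Assumed setup: N flattened 32x32 grayscale patches around DoG keypoints,
# pixel values scaled to [0, 1]. Sizes and threshold are assumptions.
PATCH_DIM, CODE_DIM = 32 * 32, 128

class PatchAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(PATCH_DIM, CODE_DIM), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(CODE_DIM, PATCH_DIM), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train(model, patches, epochs=50, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(patches), patches)    # reconstruction error
        loss.backward()
        opt.step()

def describe(model, patches):
    # Hidden activations serve as the keypoint descriptor.
    with torch.no_grad():
        d = model.encoder(patches)
    return d / d.norm(dim=1, keepdim=True)         # L2-normalise, as with SIFT

def match(desc_a, desc_b, ratio=0.8):
    # Nearest-neighbour matching with a ratio test.
    dists = torch.cdist(desc_a, desc_b)            # pairwise Euclidean distances
    best2 = dists.topk(2, largest=False).values
    nn_idx = dists.argmin(dim=1)
    keep = best2[:, 0] < ratio * best2[:, 1]
    return [(i, nn_idx[i].item()) for i in range(len(desc_a)) if keep[i]]
```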
ISBN (print): 9781509023868
The effect of using autoencoders for dimensionality reduction of a medical data set is investigated. A stack of two autoencoders has been trained on a popular benchmark medical data set for dermatological disease diagnosis. The improvement achieved by the presented approach is visualized with the Principal Component Analysis method. Results show that the use of autoencoders significantly improves the accuracy of dermatological disease diagnosis.
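A hedged sketch of the pipeline described above, assuming the data set arrives as a matrix of 34 features scaled to [0, 1]; the layer sizes (34 -> 16 -> 2), training loop, and placeholder data are assumptions rather than the paper's settings.

```python
import torch
import torch.nn as nn

def make_ae(d_in, d_hidden):
    """One autoencoder: encoder and decoder as separate modules."""
    return (nn.Sequential(nn.Linear(d_in, d_hidden), nn.Sigmoid()),
            nn.Sequential(nn.Linear(d_hidden, d_in), nn.Sigmoid()))

def train_ae(enc, dec, x, epochs=200, lr=1e-3):
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(dec(enc(x)), x)
        loss.backward()
        opt.step()
    with torch.no_grad():
        return enc(x)                       # codes feed the next autoencoder

# x: (n_samples, 34) dermatology features scaled to [0, 1] (placeholder here).
x = torch.rand(366, 34)

enc1, dec1 = make_ae(34, 16)                # first autoencoder: 34 -> 16
h1 = train_ae(enc1, dec1, x)
enc2, dec2 = make_ae(16, 2)                 # second autoencoder: 16 -> 2
h2 = train_ae(enc2, dec2, h1)               # 2-D codes, comparable to a PCA plot
```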
Machine learning models rely on learned parameters adapted to a given set of data to perform a task, such as classifying images or generating sentences. These learned parameters form latent spaces whose properties impact how well the model performs. Enabling a model to better fit properties of the data in its latent space can improve the performance of the model. One criterion for quality is the set of properties expressed by the latent space - that is, for example, topological properties of the learned representation. We develop a model which leverages a variational autoencoder's generative ability and augments it with the ladder network's lateral connections for discrimination. We propose a method to decouple two tasks performed by convolutional layers (that of learning useful filters for feature extraction, and that of arranging the learned filters such that the next layer may train effectively) by using interspersed fully-connected layers. Finally, we apply batch normalization to the recurrent state of the pixel-RNN layer and show that it significantly improves convergence speed as well as slightly improving overall performance. We show results in unsupervised and supervised settings, and augment models with various inter-layer interactions, such as encoder-to-decoder connections, affine post-layer transformations, and side-network connections. The effects of the proposed methods are assessed by measuring supervised performance, the quality of samples produced by the model, and the training curves. Models and methods are tested on popular image datasets such as MNIST and CIFAR10 and are compared to the state of the art on the tasks they are applied to.
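This abstract combines several architectural ideas; as a common denominator, here is a minimal variational-autoencoder sketch showing the reparameterisation trick and the reconstruction-plus-KL objective the proposed model builds on. The layer sizes are assumptions, and the ladder-style lateral connections and pixel-RNN batch normalization are not shown.

```python
import torch
import torch.nn as nn

class SmallVAE(nn.Module):
    # Minimal VAE for flattened 28x28 images with values in [0, 1]; sizes assumed.
    def __init__(self, d_in=784, d_hidden=256, d_z=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.mu = nn.Linear(d_hidden, d_z)
        self.logvar = nn.Linear(d_hidden, d_z)
        self.dec = nn.Sequential(nn.Linear(d_z, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, d_in), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    # Reconstruction term plus KL divergence to the standard normal prior.
    recon = nn.functional.binary_cross_entropy(x_hat, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```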
This paper compares vowel recognition of Mandarin isolated words using different neural network architectures, including MLP, RBFN, and DNN. MFCC features extracted and preprocessed from the voice signal serve as the input data. We find that the MLP and its pretrained version, the DNN, are comparable to, and in some cases superior to, the RBFN in terms of recognition rate. The properties of each kind of neural network are also graphed and explored. Both MLP and RBFN decrease word error rates rapidly in the early stage of learning, while the DNN gets a very good start after pretraining. Many tentative methods revising the standard algorithms are further explored in an attempt to improve recognition. With proper design, the speaker-dependent speech recognition rate can reach 95.4%. Our constructive scheme for the neural network also substantially shortens the training time, which is an issue for deeper or wider neural networks.
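A minimal sketch of the MLP baseline, assuming the preprocessed MFCC features are stacked into one fixed-length vector per utterance; the layer sizes, the number of vowel classes, and the synthetic placeholder data are assumptions, and the RBFN and pretrained DNN variants are not shown.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Placeholder for preprocessed MFCC features: 13 coefficients x 20 frames,
# flattened, with one vowel label per utterance (both assumed).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 13 * 20))
y = rng.integers(0, 6, size=1000)           # e.g. six Mandarin vowel classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500, random_state=0)
mlp.fit(X_tr, y_tr)
print("vowel recognition accuracy:", mlp.score(X_te, y_te))
```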
The field of machine learning deals with a huge number of algorithms which can transform the observed data into many forms, and dimensionality reduction (DR) is one such transformation. There are many high-quality papers which compare some of the DR approaches, and of course there are other experiments which apply them with success. Not everyone focuses on the information lost, the increase of relevance, or the decrease of uncertainty during the transformation, which is hard to estimate, and only a few studies remark on it briefly. This study aims to explain these inner features of four different DR algorithms. These algorithms were not chosen randomly but on purpose: a representative is chosen from each of the major groups of DR methods. The comparison criteria are based on statistical dependencies, such as the Correlation Coefficient, Euclidean Distance, Mutual Information, and Granger causality. The winning algorithm should reasonably transform the input dataset while keeping most of the inner dependencies.
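A hedged sketch of this kind of comparison, assuming the original data and a 2-D reduction are available as NumPy arrays; PCA stands in for the four algorithms, and three of the four criteria (correlation, Euclidean distance preservation, mutual information) are shown, while the Granger-causality test would additionally require treating the columns as time series.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import mutual_info_regression
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))              # placeholder high-dimensional data
Z = PCA(n_components=2).fit_transform(X)    # one candidate DR transformation

# 1) Correlation between each original feature and the first component.
corrs = [pearsonr(X[:, j], Z[:, 0])[0] for j in range(X.shape[1])]

# 2) Preservation of pairwise Euclidean distances under the mapping.
dist_corr = pearsonr(pdist(X), pdist(Z))[0]

# 3) Mutual information between original features and the first component.
mi = mutual_info_regression(X, Z[:, 0], random_state=0)

print("mean |feature-component correlation|:", np.mean(np.abs(corrs)))
print("distance-preservation correlation:", dist_corr)
print("mean mutual information:", mi.mean())
```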
This article considers the problem of cross-modal retrieval, such as using a text query to search for images and vice versa. Based on different autoencoders, several novel models are proposed here for solving this problem. These models are constructed by correlating the hidden representations of a pair of autoencoders. A novel objective, which minimizes a linear combination of the representation learning errors for each modality and the correlation learning error between the hidden representations of the two modalities, is used to train the model as a whole. Minimizing the correlation learning error forces the model to learn hidden representations containing only the information common to the two modalities, while minimizing the representation learning error makes the hidden representations good enough to reconstruct the inputs of each modality. To balance the two kinds of errors induced by representation learning and correlation learning, we set a specific parameter in our models. Furthermore, the models are divided into two groups according to the modalities they attempt to reconstruct. One group, including three models, is named multimodal reconstruction correspondence autoencoder, since it reconstructs both modalities. The other group, including two models, is named unimodal reconstruction correspondence autoencoder, since it reconstructs a single modality. The proposed models are evaluated on three publicly available datasets, and our experiments demonstrate that the proposed correspondence autoencoders perform significantly better than three canonical correlation analysis based models and two popular multimodal deep models on cross-modal retrieval tasks.
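A minimal sketch of the shared objective described above: a reconstruction error for each modality plus a correlation term between the two hidden representations, balanced by a single parameter. The modality dimensions, layer sizes, the squared-distance form of the correlation term, and the balance value are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class CorrAE(nn.Module):
    # One autoencoder per modality (e.g. image and text features); sizes assumed.
    def __init__(self, d_img=4096, d_txt=300, d_hidden=128):
        super().__init__()
        self.enc_img = nn.Sequential(nn.Linear(d_img, d_hidden), nn.ReLU())
        self.dec_img = nn.Linear(d_hidden, d_img)
        self.enc_txt = nn.Sequential(nn.Linear(d_txt, d_hidden), nn.ReLU())
        self.dec_txt = nn.Linear(d_hidden, d_txt)

    def forward(self, img, txt):
        h_i, h_t = self.enc_img(img), self.enc_txt(txt)
        return self.dec_img(h_i), self.dec_txt(h_t), h_i, h_t

def correspondence_loss(img, txt, model, alpha=0.2):
    """Linear combination of representation and correlation learning errors;
    alpha balances the two kinds of errors (its value here is an assumption)."""
    img_hat, txt_hat, h_i, h_t = model(img, txt)
    recon = (nn.functional.mse_loss(img_hat, img)
             + nn.functional.mse_loss(txt_hat, txt))
    corr = nn.functional.mse_loss(h_i, h_t)     # distance between hidden codes
    return (1 - alpha) * recon + alpha * corr
```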
It has been long debated how the so-called cognitive map, the set of place cells, develops in the rat hippocampus. The function of this organ is of high relevance, since the hippocampus is the key component of the medial temporal lobe memory system, responsible for forming episodic memory and declarative memory, the memory for facts and rules that serves cognition in humans. Here, a general mechanism is put forth: we introduce the novel concept of Cartesian factors. We show a non-linear projection of observations to a discretized representation of a Cartesian factor in the presence of a representation of a complementing one. The computational model is demonstrated for place cells, which we produce from egocentric observations and head direction signals. Head direction signals make up the observed factor, and sparse allothetic signals make up the complementing Cartesian one. We present numerical results, connect the model to the neural substrate, and elaborate on the differences between this model and other ones, including Slow Feature Analysis [17].
Deep learning based trackers can achieve high tracking precision and strong adaptability in different scenarios. However, because the number of parameters is large and fine-tuning is challenging, their time complexity is high. In order to improve efficiency, we propose a tracker based on fast deep learning, built on a new network with less redundancy. Based on the theory of deep learning, we propose a deep neural network to describe the essential features of images. Furthermore, fast deep learning is achieved by restricting the size of the network. With the help of a GPU, the time complexity of network training is reduced to a large extent. Under the framework of a particle filter, the proposed method combines the deep learning feature extractor with an SVM scorer to distinguish the target from the background. The condensed network structure reduces the complexity of the model. Compared with other deep learning based trackers, the proposed method achieves higher efficiency, with a frame rate of 22 frames per second on average. Experiments on an open tracking benchmark demonstrate that both the robustness and the timeliness of the proposed tracker are promising when the appearance of the target changes through translation, rotation, or scaling, or under interference such as illumination changes, occlusion, and cluttered backgrounds. However, it is not robust enough when the target moves fast, or when motion blur or similar objects are present.
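A compact sketch of the scoring step under the particle-filter framework, assuming candidate patches have already been cropped for each particle; the small feature network, the linear SVM, the patch size, and the placeholder training data are all assumptions standing in for the paper's trained components.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import LinearSVC

# Assumed components: a tiny feature network standing in for the deep extractor,
# and a linear SVM trained offline on target vs. background patches.
feature_net = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 256), nn.ReLU())

def extract(patches):
    # patches: (n, 3, 32, 32) float tensor of cropped candidate regions.
    with torch.no_grad():
        return feature_net(patches).numpy()

def score_particles(patches, svm):
    """Score each particle's candidate patch; the highest-scoring candidate
    is taken as the new target state in the particle-filter update."""
    feats = extract(patches)
    scores = svm.decision_function(feats)    # signed distance to the SVM margin
    return int(np.argmax(scores)), scores

# Offline step: fit the SVM on features of labelled target/background patches.
train_feats = extract(torch.rand(200, 3, 32, 32))        # placeholder patches
labels = np.array([1] * 100 + [0] * 100)                  # placeholder labels
svm = LinearSVC().fit(train_feats, labels)

best_idx, scores = score_particles(torch.rand(50, 3, 32, 32), svm)
```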
ISBN (print): 9781509006212
Identifying drug-target interactions (DTIs) is a major challenge in drug development. Traditionally, similarity-based methods use drug and target similarity matrices to infer potential drug-target interactions, but these techniques do not handle biochemical data directly. While recent feature-based methods reveal simple patterns of physicochemical properties, an efficient method to study large interactive feature sets and precisely predict interactions is still missing. Deep learning has been found to be an appropriate tool for converting high-dimensional features into low-dimensional representations. The deep representations generated from drug-protein pairs can serve as training examples for an interaction predictor. In this paper, we propose a promising approach called multi-scale features deep representations inferring interactions (MFDR). We extract large-scale chemical structure and protein sequence descriptors so that a machine learning model can predict whether a given human target protein interacts with a specific drug. MFDR uses Auto-Encoders as the building blocks of a deep network that reconstructs drug and protein features as low-dimensional representations. We then use a support vector machine to infer potential drug-target interactions from the deep representations. The experimental results show that a deep neural network with Stacked Auto-Encoders outputs representations well suited to the DTI prediction task. MFDR is able to predict large-scale drug-target interactions with high accuracy and achieves better results than other feature-based approaches.
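A hedged sketch of this kind of pipeline: drug and protein descriptors are concatenated, compressed by a stacked encoder trained by reconstruction, and the resulting representation is fed to an SVM interaction predictor. The descriptor dimensions, layer sizes, and placeholder data are assumptions, and the greedy layer-wise pretraining is collapsed into a single joint reconstruction loop here.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVC

# Placeholder drug fingerprints and protein sequence descriptors (assumed sizes).
rng = np.random.default_rng(0)
drug = rng.random((1000, 881)).astype(np.float32)      # e.g. structural fingerprints
prot = rng.random((1000, 1400)).astype(np.float32)     # e.g. sequence descriptors
labels = rng.integers(0, 2, 1000)                      # interacts / does not interact

x = torch.from_numpy(np.hstack([drug, prot]))

# Stacked encoder/decoder standing in for the pretrained auto-encoders.
encoder = nn.Sequential(nn.Linear(x.shape[1], 512), nn.Sigmoid(),
                        nn.Linear(512, 128), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(128, 512), nn.Sigmoid(),
                        nn.Linear(512, x.shape[1]), nn.Sigmoid())

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for _ in range(100):                                   # unsupervised reconstruction
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(x)), x)
    loss.backward()
    opt.step()

with torch.no_grad():
    deep_repr = encoder(x).numpy()                     # low-dimensional representation

clf = SVC(kernel='rbf').fit(deep_repr, labels)         # interaction predictor
```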
Land-use classification using remote sensing images covers a wide range of applications. With more detailed spatial and textural information provided in very high resolution (VHR) remote sensing images, a greater range of objects and spatial patterns can be observed than ever before. This offers us a new opportunity for advancing the performance of land-use classification. In this paper, we first introduce an effective midlevel visual elements-oriented land-use classification method based on "partlets," which are a library of pretrained part detectors used for midlevel visual elements discovery. Taking advantage of midlevel visual elements rather than low-level image features, the partlets-based method represents images by computing their responses to a large number of part detectors. As the number of part detectors grows, a main obstacle to the broader application of this method is its computational cost. To address this problem, we next propose a novel framework to train coarse-to-fine shared intermediate representations, which are termed "sparselets," from a large number of pretrained part detectors. This is achieved by building a single-hidden-layer autoencoder and a single-hidden-layer neural network with an L0-norm sparsity constraint, respectively. Comprehensive evaluations on a publicly available 21-class VHR land-use data set and comparisons with state-of-the-art approaches demonstrate the effectiveness and superiority of the proposed methods.
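A compact sketch of the shared-representation idea, assuming each pretrained part detector is a weight vector and a single-hidden-layer autoencoder learns a small shared dictionary that reconstructs all detectors; the detector dimensionality, dictionary size, and the L1 penalty used here as a tractable stand-in for the L0-norm sparsity constraint are all assumptions.

```python
import torch
import torch.nn as nn

# W: weights of the pretrained part detectors, one detector per row (sizes assumed).
W = torch.randn(512, 1984)                  # 512 detectors over HOG-like features

d_shared = 128                              # size of the shared "sparselet" dictionary
encoder = nn.Linear(W.shape[1], d_shared)   # activation coefficients per detector
decoder = nn.Linear(d_shared, W.shape[1], bias=False)  # decoder rows ~ sparselets

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for _ in range(500):
    opt.zero_grad()
    codes = encoder(W)
    recon = nn.functional.mse_loss(decoder(codes), W)
    sparsity = codes.abs().mean()           # L1 stand-in for the L0-norm constraint
    (recon + 0.01 * sparsity).backward()
    opt.step()

# At test time, responses to all part detectors can be approximated from
# responses to the much smaller sparselet dictionary, reducing computation.
```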