ISBN (digital): 9798350370249
ISBN (print): 9798350370270
Generative Adversarial Networks (GANs) are a class of generative models widely used in machine learning, computer vision, and natural language processing (NLP). GANs employ two neural networks, a generator and a discriminator, that are trained together to generate realistic-looking data such as images, audio clips, or 3-D scenes. In this work, we focus on 3-D scene reconstruction using GAN-based methods. We aim to generate 3-D scenes from given 2-D images, allowing us to gain insight into the structure and layout of complex 3-D environments. To this end, we propose leveraging a technique known as inverse rendering to improve the accuracy of the reconstructed 3-D scenes. We evaluate our method on synthetic and real-world images, and the results demonstrate the efficacy of our approach. Finally, we discuss potential future directions for 3-D scene reconstruction using GAN-based methods.
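To make the adversarial setup concrete, here is a minimal sketch of the generator/discriminator training loop the abstract describes. The network shapes, latent dimension, and optimizer settings are illustrative placeholders, not the paper's architecture.

```python
# Minimal GAN training step (sketch); sizes below are hypothetical.
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784  # placeholder dimensions
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 256), nn.ReLU(), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()  # D outputs raw logits

def train_step(real_batch):
    b = real_batch.size(0)
    fake = G(torch.randn(b, latent_dim))

    # Discriminator: push real samples toward 1, generated samples toward 0.
    opt_d.zero_grad()
    loss_d = bce(D(real_batch), torch.ones(b, 1)) + \
             bce(D(fake.detach()), torch.zeros(b, 1))
    loss_d.backward()
    opt_d.step()

    # Generator: try to make the discriminator predict 1 on fakes.
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(b, 1))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```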
ISBN (print): 9781450366151
In monocular camera-based end-to-end driving, vehicle driving parameters such as steering angle and speed are estimated directly from the camera images using deep learning. In traditional autonomous driving, by contrast, these parameters are estimated using multiple modules for sensing, behaviour generation, path planning and control. Owing to its ability to directly estimate the driving parameters, the end-to-end driving framework has received significant attention from the research community. In this paper, we present a novel stereo-based deep learning framework for end-to-end driving, where the depth and appearance information generated using the stereo camera are integrated to improve the steering angle prediction accuracy, especially under varying illumination conditions. Validation of the proposed algorithm is performed using multiple sequences of pre-defined driving routes with an expert driver. Each pre-defined driving route is acquired over multiple days with varying illumination conditions. Using the acquired dataset, we show that the steering angle prediction accuracy of stereo-based end-to-end driving is better than that of monocular camera-based end-to-end driving.
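A minimal sketch of how depth and appearance streams might be fused for steering prediction, assuming a simple late-fusion design; the branch layers, feature sizes, and fusion point are assumptions, not the paper's network.

```python
# Two-branch late fusion of appearance (RGB) and stereo depth (sketch).
import torch
import torch.nn as nn

class StereoSteeringNet(nn.Module):
    def __init__(self):
        super().__init__()
        def branch(in_ch):  # small conv encoder, one per modality
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, 5, stride=2), nn.ReLU(),
                nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.rgb = branch(3)    # appearance from the left camera
        self.depth = branch(1)  # disparity/depth map from stereo matching
        self.head = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, rgb, depth):
        # Concatenate per-modality features, then regress the steering angle.
        f = torch.cat([self.rgb(rgb), self.depth(depth)], dim=1)
        return self.head(f)
```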
ISBN (print): 9781450366151
Zero-shot learning (ZSL) for visual recognition aims at identifying previously unseen class samples, given a model trained on the labeled visual samples of seen classes and additional class-level semantic side information for all classes. Often ZSL is tackled by learning an embedding function from the visual to the semantic space or vice versa. However, learning this mapping often results in a loss of the discriminative property of the learned embedding space, severely compromising recognition performance on test samples. To ensure improved discrimination in the embedding space, we introduce a ZSL framework that leverages the intuitive idea of cross-domain triplet-based metric learning for learning such a space. Additionally, we introduce a novel graph-Laplacian-based regularizer which aligns the graph structures of the visual and semantic spaces in the learned embedding space. Simultaneously optimizing both criteria results in a compact, discriminative, and meaningful embedding space, which is experimentally found to be superior to most of its existing counterparts in both the standard ZSL (AwA and CUB) and the challenging generalized ZSL (AwA1, AwA2, CUB) settings.
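A minimal sketch of a cross-domain triplet criterion of the kind the abstract describes, assuming the anchor is an embedded visual sample and the positive/negative are embedded semantic vectors of the correct class and of a wrong class; the margin value is an assumption.

```python
# Cross-domain triplet loss (sketch): pull a visual embedding toward its
# class's semantic embedding, push it away from a wrong class's embedding.
import torch
import torch.nn.functional as F

def cross_domain_triplet(v_emb, s_pos, s_neg, margin=0.5):
    """v_emb: embedded visual features (B, d);
    s_pos / s_neg: embedded semantic vectors of the correct / wrong class."""
    d_pos = F.pairwise_distance(v_emb, s_pos)  # visual-to-correct-semantic
    d_neg = F.pairwise_distance(v_emb, s_neg)  # visual-to-wrong-semantic
    return F.relu(d_pos - d_neg + margin).mean()
```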
ISBN (print): 9781450366151
Deep learning models trained on natural images are commonly used for different classification tasks in the medical domain. Generally, very high dimensional medical images are down-sampled using interpolation techniques before being fed to deep learning models that are ImageNet compliant and accept only low-resolution images of size 224 x 224 px. This popular practice may lead to the loss of key information and thus hamper the classification, since significant pathological features in medical images are typically small in size and are therefore strongly affected by down-sampling. To combat this problem, we introduce a convolutional neural network (CNN) based classification approach which learns to reduce the resolution of the image using an autoencoder and at the same time classify it using another network, with both tasks trained jointly. This algorithm guides the model to learn essential representations from high-resolution images for classification along with reconstruction. We have used a publicly available dataset of chest X-rays to evaluate this approach and have outperformed the state-of-the-art on test data. Besides, we have experimented with the effects of different augmentation approaches on this dataset and report baselines using some well-known ImageNet-class CNNs.
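A minimal sketch of the joint training objective described above: an encoder learns the down-sampling, a decoder reconstructs the high-resolution input, and a classifier operates on the low-resolution output, with both losses optimized together. The module interfaces and the weighting term `alpha` are assumptions.

```python
# Jointly trained learned-downsampling + classification (sketch).
import torch.nn as nn

class JointDownsampleClassifier(nn.Module):
    def __init__(self, encoder, decoder, classifier, alpha=1.0):
        super().__init__()
        self.encoder, self.decoder = encoder, decoder  # autoencoder halves
        self.classifier = classifier                   # e.g. an ImageNet-style CNN
        self.alpha = alpha                             # reconstruction weight
        self.mse = nn.MSELoss()
        self.ce = nn.CrossEntropyLoss()

    def forward(self, x_hr, label):
        x_lr = self.encoder(x_hr)      # learned down-sampling to e.g. 224x224
        recon = self.decoder(x_lr)     # reconstruction branch
        logits = self.classifier(x_lr) # classification branch on the LR image
        loss = self.ce(logits, label) + self.alpha * self.mse(recon, x_hr)
        return logits, loss
```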
ISBN (print): 9781450366151
The presence of haze in the atmospheric medium degrades the quality of videos captured by camera sensors. The removal of haze, referred to as dehazing, is typically performed subject to a physical degradation model, which involves the solution of an ill-posed inverse problem. A few efforts have been made toward image dehazing, whereas video dehazing still remains an unexplored area of research. This paper proposes an approach for video dehazing combining the concepts of single image dehazing, optical flow estimation and Markov Random Fields (MRF). The proposed method enhances the temporal and spatial coherence of the hazy video. Assuming that the dark channel of the haze-free image is zero, we obtain the raw transmission map. In the proposed approach, we refine the raw transmission map obtained from the dark channel prior using a guided filter. We estimate the forward and backward optical flows between neighboring frames to locate individual pixels using Linear Discriminant Analysis. The colors of the haze-free pixels in the frames are approximated by a few hundred discrete colors, which form tight clusters in color space. The pixels belonging to a given cluster are spread across the frame, and their values after haze removal can be predicted by analyzing the forward and backward optical flows. The Large Margin Nearest Neighbor (LMNN) algorithm is applied to obtain a smooth transmission map of the foggy frames of the video and to approximate the pixel values in RGB space. The flow fields are utilized in an augmented MRF model on the obtained transmission map to enhance the temporal and spatial coherence of the transmission. The proposed method is compared against the state-of-the-art on both real and synthetic videos and is shown to preserve the information optimally.
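A minimal sketch of the dark-channel-prior step mentioned in the abstract, computing the raw transmission map t = 1 - omega * dark_channel(I/A); the patch size and omega follow common defaults from the dark channel prior literature, not necessarily the paper's settings.

```python
# Raw transmission map from the dark channel prior (sketch).
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(img, patch=15):
    # Per-pixel minimum over the RGB channels, then a local minimum filter.
    return minimum_filter(img.min(axis=2), size=patch)

def raw_transmission(img, atmosphere, omega=0.95, patch=15):
    """img: float HxWx3 hazy frame in [0, 1]; atmosphere: length-3 airlight A."""
    normed = img / np.maximum(atmosphere, 1e-6)  # normalize by the airlight
    return 1.0 - omega * dark_channel(normed, patch)
```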
ISBN (print): 9781450366151
Face Recognition (FR) using Convolutional Neural Network (CNN) based models has achieved considerable success in constrained environments. These models, however, fail to perform well in unconstrained scenarios, especially when the images are captured using surveillance cameras. Such probe samples suffer from degradations such as noise, poor illumination, low resolution, blur and aliasing, when compared to the rich training (gallery) set, comprising mostly of mugshot images captured in laboratory settings. The images in the training (gallery) set are crisp and have high contrast compared to the probe samples. To cope with this scenario, we propose a novel dual-pathway generative adversarial network (DP-GAN) which maps low-resolution images captured using surveillance cameras to their corresponding high-resolution, gallery-like images, using a novel combination of multi-scale reconstruction and a Jensen-Shannon divergence based loss. The images thus obtained are then used to train a deep domain adaptation (deep-DA) network to perform the task of FR. The proposed network achieves superior results (>90%) on four benchmark surveillance face datasets, evident from the rank-1 recognition rates when compared with recent state-of-the-art CNN-based techniques.
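A minimal sketch of the kind of combined objective the abstract suggests: multi-scale L1 reconstruction against the gallery-like target plus an adversarial term (the standard GAN loss, whose optimum relates to the Jensen-Shannon divergence). `G`, `D`, the scale set and the weight `lam` are illustrative assumptions, not the paper's exact formulation.

```python
# Generator objective: multi-scale reconstruction + adversarial term (sketch).
import torch
import torch.nn.functional as F

def generator_loss(G, D, lr_face, hr_face, scales=(1.0, 0.5, 0.25), lam=0.01):
    sr = G(lr_face)  # hallucinated gallery-like high-resolution face
    recon = 0.0
    for s in scales:  # L1 reconstruction at several spatial scales
        size = [max(1, int(hr_face.shape[-2] * s)),
                max(1, int(hr_face.shape[-1] * s))]
        recon = recon + F.l1_loss(F.interpolate(sr, size=size),
                                  F.interpolate(hr_face, size=size))
    logits = D(sr)  # adversarial term: fool the discriminator
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return recon + lam * adv
```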
ISBN (print): 9781450366151
In this paper we present a novel methodology for recognizing human activity in egocentric video based on a Bag of Visual Features. The proposed technique is based on the assumption that only a portion of the whole video can be sufficient to identify an activity. We further argue that, for activity recognition in egocentric videos, the proposed approach performs better than deep learning based methods: in egocentric videos the person wearing the sensor often remains static for a long time, or moves his head frequently, and in both cases it becomes difficult to learn the spatio-temporal pattern of the video during an action. The proposed approach divides the video into smaller video segments called Video Units. Spatio-temporal features extracted from the units are clustered to construct a dictionary of Action Units (AUs). The AUs are ranked based upon their likeliness scores. The scores are obtained by constructing a weighted graph with the AUs as vertices and edge weights calculated from the frequencies of occurrence of the AUs during the activity. The less significant AUs are pruned from the dictionary, and the revised dictionary of key AUs is used for activity classification. We test our approach on a benchmark egocentric dataset and achieve good accuracy.
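A minimal sketch of the dictionary construction and pruning steps: cluster unit features into Action Units, score each AU in a co-occurrence graph, and keep the top-ranked AUs. Scoring by weighted degree and the consecutive-unit edge weights are assumptions; the paper's exact ranking scheme may differ.

```python
# Build an AU dictionary, rank AUs on a weighted graph, keep the key AUs (sketch).
import numpy as np
from sklearn.cluster import KMeans

def key_action_units(unit_features, n_aus=50, keep=30):
    """unit_features: (num_units, d) features of consecutive Video Units."""
    km = KMeans(n_clusters=n_aus, n_init=10).fit(unit_features)
    labels = km.labels_
    # Edge weight = how often two AUs occur in consecutive video units.
    W = np.zeros((n_aus, n_aus))
    for a, b in zip(labels[:-1], labels[1:]):
        if a != b:
            W[a, b] += 1
            W[b, a] += 1
    scores = W.sum(axis=1)                   # weighted degree as likeliness score
    return np.argsort(scores)[::-1][:keep]   # indices of the retained key AUs
```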
ISBN (print): 9781450366151
Studies of object detection and localization, particularly pedestrian detection, have received considerable attention in recent times due to several prospective applications such as surveillance, driving assistance and autonomous cars. A significant trend in recent research on related problems is the use of sophisticated deep learning based approaches to improve benchmark performance on various standard datasets. A trade-off between speed (number of video frames processed per second) and detection accuracy has often been reported in the existing literature. In this article, we present a new but simple deep learning based strategy for pedestrian detection that improves this trade-off. Since training similar models on publicly available sample datasets failed to improve the detection performance to a significant extent, particularly for instances of pedestrians of smaller sizes, we have developed a new dataset consisting of more than 80K annotated pedestrian figures in videos recorded under varying traffic conditions. Performance of the proposed model has been obtained on the test samples of the new dataset and two other existing datasets, namely the Caltech Pedestrian Dataset (CPD) and the CityPerson Dataset (CD). Our proposed system shows nearly 16% improvement over the existing state-of-the-art result.
ISBN (print): 9781450366151
In this paper, we attempt to advance the research work done in human action recognition to a rather specialized application, namely Indian Classical Dance (ICD) classification. The variation in such dance forms in terms of hand and body postures, facial expressions or emotions, and head orientation makes pose estimation an extremely challenging task. To circumvent this problem, we construct a pose-oblivious shape signature which is fed to a sequence learning framework. The pose signature representation is done in a two-fold process. First, we represent the person's pose in the first frame of a dance video using symmetric Spatial Transformer Networks (STN) to extract good person object proposals and a CNN-based parallel single person pose estimator (SPPE). Next, the pose bases are converted to pose flows by assigning a similarity score between successive poses, followed by non-maximal suppression. Instead of feeding a simple chain of joints to the sequence learner, which generally hinders network performance, we constitute a feature vector of normalized distance vectors, flow, and angles between anchor joints, which captures the adjacency configuration of the skeletal pattern. Thus, the kinematic relationship among the body joints across frames, obtained via pose estimation, helps in better establishing the spatio-temporal dependencies. We present an exhaustive empirical evaluation of state-of-the-art deep network based methods for dance classification on an ICD dataset.
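A minimal sketch of the per-frame part of such a pose signature: normalized pairwise distances and angles between a few anchor joints. The anchor-joint indices and the torso-length normalizer are illustrative assumptions; the paper's signature additionally includes flow features across frames.

```python
# Per-frame pose signature from anchor joints (sketch).
import numpy as np

ANCHORS = [0, 1, 4, 7, 10, 13]  # hypothetical joint indices (head, torso, limbs)

def pose_signature(joints):
    """joints: (J, 2) array of 2-D joint coordinates for one frame."""
    ref = np.linalg.norm(joints[1] - joints[0]) + 1e-6  # torso length as scale
    feats = []
    for i, a in enumerate(ANCHORS):
        for b in ANCHORS[i + 1:]:
            v = joints[b] - joints[a]
            feats.append(np.linalg.norm(v) / ref)  # scale-normalized distance
            feats.append(np.arctan2(v[1], v[0]))   # angle between the joint pair
    return np.asarray(feats)
```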
ISBN (print): 9781450366151
The last decade has witnessed rapid growth in the popularity of Convolutional Neural Networks (CNNs) for detecting and classifying objects. The self-trainable nature of CNNs makes them a strong candidate as both classifier and feature extractor. However, many of the existing CNN architectures fail to recognize texts or objects under input rotation and scaling. This paper introduces an elegant approach, Scale and Rotation Corrected CNN (SRC-CNN), for scale and rotation invariant text recognition, exploiting the concept of the principal component of characters. Prior to training and testing with a baseline CNN, SRC-CNN maps each character image to a reference orientation and scale, both derived from the character image itself. SRC-CNN is capable of recognizing characters in a document even when they differ greatly in orientation and scale. The proposed method does not demand any training with scaled or rotated samples. The performance of the proposed approach is validated on different character datasets such as MNIST, MNIST_rot_12k and English alphabets, and compared with state-of-the-art rotation invariant classification networks. SRC-CNN is a generalized approach and can be extended to rotation and scale invariant classification of many other datasets as well, choosing any appropriate baseline CNN. We also demonstrate the generality of the proposed SRC-CNN on the MNIST Fashion dataset and find that it performs well in rotation and scale invariant classification of objects. This paper demonstrates how basic PCA-based rotation and scale invariant image recognition can be integrated into a CNN to achieve better rotational and scale invariance in classification.
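A minimal sketch of a PCA-based rotation correction of the kind SRC-CNN performs before training and testing: estimate the character's principal axis from its foreground pixel coordinates and rotate the image to a reference orientation. The foreground threshold and the vertical reference orientation are assumptions.

```python
# PCA-based rotation correction of a character image (sketch).
import numpy as np
from scipy.ndimage import rotate

def correct_rotation(img, fg_thresh=0.5):
    """img: float HxW grayscale character image, foreground bright."""
    ys, xs = np.nonzero(img > fg_thresh)           # foreground pixel coordinates
    coords = np.stack([xs, ys], axis=1).astype(float)
    coords -= coords.mean(axis=0)                  # center the pixel cloud
    cov = coords.T @ coords / len(coords)          # 2x2 covariance matrix
    vals, vecs = np.linalg.eigh(cov)               # eigen-decomposition (PCA)
    major = vecs[:, np.argmax(vals)]               # principal (dominant) axis
    angle = np.degrees(np.arctan2(major[1], major[0]))
    # Rotate so the principal axis aligns with the vertical reference.
    return rotate(img, angle - 90.0, reshape=True)
```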