ISBN: 9781450366151 (print)
Co-saliency detection is the computational process of identifying common, prominent, and salient foreground regions in a group of images. However, most co-saliency detection methods suffer from two limitations. First, co-saliency detection models largely generate superpixel-level co-saliency maps, which sacrifices significant information from the pixel-level input images. Second, co-saliency detection frameworks mostly rely on redesigned models for detecting co-salient objects in an image group instead of reusing existing single-image saliency detection models. To address these problems, we propose a novel framework, Co-saliency via Regularized Random Walk Ranking (CR2WR), which provides highly efficient pixel-level co-saliency maps and uses existing single-image saliency models to detect co-salient objects in an image sequence. This is achieved by: (1) introducing regularized random walk as the ranking function for a two-stage co-saliency detection framework; (2) a novel weighting function that incorporates more image information in graph construction, together with the normalized Laplacian matrix for efficient co-saliency maps; and (3) fusing the generated saliency maps with high-level priors, namely location and objectness priors, which enhances detection of co-salient regions. Suitably designed novel objective functions provide an enriched solution. The proposed model is evaluated on challenging benchmark co-saliency datasets. It is demonstrated that the proposed method outperforms prominent state-of-the-art methods in terms of efficiency and computational time.
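As a rough illustration of graph-based ranking with a symmetrically normalized affinity matrix, the sketch below propagates seed saliency scores over a small graph. It is not the CR2WR formulation described in the abstract: the update rule is a generic manifold-ranking-style iteration swapped in for illustration, and the names `normalized_affinity`, `rank_nodes`, `alpha`, and `iters` are assumptions.

```python
import math


def normalized_affinity(W):
    """Return S = D^{-1/2} W D^{-1/2} for a symmetric affinity matrix W."""
    n = len(W)
    deg = [sum(row) for row in W]
    return [
        [
            W[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def rank_nodes(W, seeds, alpha=0.9, iters=200):
    """Iterate f <- alpha * S f + (1 - alpha) * seeds.

    Assumption: this generic propagation stands in for the paper's
    regularized random walk ranking, which the abstract does not spell out.
    """
    S = normalized_affinity(W)
    n = len(W)
    f = list(seeds)
    for _ in range(iters):
        f = [
            alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * seeds[i]
            for i in range(n)
        ]
    return f


# Path graph 0 - 1 - 2 - 3 with node 0 as the only seed (toy data).
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = rank_nodes(W, [1.0, 0.0, 0.0, 0.0])
```

On this toy graph the scores fall off with distance from the seed node, which is the behaviour a ranking function on a pixel or superpixel graph is expected to show.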
ISBN: 9781450366151 (print)
Deep learning models trained on natural images are commonly used for classification tasks in the medical domain. Generally, very high-dimensional medical images are down-sampled using interpolation techniques before being fed to ImageNet-compliant deep learning models that accept only low-resolution images of size 224 x 224 px. This popular technique may lead to the loss of key information and thus hamper classification, because significant pathological features in medical images are typically small and are highly affected by such down-sampling. To combat this problem, we introduce a convolutional neural network (CNN) based classification approach that learns to reduce the resolution of the image using an autoencoder and at the same time classifies it using another network, with both tasks trained jointly. This algorithm guides the model to learn essential representations from high-resolution images for classification along with reconstruction. We evaluate this approach on a publicly available chest X-ray dataset and outperform the state of the art on test data. In addition, we experiment with the effects of different augmentation approaches on this dataset and report baselines using some well-known ImageNet-class CNNs.
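Joint training of a reconstruction task and a classification task is commonly expressed as a weighted sum of two losses. The sketch below shows that idea on plain Python lists. The weights `recon_weight` and `cls_weight`, the choice of mean squared error, and all function names are assumptions made for illustration; the abstract does not state the loss actually used.

```python
import math


def mse(target, reconstruction):
    """Mean squared error between two equally long sequences."""
    return sum((a - b) ** 2 for a, b in zip(target, reconstruction)) / len(target)


def cross_entropy(class_probs, label):
    """Negative log-likelihood of the true class (clamped to avoid log(0))."""
    return -math.log(max(class_probs[label], 1e-12))


def joint_loss(target, reconstruction, class_probs, label,
               recon_weight=1.0, cls_weight=1.0):
    """Hypothetical joint objective: weighted reconstruction + classification."""
    return (recon_weight * mse(target, reconstruction)
            + cls_weight * cross_entropy(class_probs, label))


# Perfect reconstruction, uninformative two-class prediction (toy values).
loss = joint_loss([0.0, 1.0], [0.0, 1.0], [0.5, 0.5], label=0)
```

Because both terms share one scalar objective, gradients from the classifier can shape the representation learned by the autoencoder, which is the point of training the two tasks together.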
The importance of video-based monitoring systems is on the rise, leading to growing interest in the field of computer vision. With the increase in human population, crowds need to be monitored, be it in a public...
ISBN: 9789811065712; 9789811065705 (print)
Image segmentation is a very important aspect of computer vision and pattern recognition. Although the Pulse-coupled Neural Network (PCNN) is an effective method for image segmentation, its optimal parameters are difficult to determine. To find the optimal parameters of the PCNN effectively, Quantum Geese Swarm Optimization (QGSO) is proposed to evolve the parameters of the PCNN. The proposed QGSO applies quantum computing theory to Geese Swarm Optimization (GSO) for continuous optimization problems. The minimal combined weighting entropy, which considers both Shannon entropy and cross-entropy, is used as the fitness function of QGSO. Experimental results show that the proposed method obtains better segmented images and has excellent performance.
ISBN: 9781450366151 (print)
Face Recognition (FR) using Convolutional Neural Network (CNN) based models has achieved considerable success in constrained environments. These models, however, fail to perform well in unconstrained scenarios, especially when the images are captured using surveillance cameras. These probe samples suffer from degradations such as noise, poor illumination, low resolution, blur, and aliasing when compared to the rich training (gallery) set, which consists mostly of mugshot images captured in laboratory settings. The images in the training (gallery) set are crisp and have high contrast compared to the probe samples. To cope with this scenario, we propose a novel dual-pathway generative adversarial network (DP-GAN) that maps low-resolution images captured by surveillance cameras to corresponding gallery-like high-resolution images, using a novel combination of multi-scale reconstruction and Jensen-Shannon divergence based loss. The images thus obtained are then used to train a deep domain adaptation (deep-DA) network to perform FR. The proposed network achieves superior rank-1 recognition rates (>90%) on four benchmark surveillance face datasets when compared with recent state-of-the-art CNN-based techniques.
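For readers unfamiliar with the Jensen-Shannon divergence mentioned above, the sketch below computes it for two discrete probability distributions using its standard definition. How the DP-GAN loss applies it to images is not specified in the abstract, so the function names and the toy distributions here are illustrative assumptions only.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Toy distributions (hypothetical values).
same = js_divergence([0.2, 0.8], [0.2, 0.8])
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])
```

Unlike the KL divergence, this quantity is symmetric and bounded by ln 2, so it stays finite even when the two distributions do not overlap.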
ISBN: 9781450366151 (print)
In this paper we present a novel methodology for recognizing human activity in egocentric video based on the Bag of Visual Features. The proposed technique is based on the assumption that only a portion of the whole video can be sufficient to identify an activity. We further argue that, for activity recognition in egocentric videos, the proposed approach performs better than any deep learning based method, because in egocentric videos the person wearing the sensor often remains static for a long time or moves his or her head frequently. In both cases, it becomes difficult to learn the spatio-temporal pattern of the video during the action. The proposed approach divides the video into smaller segments called Video Units. Spatio-temporal features extracted from the units are clustered to construct a dictionary of Action Units (AUs). The AUs are ranked by their likeliness scores. The scores are obtained by constructing a weighted graph with the AUs as vertices and edge weights calculated from the frequencies of occurrence of the AUs during the activity. The less significant AUs are pruned from the dictionary, and the revised dictionary of key AUs is used for activity classification. We test our approach on a benchmark egocentric dataset and achieve good accuracy.
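The ranking-and-pruning step can be pictured with a toy example. In the sketch below, edge weights count how often two distinct AUs occur next to each other in a label sequence, and an AU's score is the sum of the weights on its edges. Both choices are assumptions made for illustration, since the abstract does not define the edge weights or the likeliness score; `rank_action_units` and `prune` are hypothetical names.

```python
from collections import Counter


def rank_action_units(sequence):
    """Score each AU by its weighted degree in a co-occurrence graph.

    Assumption: an edge weight is the number of adjacent occurrences of two
    distinct AUs in the sequence; the paper's actual weighting may differ.
    """
    edges = Counter()
    for a, b in zip(sequence, sequence[1:]):
        if a != b:
            edges[frozenset((a, b))] += 1

    scores = Counter()
    for pair, weight in edges.items():
        for au in pair:
            scores[au] += weight
    for au in sequence:
        scores.setdefault(au, 0)  # isolated AUs still appear, with score 0
    return scores


def prune(scores, keep):
    """Keep the `keep` highest-scoring AUs (ties broken by name)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [au for au, _ in ranked[:keep]]


# Hypothetical AU labels assigned to consecutive video units.
scores = rank_action_units(["a", "b", "a", "b", "c"])
key_aus = prune(scores, keep=2)
```

The pruned list plays the role of the revised dictionary of key AUs that is then used for classification.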
ISBN: 9781450366151 (print)
Person re-identification has important applications in video surveillance. It can be viewed as recognizing the same person across non-overlapping cameras. Video-based person re-identification methods are gaining increased attention because of the more discriminative nature of spatio-temporal feature representations. Current video-based methods use RNNs to extract temporal information. In this paper, we propose a novel Moving Average Recurrent Neural Network (MA-RNN) model that builds a strong feature representation by taking both previous and present inputs at each time step. Specifically, the recurrent layer produces better sequential information by looking back directly at past input values, whereas general RNNs depend on previous values only indirectly, through hidden-state information. The proposed model is tested on two publicly available datasets, iLIDS-VID and PRID-2011, and it outperforms state-of-the-art methods by a significant margin. We also analyze the effect of the depth of previous-input dependence of the MA-RNN model on matching accuracy.
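One way to read "looking back directly at past input values" is a recurrent update whose input term is a weighted sum over the current and several previous inputs. The scalar sketch below follows that reading. The update rule, the `tanh` nonlinearity, and the names `ma_rnn`, `input_weights`, and `recurrent_weight` are assumptions for illustration, not the MA-RNN equations from the paper.

```python
import math


def ma_rnn(inputs, input_weights, recurrent_weight=0.5):
    """Scalar recurrent layer with a direct window over past inputs.

    Hypothetical update: h_t = tanh(sum_k w_k * x_{t-k} + u * h_{t-1}),
    where input_weights[k] is w_k and len(input_weights) - 1 is the
    look-back depth. A plain RNN corresponds to a single input weight.
    """
    h = 0.0
    states = []
    for t in range(len(inputs)):
        drive = sum(
            w * inputs[t - k] for k, w in enumerate(input_weights) if t - k >= 0
        )
        h = math.tanh(drive + recurrent_weight * h)
        states.append(h)
    return states


# With the recurrent weight set to 0, any memory must come from the input window.
windowed = ma_rnn([1.0, 0.0], [1.0, 0.5], recurrent_weight=0.0)
plain = ma_rnn([1.0, 0.0], [1.0], recurrent_weight=0.0)
```

In this toy run the windowed layer still responds to the first input at the second step, while the single-weight layer does not, which illustrates the difference between direct and hidden-state-only dependence on the past.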
ISBN: 9781450366151 (print)
In this paper, we attempt to extend research on human action recognition to a rather specialized application, namely Indian Classical Dance (ICD) classification. The variation in such dance forms in terms of hand and body postures, facial expressions or emotions, and head orientation makes pose estimation an extremely challenging task. To circumvent this problem, we construct a pose-oblivious shape signature, which is fed to a sequence learning framework. The pose signature is built in a two-step process. First, we represent the person pose in the first frame of a dance video using symmetric Spatial Transformer Networks (STN) to extract good person object proposals and a CNN-based parallel single person pose estimator (SPPE). Next, the pose bases are converted to pose flows by assigning a similarity score between successive poses, followed by non-maximal suppression. Instead of feeding a simple chain of joints to the sequence learner, which generally hinders network performance, we construct a feature vector of normalized distance vectors, flow, and angles between anchor joints, which captures the adjacency configuration of the skeletal pattern. Thus, the kinematic relationship among the body joints across frames, obtained using pose estimation, helps to better establish the spatio-temporal dependencies. We present an exhaustive empirical evaluation of state-of-the-art deep network based methods for dance classification on the ICD dataset.
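A feature vector built from normalized joint distances and joint angles might look like the sketch below for a single frame of 2D keypoints. The joint names, the choice of the neck-to-hip length as the normalizing scale, and the function names are all hypothetical; the flow component mentioned in the abstract is omitted because it needs successive frames.

```python
import math


def joint_distance(a, b):
    """Euclidean distance between two 2D joints."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def joint_angle(a, b, c):
    """Unsigned angle at joint b formed by the segments b-a and b-c, in radians."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return abs(math.atan2(cross, dot))


def pose_features(joints, pairs, triples, scale_pair):
    """Concatenate scale-normalized distances and angles (assumed layout)."""
    scale = joint_distance(joints[scale_pair[0]], joints[scale_pair[1]])
    distances = [joint_distance(joints[i], joints[j]) / scale for i, j in pairs]
    angles = [joint_angle(joints[i], joints[j], joints[k]) for i, j, k in triples]
    return distances + angles


# Toy skeleton with three hypothetical joints.
joints = {"neck": (0.0, 0.0), "hip": (0.0, 2.0), "hand": (1.0, 0.0)}
features = pose_features(
    joints,
    pairs=[("neck", "hand")],
    triples=[("hip", "neck", "hand")],
    scale_pair=("neck", "hip"),
)
```

Dividing by a body-length scale and using angles makes the features insensitive to how large the dancer appears in the frame, which a raw chain of joint coordinates is not.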
ISBN: 9781728151977 (digital)
ISBN: 9781728151984 (print)
In agriculture, determining the total number of chilli fruits plays an important role in estimating crop yield. Manually counting the chilli fruits in an orchard is tedious, requires substantial human resources and cost, and has low accuracy. In this work, a novel approach is proposed for detecting and counting ripened chilli fruits in plant images. It helps farmers plan manual labor for harvesting, shipment, sales, and post-harvest operations. Computer vision techniques can help to count the chilli fruits of an orchard precisely. Therefore, automated detection and counting of chilli fruits is introduced for agricultural farms. The proposed technique achieves good accuracy when compared against ground-truth chilli fruit images.
ISBN: 9781450366151 (print)
The last decade has witnessed rapid growth in the popularity of Convolutional Neural Networks (CNNs) for detecting and classifying objects. The self-trainable nature of CNNs makes them the strongest candidate as both a classifier and a feature extractor. However, many existing CNN architectures fail to recognize text or objects under input rotation and scaling. This paper introduces an elegant approach, 'Scale and Rotation Corrected CNN (SRC-CNN)', for scale- and rotation-invariant text recognition, exploiting the concept of the principal component of characters. Prior to training and testing with a baseline CNN, SRC-CNN maps each character image to a reference orientation and scale, which are derived from the character image itself. SRC-CNN is capable of recognizing characters in a document even when they differ greatly in orientation and scale. The proposed method does not require any training with scaled or rotated samples. The performance of the proposed approach is validated on character datasets such as MNIST, MNIST_rot_12k, and English alphabets, and compared with state-of-the-art rotation-invariant classification networks. SRC-CNN is a generalized approach and can be extended to rotation- and scale-invariant classification of many other datasets, with any appropriate baseline CNN. We demonstrate the generality of the proposed SRC-CNN on the Fashion-MNIST dataset and find that it performs well in rotation- and scale-invariant classification of objects as well. This paper demonstrates how basic PCA-based rotation- and scale-invariant image recognition can be integrated into a CNN to achieve better rotational and scale invariance in classification.
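The idea of deriving a reference orientation and scale from the character itself can be sketched with a 2D principal component analysis of the foreground pixel coordinates. The code below is a minimal illustration under assumptions: it works on a list of (x, y) points rather than an image, uses the closed-form principal axis of a 2x2 covariance matrix, and the names `principal_axis` and `normalize` are hypothetical. It is not the SRC-CNN preprocessing itself.

```python
import math


def principal_axis(points):
    """Return (centroid, angle, spread) of a 2D point set.

    angle is the direction of the first principal component in radians;
    spread is the standard deviation along that direction.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    cxx = sum((x - mx) ** 2 for x, _ in points) / n
    cyy = sum((y - my) ** 2 for _, y in points) / n
    cxy = sum((x - mx) * (y - my) for x, y in points) / n
    angle = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    largest = 0.5 * (cxx + cyy) + math.sqrt((0.5 * (cxx - cyy)) ** 2 + cxy ** 2)
    return (mx, my), angle, math.sqrt(largest)


def normalize(points):
    """Center, rotate the principal axis onto the x-axis, and rescale to unit spread."""
    (mx, my), angle, spread = principal_axis(points)
    if spread == 0:
        raise ValueError("degenerate point set: all points coincide")
    c, s = math.cos(-angle), math.sin(-angle)
    return [
        ((c * (x - mx) - s * (y - my)) / spread,
         (s * (x - mx) + c * (y - my)) / spread)
        for x, y in points
    ]


# A stroke lying on the diagonal y = x (toy foreground pixels).
stroke = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
_, stroke_angle, _ = principal_axis(stroke)
aligned = normalize(stroke)
```

A principal axis fixes orientation only up to a 180-degree flip, so a practical pipeline needs an extra rule to resolve that ambiguity; the abstract does not say how SRC-CNN handles it.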