ISBN:
(print) 9781538604571
Deep convolutional neural networks have achieved great success on image recognition tasks. Yet, it is non-trivial to transfer the state-of-the-art image recognition networks to videos, as per-frame evaluation is too slow and unaffordable. We present deep feature flow, a fast and accurate framework for video recognition. It runs the expensive convolutional sub-network only on sparse key frames and propagates their deep feature maps to other frames via a flow field. It achieves significant speedup, as flow computation is relatively fast. The end-to-end training of the whole architecture significantly boosts the recognition accuracy. Deep feature flow is flexible and general. It is validated on two video datasets for object detection and semantic segmentation. It significantly advances the practice of video recognition tasks. Code will be released.
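The propagation step described above can be sketched at a small scale: a deep feature map computed on a key frame is warped to the current frame by sampling along a flow field. A minimal numpy sketch, assuming bilinear sampling with border clamping (the function name and array layout are illustrative, not the paper's actual implementation):

```python
import numpy as np

def propagate_features(key_feat, flow):
    """Warp a key-frame feature map to the current frame via a flow field.

    key_feat: (H, W, C) deep features computed on the key frame.
    flow:     (H, W, 2) per-pixel displacement pointing back to the key frame.
    Uses bilinear interpolation with clamping at the borders.
    """
    H, W, C = key_feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Source coordinates in the key frame for each current-frame pixel.
    sx = xs + flow[..., 0]
    sy = ys + flow[..., 1]
    x0 = np.clip(np.floor(sx).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(sy).astype(int), 0, H - 2)
    wx = np.clip(sx - x0, 0.0, 1.0)[..., None]
    wy = np.clip(sy - y0, 0.0, 1.0)[..., None]
    # Bilinear blend of the four neighbouring feature vectors.
    top = key_feat[y0, x0] * (1 - wx) + key_feat[y0, x0 + 1] * wx
    bot = key_feat[y0 + 1, x0] * (1 - wx) + key_feat[y0 + 1, x0 + 1] * wx
    return top * (1 - wy) + bot * wy
```

With zero flow the key-frame features are returned unchanged, which is why only the sparse key frames need the expensive sub-network.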
The proceedings contain 781 papers. The topics discussed include: exclusivity-consistency regularized multi-view subspace clustering; borrowing treasures from the wealthy: deep transfer learning through selective joint fine-tuning; the more you know: using knowledge graphs for image classification; dynamic edge-conditioned filters in convolutional neural networks on graphs; convolutional neural network architecture for geometric matching; deep affordance-grounded sensorimotor object recognition; on compressing deep models by low rank and sparse decomposition; unsupervised pixel-level domain adaptation with generative adversarial networks; photo-realistic single image super-resolution using a generative adversarial network; a practical method for fully automatic intrinsic camera calibration using directionally encoded light; elastic shape-from-template with spatially sparse deforming forces; and distinguishing the indistinguishable: exploring structural ambiguities via geodesic context.
ISBN:
(print) 9781538617595
In several domains, including healthcare and home automation, it is important to unobtrusively monitor the activities of daily living (ADLs) executed by people at home. A popular approach consists in the use of sensors attached to everyday objects to capture user interaction, and ADL models to recognize the current activity based on the temporal sequence of used objects. However, both knowledge-based and data-driven approaches to object-based ADL recognition have different issues that limit their applicability in real-world deployments. Hence, in this paper, we pursue an alternative approach, which consists in mining ADL models from the Web. Existing attempts in this sense are mainly based on Web page mining and lexical analysis. One issue with those attempts lies in the high level of noise found in the textual content of Web pages. In order to overcome that issue, our intuition is that pictures illustrating the execution of a given activity offer much more compact and expressive information than the textual content of a Web page regarding the same activity. Hence, we present a novel method to couple Web mining and computer vision for automatically extracting ADL models from visual items. Our method relies on Web image search engines to select the most relevant pictures for each considered activity. We use off-the-shelf computer vision APIs and a lexical database to extract the key objects appearing in those pictures. We introduce a probabilistic technique to measure the relevance among activities and objects. Through experiments with a large dataset of real-world ADLs, we show that our method significantly improves on the existing approach.
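The activity-object relevance measure described above can be sketched as a smoothed conditional distribution over the objects a vision API detects in the pictures retrieved for each activity. A minimal stdlib sketch, assuming simple Laplace-smoothed counts (names are illustrative; the paper's actual probabilistic technique may differ):

```python
from collections import Counter

def activity_object_relevance(detections_per_activity, smoothing=1.0):
    """Estimate P(object | activity) from objects detected in retrieved pictures.

    detections_per_activity: dict mapping activity -> list of per-picture
    object lists (as produced by an off-the-shelf vision API).
    Laplace smoothing keeps unseen objects at a small non-zero probability.
    """
    vocab = sorted({obj for pics in detections_per_activity.values()
                    for pic in pics for obj in pic})
    model = {}
    for activity, pics in detections_per_activity.items():
        counts = Counter(obj for pic in pics for obj in pic)
        total = sum(counts.values()) + smoothing * len(vocab)
        model[activity] = {o: (counts[o] + smoothing) / total for o in vocab}
    return model
```

Objects that recur across many pictures of an activity dominate its distribution, which is the intuition for why images are less noisy than Web page text.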
ISBN:
(digital) 9781450357142
ISBN:
(print) 9781538661697
We demonstrate the ability of deep architectures, specifically convolutional neural networks, to learn and differentiate the lexical features of different programming languages presented in coding video tutorials found on the Internet. We analyze over 17,000 video frames containing examples of Java, Python, and other textual and non-textual objects. Our results indicate that not only can computer vision models based on deep architectures be taught to differentiate among programming languages with over 98% accuracy, but they can also learn language-specific lexical features in the process. This provides a powerful mechanism for carrying out program comprehension research on repositories where source code is represented with imagery rather than text, while simultaneously avoiding the computational overhead of optical character recognition.
ISBN:
(print) 9783319686127; 9783319686110
Hand posture recognition is a popular research topic in computer vision, on account of its important real-world applications such as sign language recognition. Understanding human gestures is hard because of several challenges such as feature extraction. Various algorithms have been employed in gesture recognition, but many of the best results were achieved by convolutional neural networks (CNNs), which are powerful visual models widely applied to many fields of pattern recognition, such as image classification [1], face recognition, speech recognition, and hand posture recognition. Inspired by [2], this paper proposes a multi-channel and multi-scale convolutional neural network (MMCNN), with two channels using different convolution kernel sizes; meanwhile, the input pictures are preprocessed into different sizes. MMCNN can accept the different features of the image as input and then combine these features for image classification. The multi-channel structure extracts image features at multiple spatial scales using convolutional kernels of different sizes, and the multi-scale input ensures the richness of the input image characteristics. Experiments were performed on two gesture databases; the proposed MMCNN classifies 24 gesture classes with 98.4% accuracy, better than the nearest competitor, enhancing the generalization ability of convolutional neural networks.
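The multi-channel, multi-scale idea above can be sketched at toy scale: two channels with different kernel sizes operate on differently sized versions of the picture, and their pooled responses are concatenated into one feature vector. A minimal numpy sketch with a single filter per channel (names and shapes are illustrative; the real MMCNN stacks many filters with non-linearities and pooling):

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Naive single-channel 2-D cross-correlation, valid padding."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def mmcnn_features(img_full, img_down, kernel_a, kernel_b):
    """Channel 1: larger kernel on the full-size input.
    Channel 2: smaller kernel on the downscaled input.
    Global-average-pooled responses are concatenated for the classifier."""
    f_a = conv2d_valid(img_full, kernel_a).mean()
    f_b = conv2d_valid(img_down, kernel_b).mean()
    return np.array([f_a, f_b])  # fused multi-scale feature vector
```

Each channel sees the gesture at a different spatial scale, so the concatenated vector carries complementary cues for the classifier stage.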
This paper discusses a possible implementation of the integration of knowledge from a probabilistic ontology into the automatic description of images. This combination not only provides the relations existing between the different segments, but also improves the classification accuracy, as the context often gives cues suggesting the correct class of the segment.
ISBN:
(print) 9781467388511
We are dealing with the problem of fine-grained vehicle make & model recognition and verification. Our contribution is showing that extracting additional data from the video stream - besides the vehicle image itself - and feeding it into the deep convolutional neural network boosts the recognition performance considerably. This additional information includes: the 3D vehicle bounding box used for "unpacking" the vehicle image, its rasterized low-resolution shape, and information about the 3D vehicle orientation. Experiments show that adding such information decreases classification error by 26% (the accuracy is improved from 0.772 to 0.832) and boosts verification average precision by 208% (0.378 to 0.785) compared to a baseline pure CNN without any input modifications. Also, the pure baseline CNN outperforms the recent state-of-the-art solution by 0.081. We provide an annotated set "BoxCars" of surveillance vehicle images augmented with various automatically extracted auxiliary information. Our approach and the dataset can considerably improve the performance of traffic surveillance systems.
ISBN:
(print) 9781467388511
In this paper, we address the problem of object discovery in time-varying, large-scale image collections. A core part of our approach is a novel Limited Horizon Minimum Spanning Tree (LH-MST) structure that closely approximates the Minimum Spanning Tree at a small fraction of the latter's computational cost. Our proposed tree structure can be created in a local neighborhood of the matching graph during image retrieval and can be efficiently updated whenever the image database is extended. We show how the LH-MST can be used within both single-link hierarchical agglomerative clustering and the Iconoid Shift framework for object discovery in image collections, resulting in significant efficiency gains and making both approaches capable of incremental clustering with online updates. We evaluate our approach on a dataset of 500k images from the city of Paris and compare its results to the batch version of both clustering algorithms.
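The core idea above, approximating a Minimum Spanning Tree from only local neighborhood edges, can be sketched with Kruskal's algorithm restricted to each node's nearest neighbors. A minimal stdlib sketch of that general idea (the paper's actual LH-MST construction and update rules differ; the `horizon` parameter and function names are illustrative):

```python
def limited_horizon_mst(points, horizon=3):
    """Kruskal's algorithm over only each node's `horizon` nearest
    neighbours: a spanning forest built from local edges alone, which
    approximates the full MST when neighborhoods overlap enough.
    Returns a list of (i, j) tree edges."""
    n = len(points)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(points[a], points[b]))
    # Candidate edge set: each node contributes edges to its nearest neighbours.
    edges = set()
    for i in range(n):
        near = sorted((j for j in range(n) if j != i), key=lambda j: dist(i, j))
        for j in near[:horizon]:
            edges.add((min(i, j), max(i, j)))
    # Union-find with path halving for cycle detection.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for i, j in sorted(edges, key=lambda e: dist(*e)):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

Because only local edges are ever considered, extending the image database only adds new candidate edges near the new images, which is what makes incremental, online updates cheap.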
ISBN:
(print) 9781509014378
The task of Heterogeneous Face Recognition consists in matching face images that were sensed in different modalities, such as sketches to photographs, thermal images to photographs, or near infrared to photographs. In this preliminary work we introduce a novel and generic approach based on Inter-session Variability Modelling to handle this task. The experimental evaluation conducted with two different image modalities showed average rank-1 identification rates of 96.93% and 72.39% on CUHK-CUFS (sketches) and CASIA NIR-VIS 2.0 (near infrared), respectively. This work is fully reproducible and all the source code for this approach is made publicly available.
ISBN:
(print) 9781538607343
The selection of adequate job candidates is a long and challenging process for every employer. The system presented in this paper aims to decrease the time for candidate selection at the pre-employment stage using automatic personality screening based on visual, audio, and lexical cues from short video clips. The system is built to predict candidate scores for the Big Five personality traits and to estimate a final decision on the degree to which the person in the video clip should be invited to a job interview. For each channel, a set of relevant features is extracted and used to train a separate deep learning model. In the final stage, all three results are fused into the final score prediction. The experiment was conducted on the First Impressions database and achieved strong performance.
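The fusion stage described above can be sketched as a late fusion of per-channel trait predictions into final scores plus an overall interview-invitation score. A minimal stdlib sketch, assuming equal channel weights (the paper trains the fusion rather than fixing weights; trait names and the averaging rule are illustrative):

```python
def fuse_trait_scores(visual, audio, lexical, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Late fusion of per-channel Big Five trait predictions.

    Each argument maps trait name -> score in [0, 1]. Returns the fused
    per-trait scores and a single invitation score averaged over traits."""
    fused = {t: weights[0] * visual[t] + weights[1] * audio[t]
                + weights[2] * lexical[t] for t in visual}
    invite = sum(fused.values()) / len(fused)
    return fused, invite
```

Keeping the channels separate until this last step lets each modality be trained and improved independently, which matches the per-channel training described in the abstract.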