To be successful, vision-and-Language Navigation (VLN) agents must be able to ground instructions to actions based on their surroundings. In this work, we develop a methodology to study agent behavior on a skill-speci...
详细信息
ISBN:
(纸本)9798350301298
To be successful, vision-and-Language Navigation (VLN) agents must be able to ground instructions to actions based on their surroundings. In this work, we develop a methodology to study agent behavior on a skill-specific basis - examining how well existing agents ground instructions about stopping, turning, and moving towards specified objects or rooms. Our approach is based on generating skill-specific interventions and measuring changes in agent predictions. We present a detailed case study analyzing the behavior of a recent agent and then compare multiple agents in terms of skill-specific competency scores. This analysis suggests that biases from training have lasting effects on agent behavior and that existing models are able to ground simple referring expressions. Our comparisons between models show that skill-specific scores correlate with improvements in overall VLN task performance.
We present an end-to-end deep Convolutional Neural Network called Convolutional Relational Machine (CRM) for recognizing group activities that utilizes the information in spatial relationsbetween individualpersons in ...
详细信息
ISBN:
(纸本)9781728132938
We present an end-to-end deep Convolutional Neural Network called Convolutional Relational Machine (CRM) for recognizing group activities that utilizes the information in spatial relationsbetween individualpersons in image or video. It learns to produce an intermediate spatial representation (activity map) based on individual and group activities. A multi-stage refinement component is responsible for decreasingthe incorrectpredictions in the activity map. Finally, an aggregationcomponent uses the refined information to recognize group activities. Experimental results demonstrate the constructive contribution of the information extracted and represented in the form of the activity map. CRM shows advantages over state-of-the-artmodels on Volleyball and Collective Activity datasets.
Neural Networks require large amounts of memory and compute to process high resolution images, even when only a small part of the image is actually informative for the task at hand. We propose a method based on a diff...
详细信息
ISBN:
(纸本)9781665445092
Neural Networks require large amounts of memory and compute to process high resolution images, even when only a small part of the image is actually informative for the task at hand. We propose a method based on a differentiable Top-K operator to select the most relevant parts of the input to efficiently process high resolution images. Our method may be interfaced with any downstream neural network, is able to aggregate information from different patches in a flexible way, and allows the whole model to be trained end-to-end using backpropagation. We show results for traffic sign recognition, inter-patch relationship reasoning, and fine-grained recognition without using object/part bounding box annotations during training.
Deep convolutional neural network (ConvNet) is a promising approach for high-performance image classification. The behavior of ConvNet is analyzed mainly based on the neuron activations, such as by visualizing them. I...
详细信息
ISBN:
(纸本)9781538664209
Deep convolutional neural network (ConvNet) is a promising approach for high-performance image classification. The behavior of ConvNet is analyzed mainly based on the neuron activations, such as by visualizing them. In this paper, in contrast to the activations, we focus on filters which are main components of ConvNets. Through analyzing two types of filters at convolution and fully-connected (FC) layers, respectively, on various pre-trained ConvNets, we present the methods to efficiently reformulate the filters, contributing to improving both memory size and classification performance of the ConvNets. They render the filter bases formulated in a parameter-free form as well as the efficient representation for the FC layer. The experimental results on image classification show that the methods are favorably applied to improve various ConvNets, including ResNet, trained on ImageNet with exhibiting high transferability on the other datasets.
Accurate identification and localization of anatomical structures of varying size and appearance in laparoscopic imaging are necessary to leverage the potential of computervision techniques for surgical decision supp...
详细信息
ISBN:
(纸本)9798350365474
Accurate identification and localization of anatomical structures of varying size and appearance in laparoscopic imaging are necessary to leverage the potential of computervision techniques for surgical decision support. Segmentation performance of such models is traditionally reported using metrics of overlap such as IoU. However, imbalanced and unrealistic representation of classes in the training data and suboptimal selection of reported metrics have the potential to skew nominal segmentation performance and thereby ultimately limit clinical translation. In this work, we systematically analyze the impact of class characteristics (i.e., organ size differences), training and test data composition (i.e., representation of positive and negative examples), and modeling parameters (i.e., foreground-to-background class weight) on eight segmentation metrics: accuracy, precision, recall, IoU, F1 score (Dice Similarity Coefficient), specificity, Hausdorff Distance, and Average Symmetric Surface Distance. Our findings support two adjustments to account for data biases in surgical data science: First, training on datasets that are similar to the clinical real-world scenarios in terms of class distribution, and second, class weight adjustments to optimize segmentation model performance with regard to metrics of particular relevance in the respective clinical setting.
We present here a method for computational color constancy in which a deep convolutional neural network is trained to detect achromatic pixels in color images after they have been converted to grayscale. The method do...
详细信息
ISBN:
(纸本)9781728132938
We present here a method for computational color constancy in which a deep convolutional neural network is trained to detect achromatic pixels in color images after they have been converted to grayscale. The method does not require any information about the illuminant in the scene and relies on the weak assumption, fulfilled by almost all images available on the Web, that training images have been approximately balanced. Because of this requirement we define our method as quasi-unsupervised. After training, unbalanced images can be processed thanks to the preliminary conversion to grayscale of the input to the neural network. The results of an extensive experimentation demonstrate that the proposed method is able to outperform the other unsupervised methods in the state of the art being, at the same time, flexible enough to be supervisedly fine-tuned to reach performance comparable with those of the best supervised methods.
Deep implicit shape models have become popular in the computervision community at large but less so for biomedical applications. This is in part because large training databases do not exist and in part because biome...
详细信息
ISBN:
(数字)9781665469463
ISBN:
(纸本)9781665469463
Deep implicit shape models have become popular in the computervision community at large but less so for biomedical applications. This is in part because large training databases do not exist and in part because biomedical annotations are often noisy. In this paper;we show that by introducing templates within the deep learning pipeline we can overcome these problems. The proposed framework, named ImplicitAtlas, represents a shape as a deformation field from a learned template field, where multiple templates could be integrated to improve the shape representation capacity at negligible computational cost. Extensive experiments on three medical shape datasets prove the superiority over current implicit representation methods.
In many different fields interactions between objects play a critical role in determining their behavior. Graph neural networks (GNNs) have emerged as a powerful tool for modeling interactions, although often at the c...
详细信息
ISBN:
(数字)9781665469463
ISBN:
(纸本)9781665469463
In many different fields interactions between objects play a critical role in determining their behavior. Graph neural networks (GNNs) have emerged as a powerful tool for modeling interactions, although often at the cost of adding considerable complexity and latency. In this paper, we consider the problem of spatial interaction modeling in the context of predicting the motion of actors around autonomous vehicles, and investigate alternatives to GNNs. We revisit 2D convolutions and show that they can demonstrate comparable performance to graph networks in modeling spatial interactions with lower latency, thus providing an effective and efficient alternative in time-critical systems. Moreover, we propose a novel interaction loss to further improve the interaction modeling of the considered methods.
This paper proposes a novel approach to performing image-to-image translation between unpaired domains. Rather than relying on a cycle constraint, our method takes advantage of collaboration between various GANs. This...
详细信息
ISBN:
(纸本)9781728171685
This paper proposes a novel approach to performing image-to-image translation between unpaired domains. Rather than relying on a cycle constraint, our method takes advantage of collaboration between various GANs. This results in a multi-modal method, in which multiple optional and diverse images are produced for a given image. Our model addresses some of the shortcomings of classical GANs: (1) It is able to remove large objects, such as glasses. (2) Since it does not need to support the cycle constraint, no irrelevant traces of the input are left on the generated image. (3) It manages to translate between domains that require large shape modifications. Our results are shown to outperform those generated by state-of-the-art methods for several challenging applications.
Face anti-spoofing (FAS) is to protect facial recognition systems against presentation attacks. However, recent research on FAS often neglects real-world conditions, such as changing illumination, varying angles of fa...
详细信息
ISBN:
(纸本)9798350365474
Face anti-spoofing (FAS) is to protect facial recognition systems against presentation attacks. However, recent research on FAS often neglects real-world conditions, such as changing illumination, varying angles of face, and motion blur within a video. These conditions lead to inconsistent feature quality across face images, where low-quality features can cause the model to learn unreliable information during training. Moreover, frames with low feature quality within videos result in inaccurate decisions. To address this issue, we propose the Face Image Quality Assessment Based Face Anti-Spoofing System (FIQA-FAS), which integrates a face image quality assessment module with a face anti-spoofing module. FIQA-FAS assesses the feature quality extracted from each face image and uses the quality score to compute a weighted prediction for deciding if the face in a video is live or spoof. We demonstrate the effectiveness of FIQA-FAS through experiments on the SIW and SIW-M datasets. To further demonstrate our model's capabilities, we introduce a novel simulated scenario that mimics the real world, where our model outperforms other SOTA.
暂无评论