ISBN (print): 9781665448994
Few-shot learning features the capability of generalizing from a few examples. In this paper, we first identify that a discriminative feature space, namely a rectified metric space, learned to maintain metric consistency from training to testing, is essential to the success of metric-based few-shot learning. Numerous analyses indicate that a simple modification of the objective can yield substantial performance gains. The resulting approach, called rectified metric propagation (ReMP), further optimizes an attentive prototype propagation network and applies a repulsive force to make confident predictions. Extensive experiments demonstrate that the proposed ReMP is effective and efficient, and outperforms the state of the art on various standard few-shot learning datasets.
ISBN (print): 9781538661000
Existing research in the field of face recognition with variations due to disguises focuses primarily on images captured in controlled settings. Limited research has been performed on images captured in unconstrained environments, primarily due to the lack of corresponding disguised face datasets. To overcome this limitation, this work presents a novel Disguised Faces in the Wild (DFW) dataset, consisting of over 11,000 images, for understanding and advancing the current state of the art in disguised face recognition. To the best of our knowledge, DFW is a first-of-a-kind dataset containing images pertaining to both obfuscation and impersonation for understanding the effect of disguise variations. A major portion of the dataset has been collected from the Internet, thereby encompassing a wide variety of disguise accessories and variations across other covariates. As part of CVPR 2018, a competition and workshop are organized to facilitate research in this direction. This paper presents a description of the dataset, the baseline protocols and performance, along with the phase-I results of the competition.
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Depth estimation from a single 360-degree panorama image is a difficult task. Estimating depth maps from an RGB panorama image is an ill-posed problem due to the intrinsic scale ambiguity. To mitigate the scale inconsistency in the ground-truth depth maps, we propose a simple yet effective method to normalize the depth data based on estimated camera height. In addition, we design a multi-head planar-guided depth network to provide more geometric constraints for depth estimation. Experimental results show that relative depth estimation is more accurate than absolute depth estimation, and that our proposed model achieves state-of-the-art performance on both the Matterport3D and Stanford2D3D datasets.
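The camera-height normalization mentioned above can be sketched as follows. This is a minimal illustration, assuming that normalization means dividing every depth value by the estimated camera height; the abstract does not give the exact formula, and the function name `normalize_depth_by_camera_height` is hypothetical.

```python
from typing import List


def normalize_depth_by_camera_height(
    depth_map: List[List[float]], camera_height: float
) -> List[List[float]]:
    """Divide each depth value by the camera height (assumed normalization).

    Two captures of the same layout at different absolute scales then map to
    the same relative depth values.
    """
    if camera_height <= 0:
        raise ValueError("camera_height must be positive")
    return [[d / camera_height for d in row] for row in depth_map]


# Two made-up depth maps of the same layout at different absolute scales (metres).
scene_small = [[1.2, 2.4], [3.6, 4.8]]
scene_large = [[2.4, 4.8], [7.2, 9.6]]

rel_small = normalize_depth_by_camera_height(scene_small, camera_height=1.2)
rel_large = normalize_depth_by_camera_height(scene_large, camera_height=2.4)
```

Under this assumption, `rel_small` and `rel_large` agree up to floating-point error, which is the sense in which the scale ambiguity is removed.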
ISBN (print): 9781728193601
The task of referring relationships is to localize subject and object entities in an image satisfying a relationship query, which is given in the form of <subject, predicate, object>. This requires simultaneous localization of the subject and object entities in a specified relationship. We introduce a simple yet effective proposal-based method for referring relationships. Unlike existing methods such as SSAS, our method can generate a high-resolution result while reducing complexity and ambiguity. Our method is composed of two modules: a category-based proposal generation module that selects the proposals related to the entities, and a predicate analysis module that scores the compatibility of pairs of selected proposals. We show state-of-the-art performance on the referring relationship task on two public datasets: Visual Relationship Detection and Visual Genome.
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Multi-class cell detection (cancer or non-cancer) from a whole slide image (WSI) is an important task for pathological diagnosis. Cancer and non-cancer cells often have a similar appearance, so it is difficult even for experts to classify a cell from a patch image of an individual cell. Experts usually identify the cell type not only from the appearance of a single cell but also from the context of the surrounding cells. To use such information, we propose a multi-class cell-detection method that introduces a modified self-attention to aggregate the surrounding image features of both classes. Experimental results demonstrate the effectiveness of the proposed method; it outperformed a baseline that simply uses standard self-attention.
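For reference, the standard self-attention baseline mentioned above can be sketched as scaled dot-product attention over a set of per-cell feature vectors. This is not the modified attention proposed in the paper, whose details the abstract does not give. For brevity the sketch assumes queries, keys, and values are the raw features themselves (no learned projections), and all names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with identity Q/K/V projections.

    Each output vector is a weighted average of all input vectors, so every
    cell's feature is updated with context from the other cells.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in features:
        # Similarity of this cell to every cell, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in features]
        weights = softmax(scores)
        outputs.append(
            [sum(w * value[i] for w, value in zip(weights, features)) for i in range(dim)]
        )
    return outputs


# Three made-up cell features: two similar cells and one dissimilar cell.
cells = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
contextual = self_attention(cells)
```

Because the attention weights are non-negative and sum to one, each output stays inside the range spanned by the input features.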
ISBN (print): 9781538607336
Performance profiling in sports allows evaluating opponents' tactics and developing counter-tactics to gain a competitive advantage. The work presented develops a comprehensive methodology to automate tactical profiling in elite badminton. The proposed approach uses computer vision techniques to automate data gathering from video footage. The image processing algorithm is validated using video footage of the highest-level tournaments, including the Olympic Games. The average accuracy of player position detection is 96.03% and 97.09% on the two halves of a badminton court. Next, frequent trajectories of badminton players are extracted and classified according to their tactical relevance. The classification performs at 97.79% accuracy, 97.81% precision, 97.44% recall, and 97.62% F-score. The combination of automated player position detection, frequent trajectory extraction, and the subsequent classification can be used to automatically generate player tactical profiles.
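As a quick consistency check, the reported F-score can be recomputed from the reported precision and recall. The sketch below assumes the F-score is the usual F1 score (the harmonic mean of the two); the abstract does not say which F-measure was used.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same units)."""
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, in percent.
reported_precision = 97.81
reported_recall = 97.44

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}%")  # prints "F1 = 97.62%"
```

Under that assumption the result agrees with the reported 97.62%.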
ISBN (print): 9781728193601
This paper introduces the Neurodata Lab's approach presented at the 1st Challenge on Remote Physiological Signal Sensing (RePSS) organized within CVPR 2020. The RePSS challenge focused on measuring the average heart rate from color facial videos, which is one of the most fundamental problems in the field of computer vision. Our deep learning-based approach includes a 3D spatio-temporal attention convolutional neural network for photoplethysmogram extraction and a 1D convolutional neural network pre-trained on synthetic data for time series analysis. It provides state-of-the-art results, outperforming those of other participants on a mixture of the VIPL and OBF databases: MAE = 6.94 (12.3% improvement compared to the top-2 result), RMSE = 10.68 (24.6% improvement), Pearson R = 0.755 (28.2% improvement).
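The three metrics quoted above have standard definitions, sketched below for predicted versus reference heart rates. The sample values are made up for illustration and are not challenge data, and the function names are hypothetical.

```python
import math
from typing import Sequence


def mae(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def rmse(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("Pearson R is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up heart rates in beats per minute.
predicted = [72.0, 80.0, 65.0, 90.0]
reference = [70.0, 85.0, 60.0, 95.0]

print(mae(predicted, reference))   # prints 4.25
print(rmse(predicted, reference))
print(pearson_r(predicted, reference))
```

Lower MAE and RMSE are better, while a Pearson R closer to 1 indicates that predictions track the reference more closely.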
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
While virtual try-on has progressed rapidly in recent years, existing methods still struggle to faithfully represent various details of the clothes when worn. In this paper, we propose a simple yet effective method to better preserve details of the clothing and person by introducing an additional fitting step after geometric warping. This minimal modification enables disentangling representations of the clothing from the wearer, so that we can preserve the wearer-agnostic structure and details of the clothing and fit a garment naturally to a variety of poses and body shapes. Moreover, we propose a novel evaluation framework, applicable to any metric, to better reflect the semantics of clothes fitting. Through extensive experiments, we empirically verify that the proposed method not only learns to disentangle clothing from the wearer, but also preserves details of the clothing in the try-on results.
ISBN (print): 9781665448994
In this work, we provide a detailed description of our submitted methods, ANTxNN and ANTxNN SSIM, to the Workshop and Challenge on Learned Image Compression (CLIC) 2021. We propose to incorporate Relativistic average Least Squares GANs (RaLSGANs) into Rate-Distortion Optimization for end-to-end training, to achieve perceptual image compression. We also compare two types of discriminator networks and visualize their reconstructed images. Experimental results show that our method, optimized with RaLSGANs, can achieve higher subjective quality than PSNR-, MS-SSIM-, or LPIPS-optimized models.
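For readers unfamiliar with RaLSGAN, its adversarial losses can be sketched as below, following the commonly cited relativistic average least-squares formulation. This illustrates the adversarial term only; how the authors combine it with the rate and distortion terms is not stated in the abstract, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def ralsgan_losses(
    real_scores: Sequence[float], fake_scores: Sequence[float]
) -> Tuple[float, float]:
    """Return (discriminator_loss, generator_loss) from raw critic outputs.

    Each score is compared with the average score of the opposite class,
    which is what makes the loss "relativistic average".
    """
    mean_real = _mean(list(real_scores))
    mean_fake = _mean(list(fake_scores))
    # Discriminator: real should score 1 above the fake average, fake 1 below the real average.
    d_loss = _mean([(r - mean_fake - 1.0) ** 2 for r in real_scores]) + _mean(
        [(f - mean_real + 1.0) ** 2 for f in fake_scores]
    )
    # Generator: the same targets with the roles of real and fake swapped.
    g_loss = _mean([(f - mean_real - 1.0) ** 2 for f in fake_scores]) + _mean(
        [(r - mean_fake + 1.0) ** 2 for r in real_scores]
    )
    return d_loss, g_loss


# Made-up critic outputs: a discriminator that separates the classes perfectly.
d_loss, g_loss = ralsgan_losses(real_scores=[1.0, 1.0], fake_scores=[0.0, 0.0])
print(d_loss, g_loss)  # prints 0.0 8.0
```

In this toy case the discriminator loss is zero and the generator loss is large, so the generator is pushed to make reconstructions score higher relative to real images.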
ISBN (print): 9781479943098
Automotive systems provide a unique opportunity for mobile vision technologies to improve road safety by understanding and monitoring the driver. In this work, we propose a real-time framework for early detection of driver maneuvers. Such a framework would allow for better behavior prediction, and therefore the development of more efficient advanced driver assistance and warning systems. Cues are extracted from an array of sensors observing the driver (head, hand, and foot), the environment (lane and surrounding vehicles), and the ego-vehicle state (speed, steering angle, etc.). Evaluation is performed on a real-world dataset with overtaking maneuvers, showing promising results. To gain better insight into the processes that characterize driver behavior, temporally discriminative cues are studied and visualized.