ISBN (print): 9781665448994
Unsupervised image-to-image translation tasks aim to find a mapping between a source domain X and a target domain Y from unpaired training data. Contrastive learning for Unpaired image-to-image Translation (CUT) yields state-of-the-art results in modeling unsupervised image-to-image translation by maximizing mutual information between input and output patches, using only one encoder for both domains. In this paper, we propose a novel method based on contrastive learning and a dual learning setting (exploiting two encoders) to infer an efficient mapping between unpaired data. Additionally, while CUT suffers from mode collapse, a variant of our method efficiently addresses this issue. We further demonstrate the advantage of our approach through extensive ablation studies, showing superior performance compared to recent approaches on multiple challenging image translation tasks. Lastly, we demonstrate that the gap between unsupervised and supervised methods can be efficiently closed.
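The patch-wise mutual-information objective mentioned above is commonly realized as an InfoNCE loss over corresponding input/output patch features. The sketch below is a minimal, generic illustration of such a loss; the dual-encoder wiring, patch sampling, and feature dimensions are assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def patch_nce_loss(feat_src, feat_tgt, temperature=0.07):
    """InfoNCE over N corresponding patch features (N x C tensors).

    Patch i of the translated image should match patch i of the input
    (positive pair) and repel all other patches (negatives).
    """
    feat_src = F.normalize(feat_src, dim=1)
    feat_tgt = F.normalize(feat_tgt, dim=1)
    logits = feat_tgt @ feat_src.t() / temperature          # N x N similarities
    labels = torch.arange(feat_src.size(0), device=feat_src.device)
    return F.cross_entropy(logits, labels)

# Toy usage: 256 patches with 128-dim features from each domain's encoder.
loss = patch_nce_loss(torch.randn(256, 128), torch.randn(256, 128))
```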
ISBN (print): 9781665448994
Pano3D is a new benchmark for depth estimation from spherical panoramas. It aims to assess performance across all depth estimation traits: the primary trait of direct depth estimation performance, targeting precision and accuracy, as well as the secondary traits of boundary preservation and smoothness. Moreover, Pano3D moves beyond typical intra-dataset evaluation to inter-dataset performance assessment. By disentangling the capacity to generalize to unseen data into different test splits, Pano3D represents a holistic benchmark for 360° depth estimation. We use it as a basis for an extended analysis seeking to offer insights into classical choices for depth estimation. This results in a solid baseline for panoramic depth that follow-up works can build upon to steer future progress.
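For the primary direct-depth traits (precision and accuracy), benchmarks of this kind typically report errors such as absolute relative error, RMSE, and threshold accuracy. The snippet below is a generic reference implementation of those standard metrics, included for illustration rather than taken from Pano3D.

```python
import numpy as np

def depth_metrics(pred, gt, eps=1e-6):
    """Standard direct depth-estimation metrics over valid pixels (gt > 0)."""
    mask = gt > eps
    pred, gt = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(pred - gt) / gt)        # accuracy-oriented error
    rmse = np.sqrt(np.mean((pred - gt) ** 2))        # precision-oriented error
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)                   # threshold accuracy
    return {"AbsRel": abs_rel, "RMSE": rmse, "delta<1.25": delta1}
```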
ISBN (print): 9781665448994
Convolutional Neural Networks (CNNs) have achieved remarkable success in various computer vision tasks but rely on tremendous computational cost. To solve this problem, existing approaches either compress well-trained large-scale models or learn lightweight models with carefully designed network structures. In this work, we make a close study of the convolution operator, which is the basic unit used in CNNs, to reduce its computing load. In particular, we propose a compact convolution module, called CompConv, to facilitate efficient feature learning. With a divide-and-conquer strategy, CompConv is able to save a great many computations as well as parameters to produce a feature map of a given dimensionality. Furthermore, CompConv discreetly integrates the input features into the outputs to efficiently inherit the input information. More importantly, the novel CompConv is a plug-and-play module that can be directly applied to modern CNN structures to replace the vanilla convolution layers without further effort. Extensive experimental results suggest that CompConv can adequately compress the benchmark CNN structures yet barely sacrifice performance, surpassing other competitors.
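As a rough, hypothetical illustration of the divide-and-conquer idea (not the actual CompConv module), one can split the output channels so that part is produced by a regular 3x3 convolution and the rest by a cheap 1x1 projection that carries input information into the output:

```python
import torch
import torch.nn as nn

class DivideAndConquerConv(nn.Module):
    """Toy compact convolution: half the output channels come from a
    standard 3x3 conv, the other half from a cheap 1x1 projection of the
    input. Illustrative sketch only, not the CompConv module itself.
    """
    def __init__(self, in_ch, out_ch):
        super().__init__()
        half = out_ch // 2
        self.heavy = nn.Conv2d(in_ch, half, kernel_size=3, padding=1)
        self.cheap = nn.Conv2d(in_ch, out_ch - half, kernel_size=1)

    def forward(self, x):
        return torch.cat([self.heavy(x), self.cheap(x)], dim=1)

# Drop-in usage in place of nn.Conv2d(64, 128, 3, padding=1).
y = DivideAndConquerConv(64, 128)(torch.randn(1, 64, 32, 32))
```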
ISBN (print): 9781665448994
In recent years, significant progress has been made in face recognition, which can be partially attributed to the availability of large-scale labeled face datasets. However, since the faces in these datasets usually contain a limited degree and limited types of variation, the resulting trained models generalize poorly to more realistic unconstrained face datasets. While collecting labeled faces with larger variations could be helpful, it is practically infeasible due to privacy and labor cost. In comparison, it is easier to acquire a large number of unlabeled faces from different domains, which could be used to regularize the learning of face representations. We present an approach to use such unlabeled faces to learn generalizable face representations, where we assume access to neither identity labels nor domain labels for unlabeled images. Experimental results on unconstrained datasets show that a small amount of unlabeled data with sufficient diversity can (i) lead to an appreciable gain in recognition performance and (ii) outperform the supervised baseline when combined with less than half of the labeled data. Compared with the state-of-the-art face recognition methods, our method further improves their performance on challenging benchmarks, such as IJB-B, IJB-C and IJB-S.
ISBN (print): 9781665445092
Learning to detect novel objects from few annotated examples is of great practical importance. A particularly challenging yet common regime occurs when there are extremely limited examples (less than three). One critical factor in improving few-shot detection is to address the lack of variation in training data. We propose to build a better model of variation for novel classes by transferring the shared within-class variation from base classes. To this end, we introduce a hallucinator network that learns to generate additional, useful training examples in the region of interest (RoI) feature space, and incorporate it into a modern object detection model. Our approach yields significant performance improvements on two state-of-the-art few-shot detectors with different proposal generation procedures. In particular, we achieve new state of the art in the extremely-few-shot regime on the challenging COCO benchmark.
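One way to picture the hallucinator is as a small network that maps a seed RoI feature plus noise to additional synthetic RoI features, which are then mixed into the detector's training batch. The module below is a hypothetical sketch under that reading; its layer sizes and names are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RoIFeatureHallucinator(nn.Module):
    """Hypothetical sketch: generate extra RoI features for a novel class
    from one seed feature plus noise, to enlarge within-class variation."""
    def __init__(self, feat_dim=1024, noise_dim=128):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(feat_dim + noise_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, seed_feat, num_samples=4):
        seeds = seed_feat.expand(num_samples, -1)
        noise = torch.randn(num_samples, self.noise_dim, device=seed_feat.device)
        return self.net(torch.cat([seeds, noise], dim=1))

# One annotated RoI feature -> four hallucinated training features.
extra = RoIFeatureHallucinator()(torch.randn(1, 1024))
```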
ISBN (print): 9781665445092
Scene understanding is a critical problem in computer vision. In this paper, we propose a 3D point-based scene graph generation (SGG(point)) framework to effectively bridge perception and reasoning to achieve scene understanding via three sequential stages, namely scene graph construction, reasoning, and inference. Within the reasoning stage, an EDGE-oriented Graph Convolutional Network (EdgeGCN) is created to exploit multi-dimensional edge features for explicit relationship modeling, together with the exploration of two associated twinning interaction mechanisms between nodes and edges for the independent evolution of scene graph representations. Overall, our integrated SGG(point) framework is established to seek and infer scene structures of interest from both real-world and synthetic 3D point-based scenes. Our experimental results show promising edge-oriented reasoning effects on scene graph generation studies. We also demonstrate our method's advantage on several traditional graph representation learning benchmark datasets, including node-wise classification on citation networks and whole-graph recognition problems for molecular analysis.
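The reasoning stage relies on message passing in which edges carry their own multi-dimensional features and nodes and edges update one another. The layer below is a deliberately simplified sketch of such node-edge interaction (a loose reading of the twinning mechanisms), not the exact EdgeGCN.

```python
import torch
import torch.nn as nn

class NodeEdgeLayer(nn.Module):
    """Simplified node-edge message passing: node updates aggregate
    neighbor features augmented by edge features, while edge updates
    look at their two endpoint nodes."""
    def __init__(self, node_dim, edge_dim):
        super().__init__()
        self.node_mlp = nn.Linear(node_dim + edge_dim, node_dim)
        self.edge_mlp = nn.Linear(2 * node_dim + edge_dim, edge_dim)

    def forward(self, nodes, edges, edge_index):
        # edge_index: (2, E) tensor with source and target node ids per edge.
        src, dst = edge_index
        msg = torch.relu(self.node_mlp(torch.cat([nodes[src], edges], dim=1)))
        new_nodes = nodes.clone()
        new_nodes.index_add_(0, dst, msg)                    # aggregate messages
        new_edges = torch.relu(
            self.edge_mlp(torch.cat([nodes[src], nodes[dst], edges], dim=1)))
        return new_nodes, new_edges

# Toy scene graph: 5 objects, 8 directed relations.
nodes, edges = torch.randn(5, 64), torch.randn(8, 16)
edge_index = torch.randint(0, 5, (2, 8))
nodes, edges = NodeEdgeLayer(64, 16)(nodes, edges, edge_index)
```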
Nowadays, machine learning is becoming a ubiquitous artificial intelligence technology. It is actively being implemented in various fields of science and technology, including protection against cyber attacks, image r...
ISBN (print): 9781665448994
Efficiently deploying learning-based systems on embedded hardware is challenging for various reasons, two of which are considered in this paper: the model's size and its robustness against attacks. Both need to be addressed even-handedly. We combine adversarial training and model pruning in a joint formulation of the fundamental learning objective during training. Unlike existing post-training pruning approaches, our method does not use heuristics and eliminates the need for a pre-trained model. This allows for a classifier which is robust against attacks and enables better compression of the model, reducing its computational effort. In comparison to prior work, our approach yields 6.21 percentage points higher accuracy at an 85% reduction in parameters for ResNet20 on the CIFAR-10 dataset.
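A simplified reading of the joint formulation is that each training step minimizes the loss on adversarially perturbed inputs while also penalizing weights toward sparsity, so robustness and compressibility are trained together. The step below is an illustrative sketch using an FGSM perturbation and an L1 penalty; the paper's exact objective and pruning mechanism may differ.

```python
import torch
import torch.nn.functional as F

def joint_robust_sparse_step(model, optimizer, x, y, eps=8 / 255, l1_coeff=1e-5):
    """One training step: adversarial loss (FGSM as an illustrative attack)
    plus an L1 penalty encouraging prunable, near-zero weights."""
    # Craft an FGSM adversarial example from the clean batch.
    x_adv = x.clone().detach().requires_grad_(True)
    loss_clean = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss_clean, x_adv)
    x_adv = (x + eps * grad.sign()).clamp(0, 1).detach()

    # Joint objective: robustness term + sparsity term.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss = loss + l1_coeff * sum(p.abs().sum() for p in model.parameters())
    loss.backward()
    optimizer.step()
    return loss.item()
```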
ISBN (print): 9781665445092
Removing objects from images is a challenging technical problem that is important for many applications, including mixed reality. For believable results, the shadows that the object casts should also be removed. Current inpainting-based methods only remove the object itself, leaving shadows behind, or at best require specifying shadow regions to inpaint. We introduce a deep learning pipeline for removing a shadow along with its caster. We leverage rough scene models in order to remove a wide variety of shadows (hard or soft, dark or subtle, large or thin) from surfaces with a wide variety of textures. We train our pipeline on synthetically rendered data, and show qualitative and quantitative results on both synthetic and real scenes.
ISBN (print): 9781665448994
Object detection has achieved great progress with the development of anchor-based and anchor-free detectors. However, the detection of tiny objects is still challenging due to the lack of appearance information. In this paper, we observe that Intersection over Union (IoU), the most widely used metric in object detection, is sensitive to slight offsets between predicted bounding boxes and ground truths when detecting tiny objects. Although new metrics such as GIoU, DIoU and CIoU have been proposed, their performance on tiny object detection still falls below the expected level by a large margin. We therefore propose a simple but effective new metric called Dot Distance (DotD) for tiny object detection, where DotD is defined as the normalized Euclidean distance between the center points of two bounding boxes. Extensive experiments on a tiny object detection dataset show that anchor-based detectors' performance improves greatly over their baselines when DotD is applied.
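Read literally from the definition above, DotD is the Euclidean distance between the two box centers divided by a normalization scale; in the paper that scale relates to the dataset's average object size, so it is treated as an assumed parameter in the sketch below.

```python
import math

def dot_distance(box_a, box_b, scale):
    """Center-to-center Euclidean distance between two (x1, y1, x2, y2)
    boxes, normalized by an assumed dataset-level scale factor."""
    cxa, cya = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    cxb, cyb = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    return math.hypot(cxa - cxb, cya - cyb) / scale

# Two 8x8 tiny boxes offset by 3 pixels: IoU drops sharply under such a
# shift, while the normalized center distance stays small.
print(dot_distance((10, 10, 18, 18), (13, 10, 21, 18), scale=16))  # 0.1875
```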