ISBN: 9781665445092 (print)
As the computing power of modern hardware increases rapidly, pre-trained deep learning models (e.g., BERT, GPT-3) learned on large-scale datasets have shown their effectiveness over conventional methods. This progress is mainly attributed to the representation ability of the transformer and its variant architectures. In this paper, we study low-level computer vision tasks (e.g., denoising, super-resolution and deraining) and develop a new pre-trained model, namely, the image processing transformer (IPT). To fully exploit the capability of the transformer, we propose using the well-known ImageNet benchmark to generate a large number of corrupted image pairs. The IPT model is trained on these images with multi-heads and multi-tails. In addition, contrastive learning is introduced to adapt well to different image processing tasks. The pre-trained model can therefore be efficiently employed on the desired task after fine-tuning. With only one pre-trained model, IPT outperforms the current state-of-the-art methods on various low-level benchmarks.
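The idea of generating corrupted image pairs from clean images can be sketched as follows. This is a minimal illustration only: the degradations (additive Gaussian noise, average-pool downsampling), the noise level, the toy 8x8 image, and all function names are assumptions made for this example and do not reproduce the paper's actual data pipeline.

```python
import random


def add_gaussian_noise(image, sigma, rng):
    """Return a noisy copy of a grayscale image (list of rows, values in [0, 255])."""
    return [
        [min(255.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
        for row in image
    ]


def downsample(image, factor):
    """Average-pool by an integer factor to produce a low-resolution input."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(0, h - h % factor, factor):
        row = []
        for j in range(0, w - w % factor, factor):
            block = [image[i + di][j + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def make_pairs(clean, rng):
    """Build (corrupted input, clean target) pairs, one per hypothetical task."""
    return {
        "denoising": (add_gaussian_noise(clean, 30.0, rng), clean),
        "super_resolution_x2": (downsample(clean, 2), clean),
    }


# Toy stand-in for a clean training image (assumption: 8x8 grayscale gradient).
rng = random.Random(0)
clean = [[float((x + y) * 16 % 256) for x in range(8)] for y in range(8)]
pairs = make_pairs(clean, rng)
```

Each task gets its own corrupted input while sharing the same clean target, which is what allows a single model with task-specific heads and tails to be trained on all pairs.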
ISBN: 9781665445092 (print)
We aim to tackle the challenging problem of Few-Shot Object Detection (FSOD), where data-scarce categories are presented during model learning. The failure modes of Faster-RCNN in FSOD are investigated, and we find that the performance degradation is mainly due to the classification incapability (false positives) caused by category confusion, which motivates us to address FSOD from the novel aspect of classification refinement. Specifically, we address the intrinsic limitation from the aspects of both architectural enhancement and hard-example mining. We introduce a novel few-shot classification refinement mechanism where a decoupled Few-Shot Classification Network (FSCN) is employed to improve the final classification of a base detector. Moreover, we probe a commonly overlooked but destructive issue of FSOD, i.e., the presence of distractor samples due to incomplete annotations, where images from the base set may contain novel-class objects that remain unlabelled. Retreatment solutions are developed to eliminate the incurred false positives. For FSCN training, the distractor is formulated as a semi-supervised problem, where a distractor utilization loss is proposed to make proper use of it for boosting the data-scarce classes, while a confidence-guided dataset pruning (CGDP) technique is developed to facilitate the few-shot adaptation of the base detector. Experiments demonstrate that our proposed framework achieves state-of-the-art FSOD performance on public datasets, e.g., Pascal VOC and MS-COCO.
ISBN: 9781665445092 (print)
The functional map framework has proven to be extremely effective for representing dense correspondences between deformable shapes. A key step in this framework is to formulate suitable preservation constraints to encode the geometric information that must be preserved by the unknown map. To this end, we construct novel and powerful constraints to determine the functional map, where multiscale spectral manifold wavelets are required to be preserved at each corresponding scale. Such constraints allow us to extract significantly more information than previous methods, especially those based on descriptor preservation constraints, and strongly ensure the isometric property of the map. In addition, we propose a remarkably efficient iterative method to alternately update the functional maps and pointwise maps. Moreover, when we use tight wavelet frames in the iterations, the computation of the functional maps boils down to a simple filtering procedure with low-pass and various band-pass filters, which avoids the time-consuming solution of large systems of linear equations commonly encountered in functional maps. We demonstrate in a wide variety of experiments with different datasets that our approach achieves significant improvements in both shape correspondence quality and computing efficiency.
ISBN: 9781665448994 (print)
Hypomimia, also known as "facial masking", is a common symptom of Parkinson's Disease (PD). PD is a neurological disorder characterized by non-motor and motor impairments. Hypomimia is the reduction of facial expressiveness, including emotional expressions. In this work, we explore the use of static and dynamic features for the analysis of evoked facial gestures in PD patients. The main contributions of this work are: (1) we propose a multimodal PD detection system based on both static and dynamic features obtained from evoked face gestures; (2) we propose a novel set of 17 dynamic features to characterize facial expressiveness and demonstrate that facial dynamics features can be used to improve PD detection; and (3) we analyze different evoked facial expressions and their performance for PD detection. Different expressions activate different Action Units (AUs), and we analyze to what extent each of these AUs contributes to PD detection. The results show that the use of static features generated by pre-trained deep architectures yields up to 77.36% accuracy for PD detection, and the combination with dynamic features improves PD detection accuracy by up to 13.46 percentage points (from 75.00% to 88.46%). Our experiments also suggest differences in the performance of evoked face gestures in this PD detection task.
ISBN: 9781665445092 (print)
Automatically detecting/segmenting object(s) that blend in with their surroundings is difficult for current models. A major challenge is that the intrinsic similarities between such foreground objects and background surroundings make the features extracted by deep models indistinguishable. To overcome this challenge, an ideal model should be able to seek valuable, extra clues from the given scene and incorporate them into a joint learning framework for representation co-enhancement. With this inspiration, we design a novel Mutual Graph Learning (MGL) model, which generalizes the idea of conventional mutual learning from regular grids to the graph domain. Specifically, MGL decouples an image into two task-specific feature maps - one for roughly locating the target and the other for accurately capturing its boundary details - and fully exploits the mutual benefits by recurrently reasoning about their high-order relations through graphs. Importantly, in contrast to most mutual learning approaches that use a shared function to model all between-task interactions, MGL is equipped with typed functions for handling different complementary relations to maximize information interactions. Experiments on challenging datasets, including CHAMELEON, CAMO and COD10K, demonstrate the effectiveness of our MGL, which achieves superior performance to existing state-of-the-art methods.
ISBN: 9781665445092 (digital)
ISBN: 9781665445092 (print)
Semantic correspondence is a fundamental problem in computer vision, which aims at establishing dense correspondences across images depicting different instances of the same category. This task is challenging due to large intra-class variations and a severe lack of ground truth. A popular solution is to learn correspondences from synthetic data. However, because of the limited intra-class appearance and background variations within synthetically generated training data, the model's capability for handling "real" image pairs using such a strategy is intrinsically constrained. We address this problem with a novel Probabilistic Model Distillation (PMD) approach, which transfers knowledge learned by a probabilistic teacher model on synthetic data to a static student model using unlabeled real image pairs. A probabilistic supervision reweighting (PSR) module together with a confidence-aware loss (CAL) is used to mine the useful knowledge and alleviate the impact of errors. Experimental results on a variety of benchmarks show that our PMD achieves state-of-the-art performance. To demonstrate the generalizability of our approach, we extend PMD to incorporate stronger supervision for better accuracy - the probabilistic teacher is trained with stronger key-point supervision. Again, we observe the superiority of our PMD. The extensive experiments verify that PMD is able to infer more reliable supervision signals from the probabilistic teacher for representation learning and largely alleviate the influence of errors in pseudo labels. Code is available at https://***/fanyang587/PMD.
ISBN: 9784885523434 (print)
This paper deals with deep cucumber recognition using CG (Computer Graphics)-based dataset generation. The variety and the size of the dataset are crucial in deep learning. Although there are many public datasets for common situations like traffic scenes, we need to build a dataset for a particular scene such as a cucumber farm. As it is costly and time-consuming to annotate large amounts of data manually, we propose generating images by CG and converting them to realistic ones using adversarial learning approaches. We compare several image conversion methods using real cucumber plant images.
ISBN: 9781665445092 (print)
To date, most existing self-supervised learning methods are designed and optimized for image classification. These pre-trained models can be sub-optimal for dense prediction tasks due to the discrepancy between image-level prediction and pixel-level prediction. To fill this gap, we aim to design an effective, dense self-supervised learning method that directly works at the level of pixels (or local features) by taking into account the correspondence between local features. We present dense contrastive learning (DenseCL), which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images. Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only <1% slower), but demonstrates consistently superior performance when transferring to downstream dense prediction tasks including object detection, semantic segmentation and instance segmentation; it also outperforms the state-of-the-art methods by a large margin. Specifically, over the strong MoCo-v2 baseline, our method achieves significant improvements of 2.0% AP on PASCAL VOC object detection, 1.1% AP on COCO object detection, 0.9% AP on COCO instance segmentation, 3.0% mIoU on PASCAL VOC semantic segmentation and 1.8% mIoU on Cityscapes semantic segmentation.
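A pixel-level contrastive loss of the kind described above can be sketched as an InfoNCE-style loss over local feature vectors. This is a hedged illustration, not the DenseCL implementation: the matching rule (the positive for each local feature is its most similar feature in the other view), the temperature value, the explicit list of negatives, and all function names are assumptions made for this example.

```python
import math


def normalize(v):
    """Scale a feature vector to unit L2 norm (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dense_info_nce(view1, view2, negatives, temperature=0.2):
    """Average InfoNCE loss over the local features of view1.

    view1, view2: lists of local feature vectors from two views of an image.
    negatives:    list of feature vectors treated as negatives (assumption).
    """
    queries = [normalize(v) for v in view1]
    keys = [normalize(v) for v in view2]
    negs = [normalize(v) for v in negatives]
    total = 0.0
    for q in queries:
        # Positive: the most similar local feature in the other view.
        pos = max(dot(q, k) for k in keys)
        logits = [pos / temperature] + [dot(q, n) / temperature for n in negs]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[0]  # -log softmax of the positive
    return total / len(queries)


# Toy 2-D local features (assumption): views that agree vs. views that do not.
aligned = dense_info_nce([[1, 0], [0, 1]], [[1, 0], [0, 1]], [[-1, 0], [0, -1]])
mismatched = dense_info_nce([[1, 0], [0, 1]], [[-1, 0], [0, -1]], [[1, 0], [0, 1]])
```

The loss is small when each local feature finds a close match in the other view and is far from the negatives, and large when the negatives are closer than any candidate match.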
ISBN: 9781665445092 (print)
Domain adaptation (DA) aims at transferring knowledge from a labeled source domain to an unlabeled target domain. Though many DA theories and algorithms have been proposed, most of them are tailored to classification settings and may fail in regression tasks, especially in the practical keypoint detection task. To tackle this difficult but significant task, we present a method of regressive domain adaptation (RegDA) for unsupervised keypoint detection. Inspired by the latest theoretical work, we first utilize an adversarial regressor to maximize the disparity on the target domain and train a feature generator to minimize this disparity. However, due to the high dimension of the output space, this regressor fails to detect samples that deviate from the support of the source. To overcome this problem, we propose two important ideas. First, based on our observation that the probability density of the output space is sparse, we introduce a spatial probability distribution to describe this sparsity and then use it to guide the learning of the adversarial regressor. Second, to alleviate the optimization difficulty in the high-dimensional space, we convert the minimax game in the adversarial training into the minimization of two opposite goals. Extensive experiments show that our method brings large improvements of 8% to 11% in terms of PCK on different datasets.
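For readers unfamiliar with the metric, PCK (Percentage of Correct Keypoints) counts a predicted keypoint as correct when it lies within a fraction of a reference size from its ground-truth location. The sketch below is a minimal illustration; the threshold `alpha`, the choice of reference size, and the toy coordinates are assumptions, since benchmarks define these differently.

```python
import math


def pck(pred, gt, ref_size, alpha=0.05):
    """Fraction of predicted keypoints within alpha * ref_size of ground truth.

    pred, gt: lists of (x, y) keypoint coordinates in the same order.
    ref_size: reference length, e.g. image or bounding-box size (assumption).
    """
    threshold = alpha * ref_size
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(pred, gt)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return correct / len(gt)


# Toy example: threshold is 0.05 * 100 = 5 pixels; errors are 1, 8 and 1 pixels.
pred = [(10, 10), (50, 50), (90, 90)]
gt = [(11, 10), (50, 58), (90, 91)]
score = pck(pred, gt, ref_size=100)
```

Here two of the three keypoints fall within the threshold, so the score is 2/3.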
ISBN: 9781665445092 (print)
Multi-Camera Multiple Object Tracking (MC-MOT) is a significant computer vision problem due to its emerging applicability in several real-world applications. Despite a large number of existing works, solving the data association problem in any MC-MOT pipeline is arguably one of the most challenging tasks. Developing a robust MC-MOT system, however, is still highly challenging due to many practical issues such as inconsistent lighting conditions, varying object movement patterns, or the trajectory occlusions of the objects between the cameras. To address these problems, this work proposes a new Dynamic Graph Model with Link Prediction (DyGLIP) approach to solve the data association task. Compared to existing methods, our new model offers several advantages, including better feature representations and the ability to recover from lost tracks during camera transitions. Moreover, our model works gracefully regardless of the overlapping ratios between the cameras. Experimental results show that we outperform existing MC-MOT algorithms by a large margin on several practical datasets. Notably, our model works favorably in online settings but can be extended to an incremental approach for large-scale datasets.