ISBN (print): 9781665406529
A scheme for tracking solar coronal mass ejections (CMEs) is presented. The scheme operates on images from the Solar TErrestrial RElations Observatory (STEREO) satellite. The focus of the scheme is tracking the leading edge of the CME via a curve-fitting strategy in STEREO images. In addition to describing the scheme, the strengths and weaknesses of the approach are discussed.
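The abstract does not spell out the curve-fitting step, so the following is only a minimal sketch of one plausible reading: candidate leading-edge pixels are extracted from a frame by thresholding and a circular arc is fitted to them by least squares. The threshold, the Kasa-style circle fit, and the synthetic frame are all assumptions for illustration, not the paper's actual pipeline.

```python
# Hypothetical sketch: fit a circular arc to candidate leading-edge pixels of a
# CME in a coronagraph-style image. Not the paper's actual tracking scheme.
import numpy as np

def detect_edge_points(image, threshold):
    """Return (x, y) coordinates of bright pixels assumed to lie on the leading edge."""
    ys, xs = np.nonzero(image > threshold)
    return np.column_stack([xs, ys]).astype(float)

def fit_circle(points):
    """Least-squares circle fit (Kasa method): returns center (cx, cy) and radius r."""
    x, y = points[:, 0], points[:, 1]
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    b = x**2 + y**2
    (cx, cy, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    r = np.sqrt(c + cx**2 + cy**2)
    return (cx, cy), r

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic frame with a bright arc standing in for a CME front.
    img = np.zeros((256, 256))
    theta = rng.uniform(0, np.pi / 2, 200)
    xs = (120 + 80 * np.cos(theta)).astype(int)
    ys = (120 + 80 * np.sin(theta)).astype(int)
    img[ys, xs] = 1.0
    center, radius = fit_circle(detect_edge_points(img, 0.5))
    print(f"fitted center={center}, radius={radius:.1f}")
```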
ISBN (digital): 9798350353006
ISBN (print): 9798350353013
Source-free domain adaptation (SFDA) assumes that model adaptation only accesses the well-learned source model and unlabeled target instances for knowledge transfer. However, cross-domain distribution shift easily triggers invalid discriminative semantics from the source model when recognizing target samples. Hence, understanding the specific content of discriminative patterns and adjusting their representation in the target domain becomes the key to overcoming SFDA. To achieve such a vision, this paper proposes a novel explanation paradigm, the "Discriminative pattern Calibration (DPC)" mechanism, for solving the SFDA issue. Concretely, DPC first utilizes the learning network to infer the discriminative regions on the target images and specifically emphasizes them in feature space to enhance their representation. Moreover, DPC relies on an attention-reversed mixup mechanism to augment more samples and improve the robustness of the classifier. Considerable experimental results and studies demonstrate the effectiveness of our DPC in enhancing the performance of existing SFDA baselines.
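As an illustration only, the sketch below shows one way an attention-reversed mixup step could look: attended (discriminative) regions of each target image are kept while the background is taken from another image in the batch. The tensor shapes, function names, and the use of the mean attention as a mixing coefficient are assumptions, not the authors' released code.

```python
# Hypothetical attention-reversed mixup step for augmenting target samples.
import torch

def attention_reversed_mixup(x, attn, labels):
    """Blend each image's discriminative regions with another image's background.

    x:      (B, C, H, W) target images
    attn:   (B, 1, H, W) attention in [0, 1], high where the model looks
    labels: (B,) pseudo-labels of the original images
    """
    perm = torch.randperm(x.size(0), device=x.device)
    # Foreground (attended) content from x, background from the permuted batch.
    mixed = attn * x + (1.0 - attn) * x[perm]
    lam = attn.mean(dim=(1, 2, 3))  # per-sample mixing coefficient
    return mixed, labels, labels[perm], lam

if __name__ == "__main__":
    x = torch.randn(4, 3, 32, 32)
    attn = torch.rand(4, 1, 32, 32)
    y = torch.randint(0, 10, (4,))
    mixed, ya, yb, lam = attention_reversed_mixup(x, attn, y)
    print(mixed.shape, lam)
```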
ISBN (print): 9781665445092
Motion style transfer is an important problem in many computer graphics and computer vision applications, including human animation, games, and robotics. Most existing deep learning methods for this problem are supervised and trained by registered motion pairs. In addition, these methods are often limited to yielding a deterministic output, given a pair of style and content motions. In this paper, we propose an unsupervised approach for motion style transfer by synthesizing stylized motions autoregressively using a generative flow model M. M is trained to maximize the exact likelihood of a collection of unlabeled motions, based on an autoregressive context of poses in previous frames and a control signal representing the movement of a root joint. Thanks to invertible flow transformations, latent codes that encode deep properties of motion styles are efficiently inferred by M. By combining the latent codes (from an input style motion S) with the autoregressive context and control signal (from an input content motion C), M outputs a stylized motion which transfers style from S to C. Moreover, our model is probabilistic and is able to generate various plausible motions with a specific style. We evaluate the proposed model on motion capture datasets containing different human motion styles. Experiment results show that our model outperforms the state-of-the-art methods, despite not requiring manually labeled training data.
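The following is a minimal sketch, under assumed shapes and module names, of the core invertibility idea: a single conditional affine coupling layer maps a style pose to a latent code under the style motion's context, and that same code can be decoded under a different (content) context. The real model M is a full autoregressive flow; this toy layer only illustrates the mechanism.

```python
# Illustrative conditional coupling layer (not the paper's full flow model M).
import torch
import torch.nn as nn

class ConditionalCoupling(nn.Module):
    def __init__(self, pose_dim, ctx_dim, hidden=128):
        super().__init__()
        self.half = pose_dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + ctx_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (pose_dim - self.half)),
        )

    def forward(self, x, ctx):
        """Pose -> latent (normalizing direction), conditioned on the context."""
        xa, xb = x[:, :self.half], x[:, self.half:]
        s, t = self.net(torch.cat([xa, ctx], dim=-1)).chunk(2, dim=-1)
        zb = (xb - t) * torch.exp(-s)
        return torch.cat([xa, zb], dim=-1)

    def inverse(self, z, ctx):
        """Latent -> pose (generative direction), conditioned on the context."""
        za, zb = z[:, :self.half], z[:, self.half:]
        s, t = self.net(torch.cat([za, ctx], dim=-1)).chunk(2, dim=-1)
        xb = zb * torch.exp(s) + t
        return torch.cat([za, xb], dim=-1)

if __name__ == "__main__":
    layer = ConditionalCoupling(pose_dim=66, ctx_dim=32)
    style_pose, style_ctx = torch.randn(1, 66), torch.randn(1, 32)
    z = layer(style_pose, style_ctx)               # latent code from the style motion
    content_ctx = torch.randn(1, 32)               # autoregressive context of the content motion
    stylized_pose = layer.inverse(z, content_ctx)  # decode under the content context
    print(stylized_pose.shape)
```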
ISBN (print): 9781665445092
Anomaly detection in video is a challenging computer vision problem. Due to the lack of anomalous events at training time, anomaly detection requires the design of learning methods without full supervision. In this paper, we approach anomalous event detection in video through self-supervised and multi-task learning at the object level. We first utilize a pre-trained detector to detect objects. Then, we train a 3D convolutional neural network to produce discriminative anomaly-specific information by jointly learning multiple proxy tasks: three self-supervised and one based on knowledge distillation. The self-supervised tasks are: (i) discrimination of forward/backward moving objects (arrow of time), (ii) discrimination of objects in consecutive/intermittent frames (motion irregularity) and (iii) reconstruction of object-specific appearance information. The knowledge distillation task takes into account both classification and detection information, generating large prediction discrepancies between teacher and student models when anomalies occur. To the best of our knowledge, we are the first to approach anomalous event detection in video as a multi-task learning problem, integrating multiple self-supervised and knowledge distillation proxy tasks in a single architecture. Our lightweight architecture outperforms the state-of-the-art methods on three benchmarks: Avenue, ShanghaiTech and UCSD Ped2. Additionally, we perform an ablation study demonstrating the importance of integrating self-supervised learning and normality-specific distillation in a multi-task learning setting.
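To make the multi-head structure concrete, here is a toy sketch of a shared 3D CNN over object-centric clips feeding the four proxy-task heads named above. The layer sizes, head dimensions, and module names are illustrative assumptions, not the authors' lightweight architecture.

```python
# Illustrative multi-task head layout for object-level video anomaly detection.
import torch
import torch.nn as nn

class MultiTaskAnomalyNet(nn.Module):
    def __init__(self, distill_dim=80):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv3d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        self.arrow_of_time = nn.Linear(16, 2)                 # forward vs. backward clip
        self.motion_irregularity = nn.Linear(16, 2)           # consecutive vs. intermittent frames
        self.appearance_decoder = nn.Linear(16, 3 * 64 * 64)  # object appearance reconstruction
        self.distill_head = nn.Linear(16, distill_dim)        # mimics a teacher's outputs

    def forward(self, clip):
        f = self.backbone(clip)
        return {
            "arrow": self.arrow_of_time(f),
            "motion": self.motion_irregularity(f),
            "recon": self.appearance_decoder(f).view(-1, 3, 64, 64),
            "distill": self.distill_head(f),
        }

if __name__ == "__main__":
    clips = torch.randn(2, 3, 8, 64, 64)  # (batch, channels, frames, H, W)
    out = MultiTaskAnomalyNet()(clips)
    print({k: v.shape for k, v in out.items()})
```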
ISBN (print): 9781665445092
Zero-shot learning (ZSL) aims to recognize novel classes by transferring semantic knowledge from seen classes to unseen classes. Since semantic knowledge is built on attributes shared between different classes, which are highly local, a strong prior for the localization of object attributes is beneficial for visual-semantic embedding. Interestingly, when recognizing unseen images, humans also automatically gaze at regions with certain semantic clues. Therefore, we introduce a novel goal-oriented gaze estimation module (GEM) to improve the discriminative attribute localization based on the class-level attributes for ZSL. We aim to predict the actual human gaze location to get the visual attention regions for recognizing a novel object guided by the attribute description. Specifically, the task-dependent attention is learned with the goal-oriented GEM, and the global image features are simultaneously optimized with the regression of local attribute features. Experiments on three ZSL benchmarks, i.e., CUB, SUN and AWA2, show the superiority or competitiveness of our proposed method against the state-of-the-art ZSL methods. The ablation analysis on real gaze data CUB-VWSW also validates the benefits and accuracy of our gaze estimation module. This work implies the promising benefits of collecting human gaze datasets and automatic gaze estimation algorithms for high-level computer vision tasks. The code is available at https://***/osierboy/GEM-ZSL.
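The sketch below shows one way gaze-guided attribute localization could be wired: a predicted gaze map reweights spatial backbone features before regressing class-level attributes. The 312-attribute output (as in CUB) and all module names are assumptions for illustration; this is not the released GEM-ZSL code.

```python
# Hypothetical gaze-weighted pooling followed by attribute regression.
import torch
import torch.nn as nn

class GazeAttributeHead(nn.Module):
    def __init__(self, feat_dim=512, num_attributes=312):  # 312 attributes as in CUB
        super().__init__()
        self.gaze_predictor = nn.Conv2d(feat_dim, 1, kernel_size=1)
        self.attr_regressor = nn.Linear(feat_dim, num_attributes)

    def forward(self, feat_map):
        # feat_map: (B, C, H, W) backbone features
        gaze = torch.sigmoid(self.gaze_predictor(feat_map))  # (B, 1, H, W) attention
        pooled = (feat_map * gaze).sum(dim=(2, 3)) / gaze.sum(dim=(2, 3)).clamp_min(1e-6)
        return self.attr_regressor(pooled), gaze

if __name__ == "__main__":
    head = GazeAttributeHead()
    attrs, gaze_map = head(torch.randn(2, 512, 7, 7))
    print(attrs.shape, gaze_map.shape)  # (2, 312) and (2, 1, 7, 7)
```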
Pseudo depth maps are depth map predictions which are used as ground truth during training. In this paper we leverage pseudo depth maps in order to segment objects of classes that have never been seen during training. This renders our object segmentation task an open-world task. The pseudo depth maps are generated using pretrained networks, which have either been trained with the full intention to generalize to downstream tasks (LeRes and MiDaS), or which have been trained in an unsupervised fashion on video sequences (MonodepthV2). In order to tell our network which object to segment, we provide the network with a single click on the object's surface on the pseudo depth map of the image as input. We test our approach on two different scenarios: one without the RGB image and one where the RGB image is part of the input. Our results demonstrate considerably better generalization from seen to unseen object types when depth is used. On the Semantic Boundaries Dataset we achieve an improvement from 61.57 to 69.79 IoU on unseen classes, when only using half of the training classes during training and performing the segmentation on depth maps only.
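As a minimal sketch of the input construction described above, under assumed conventions: the pseudo depth map is stacked with a Gaussian "click" channel marking the object, and optionally with the RGB image. The Gaussian encoding, the sigma value, and the channel ordering are assumptions, not details confirmed by the abstract.

```python
# Hypothetical input construction: [depth, click] or [R, G, B, depth, click].
import numpy as np

def click_heatmap(h, w, click_yx, sigma=10.0):
    """Gaussian disk centered on the user click (row, col)."""
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (ys - click_yx[0]) ** 2 + (xs - click_yx[1]) ** 2
    return np.exp(-d2 / (2 * sigma**2))

def build_input(pseudo_depth, click_yx, rgb=None):
    """Stack the network input channels for a single example."""
    h, w = pseudo_depth.shape
    channels = [] if rgb is None else [rgb[..., c] for c in range(3)]
    channels += [pseudo_depth, click_heatmap(h, w, click_yx)]
    return np.stack(channels, axis=0).astype(np.float32)

if __name__ == "__main__":
    depth = np.random.rand(128, 128)   # stand-in for a LeRes/MiDaS prediction
    x = build_input(depth, click_yx=(64, 40))
    print(x.shape)                      # (2, 128, 128) in the depth-only setting
```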
ISBN (print): 9781665445092
Traditional classifiers are deployed under a closed-set setting, where both training and test classes belong to the same set. However, real-world applications probably face inputs of unknown categories, and the model will recognize them as known ones. Under such circumstances, open-set recognition is proposed to maintain classification performance on known classes and reject unknowns. Closed-set models make overconfident predictions over familiar known-class instances, so that calibration and thresholding across categories become essential issues when extending to an open-set environment. To this end, we propose to learn PlaceholdeRs for Open-SEt recognition (PROSER), which prepares for the unknown classes by allocating placeholders for both data and classifier. In detail, learning data placeholders tries to anticipate open-set class data, thus transforming closed-set training into open-set training. Besides, to learn the invariant information between target and non-target classes, we reserve classifier placeholders as the class-specific boundary between known and unknown. The proposed PROSER efficiently generates novel classes by manifold mixup, and adaptively sets the value of the reserved open-set classifier during training. Experiments on various datasets validate the effectiveness of our proposed method.
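The following toy sketch illustrates the data-placeholder idea with manifold mixup: intermediate features of two known-class samples are blended and assigned to an extra placeholder class index, which an expanded classifier then learns to predict. The shapes, Beta parameter, and helper names are assumptions for illustration only.

```python
# Hypothetical generation of open-set "data placeholders" via manifold mixup.
import torch
import torch.nn as nn

def manifold_mixup_placeholders(features, labels, num_known, alpha=2.0):
    """Mix features from different classes; mixed samples get the placeholder label."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(features.size(0))
    different = labels != labels[perm]                  # only mix across classes
    mixed = lam * features[different] + (1 - lam) * features[perm][different]
    placeholder_labels = torch.full((mixed.size(0),), num_known, dtype=torch.long)
    return mixed, placeholder_labels

if __name__ == "__main__":
    feats = torch.randn(8, 128)
    labels = torch.randint(0, 5, (8,))
    classifier = nn.Linear(128, 5 + 1)                  # known classes + 1 placeholder slot
    mixed, ph_labels = manifold_mixup_placeholders(feats, labels, num_known=5)
    loss = nn.functional.cross_entropy(classifier(mixed), ph_labels) if mixed.numel() else None
    print(mixed.shape, loss)
```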
ISBN (print): 9781665445092
Learning discriminative representations using large-scale face datasets in the wild is crucial for real-world applications, yet it remains challenging. The difficulties lie in many aspects, and this work focuses on computing resource constraints and long-tailed class distributions. Recently, classification-based representation learning with deep neural networks and well-designed losses has demonstrated good recognition performance. However, the computing and memory cost scales up linearly with the number of identities (classes) in the training set, and the learning process suffers from unbalanced classes. In this work, we propose a dynamic class queue (DCQ) to tackle these two problems. Specifically, for each iteration during training, a subset of classes for recognition is dynamically selected, and their class weights are dynamically generated on the fly and stored in a queue. Since only a subset of classes is selected for each iteration, the computing requirement is reduced. By using a single server without model parallelism, we empirically verify on large-scale datasets that 10% of classes are sufficient to achieve similar performance to using all classes. Moreover, the class weights are dynamically generated in a few-shot manner and are therefore suitable for tail classes with only a few instances. We show clear improvement over a strong baseline on the largest public dataset, MegaFace Challenge 2 (MF2), which has 672K identities, over 88% of which have fewer than 10 instances. Code is available at https://***/bilylee/DCQ
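Below is a sketch of the queue idea under assumed shapes (not the released DCQ code): per iteration, class weights for a sampled subset of identities are generated on the fly from a few exemplar embeddings and kept in a fixed-size queue, and logits are computed only against the classes currently in the queue. The few-shot weight generator here is a simple normalized mean, which is an assumption.

```python
# Hypothetical dynamic class queue with few-shot class-weight generation.
import torch
import torch.nn.functional as F

class DynamicClassQueue:
    def __init__(self, queue_size, embed_dim):
        self.weights = torch.zeros(queue_size, embed_dim)
        self.ids = torch.full((queue_size,), -1, dtype=torch.long)
        self.ptr = 0

    def enqueue(self, class_ids, exemplar_embeds):
        """exemplar_embeds: (num_classes, shots, D); few-shot weight = normalized mean."""
        w = F.normalize(exemplar_embeds.mean(dim=1), dim=-1)
        n = w.size(0)
        idx = (self.ptr + torch.arange(n)) % self.weights.size(0)
        self.weights[idx] = w
        self.ids[idx] = class_ids
        self.ptr = int((self.ptr + n) % self.weights.size(0))

    def logits(self, embeddings, scale=64.0):
        """Cosine logits against the classes currently held in the queue."""
        return scale * F.normalize(embeddings, dim=-1) @ self.weights.t()

if __name__ == "__main__":
    q = DynamicClassQueue(queue_size=1000, embed_dim=256)
    q.enqueue(torch.arange(32), torch.randn(32, 4, 256))   # 32 sampled identities, 4 shots each
    print(q.logits(torch.randn(8, 256)).shape)             # (8, 1000)
```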
ISBN (print): 9781665445092
We present STaR, a novel method that performs Self-supervised Tracking and Reconstruction of dynamic scenes with rigid motion from multi-view RGB videos without any manual annotation. Recent work has shown that neural networks are surprisingly effective at the task of compressing many views of a scene into a learned function which maps from a viewing ray to an observed radiance value via volume rendering. Unfortunately, these methods lose all their predictive power once any object in the scene has moved. In this work, we explicitly model rigid motion of objects in the context of neural representations of radiance fields. We show that without any additional human-specified supervision, we can reconstruct a dynamic scene with a single rigid object in motion by simultaneously decomposing it into its two constituent parts and encoding each with its own neural representation. We achieve this by jointly optimizing the parameters of two neural radiance fields and a set of rigid poses which align the two fields at each frame. On both synthetic and real-world datasets, we demonstrate that our method can render photorealistic novel views, where novelty is measured on both spatial and temporal axes. Our factored representation furthermore enables animation of unseen object motion.
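The abstract leaves the compositing step implicit, so the following is only a conceptual sketch under assumed conventions: sample points are queried in the static field directly and in the dynamic field after applying the per-frame rigid pose, and the two densities and colors are combined. The toy field functions and the density-weighted color blend are illustrative assumptions, not STaR's exact rendering equations.

```python
# Conceptual sketch: composing a static and a rigidly transformed dynamic field.
import torch

def transform_points(points, rotation, translation):
    """Map world-space sample points into the moving object's canonical frame."""
    return (points - translation) @ rotation  # inverse of x -> R x + t for orthonormal R

def composite(static_field, dynamic_field, points, rotation, translation):
    sigma_s, rgb_s = static_field(points)
    sigma_d, rgb_d = dynamic_field(transform_points(points, rotation, translation))
    sigma = sigma_s + sigma_d  # densities add
    rgb = (sigma_s[..., None] * rgb_s + sigma_d[..., None] * rgb_d) / sigma[..., None].clamp_min(1e-8)
    return sigma, rgb

if __name__ == "__main__":
    toy_field = lambda p: (p.norm(dim=-1), torch.sigmoid(p))  # stand-in radiance field
    pts = torch.randn(1024, 3)
    R, t = torch.eye(3), torch.tensor([0.1, 0.0, 0.0])        # per-frame rigid pose
    sigma, rgb = composite(toy_field, toy_field, pts, R, t)
    print(sigma.shape, rgb.shape)
```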
Currently, image-to-image translation methods have achieved significant performance with the help of deep CNNs and GANs, but most existing models are not suitable for autonomous driving scenarios due to their limited interpretability. In this paper, we develop a high-quality image translator (i.e., night → day), N2D-LPNet, to facilitate nighttime driving scene perception. Instead of pursuing a complex network structure, we first attempt to explore inter-frequency relation knowledge to simplify the image translation process. Specifically, a lightweight Laplace pyramid is introduced as the backbone architecture to decompose the feature maps of the nighttime image into high- and low-frequency components. Considering the similar morphological properties of different frequency components, we design a flexible inter-frequency guiding strategy which utilizes each lower-frequency component to refine the higher-frequency feature in a progressive manner. Benefiting from the constraints imposed by inter-frequency priors, our method can process nighttime images better under a practical driving system while still preserving competitive results. We also discuss the potential value of N2D-LPNet for other high-level vision tasks.
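To make the coarse-to-fine guidance concrete, here is an illustrative sketch (hypothetical modules, not N2D-LPNet itself): the image is decomposed into a Laplacian pyramid, the lowest-frequency band is translated first, and each finer residual is then refined with the already-translated lower-frequency result as guidance. The stand-in translator and refiner are placeholders for learned networks.

```python
# Hypothetical Laplacian-pyramid decomposition with inter-frequency guidance.
import torch
import torch.nn.functional as F

def laplacian_pyramid(img, levels=3):
    pyramid, current = [], img
    for _ in range(levels - 1):
        down = F.avg_pool2d(current, 2)
        up = F.interpolate(down, size=current.shape[-2:], mode="bilinear", align_corners=False)
        pyramid.append(current - up)   # high-frequency residual
        current = down
    pyramid.append(current)            # lowest-frequency band
    return pyramid

def translate_coarse_to_fine(pyramid, translate_low, refine_high):
    """Translate the low-frequency band, then refine each finer residual with guidance."""
    out = translate_low(pyramid[-1])
    for residual in reversed(pyramid[:-1]):
        guide = F.interpolate(out, size=residual.shape[-2:], mode="bilinear", align_corners=False)
        out = guide + refine_high(residual, guide)
    return out

if __name__ == "__main__":
    night = torch.rand(1, 3, 128, 128)
    identity_translate = lambda low: low          # stand-in for the learned low-freq translator
    identity_refine = lambda res, guide: res      # stand-in for the learned residual refiner
    day = translate_coarse_to_fine(laplacian_pyramid(night), identity_translate, identity_refine)
    print(day.shape)  # (1, 3, 128, 128)
```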