Details
ISBN:
(Print) 9781665448994
Fundamentally, super-resolution is an ill-posed problem because a given low-resolution image can be obtained from many different high-resolution images. Recent super-resolution studies cannot create diverse super-resolution images. Although SRFlow tried to account for the ill-posed nature of super-resolution by predicting multiple high-resolution images for a given low-resolution image, there is room to improve the diversity and visual quality. In this paper, we propose a Noise Conditional flow model for Super-Resolution, NCSR, which increases the visual quality and diversity of images through a noise conditional layer. To learn a more diverse data distribution, we add noise to the training data. However, adding noise results in low-quality images. We propose the noise conditional layer to overcome this phenomenon. The noise conditional layer makes our model generate more diverse images with higher visual quality than other works. Furthermore, we show that this layer can overcome data distribution mismatch, a problem that arises in normalizing flow models. With these benefits, NCSR outperforms the baseline in diversity and visual quality and achieves better visual quality than traditional GAN-based models. We also achieved top-ranked scores at the NTIRE 2021 challenge [21].
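The noise-conditioning idea the abstract describes can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the function name and the choice of a single per-image noise-level scalar as the conditioning signal are mine.

```python
import numpy as np

def noise_conditioned_sample(hr_image, sigma, rng):
    """Perturb a high-resolution training image with Gaussian noise and
    return the noisy image together with the noise level, which a noise
    conditional layer would receive as an extra conditioning input."""
    noise = rng.normal(0.0, sigma, size=hr_image.shape)
    noisy = hr_image + noise
    condition = np.array([sigma])  # one noise-level value per image
    return noisy, condition

# Usage: draw a noise level per sample so the flow sees varied degradations.
rng = np.random.default_rng(0)
hr = np.zeros((16, 16))
noisy, cond = noise_conditioned_sample(hr, sigma=0.1, rng=rng)
```

Conditioning on the noise level lets the model distinguish "real" image structure from the injected perturbation, which is how the paper motivates recovering visual quality despite noisy training targets.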
Rain streaks cause serious blurring and visual quality degradation, and they often vary in size, direction and density. Current CNN-based methods achieve encouraging performance but are limited in depicting rain characteristics and recovering image details in poor-visibility environments. To address these issues, we present a Multi-scale Hourglass Hierarchical Fusion Network (MH²F-Net), trained in an end-to-end manner, which exactly captures rain streak features with multi-scale extraction, hierarchical distillation and information aggregation. To better extract features, a novel Multi-scale Hourglass Extraction Block (MHEB) is proposed to obtain local and global features across different scales through down- and up-sampling processes. Besides, a Hierarchical Attentive Distillation Block (HADB) employs dual attention feature responses to adaptively recalibrate the hierarchical features and eliminate redundant ones. Further, we introduce a Residual Projected Feature Fusion (RPFF) strategy to progressively discriminate feature learning and aggregate different features instead of directly concatenating or adding them. Extensive experiments on both synthetic and real rainy datasets demonstrate the effectiveness of the designed MH²F-Net in comparison with recent state-of-the-art deraining algorithms.
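The hourglass idea of the extraction block, going down in resolution for global context and back up for fusion with local detail, can be illustrated with a toy feature map. This is a stand-in sketch: the real MHEB uses learned convolutions, while the pooling, upsampling, and equal-weight fusion here are my assumptions.

```python
import numpy as np

def hourglass_features(feat):
    """Illustrative multi-scale hourglass extraction: downsample a 2-D
    feature map to capture global context, upsample it back, and fuse
    it with the original local features."""
    h, w = feat.shape
    # 2x average-pool downsample (global branch).
    down = feat.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    # Nearest-neighbour upsample back to the input resolution.
    up = down.repeat(2, axis=0).repeat(2, axis=1)
    # Fuse local (original) and global (up-sampled) information.
    return 0.5 * (feat + up)
```

Each output location then mixes fine detail with neighbourhood context, which is the property the multi-scale extraction relies on to separate rain streaks of different sizes from the background.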
Deep models trained on large-scale RGB image datasets have shown tremendous success. It is important to apply such deep models to real-world problems. However, these models suffer from a performance bottleneck under illumination changes. Thermal IR cameras are more robust against such changes and can therefore be very useful for real-world problems. To investigate the efficacy of combining the feature-rich visible spectrum and thermal image modalities, we propose an unsupervised domain adaptation method that does not require RGB-to-thermal image pairs. We employ the large-scale RGB dataset MS-COCO as the source domain and the thermal dataset FLIR ADAS as the target domain to demonstrate the results of our method. Although adversarial domain adaptation methods aim to align the distributions of the source and target domains, simply aligning the distributions cannot guarantee perfect generalization to the target domain. To this end, we propose a self-training guided adversarial domain adaptation method to promote the generalization capabilities of adversarial domain adaptation methods. To perform self-training, pseudo labels are assigned to the samples in the target thermal domain to learn more generalized representations for the target domain. Extensive experimental analyses show that our proposed method achieves better results than the state-of-the-art adversarial domain adaptation methods. The code and models are publicly available.(1)
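The pseudo-labeling step of self-training can be sketched as a confidence-thresholded assignment. This is a generic illustration of the idea, not the paper's procedure; the threshold value and function name are assumptions.

```python
def assign_pseudo_labels(probabilities, threshold=0.9):
    """Assign a pseudo label to each unlabeled target-domain sample when
    the model's most confident class probability exceeds a threshold;
    low-confidence samples are skipped (None)."""
    labels = []
    for probs in probabilities:
        best = max(range(len(probs)), key=lambda c: probs[c])
        labels.append(best if probs[best] >= threshold else None)
    return labels
```

Only confident predictions feed back into training, which is what lets self-training refine target-domain representations without ground-truth thermal labels.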
In this work, we consider two tracks of the 2021 NVIDIA AI City Challenge: City-Scale Multi-Camera Vehicle Re-identification and Natural Language-Based Vehicle Retrieval. For the vehicle re-identification task, we employ the state-of-the-art Excited Vehicle Re-Identification deep representation learning model coupled with best training practices and domain adaptation techniques to obtain robust embeddings. We further refine the re-identification results through a series of post-processing steps to remove the camera and vehicle orientation bias that is inherent in the task of re-identification. We also take advantage of multiple observations of a vehicle using track-level information and finally obtain fine-grained retrieval results. For the task of natural language-based vehicle retrieval, we leverage the recently proposed Contrastive Language-Image Pre-training (CLIP) model and propose a simple yet effective text-based vehicle retrieval system. We compare our performance against the top submissions to the challenge; our systems are ranked 8th on the public leaderboard for both tracks.
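A CLIP-style retrieval system scores candidates by embedding similarity; the ranking step can be sketched with toy vectors. The embeddings here are placeholders, assuming a text encoder and per-track visual encoder have already been run, which the snippet does not show.

```python
import numpy as np

def rank_tracks(text_embedding, track_embeddings):
    """Rank vehicle tracks by cosine similarity to a text query embedding,
    mimicking how a CLIP-style retrieval system orders candidates."""
    t = text_embedding / np.linalg.norm(text_embedding)
    v = track_embeddings / np.linalg.norm(track_embeddings, axis=1, keepdims=True)
    scores = v @ t
    return np.argsort(-scores)  # best match first
```

Because both modalities live in the same embedding space after contrastive pre-training, a plain cosine ranking already yields a usable retrieval system, which matches the "simple yet effective" claim above.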
Natural language-based vehicle retrieval is the task of finding a target vehicle within a given image based on a natural language description as a query. This technology can be applied to various areas, including police searches for a suspect vehicle. However, it is challenging due to the ambiguity of language descriptions and the difficulty of processing multi-modal data. To tackle this problem, we propose a deep neural network called SBNet that performs natural language-based segmentation for vehicle retrieval. We also propose two task-specific modules to improve performance: a substitution module that helps features from different domains to be embedded in the same space, and a future prediction module that learns temporal information. SBNet was trained on the CityFlow-NL dataset, which contains 2,498 tracks of vehicles, each with three unique natural language descriptions, and tested on 530 unique vehicle tracks and their corresponding query sets. SBNet achieved a significant improvement over the baseline in the natural language-based vehicle retrieval track of the AI City Challenge 2021. Source code: https://***/lsrock1/nlp_search
This paper introduces Phase Selective Convolution (PSC), an enhanced convolution for more deliberate utilization of activations in convolutional networks. Unlike the conventional use of convolutions with activation functions, PSC preserves the full space of activations while supporting desirable model nonlinearity. Similar to several other network operations at the time of their introduction, e.g., the ReLU operation, PSC may not execute efficiently on platforms without hardware specialization support. As a first step in addressing the need for optimization, we propose a hardware acceleration scheme to enable the intended efficiency of PSC execution. Moreover, we propose a PSC deployment strategy, in which PSC is applied only to selected layers of a network, to avoid an excessive increase in total model size. To evaluate the results, we apply PSC as a drop-in replacement for selected convolution layers in several networks without affecting their macro network architectures. In particular, PSC-enhanced ResNets achieve 1.0-2.0% and 0.7-1.0% higher accuracy on CIFAR-100 and ImageNet, respectively, at Pareto efficiency. PSC-enhanced MobileNets (V2 and V3 Large) and MobileNetV3 (Small) achieve 0.9-1.0% and 1.8% accuracy gains, respectively, on ImageNet at a small (0.2-0.7%) increase in total model size.
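The abstract does not give PSC's exact formulation. As an illustration only of the general property claimed, nonlinearity that preserves the full space of activations, here is a phase-splitting sketch (similar in spirit to CReLU, not the paper's actual PSC operator): the positive and negative parts are kept as separate channels, so the input can be recovered exactly.

```python
import numpy as np

def phase_split(x):
    """Split activations into positive and negative phases so that the
    nonlinearity discards nothing: x is exactly recoverable as pos - neg.
    (An illustration of phase-preserving activations, not the paper's
    actual PSC operator.)"""
    pos = np.maximum(x, 0.0)
    neg = np.maximum(-x, 0.0)
    return pos, neg
```

A plain ReLU would map every negative activation to zero and lose it; keeping both phases doubles the channel count, which is consistent with the abstract's concern about model size growth and its selective-deployment strategy.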
Multi-Target Multi-Camera Tracking has a wide range of applications and is the basis for many advanced inferences and predictions. This paper describes our solution to Track 3, the multi-camera vehicle tracking task, of the 2021 AI City Challenge (AICITY21). We propose a multi-target multi-camera vehicle tracking framework guided by crossroad zones. The framework: (1) uses mature detection and vehicle re-identification models to extract targets and appearance features; (2) uses a modified JDE-Tracker (without its detection module) to track single-camera vehicles and generate single-camera tracklets; (3) proposes the Tracklet Filter Strategy and the Direction Based Temporal Mask according to the characteristics of the crossroad; and (4) proposes Sub-clustering in Adjacent Cameras for multi-camera tracklet matching. Through the above techniques, our method obtained an IDF1 score of 0.8095, ranking first on the leaderboard.(1) The code will be released later.
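At its core, cross-camera tracklet association matches appearance features between cameras. The greedy cosine-similarity matcher below is a simplified stand-in for the paper's sub-clustering, with the threshold and greedy strategy being my assumptions.

```python
import numpy as np

def greedy_match(feats_a, feats_b, min_sim=0.5):
    """Greedily associate tracklets from two adjacent cameras by cosine
    similarity of their appearance features; each tracklet in camera B
    is used at most once."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    sim = a @ b.T
    matches, used = [], set()
    for i in range(sim.shape[0]):
        j = int(np.argmax(sim[i]))
        if sim[i, j] >= min_sim and j not in used:
            matches.append((i, j))
            used.add(j)
    return matches
```

The paper's zone- and direction-based masks would additionally zero out similarities between tracklet pairs that are physically impossible at the crossroad before any matching is attempted.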
Forecasting future head pose states is a novel task in computer vision. Since the future may have many possibilities, and logical results are much more important than impractical ones, the forecasting results for most scenarios should be not only diverse but also logically realistic. These requirements pose a real challenge to current methods, which motivates us to seek a better head pose representation and methods to restrict the forecasting reasonably. In this paper, we adopt a spatial-temporal graph to model the interdependencies between the distribution of landmarks and head pose angles. Furthermore, we propose the conditional spatial-temporal variational graph autoencoder (CST-VGAE), a deep conditional generative model for learning restricted one-to-many mappings conditioned on the spatial-temporal graph input. Specifically, we improve the proposed CST-VGAE for the long-term head pose forecasting task in several aspects. First, we introduce a gaze-guiding prior based on physiology. Then we apply a temporal self-attention and self-supervised learning mechanism to learn the long-range dependencies on the gaze prior. To better model head poses structurally, we introduce a Gaussian Mixture Model (GMM) in the encoded latent space instead of a fixed Gaussian. Experiments demonstrate the effectiveness of the proposed method for the long-term head pose forecasting task. We achieve superior forecasting performance on the benchmark datasets compared to existing methods.
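The difference between a fixed Gaussian latent and a GMM latent can be shown with a small sampling sketch. This is illustrative only; the mixture parameters and 1-D latent are assumptions, whereas in CST-VGAE the mixture lives in the encoder's learned latent space.

```python
import numpy as np

def sample_gmm_latent(weights, means, stds, n, rng):
    """Draw latent samples from a 1-D Gaussian mixture prior, in place of
    a single fixed Gaussian. Each sample first picks a mixture component,
    then draws from that component's Gaussian."""
    comps = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(np.asarray(means)[comps], np.asarray(stds)[comps])
```

A multi-modal latent prior lets decoded forecasts cluster around several distinct plausible futures instead of blurring toward a single mean trajectory, which is the structural benefit the abstract claims for the GMM.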
In bad weather or low-lighting conditions, a single sensor may not be able to capture enough information for object identification. Compared with traditional optical imaging, synthetic aperture radar (SAR) imaging has greater advantages, such as the ability to penetrate fog and smoke. However, SAR images are of low resolution and contaminated by high-level speckle noise, which makes it very difficult to extract powerful and robust features from them. In this paper, we explore whether multiple imaging modalities can improve object detection performance. We propose a Cross Modality Knowledge Distillation (CMKD) paradigm and explore two network structures, named CMKD-s and CMKD-m, for the object classification task. Specifically, CMKD-s transfers the information captured by the two sensors using online knowledge distillation, which achieves cross-modal knowledge sharing and enhances the robustness of the aerial-view object classification model. Moreover, leveraging semi-supervised enhanced training, we propose a novel method named CMKD-m, which strengthens the model through mutual knowledge transfer. Through quantitative comparison, we find that CMKD-s and CMKD-m outperform the method without knowledge transfer on the NTIRE2021 SAR-EO challenge dataset.
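Online mutual distillation between two modality-specific models is typically driven by a divergence between their softened predictions. The symmetric-KL loss below sketches that idea; it is not claimed to be the paper's exact objective, and the function names are mine.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def mutual_kd_loss(logits_a, logits_b):
    """Symmetric KL divergence between two models' class distributions,
    the kind of objective used for online mutual knowledge distillation
    (a sketch of the idea, not the paper's exact loss)."""
    p, q = softmax(logits_a), softmax(logits_b)
    kl_pq = float(np.sum(p * np.log(p / q)))
    kl_qp = float(np.sum(q * np.log(q / p)))
    return kl_pq + kl_qp
```

Minimizing this term pulls the SAR-branch and EO-branch predictions toward each other, so each modality benefits from what the other sensor captured, which is the cross-modal sharing CMKD-s and CMKD-m rely on.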
This paper presents a novel deep-learning-enabled, video-based analysis framework for assessing the Unified Parkinson's Disease Rating Scale (UPDRS) that can be used in the clinic or at home. We report results from comparing the performance of the framework to that of trained clinicians on a population of 32 Parkinson's disease (PD) patients. In-person clinical assessments by trained neurologists are used as the ground truth for training our framework and for comparing performance. We find that the standard sit-to-stand activity can be used to evaluate the UPDRS sub-scores of bradykinesia (BRADY) and postural instability and gait disorders (PIGD). For BRADY we find F1-scores of 0.75 using our framework, compared to 0.50 for the video-based rater clinicians, while for PIGD we find 0.78 for the framework and 0.45 for the video-based rater clinicians. We believe our proposed framework has the potential to provide clinically acceptable endpoints for PD at greater granularity without imposing burdens on patients and clinicians, enabling a variety of use cases such as passive tracking of PD progression in settings such as nursing homes, in-home self-assessment, and enhanced tele-medicine.
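The F1-scores quoted above compare binary ratings against the in-person ground truth. For reference, the metric itself is computed as follows; this is the standard definition, with the binarization of UPDRS sub-scores into labels assumed for illustration.

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall, the metric used to
    compare framework and clinician ratings against ground truth."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because F1 balances precision and recall, the gap reported above (0.75 vs. 0.50 for BRADY) reflects both fewer missed impairments and fewer false flags by the framework relative to video-based raters.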