ISBN (Digital): 9798350365474
ISBN (Print): 9798350365481
Long-tailed, imbalanced distributions are a common issue in practical computer vision applications. Previous works have proposed methods to address this problem, which can be categorized into several classes: re-sampling, re-weighting, transfer learning, and feature augmentation. In recent years, diffusion models have shown impressive generation ability in many sub-problems of deep computer vision, yet this generation power has not been explored for long-tailed problems. We propose a new approach, the Latent-based Diffusion Model for Long-tailed Recognition (LDMLR), as a feature augmentation method to tackle the issue. First, we encode the imbalanced dataset into features using the baseline model. Then, we train a Denoising Diffusion Implicit Model (DDIM) on these encoded features to generate pseudo-features. Finally, we train the classifier using both the encoded and pseudo-features from the previous two steps. The proposed method improves classification accuracy on the CIFAR-LT and ImageNet-LT datasets.
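A minimal sketch of the three-step pipeline described above, assuming toy stand-in modules (the encoder, denoiser, and classifier below are illustrative placeholders, not the authors' actual architectures, and the DDIM forward/sampling process is heavily simplified):

```python
import torch
import torch.nn as nn

feat_dim, num_classes = 64, 10

# Step 1: a frozen baseline backbone encodes images of the imbalanced
# dataset into latent features (a toy MLP stands in for the backbone).
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feat_dim))

# Step 2: a denoiser trained with a DDIM-style objective on the encoded
# features; it predicts the noise added to a feature at diffusion step t.
class Denoiser(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim + 1, 128), nn.ReLU(),
                                 nn.Linear(128, feat_dim))
    def forward(self, x, t):
        return self.net(torch.cat([x, t[:, None].float()], dim=1))

# Step 3: a classifier trained on the union of encoded and pseudo-features.
classifier = nn.Linear(feat_dim, num_classes)

images = torch.randn(8, 3, 32, 32)           # toy imbalanced batch
labels = torch.randint(0, num_classes, (8,))

with torch.no_grad():
    real_feats = encoder(images)              # step 1: encoded features

denoiser = Denoiser()
t = torch.randint(0, 1000, (8,))
noise = torch.randn_like(real_feats)
noisy = real_feats + noise                    # simplified forward diffusion
ddim_loss = nn.functional.mse_loss(denoiser(noisy, t), noise)   # step 2 objective

pseudo_feats = torch.randn(8, feat_dim)       # stand-in for DDIM sampling output
all_feats = torch.cat([real_feats, pseudo_feats])
all_labels = torch.cat([labels, labels])      # pseudo-features reuse class labels here
cls_loss = nn.functional.cross_entropy(classifier(all_feats), all_labels)  # step 3
```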
ISBN (Print): 9781665448994
Deep generative frameworks, including GANs and normalizing flow models, have proven successful at filling in missing values in partially observed data samples by effectively learning, either explicitly or implicitly, complex, high-dimensional statistical distributions. In tasks where the data available for learning is only partially observed, however, their performance decays monotonically as a function of the data missingness rate. In high missing-data-rate regimes (e.g., 60% and above), it has been observed that state-of-the-art models tend to break down and produce unrealistic and/or semantically inaccurate data. We propose a novel framework, inspired by traditional formulations of solutions to ill-posed problems, to facilitate the learning of data distributions in high-paucity scenarios. The proposed framework naturally stems from posing the process of learning from incomplete data as a joint optimization over the parameters of the model being learned and the missing data values. The method enforces a prior regularization term that seamlessly integrates with objectives used to train explicit and tractable deep generative frameworks such as deep normalizing flow models. We demonstrate via extensive experimental validation that the proposed framework outperforms competing techniques, particularly as the rate of data paucity approaches unity.
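A hedged sketch of the joint-optimization idea: the model parameters and the missing entries are treated as optimization variables of a single objective that combines the model's negative log-likelihood with a prior regularizer on the imputed values. A diagonal Gaussian log-likelihood stands in for a normalizing flow here, and the quadratic prior and its weight are illustrative assumptions:

```python
import torch

torch.manual_seed(0)
x_obs = torch.randn(100, 5)
mask = (torch.rand_like(x_obs) > 0.6).float()            # 1 = observed, 0 = missing

mu = torch.zeros(5, requires_grad=True)                   # "model" parameters
log_sigma = torch.zeros(5, requires_grad=True)
x_missing = torch.zeros_like(x_obs, requires_grad=True)   # missing values as variables

opt = torch.optim.Adam([mu, log_sigma, x_missing], lr=1e-2)
lam = 0.1                                                  # prior weight (assumed)

for step in range(500):
    x_full = mask * x_obs + (1 - mask) * x_missing
    # Negative log-likelihood under the stand-in density model.
    nll = (((x_full - mu) ** 2) / (2 * (2 * log_sigma).exp()) + log_sigma).sum()
    # Prior regularization keeping the imputed entries in a plausible range.
    prior = ((1 - mask) * x_missing ** 2).sum()
    loss = nll + lam * prior
    opt.zero_grad()
    loss.backward()
    opt.step()
```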
ISBN (Print): 9781665448994
Recent research on unsupervised domain adaptation (UDA) has demonstrated that end-to-end ensemble learning frameworks serve as a compelling option for UDA tasks. Nevertheless, these end-to-end ensemble learning methods often lack flexibility, as any modification to the ensemble requires retraining of their frameworks. To address this problem, we propose a flexible ensemble-distillation framework for semantic-segmentation-based UDA, allowing an arbitrary composition of the members in the ensemble while still maintaining its superior performance. To achieve such flexibility, our framework is designed to be robust against the output inconsistency and the performance variation of the members within the ensemble. To examine the effectiveness and robustness of our method, we perform an extensive set of experiments on both the GTA5 -> Cityscapes and SYNTHIA -> Cityscapes benchmarks to quantitatively inspect the improvements achievable by our method. We further provide detailed analyses to validate that our design choices are practical and beneficial. The experimental evidence validates that the proposed method indeed offers superior performance, robustness, and flexibility in semantic-segmentation-based UDA tasks compared with contemporary baseline methods.
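A minimal sketch of one common form of ensemble distillation for segmentation, under the assumption that soft predictions of an arbitrary set of member models are averaged into pseudo-labels for a student on the unlabeled target domain (the tiny 1x1-conv networks below are placeholders, not the paper's architectures or its exact fusion rule):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 19                                  # Cityscapes-style label set

def make_seg_net():
    return nn.Conv2d(3, num_classes, kernel_size=1)

ensemble = [make_seg_net() for _ in range(3)]      # any composition of members
student = make_seg_net()

target_images = torch.randn(4, 3, 64, 128)         # unlabeled target-domain batch

with torch.no_grad():
    # Averaging softmax outputs tolerates output inconsistency and
    # performance variation across ensemble members.
    probs = torch.stack([F.softmax(m(target_images), dim=1) for m in ensemble]).mean(0)
    pseudo_labels = probs.argmax(dim=1)

# Distill the ensemble into a single student via the pseudo-labels.
loss = F.cross_entropy(student(target_images), pseudo_labels)
loss.backward()
```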
ISBN (Print): 9781665448994
Optimizing the channel counts for different layers of a CNN has shown great promise in improving the efficiency of CNNs at test time. However, these methods often introduce large computational overhead (e.g., an additional 2x the FLOPs of standard training). Minimizing this overhead could therefore significantly speed up training. In this work, we propose width transfer, a technique that harnesses the assumption that optimized widths (or channel counts) are regular across network sizes and depths. We show that width transfer works well across various width-optimization algorithms and networks. Specifically, we can achieve up to a 320x reduction in width-optimization overhead without compromising top-1 accuracy on ImageNet, making the additional cost of width optimization negligible relative to initial training. Our findings not only suggest an efficient way to conduct width optimization but also highlight that the widths that lead to better accuracy are invariant to various aspects of network architectures and training data.
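A hedged sketch of the width-transfer idea: run width optimization on a cheap proxy configuration and map the resulting per-layer channel fractions to the full-size network. The random-search routine and the FLOPs/accuracy proxies below are illustrative placeholders, not any specific width-optimization algorithm from the paper:

```python
import random

def width_optimization(depth, budget_ratio, trials=200):
    """Toy stand-in for a width-optimization algorithm: returns per-layer
    channel fractions that fit a FLOPs-like budget."""
    best = None
    for _ in range(trials):
        fracs = [random.uniform(0.25, 1.0) for _ in range(depth)]
        cost = sum(f * f for f in fracs) / depth          # crude FLOPs proxy
        score = sum(fracs)                                 # crude accuracy proxy
        if cost <= budget_ratio and (best is None or score > best[0]):
            best = (score, fracs)
    return best[1] if best else [budget_ratio] * depth

# Optimize on a cheap proxy that is shallower than the target network.
proxy_fracs = width_optimization(depth=8, budget_ratio=0.5)

# Transfer: stretch the per-layer fractions across the deeper target and
# apply them at the target's base width.
target_depth, target_width = 16, 64
stretched = [proxy_fracs[int(i * len(proxy_fracs) / target_depth)]
             for i in range(target_depth)]
target_widths = [round(f * target_width) for f in stretched]
print(target_widths)
```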
ISBN (Print): 9781665448994
In order to reduce traffic congestion and improve the efficiency of traffic light signals, intelligent traffic systems are being developed, and vehicle counting is one of the key techniques in such systems. Traditional methods mostly focus on increasing vehicle-counting effectiveness without regard to program execution efficiency; the practical value of these systems is reduced if they cannot operate in real time on compact IoT devices. Therefore, in this paper, we focus on designing a real-time and robust system for counting vehicles by specific movement. The system detects and tracks objects in the area of interest, then counts the tracked trajectories according to their movements. To improve the performance of multi-object tracking, a high-recall detection method and an efficient feature-matching strategy are proposed. Moreover, to reduce wrong movement-direction predictions and improve the vehicle-counting results, a cosine-similarity-based counting scheme is applied. Experiments are conducted on the AI City 2021 Track-1 dataset, and our method is evaluated in terms of both efficiency and effectiveness.
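A minimal sketch of cosine-similarity-based movement assignment, assuming each predefined movement is summarized by a reference direction and each tracked vehicle by the displacement of its trajectory (the movement names and directions below are hypothetical, in image coordinates where y grows downward):

```python
import numpy as np

movements = {                          # hypothetical reference directions
    "straight_north": np.array([0.0, -1.0]),
    "right_turn_east": np.array([1.0, 0.0]),
}

def assign_movement(trajectory):
    """trajectory: list of (x, y) centroids of one tracked vehicle."""
    disp = np.asarray(trajectory[-1], dtype=float) - np.asarray(trajectory[0], dtype=float)
    disp = disp / (np.linalg.norm(disp) + 1e-8)
    # Pick the movement whose direction has the highest cosine similarity.
    sims = {name: float(disp @ (d / np.linalg.norm(d)))
            for name, d in movements.items()}
    return max(sims, key=sims.get)

track = [(100, 400), (105, 320), (110, 240), (112, 160)]
print(assign_movement(track))          # -> "straight_north"
```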
ISBN (Print): 9781665448994
Recent advances in deep learning and computer vision have spawned a new class of media forgeries known as deepfakes, which typically consist of artificially generated human faces or voices. The creation and distribution of deepfakes raise many legal and ethical concerns. As a result, the ability to distinguish between deepfakes and authentic media is vital. While deepfakes can create plausible video and audio, it may be challenging for them to generate content that is consistent in terms of high-level semantic features, such as emotions. Unnatural displays of emotion, measured by features such as valence and arousal, can provide significant evidence that a video has been synthesized. In this paper, we propose a novel method for detecting deepfakes of a human speaker using the emotion predicted from the speaker's face and voice. The proposed technique leverages Long Short-Term Memory (LSTM) networks that predict emotion from audio and video Low-Level Descriptors (LLDs). The predicted emotion over time is then used to classify videos as authentic or deepfakes through an additional supervised classifier.
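A hedged sketch of the described pipeline: LSTMs map audio and video LLD sequences to a per-frame valence/arousal estimate, and a small supervised classifier operates on the resulting emotion trajectory. The feature dimensions, sequence length, and detector head below are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

audio_lld_dim, video_lld_dim, hidden, T = 40, 17, 64, 100   # assumed sizes

audio_lstm = nn.LSTM(audio_lld_dim, hidden, batch_first=True)
video_lstm = nn.LSTM(video_lld_dim, hidden, batch_first=True)
emotion_head = nn.Linear(2 * hidden, 2)                      # valence, arousal per frame
detector = nn.Sequential(nn.Flatten(), nn.Linear(2 * T, 2))  # authentic vs. deepfake

audio = torch.randn(8, T, audio_lld_dim)   # audio LLD sequences
video = torch.randn(8, T, video_lld_dim)   # video LLD sequences

a_feat, _ = audio_lstm(audio)
v_feat, _ = video_lstm(video)
emotion_seq = emotion_head(torch.cat([a_feat, v_feat], dim=-1))  # (8, T, 2)
logits = detector(emotion_seq)              # classify the emotion trajectory
```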
ISBN (Digital): 9798350365474
ISBN (Print): 9798350365481
Computer vision systems that are deployed in safety-critical applications need to quantify their output uncertainty. We study regression from images to parameter values, where it is common to represent uncertainty by predicting probability distributions. In this context, we investigate the regression-by-classification paradigm, which can represent multimodal distributions without a prior assumption on the number of modes. Through experiments on a specifically designed synthetic dataset, we demonstrate that traditional loss functions lead to poor probability distribution estimates and severe overconfidence in the absence of full ground-truth distributions. To alleviate these issues, we propose hinge-Wasserstein, a simple improvement of the Wasserstein loss that reduces the penalty for weak secondary modes during training. This enables prediction of complex distributions with multiple modes and allows training on datasets where full ground-truth distributions are not available. In extensive experiments, we show that the proposed loss leads to substantially better uncertainty estimation on two challenging computer vision tasks: horizon line detection and stereo disparity estimation.
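One plausible reading of the idea, sketched below without claiming to match the paper's exact formulation: the standard 1-D Wasserstein loss for regression-by-classification sums the absolute difference between predicted and target CDFs over bins, and a hinge with margin eps zeroes out small differences so that weak secondary modes are not penalized. All names and the margin value are assumptions:

```python
import torch

def hinge_wasserstein(pred_probs, target_probs, eps=0.05):
    # Cumulative-distribution difference over the discretized regression range.
    cdf_diff = torch.cumsum(pred_probs - target_probs, dim=-1).abs()
    # Hinge: differences smaller than eps incur no penalty.
    return torch.clamp(cdf_diff - eps, min=0.0).sum(dim=-1).mean()

pred = torch.softmax(torch.randn(4, 64), dim=-1)    # 64 bins, e.g. horizon offsets
target = torch.zeros(4, 64)
target[:, 20] = 1.0                                  # one-hot ground-truth bin
loss = hinge_wasserstein(pred, target)
```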
ISBN (Print): 9781665448994
This paper reviews the NTIRE 2021 challenge on burst super-resolution. Given a noisy RAW burst as input, the task in the challenge was to generate a clean RGB image with 4 times higher resolution. The challenge contained two tracks: Track 1 evaluating on synthetically generated data, and Track 2 using real-world bursts from a mobile camera. In the final testing phase, 6 teams submitted results using a diverse set of solutions. The top-performing methods set a new state of the art for the burst super-resolution task.
ISBN (Print): 9781665448994
How does the accuracy of deep neural network models trained to classify clinical images of skin conditions vary across skin color? While recent studies demonstrate that computer vision models can serve as a useful decision-support tool in healthcare and provide dermatologist-level classification on a number of specific tasks, darker skin is under-represented in the data. Most publicly available datasets do not include Fitzpatrick skin type labels. We annotate 16,577 clinical images sourced from two dermatology atlases with Fitzpatrick skin type labels and open-source these annotations. Based on these labels, we find that there are significantly more images of light skin types than dark skin types in this dataset. We train a deep neural network model to classify 114 skin conditions and find that the model is most accurate on skin types similar to those it was trained on. In addition, we evaluate how an algorithmic approach to identifying skin tones, the individual typology angle, compares with Fitzpatrick skin type labels annotated by a team of human labelers.
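For context on the algorithmic skin-tone measure mentioned above, a minimal sketch of the individual typology angle, ITA = arctan((L* - 50) / b*) expressed in degrees, computed from mean CIELAB values of a skin region; the category thresholds are the commonly cited ones and are used here purely for illustration, not as the paper's protocol:

```python
import math

def ita_degrees(L_star, b_star):
    # atan2 handles b* near zero gracefully.
    return math.degrees(math.atan2(L_star - 50.0, b_star))

def ita_category(ita):
    # Commonly used ITA buckets (illustrative thresholds).
    if ita > 55:  return "very light"
    if ita > 41:  return "light"
    if ita > 28:  return "intermediate"
    if ita > 10:  return "tan"
    if ita > -30: return "brown"
    return "dark"

print(ita_category(ita_degrees(L_star=65.0, b_star=18.0)))
```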
ISBN (Print): 9781665448994
Virtually all aspects of modern life depend on space technology. Thanks to the great advancement of computer vision in general and deep learning-based techniques in particular, over the decades the world has witnessed the growing use of deep learning in solving problems for space applications, such as self-driving robots, tracers, insect-like robots for space exploration, and health monitoring of spacecraft. These are just some prominent examples that have advanced the space industry with the help of deep learning. However, the success of deep learning models requires a large amount of training data to achieve decent performance, while on the other hand only a very limited number of space datasets are publicly available for training deep learning models. Currently, there are no public datasets for space-based object detection or instance segmentation, partly because manually annotating object segmentation masks is very time consuming, as they require pixel-level labelling, not to mention the challenge of obtaining images from space. In this paper, we aim to fill this gap by releasing a dataset for spacecraft detection, instance segmentation, and part recognition. The main contribution of this work is the development of the dataset using images of space stations and satellites, with rich annotations including bounding boxes of spacecraft and masks down to the level of object parts, which are obtained with a mixture of automatic processes and manual effort. We also provide evaluations with state-of-the-art methods in object detection and instance segmentation as a benchmark for the dataset. The link for downloading the proposed dataset can be found on https://***/Yurushia1998/SatelliteDataset.