ISBN: 9781665448994 (print)
Lossy image compression causes a loss of texture, especially at low bitrates. To mitigate this problem, we propose a novel image compression method that utilizes a reference-based image super-resolution model. We use two image compression models and a self texture transfer model. The image compression models encode and decode a whole input image and selected reference patches. The reference patches are small but compressed with high quality. The self texture transfer model transfers the texture of the reference patches into similar regions in the compressed image. The experimental results show that our method can reconstruct accurate texture by transferring the texture of the reference patches.
ISBN: 9781665448994 (print)
In this paper, we propose an online movement-specific vehicle counting system to realize robust traffic flow analysis at crowded intersections. Our proposed framework adopts PP-YOLO as the vehicle detector and adapts the Deep-Sort algorithm to perform multi-object tracking. To realize online and robust vehicle counting, we further adopt a shape-based movement assignment strategy to differentiate movements, together with carefully designed spatial constraints that effectively reduce false-positive counts. Our proposed framework achieves an overall S1-score of 0.9467, ranking first in the AICITY2021-track1 challenge.
ISBN: 9781665448994 (print)
Machine learning models have started to outperform medical experts in some classification tasks. Meanwhile, the question of how these classifiers produce certain results is attracting increasing research attention. Current interpretation methods provide a good starting point for investigating such questions, but they still largely lack a connection to the problem domain. In this work, we present how explanations of an AI system for skin image analysis can be made more domain-specific. We combine Local Interpretable Model-agnostic Explanations (LIME) with the ABCD-rule, a diagnostic approach used by dermatologists, and present the results using a Deep Neural Network (DNN) based skin image classifier.
ISBN: 9781665487399 (digital)
ISBN: 9781665487399 (print)
We tackle here a specific, still not widely addressed aspect of AI robustness: seeking invariance (insensitivity) of model performance to hidden factors of variation in the data. Towards this end, we employ a two-step strategy that a) performs unsupervised discovery, via generative models, of sensitive factors that cause models to under-perform, and b) intervenes on models to make their performance invariant to the influence of these sensitive factors. We consider three separate interventions for robustness: data augmentation, semantic consistency, and adversarial alignment. We evaluate our method using metrics that measure trade-offs between invariance (insensitivity) and overall performance (utility) and show the benefits of our method in three settings (unsupervised, semi-supervised, and generalization).
ISBN: 9781538607336 (print)
In this paper we discuss and analyze possible futures for technologies in the field of computer vision (CV). Using a method we have coined speculative analysis, we take a broad look at research trends in the field to categorize risks, analyze which ones are most threatening and likely, and ultimately summarize conclusions for how the field may attempt to stem future harms caused by CV technologies. We develop narrative case studies to provoke dialogue and deeply explore the risk scenarios we found to be most probable and severe. We arrive at the position that there is serious potential for CV to cause discriminatory harm and exacerbate cybersecurity issues.
ISBN: 9781665487399 (digital)
ISBN: 9781665487399 (print)
Recognizing risky human behavior in driving is an important visual recognition problem. In this paper, we propose a multi-view temporal action localization system based on grayscale video to achieve action recognition in naturalistic driving. Specifically, we adopt SwinTransformer as the feature extractor and use a single framework to detect boundaries and classes at the same time. We also improve multiple loss functions to impose explicit constraints on the embedded feature distributions. Our proposed framework achieves an overall F1-score of 0.3154 on the A2 dataset.
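For readers unfamiliar with the reported metric, the following is a minimal sketch of the standard F1-score computed from match counts. How predicted action segments are matched to ground truth in the challenge is not described in the abstract, so the counts passed in here are hypothetical inputs.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall (standard definition)."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 3 correct detections, 1 spurious, 2 missed.
example_f1 = f1_score(3, 1, 2)
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are reasonably high.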
ISBN: 9781665448994 (print)
We present an end-to-end system for learning outfit recommendations. The core problem we address is how a customer can receive clothing/accessory recommendations based on a current outfit and what type of item the customer wishes to add to the outfit. Using a repository of coherent and stylish outfits, we leverage self-attention to learn a mapping from the current outfit and the customer-requested category to a visual descriptor output. This output is then fed into nearest-neighbor-based visual search, which, during training, is learned via triplet loss and mini-batch retrievals. At inference time, we use a beam search with a desired outfit composition to generate outfits at scale. Moreover, the attention networks provide a diagnostic look into the recommendation process, serving as a fashion-based sanity check.
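The triplet loss and nearest-neighbor retrieval mentioned above can be illustrated with a minimal sketch. It assumes Euclidean distance and a margin of 0.2, neither of which is stated in the abstract, and the function names and the toy catalog are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: pull the positive closer than the negative.

    The margin value is an assumption made for illustration.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def nearest_neighbor(query, catalog):
    """Return the key of the catalog item whose descriptor is closest to the query."""
    return min(catalog, key=lambda name: euclidean(query, catalog[name]))


# Hypothetical two-dimensional descriptors for catalog items.
catalog = {"scarf": [1.0, 1.0], "belt": [0.1, 0.0]}
best_match = nearest_neighbor([0.0, 0.0], catalog)
```

The loss is zero once the matching item is closer to the predicted descriptor than the non-matching item by at least the margin, which is what makes nearest-neighbor search over the learned descriptors meaningful.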
ISBN: 9781665448994 (print)
As the demand for deep learning solutions increases, the need for explainability becomes even more fundamental. In this setting, particular attention has been given to visualization techniques that try to attribute the right relevance to each input pixel with respect to the output of the network. In this paper, we focus on Class Activation Mapping (CAM) approaches, which provide an effective visualization by taking weighted averages of the activation maps. To enhance the evaluation and the reproducibility of such approaches, we propose a novel set of metrics to quantify explanation maps, which show better effectiveness and simplify comparisons between approaches. To evaluate the appropriateness of the proposal, we compare different CAM-based visualization methods on the entire ImageNet validation set, fostering proper comparisons and reproducibility.
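The weighted combination of activation maps that CAM approaches rely on can be sketched as follows. This is an illustrative toy on nested lists, not the paper's implementation: the ReLU and the rescaling to [0, 1] are common practice in CAM variants and are assumptions here, as are the function name and the example values.

```python
def class_activation_map(activations, weights):
    """Combine K activation maps (each H x W) using K per-channel weights."""
    height, width = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for feature_map, weight in zip(activations, weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * feature_map[i][j]
    # Assumed post-processing: keep positive evidence only, then rescale to [0, 1].
    cam = [[max(0.0, value) for value in row] for row in cam]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Two hypothetical 2x2 activation maps and their class weights.
maps = [[[1, 0], [0, 0]], [[0, 2], [0, 0]]]
heatmap = class_activation_map(maps, [1.0, 0.5])
```

CAM variants differ mainly in how the per-channel weights are obtained; the combination step itself stays the same.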
ISBN: 9781665487399 (digital)
ISBN: 9781665487399 (print)
Despite the rapid progress in deep visual recognition, modern computer vision datasets significantly overrepresent the developed world, and models trained on such datasets underperform on images from unseen geographies. We investigate how effectively unsupervised domain adaptation (UDA) of such models across geographies closes this performance gap. To do so, we first curate two shifts from existing datasets to study the Geographical DA problem, and discover new challenges beyond data distribution shift: context shift, wherein object surroundings may change significantly across geographies, and subpopulation shift, wherein the intra-category distributions may shift. We demonstrate the inefficacy of standard DA methods at Geographical DA, highlighting the need for specialized geographical adaptation solutions to address the challenge of making object recognition work for everyone.
ISBN: 9781665487399 (digital)
ISBN: 9781665487399 (print)
Recent work such as StyleCLIP aims to harness the power of CLIP embeddings for controlled manipulations. Although these models are capable of manipulating images based on a text prompt, the success of the manipulation often depends on careful selection of the appropriate text for the desired manipulation. This limitation makes it particularly difficult to perform text-based manipulations in domains where the user lacks expertise, such as fashion. To address this problem, we propose a method for automatically determining the most successful and relevant text-based edits using a pre-trained StyleGAN model. Our approach consists of a novel mechanism that uses CLIP to guide beam-search decoding, and a ranking method that identifies the most relevant and successful edits based on a list of keywords. We also demonstrate the capabilities of our framework in several domains, including fashion.
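As a rough illustration of score-guided beam-search decoding, the sketch below keeps the highest-scoring partial token sequences at each step. The scoring function is a hypothetical stand-in: in the described approach the guidance comes from CLIP, whereas here a toy keyword-weight table is used so the example stays self-contained.

```python
def beam_search(vocabulary, score, beam_width=2, length=2):
    """Generic beam search: extend each kept sequence by one token per step
    and retain the beam_width best sequences according to score."""
    beams = [((), 0.0)]
    for _ in range(length):
        candidates = []
        for sequence, _ in beams:
            for token in vocabulary:
                extended = sequence + (token,)
                candidates.append((extended, score(extended)))
        # Stable sort: ties keep their generation order.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


# Hypothetical stand-in for a learned relevance score (not CLIP).
TOY_WEIGHTS = {"red": 1.0, "dress": 2.0, "long": 0.5}


def toy_score(sequence):
    """Sum the weights of distinct tokens, so repeated tokens add nothing."""
    return sum(TOY_WEIGHTS[token] for token in set(sequence))


results = beam_search(["red", "dress", "long"], toy_score)
```

A wider beam explores more candidate text prompts at a higher scoring cost; a width of 1 reduces to greedy decoding.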