ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Imperceptible poisoning attacks on entire datasets have recently been touted as methods for protecting data privacy. However, among a number of defenses preventing the practical use of these techniques, early-stopping stands out as a simple, yet effective defense. To gauge poisons' vulnerability to early-stopping, we benchmark error-minimizing, error-maximizing, and synthetic poisons in terms of peak test accuracy over 100 epochs and make a number of surprising observations. First, we find that poisons that reach a low training loss faster have lower peak test accuracy. Second, we find that a current state-of-the-art error-maximizing poison is 7x less effective when poison training is stopped at epoch 8. Third, we find that stronger, more transferable adversarial attacks do not make stronger poisons. We advocate for evaluating poisons in terms of peak test accuracy.
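As a rough illustration of the proposed metric, the sketch below contrasts peak test accuracy with final-epoch test accuracy. The helper name `peak_and_final` and the accuracy curve are hypothetical and made up for illustration only; they are not results from the paper.

```python
def peak_and_final(test_acc_per_epoch):
    """Return (peak_epoch, peak_accuracy, final_accuracy) for a test-accuracy curve.

    Epochs are reported 1-indexed. The peak is what an early-stopping
    defender could reach; the final value is what is usually reported.
    """
    if not test_acc_per_epoch:
        raise ValueError("need at least one epoch")
    best_idx = max(range(len(test_acc_per_epoch)),
                   key=test_acc_per_epoch.__getitem__)
    return best_idx + 1, test_acc_per_epoch[best_idx], test_acc_per_epoch[-1]


# Made-up curve for a model trained on poisoned data (NOT data from the paper):
# accuracy rises early, then collapses as the model fits the poison.
curve = [0.30, 0.52, 0.61, 0.48, 0.25, 0.12]
peak_epoch, peak_acc, final_acc = peak_and_final(curve)
print(peak_epoch, peak_acc, final_acc)
```

Under this view, a poison that looks strong at the final epoch can still be weak if a defender who stops at the peak epoch recovers most of the accuracy.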
Weakly supervised object localization (WSOL) aims at predicting object locations in an image using only image-level category labels. Image classification models face three common challenges when localizing objects: (a) they tend to focus on the most discriminative features in an image, which confines the localization map to a very small region; (b) the localization maps are class-agnostic, so the models highlight objects of multiple classes in the same image; and (c) localization performance is affected by background noise. To alleviate these challenges, we introduce the following simple changes through our proposed method, ViTOL. We leverage the vision-based transformer for self-attention and introduce a patch-based attention dropout layer (p-ADL) to increase the coverage of the localization map and a gradient attention rollout mechanism to generate class-dependent attention maps. We conduct extensive quantitative, qualitative and ablation experiments on the ImageNet-1K and CUB datasets. We achieve state-of-the-art MaxBoxAcc-V2 localization scores of 70.47% and 73.17% on the two datasets respectively.
Video action recognition has been an active area of research for the past several years. However, the majority of research is concentrated on recognizing a diverse range of activities in distinct environments. Driver Activity Recognition (DAR), on the other hand, is significantly more difficult since there is a much finer distinction between various actions. Moreover, training robust DAR models requires diverse training data from multiple sources, which might not be feasible in a centralized setup due to privacy and security concerns. Furthermore, it is critical to develop efficient models because of the limited computational resources available on vehicular edge devices. Federated Learning (FL), which allows data parties to collaborate on machine learning models while preserving data privacy and reducing communication requirements, can be used to overcome these challenges. Despite significant progress on various computer vision tasks, FL for DAR has been largely unexplored. In this work, we propose an FL-based DAR model and extensively benchmark the model performance on two datasets under various practical setups. Our results indicate that the proposed approach performs competitively under the centralized (non-FL) and decentralized (FL) settings.
Seam carving is a popular technique for content-aware image retargeting. It can be used to deliberately manipulate images, for example, to change the GPS location of a building or to displace or remove roads in a satellite image. This paper proposes a novel approach for detecting and localizing seams in such images. While there are methods to detect seam-carving-based manipulations, this is the first time that robust localization and detection of seam carving forgery is made possible. We also propose a seam localization score (SLS) metric to evaluate the effectiveness of localization. The proposed method is evaluated extensively on a large collection of images from different sources, demonstrating a high level of detection and localization performance across these datasets. The code and datasets curated during this work will be released to the public.
Multiple datasets and open challenges for object detection have been introduced in recent years. To build more general and powerful object detection systems, in this paper, we construct a new large-scale benchmark termed BigDetection. Our goal is to simply leverage the training data from existing datasets (LVIS, OpenImages and Object365) with carefully designed principles, and curate a larger dataset for improved detector pre-training. Specifically, we generate a new taxonomy which unifies the heterogeneous label spaces from different sources. Our BigDetection dataset has 600 object categories and contains over 3.4M training images with 36M bounding boxes. It is much larger in multiple dimensions than previous benchmarks, which offers both opportunities and challenges. Extensive experiments demonstrate its validity as a new benchmark for evaluating different object detection methods and its effectiveness as a pre-training dataset. The code and models are available at https://***/amazonresearch/bigdetection.
This paper reports our approach for the 2022 AI City Challenge - Naturalistic Driving Action Recognition (Track 3), where the objective is to detect when and what kinds of actions a driver performs in a long, untrimmed video. Our solution is built upon the single-stage ActionFormer detector, in which temporal location and classification are predicted simultaneously for efficiency. The input features for the detector are extracted offline using our proposed backbone, which we name "ConvNeXt-Video". However, due to the small size of the dataset, training the model without over-fitting is challenging. To address this problem, we focus on training techniques that can improve the generalization of underlying features. Specifically, we utilize two methods: "learning without forgetting" and semi-weak supervised learning on the unlabeled data A2. Finally, we also add a second-stage classifier (SSC) using our ConvNeXt-Video backbone. The SSC is designed to combine information from multiple clips and multi-view cameras to improve prediction precision. Our best result achieves 29.1 F1 score on the public test set. Our source code is released at link.
Existing continual learning techniques focus on either the task incremental learning (TIL) or the class incremental learning (CIL) problem, but not both. CIL and TIL differ mainly in that the task-id is provided for each test sample during testing in TIL, but not in CIL. Continual learning methods intended for one problem have limitations on the other. This paper proposes a novel unified approach based on out-of-distribution (OOD) detection and task masking, called CLOM, to solve both problems. The key novelty is that each task is trained as an OOD detection model rather than a traditional supervised learning model, and a task mask is trained to protect each task to prevent forgetting. Our evaluation shows that CLOM outperforms existing state-of-the-art baselines by large margins. The average TIL/CIL accuracy of CLOM over six experiments is 87.6/67.9% while that of the best baselines is only 84.4/55.0%.
Recent methods for Dataset Distillation are able to take in a large set of images of a specific class (e.g., from ImageNet) and synthesize a single image, such that a classifier trained on that image could perform similarly to one trained on the original dataset. It was noticed that the resulting "distilled images" are often quite visually pleasing. In this paper, we describe a simple method for generating tileable distilled textures by sampling random crops from a toroidal canvas of synthetic pixels while enforcing that all such crops serve as effective distilled training data. Such distilled textures not only summarize a given image category in a visually interesting way, but also allow for generation of infinite texture patterns suitable for printing on fabric, clothing, etc. This paper might be just the first step in making the ImageNet dataset into a fashion statement.
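The toroidal sampling step can be sketched as below: crops wrap around the canvas edges, so every crop (including ones straddling a border) comes from the same periodic texture, which is what makes the result tileable. This is a minimal pure-Python sketch; the function names and the nested-list canvas representation are assumptions for illustration, not the authors' implementation, and the distillation loss itself is not shown.

```python
import random


def toroidal_crop(canvas, top, left, height, width):
    """Crop a height x width window starting at (top, left), wrapping at the edges.

    `canvas` is a list of rows (each a list of pixel values). Indices are taken
    modulo the canvas size, so the canvas behaves like the surface of a torus.
    """
    rows, cols = len(canvas), len(canvas[0])
    return [
        [canvas[(top + r) % rows][(left + c) % cols] for c in range(width)]
        for r in range(height)
    ]


def random_toroidal_crop(canvas, height, width, rng=random):
    """Sample a crop whose top-left corner is uniform over the whole canvas."""
    top = rng.randrange(len(canvas))
    left = rng.randrange(len(canvas[0]))
    return toroidal_crop(canvas, top, left, height, width)


# A 3 x 4 canvas whose pixel value encodes its position (row * 4 + col).
canvas = [[r * 4 + c for c in range(4)] for r in range(3)]

# A crop starting at the bottom-right pixel wraps to the top and left edges.
wrapped = toroidal_crop(canvas, top=2, left=3, height=2, width=2)
print(wrapped)  # [[11, 8], [3, 0]]
```

Because the top-left corner is drawn uniformly over the full canvas, no pixel is treated as a border pixel, which is the property a tileable texture needs.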
Object detection is a classical problem in computer vision, and the vast majority of approaches require large annotated datasets for training and evaluation purposes. The most popular representations are bounding boxes (BBs), usually defined as the minimal-area rectangle that encompasses the whole object region. However, the annotation process is somewhat subjective (particularly when occlusions are present), and its quality may degrade as annotators get tired. Comparing BBs is crucial for evaluation purposes, and the Intersection-over-Union (IoU) is the standard similarity metric. In this paper, we provide theoretical and experimental results indicating that the IoU can be strongly affected even by small annotation discrepancies in popular datasets used for object detection. As a consequence, the Average Precision (AP) value commonly used to evaluate object detectors is also influenced by annotation bias or noise, particularly for small objects and tighter IoU thresholds.
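The sensitivity of IoU to small discrepancies can be checked with a short sketch. It uses the standard IoU definition for axis-aligned boxes; the box sizes and the 2-pixel diagonal shift are illustrative assumptions, not values from the paper.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Overlap extent along each axis, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def shifted_iou(side, shift):
    """IoU between a side x side box and a copy shifted diagonally by `shift` pixels."""
    ref = (0.0, 0.0, float(side), float(side))
    moved = (float(shift), float(shift), float(side + shift), float(side + shift))
    return iou(ref, moved)


# The same 2-pixel annotation offset applied to a small and a large object.
small = shifted_iou(side=16, shift=2)    # 196 / 316
large = shifted_iou(side=256, shift=2)   # 64516 / 66556
print(round(small, 3), round(large, 3))  # 0.62 0.969
```

With these illustrative numbers, the 16-pixel box would already fail a 0.75 IoU threshold, while the 256-pixel box still passes a 0.95 threshold under the same offset.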
Machine Learning, and Artificial Intelligence approaches in general, have brought great advances to every field of Computer Science, increasing the accuracy of predictors in any known problem. Indeed, this evolution has enabled the construction of effective frameworks and solutions that can be used in investigative and forensic scenarios to detect fakes and, more generally, manipulations of multimedia content. On the other hand, can we trust these systems? Is research activity going in the right direction? Are we just taking the low-hanging fruit without taking into account many real-case-in-the-wild situations? The purpose of this paper is to raise an alert to the research community in the specific context of synthetic voice detection, where the data available for training is not large enough to give sufficient trust in the techniques available in the literature. To this end, an exploratory investigation of the most common voice spoofing dataset was carried out, and it proved surprisingly easy to build simple classifiers without any Deep Learning techniques. Simple considerations on bitrate were sufficient to achieve effective detection performance.