ISBN (digital): 9781665487399
ISBN (print): 9781665487399
We propose SymDNN, a Deep Neural Network (DNN) inference scheme that segments an input image into small patches, replaces those patches with representative symbols, and uses the reconstructed image for CNN inference. This deconstruction of images, followed by reconstruction from cluster centroids trained on clean images, enhances robustness against adversarial attacks. The input transform used in SymDNN is learned from very large datasets, making it difficult to approximate for adaptive adversarial attacks. For example, SymDNN achieves 23% and 42% robust accuracy at L-infinity attack strengths of 8/255 and 4/255 respectively, against BPDA under a complete white-box setting, where most input-processing-based defenses break completely. SymDNN is not a future-proof adversarial defense that can defend against any attack, but it is one of the few readily usable defenses in resource-limited embedded systems that defends against a wide range of attacks.
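The patch-and-symbol substitution described above can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the function name `symbolize` is hypothetical, and the centroid codebook here would be random for illustration, whereas SymDNN learns its symbols by clustering patches from very large clean datasets.

```python
import numpy as np

def symbolize(image, centroids, patch=4):
    """Replace each non-overlapping patch of `image` with its nearest
    centroid (a learned "symbol"), then reassemble the image.
    `image` is (H, W); `centroids` is (K, patch*patch)."""
    H, W = image.shape
    out = np.empty((H, W), dtype=float)
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            p = image[i:i + patch, j:j + patch].reshape(-1)
            # nearest centroid by squared Euclidean distance
            k = np.argmin(((centroids - p) ** 2).sum(axis=1))
            out[i:i + patch, j:j + patch] = centroids[k].reshape(patch, patch)
    return out
```

The reconstructed image, composed only of known symbols, is then fed to the unmodified CNN.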
Video action recognition has been an active area of research for the past several years. However, the majority of research is concentrated on recognizing a diverse range of activities in distinct environments. On the other hand, Driver Activity Recognition (DAR) is significantly more difficult since there is a much finer distinction between various actions. Moreover, training robust DAR models requires diverse training data from multiple sources, which might not be feasible for a centralized setup due to privacy and security concerns. Furthermore, it is critical to develop efficient models due to the limited computational resources available on vehicular edge devices. Federated Learning (FL), which allows data parties to collaborate on machine learning models while preserving data privacy and reducing communication requirements, can be used to overcome these challenges. Despite significant progress on various computer vision tasks, FL for DAR has been largely unexplored. In this work, we propose an FL-based DAR model and extensively benchmark the model performance on two datasets under various practical setups. Our results indicate that the proposed approach performs competitively under the centralized (non-FL) and decentralized (FL) settings.
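The federated aggregation at the heart of such a setup can be sketched with the standard FedAvg rule: each client trains locally on its private driving data, and the server averages parameters weighted by client dataset size. This is the generic FedAvg step, not necessarily the exact aggregation the paper uses; the function name `fedavg` is illustrative.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of client model parameters (FedAvg).
    `client_weights`: list of dicts {param_name: np.ndarray};
    `client_sizes`: number of local training samples per client."""
    total = sum(client_sizes)
    avg = {}
    for name in client_weights[0]:
        avg[name] = sum(
            (n / total) * w[name]
            for w, n in zip(client_weights, client_sizes)
        )
    return avg
```

Only parameters travel to the server; the raw in-cabin video never leaves the vehicle, which is what makes FL attractive for DAR.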
Quantization-Aware Training (QAT) has recently shown a lot of potential for low-bit settings in the context of image classification. Approaches based on QAT typically use the cross-entropy loss, the reference loss function in this domain. We investigate quantization-aware training with disentangled loss functions. We qualify a loss as disentangled if it encourages the network output space to be easily discriminated with linear functions. We introduce a new method, Disentangled Loss Quantization Aware Training, as our tool to empirically demonstrate that the quantization procedure benefits from such loss functions. Results show that the proposed method substantially reduces the loss in top-1 accuracy for low-bit quantization on CIFAR10, CIFAR100, and ImageNet. Our best result brings the top-1 accuracy of a ResNet-18 from 63.1% to 64.0% with binary weights and 2-bit activations when trained on ImageNet.
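For context, the forward pass of QAT usually relies on "fake quantization": weights are rounded to a low-bit grid but kept in floating point, and the straight-through estimator passes gradients unchanged. The sketch below shows a generic uniform symmetric fake-quantizer (for bits >= 2); it is a common QAT building block, not the paper's specific method, and the name `fake_quantize` is illustrative.

```python
import numpy as np

def fake_quantize(w, bits):
    """Uniform symmetric fake-quantization used in QAT forward passes.
    Weights are snapped to 2**bits signed levels but returned as floats;
    during training, the straight-through estimator treats this op as
    identity for the backward pass."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 1 for 2-bit signed
    scale = np.abs(w).max() / qmax if qmax > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale
```

With binary weights and 2-bit activations, as in the ImageNet result above, both weights and activations pass through quantizers of this kind during training.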
With the recent rapid development of convolutional neural networks, numerous lightweight CNN-based image super-resolution methods have been proposed for practical deployment on edge devices. However, most existing methods focus on one specific aspect, network or loss design, which makes it difficult to minimize the model size. To address this issue, we combine block design, architecture search, and loss design to obtain a more efficient SR structure. In this paper, we propose an edge-enhanced feature distillation network, named EFDN, to preserve high-frequency information under constrained resources. In detail, we build an edge-enhanced convolution block based on existing reparameterization methods. Meanwhile, we propose an edge-enhanced gradient loss to calibrate the reparameterized path training. Experimental results show that our edge-enhanced strategies preserve edges and significantly improve the final restoration quality. Code is available at https://***/icandle/EFDN.
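The reparameterization idea referenced above trains with several parallel branches and collapses them into a single convolution for inference. A minimal sketch of the principle, assuming one 3x3 branch plus one 1x1 branch (EFDN's actual branches differ; `merge_branches` and the naive `conv2d` are illustrative helpers):

```python
import numpy as np

def conv2d(x, k):
    """Naive single-channel 'valid' 2-D cross-correlation."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + kh, j:j + kw] * k).sum()
    return out

def merge_branches(k3, k1):
    """Fold a parallel 1x1 branch into a 3x3 kernel by adding its weight
    at the kernel centre, so a single conv suffices at inference."""
    merged = k3.copy()
    merged[1, 1] += k1[0, 0]
    return merged
```

Because convolution is linear, the merged kernel reproduces the two-branch output exactly, so the deployed model pays no cost for the richer training-time structure.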
Unsupervised continual learning aims to learn new tasks incrementally without requiring human annotations. However, most existing methods, especially those targeting image classification, only work in a simplified scenario by assuming all new data belong to new tasks, which is not realistic if class labels are not provided. Therefore, to perform unsupervised continual learning in real-life applications, an out-of-distribution detector is required at the beginning to identify whether each new sample corresponds to a new task or to already learned tasks, a problem that remains under-explored. In this work, we formulate the problem of Out-of-distribution Detection in Unsupervised Continual Learning (OOD-UCL) with a corresponding evaluation protocol. In addition, we propose a novel OOD detection method that first corrects the output bias and then enhances the output confidence for in-distribution data based on task discriminativeness; it can be applied directly without modifying the learning procedures and objectives of continual learning. Our method is evaluated on the CIFAR-100 dataset following the proposed evaluation protocol, and we show improved performance compared with existing OOD detection methods under the unsupervised continual learning scenario.
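As background, output-based OOD detectors of this family score each sample by its post-softmax confidence, optionally after a per-class logit correction. The sketch below is the generic maximum-softmax-probability baseline with a hypothetical `class_bias` correction term standing in for the paper's output-bias correction; the exact correction and confidence enhancement are defined in the paper.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def ood_score(logits, class_bias=None):
    """Maximum-softmax-probability OOD score: a low maximum probability
    suggests the input is out-of-distribution. `class_bias` optionally
    subtracts a per-class logit offset before scoring (a generic
    stand-in for an output-bias correction)."""
    if class_bias is not None:
        logits = logits - class_bias
    return softmax(logits).max(axis=-1)
```

Samples scoring below a threshold would be routed to a new task; the rest are treated as belonging to already learned tasks.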
On-device ML accelerators are becoming a standard in modern mobile system-on-chips (SoC). Neural architecture search (NAS) comes to the rescue for efficiently utilizing the high compute throughput offered by these accelerators. However, existing NAS frameworks have several practical limitations in scaling to multiple tasks and different target platforms. In this work, we provide a two-pronged approach to this challenge: (i) a NAS-enabling infrastructure that decouples model cost evaluation, search space design, and the NAS algorithm to rapidly target various on-device ML tasks, and (ii) search spaces crafted from group convolution based inverted bottleneck (IBN) variants that provide flexible quality/performance trade-offs on ML accelerators, complementing the existing full and depthwise convolution based IBNs. Using this approach we target a state-of-the-art mobile platform, Google Tensor SoC, and demonstrate neural architectures that improve the quality-performance Pareto frontier for various computer vision (classification, detection, segmentation) as well as natural language processing tasks.
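The trade-off between the three IBN variants can be seen directly in their parameter counts: the spatial convolution inside an IBN spans a spectrum from full convolution (groups = 1) through grouped convolution to depthwise (groups = expanded channels). A back-of-the-envelope sketch, assuming a hypothetical helper `ibn_params` and ignoring biases and batch-norm parameters:

```python
def ibn_params(c_in, c_out, expansion, groups, k=3):
    """Parameter count of an inverted-bottleneck (IBN) block whose
    spatial kxk convolution uses `groups` groups: groups == 1 is a full
    convolution, groups == expanded channels is depthwise."""
    c_mid = c_in * expansion
    expand = c_in * c_mid                          # 1x1 expansion conv
    spatial = (c_mid // groups) * c_mid * k * k    # grouped kxk conv
    project = c_mid * c_out                        # 1x1 projection conv
    return expand + spatial + project
```

Group convolution IBNs thus interpolate between the cheap-but-weak depthwise variant and the expressive-but-costly full variant, which is the flexibility the search spaces above exploit.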
Multi-Target Multi-Camera tracking is a fundamental task for intelligent traffic systems. Track 1 of the AI City Challenge 2022 targets city-scale multi-camera vehicle tracking. In this paper we propose an accurate vehicle tracking system composed of four parts: (1) state-of-the-art detection and re-identification models for vehicle detection and feature extraction; (2) single-camera tracking, where we introduce augmented track prediction and a multi-level association method on top of the tracking-by-detection paradigm; (3) a zone-based single-camera tracklet merging strategy; (4) a multi-camera spatial-temporal matching and clustering strategy. The proposed system achieves promising results and ranks second in Track 1 of the AI City Challenge 2022 with an IDF1 score of 0.8437.
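The core step of the tracking-by-detection paradigm mentioned in part (2) is associating predicted track boxes with new detections. A minimal greedy IoU matcher sketches the idea; the paper's multi-level association is more elaborate (it also uses re-identification features), and `associate` is an illustrative name.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def associate(tracks, detections, thresh=0.3):
    """Greedy IoU matching between predicted track boxes and new
    detections; unmatched detections typically spawn new tracks."""
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True)
    matches, used_t, used_d = [], set(), set()
    for s, ti, di in pairs:
        if s >= thresh and ti not in used_t and di not in used_d:
            matches.append((ti, di))
            used_t.add(ti)
            used_d.add(di)
    return matches
```

Cross-camera matching in part (4) then clusters these single-camera tracklets using appearance features and spatial-temporal constraints.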
In this paper, we address the problem of cross-modal retrieval in the presence of multi-view and multi-label data. For this, we present Multi-view Multi-label Canonical Correlation Analysis (or MVMLCCA), which is a generalization of CCA for multi-view data that also makes use of high-level semantic information available in the form of multi-label annotations in each view. While CCA relies on explicit pairings/associations of samples between two views (or modalities), MVMLCCA uses the available multi-label annotations to establish correspondence across multiple (two or more) views without the need for explicit pairing of multi-view samples. Extensive experiments on two multi-modal datasets demonstrate that the proposed approach offers much more flexibility than the related approaches without compromising on scalability and cross-modal retrieval performance. Our code and precomputed features are available at https://***/Rushil231100/MVMLCCA.
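For reference, the classical two-view CCA objective that MVMLCCA generalizes finds projection directions maximizing the correlation between paired views:

```latex
\max_{w_x,\, w_y} \;
\rho \;=\;
\frac{w_x^{\top} \Sigma_{xy}\, w_y}
     {\sqrt{w_x^{\top} \Sigma_{xx}\, w_x}\;\sqrt{w_y^{\top} \Sigma_{yy}\, w_y}}
```

where Sigma_xy is the cross-covariance estimated from explicitly paired samples. MVMLCCA replaces this explicit pairing with correspondences induced by the shared multi-label annotations across views.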
Ensuring traffic safety and preventing accidents is a critical goal in daily driving, where the advancement of computer vision technologies can be leveraged to achieve this goal. In this paper, we present a multi-view...
Identifying players in video is a foundational step in computer vision-based sports analytics. Obtaining player identities is essential for analyzing the game and is used in downstream tasks such as game event recognition. Transformers are the existing standard in natural language processing (NLP) and are swiftly gaining traction in computer vision. Motivated by the increasing success of transformers in computer vision, we introduce a transformer network for recognizing players through their jersey numbers in broadcast National Hockey League (NHL) videos. The transformer takes temporal sequences of player frames (called player tracklets) as input and outputs the probabilities of jersey numbers present in the frames. The proposed network performs better than the previous benchmark on the same dataset. We implement a weakly-supervised training approach by generating approximate frame-level labels for jersey number presence and use the frame-level labels for faster training. We also utilize player shifts available in the NHL play-by-play data by reading the game time using optical character recognition (OCR) to get the players on the ice rink at a certain game time. Using player shifts improved the player identification accuracy by 6%.
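Turning per-frame jersey-number probabilities into a single tracklet-level identity can be sketched by averaging the frame distributions and taking the argmax. This is a simple illustrative aggregation, not necessarily the paper's scheme, and `tracklet_jersey` is a hypothetical name.

```python
import numpy as np

def tracklet_jersey(frame_probs):
    """Aggregate per-frame jersey-number probabilities over a player
    tracklet by averaging, then pick the most likely class index.
    `frame_probs`: (T, num_classes) array whose rows sum to 1."""
    mean = np.asarray(frame_probs, dtype=float).mean(axis=0)
    return int(mean.argmax()), mean
```

Averaging over the tracklet smooths out frames where the number is occluded or motion-blurred, which is why tracklet-level inference beats single-frame classification.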