Video Action Recognition (VAR) is a challenging task due to its inherent complexities. Though different approaches have been explored in the literature, designing a unified framework to recognize a large number of hum...
Image denoising is a fundamental task in computer vision and image processing, crucial for improving the visual quality and interpretability of images captured in noisy environments. In this research, we propose a qua...
ISBN: (Print) 9798400716256
This paper introduces a novel technique of computational art with mandala—an iconic heritage of Indian folk art. Its novelty lies in several fundamental steps. The first one is fixing the asymmetries and the imperfections in a hand-drawn piece of art based on the notion of a primitive map. The primitive map is described using a novel concept of geometric salience—a set of well-defined salient points on the frontier polygon of a primitive—characterizing the concavities and convexities present in the primitive. The primitive map is also used for the vectorization of a mandala and its succinct representation as a mandala sector graph (MSG), which eventually results in efficient graph operations on an existing artwork to create a new piece of art. The use of frontier polygons in different steps of the algorithm makes it robust and efficient. Experimental results on various datasets demonstrate the potential and versatility of the proposed technique.
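As a minimal illustration of the convexity/concavity idea underlying geometric salience, the sketch below labels each vertex of an ordered, counter-clockwise frontier polygon as convex or concave using the sign of a cross product. The function names and the simple test itself are illustrative assumptions, not the paper's exact primitive-map construction.

```python
# Hypothetical sketch: classify vertices of a counter-clockwise frontier
# polygon as convex or concave from the turn direction at each vertex.
from typing import List, Tuple

Point = Tuple[float, float]

def turn(prev_pt: Point, cur: Point, nxt: Point) -> float:
    """z-component of the cross product of the incoming and outgoing edges."""
    ax, ay = cur[0] - prev_pt[0], cur[1] - prev_pt[1]
    bx, by = nxt[0] - cur[0], nxt[1] - cur[1]
    return ax * by - ay * bx

def label_vertices(polygon: List[Point]) -> List[Tuple[int, str]]:
    """Label each vertex of a counter-clockwise polygon as convex or concave;
    collinear vertices are skipped."""
    labels = []
    n = len(polygon)
    for i in range(n):
        z = turn(polygon[i - 1], polygon[i], polygon[(i + 1) % n])
        if z > 0:
            labels.append((i, "convex"))
        elif z < 0:
            labels.append((i, "concave"))
    return labels

if __name__ == "__main__":
    # An arrow-like polygon with a single concavity at vertex 3.
    poly = [(0, 0), (4, 0), (4, 3), (2, 1), (0, 3)]
    print(label_vertices(poly))  # vertex 3 -> 'concave', the rest 'convex'
```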
Satellite image super resolution is an important task that generates high resolution satellite images from low resolution inputs. Multi-frame super resolution utilizes multiple low-resolution images to generate a sing...
To address the slow detection speed and low detection accuracy of existing fatigue driving detection algorithms, a fatigue driving detection algorithm based on YOLOv5 is proposed. In order to improve the fe...
ISBN: (Print) 9798400716256
Knowledge Distillation is a transfer learning and compression technique that aims to transfer hidden knowledge from a teacher model to a student model. However, this transfer often leads to poor calibration in the student model. This can be problematic for high-risk applications that require well-calibrated models to capture prediction uncertainty. To address this issue, we propose a simple and novel technique that enhances the calibration of the student network by using an ensemble of well-calibrated teacher models. We train multiple teacher models using various data-augmentation techniques such as cutout, mixup, CutMix, and AugMix and use their ensemble for knowledge distillation. We evaluate our approach on different teacher-student combinations using CIFAR-10 and CIFAR-100 datasets. Our results demonstrate that our technique improves calibration metrics (such as expected calibration and overconfidence errors) while also increasing the accuracy of the student network.
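A minimal sketch of the distillation step described above, assuming the augmented teacher models have already been trained: the teachers' temperature-softened probabilities are averaged into an ensemble target and combined with a hard-label cross-entropy term. The loss weights, temperature, and function names are illustrative, not the paper's exact settings.

```python
# Hypothetical sketch of distilling from an ensemble of teachers.
# Teacher training with cutout/mixup/CutMix/AugMix is assumed to have
# happened already; only the per-batch loss computation is shown.
import torch
import torch.nn.functional as F

def ensemble_distillation_loss(student_logits, teacher_logits_list, targets,
                               temperature=4.0, alpha=0.5):
    """alpha weighs the soft ensemble term against the hard-label CE term."""
    with torch.no_grad():
        # Average the teachers' softened probabilities into one target.
        soft_targets = torch.stack(
            [F.softmax(t / temperature, dim=1) for t in teacher_logits_list]
        ).mean(dim=0)
    soft_student = F.log_softmax(student_logits / temperature, dim=1)
    kd = F.kl_div(soft_student, soft_targets,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, targets)
    return alpha * kd + (1.0 - alpha) * ce
```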
ISBN: (Print) 9798400716256
We focus on domain and class generalization problems in analyzing optical remote sensing images, using the large-scale pre-trained vision-language model (VLM), CLIP. While contrastively trained VLMs show impressive zero-shot generalization performance, their effectiveness is limited when dealing with diverse domains during training and testing. Existing prompt learning techniques overlook the importance of incorporating domain and content information into the prompts, which results in a drop in performance while dealing with such multi-domain data. To address these challenges, we propose a solution that ensures domain-invariant prompt learning while enhancing the expressiveness of visual features. We observe that CLIP’s vision encoder struggles to identify contextual image information, particularly when image patches are jumbled up. This issue is especially severe in optical remote sensing images, where land-cover classes exhibit well-defined contextual appearances. To this end, we introduce C-SAW, a method that complements CLIP with a self-supervised loss in the visual space and a novel prompt learning technique that emphasizes both visual domain and content-specific features. We keep the CLIP backbone frozen and introduce a small set of projectors for both the CLIP encoders to train C-SAW contrastively. Experimental results demonstrate the superiority of C-SAW across multiple remote sensing benchmarks and different generalization tasks.
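A minimal sketch of the frozen-backbone idea, assuming CLIP image and text encoders that emit fixed-dimensional embeddings: small trainable projectors are placed on top of the frozen encoders and optimised with a symmetric contrastive loss. C-SAW's prompt learner and its self-supervised objective over jumbled image patches are not reproduced here; all module names and dimensions are assumptions.

```python
# Hypothetical sketch: trainable projectors over frozen CLIP encoders,
# trained with a symmetric InfoNCE loss over matched image/text pairs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Projector(nn.Module):
    def __init__(self, dim_in=512, dim_out=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_in, dim_out),
                                 nn.ReLU(inplace=True),
                                 nn.Linear(dim_out, dim_out))

    def forward(self, x):
        # L2-normalise so the dot product below is a cosine similarity.
        return F.normalize(self.net(x), dim=-1)

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE: matched pairs sit on the diagonal of the logits."""
    logits = img_emb @ txt_emb.t() / temperature
    labels = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

# Usage (placeholders): freeze the CLIP backbone, optimise only projectors.
# for p in clip_model.parameters(): p.requires_grad_(False)
# img_proj, txt_proj = Projector(), Projector()
# loss = contrastive_loss(img_proj(clip_model.encode_image(images).float()),
#                         txt_proj(clip_model.encode_text(tokens).float()))
```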
Domain shifts are a common problem in computer vision. As a result, a classifier trained on a source domain cannot perform well on a target domain. Due to this, a source classifier trained to differentiate based on a ...
ISBN: (Print) 9798400716256
Recent developments in the field of Visual Question Answering (VQA) have witnessed promising improvements in performance through contributions in attention-based networks. Most such approaches focus on unidirectional attention, applying attention from the textual domain (the question) over the visual space. This work proposes a multistage co-attention framework in which attention is computed over both the image and the text. The co-attention mechanism is repeated across multiple stages, and different stages may capture significant and distinct features for learning better contextual information. The attention outputs are therefore aggregated to preserve the information from the different stages. Because the resulting multi-stage network could suffer from vanishing or exploding gradients, a loss is computed at each stage. Extensive experiments and analysis validate the effects of aggregated attention and stage-wise loss.
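A minimal sketch of a multistage co-attention stack with stage-wise supervision and aggregation of the per-stage outputs, under assumed feature dimensions and a placeholder answer classifier; it illustrates the structure rather than the paper's exact architecture.

```python
# Hypothetical sketch: repeated co-attention stages, a loss at every stage,
# and a simple aggregation of the per-stage predictions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoAttentionStage(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.img_from_txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.txt_from_img = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, img, txt):
        # Image features attend to the question, and vice versa.
        img_att, _ = self.img_from_txt(img, txt, txt)
        txt_att, _ = self.txt_from_img(txt, img, img)
        return img + img_att, txt + txt_att

class MultiStageVQA(nn.Module):
    def __init__(self, num_stages=3, dim=512, num_answers=1000):
        super().__init__()
        self.stages = nn.ModuleList(CoAttentionStage(dim)
                                    for _ in range(num_stages))
        self.classifier = nn.Linear(2 * dim, num_answers)

    def forward(self, img, txt, answers=None):
        logits_per_stage, total_loss = [], 0.0
        for stage in self.stages:
            img, txt = stage(img, txt)
            fused = torch.cat([img.mean(dim=1), txt.mean(dim=1)], dim=-1)
            logits = self.classifier(fused)
            logits_per_stage.append(logits)
            if answers is not None:
                # Stage-wise loss to keep gradients flowing to every stage.
                total_loss = total_loss + F.cross_entropy(logits, answers)
        # One simple way to aggregate information from the different stages.
        return torch.stack(logits_per_stage).mean(dim=0), total_loss
```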
ISBN: (Print) 9798400716256
A neuromorphic camera is an image sensor that emulates the human eye by capturing only changes in local brightness levels. Such sensors are widely known as event cameras, silicon retinas, or dynamic vision sensors (DVS). A DVS records asynchronous per-pixel brightness changes, resulting in a stream of events that encode the time, location, and polarity of each change. It consumes little power and captures a wider dynamic range, with no motion blur and higher temporal resolution than conventional frame-based cameras. Although event capture already yields a lower bit rate than conventional video capture, the resulting event streams remain highly compressible. Hence, we introduce a novel deep learning-based compression methodology tailored for event data. The proposed technique employs a deep belief network (DBN) to condense the high-dimensional event data into a latent representation, which is subsequently encoded utilising an entropy-based coding method. Notably, our proposed scheme represents one of the initial endeavours to integrate deep learning methodologies for event compression. It achieves a high compression ratio while maintaining good reconstruction quality, outperforming state-of-the-art event data coders and other lossless benchmark techniques.
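A minimal sketch of one restricted Boltzmann machine layer, the building block of a deep belief network, trained with a single contrastive-divergence step; stacking such layers would yield the latent representation mentioned above. The event preprocessing and the entropy coder are omitted, and all names and hyperparameters here are assumptions.

```python
# Hypothetical sketch: a single RBM layer with a CD-1 update, operating on
# binarised event-count vectors. A DBN would train several such layers
# greedily, feeding each layer's hidden activations to the next.
import torch

class RBM:
    def __init__(self, n_visible, n_hidden, lr=1e-3):
        self.W = torch.randn(n_visible, n_hidden) * 0.01
        self.b_v = torch.zeros(n_visible)
        self.b_h = torch.zeros(n_hidden)
        self.lr = lr

    def sample_h(self, v):
        p = torch.sigmoid(v @ self.W + self.b_h)
        return p, torch.bernoulli(p)

    def sample_v(self, h):
        p = torch.sigmoid(h @ self.W.t() + self.b_v)
        return p, torch.bernoulli(p)

    def cd1_step(self, v0):
        """One contrastive-divergence update on a batch of binary vectors."""
        ph0, h0 = self.sample_h(v0)
        pv1, v1 = self.sample_v(h0)
        ph1, _ = self.sample_h(v1)
        batch = v0.size(0)
        self.W += self.lr * (v0.t() @ ph0 - v1.t() @ ph1) / batch
        self.b_v += self.lr * (v0 - v1).mean(dim=0)
        self.b_h += self.lr * (ph0 - ph1).mean(dim=0)
        # The hidden activations serve as the latent code for the next layer
        # (and, after the final layer, as input to the entropy coder).
        return ph0
```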