ISBN (print): 9781665448994
This work presents AFRIFASHION1600, an openly accessible contemporary African fashion image dataset containing 1600 samples labelled into 8 classes representing some African fashion styles. Each sample is a colour image of size 128 x 128. This niche dataset aims to improve the visibility, inclusion, and familiarity of African fashion in computer vision. The AFRIFASHION1600 dataset is available here.
We present M3ED, the first multi-sensor event camera dataset focused on high-speed dynamic motions in robotics applications. M3ED provides high-quality synchronized and labeled data from multiple platforms, including ...
Recent Anomaly Detection techniques have progressed the field considerably but at the cost of increasingly complex training pipelines. Such techniques require large amounts of training data, resulting in computational...
This paper introduces our solution for Track 2 in AI City Challenge 2023. The task is tracked-vehicle retrieval by natural language descriptions with a real-world dataset of various scenarios and cameras. Our solution...
MP4 video files are stored using a tree data structure. These trees contain rich information that can be used for forensic analysis. In this paper, we propose MP4 Tree Network (MTN), an approach based on an end-to-end...
Multi-camera person tracking has gained significant attention in recent times, owing to its widespread application in surveillance scenarios. However, this task is challenging due to the varying viewpoints, heavy occ...
While recent vision-language (VL) models excel at open-vocabulary tasks, it is unclear how to use them with specific or uncommon concepts. Personalized Text-to-Image Retrieval (TIR) or Generation (TIG) are recently in...
ISBN (print): 9798350301298
The proceedings contain 2356 papers. The topics discussed include: exploring discontinuity for video frame interpolation; two-view geometry scoring without correspondences; language-guided audio-visual source separation via trimodal consistency; handwritten text generation from visual archetypes; Bayesian posterior approximation with stochastic ensembles; ERM-KTP: knowledge-level machine unlearning via knowledge transfer; PlenVDB: memory efficient VDB-based radiance fields for fast training and rendering; learning and aggregating lane graphs for urban automated driving; teaching matters: investigating the role of supervision in vision transformers; NeuralField-LDM: scene generation with hierarchical latent diffusion models; cut and learn for unsupervised object detection and instance segmentation; probabilistic debiasing of scene graphs; and unifying layout generation with a decoupled diffusion model.
ISBN (print): 9781665448994
Self-attention is a cornerstone of transformer models. However, our analysis shows that self-attention in vision transformer inference is extremely sparse. When applying a sparsity constraint, our experiments on image (ImageNet-1K) and video (Kinetics-400) understanding show that we can achieve 95% sparsity on the self-attention maps while keeping the performance drop below 2 points. This motivates us to rethink the role of self-attention in vision transformer models.
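To make the 95% figure concrete, here is a minimal sketch of what a sparsified self-attention map can look like. The abstract does not say how the sparsity constraint is applied, so the per-row top-k rule, the renormalisation step, the random inputs, and the helper name `sparsify_attention` are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparsify_attention(attn, sparsity=0.95):
    """Hypothetical helper: keep the largest (1 - sparsity) fraction of
    weights in each row, zero the rest, and renormalise each row."""
    n = attn.shape[-1]
    keep = max(1, int(round(n * (1.0 - sparsity))))
    # Indices of the `keep` largest entries in every row.
    idx = np.argpartition(attn, n - keep, axis=-1)[..., n - keep:]
    mask = np.zeros_like(attn, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    sparse = np.where(mask, attn, 0.0)
    return sparse / sparse.sum(axis=-1, keepdims=True)

# Toy single-head attention over 200 tokens (random data, assumed sizes).
rng = np.random.default_rng(0)
tokens, dim = 200, 64
q = rng.standard_normal((tokens, dim))
k = rng.standard_normal((tokens, dim))
attn = softmax(q @ k.T / np.sqrt(dim))

sparse_attn = sparsify_attention(attn, sparsity=0.95)
zero_fraction = float((sparse_attn == 0).mean())
print(f"fraction of zeroed attention weights: {zero_fraction:.2f}")
```

With 200 tokens, each row keeps 10 weights and zeroes the other 190. Whether accuracy survives this kind of pruning depends on the trained model and is not something this toy example can show.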
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Advances in machine learning and computer vision have led to significant improvements in automated facial recognition. Many real-world forensic settings, however, are confronted with challenging low-quality and low-resolution images that often confound even state-of-the-art facial recognition. We investigate if and when advances in neural-based image enhancement and restoration can be used to restore degraded images while preserving facial identity for use in forensic facial recognition.