While several content management systems (CMS) and audience analytics tools are available for digital signage in the market, they are often sold separately and can be expensive. Therefore, this project aims to design ...
详细信息
In the task of infrared weak and small target recognition, in order to improve the image quality and solve the problem of poor learning ability of convolutional neural network (CNN) due to the imbalance of positive an...
详细信息
The design of deep network architectures and training methods in computer vision has been well-explored. However, in almost all cases the images have been used as provided, with little exploration of pre-processing.st...
详细信息
ISBN:
(数字)9798350353006
ISBN:
(纸本)9798350353013
The design of deep network architectures and training methods in computer vision has been well-explored. However, in almost all cases the images have been used as provided, with little exploration of pre-processing.steps beyond normalization and data augmentation. Virtually all images posted on the web or captured by devices are processed for viewing by humans. Is the pipeline used for humans also best for use by computers and deep networks? The human visual system uses logarithmic sensors; differences and sums correspond to ratios and products. Features in log space will be invariant to intensity changes and robust to color balance changes. Log RGB space also reveals structure that is corrupted by typical pre-processing. We explore using linear and log RGB data for training standard backbone architectures on an image classification task using data derived directly from RAW images to guarantee its integrity. We found that networks trained on log RGB data exhibit improved performance on an unmodified test set and invariance to intensity and color balance modifications without additional training or data augmentation. Furthermore, we found that the gains from using high quality log data could also be partially or fully realized from data in 8-bit sRGB-JPG format by inverting the sRGB transform and taking the log. These results imply existing databases may benefit from this type of pre-processing. While working with log data, we found it was critical to retain the integrity of the log relationships and that networks using log data train best with meta-parameters different than those used for sRGB or linear data. Finally, we introduce a new 10-category 10k RAW image data set (RAW10) for image classification and other purposes to enable further the exploration of log RGB as an input format for deep networks in computer vision.
The image Signal Processor (ISP) is a customized device to restore RGB images from the pixel signals of CMOS image sensor. In order to realize this function, a series of processing.units are leveraged to tackle differ...
详细信息
ISBN:
(纸本)9781665448994
The image Signal Processor (ISP) is a customized device to restore RGB images from the pixel signals of CMOS image sensor. In order to realize this function, a series of processing.units are leveraged to tackle different artifacts, such as color shifts, signal noise, moire effects, and so on, that are introduced from the photo-capturing devices. However, tuning each processing.unit is highly complicated and requires a lot of experience and effort from image experts. In this paper, a novel network architecture, CSANet, with emphases on inference speed and high PSNR is proposed for end-to-end learned ISP task. The proposed CSANet applies a double attention module employing both channel and spatial attentions. Particularly, its spatial attention is simplified to a light-weighted dilated depth-wise convolution and still performs as well as others. As proof of performance, CSANet won 2nd place in the Mobile AI 2021 Learned Smartphone ISP Challenge with 1st place PSNR score.
Background objects obscured in some sub-apertures of light-field cameras can be seen by other sub-apertures. Consequently, occluded surfaces are possible to be reconstructed from LF images. So far, Current foreground ...
详细信息
We present a new framework for semantic segmentation without annotations via clustering. Off-the-shelf clustering methods are limited to curated, single-label, and object-centric images yet real-world data are dominan...
详细信息
ISBN:
(纸本)9781665445092
We present a new framework for semantic segmentation without annotations via clustering. Off-the-shelf clustering methods are limited to curated, single-label, and object-centric images yet real-world data are dominantly uncurated, multi-label, and scene-centric. We extend clustering from images to pixels and assign separate cluster membership to different instances within each image. However, solely relying on pixel-wise feature similarity fails to learn high-level semantic concepts and overfits to low-level visual cues. We propose a method to incorporate geometric consistency as an inductive bias to learn invariance and equivariance for photometric and geometric variations. With our novel learning objective, our framework can learn high-level semantic concepts. Our method, PiCIE (Pixel-level feature Clustering using Invariance and Equivariance), is the first method capable of segmenting both things and stuff categories without any hyperparameter tuning or task-specific pre-processing. Our method largely outperforms existing baselines on COCO [31] and Cityscapes [8] with +17.5 Acc. and +4.5 mIoU. We show that PiCIE gives a better initialization for standard supervised training.
This paper addresses the challenge of capturing performance for the clothed humans from sparse-view or monocular videos. Previous methods capture the performance of full humans with a personalized template or recover ...
详细信息
This study aims to develop a system for extracting crucial information from tire sidewalls using Optical Character recognition (OCR). Initially, images of tire were captured manually by smartphone cameras, including R...
详细信息
The Sign Language Translation and Voice Impairment Support System (SLT-VISS) represents a groundbreaking application of deep learning methodology aimed at facilitating communication for individuals with hearing impair...
详细信息
Multi-camera tracking (MCT) plays a crucial role in various computer vision applications. However, accurate tracking of individuals across multiple cameras faces challenges, particularly with identity switches. In thi...
详细信息
ISBN:
(数字)9798350365474
ISBN:
(纸本)9798350365481
Multi-camera tracking (MCT) plays a crucial role in various computer vision applications. However, accurate tracking of individuals across multiple cameras faces challenges, particularly with identity switches. In this paper, we present an efficient online MCT system that tackles these challenges through online processing. Our system leverages memory-efficient accumulated appearance features to provide stable representations of individuals across cameras and time. By incorporating trajectory validation using hierarchical agglomerative clustering (HAC) in overlapping regions, ID transfers are identified and rectified. Evaluation on the 2024 AI City Challenge Track 1 dataset [39] demonstrates the competitive performance of our system, achieving accurate tracking in both overlapping and non-overlapping camera networks. With a 40.3% HOTA score [29], our system ranked 9th in the challenge. The integration of trajectory validation enhances performance by 8% over the baseline, and the accumulated appearance features further contribute to a 17% improvement.
暂无评论