ISBN (print): 9798350353006
Text-to-image generative models are becoming increasingly popular and accessible to the general public. As these models see large-scale deployments, it is necessary to deeply investigate their safety and fairness so as not to disseminate and perpetuate any kind of biases. However, existing works focus on detecting closed sets of biases defined a priori, limiting the studies to well-known concepts. In this paper, we tackle the challenge of open-set bias detection in text-to-image generative models by presenting OpenBias, a new pipeline that identifies and quantifies the severity of biases agnostically, without access to any precompiled set. OpenBias has three stages. In the first phase, we leverage a Large Language Model (LLM) to propose biases given a set of captions. Secondly, the target generative model produces images using the same set of captions. Lastly, a Visual Question Answering (VQA) model recognizes the presence and extent of the previously proposed biases. We study the behavior of Stable Diffusion 1.5, 2, and XL, emphasizing new biases never investigated before. Via quantitative experiments, we demonstrate that OpenBias agrees with current closed-set bias detection methods and human judgement.
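The abstract's third stage quantifies "the presence and extent" of a proposed bias from VQA answers over generated images. A minimal sketch of one plausible severity measure is the normalized entropy deficit of the answer distribution: 0 when answers are spread uniformly over the candidate attributes, 1 when every image yields the same answer. The function name and the exact metric are assumptions for illustration, not the paper's implementation.

```python
import math
from collections import Counter

def bias_severity(vqa_answers, candidate_answers):
    """Score in [0, 1] for how skewed the VQA answers are.

    vqa_answers: one VQA answer per generated image (e.g. "male"/"female").
    candidate_answers: the full set of attribute values the question allows.
    0.0 = uniform over candidates (no detectable bias),
    1.0 = every image received the same answer (maximal skew).
    Hypothetical metric; the paper may weight or aggregate differently.
    """
    counts = Counter(vqa_answers)
    n = len(vqa_answers)
    probs = [counts.get(a, 0) / n for a in candidate_answers]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    max_entropy = math.log(len(candidate_answers))
    return 1.0 - entropy / max_entropy

# All images answered "male" -> maximal skew:
# bias_severity(["male"] * 10, ["male", "female"])  -> 1.0
```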
ISBN (print): 9781665487399
Rate-distortion optimization (RDO) is responsible for large gains in image and video compression. While RDO is a standard tool in traditional image and video coding, it is not yet widely used in novel end-to-end trained neural methods. The major reason is that the decoding function is trained once and does not have free parameters. In this paper, we present RDONet, a network containing state-of-the-art components, which is perceptually optimized and capable of rate-distortion optimization. With this network, we are able to outperform VVC Intra on MS-SSIM and two different perceptual LPIPS metrics. This paper is part of the CLIC challenge, where we participate under the team name RDONet FAU.
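Rate-distortion optimization as described here reduces to minimizing the Lagrangian cost J = D + λR over a set of candidate codings, where D is distortion, R is rate, and λ trades one off against the other. A minimal sketch of that selection step (function name and tuple layout are illustrative, not from the paper):

```python
def rdo_select(candidates, lam):
    """Return the index of the candidate minimising J = D + lambda * R.

    candidates: list of (distortion, rate) pairs for one coding decision.
    lam: Lagrange multiplier; larger values favour lower rate,
    smaller values favour lower distortion.
    """
    return min(range(len(candidates)),
               key=lambda i: candidates[i][0] + lam * candidates[i][1])

# Two options: low rate / high distortion vs. high rate / low distortion.
options = [(10.0, 1.0), (2.0, 8.0)]
rdo_select(options, 0.1)   # small lambda -> picks the low-distortion option
rdo_select(options, 10.0)  # large lambda -> picks the low-rate option
```

Sweeping λ traces out the operational rate-distortion curve; codecs pick λ to hit a target rate or quality.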
ISBN (print): 9798350353006
Deployment of Transformer models on edge devices is becoming increasingly challenging due to inference costs that scale quadratically with the number of tokens in the input sequence. Token pruning is an emerging solution to address this challenge due to its ease of deployment on various Transformer backbones. However, most token pruning methods require computationally expensive fine-tuning, which is undesirable in many edge deployment cases. In this work, we propose Zero-TPrune, the first zero-shot method that considers both the importance and similarity of tokens in performing token pruning. It leverages the attention graph of pre-trained Transformer models to produce an importance distribution for tokens via our proposed Weighted Page Rank (WPR) algorithm. This distribution further guides token partitioning for efficient similarity-based pruning. Due to the elimination of the fine-tuning overhead, Zero-TPrune can prune large models at negligible computational cost, switch between different pruning configurations at no computational cost, and perform hyperparameter tuning efficiently. We evaluate the performance of Zero-TPrune on vision tasks by applying it to various vision Transformer backbones and testing them on ImageNet. Without any fine-tuning, Zero-TPrune reduces the FLOPs cost of DeiT-S by 34.7% and improves its throughput by 45.3% with only 0.4% accuracy loss. Compared with state-of-the-art pruning methods that require fine-tuning, Zero-TPrune not only eliminates the need for fine-tuning after pruning but also does so with only 0.1% accuracy loss. Compared with state-of-the-art fine-tuning-free pruning methods, Zero-TPrune reduces accuracy loss by up to 49% with similar FLOPs budgets. Project webpage: https://***/zerotprune.
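The Weighted Page Rank idea above can be sketched as power iteration over an attention matrix: a token's score flows along its attention edges, so tokens that receive a lot of attention accumulate importance. This is a generic PageRank-style sketch under assumed conventions (rows as outgoing weights, standard damping), not the paper's WPR code.

```python
import numpy as np

def token_importance(attn, damping=0.85, iters=50):
    """PageRank-style importance over an attention matrix.

    attn: (n, n) non-negative matrix; attn[i, j] is how much token i
    attends to token j. Returns a probability vector over tokens in
    which heavily attended tokens score higher.
    """
    n = attn.shape[0]
    # Normalise rows so each token distributes exactly 1 unit of score.
    w = attn / attn.sum(axis=1, keepdims=True)
    score = np.full(n, 1.0 / n)
    for _ in range(iters):
        # Teleport term keeps the chain irreducible; score @ w pushes
        # each token's current score along its attention edges.
        score = (1 - damping) / n + damping * (score @ w)
    return score
```

In a zero-shot pruning setting, the lowest-scoring tokens would be candidates for removal or for similarity-based merging.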
ISBN (print): 9798350353006
Images captured under sub-optimal illumination conditions may contain both over- and under-exposures. Current approaches mainly focus on adjusting image brightness, which may exacerbate color tone distortion in under-exposed areas and fail to restore accurate colors in over-exposed regions. We observe that over- and under-exposed regions display opposite color tone distribution shifts, which may not be easily normalized in joint modeling, as they usually do not have "normal-exposed" regions/pixels as reference. In this paper, we propose a novel method to enhance images with both over- and under-exposures by learning to estimate and correct such color shifts. Specifically, we first derive the color feature maps of the brightened and darkened versions of the input image via a UNet-based network, followed by a pseudo-normal feature generator to produce pseudo-normal color feature maps. We then propose a novel COlor Shift Estimation (COSE) module to estimate the color shifts between the derived brightened (or darkened) color feature maps and the pseudo-normal color feature maps. The COSE module corrects the estimated color shifts of the over- and under-exposed regions separately. We further propose a novel COlor MOdulation (COMO) module to modulate the separately corrected colors in the over- and under-exposed regions to produce the enhanced image. Comprehensive experiments show that our method outperforms existing approaches. Project webpage: https://***/yiyulics/CSEC.
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Digital art restoration has benefited from inpainting models that correct degradation or missing sections of a painting. This work compares three current state-of-the-art models for inpainting of large missing regions. We provide a qualitative and quantitative comparison of the performance of CoModGANs, LaMa, and GLIDE in inpainting blurry and missing sections of images. We use Escher's incomplete painting Print Gallery as our test study, since it presents several of the challenges commonly present in restorative inpainting.
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
In our paper, we propose a novel strategy to learn distortion-invariant latent representations from painting pictures for the downstream task of visual attention modelling. In further detail, we design an unsupervised framework that jointly maximises the mutual information over different painting styles. To show the effectiveness of our approach, we first propose a lightweight scanpath baseline model and compare its performance to some state-of-the-art methods. Secondly, we train the encoder of our baseline model on large-scale painting images to study the efficiency of the proposed self-supervised strategy. The lightweight decoder proves effective in learning from the self-supervised pre-trained encoder, with better performance than the end-to-end fine-tuned supervised baseline on two painting datasets, including a proposed new visual attention modelling dataset.
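"Jointly maximising the mutual information over different painting styles" is commonly realised with a contrastive InfoNCE objective: embeddings of the same painting in two styles form positive pairs, and minimising the InfoNCE loss maximises a lower bound on their mutual information. The sketch below is a generic InfoNCE implementation under that assumption, not the paper's exact objective.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss between two batches of embeddings.

    z_a[i] and z_b[i] are assumed to embed the same painting rendered in
    two different styles; all other rows act as negatives. Lower loss
    means matched pairs are more similar than mismatched ones.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature            # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # matched pairs on the diagonal
```

Training the encoder to drive this loss down across style-augmented views is what makes the learned representation distortion-invariant.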
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Neural Architecture Search (NAS) can automatically design model architectures with better performance. Existing methods either search for local, block-like architectures that are then stacked to construct the entire model, or search the entire model based on a manually designed benchmark module. No method directly searches the architecture of the global (entire) model at the operation level. The purpose of this article is to search the entire model directly in an operation-level search space. We analyze the search space of past methods that search for local architectures, and then propose a working mode for global model architecture search named CAM. CAM decouples the architectural parameters of the entire model, which allows the entire model architecture search to be completed with few architecture parameters. In experiments, the proposed method obtains a test error of 2.68% on CIFAR-10 at the global architecture level, which is comparable to state-of-the-art local architecture search methods.
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
A false negative in object detection describes an object that was not correctly localised and classified by a detector. In prior work, we introduced five 'false negative mechanisms' that identify the specific component inside the detector architecture that failed to detect the object. Using these mechanisms, we explore how different computer vision datasets and their inherent characteristics can influence object detector failures. Specifically, we investigate the false negative mechanisms of Faster R-CNN and RetinaNet across five computer vision datasets, namely Microsoft COCO, Pascal VOC, ExDark, ObjectNet, and COD10K. Our results show that object size and class influence the false negative mechanisms of object detectors. We also show that comparing the false negative mechanisms of a single object class across different datasets can highlight potentially unknown biases in datasets.
ISBN (print): 9798350353006
Vision-and-language navigation (VLN) enables an agent to navigate to a remote location by following natural language instructions in 3D environments. At each navigation step, the agent selects from possible candidate locations and then makes the move. For better navigation planning, the lookahead exploration strategy aims to effectively evaluate the agent's next action by accurately anticipating the future environment of candidate locations. To this end, some existing works predict RGB images of future environments, but this strategy suffers from image distortion and high computational cost. To address these issues, we propose the pre-trained hierarchical neural radiance representation model (HNR) to produce multi-level semantic features for future environments, which are more robust and efficient than pixel-wise RGB reconstruction. Furthermore, with the predicted future environmental representations, our lookahead VLN model is able to construct the navigable future path tree and select the optimal path via efficient parallel evaluation. Extensive experiments on the VLN-CE datasets confirm the effectiveness of our method. The code is available at https://***/MrZihan/HNR-VLN
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
We present an efficient method for the reconstruction of multispectral information from RGB images, as part of the NTIRE 2022 Spectral Reconstruction Challenge. Given an input image, our method determines a global RGB-to-spectral linear transformation matrix, based on a search through optimal matrices from training images that share low-level features with the input. The resulting spectral signatures are then adjusted by a global scaling factor, determined through a lightweight SqueezeNet-inspired neural network. By combining the efficiency of linear transformation matrices with the data-driven effectiveness of convolutional neural networks, we are able to achieve performance superior to the winners of the previous editions of the challenge.
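The core of the first stage is a global 3 × B linear map from RGB to B spectral bands, fit per training image by least squares. A minimal sketch of fitting such a matrix on synthetic data (function name and data shapes are illustrative; the real pipeline searches among matrices fitted on training images with similar low-level features and then applies a learned global scaling):

```python
import numpy as np

def fit_rgb_to_spectral(rgb, spectra):
    """Least-squares fit of a global matrix M with spectra ~= rgb @ M.

    rgb: (N, 3) array of RGB pixel values.
    spectra: (N, B) array of corresponding B-band spectral signatures.
    Returns M of shape (3, B), the optimal linear transform in the
    least-squares sense.
    """
    M, *_ = np.linalg.lstsq(rgb, spectra, rcond=None)
    return M
```

At inference, reconstructing a pixel's spectrum is then a single matrix product, which is what makes the method efficient compared with a full per-pixel network.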