ISBN (print): 9798350344868; 9798350344851
In the field of image editing, Null-text Inversion (NTI) enables fine-grained editing while preserving the structure of the original image by optimizing null embeddings during the DDIM sampling process. However, the NTI process is time-consuming, taking more than two minutes per image. To address this, we introduce a method that maintains the principles of NTI while accelerating the image editing process. We propose the WaveOpt-Estimator, which determines the text optimization endpoint based on frequency characteristics. By using wavelet transform analysis to identify the image's frequency characteristics, we can limit text optimization to specific timesteps during the DDIM sampling process. Adopting the Negative-Prompt Inversion (NPI) concept, a target prompt representing the original image serves as the initial text value for optimization. This approach achieves performance comparable to NTI while reducing the average editing time by over 80%. Our method presents a promising approach for efficient, high-quality image editing based on diffusion models.
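The abstract does not say how the WaveOpt-Estimator turns wavelet coefficients into an optimization endpoint, so the following is only a minimal sketch of the kind of frequency measurement such an estimator could start from. It applies one level of a 2D Haar wavelet transform with NumPy and reports the share of image energy in the high-frequency sub-bands. The function names `haar_dwt2` and `high_frequency_ratio` are hypothetical and are not taken from the paper.

```python
import numpy as np


def haar_dwt2(image):
    """One level of an orthonormal 2D Haar transform; returns (LL, LH, HL, HH)."""
    x = np.asarray(image, dtype=np.float64)
    # Crop to even height and width so that pixels pair up cleanly.
    x = x[: x.shape[0] // 2 * 2, : x.shape[1] // 2 * 2]
    # Horizontal pass: average and difference of adjacent columns.
    lo = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2)
    hi = (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2)
    # Vertical pass: average and difference of adjacent rows.
    ll = (lo[0::2] + lo[1::2]) / np.sqrt(2)
    lh = (lo[0::2] - lo[1::2]) / np.sqrt(2)
    hl = (hi[0::2] + hi[1::2]) / np.sqrt(2)
    hh = (hi[0::2] - hi[1::2]) / np.sqrt(2)
    return ll, lh, hl, hh


def high_frequency_ratio(image):
    """Fraction of total energy held by the three detail sub-bands (assumed metric)."""
    ll, lh, hl, hh = haar_dwt2(image)
    high = sum(float(np.sum(band ** 2)) for band in (lh, hl, hh))
    total = high + float(np.sum(ll ** 2))
    return high / total if total > 0 else 0.0
```

A smooth image yields a ratio near 0 and a heavily textured one a ratio closer to 1. How such a value would map to a DDIM timestep is left open here because the abstract gives no rule for it.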
The necessity of secure image transmission and storage has become more urgent in the digital era. In a variety of applications, including medical imaging, military communications, and personal data protection, image e...
Intra block copy with local illumination compensation (IBC-LIC) is a coding technique utilized in video coding to compensate for illumination variation between the current block and its prediction block within the pic...
ISBN (print): 9781728198354
The task of image outpainting extends an image beyond its boundaries with semantically plausible content. Recently, Scene Graph Transformer (SGT) introduced a transformer architecture that leverages scene graph guidance for image outpainting. Despite its success, we identified two shortcomings: (a) SGT uses a positional encoding that was originally proposed for 1D signals; (b) SGT uses a scene graph attention layer that propagates information only between neighboring nodes, which limits the model to learning local graph features. To address these issues, we propose incorporating Laplacian positional encoding and introducing a multiscale scene graph attention into SGT. Extensive results on MS-COCO and Visual Genome show that our proposed approach generates more plausible outpainted images with higher quality.
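For readers unfamiliar with Laplacian positional encoding, the sketch below shows the formulation commonly used in graph transformers: each node is assigned the entries of the eigenvectors of the symmetric normalized graph Laplacian that belong to the smallest non-trivial eigenvalues. The abstract does not state which variant the authors use, so the normalization, the choice of `k`, and the function name `laplacian_positional_encoding` are assumptions.

```python
import numpy as np


def laplacian_positional_encoding(adjacency, k):
    """Return an (n, k) array of per-node encodings from the normalized Laplacian."""
    a = np.asarray(adjacency, dtype=np.float64)
    n = a.shape[0]
    degree = a.sum(axis=1)
    # D^{-1/2}, leaving isolated nodes (degree 0) at zero to avoid division by zero.
    inv_sqrt = np.zeros(n)
    connected = degree > 0
    inv_sqrt[connected] = degree[connected] ** -0.5
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}.
    laplacian = np.eye(n) - inv_sqrt[:, None] * a * inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
    _, eigvecs = np.linalg.eigh(laplacian)
    # Skip the first (trivial) eigenvector and keep the next k.
    return eigvecs[:, 1 : k + 1]
```

Eigenvectors are only defined up to sign, so implementations of this idea often flip signs at random during training. Whether the paper does so is not stated in the abstract.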
ISBN (print): 9798350302615
Data quality is critical for multimedia tasks, while various types of systematic flaws are found in image benchmark datasets, as discussed in recent work. In particular, the existence of the semantic gap problem leads to a many-to-many mapping between the information extracted from an image and its linguistic description. This unavoidable bias further leads to poor performance on current computer vision tasks. To address this issue, we introduce a Knowledge Representation (KR)-based methodology to provide guidelines driving the labeling process, thereby indirectly introducing intended semantics in ML models. Specifically, an iterative refinement-based annotation method is proposed to optimize data labeling by organizing objects in a classification hierarchy according to their visual properties, ensuring that they are aligned with their linguistic descriptions. Preliminary results verify the effectiveness of the proposed method.
ISBN (print): 9798350344868; 9798350344851
The excellent text-to-image synthesis capability of diffusion models has driven progress in synthesizing coherent visual stories. The current state-of-the-art method combines the features of historical captions, historical frames, and the current caption as conditions for generating the current frame. However, this method treats every historical frame and caption as contributing equally: it connects them in order with equal weights, ignoring that not all historical conditions are relevant to the generation of the current frame. To address this issue, we propose Causal-Story. This model incorporates a local causal attention mechanism that considers the causal relationship between previous captions, previous frames, and the current caption. By assigning weights based on this relationship, Causal-Story generates the current frame, thereby improving the global consistency of story generation. We evaluated our model on the PororoSV and FlintstonesSV datasets and obtained state-of-the-art FID scores, and the generated frames also show better visual storytelling.
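The abstract does not define its local causal attention mechanism, so the sketch below illustrates only the generic idea the name suggests: scaled dot-product attention in which each position may attend to itself and to a limited number of earlier positions, never to later ones. The `window` parameter and the function name `local_causal_attention` are assumptions made for illustration and do not describe the Causal-Story architecture itself.

```python
import numpy as np


def local_causal_attention(q, k, v, window):
    """Attention where position i sees positions i-window+1 .. i (window >= 1)."""
    q, k, v = (np.asarray(t, dtype=np.float64) for t in (q, k, v))
    steps = q.shape[0]
    scores = q @ k.T / np.sqrt(q.shape[-1])
    idx = np.arange(steps)
    offset = idx[:, None] - idx[None, :]  # i - j; negative means j is in the future
    allowed = (offset >= 0) & (offset < window)
    # Disallowed positions get -inf so that they receive zero weight after softmax.
    scores = np.where(allowed, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights
```

With learned queries and keys, the resulting weights differ per history item, in contrast to the equal weighting that the abstract criticizes.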
As video conferencing becomes an indispensable part of people's daily lives, how to achieve a high-fidelity calling experience under low bandwidth has been a popular and challenging issue. Deep generative models hav...
Texture transfer is a crucial approach in image processing that enables smooth transfer of textures between images while preserving visual qualities like color fidelity and texture details. This feature is essential ...
Versatile Video Coding (VVC) has adopted a quadtree with nested multi-type tree (QTMT) partition structure to improve the rate-distortion (RD) performance, but this greatly increases complexity due to the brute-for...
ISBN (print): 9798350327038
Based on the human visual system, image processing algorithms and efficient hardware implementation methodologies are proposed to optimize the image quality of AR displays according to changes in ambient light. To this end, methods are described to improve the image quality perceived by humans. In addition, a delta look-up table is presented to minimize the number of additional circuits without significant changes to existing hardware. HOSA, an image quality assessment metric based on the human visual system, is used to verify the image quality under extreme ambient light conditions.