ISBN: 9798350390797; 9789532901351 (print)
A bolus covers the patient's skin surface in cancer care to achieve the desired dose distribution and minimize damage to healthy tissue. The existing bolus shaping method is mainly a manual process, which is inaccurate and inefficient. This paper proposes a model retrieval method based on feature skeletons of the model and the model image. Mesh nodes in a bolus model are embedded into a feature space by spectral analysis. Skeletons are formed from features of the model to build a skeleton base. Visual entropies are applied to detect edges in the model image. The edge pixels are then classified into object and background pixels using a spectral clustering method to obtain the contours of the object. The skeleton of the image is compared with the skeletons in the model skeleton base using an iterative closest point method to find the best-matched bolus model. The proposed method is verified in case studies.
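The matching step described above can be illustrated with a minimal 2D sketch of iterative closest point (ICP) retrieval. This is an assumption-laden illustration, not the authors' implementation: skeletons are treated as plain lists of 2D points, nearest neighbours are found by brute force, and the helper names (`fit_rigid_2d`, `icp_2d`, `best_match`) are hypothetical.

```python
import math

def nearest(p, points):
    """Return the point in `points` closest to p (brute-force search)."""
    return min(points, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired src -> dst."""
    n = len(src)
    csx, csy = sum(x for x, _ in src) / n, sum(y for _, y in src) / n
    cdx, cdy = sum(x for x, _ in dst) / n, sum(y for _, y in dst) / n
    s_dot = s_cross = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        ax, ay = sx - csx, sy - csy
        bx, by = dx - cdx, dy - cdy
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated source centroid onto the target centroid.
    return theta, cdx - (c * csx - s * csy), cdy - (s * csx + c * csy)

def icp_2d(src, dst, iters=20):
    """Align src to dst; return the aligned points and the mean squared residual."""
    cur = list(src)
    for _ in range(iters):
        matched = [nearest(p, dst) for p in cur]
        theta, tx, ty = fit_rigid_2d(cur, matched)
        c, s = math.cos(theta), math.sin(theta)
        cur = [(c * x - s * y + tx, s * x + c * y + ty) for x, y in cur]
    err = sum(
        min((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2 for q in dst) for p in cur
    ) / len(cur)
    return cur, err

def best_match(query, base):
    """Return the key of the skeleton in `base` with the lowest ICP residual."""
    return min(base, key=lambda name: icp_2d(query, base[name])[1])
```

Here `base` stands in for the skeleton base as a dictionary from a model name to its skeleton points; the query skeleton is matched to whichever entry leaves the smallest residual after alignment.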
Image dehazing is a meaningful low-level computer vision task and can be applied to a variety of contexts. In our industrial deployment scenario based on remote sensing (RS) images, the quality of image dehazing direc...
ISBN: 9798350344868; 9798350344851 (print)
Synthetic aperture radar (SAR) images are inherently affected by speckle noise. Deep learning-based methods have shown good potential in image denoising tasks. Most deep learning methods for denoising focus on additive Gaussian noise removal. However, SAR images are usually contaminated by non-Gaussian multiplicative speckle noise. In this paper, we propose a novel deep unrolling network named SAR-DURNet to deal with the SAR image despeckling problem. We formulate speckle noise removal as an optimization problem using a prior on the noise distribution, which can be solved iteratively by the half-quadratic splitting (HQS) method. We unroll the iterative process into a trainable deep unrolling network (SAR-DURNet). The parameters of SAR-DURNet are trained end-to-end on a simulated SAR image dataset. Experimental results on simulated test data and real SAR data show that the proposed approach achieves superior results in terms of quantitative performance metrics and the preservation of intricate visual details, compared to several well-known SAR image despeckling methods.
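For orientation, the generic form of half-quadratic splitting is sketched below. The symbols are assumptions chosen for illustration, since the abstract does not give the paper's exact objective: $y$ is the observed image, $f(x; y)$ a data-fidelity term derived from the noise model, $R$ a prior (regularizer), $\lambda$ its weight, $z$ an auxiliary variable, and $\mu$ a penalty parameter.

```latex
\begin{aligned}
&\min_{x}\; f(x; y) + \lambda R(x)
\;\;\longrightarrow\;\;
\min_{x,\,z}\; f(x; y) + \lambda R(z) + \frac{\mu}{2}\,\lVert x - z \rVert_2^2, \\[4pt]
&x_{k+1} = \arg\min_{x}\; f(x; y) + \frac{\mu}{2}\,\lVert x - z_k \rVert_2^2, \\
&z_{k+1} = \arg\min_{z}\; \lambda R(z) + \frac{\mu}{2}\,\lVert x_{k+1} - z \rVert_2^2 .
\end{aligned}
```

In deep unrolling, a fixed number of these alternating updates is typically laid out as network stages, with quantities such as $\mu$ or the $z$-update learned from data; how SAR-DURNet parameterizes each stage is not stated in the abstract.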
ISBN: 9798350344868; 9798350344851 (print)
In the field of image editing, Null-text Inversion (NTI) enables fine-grained editing while preserving the structure of the original image by optimizing null embeddings during the DDIM sampling process. However, the NTI process is time-consuming, taking more than two minutes per image. To address this, we introduce a method that maintains the principles of NTI while accelerating the image editing process. We propose the WaveOpt-Estimator, which determines the text optimization endpoint based on frequency characteristics. By using wavelet transform analysis to identify the image's frequency characteristics, we can limit text optimization to specific timesteps during the DDIM sampling process. Following the Negative-Prompt Inversion (NPI) concept, a target prompt representing the original image serves as the initial text value for optimization. This approach maintains performance comparable to NTI while reducing the average editing time by over 80%. Our method presents a promising approach for efficient, high-quality image editing based on diffusion models.
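As a rough illustration of how a wavelet transform exposes frequency characteristics, the sketch below applies one level of a 1D Haar transform and reports the share of signal energy in the high-frequency (detail) band. It is only an assumed stand-in: the abstract does not specify which wavelet the WaveOpt-Estimator uses or how the measurement maps to a timestep, and the function names are hypothetical.

```python
import math

def haar_step(signal):
    """One level of the orthonormal 1D Haar transform: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    r = 1.0 / math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) * r for a, b in pairs]   # low-frequency band
    detail = [(a - b) * r for a, b in pairs]   # high-frequency band
    return approx, detail

def high_freq_ratio(signal):
    """Fraction of the signal's energy that falls in the Haar detail band."""
    _, detail = haar_step(signal)
    total = sum(v * v for v in signal)
    if total == 0:
        return 0.0
    return sum(d * d for d in detail) / total
```

A flat signal has no detail energy, while a rapidly alternating one puts almost all of its energy in the detail band; a 2D image version would apply the same step along rows and columns.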
The necessity of secure image transmission and storage has become more urgent in the digital era. In a variety of applications, including medical imaging, military communications, and personal data protection, image e...
Intra block copy with local illumination compensation (IBC-LIC) is a coding technique utilized in video coding to compensate for illumination variation between the current block and its prediction block within the pic...
ISBN: 9781728198354 (print)
The task of image outpainting extends an image beyond its boundaries with semantically plausible content. Recently, Scene Graph Transformer (SGT) introduced a transformer architecture to leverage scene graph guidance for image outpainting. Despite its success, we identified two shortcomings: (a) SGT uses a positional encoding that was originally proposed for 1D signals; (b) SGT uses a scene graph attention layer that propagates information only between neighboring nodes, which limits the model to learning local graph features. To address these issues, we propose incorporating Laplacian positional encoding and introducing multiscale scene graph attention into SGT. Extensive results on MS-COCO and Visual Genome show that our proposed approach generates more plausible, higher-quality outpainted images.
ISBN: 9798350344868; 9798350344851 (print)
The excellent text-to-image synthesis capability of diffusion models has driven progress in synthesizing coherent visual stories. The current state-of-the-art method combines the features of historical captions, historical frames, and the current caption as conditions for generating the current frame. However, this method treats every historical frame and caption as contributing equally: it concatenates them in order with equal weights, ignoring that not all historical conditions are relevant to the generation of the current frame. To address this issue, we propose Causal-Story. This model incorporates a local causal attention mechanism that considers the causal relationship between previous captions, previous frames, and the current caption. By assigning weights based on this relationship, Causal-Story generates the current frame, thereby improving the global consistency of story generation. We evaluated our model on the PororoSV and FlintstonesSV datasets and obtained state-of-the-art FID scores; the generated frames also demonstrate better visual storytelling.
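The contrast between equal weighting and relevance-based weighting can be sketched with a windowed causal softmax. This is a generic illustration under stated assumptions, not the Causal-Story architecture: the relevance scores, the `window` parameter, and the function name are hypothetical, since the abstract does not define the mechanism.

```python
import math

def local_causal_weights(scores, t, window):
    """Softmax weights for step t over positions t-window..t; zero elsewhere.

    `scores[j]` is an assumed relevance score of condition j to the current
    frame. Positions after t (the future) and before t-window get weight 0.
    """
    lo = max(0, t - window)
    allowed = range(lo, t + 1)
    m = max(scores[j] for j in allowed)            # subtract max for stability
    exps = {j: math.exp(scores[j] - m) for j in allowed}
    z = sum(exps.values())
    return [exps[j] / z if j in exps else 0.0 for j in range(len(scores))]
```

With equal weighting, every historical condition would receive the same share; here a condition with a higher score receives a larger weight, and conditions outside the causal window contribute nothing.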
ISBN: 9798350302615 (print)
Data quality is critical for multimedia tasks, yet recent work has found various types of systematic flaws in image benchmark datasets. In particular, the semantic gap problem leads to a many-to-many mapping between the information extracted from an image and its linguistic description. This unavoidable bias in turn leads to poor performance on current computer vision tasks. To address this issue, we introduce a Knowledge Representation (KR)-based methodology that provides guidelines for the labeling process, thereby indirectly introducing the intended semantics into ML models. Specifically, an iterative refinement-based annotation method is proposed to optimize data labeling by organizing objects in a classification hierarchy according to their visual properties, ensuring that they are aligned with their linguistic descriptions. Preliminary results verify the effectiveness of the proposed method.
As video conferencing becomes an indispensable part of daily life, how to achieve a high-fidelity calling experience under low bandwidth has been a popular and challenging issue. Deep generative models hav...