Near-infrared (NIR) imaging can acquire more details and textures with less noise in low-light environments compared to RGB. As a result, it has been widely used in low-light vision scenarios such as CCTV, autonomous ...
Most deep learning models for image dehazing target synthetic datasets of hazy images and therefore do not account for the features of natural hazy images. Leveraging depth attention with adaptation, we propose a novel dehazi...
ISBN: 9798350388978; 9798350388961 (print)
Magnetic Resonance Imaging (MRI) plays a significant role in medical diagnostics. However, prolonged scan times may hinder its widespread applicability in clinical settings. To mitigate this challenge, certain contrasts within multi-contrast MRI protocols can be excluded, and these target contrasts can then be synthesized retrospectively from the acquired set of source contrasts. Recently introduced generative adversarial and diffusion-based MRI synthesis models outperform classical methods, yet they can still benefit from technical improvements. In this study, we propose a Brownian diffusion-based multi-contrast MR image synthesis model. Existing diffusion models synthesize images starting from a Gaussian noise sample, so guidance from the source contrast images is weakened. Conditional denoising diffusion models employ weak conditioning within the denoising network during the reverse process, which may result in suboptimal sample generation due to poor convergence to the target distribution. Capitalizing on Brownian diffusion, the proposed model instead incorporates stronger guidance toward the target contrast distribution via a refined diffusion process. Experimental results suggest that our method attains higher performance than existing methods in noise reduction and in capturing structural details of tissue.
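The abstract does not give the diffusion equations, so the following is only a minimal sketch of a generic Brownian-bridge-style forward process, which may differ from the authors' formulation. It illustrates the idea of a diffusion whose endpoints are the target contrast and the source contrast rather than pure Gaussian noise. The function name `bridge_sample`, the linear schedule `m = t / num_steps`, and the `noise_scale` parameter are assumptions made for illustration.

```python
import numpy as np

def bridge_sample(target, source, t, num_steps, noise_scale=1.0, rng=None):
    """Sample an intermediate state of a Brownian-bridge-style process.

    Assumed (hypothetical) schedule: m = t / num_steps, with noise variance
    noise_scale * m * (1 - m), which vanishes at both endpoints.
    """
    rng = rng if rng is not None else np.random.default_rng()
    target = np.asarray(target, dtype=np.float64)
    source = np.asarray(source, dtype=np.float64)
    m = t / num_steps
    variance = noise_scale * m * (1.0 - m)
    noise = rng.standard_normal(target.shape)
    # Interpolate between the two images and add endpoint-pinned noise.
    return (1.0 - m) * target + m * source + np.sqrt(variance) * noise

# Toy usage with random arrays standing in for two MR contrasts.
rng = np.random.default_rng(0)
target_img = rng.random((8, 8))
source_img = rng.random((8, 8))
midpoint = bridge_sample(target_img, source_img, t=50, num_steps=100, rng=rng)
```

Because the noise variance is zero at `t = 0` and `t = num_steps`, the process starts exactly at the target image and ends exactly at the source image, so a learned reverse process would begin from the source contrast itself.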
ISBN: 9781665475921 (print)
In this paper, we propose a high-frequency guided CNN for video compression artifact reduction. In the proposed method, the high-frequency component of the Y channel is extracted and used to guide the quality enhancement of the Y, U, and V channels, because the high-frequency component contains the edge and contour information of the objects in the image, which is of vital importance to both subjective and objective quality. The proposed method consists of two modules: the high-frequency guidance module and the quality enhancement module. The high-frequency guidance module uses multiple octave convolutions to extract the high-frequency component of the Y channel and then fuses it into the features of the Y, U, and V channels. In the quality enhancement module, multiple CNN residual blocks are used to enhance the quality of the Y, U, and V channels. The proposed method was integrated into both HM-16.22 and VTM-16.0. The results on the JVET test sequences under the All Intra configuration show the effectiveness of the proposed method. Compared with HEVC, the proposed method achieves average BD-rate reductions of -12.3%, -22.7% and -23.5% for the Y, U and V channels, respectively. Compared with VVC, the corresponding average BD-rate reductions are -6.7%, -12.3% and -13.2%.
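To make the notion of a high-frequency component concrete, here is a minimal sketch that uses a much simpler technique than the paper's learned octave convolutions: it subtracts a box-blurred (low-pass) copy of the Y channel from the original. This is a swapped-in, hand-crafted filter for illustration only; the function name `high_frequency` and the kernel size `k` are assumptions.

```python
import numpy as np

def high_frequency(y_channel, k=3):
    """Return y minus a k x k box-blurred copy of y (k must be odd).

    This is a plain high-pass filter, NOT octave convolution.
    """
    y = np.asarray(y_channel, dtype=np.float64)
    pad = k // 2
    padded = np.pad(y, pad, mode="edge")  # replicate borders
    height, width = y.shape
    low = np.zeros_like(y)
    # Sum the k*k shifted views of the padded image, then average.
    for dy in range(k):
        for dx in range(k):
            low += padded[dy:dy + height, dx:dx + width]
    low /= k * k
    return y - low

# Toy usage: a vertical step edge produces a response only near the edge.
step = np.zeros((4, 6))
step[:, 3:] = 1.0
edges = high_frequency(step)
```

Flat regions map to zero while edges and contours survive, which is the kind of information the abstract says is used to guide enhancement of all three channels.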
Image fusion combines images from multiple domains into one image containing complementary information from the source domains. Existing methods take pixel intensity, texture and high-level vision task information as the...
The growing demand for trustworthy picture forgery detection techniques to maintain the integrity of visual content across a range of applications is addressed in this work. To improve picture authentication accuracy ...
ISBN: 9781665450928 (print)
Designing visual content and characters for games is a time-consuming task, even for experienced designers and illustrators. Most game companies and developers use procedural methods to automate the design process, but the visual content produced by these algorithms is limited in variation. In this paper, we propose using Generative Adversarial Networks (GANs) for visual content production. Two different RPG and DnD visual image datasets were collected from the internet for training, and 6 different GAN models were trained on them. In 3 of 18 experiments, transfer learning methods were used because of the limited datasets. The Fréchet Inception Distance metric was used to compare the model results. SNGAN was the most successful on both datasets. Moreover, the transfer learning method (WGAN-GP, BigGAN) was more successful than training from scratch.
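As a rough illustration of the comparison metric, the sketch below computes the Fréchet distance between two Gaussians fitted to feature vectors. The actual Fréchet Inception Distance applies this formula to Inception network activations of real and generated images; here the features are simply passed in as arrays, and the function name `frechet_distance` is an assumption.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (num_samples, feature_dim).
    d^2 = ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 (C_a C_b)^(1/2))
    """
    feats_a = np.asarray(feats_a, dtype=np.float64)
    feats_b = np.asarray(feats_b, dtype=np.float64)
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # The trace of the matrix square root of C_a C_b equals the sum of the
    # square roots of its eigenvalues, which are real and non-negative for
    # covariance matrices (up to floating-point error).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)

# Toy usage with random vectors standing in for network activations.
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(200, 4))
fake_feats = rng.normal(loc=0.5, size=(200, 4))
score = frechet_distance(real_feats, fake_feats)
```

Lower values indicate that the two feature distributions are closer, which is how the GAN models in the abstract are ranked.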
ISBN: 9798350379808; 9798350379792 (print)
One method to significantly enhance the performance of Person-ReID models is to leverage segmentation models to extract semantic parsing information, thus emphasizing the characteristics of human body parts. Studies employing this approach typically use CNN architectures as the Person-ReID backbone and integrate human semantic parsing information into the output of the backbone model, resulting in substantial performance improvement. However, Transformer architectures, which rely on self-attention mechanisms and process image data differently from CNN architectures, require a method that integrates human semantic parsing information directly into the model's input. In this study, we propose a novel, simple, and highly adaptable method for integrating human semantic parsing information into input data, called Dual Semantic Parsing Image (DSPI). In this method, a pre-trained segmentation model is employed to extract human pixels from the background and create a new foreground image. The DSPI is formed by horizontally merging the foreground image with the original image, and it serves as the input image for the Re-ID model. Our study demonstrates that the Person Re-ID backbone model ViT+DSPI achieves 93.1% Rank-1 accuracy and 91.1% mAP in the same-domain test of DukeMTMC-ReID, thus attaining state-of-the-art performance. Additionally, significant performance enhancements were observed in cross-domain tests.
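The DSPI construction described above can be sketched in a few lines, assuming the person mask has already been produced by some segmentation model. The function name `make_dspi`, the zero (black) background fill, and the left-to-right ordering of the two halves are assumptions; the abstract states only that the foreground image and the original image are merged horizontally.

```python
import numpy as np

def make_dspi(image, person_mask):
    """Build a side-by-side input from an image and a person mask.

    image: (H, W, 3) array; person_mask: (H, W) boolean array,
    True where a pixel belongs to the person.
    """
    image = np.asarray(image)
    mask = np.asarray(person_mask, dtype=bool)
    # Keep person pixels, set background pixels to zero (assumed fill).
    foreground = np.where(mask[..., None], image, 0).astype(image.dtype)
    # Merge horizontally: foreground on the left, original on the right.
    return np.concatenate([foreground, image], axis=1)

# Toy usage with a random image and a rectangular "person" mask.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(8, 4, 3), dtype=np.uint8)
mask = np.zeros((8, 4), dtype=bool)
mask[2:6, 1:3] = True
dspi = make_dspi(img, mask)  # shape (8, 8, 3)
```

The result has the same height as the original and twice its width, so a downstream model would need its input resolution adjusted accordingly.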
ISBN: 9781538674628 (print)
We study the problem of online learning of optimal offloading policies for image processing tasks, with the aim of minimizing a cost that is a weighted sum of transmit energy and object recognition error rate. A mobile node generates image processing tasks that involve object recognition. There are three options: (i) transmit the image to a remote server for processing with a deep-learning (DL) model, (ii) process locally with a simpler model, or (iii) apply a lightweight, error-prone technique for object detection and, if objects are detected, send the image to the server. The proper offloading decision requires knowledge of the transmit energy cost and object recognition error rate for each option. However, these processes are non-stationary due to unpredictable object occurrence, mobility and propagation dynamics, and they depend on the object inference result, which is unknown at decision time. We cast the problem as an adversarial multi-armed bandit, in which the EXP3 algorithm achieves sublinear regret. For the constrained problem, we propose an algorithm that extends EXP3 and achieves good regret in the objective and constraint, thus asymptotically learning the optimal static randomized offloading policy while satisfying the error constraint. Performance is validated via numerical experiments informed by real-life object recognition measurements and models.
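The unconstrained bandit formulation can be illustrated with a minimal sketch of the standard EXP3 algorithm, where the three arms stand for the three offloading options. This does not cover the authors' constrained extension. The callback `loss_fn`, the exploration rate `gamma`, and the toy costs in the usage example are assumptions for illustration; costs are assumed to be scaled to [0, 1].

```python
import math
import random

def exp3(num_arms, num_rounds, loss_fn, gamma=0.1, seed=0):
    """Run EXP3 and return how often each arm was chosen.

    loss_fn(t, arm) is a hypothetical callback returning the observed
    cost in [0, 1] of the chosen arm at round t (bandit feedback only).
    """
    rng = random.Random(seed)
    weights = [1.0] * num_arms
    pulls = [0] * num_arms
    for t in range(num_rounds):
        total = sum(weights)
        # Mix the weight distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / num_arms for w in weights]
        arm = rng.choices(range(num_arms), weights=probs)[0]
        reward = 1.0 - loss_fn(t, arm)
        # Importance-weighted estimate keeps the reward estimate unbiased.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)
        # Rescale so the weights never overflow; probabilities are unchanged.
        largest = max(weights)
        weights = [w / largest for w in weights]
        pulls[arm] += 1
    return pulls

# Toy usage: option 1 has the lowest cost and should be chosen most often.
counts = exp3(3, 2000, lambda t, arm: 0.2 if arm == 1 else 0.8)
```

Only the cost of the selected option is observed in each round, which matches the setting in the abstract where the outcome of the options not taken remains unknown.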
ISBN: 9798350388961 (digital)
ISBN: 9798350388978; 9798350388961 (print)
This study explores the use of the Pyramid Scene Parsing Network (PSPNet) architecture for accurate segmentation of brain tumors in magnetic resonance (MR) images. Experimental evaluations were conducted on different pre-trained backbone network models, including VGG16, InceptionV3, MobileNetV2, EfficientNet-B0, ResNet18, ResNet34, ResNet50, ResNet101, ResNeXt50, and ResNeXt101, assessing the performance of each model in brain tumor segmentation. The results highlight the VGG16-PSPNet model as the most successful, with high F1-score, mIoU, precision, recall, and accuracy values.
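The evaluation metrics listed above can be computed from a predicted mask and a ground-truth mask as in the following minimal sketch for the binary (tumor versus background) case. The function name `binary_segmentation_metrics` and the choice to average IoU over the two classes for mIoU are assumptions; the abstract does not state how the authors computed their values.

```python
import numpy as np

def binary_segmentation_metrics(pred, target):
    """Compute pixel-wise metrics for binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = int(np.sum(pred & target))
    fp = int(np.sum(pred & ~target))
    fn = int(np.sum(~pred & target))
    tn = int(np.sum(~pred & ~target))

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero for empty classes.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f1 = ratio(2 * precision * recall, precision + recall)
    accuracy = ratio(tp + tn, tp + fp + fn + tn)
    iou_foreground = ratio(tp, tp + fp + fn)
    iou_background = ratio(tn, tn + fp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "miou": (iou_foreground + iou_background) / 2,
    }

# Toy usage: one true positive, one false positive, two true negatives.
metrics = binary_segmentation_metrics([[1, 1], [0, 0]], [[1, 0], [0, 0]])
```

In the toy example the prediction over-segments by one pixel, so recall stays at 1.0 while precision drops to 0.5.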