ISBN (digital): 9783031585357
ISBN (print): 9783031585340; 9783031585357
Recovering missing data has recently attracted considerable attention because of its significance and difficulty. In particular, recovering clear face images from occluded ones has found applications in various domains. One prominent approach uses autoencoders within the framework of Generative Adversarial Networks (GANs), as in the Context Encoder (CE). The CE is an unsupervised algorithm that uses an autoencoder as its generator. It is designed to inpaint the missing areas of an image based on the information in the surrounding areas. By learning a compressed representation of the input image, the autoencoder can generate plausible and visually coherent predictions for the missing regions. We found that the initial values of the pixels in the missing area have a significant effect on the quality of the generated images, and that careful selection of these values is crucial for accurate and visually appealing inpainting. We also explored several loss functions that can be used within the model and found that the choice of loss function likewise has a substantial effect on the visual quality of the generated images.
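The role of the initial fill values and of the loss can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `init_missing_region` and `masked_l2_loss`, the three fill modes, and the choice of a masked L2 loss are all assumptions made for illustration, since the abstract does not say which initial values or loss functions were compared.

```python
import numpy as np

def init_missing_region(image, mask, mode="mean", seed=0):
    """Fill the masked (missing) pixels of `image` before feeding a generator.

    image: float array in [0, 1], shape (H, W, C)
    mask:  bool array, shape (H, W); True marks missing pixels
    mode:  "zero", "mean" (per-channel mean of the known pixels), or "noise"
    These modes are hypothetical examples, not the options studied in the paper.
    """
    filled = image.copy()
    if mode == "zero":
        filled[mask] = 0.0
    elif mode == "mean":
        # image[~mask] has shape (num_known, C); average over the known pixels.
        filled[mask] = image[~mask].mean(axis=0)
    elif mode == "noise":
        rng = np.random.default_rng(seed)
        filled[mask] = rng.uniform(0.0, 1.0, size=(int(mask.sum()), image.shape[2]))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return filled

def masked_l2_loss(prediction, target, mask):
    """Mean squared error computed only over the missing region."""
    diff = (prediction - target)[mask]
    return float(np.mean(diff ** 2))

# Compare how far each initialization starts from the ground truth.
rng = np.random.default_rng(1)
image = rng.uniform(size=(8, 8, 3))
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
for mode in ("zero", "mean", "noise"):
    start = init_missing_region(image, mask, mode)
    print(mode, masked_l2_loss(start, image, mask))
```

In a GAN-based inpainter, a reconstruction term of this kind would typically be combined with an adversarial term; the weighting between the two is a design choice that the abstract does not specify.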
ISBN (print): 9781665475921
In this paper, we propose an end-to-end image compression framework that incorporates Swin Transformer modules to capture both localized and non-localized similarities. In particular, the Swin Transformer modules are deployed in the analysis and synthesis stages, interleaved with convolution layers. The transformer layers are expected to provide more flexible receptive fields, so that spatially localized and non-localized redundancies can be eliminated more effectively. The proposed method shows excellent capability in signal conjunction and prediction, which improves rate-distortion performance. Experimental results show that the proposed method outperforms existing methods on both natural scene and screen content images, achieving 22.46% BD-Rate savings compared with BPG. On screen content images, BD-Rate gains of over 30% are observed compared with the classical hyper-prior end-to-end coding method.
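For readers unfamiliar with the BD-Rate figures quoted above, the following is a minimal sketch of the commonly used Bjontegaard delta-rate calculation (cubic fit of log-rate against PSNR, averaged over the shared PSNR range). The function name `bd_rate` and the rate/PSNR points are hypothetical, and the abstract does not say which BD-Rate tool or settings the authors used.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate (percent) of `test` relative to `anchor`.

    Negative values mean the test codec needs fewer bits at equal PSNR.
    Each curve needs at least four (rate, PSNR) points for the cubic fit.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    psnr_a = np.asarray(psnr_anchor, dtype=float)
    psnr_t = np.asarray(psnr_test, dtype=float)

    # Fit log-rate as a cubic polynomial of PSNR for each codec.
    poly_a = np.polyfit(psnr_a, log_ra, 3)
    poly_t = np.polyfit(psnr_t, log_rt, 3)

    # Integrate both fits over the PSNR interval that the two curves share.
    lo = max(psnr_a.min(), psnr_t.min())
    hi = min(psnr_a.max(), psnr_t.max())
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)

    # Average log-rate difference, converted back to a percentage.
    avg_diff = (int_t - int_a) / (hi - lo)
    return float((np.exp(avg_diff) - 1.0) * 100.0)

# Hypothetical rate (kbps) and PSNR (dB) points, not taken from the paper.
anchor_rate = [100.0, 200.0, 400.0, 800.0]
test_rate = [50.0, 100.0, 200.0, 400.0]
psnr = [30.0, 33.0, 36.0, 39.0]
print(bd_rate(anchor_rate, psnr, test_rate, psnr))
```

Because the hypothetical test curve uses exactly half the anchor's bitrate at every PSNR, the printed value is approximately -50, that is, a 50% BD-Rate saving.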
ISBN (print): 9798350367164; 9798350367157
This study presents a cost-effective approach to constructing simulation systems by integrating 2D visual processing with neural network technology for simultaneous 3D point cloud generation and part segmentation, thereby reducing time and costs. We propose multiple task-specific models that take 2D RGB images as input and incorporate attention mechanisms, feature encoding and decoding layers, and innovative loss functions. The optimal models are merged into an integrated system, which is evaluated on public and self-built datasets. Our experiments demonstrate that the integrated model reduces training time and the number of floating-point operations (FLOPs) while maintaining high accuracy on the Critical Hole Distance (CHD), Earth Mover's Distance (EMD), and Intersection over Union (IoU) metrics. This advancement benefits applications in military training, aerospace technology, and disaster response simulation by enabling faster 3D object construction.
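Of the metrics named above, IoU for part segmentation is the simplest to illustrate. The sketch below computes a mean IoU over part labels for a single point cloud; the function name `part_iou` and the handling of absent parts are assumptions, as the abstract does not describe the evaluation protocol.

```python
import numpy as np

def part_iou(pred_labels, true_labels, num_parts):
    """Mean IoU over part labels for one segmented point cloud.

    pred_labels, true_labels: int arrays of shape (N,), one label per point.
    Parts absent from both prediction and ground truth are skipped here;
    this is an assumption, and other conventions exist.
    """
    pred_labels = np.asarray(pred_labels)
    true_labels = np.asarray(true_labels)
    ious = []
    for part in range(num_parts):
        pred = pred_labels == part
        true = true_labels == part
        union = np.logical_or(pred, true).sum()
        if union == 0:
            continue  # part does not occur in this cloud
        ious.append(np.logical_and(pred, true).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")

# Four points, two parts: one point is mislabeled.
print(part_iou([0, 0, 1, 1], [0, 1, 1, 1], num_parts=2))
```

In this toy case part 0 has IoU 1/2 and part 1 has IoU 2/3, so the mean is 7/12.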
A point cloud's attributes constitute most of its information content. This is why their efficient compression is of great importance when designing a compression scheme. In this paper, the entropy coding stage o...
ISBN (print): 9781665475921
A tensor display is a type of 3D light field display composed of multiple transparent screens and a backlight. It can render a scene with correct depth, allowing viewers to see a 3D scene without wearing glasses. The analysis of state-of-the-art tensor displays assumes that the content is Lambertian. To extend their capabilities, we analyze the limitations of displaying non-Lambertian scenes and propose a new method that factorizes non-Lambertian scenes using disparity analysis. Moreover, we demonstrate a new tensor display prototype with three layers of full HD content at 60 fps. Evaluation results verify that, compared with the state of the art, the proposed non-Lambertian rendering method displays non-Lambertian scenes at higher quality, both in simulation and on the prototype tensor display.
With the popularity and development of short video applications, using mobile devices to shoot and share user-generated content (UGC) videos has become increasingly common. Video quality assessment (VQ...
ISBN (print): 9781665475921
In recent years, many deep convolutional neural networks have been successfully applied to single image super-resolution (SISR). Even with small convolution kernels, these methods still require a large number of parameters and heavy computation. To tackle this problem, we propose a novel framework that extracts features more efficiently. Inspired by the idea of deep separable convolution, we improve the standard residual block and propose the inverted bottleneck block (IBNB). The IBNB replaces small convolution kernels with large ones without introducing additional computation. The proposed IBNB shows that large-kernel convolution is viable for SISR. Comprehensive experiments demonstrate that our method surpasses most methods by 0.10 to 0.32 dB in quantitative metrics with fewer parameters.
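The claim that a larger kernel need not cost more can be checked with simple weight counting. The sketch below assumes that "deep separable convolution" refers to the usual depthwise separable convolution (a per-channel k x k filter followed by a 1 x 1 pointwise convolution); the abstract does not give the exact layout of the IBNB, so the channel width and kernel sizes here are hypothetical.

```python
def standard_conv_params(channels, kernel):
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * channels * channels

def depthwise_separable_params(channels, kernel):
    """Weights in a depthwise kernel x kernel convolution plus a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * channels  # one kernel per channel
    pointwise = channels * channels         # 1x1 mixing across channels
    return depthwise + pointwise

channels = 64  # hypothetical width, not taken from the paper
print(standard_conv_params(channels, 3))        # 36864
print(depthwise_separable_params(channels, 7))  # 7232
```

Under these assumptions, a 7 x 7 depthwise separable layer uses roughly one fifth of the weights of a standard 3 x 3 layer at the same width, which is the kind of headroom that makes large kernels affordable.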
A seam is a set of pixels with minimum energy forming a continuous line in an image. By eliminating or duplicating seams iteratively, an input image can be retargeted. However, this process often results in blurring, ...
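The seam definition above corresponds to a classic dynamic-programming search. The following is a minimal sketch of finding and removing one vertical seam, assuming an energy map is already available; the function names are hypothetical and this is the generic seam-carving procedure, not the method proposed in the paper.

```python
import numpy as np

def find_vertical_seam(energy):
    """Return one column index per row for the minimum-energy vertical seam.

    energy: 2D float array (H, W). Consecutive rows may shift by at most
    one column, which keeps the seam a continuous line.
    """
    h, w = energy.shape
    cost = energy.astype(float).copy()
    for y in range(1, h):
        prev = cost[y - 1]
        left = np.concatenate(([np.inf], prev[:-1]))   # neighbor at x - 1
        right = np.concatenate((prev[1:], [np.inf]))   # neighbor at x + 1
        cost[y] += np.minimum(prev, np.minimum(left, right))

    # Backtrack from the cheapest pixel in the last row.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for y in range(h - 2, -1, -1):
        x = seam[y + 1]
        lo, hi = max(x - 1, 0), min(x + 2, w)
        seam[y] = lo + int(np.argmin(cost[y, lo:hi]))
    return seam

def remove_vertical_seam(image, seam):
    """Delete one pixel per row, shrinking the width by one."""
    h, w = image.shape[:2]
    keep = np.ones((h, w), dtype=bool)
    keep[np.arange(h), seam] = False
    return image[keep].reshape(h, w - 1, *image.shape[2:])

energy = np.array([[1.0, 9.0, 9.0],
                   [9.0, 1.0, 9.0],
                   [9.0, 9.0, 1.0]])
seam = find_vertical_seam(energy)
print(seam)  # [0 1 2]
print(remove_vertical_seam(np.arange(9).reshape(3, 3), seam))
```

Retargeting repeats this find-and-remove step (or duplicates the seam to enlarge the image), recomputing the energy map after each change.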
The image sequences captured by Unmanned Aerial Vehicles (UAVs) can be applied to many computer vision tasks. However, due to the instability of UAV flight, the captured image sequences will deviate from the preset tr...
Multi-focused plenoptic images possess many special characteristics related to the micro-images (MIs) array, which are expected to be useful in further increasing their compression performance. Those special characteris...