ISBN (digital): 9798331529543
ISBN (print): 9798331529550
Supported by powerful generative models, low-bitrate learned image compression (LIC) models utilizing perceptual metrics have become feasible. Some of the most advanced models achieve high compression rates and superior perceptual quality by using image captions as sub-information. This paper demonstrates that using a large multi-modal model (LMM), it is possible to generate captions and compress them within a single model. We also propose a novel semantic-perceptual-oriented fine-tuning method applicable to any LIC network, resulting in a 41.58% improvement in LPIPS BD-rate compared to existing methods. Our implementation and pre-trained weights are available at https://***/tokkiwa/imageTextCoding.
Skin diseases are the most common disease on the planet. When detecting skin diseases, dermatologists must have a high degree of expertise and precision, which is why computer-aided diagnosis is so helpful. An approac...
ISBN (print): 9781665436496
Image captioning is the description of an image in natural language, combining computer vision and natural language processing. Recent advances in smartphone hardware and processing power have led to the development of many image captioning applications. In this study, we propose a novel automatic image captioning system based on the encoder-decoder approach that can run on smartphones. The encoder extracts high-level visual information with the ResNet152V2 convolutional neural network, and the proposed decoder transforms that information into natural-language descriptions of the images. Built on a multilayer gated recurrent unit structure, the decoder generates more meaningful captions by using the most relevant visual information. The proposed system was evaluated with several performance metrics on the MSCOCO dataset and outperforms the state-of-the-art approaches. It is also integrated into our custom-designed Android application, IMECA, which, unlike similar applications, generates captions offline. The aim is to make image captioning practical for more people.
This paper discusses image enhancement, which involves adjusting captured images to improve their visual quality and suitability for display or analysis using techniques like filtering, histogram equalization, and con...
ISBN (digital): 9798331529543
ISBN (print): 9798331529550
A light field is usually represented as a set of multi-view images captured from a two-dimensional (2-D) array of viewpoints and requires a large amount of data compared with a standard 2-D image. We propose a 2-D compatible light-field compression method for encoding a light field as a 2-D monocular image and subsidiary data. In terms of the image quality, we prioritize the central image (regarded as the 2-D monocular image) over the other images in the light field, because the light field is considered an extension of the 2-D monocular image. To this end, we encode and decode the monocular image using a standard image codec and introduce a learned encoder and decoder pair for the subsidiary data. Experimental results indicate that our method achieved promising rate-distortion performance, especially for extremely low bit-rate ranges. Even though our method requires only a small amount of subsidiary data compared with those for the monocular image, the entire light field can be reconstructed with reasonable visual quality.
ISBN (print): 9781665436496
Developing technology is bringing change to many areas, and these changes sometimes create problems that need to be solved or improved. In digital image and video processing, advances in image and video coding methods in recent years have greatly improved the viewer experience. These same advances, however, have increased the importance of image and video quality evaluation techniques. Although the human visual system and human judgment experiments give the most accurate results, the rapidly growing volume of content has made such experiments impractical. Effective, automatic quality assessment metrics are therefore needed to evaluate and optimize advanced compression technologies in terms of perceived visual quality. In this paper, we compare the performance of several new-generation image and video quality assessment metrics.
In order to meet the urgent needs of automation and intelligent picking of kiwifruit, aiming at the problems of unreasonable construction of kiwifruit data set, low fruit recognition accuracy and poor spatial position...
In this paper, we propose an innovative refining segmentation method that improves the visual quality of the Video-based Point Cloud Compression (V-PCC) encoder. Recently standardized by MPEG as an international standard, V-PCC provides state-of-the-art performance in compressing dynamic, dense point cloud objects. However, a lossy V-PCC encoder unavoidably degrades visual quality because points are lost. When the encoder converts a 3D point cloud to 2D patches, some of the points constituting the point cloud are not converted. In particular, during the refining segmentation step of 2D patch generation, points whose projection plane is changed by over-smoothing can be discarded. We propose a distance-weighted refining segmentation method that reduces the number of missed points to improve visual quality. Experimental results show a noticeable improvement in visual quality with a minor coding gain.
ISBN (digital): 9798331529543
ISBN (print): 9798331529550
Image dehazing plays a crucial role in autonomous driving and outdoor surveillance. Haze affects different components of an image in different ways and to different degrees, yet existing methods treat the image as a single input and overlook the need to decouple these components, which leads to mutual interference when each component is enhanced. Consequently, issues such as insufficient color restoration or blurred edges may arise. In this paper, we introduce a novel tri-branch network for single image dehazing that independently extracts low-frequency, high-frequency, and semantic information from images using three distinct sub-networks. A carefully designed fusion network then integrates the information from these three branches to produce the final dehazed image. To facilitate the training of such a complex network, we propose a two-stage training approach. Experimental results demonstrate that our approach achieves state-of-the-art (SOTA) performance.
ISBN (digital): 9798331529543
ISBN (print): 9798331529550
Most approaches in learned image compression follow the transform coding scheme. The characteristics of latent variables transformed from images significantly influence the performance of codecs. In this paper, we present visual analyses on latent features of learned image compression and find that the latent variables are spread over a wide range, which may lead to complex entropy coding processes. To address this, we introduce a Deviation Control (DC) method, which applies a constraint loss on latent features and entropy parameter μ. Training with DC loss, we obtain latent features with smaller values of coding symbols and σ, effectively reducing entropy coding complexity. Our experimental results show that the plug-and-play DC loss reduces entropy coding time by 30-40% and improves compression performance.
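The abstract does not give the form of the DC loss, so the following is only a rough sketch of how a constraint term on latent features and μ could be added to a rate-distortion objective. The mean-absolute-value penalty, the function names `deviation_control_penalty` and `total_loss`, and the parameters `weight` and `lam` are all assumptions made for illustration, not the authors' formulation.

```python
def deviation_control_penalty(latents, means, weight=0.01):
    """Hypothetical DC-style term: weight * (mean|y| + mean|mu|).

    `latents` are latent feature values y and `means` are the matching
    entropy-model means mu. The L1 form is an assumption, not the paper's.
    """
    if not latents or len(latents) != len(means):
        raise ValueError("latents and means must be non-empty and equal length")
    n = len(latents)
    mean_abs_latent = sum(abs(y) for y in latents) / n
    mean_abs_mu = sum(abs(m) for m in means) / n
    return weight * (mean_abs_latent + mean_abs_mu)


def total_loss(rate, distortion, latents, means, lam=1.0, weight=0.01):
    """Rate-distortion objective plus the assumed deviation penalty."""
    penalty = deviation_control_penalty(latents, means, weight)
    return rate + lam * distortion + penalty
```

Under this assumed form, latents and means that sit far from zero raise the loss, which pushes training toward the smaller coding symbols the abstract describes.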