ISBN: 9798331529543 (digital)
ISBN: 9798331529550 (print)
While sparse depth hints from LiDAR points have been utilized as guidance to enhance stereo matching, the improvement is hindered by the low density and uneven distribution of those points. To deal with these challenges, the sparse LiDAR hints are usually expanded for further processing. However, existing methods use only the local information of a fixed window surrounding the sparse hint, leading to inaccurate propagation that ultimately deteriorates stereo matching results. We introduce a new adaptive LiDAR propagation recurrent network by incorporating global context and local information, propagating the hints with an adaptive deformable window, and iteratively updating a disparity field through a recurrent unit. We have conducted comprehensive experiments on various public datasets. The results show that our method produces better matching quality than existing methods.
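To make the criticized baseline concrete, here is a minimal sketch of fixed-window hint expansion, the kind of purely local propagation the abstract says existing methods rely on. It is not the authors' adaptive network. The function name `expand_hints_fixed_window`, the `radius` parameter, and the use of `None` for pixels without a LiDAR hint are all assumptions made for illustration.

```python
def expand_hints_fixed_window(sparse, radius=1):
    """Fill pixels without a hint from the nearest hint inside a fixed window.

    `sparse` is a list of rows of disparity values; None marks a pixel that
    has no LiDAR hint. Pixels with no hint inside their window stay None.
    """
    height = len(sparse)
    width = len(sparse[0]) if height else 0
    dense = [row[:] for row in sparse]
    for y in range(height):
        for x in range(width):
            if sparse[y][x] is not None:
                continue  # original hints are kept unchanged
            best = None  # (squared distance, disparity) of the nearest hint
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    hint = sparse[ny][nx]
                    if hint is None:
                        continue
                    dist = (ny - y) ** 2 + (nx - x) ** 2
                    if best is None or dist < best[0]:
                        best = (dist, hint)
            if best is not None:
                dense[y][x] = best[1]
    return dense
```

Because the window has a fixed size and ignores image content, a hint can be copied across a depth edge. That is the kind of inaccurate propagation the abstract describes.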
We propose an end-to-end learned image data hiding framework that embeds and extracts secrets in the latent representations of a generic neural compressor. By leveraging a perceptual loss function in conjunction with our proposed message encoder and decoder, our approach simultaneously achieves high image quality and high bit accuracy. Compared to existing techniques, our framework offers superior image secrecy and competitive watermarking robustness in the compressed domain while accelerating the embedding speed by over 50 times. These results demonstrate the potential of combining data hiding techniques and neural compression and offer new insights into developing neural compression techniques and their applications.
Pictorial data is the most expressive representation of information using graphics and designs. Users are often unable to access the pictorial text data they need because of a language barrier (pictorial d...
ISBN: 9798331529543 (digital)
ISBN: 9798331529550 (print)
Supported by powerful generative models, low-bitrate learned image compression (LIC) models utilizing perceptual metrics have become feasible. Some of the most advanced models achieve high compression rates and superior perceptual quality by using image captions as sub-information. This paper demonstrates that using a large multi-modal model (LMM), it is possible to generate captions and compress them within a single model. We also propose a novel semantic-perceptual-oriented fine-tuning method applicable to any LIC network, resulting in a 41.58% improvement in LPIPS BD-rate compared to existing methods. Our implementation and pre-trained weights are available at https://***/tokkiwa/imageTextCoding.
ISBN: 9781665436496 (print)
Image captioning is the description of an image in natural language, drawing on techniques from computer vision and natural language processing. Recent advances in smartphone hardware and processing power have led to the development of many image captioning applications. In this study, a novel automatic image captioning system based on the encoder-decoder approach that can be applied in smartphones is proposed. High-level visual information is extracted with the ResNet152V2 convolutional neural network in the encoder, and the proposed decoder transforms the extracted visual information into natural-language descriptions of the images. The proposed decoder, with its multilayer gated recurrent unit structure, generates more meaningful captions by using the most relevant visual information. The proposed system has been evaluated using different performance metrics on the MSCOCO dataset, and it outperforms the state-of-the-art approaches. The proposed system is also integrated with our custom-designed Android application, named IMECA, which, unlike similar applications, generates captions in offline mode. The aim is to make image captioning practical for more people.
Skin diseases are the most common diseases on the planet. When detecting skin diseases, dermatologists must have a high degree of expertise and precision, which is why computer-aided diagnosis is so helpful. An approac...
This paper discusses image enhancement, which involves adjusting captured images to improve their visual quality and suitability for display or analysis using techniques like filtering, histogram equalization, and con...
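As an illustration of one technique named in this abstract, the following is a minimal sketch of global histogram equalization for a grayscale image. It uses the common textbook formulation and is not necessarily the variant the paper discusses. The function name `equalize_histogram` and the list-of-rows image format are assumptions.

```python
def equalize_histogram(image, levels=256):
    """Histogram-equalize a grayscale image.

    `image` is a list of rows of integer intensities in [0, levels).
    Returns a new image whose intensities are spread over the full range.
    """
    pixels = [p for row in image for p in row]
    if not pixels:
        return []
    # Count how often each intensity level occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative histogram: number of pixels at or below each level.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        return [row[:] for row in image]  # constant image: nothing to spread
    # Look-up table mapping each old level to its equalized level.
    lut = [
        max(0, round((c - cdf_min) * (levels - 1) / (total - cdf_min)))
        for c in cdf
    ]
    return [[lut[p] for p in row] for row in image]
```

Under these assumptions, the darkest occupied level maps to 0 and the brightest to `levels - 1`, which stretches the contrast of an image whose intensities are clustered in a narrow band.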
ISBN: 9798331529543 (digital)
ISBN: 9798331529550 (print)
A light field is usually represented as a set of multi-view images captured from a two-dimensional (2-D) array of viewpoints and requires a large amount of data compared with a standard 2-D image. We propose a 2-D compatible light-field compression method for encoding a light field as a 2-D monocular image and subsidiary data. In terms of image quality, we prioritize the central image (regarded as the 2-D monocular image) over the other images in the light field, because the light field is considered an extension of the 2-D monocular image. To this end, we encode and decode the monocular image using a standard image codec and introduce a learned encoder and decoder pair for the subsidiary data. Experimental results indicate that our method achieves promising rate-distortion performance, especially in extremely low bit-rate ranges. Even though our method requires only a small amount of subsidiary data compared with the data for the monocular image, the entire light field can be reconstructed with reasonable visual quality.
ISBN: 9781665436496 (print)
Advances in technology are changing many fields, and these changes sometimes bring new problems that need to be solved. In digital image and video processing, recent developments in image and video coding methods and technologies have greatly improved the viewer experience. These same developments have increased the importance of image and video quality evaluation techniques. Although the human visual system and human judgment experiments give the most accurate results, the rapidly growing volume of content has made such experiments impractical. Effective, automatic quality assessment metrics are therefore needed to evaluate and optimize advanced compression technologies in terms of perceived visual quality. In this paper, we compare the performance of several new-generation image and video quality assessment metrics.
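For orientation, the sketch below computes PSNR, the simplest automatic full-reference quality metric. The abstract does not name PSNR and the new-generation metrics it compares are not listed here, so this serves only as a baseline illustration of what an automatic metric computes. The function name `psnr` and the list-of-rows image format are assumptions.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two grayscale images.

    Both images are lists of rows of pixel intensities with identical shape.
    Returns math.inf when the images are identical.
    """
    if len(reference) != len(distorted) or any(
        len(r) != len(d) for r, d in zip(reference, distorted)
    ):
        raise ValueError("images must have the same shape")
    count = sum(len(row) for row in reference)
    if count == 0:
        raise ValueError("images must not be empty")
    # Mean squared error over all pixels.
    squared_error = sum(
        (r - d) ** 2
        for ref_row, dist_row in zip(reference, distorted)
        for r, d in zip(ref_row, dist_row)
    )
    mse = squared_error / count
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)
```

PSNR depends only on pixel-wise error, so two distortions with the same mean squared error receive the same score even if one looks much worse to a human viewer. This limitation is one reason perceptually motivated metrics are studied.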
In order to meet the urgent needs of automation and intelligent picking of kiwifruit, aiming at the problems of unreasonable construction of kiwifruit data set, low fruit recognition accuracy and poor spatial position...