作者:
Bai, ZQYuan, DFInha Univ
Grad Sch Informat Technol & Telecommun Telecommun Engn Lab Inchon South Korea
In this paper, based on the characteristics of Turbo Codes and Trellis Coded Modulation (TCM), an improved method Turbo Trellis Coded Modulation (Turbo TCM) with two different mapping strategies is presented over AWGN...
详细信息
ISBN:
(纸本)0819452114
In this paper, based on the characteristics of Turbo Codes and Trellis Coded Modulation (TCM), an improved method Turbo Trellis Coded Modulation (Turbo TCM) with two different mapping strategies is presented over AWGN (add white gauss noise) channels. The performance of different mapping strategies of Turbo TCM in AWGN channels for image transmission is analyzed and investigated. Furthermore, taking 16QAM modulation of Turbo TCM as the example, two different signal constellations are compared and computer simulation results are showed.
In this study, the alignment of video-text and image-text datasets is studied. Firstly, similarities are calculated over the texts in the two data sets. A retrieval setup with visual similarities is then applied to th...
详细信息
ISBN:
(纸本)9798350343557
In this study, the alignment of video-text and image-text datasets is studied. Firstly, similarities are calculated over the texts in the two data sets. A retrieval setup with visual similarities is then applied to the subset which is created via calculated text similarities. A BERT-based embedding vector method is applied to the raw and pure texts. As a visual feature, object-based and CLIP-based methods are used to define video frames. According to the results, alignment with CLIP features achieves the best results in the subset created by filtering using raw text.
Counterfeit medicines present a severe public health threat, especially in low-resource countries where consumers lack reliable means to verify the medicines they purchase. visual inspection of medicine packaging imag...
详细信息
ISBN:
(纸本)9798331529543;9798331529550
Counterfeit medicines present a severe public health threat, especially in low-resource countries where consumers lack reliable means to verify the medicines they purchase. visual inspection of medicine packaging images through keypoint matching techniques offers a promising approach for detecting design inconsistencies that could indicate counterfeit products. However, conventional methods often struggle with high computational costs and reduced accuracy when processingimages of varying quality and perspectives. To address these limitations, we propose the Angle and Scale Voting (ASVote) method, which enhances keypoint-based image matching by introducing a 2D voting mechanism that leverages relative angles and scales of the keypoints to eliminate false matches(outliers) while identifying consistent matches (inliers). This approach significantly improves both processing time and accuracy. Experiments on a real-world dataset of medicine packages show that ASVote improves processing time and accuracy, outperforming conventional methods.
Recently, the pre-processed video transcoding has attracted wide attention and has been increasingly used in practical applications for improving the perceptual experience and saving transmission resources. However, v...
详细信息
ISBN:
(纸本)9781728185514
Recently, the pre-processed video transcoding has attracted wide attention and has been increasingly used in practical applications for improving the perceptual experience and saving transmission resources. However, very few works have been conducted to evaluate the performance of pre-processing methods. In this paper, we select the source (SRC) videos and various pre-processing approaches to construct the first Pre-processed and Transcoded Video Database (PTVD). Then, we conduct the subjective experiment, showing that compared with the video sent to the codec directly at the same bitrate, the appropriate pre-processing methods indeed improve the perceptual quality. Finally, existing image/video quality metrics are evaluated on our database. The results indicate that the performance of the existing image/video quality assessment (IQA/VQA) approaches remain to be improved. We will make our database publicly available soon.
image registration among multimodality has received increasing attention in the scope of computer vision and computational photography nowadays. However, the nonlinear intensity variations prohibit the accurate featur...
详细信息
ISBN:
(纸本)9781728185514
image registration among multimodality has received increasing attention in the scope of computer vision and computational photography nowadays. However, the nonlinear intensity variations prohibit the accurate feature points matching between modal-different image pairs. Thus, a robust image descriptor for multi-modal image registration is proposed, named shearlet-based modality robust descriptor(SMRD). The anisotropic feature of edge and texture information in multi-scale is encoded to describe the region around a point of interest based on discrete shearlet transform. We conducted the experiments to verify the proposed SMRD compared with several state-of-the-art multi-modal/multispectral descriptors on four different multi-modal datasets. The experimental results showed that our SMRD achieves superior performance than other methods in terms of precision, recall and F1-score.
The increasing demand for high-quality, real-time visual communication and the growing user expectations, coupled with limited network resources, necessitate novel approaches to semantic image communication. This pape...
详细信息
ISBN:
(纸本)9798331529543;9798331529550
The increasing demand for high-quality, real-time visual communication and the growing user expectations, coupled with limited network resources, necessitate novel approaches to semantic image communication. This paper presents a method to enhance semantic image communication that combines a novel lossy semantic encoding approach with spatially adaptive semantic image synthesis models. By developing a model-agnostic training augmentation strategy, our approach substantially reduces susceptibility to distortion introduced during encoding, effectively eliminating the need for lossless semantic encoding. Comprehensive evaluation across two spatially adaptive conditioning methods and three popular datasets indicates that this approach enhances semantic image communication at very low bit rate regimes.
Ultra-high resolution image segmentation has attracted increasing attention recently due to its wide applications in various scenarios such as road extraction and urban planning. The ultra-high resolution image facili...
详细信息
ISBN:
(纸本)9781665475921
Ultra-high resolution image segmentation has attracted increasing attention recently due to its wide applications in various scenarios such as road extraction and urban planning. The ultra-high resolution image facilitates the capture of more detailed information but also poses great challenges to the image understanding system. For memory efficiency, existing methods preprocess the global image and local patches into the same size, which can only exploit local patches of a fixed resolution. In this paper, we empirically analyze the effect of different patch sizes and input resolutions on the segmentation accuracy and propose a multi-scale collective fusion (MSCF) method to exploit information from multiple resolutions, which can be end-to-end trainable for more efficient training. Our method achieves very competitive performance on the widely-used DeepGlobe dataset while training on one single GPU.
Reflection removal is a long-standing problem in computer vision. In this paper, we consider the reflection removal problem for stereoscopic images. By exploiting the depth information of stereoscopic images, a new ba...
详细信息
ISBN:
(纸本)9781728180687
Reflection removal is a long-standing problem in computer vision. In this paper, we consider the reflection removal problem for stereoscopic images. By exploiting the depth information of stereoscopic images, a new background edge estimation algorithm based on the Wasserstein Generative Adversarial Network (WGAN) is proposed to distinguish the edges of the background image from the reflection. The background edges are then used to reconstruct the background image. We compare the proposed approach with the state-of-the-art reflection removal methods. Results show that the proposed approach can outperform the traditional single-image based methods and is comparable to the multiple-image based approach while having a much simpler imaging hardware requirement.
This paper presents a novel near infrared (NIR) image colorization approach for the Grand Challenge held by 2020 IEEE International conference on visualcommunications and imageprocessing (VCIP). A Cycle-Consistent G...
详细信息
ISBN:
(纸本)9781728180687
This paper presents a novel near infrared (NIR) image colorization approach for the Grand Challenge held by 2020 IEEE International conference on visualcommunications and imageprocessing (VCIP). A Cycle-Consistent Generative Adversarial Network (CycleGAN) with cross-scale dense connections is developed to learn the color translation from the NIR domain to the RGB domain based on both paired and unpaired data. Due to the limited number of paired NIR-RGB images, data augmentation via cropping, scaling, contrast and mirroring operations have been adopted to increase the variations of the NIR domain. An alternating training strategy has been designed, such that CycleGAN can efficiently and alternately learn the explicit pixel-level mappings from the paired NIR-RGB data, as well as the implicit domain mappings from the unpaired ones. Based on the validation data, we have evaluated our method and compared it with conventional CycleGAN method in terms of peak signal-to-noise ratio (PSNR), structural similarity (SSIM) and angular error (AE). The experimental results validate the proposed colorization framework.
Supported by powerful generative models, low-bitrate learned image compression (LIC) models utilizing perceptual metrics have become feasible. Some of the most advanced models achieve high compression rates and superi...
详细信息
ISBN:
(纸本)9798331529543;9798331529550
Supported by powerful generative models, low-bitrate learned image compression (LIC) models utilizing perceptual metrics have become feasible. Some of the most advanced models achieve high compression rates and superior perceptual quality by using image captions as sub-information. This paper demonstrates that using a large multi-modal model (LMM), it is possible to generate captions and compress them within a single model. We also propose a novel semantic-perceptual-oriented fine-tuning method applicable to any LIC network, resulting in a 41.58% improvement in LPIPS BD-rate compared to existing methods. Our implementation and pre-trained weights are available at https://***/tokkiwa/imageTextCoding.
暂无评论