ISBN (print): 9789819796700; 9789819796717
The rapid advancement of social networks has significantly altered how people convey their emotions, increasingly through a mix of images and text on social media platforms. Visual-textual sentiment analysis has garnered considerable attention because it incorporates visual data into textual sentiment analysis. However, most current visual-textual sentiment analysis approaches underperform because they make limited use of the correlations between the two modalities. Furthermore, current visual analysis methods tend to focus excessively on extracting image features while neglecting the aesthetic aspects of images. To address these issues, this study introduces a text-oriented transformer with an image aesthetics assessment fusion network, termed ToTIAN. The approach comprises two main components: aesthetics-oriented visual feature extraction and a text-oriented transformer. It uses a multi-attention mechanism to integrate textual information with image aesthetics, which carry emotional cues, yielding a comprehensive representation enriched with those cues. Extensive experiments on two publicly available datasets confirmed that ToTIAN outperforms prevalent unimodal and multimodal methods.
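The abstract does not specify how the text-oriented transformer fuses the two modalities. The sketch below is only an illustration of one plausible reading. It assumes that text token features act as queries and aesthetics-oriented image features act as keys and values in standard scaled dot-product cross-attention. The function names, the omission of learned projections, the toy feature dimensions, and the residual connection are all assumptions and are not details of ToTIAN.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def text_oriented_cross_attention(text: Matrix, image: Matrix) -> Matrix:
    """Hypothetical fusion step: each text token (query) attends over image
    features (keys/values). Learned Q/K/V projections are omitted for brevity."""
    d = len(text[0])
    scores = matmul(text, transpose(image))             # (n_text x n_image)
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]          # each row sums to 1
    attended = matmul(weights, image)                   # (n_text x d)
    # Residual connection keeps the text representation as the anchor.
    return [[t + a for t, a in zip(tr, ar)] for tr, ar in zip(text, attended)]


if __name__ == "__main__":
    text_feats = [[1.0, 0.0], [0.0, 1.0]]                  # 2 text tokens, d = 2
    image_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]     # 3 image feature vectors
    fused = text_oriented_cross_attention(text_feats, image_feats)
    print(fused)
```

Making text the query side means that each output row stays aligned with a text token. This is one way to read "text-oriented". A full transformer would add learned projections, multiple heads, and feed-forward layers.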
Despite the fairly good performance of Convolutional Neural Networks (CNNs) in image classification tasks, existing CNNs do not perform well when handling datasets with Gaussian noise. This results in the instability ...
ISBN (print): 9798350388978; 9798350388961
Infrared (IR) imaging sensors, designed to detect wavelengths between 0.9 μm and 14 μm, offer unique advantages over daylight cameras in consumer, industrial, and defense applications. However, IR images lack natural color information and can be difficult to interpret for people without sensor-specific training. Consequently, transforming IR images into perceptually realistic color images is a valuable research direction with significant commercial potential. Recently, various studies have reported using deep neural networks to colorize single-mode (near-IR or thermal) infrared images. This article applies a common neural network architecture to colorize images captured in different imaging modes (near-IR, thermal IR, and low-light) and compares the results. These experiments examine how the sensed wavelength influences the colorization process.
ISBN (print): 9781665475921
This demo paper presents a real-time learned image codec on an FPGA. Implemented on a Xilinx VCU128, the proposed system achieves 720P@30fps coding, which is 7.76x faster than prior work.
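The short calculation below puts the reported throughput in perspective by converting 720P@30fps into a pixel rate and a per-frame time budget. It assumes that "720P" means a 1280×720 frame. The prior-work frame rate is derived under a second assumption, namely that the 7.76x speedup was measured at the same resolution. The abstract does not state this.

```python
def pixel_rate(width: int, height: int, fps: float) -> float:
    """Pixels that must be processed per second to sustain the given frame rate."""
    return width * height * fps


# Assumption: "720P" is a 1280x720 frame (the usual 16:9 HD size).
rate = pixel_rate(1280, 720, 30)
frame_budget_ms = 1000 / 30  # time available to code one frame at 30 fps

# Assumption: the 7.76x speedup compares throughput at the same resolution.
prior_fps = 30 / 7.76

print(f"{rate / 1e6:.2f} Mpixel/s")                        # 27.65 Mpixel/s
print(f"per-frame budget: {frame_budget_ms:.1f} ms")       # 33.3 ms
print(f"implied prior-work rate: {prior_fps:.2f} fps")     # 3.87 fps
```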
The Low-Light Image Enhancement (LLIE) task aims to restore images captured under poor lighting and with poor visual quality into well-lit images with good visual quality. However, the enhancement results output b...
ISBN (print): 9798350344868; 9798350344851
Scene text image super-resolution has significantly improved the accuracy of scene text recognition. However, many existing methods emphasize performance over efficiency and ignore the practical need for lightweight solutions in deployment scenarios. To address these issues, we propose an efficient framework called SGENet to facilitate deployment on resource-limited platforms. SGENet contains two branches: a super-resolution branch and a semantic guidance branch. We use a lightweight pre-trained recognizer as a semantic extractor to enhance the understanding of text information. We also design a visual-semantic alignment module that achieves bidirectional alignment between image features and semantics, which generates high-quality prior guidance. Extensive experiments on a benchmark dataset show that SGENet achieves excellent performance at a lower computational cost.
Contrast enhancement plays a pivotal role in image processing, particularly for improving the visual quality of images in various applications. This paper presents an approach for enhancing such images by employing a ...
In the information age, image processing technology has become prevalent across many domains. Computer vision algorithms can be employed to improve image correction. Traditional methods for structural system ident...
ISBN (print): 9798350349405; 9798350349399
Detecting small objects in drone-captured images or aerial videos is challenging because such objects occupy very few pixels. As data passes through deep learning networks, information about small objects can diminish, which makes high-resolution images essential for better detection performance. However, high-resolution images undesirably increase the computational load. To address this trade-off, we propose a streamlined neural network designed specifically for small object detection in high-resolution images. The proposed network comprises three main components: i) the Enhanced High-Resolution Processing Module (EHRPM); ii) the Small Object Feature Amplified Feature Pyramid Network (SOFA-FPN), with its Edge Enhancement Module (EEM), Cross Lateral Connection Module (CLCM), and Dual Bottom-up Convolution Module (DBCM); and iii) the Sigmoid Re-weighting Module (SRM). Compared with several state-of-the-art networks, our method delivers superior performance with fewer parameters and lower computational demand. The source code is available at https://***/datu0615/EHRPM.
ISBN (print): 9798350344868; 9798350344851
In image editing, Null-text Inversion (NTI) enables fine-grained editing while preserving the structure of the original image by optimizing null-text embeddings during the DDIM sampling process. However, NTI is time-consuming and takes more than two minutes per image. To address this, we introduce a method that retains the principles of NTI while accelerating image editing. We propose the WaveOpt-Estimator, which determines the text optimization endpoint based on frequency characteristics. Using wavelet transform analysis to identify the image's frequency characteristics, we limit text optimization to specific timesteps of the DDIM sampling process. Following the Negative-Prompt Inversion (NPI) concept, we use a target prompt describing the original image as the initial value for text optimization. This approach maintains performance comparable to NTI while reducing the average editing time by more than 80%. Our method offers a promising route to efficient, high-quality image editing with diffusion models.
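The abstract says that the WaveOpt-Estimator uses wavelet analysis to decide where text optimization can stop. It does not give the rule. The sketch below illustrates the general idea under stated assumptions only:

- A one-level orthonormal 2D Haar transform splits a grayscale image into a low-frequency band and three detail bands.
- The share of energy in the detail bands serves as a frequency score.
- A hypothetical linear rule maps that score to an endpoint within an assumed 50-step DDIM schedule.

The function names, the energy-ratio score, the linear mapping, and the step count are illustrative assumptions. They are not the paper's estimator.

```python
from typing import List, Tuple

Image = List[List[float]]


def haar_level1(img: Image) -> Tuple[float, float]:
    """One-level orthonormal 2D Haar transform of a grayscale image with even
    height and width. Returns (low-band energy, detail-band energy)."""
    low, high = 0.0, 0.0
    for y in range(0, len(img), 2):
        for x in range(0, len(img[0]), 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            ll = (a + b + c + d) / 2   # local average (low-frequency band)
            lh = (a + b - c - d) / 2   # difference between the two rows
            hl = (a - b + c - d) / 2   # difference between the two columns
            hh = (a - b - c + d) / 2   # diagonal detail
            low += ll * ll
            high += lh * lh + hl * hl + hh * hh
    return low, high


def high_frequency_ratio(img: Image) -> float:
    """Share of total energy in the detail bands (0 for a flat image)."""
    low, high = haar_level1(img)
    total = low + high
    return high / total if total > 0 else 0.0


def optimization_endpoint(img: Image, num_steps: int = 50) -> int:
    """Hypothetical rule: images with more high-frequency detail get text
    optimization over more DDIM timesteps (always at least one step)."""
    return max(1, round(high_frequency_ratio(img) * num_steps))


if __name__ == "__main__":
    flat = [[0.5] * 4 for _ in range(4)]
    stripes = [[1.0, 0.0, 1.0, 0.0] for _ in range(4)]
    print(optimization_endpoint(flat))     # 1
    print(optimization_endpoint(stripes))  # 25
```

The transform is orthonormal, so the low-band and detail-band energies sum to the image's total energy. This keeps the ratio within [0, 1] regardless of image brightness scale.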