ISBN:
(Print) 9798350344868; 9798350344851
Deploying style transfer methods on resource-constrained devices is challenging, which limits their real-world applicability. To tackle this issue, we propose using pruning techniques to accelerate various visual style transfer methods. We argue that typical pruning methods may not be well-suited to style transfer and present an iterative correlation-based channel pruning (ICCP) strategy for encoder-transform-decoder-based image/video style transfer models. The correlation-based channel regularization preserves the feature distributions of the content and style references, and the iterative pruning strategy prevents layer collapse when pruning the encoder-decoder structure. Experiments demonstrate that the proposed ICCP generates visually competitive results compared to SOTA style transfer methods while significantly reducing the number of parameters (by at least 70K) and the inference time. The model is available at https://***/wukx-wukx/ICCP.
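The abstract does not spell out how channel correlations drive the pruning, but the idea can be sketched: score each channel by how strongly its activations correlate with the remaining channels, and remove the most redundant channels a few at a time so no layer collapses. The following Python/PyTorch sketch is an illustrative guess at that mechanism, not the paper's ICCP implementation; `channel_correlation_scores`, `iterative_prune`, and all parameters are hypothetical.

```python
# Hypothetical sketch of correlation-based channel scoring and iterative
# pruning (an assumed mechanism, not the paper's exact ICCP).
import torch

def channel_correlation_scores(feats: torch.Tensor) -> torch.Tensor:
    """feats: (N, C, H, W) activations of one conv layer.

    A channel that is highly correlated with the others is treated as
    redundant and receives a high score (= more prunable)."""
    n, c, h, w = feats.shape
    x = feats.permute(1, 0, 2, 3).reshape(c, -1)       # (C, N*H*W)
    x = x - x.mean(dim=1, keepdim=True)
    x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
    corr = x @ x.t()                                   # (C, C) correlation matrix
    corr.fill_diagonal_(0.0)
    return corr.abs().mean(dim=1)                      # mean |corr| with other channels

def iterative_prune(scores: torch.Tensor, keep_ratio: float, step: float = 0.1) -> torch.Tensor:
    """Drop channels a few at a time rather than all at once, so that no
    single pruning round collapses a layer."""
    keep = torch.ones_like(scores, dtype=torch.bool)
    while keep.float().mean() > keep_ratio:
        k = max(1, int(step * keep.sum()))
        candidates = scores.masked_fill(~keep, float("-inf"))
        drop = candidates.topk(k).indices              # most redundant channels first
        keep[drop] = False
        # In practice one would fine-tune the network here before the next round.
    return keep
```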
Understanding document images, such as invoices, is a difficult task. It necessitates reading the text as well as comprehending the document's general structure. Conventional techniques first extract text using Op...
This paper presents a novel approach to image enhancement using Fractional-Order Unsharp Masking (FOUM) combined with Particle Swarm Optimization (PSO). The proposed method aims to improve the quality of digital image...
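Fractional-order unsharp masking is often built from a Grünwald-Letnikov approximation of the fractional derivative; the sketch below assumes that construction, since the truncated abstract does not give the paper's exact kernel. The `order` and `gain` parameters, which PSO would normally optimize, are fixed placeholders here.

```python
# Minimal sketch of fractional-order unsharp masking (assumed
# Grünwald-Letnikov construction; the paper's FOUM kernel and the
# PSO-tuned parameters may differ).
import numpy as np
from scipy.ndimage import convolve1d

def gl_kernel(order: float, size: int = 5) -> np.ndarray:
    """Grünwald-Letnikov coefficients w_k for a fractional derivative."""
    w = np.zeros(size)
    w[0] = 1.0
    for k in range(1, size):
        w[k] = (1.0 - (order + 1.0) / k) * w[k - 1]
    return w

def foum_enhance(img: np.ndarray, order: float = 0.5, gain: float = 1.2) -> np.ndarray:
    """Unsharp masking whose detail layer is a fractional-order derivative.

    PSO would normally search `order` and `gain` against an image-quality
    objective (e.g. entropy or edge energy); they are fixed for brevity."""
    img = img.astype(np.float64)
    k = gl_kernel(order)
    detail = convolve1d(img, k, axis=0) + convolve1d(img, k, axis=1)
    return np.clip(img + gain * detail, 0, 255).astype(np.uint8)
```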
Contrast enhancement plays a pivotal role in image processing, particularly for improving the visual quality of images in various applications. This paper presents an approach for enhancing such images by employing a ...
ISBN:
(Print) 9798331530143
Assistive visual navigation systems for visually impaired individuals have become increasingly popular thanks to the rise of mobile computing. Most of these devices work by translating visual information into voice commands. In complex scenarios where multiple objects are present, it is imperative to prioritize object detection and provide immediate notifications for key entities in specific directions. This creates the need to identify the observer's motion direction (ego-motion) purely by processing visual information, which is the key contribution of this paper. Specifically, we introduce Motor Focus, a lightweight image-based framework that predicts ego-motion, the movement intentions of humans (and humanoid machines), from their visual feeds, while filtering out camera motion without any camera calibration. To this end, we implement an optical flow-based pixel-wise temporal analysis method to compensate for the camera motion, with Gaussian aggregation to smooth out the movement prediction area. To evaluate the performance, we collect a dataset of 50 clips of pedestrian scenes across 5 different scenarios, and we test the framework against classical feature detectors such as SIFT and ORB for comparison. Our framework demonstrates its superiority in speed (> 40 FPS), accuracy (MAE = 60 pixels), and robustness (SNR = 23 dB), confirming its potential to enhance the usability of vision-based assistive navigation tools in complex environments. The code is publicly available at https://***/***/project/VisionGPT.
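As a rough illustration of the described pipeline (dense optical flow, camera-motion compensation, Gaussian aggregation), the following sketch uses OpenCV's Farneback flow and a median-flow heuristic for the camera component. It is an assumed implementation, not the authors' code, and `ego_motion_point` is a hypothetical name.

```python
# Rough sketch of the flow-based ego-motion idea (assumed implementation).
import cv2
import numpy as np

def ego_motion_point(prev_gray: np.ndarray, gray: np.ndarray) -> tuple[float, float]:
    """Estimate where the observer is heading from dense optical flow.

    Camera motion is approximated by the median flow and subtracted; the
    residual magnitude is then blurred (Gaussian aggregation) and its peak
    taken as the predicted movement area."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    cam = np.median(flow.reshape(-1, 2), axis=0)       # global (camera) motion
    residual = np.linalg.norm(flow - cam, axis=2)      # per-pixel residual motion
    heat = cv2.GaussianBlur(residual, (0, 0), sigmaX=15)
    y, x = np.unravel_index(np.argmax(heat), heat.shape)
    return float(x), float(y)
```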
ISBN:
(Print) 9781728198354
The high visual quality of modern deepfakes raises significant concerns about the trustworthiness of digital media and makes facial tampering detection more challenging. Although current deep learning-based deepfake detectors achieve excellent results when tested on deepfake images or image sequences generated using known methods, generalization-where a trained model is tasked with detecting deepfakes created with previously unseen manipulation techniques-is still a major challenge. In this paper, we investigate the impact of training spatial and spatio-temporal deep learning network architectures in the image noise residual domain using spatial rich model (SRM) filters on generalization performance. To this end, we conduct a series of tests on the manipulation methods of the FaceForensics++, DeeperForensics-1.0 and Celeb-DF datasets, demonstrating the value of image noise residuals and temporal feature exploitation in tackling the generalization task.
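The core preprocessing step, extracting image noise residuals with SRM filters, can be illustrated with one well-known SRM kernel (the 5x5 "KV" filter); the paper presumably uses a larger filter set, so treat this as a minimal sketch.

```python
# Noise-residual extraction with one standard SRM high-pass filter
# (the 5x5 "KV" kernel); a minimal sketch, not the paper's full filter bank.
import numpy as np
from scipy.signal import convolve2d

KV = np.array([[-1,  2,  -2,  2, -1],
               [ 2, -6,   8, -6,  2],
               [-2,  8, -12,  8, -2],
               [ 2, -6,   8, -6,  2],
               [-1,  2,  -2,  2, -1]], dtype=np.float64) / 12.0

def srm_residual(gray: np.ndarray) -> np.ndarray:
    """High-pass filtering suppresses image content and keeps the noise
    residual, where manipulation traces tend to be easier to learn."""
    return convolve2d(gray.astype(np.float64), KV, mode="same", boundary="symm")
```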
ISBN:
(Print) 9798350344868; 9798350344851
Powerful manipulation techniques have made digital image forgeries easy to create and spread widely without leaving visual anomalies. Blind localization of tampered regions has therefore become quite significant for image forensics. In this paper, we propose an effective image tampering localization network (EITLNet) based on a two-branch enhanced transformer encoder with attention-based feature fusion. Specifically, a feature enhancement module is deployed to strengthen the feature representation ability of the transformer encoder. The features extracted from the RGB and noise streams are fused effectively by a coordinate attention-based fusion module at multiple scales. Extensive experimental results verify that the proposed scheme achieves state-of-the-art generalization ability and robustness on various benchmark datasets. Code is publicly available at https://***/multimediaFor/EITLNet.
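The coordinate attention-based fusion of the RGB and noise streams might look roughly like the PyTorch sketch below, modeled on the generic coordinate attention design (directional pooling along height and width). EITLNet's actual module may differ, and `CoordAttnFusion` is a hypothetical name.

```python
# Plausible sketch of coordinate-attention fusion of RGB and noise features
# (based on generic coordinate attention; not EITLNet's exact module).
import torch
import torch.nn as nn

class CoordAttnFusion(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        mid = max(8, channels * 2 // reduction)
        self.squeeze = nn.Sequential(
            nn.Conv2d(channels * 2, mid, 1), nn.BatchNorm2d(mid), nn.ReLU())
        self.attn_h = nn.Conv2d(mid, channels * 2, 1)
        self.attn_w = nn.Conv2d(mid, channels * 2, 1)
        self.proj = nn.Conv2d(channels * 2, channels, 1)

    def forward(self, rgb: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb, noise], dim=1)                    # (N, 2C, H, W)
        _, _, h, w = x.shape
        # Directional pooling: one descriptor per row and per column.
        ph = x.mean(dim=3, keepdim=True)                      # (N, 2C, H, 1)
        pw = x.mean(dim=2, keepdim=True).transpose(2, 3)      # (N, 2C, W, 1)
        y = self.squeeze(torch.cat([ph, pw], dim=2))          # shared transform
        yh, yw = y.split([h, w], dim=2)
        ah = torch.sigmoid(self.attn_h(yh))                   # (N, 2C, H, 1)
        aw = torch.sigmoid(self.attn_w(yw.transpose(2, 3)))   # (N, 2C, 1, W)
        return self.proj(x * ah * aw)                         # fuse back to C channels
```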
ISBN:
(Print) 9798350390155; 9798350390162
Recent advances in self-supervised learning, predominantly studied in high-level visual tasks, have been explored in low-level image processing. This paper introduces a novel self-supervised constraint for single image super-resolution, termed SSC-SR. SSC-SR uniquely addresses the divergence in image complexity by employing a dual asymmetric paradigm and a target model updated via exponential moving average to enhance stability. The proposed SSC-SR framework works as a plug-and-play paradigm and can be easily applied to existing SR models. Empirical evaluations reveal that our SSC-SR framework delivers substantial enhancements on a variety of benchmark datasets, achieving an average increase of 0.1 dB over EDSR and 0.06 dB over SwinIR. In addition, extensive ablation studies corroborate the effectiveness of each component in our SSC-SR framework. Codes are available at https://***/Aitical/SSCSR.
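The EMA-updated target model mentioned in the abstract is a standard construction; here is a minimal sketch under that assumption (the dual asymmetric paradigm of SSC-SR itself is not shown).

```python
# Minimal sketch of an exponential-moving-average target model
# (generic EMA; SSC-SR's dual asymmetric paradigm is not shown).
import torch

@torch.no_grad()
def ema_update(target: torch.nn.Module, online: torch.nn.Module, decay: float = 0.999) -> None:
    """target <- decay * target + (1 - decay) * online, parameter by parameter."""
    for pt, po in zip(target.parameters(), online.parameters()):
        pt.mul_(decay).add_(po, alpha=1.0 - decay)

# Typical usage: target = copy.deepcopy(sr_model) once, then call
# ema_update(target, sr_model) after every optimizer step; the slowly-moving
# target supplies a stable self-supervised constraint for the online model.
```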
This work proposed a new model based on transformers for multimodal image fusion, with explicit attention paid to fusing infrared and visible images toward enhanced detail and information content. This method, which i...
ISBN:
(Print) 9781728198354
Low-light image enhancement aims at improving the human perception of, or the effectiveness of computer vision tasks on, images taken in the dark. Such low-light images usually severely lack visual information. To tackle this problem, we propose a general Low-Light Image Enhancement Transformer Network (LLIEFormer) with a degraded restoration model in this paper. LLIEFormer combines the strength of Transformers at extracting global information with that of convolutional neural networks at capturing local details. We conduct extensive experiments on various low-light enhancement datasets, including PairL1.6K and FiveK, to demonstrate the effectiveness of our method. The results show that our LLIEFormer achieves better performance and wider applicability than other advanced methods. Our code will be available at https://***/xunpengyi/LLIEFormer.
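How a Transformer branch for global information and a convolutional branch for local detail might be combined can be sketched with a generic global-local block; this is a guess at the flavour of LLIEFormer, not its published architecture, and `GlobalLocalBlock` is a hypothetical name.

```python
# Generic hybrid block: self-attention for global context plus a
# convolution branch for local detail (assumed, not LLIEFormer's design).
import torch
import torch.nn as nn

class GlobalLocalBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.local = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1, groups=dim),   # depthwise: local detail
            nn.GELU(),
            nn.Conv2d(dim, dim, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (N, C, H, W)
        n, c, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))      # (N, H*W, C)
        g, _ = self.attn(tokens, tokens, tokens)              # global information
        g = g.transpose(1, 2).reshape(n, c, h, w)
        return x + g + self.local(x)                          # residual fusion
```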