ISBN: 9781728180687 (print)
Car counting on drone-based images is a challenging task in computer vision. Most advanced counting methods are based on density maps. Usually, density maps are first generated by convolving ground-truth point maps with a Gaussian kernel for later model learning (generation). Then, the counting network learns to predict density maps from input images (estimation). Most studies focus on the estimation problem while overlooking the generation problem. In this paper, we propose a training framework that learns to generate density maps and trains the generation and estimation subnetworks jointly. Experiments demonstrate that our method outperforms other density map-based methods and achieves the best performance on drone-based car counting.
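The conventional "generation" step that this paper replaces with a learned subnetwork can be sketched as follows. This is a minimal illustration of the fixed-Gaussian baseline only, not the authors' learned generator; the kernel width `sigma`, the 3-sigma truncation, and renormalizing kernels clipped at the image border are assumptions. Because each annotated point contributes a kernel that sums to 1, the density map integrates to the object count.

```python
import math

def gaussian_density_map(points, height, width, sigma=2.0):
    """Convolve a ground-truth point map with a normalized 2-D Gaussian kernel.

    points: list of (row, col) integer car annotations.
    Each point's kernel is renormalized to sum to 1 inside the image,
    so the whole map sums to the number of annotated cars.
    """
    radius = int(math.ceil(3 * sigma))  # truncate the kernel at 3 sigma
    density = [[0.0] * width for _ in range(height)]
    for r0, c0 in points:
        # Collect kernel weights that fall inside the image bounds
        weights = []
        for r in range(max(0, r0 - radius), min(height, r0 + radius + 1)):
            for c in range(max(0, c0 - radius), min(width, c0 + radius + 1)):
                w = math.exp(-((r - r0) ** 2 + (c - c0) ** 2) / (2 * sigma ** 2))
                weights.append((r, c, w))
        total = sum(w for _, _, w in weights)
        for r, c, w in weights:
            density[r][c] += w / total  # normalize so this point adds exactly 1
    return density

cars = [(5, 5), (10, 20), (0, 31)]  # toy annotations, one on the image border
dmap = gaussian_density_map(cars, height=32, width=32, sigma=2.0)
print(round(sum(map(sum, dmap)), 6))  # → 3.0
```

A counting network trained on such maps predicts the count as the sum of its output map; the paper's point is that the fixed `sigma` above is a hand-chosen design decision that a learned generator can avoid.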
ISBN: 9798331529543; 9798331529550 (print)
Previous Deepfake detection methods perform well within their training domains, but their effectiveness diminishes significantly on new synthesis techniques. Recent studies have revealed that detection models form decision boundaries based on facial identity instead of synthetic artifacts, leading to poor cross-domain performance. To address this issue, we propose FRIDAY, a novel training method that attenuates facial identity by utilizing a face recognizer. Specifically, we first train a face recognizer using the same backbone as the Deepfake detector. We then freeze the recognizer and use it during the detector's training to suppress facial identity information. This is achieved by feeding input images into both the recognizer and the detector, then minimizing the similarity of their feature embeddings using our Facial Identity Attenuating loss. This process encourages the detector to produce embeddings distinct from those of the recognizer, effectively attenuating facial identity. Comprehensive experiments demonstrate that our approach significantly improves detection performance on both in-domain and cross-domain datasets.
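The abstract does not give the exact form of the Facial Identity Attenuating loss, so the sketch below assumes the simplest reading: the mean cosine similarity between detector and frozen-recognizer embeddings of the same images, added to the classification loss with a hypothetical weight `lam`. The function names are illustrative, not taken from the paper.

```python
import math

def cosine_similarity(a, b, eps=1e-8):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + eps)

def identity_attenuating_loss(detector_emb, recognizer_emb):
    """Hypothetical form: mean cosine similarity over a batch.

    recognizer_emb comes from the frozen face recognizer and is treated
    as a constant; minimizing this value pushes the detector's embeddings
    away from identity-bearing directions.
    """
    sims = [cosine_similarity(d, r) for d, r in zip(detector_emb, recognizer_emb)]
    return sum(sims) / len(sims)

def total_loss(cls_loss, detector_emb, recognizer_emb, lam=0.1):
    # Real/fake classification loss plus the weighted attenuation term (lam is assumed)
    return cls_loss + lam * identity_attenuating_loss(detector_emb, recognizer_emb)
```

Minimizing raw cosine similarity can drive embeddings toward being anti-aligned rather than unrelated; a variant that penalizes only positive or absolute similarity is an equally plausible reading of the abstract.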
ISBN: 9781728180687 (print)
Access to technologies like mobile phones contributes to a significant increase in the volume of digital visual data (images and videos). In addition, photo editing software is becoming increasingly powerful and easy to use. In some cases, these tools can be used to produce forgeries with the objective of changing the semantic meaning of a photo or a video (e.g., fake news). Digital image forensics (DIF) has two main objectives: the detection (and localization) of forgeries and the identification of the acquisition source (i.e., sensor identification). Since 2005, many classical DIF methods have been designed, implemented, and tested on several databases. Meanwhile, innovative approaches based on deep learning have emerged in other fields and have surpassed traditional techniques. In the context of DIF, deep learning methods mainly use convolutional neural networks (CNNs) combined with significant preprocessing modules. This is an active domain, and two ways of performing preprocessing have been studied: prior to the network or incorporated into it. None of the existing studies on digital image forensics provides a comprehensive overview of the preprocessing techniques used with deep learning methods. Therefore, the core objective of this article is to review the preprocessing modules associated with CNN models.
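As an illustration of the "prior to the network" option, the sketch below applies a fixed high-pass filter that suppresses image content and keeps the residual where manipulation traces tend to live. The Laplacian-style kernel is an assumed example chosen for clarity; it is not a specific filter catalogued in the review.

```python
def high_pass_residual(image, kernel=None):
    """Fixed high-pass preprocessing applied before a CNN.

    image: 2-D list of grayscale intensities.
    Returns the 'valid' correlation of the image with a zero-sum kernel,
    so flat and linearly varying regions map to zero residual.
    """
    if kernel is None:
        kernel = [[0, -1, 0],
                  [-1, 4, -1],
                  [0, -1, 0]]  # Laplacian-style kernel, coefficients sum to zero
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    acc += kernel[u][v] * image[i + u][j + v]
            row.append(acc)
        out.append(row)
    return out
```

In the "incorporated into the network" option, the same kind of kernel becomes the weights of a first convolutional layer, either kept frozen or learned under constraints.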
ISBN: 9781665475921 (print)
In recent years, many deep convolutional neural networks have been successfully applied to single image super-resolution (SISR). Even when small convolution kernels are used, these methods still require a large number of parameters and heavy computation. To tackle this problem, we propose a novel framework that extracts features more efficiently. Inspired by depthwise separable convolution, we improve the standard residual block and propose the inverted bottleneck block (IBNB). The IBNB replaces small convolution kernels with large ones without introducing additional computation. The proposed IBNB shows that large-kernel convolution is viable for SISR. Comprehensive experiments demonstrate that our method surpasses most methods by 0.10 to 0.32 dB in quantitative metrics with fewer parameters.
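Why a large kernel can be cheaper is clearest as parameter arithmetic. The sketch assumes a hypothetical IBNB layout (1x1 expansion, depthwise k x k convolution, 1x1 projection) with expansion factor 2 and kernel size 7; the paper's exact configuration is not stated in the abstract. Biases are ignored.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution (bias ignored)."""
    return (c_in // groups) * c_out * k * k

def residual_block_params(c, k=3):
    # Standard residual block: two dense k x k convolutions, c -> c
    return 2 * conv_params(c, c, k)

def inverted_bottleneck_params(c, k=7, expansion=2):
    # Hypothetical IBNB layout: 1x1 expand -> depthwise k x k -> 1x1 project
    hidden = c * expansion
    return (conv_params(c, hidden, 1)
            + conv_params(hidden, hidden, k, groups=hidden)  # depthwise: one filter per channel
            + conv_params(hidden, c, 1))

print(residual_block_params(64))                            # → 73728
print(inverted_bottleneck_params(64, k=7, expansion=2))     # → 22656
```

For stride-1 convolutions with same padding, multiply-accumulate counts scale with these weight counts times the feature-map area, so the same ratio roughly applies to computation.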
ISBN: 9798331529543; 9798331529550 (print)
With the advancement of deep learning techniques, learned image compression (LIC) has surpassed traditional compression methods. However, these methods typically require training a separate model for each target bitrate to achieve optimal rate-distortion performance, which increases time and resource consumption. To tackle this challenge, we propose leveraging pairs of multi-gain and inverse multi-gain units to enable variable-rate adaptation within a single model. Nevertheless, experiments show that rate-distortion performance may degrade at certain bitrates. Therefore, we introduce weighted probability assignment, in which different selection probabilities are assigned to the lambda values during training, so that the model is trained more frequently under specific bitrate conditions. To validate our approach, extensive experiments were conducted on Transformer-based and CNN-based models. The experimental results validate the efficiency of our proposed method.
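The two mechanisms can be sketched together. This assumes a common gain-unit design in which a per-channel gain vector scales the latent before quantization and an inverse gain vector rescales it afterwards; the gains here are fixed toy values, whereas the paper learns them. Weighted probability assignment is shown as weighted sampling of the lambda index at each training step, using the standard-library `random.choices`; the weights are illustrative.

```python
import random

def apply_gain(latent, gain):
    """Channel-wise gain unit: scale every element of channel i by gain[i]."""
    return [[v * g for v in channel] for channel, g in zip(latent, gain)]

def encode_decode(latent, gain, inv_gain):
    # Larger gains spread values over more quantization bins (higher rate, lower error)
    scaled = apply_gain(latent, gain)
    quantized = [[round(v) for v in channel] for channel in scaled]
    return apply_gain(quantized, inv_gain)

def sample_lambda_index(weights, rng):
    """Weighted probability assignment: choose which lambda (bitrate) to train on."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

# Usage: up-weight the third lambda, e.g. a bitrate where R-D performance degraded
rng = random.Random(0)
schedule = [sample_lambda_index([0.1, 0.1, 0.8], rng) for _ in range(5)]
```

With learned gain pairs, the inverse gain is generally not the exact reciprocal of the gain; the reciprocal is used here only to keep the toy reconstruction easy to check.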
ISBN: 9781728185514 (print)
With the development of the game industry and the popularization of mobile devices, mobile games have come to play an important role in people's entertainment. The aesthetic quality of mobile game images determines, to a certain extent, users' Quality of Experience (QoE). In this paper, we propose a multi-task deep learning-based method to evaluate the aesthetic quality of mobile game images in multiple dimensions (i.e., fineness, color harmony, colorfulness, and overall quality). Specifically, we first extract a quality-aware feature representation by integrating the features from all intermediate layers of a convolutional neural network (CNN), and then map these quality-aware features into the quality score space of each dimension via a quality regressor module consisting of three fully connected (FC) layers. The proposed model is trained in a multi-task learning manner: the quality-aware features are shared by the prediction tasks for the different quality dimensions, and the multi-dimensional quality scores of each image are regressed by separate quality regression modules. We further introduce an uncertainty principle to balance the loss of each task during training. Experimental results show that our proposed model achieves the best performance on the Multi-dimensional Aesthetic assessment for Mobile Game image database (MAMG) among state-of-the-art image quality assessment (IQA) and aesthetic quality assessment (AQA) algorithms.
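The abstract does not spell out its "uncertainty principle"; a widely used way to balance multi-task losses by uncertainty is homoscedastic-uncertainty weighting, and the sketch below assumes that form. Each of the four tasks (fineness, color harmony, colorfulness, overall) gets a learnable log-variance `s_i`; the task loss is scaled by `exp(-s_i)` and the `+ s_i` term keeps the weights from collapsing to zero.

```python
import math

def uncertainty_weighted_loss(task_losses, log_vars):
    """Assumed homoscedastic-uncertainty weighting of per-task losses.

    task_losses: one scalar loss per quality dimension.
    log_vars: learnable s_i = log(sigma_i^2), one per task.
    total = sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i
    """
    total = 0.0
    for loss, s in zip(task_losses, log_vars):
        total += 0.5 * math.exp(-s) * loss + 0.5 * s
    return total
```

For a fixed task loss `L`, this objective is minimized at `s = log(L)`, so tasks with larger residual error are automatically down-weighted instead of dominating the shared feature extractor.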
ISBN: 9781665475921 (print)
With the increasing popularity of commercial depth cameras, 3D reconstruction of dynamic scenes has attracted widespread interest. Although many novel 3D applications have been unlocked, real-time performance remains a major problem. In this paper, we present LiveRecon3D, a low-cost, real-time system in which multiple RGB-D cameras are connected to a single computer. The goal of the system is to provide an interactive frame rate for 3D content capture and rendering at a reduced cost. In the proposed system, we adopt a scalable volume structure and employ ray casting to extract the surface of the 3D content. Based on a pipeline design, all modules in the system run in parallel and are designed to minimize latency, achieving an interactive frame rate of 30 FPS. Finally, experimental results from an implementation with three Kinect v2 cameras are presented to verify the system's effectiveness in terms of visual quality and real-time performance.
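Surface extraction by ray casting can be illustrated by marching a ray through a signed distance field and stopping at the first positive-to-negative sign change. In this sketch the volume is an assumed callable `sdf` (a sphere stands in for fused camera data) rather than the paper's scalable voxel structure, and the fixed `step` and linear interpolation between the last two samples are common but assumed choices.

```python
import math

def raycast_tsdf(sdf, origin, direction, t_max=10.0, step=0.05):
    """March along a ray until the signed distance crosses from + to -.

    sdf: callable mapping a 3-D point to a (truncated) signed distance.
    Linearly interpolates between the last two samples to locate the
    zero crossing. Returns the hit point, or None if no surface is met.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    d = [x / norm for x in direction]
    point = lambda t: [o + t * di for o, di in zip(origin, d)]
    t_prev, f_prev = 0.0, sdf(point(0.0))
    t = step
    while t <= t_max:
        f = sdf(point(t))
        if f_prev > 0 and f <= 0:  # the surface lies between t_prev and t
            t_hit = t_prev + (t - t_prev) * f_prev / (f_prev - f)
            return point(t_hit)
        t_prev, f_prev = t, f
        t += step
    return None

def sphere(p, radius=1.0):
    # Stand-in signed distance field: unit sphere centered at the origin
    return math.sqrt(sum(x * x for x in p)) - radius

hit = raycast_tsdf(sphere, (0.0, 0.0, -3.0), (0.0, 0.0, 1.0))  # lands near z = -1
```

In a real-time system, one such ray is cast per output pixel, which is why the step is independent per ray and maps naturally onto parallel hardware.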
ISBN: 9781728180687 (print)
One of the principal contradictions in the video field today lies between the booming demand for evaluating streaming video quality and the low precision of Quality of Experience (QoE) prediction results. In this paper, we propose CGNN-QoE, a deep learning QoE model built on a Convolutional Neural Network and a Gated Recurrent Unit, which can accurately predict overall and continuous scores of video streaming services in real time. We further implement state-of-the-art models following their original works and compare them with our method on six publicly available datasets. In all considered scenarios, CGNN-QoE outperforms existing methods.
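The recurrent half of a CNN+GRU QoE model can be sketched as a GRU that consumes per-frame feature vectors and emits one score per step, which is what makes continuous prediction possible. This is an assumed illustration, not the CGNN-QoE architecture: the per-frame features are taken as given (standing in for CNN outputs), weights are random, biases are omitted, and the linear `readout` is hypothetical.

```python
import math
import random

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class GRUCell:
    """Minimal GRU cell (Cho et al. formulation), randomly initialized, no biases."""
    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        init = lambda r, c: [[rng.uniform(-0.5, 0.5) for _ in range(c)] for _ in range(r)]
        self.Wz, self.Uz = init(hidden_size, input_size), init(hidden_size, hidden_size)
        self.Wr, self.Ur = init(hidden_size, input_size), init(hidden_size, hidden_size)
        self.Wn, self.Un = init(hidden_size, input_size), init(hidden_size, hidden_size)
        self.hidden_size = hidden_size

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]  # update gate
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.Wn, x), matvec(self.Un, rh))]  # candidate
        # New state is a convex mix of the candidate and the previous state
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

def continuous_scores(frame_features, cell, readout):
    """Run the GRU over per-frame features and emit one QoE score per step."""
    h = [0.0] * cell.hidden_size
    scores = []
    for x in frame_features:
        h = cell.step(x, h)
        scores.append(sum(w * hi for w, hi in zip(readout, h)))
    return scores
```

The last element of the score list can serve as an overall score, or a separate head can pool the hidden states; which option the paper uses is not stated in the abstract.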
ISBN: 9798350343557 (print)
The process of multi-modal image registration is fundamental in remote sensing and visual navigation applications. However, existing image registration methods designed for single-modality images do not provide satisfactory results when applied to multi-modal image registration. In this research, our objective is to achieve highly accurate alignment of infrared and optical (visible-range) images. To accomplish this goal, we explore the effectiveness of the Swin Transformer encoder and a cosine loss in enhancing the keypoint-based image registration process. Simulation results show the improvement in multi-modal registration achieved by using a transformer-based Siamese network.
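The abstract names a cosine loss without giving its form. A common choice for Siamese descriptor learning is the cosine embedding loss, sketched below under that assumption: descriptors of matching infrared/optical keypoints are pulled toward cosine similarity 1, and non-matching pairs are pushed below a `margin` (an assumed hyperparameter). The descriptors are plain vectors standing in for the shared encoder's outputs.

```python
import math

def cosine(a, b, eps=1e-8):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)) + eps)

def cosine_embedding_loss(desc_a, desc_b, is_match, margin=0.0):
    """Assumed pairwise cosine loss for Siamese keypoint descriptors.

    Matching pairs:     1 - cos(a, b)
    Non-matching pairs: max(0, cos(a, b) - margin)
    """
    c = cosine(desc_a, desc_b)
    return 1.0 - c if is_match else max(0.0, c - margin)

def batch_cosine_loss(pairs, margin=0.0):
    # pairs: list of (infrared_descriptor, optical_descriptor, is_match)
    return sum(cosine_embedding_loss(a, b, m, margin) for a, b, m in pairs) / len(pairs)
```

Because cosine similarity ignores descriptor magnitude, it is less sensitive to the global intensity differences between infrared and visible images than a Euclidean distance would be.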
ISBN: 9781728180687 (print)
Although learning-based low-light enhancement methods have achieved significant success, existing methods still suffer from noise and unnatural appearance. These problems may stem from a lack of structural awareness and confusion between noise and texture. Thus, we present a low-light image enhancement method that consists of an image disentanglement network and an illumination boosting network. The disentanglement network first decomposes the input image into image details and image illumination. The extracted illumination then passes through a multi-branch enhancement network designed to improve the dynamic range of the image. The multi-branch network extracts multi-level image features and enhances them via numerous subnets. These enhanced features are then fused to generate the enhanced illumination. Finally, the denoised image details and the enhanced illumination are entangled to produce the normal-light image. Experimental results show that our method can produce visually pleasing images on many public datasets.
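The decompose, boost, and recombine flow can be sketched with hand-crafted stand-ins for both networks. The multiplicative Retinex-style model (image = details x illumination), the per-pixel RGB maximum as illumination, and a gamma curve as the "boost" are all assumptions used only to show the data flow; the detail-denoising step is omitted.

```python
def decompose(pixels, eps=1e-6):
    """Stand-in for the disentanglement network (Retinex-style assumption).

    illumination = per-pixel max over RGB; details = image / illumination.
    """
    illum = [max(p) for p in pixels]
    details = [tuple(c / (l + eps) for c in p) for p, l in zip(pixels, illum)]
    return details, illum

def boost_illumination(illum, gamma=0.4):
    # Stand-in for the multi-branch enhancement network: gamma < 1 brightens dark values
    return [l ** gamma for l in illum]

def recombine(details, illum):
    # Multiplicative re-composition ("entangling"), clipped to [0, 1]
    return [tuple(min(1.0, c * l) for c in d) for d, l in zip(details, illum)]

dark = [(0.05, 0.04, 0.02), (0.10, 0.08, 0.06), (0.0, 0.0, 0.0)]
details, illum = decompose(dark)
bright = recombine(details, boost_illumination(illum))
```

Because only the illumination is brightened, the color ratios within each pixel are preserved; a learned pipeline can additionally denoise the detail layer so that brightening does not amplify sensor noise.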