ISBN: (Print) 9798331529543; 9798331529550
Traffic sign recognition plays a crucial role in self-driving cars, but unfortunately it is vulnerable to adversarial patches (APs). Although previous studies have shown that APs can efficiently fool DNN-based models, the connection between image forensics and AP detection remains largely unexplored. From a high-level point of view, their goals are the same: to find tampered regions while avoiding false positives. A natural question arises: "Is achieving application-agnostic anomaly detection possible?" In this paper, we propose image Forensics Defense Against Adversarial Patch (IDAP), a framework to defend against adversarial patches via generalizable features learned from tampered images. In addition, we incorporate a Hausdorff erosion loss into our network for joint training to complete the shape of the predicted mask. Extensive experimental comparisons on three datasets (COCO, DFG, and APRICOT) demonstrate that IDAP outperforms state-of-the-art AP detection methods.
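The abstract does not spell out the loss itself; the following is a minimal sketch of an erosion-based Hausdorff-style loss for soft masks, assuming a PyTorch segmentation setup in which erosion is approximated by min-pooling and errors that survive repeated erosion are penalized more, nudging the predicted mask toward the target's shape. The function names and weighting schedule are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def soft_erode(x, k=3):
    # Morphological erosion approximated by min-pooling (min = -maxpool(-x)).
    return -F.max_pool2d(-x, kernel_size=k, stride=1, padding=k // 2)

def hausdorff_erosion_loss(pred, target, iters=5, alpha=2.0):
    """Erosion-based Hausdorff-style loss for soft masks in [0, 1].

    pred, target: (B, 1, H, W) tensors. The squared error map is repeatedly
    eroded; errors that survive many erosions lie deep inside a wrongly
    segmented region and are weighted more heavily, which pushes the
    predicted mask toward the target's overall shape.
    """
    eroded = (pred - target) ** 2
    loss = 0.0
    for k in range(1, iters + 1):
        eroded = soft_erode(eroded)
        loss = loss + (eroded * (k ** alpha)).mean()
    return loss

# Toy usage on random masks (illustration only).
pred = torch.rand(2, 1, 64, 64, requires_grad=True)
target = (torch.rand(2, 1, 64, 64) > 0.5).float()
hausdorff_erosion_loss(pred, target).backward()
```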
ISBN: (Print) 9798331529543; 9798331529550
When training a Learned Image Compression (LIC) model, the loss function is minimized such that the encoder and the decoder attain a target Rate-Distortion trade-off. A distinct model must therefore be trained and stored at the transmitter and receiver for each target rate, motivating the quest for efficient variable-bitrate compression schemes. This paper proposes plugging Low-Rank Adapters into a transformer-based pre-trained LIC model and training them to meet different target rates. With our method, encoding an image at a variable rate is as simple as training the corresponding adapters and plugging them into the frozen pre-trained model. Our experiments show performance comparable with state-of-the-art fixed-rate LIC models at a fraction of the training and deployment cost. We publicly released the code at https://***/EIDOSLAB/ALICE.
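As a rough illustration of the adapter idea, the sketch below wraps one frozen linear projection of a pre-trained transformer with a trainable low-rank update; one (A, B) pair would be trained per target rate and swapped in at inference. The layer size, rank, and initialization are assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen pre-trained linear layer with a trainable low-rank adapter.

    Only A and B are trained, one (A, B) pair per target rate; swapping
    adapters switches the model between bitrates without retraining the backbone.
    """
    def __init__(self, base: nn.Linear, rank: int = 8, scale: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # keep the pre-trained weights frozen
        self.A = nn.Parameter(torch.zeros(rank, base.in_features))
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        nn.init.normal_(self.A, std=0.01)    # B stays zero, so the adapter starts as identity
        self.scale = scale

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

# Toy usage: adapt one projection of a pre-trained transformer block.
layer = LoRALinear(nn.Linear(192, 192), rank=8)
y = layer(torch.randn(4, 16, 192))   # (batch, tokens, channels)
```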
ISBN: (Print) 9798331529543; 9798331529550
Lookup tables (LUTs) are commonly used to speed up image processing by handling complex mathematical functions such as sine and exponential calculations. They are used in various applications such as camera image processing, high-dynamic-range imaging, and edge-preserving filtering. However, due to the increasing gap between computing and input/output performance, LUTs are becoming less effective. Even though specific circuits such as SIMD units can improve LUT efficiency, they still do not fully bridge the performance gap. This gap makes it difficult to choose between direct numerical computation and LUT-based calculation. To address this problem, a register-LUT method with nearest-neighbor lookup was proposed; however, it is limited to functions whose values lie in a narrow range approaching zero. In this paper, we propose a method for using register LUTs to process images efficiently over a wide range of values. Our contributions include a register-LUT with linear interpolation for efficient computation, the use of a smaller data type for further efficiency, and an efficient data-retrieval method.
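To make the core idea concrete, here is a minimal NumPy sketch of evaluating a function through a small coarse table plus linear interpolation between adjacent entries, instead of computing the function directly; in the paper the table would be small enough to stay in SIMD registers, which this high-level sketch cannot capture. The table size, value range, and example function (exp) are assumptions.

```python
import numpy as np

def make_lut(f, lo, hi, size=16):
    """Build a small lookup table (small enough to live in registers on SIMD hardware)."""
    xs = np.linspace(lo, hi, size)
    return xs, f(xs)

def lut_lerp(x, xs, ys):
    """Approximate f(x) by indexing the coarse LUT and linearly interpolating
    between the two nearest entries, rather than evaluating f directly."""
    step = xs[1] - xs[0]
    idx = np.clip(((x - xs[0]) / step).astype(int), 0, len(xs) - 2)
    frac = (x - xs[idx]) / step
    return ys[idx] * (1.0 - frac) + ys[idx + 1] * frac

xs, ys = make_lut(np.exp, 0.0, 1.0, size=16)   # 16 entries: a tiny table
x = np.random.rand(1_000_000)
approx = lut_lerp(x, xs, ys)
print(np.max(np.abs(approx - np.exp(x))))      # interpolation error of the coarse table
```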
ISBN: (Print) 9798331529543; 9798331529550
Learned image compression (LIC) methods often employ symmetrical encoder and decoder architectures, inevitably increasing decoding time. However, practical scenarios demand an asymmetric design, where the decoder requires low complexity to cater to diverse low-end devices, while the encoder can accommodate higher complexity to improve coding performance. In this paper, we propose an asymmetric lightweight learned image compression (AsymLLIC) architecture with a novel training scheme, enabling the gradual substitution of complex decoding modules with simpler ones. Building upon this approach, we conduct a comprehensive comparison of different decoder network structures to strike a better trade-off between complexity and compression performance. Experimental results validate the efficiency of our proposed method, which not only achieves performance comparable to VVC but also offers a lightweight decoder with only 51.47 GMACs of computation and 19.65M parameters. Furthermore, this design methodology can be easily applied to any LIC model, enabling the practical deployment of LIC techniques.
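One possible reading of the gradual-substitution scheme is sketched below: decoder blocks of a pre-trained model are swapped one at a time for lightweight replacements, and only the newly inserted block is trained before moving on. The block design, channel width, and training callback are hypothetical; the paper's actual schedule may differ.

```python
import torch.nn as nn

def lightweight_block(ch):
    # Hypothetical lightweight replacement: depthwise conv + pointwise conv.
    return nn.Sequential(
        nn.Conv2d(ch, ch, 3, padding=1, groups=ch),
        nn.Conv2d(ch, ch, 1),
        nn.ReLU(inplace=True),
    )

def substitute_gradually(decoder_blocks: nn.ModuleList, train_step, ch=192, steps_per_block=1000):
    """Swap heavy decoder blocks for lightweight ones, one block at a time.

    After each substitution only the new block is trainable; the rest of the
    decoder stays frozen, so the decoder is simplified step by step without
    retraining the whole model. `train_step` is a user-supplied function that
    runs one rate-distortion optimization step on the current decoder.
    """
    for i in range(len(decoder_blocks)):
        decoder_blocks[i] = lightweight_block(ch)
        for p in decoder_blocks.parameters():
            p.requires_grad_(False)
        for p in decoder_blocks[i].parameters():
            p.requires_grad_(True)
        for _ in range(steps_per_block):
            train_step(decoder_blocks)
```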
ISBN: (Print) 9798331529543; 9798331529550
While saliency detection for images has been extensively studied over the past decades, little work has explored the influence of different viewing devices (e.g., tablet computers, mobile phones) on human visual attention behavior. This lack of research hinders progress in cross-device image saliency detection. In this paper, we first establish a novel cross-device saliency detection (CDSD) database based on eye-tracking experiments and investigate subjects' visual attention behavior when using different viewing devices. Then, we evaluate several classic saliency detection models on the CDSD database; the evaluation results indicate that the cross-device performance of these models needs further improvement. Finally, some meaningful discussions are provided that might inform the design of cross-device saliency detection models. The proposed CDSD database will be made publicly available.
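The abstract does not name its evaluation metrics; as a generic illustration of how saliency predictions are scored against eye-tracking data, the sketch below computes Normalized Scanpath Saliency (NSS), a standard fixation-based metric. The image size and fixation format are assumptions.

```python
import numpy as np

def nss(sal_map, fixations):
    """Normalized Scanpath Saliency: z-score the predicted saliency map and
    average it at the recorded fixation locations (higher is better)."""
    s = (sal_map - sal_map.mean()) / (sal_map.std() + 1e-8)
    ys, xs = fixations[:, 0], fixations[:, 1]
    return s[ys, xs].mean()

# Toy usage: a random "prediction" scored against 50 random fixation points.
sal = np.random.rand(480, 640)
fix = np.stack([np.random.randint(0, 480, 50), np.random.randint(0, 640, 50)], axis=1)
print(nss(sal, fix))
```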
ISBN: (Print) 9798331529543; 9798331529550
Depth estimation from light field images is a crucial technique in various applications, including 3D reconstruction, autonomous driving, and object tracking. However, current deep-learning methods ignore the geometric information of the light field image and have limited ability to learn repetitive textures, which leads to inaccurate depth estimates. This paper proposes a light field depth estimation network that fuses multi-scale semantic information with geometric information to address the poor adaptation to repeated-texture regions. The main focus of the network is the semantic and geometric information fusion (SGI) module, which adaptively combines semantic and geometric information to improve the efficiency of cost aggregation. Furthermore, the SGI module establishes a direct link between feature extraction and cost aggregation, providing feedback for feature extraction and guiding it to be more efficient. Experimental results on the synthetic light field dataset HCI 4D demonstrate that the method achieves high accuracy and generalization performance.
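The abstract leaves the internals of the SGI module unspecified; one common way to adaptively combine two feature streams is a learned per-pixel gate, sketched below as a hypothetical stand-in. The class name, channel count, and gating design are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SGIFusion(nn.Module):
    """Hypothetical fusion block: a gate predicted from both feature maps decides,
    per pixel and channel, how much semantic context versus geometric
    (correspondence) evidence is passed on to cost aggregation."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, sem, geo):
        g = self.gate(torch.cat([sem, geo], dim=1))
        return g * sem + (1.0 - g) * geo

# Toy usage with random semantic and geometric feature maps.
fused = SGIFusion(64)(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```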
ISBN: (Print) 9798331529543; 9798331529550
In this paper, we introduce a novel approach to better utilize the intra block copy (IBC) prediction tool in encoding lenslet light field video (LFV) captured with plenoptic 2.0 cameras. Although the IBC tool has been recognized as promising for encoding LFV content, it was originally designed for conventional video, which limits its effectiveness and leaves room for modifications better suited to the properties of LFV content. Observing the inherently large amount of repetitive image patterns caused by the microlens array (MLA) structure of plenoptic cameras, we suggest several techniques to enhance the IBC coding tool itself for more efficient encoding of LFV content. Our experimental results demonstrate that the proposed method significantly enhances IBC coding performance when encoding LFV content while concurrently reducing encoding time.
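As a toy illustration of why IBC suits lenslet content, the sketch below performs a SAD-based block-vector search restricted to offsets at multiples of an assumed microimage pitch, exploiting the repetitive MLA pattern. The pitch, block size, and search range are made-up parameters; the paper's actual modifications to IBC are not reproduced here.

```python
import numpy as np

def ibc_search(frame, y, x, block=16, pitch=35, max_k=4):
    """Toy intra-block-copy search for lenslet content (hypothetical sketch).

    Because the microlens array repeats with a roughly fixed pitch, good
    block-vector candidates lie at multiples of that pitch; restricting the
    search to those offsets exploits the repetitive lenslet structure and
    keeps the search cheap.
    """
    cur = frame[y:y + block, x:x + block]
    best, best_bv = None, (0, 0)
    for ky in range(0, max_k + 1):
        for kx in range(0, max_k + 1):
            dy, dx = -ky * pitch, -kx * pitch      # only previously coded area (up/left)
            if (ky, kx) == (0, 0) or y + dy < 0 or x + dx < 0:
                continue
            ref = frame[y + dy:y + dy + block, x + dx:x + dx + block]
            sad = np.abs(cur.astype(int) - ref.astype(int)).sum()
            if best is None or sad < best:
                best, best_bv = sad, (dy, dx)
    return best_bv, best

frame = np.random.randint(0, 256, (512, 512), dtype=np.uint8)
print(ibc_search(frame, 256, 256))
```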
ISBN: (Print) 9798331529543; 9798331529550
To achieve efficient compression for both human vision and machine perception, scalable coding methods have been proposed in recent years. However, existing methods do not fully eliminate the redundancy between features corresponding to different tasks, resulting in suboptimal coding performance. In this paper, we propose a frequency-aware hierarchical image compression framework designed for humans and machines. Specifically, we investigate task relationships from a frequency perspective, utilizing only high-frequency (HF) information for machine vision tasks and leveraging both HF and low-frequency (LF) features for image reconstruction. In addition, a residual-block-embedded octave convolution module is designed to enhance the information interaction between HF and LF features. Additionally, a dual-frequency channel-wise entropy model is applied to exploit the correlation between different tasks, thereby improving multi-task performance. The experiments show that the proposed method offers -69.3% to -75.3% coding gains on machine vision tasks compared to the relevant benchmarks, and -19.1% gains over the state-of-the-art scalable image codec in terms of image reconstruction quality.
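For reference, octave convolution (which the framework embeds into residual blocks) keeps an HF branch at full resolution and an LF branch at half resolution and lets the two exchange information through cross convolutions. A minimal sketch, with assumed channel counts and a 50/50 HF/LF split, is given below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OctaveConv(nn.Module):
    """Minimal octave convolution: features are split into a full-resolution
    high-frequency (HF) branch and a half-resolution low-frequency (LF) branch,
    with cross-branch convolutions exchanging information between them."""
    def __init__(self, ch, alpha=0.5):
        super().__init__()
        lf = int(ch * alpha)
        hf = ch - lf
        self.hh = nn.Conv2d(hf, hf, 3, padding=1)   # HF -> HF
        self.hl = nn.Conv2d(hf, lf, 3, padding=1)   # HF -> LF (after downsampling)
        self.lh = nn.Conv2d(lf, hf, 3, padding=1)   # LF -> HF (before upsampling)
        self.ll = nn.Conv2d(lf, lf, 3, padding=1)   # LF -> LF

    def forward(self, x_h, x_l):
        y_h = self.hh(x_h) + F.interpolate(self.lh(x_l), scale_factor=2, mode="nearest")
        y_l = self.ll(x_l) + self.hl(F.avg_pool2d(x_h, 2))
        return y_h, y_l

# Toy usage: 32 HF channels at 64x64 and 32 LF channels at 32x32.
conv = OctaveConv(64)
y_h, y_l = conv(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 32, 32))
```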
ISBN: (Print) 9798331529543; 9798331529550
Previous Deepfake detection methods perform well within their training domains, but their effectiveness diminishes significantly with new synthesis techniques. Recent studies have revealed that detection models form decision boundaries based on facial identity rather than synthetic artifacts, leading to poor cross-domain performance. To address this issue, we propose FRIDAY, a novel training method that attenuates facial identity using a face recognizer. Specifically, we first train a face recognizer using the same backbone as the Deepfake detector. We then freeze the recognizer and use it during the detector's training to mitigate facial identity information. This is achieved by feeding input images into both the recognizer and the detector, then minimizing the similarity of their feature embeddings using our Facial Identity Attenuating loss. This process encourages the detector to produce embeddings distinct from the recognizer's, effectively attenuating facial identity. Comprehensive experiments demonstrate that our approach significantly improves detection performance on both in-domain and cross-domain datasets.
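A minimal sketch of the described training signal follows: the detector's embedding is pushed away from the frozen recognizer's embedding of the same image by penalizing their cosine similarity, added to the usual real/fake classification loss. The exact loss form, the weighting, and the assumption that the detector returns (embedding, logits) are illustrative, not the paper's specification.

```python
import torch
import torch.nn.functional as F

def identity_attenuating_loss(det_emb, rec_emb):
    # Penalize alignment between detector and (frozen) recognizer embeddings.
    return F.cosine_similarity(det_emb, rec_emb, dim=-1).mean()

def training_step(detector, recognizer, images, labels, bce, lam=0.1):
    """One training step: real/fake classification plus identity attenuation.

    Assumes `detector(images)` returns (embedding, logits) and `recognizer`
    is the frozen face recognizer sharing the detector's backbone.
    """
    with torch.no_grad():
        rec_emb = recognizer(images)
    det_emb, logits = detector(images)
    return bce(logits, labels) + lam * identity_attenuating_loss(det_emb, rec_emb)
```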
ISBN: (Print) 9798331529543; 9798331529550
With the advancement of deep learning techniques, learned image compression (LIC) has surpassed traditional compression methods. However, these methods typically require training a separate model for each target rate to achieve optimal rate-distortion performance, leading to increased time and resource consumption. To tackle this challenge, we propose leveraging multi-gain and inverse multi-gain unit pairs to enable variable-rate adaptation within a single model. Nevertheless, experiments show that rate-distortion performance may degrade at certain bitrates. Therefore, we introduce weighted probability assignment, where different selection probabilities are assigned to the lambda values during training to increase the model's training frequency under specific bitrate conditions. To validate our approach, extensive experiments were conducted on Transformer-based and CNN-based models, and the results confirm the effectiveness of our proposed method.
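The sketch below illustrates the two ingredients as they are commonly realized: learnable per-rate channel gains applied to the latent before quantization (with inverse gains on the decoder side), and non-uniform sampling of the lambda index during training so that under-performing bitrates are visited more often. The lambda list, weights, and channel count are assumptions.

```python
import random
import torch
import torch.nn as nn

class GainUnit(nn.Module):
    """Learnable per-rate channel gains: the latent is scaled by gain[i] before
    quantization and by inv_gain[i] after decoding, so one model serves
    several lambda (rate) points."""
    def __init__(self, num_rates, channels):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(num_rates, channels))
        self.inv_gain = nn.Parameter(torch.ones(num_rates, channels))

    def scale(self, y, i):
        return y * self.gain[i].view(1, -1, 1, 1)

    def unscale(self, y_hat, i):
        return y_hat * self.inv_gain[i].view(1, -1, 1, 1)

# Weighted probability assignment: under-performing bitrates get sampled more often.
lambdas = [0.0018, 0.0035, 0.0067, 0.013, 0.025]      # assumed rate points
weights = [0.3, 0.15, 0.1, 0.15, 0.3]                 # assumed selection probabilities
i = random.choices(range(len(lambdas)), weights=weights, k=1)[0]

gain_unit = GainUnit(len(lambdas), channels=192)
y = torch.randn(1, 192, 16, 16)                        # latent from the encoder
y_scaled = gain_unit.scale(y, i)                       # train with lambdas[i] this step
```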