ISBN: (print) 9798331529543; 9798331529550
The rapid advancements in medical imaging have led to a growing demand for high-performance lossless compression of large 3D medical image datasets. Unlike natural images, medical images typically have three-dimensional structure and high bit depth, which call for specialized compression techniques. Building on a decoder-only transformer, we propose a learnable dual-decoder model for lossless compression of 3D medical images. Our approach packs voxels into patches, which a patch-level decoder processes to extract a patch feature. The voxels, together with the patch feature, are then fed into a voxel-level decoder that models each voxel. This coarse-to-fine modeling strategy reduces the computation time per voxel and enables the modeling of long-range dependencies. Experimental results demonstrate that the proposed model achieves state-of-the-art compression performance, with an improvement of approximately 15% over the traditional JP3D benchmark on various datasets.
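The patch-packing step can be illustrated with a short Python sketch. It assumes non-overlapping patches taken in raster order and a volume whose dimensions divide evenly by the patch size; the function name `pack_into_patches` and the patch shape are illustrative choices, not details from the paper. The point it makes is about sequence length: a 4×4×4 volume is 64 voxels as one flat sequence, but with 2×2×2 patches the patch-level decoder sees only 8 positions and the voxel-level decoder models 8 voxels per patch.

```python
from typing import List

Volume = List[List[List[int]]]  # indexed as volume[z][y][x]


def pack_into_patches(volume: Volume, pz: int, py: int, px: int) -> List[List[int]]:
    """Split a 3D volume into non-overlapping pz x py x px patches.

    Patches are returned in raster order (z, then y, then x), and the voxels
    inside each patch are flattened in the same raster order.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    if depth % pz or height % py or width % px:
        raise ValueError("volume dimensions must be divisible by the patch size")
    patches = []
    for z0 in range(0, depth, pz):
        for y0 in range(0, height, py):
            for x0 in range(0, width, px):
                # One patch = one position in the coarse (patch-level) sequence.
                patch = [
                    volume[z][y][x]
                    for z in range(z0, z0 + pz)
                    for y in range(y0, y0 + py)
                    for x in range(x0, x0 + px)
                ]
                patches.append(patch)
    return patches
```

Under these assumptions, a flat sequence of N voxels becomes N/P patch positions plus short sequences of P voxels each, which is one way to read the abstract's claim of reduced per-voxel computation.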
ISBN: (print) 9798331529543; 9798331529550
Traffic sign recognition plays a crucial role in self-driving cars, but it is vulnerable to adversarial patches (APs). Although previous studies have shown that APs can efficiently fool DNN-based models, the connection between image forensics and AP detection remains to be explored. At a high level, the two share the same goal: to find tampered regions while avoiding false positives. A natural question arises: "Is application-agnostic anomaly detection achievable?" In this paper, we propose image Forensics Defense Against Adversarial Patch (IDAP), a framework that defends against adversarial patches using generalizable features learned from tampered images. In addition, we incorporate the Hausdorff erosion loss into the joint training of our network to complete the shape of the predicted mask. Extensive experiments on three datasets, COCO, DFG, and APRICOT, demonstrate that IDAP outperforms state-of-the-art AP detection methods.
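The Hausdorff erosion loss is a differentiable surrogate for the Hausdorff distance between a predicted mask and the ground truth. As a reference point, the sketch below computes the exact, non-differentiable metric that such a loss targets on small binary masks. It is not the erosion-based loss itself, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # 1 = foreground (e.g. patch region), 0 = background
Point = Tuple[int, int]


def foreground_points(mask: Mask) -> List[Point]:
    """Coordinates (row, col) of all foreground pixels."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Worst-case distance from a point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff_distance(pred: Mask, target: Mask) -> float:
    """Symmetric Hausdorff distance between two binary masks."""
    a, b = foreground_points(pred), foreground_points(target)
    if not a or not b:
        raise ValueError("both masks must contain at least one foreground pixel")
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

A single missed pixel far from the rest of the mask dominates this metric, which is why Hausdorff-based losses tend to penalize incomplete or fragmented mask shapes.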
ISBN: (print) 9798331529543; 9798331529550
To achieve efficient compression for both human vision and machine perception, scalable coding methods have been proposed in recent years. However, existing methods do not fully eliminate the redundancy between features corresponding to different tasks, resulting in suboptimal coding performance. In this paper, we propose a frequency-aware hierarchical image compression framework designed for both humans and machines. Specifically, we investigate task relationships from a frequency perspective, using only high-frequency (HF) information for machine vision tasks and both HF and low-frequency (LF) features for image reconstruction. In addition, a residual-block-embedded octave convolution module is designed to enhance the information interaction between HF and LF features. A dual-frequency channel-wise entropy model is also applied to exploit the correlation between different tasks, thereby improving multi-task performance. Experiments show that the proposed method achieves coding gains of -69.3% to -75.3% on machine vision tasks compared with the relevant benchmarks, and a -19.1% gain over a state-of-the-art scalable image codec in terms of image reconstruction quality.
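To make the HF/LF vocabulary concrete, here is a toy decomposition in Python. The low-frequency band is a 2×2 average-pooled, half-resolution copy of the input (the resolution octave convolution typically assigns to its LF branch), and the high-frequency band is the residual that the upsampled LF band fails to explain. This hand-crafted split is an assumption for illustration only; the paper's framework learns its HF and LF features.

```python
from typing import List, Tuple

Image = List[List[float]]  # image[row][col], single channel


def split_frequencies(img: Image) -> Tuple[Image, Image]:
    """Return (low, high): a half-resolution LF band and a full-resolution HF residual."""
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    # Low-frequency band: 2x2 average pooling.
    low = [
        [
            (img[2 * i][2 * j] + img[2 * i][2 * j + 1]
             + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4
            for j in range(w // 2)
        ]
        for i in range(h // 2)
    ]
    # High-frequency band: detail left after nearest-neighbour upsampling of the LF band.
    high = [[img[r][c] - low[r // 2][c // 2] for c in range(w)] for r in range(h)]
    return low, high


def merge_frequencies(low: Image, high: Image) -> Image:
    """Invert split_frequencies by adding the upsampled LF band back to the HF band."""
    return [
        [high[r][c] + low[r // 2][c // 2] for c in range(len(high[0]))]
        for r in range(len(high))
    ]
```

The two bands together reconstruct the input exactly, while a flat region produces an all-zero HF band; that asymmetry is the intuition behind giving edge-heavy machine vision tasks the HF features and reserving both bands for reconstruction.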
ISBN: (print) 9798331529543; 9798331529550
Learned image compression (LIC) methods often employ symmetric encoder and decoder architectures, inevitably increasing decoding time. However, practical scenarios call for an asymmetric design, in which the decoder must have low complexity to serve diverse low-end devices, while the encoder can afford higher complexity to improve coding performance. In this paper, we propose an asymmetric lightweight learned image compression (AsymLLIC) architecture with a novel training scheme that enables complex decoding modules to be gradually replaced with simpler ones. Building on this approach, we comprehensively compare different decoder network structures to strike a better trade-off between complexity and compression performance. Experimental results validate the efficiency of the proposed method, which not only achieves performance comparable to VVC but also offers a lightweight decoder with only 51.47 GMACs of computation and 19.65M parameters. Furthermore, this design methodology can easily be applied to any LIC model, enabling the practical deployment of LIC techniques.
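Decoder complexity figures such as 51.47 GMACs are sums of per-layer multiply-accumulate counts. The sketch below applies the standard count for a 2D convolution, output height × output width × output channels × (input channels / groups) × kernel², to hypothetical layer sizes that are not taken from AsymLLIC. It also shows why replacing a dense convolution with a depthwise one is a typical way to simplify a decoder module.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Conv2d:
    """Shape description of a 2D convolution (square kernel)."""
    in_ch: int
    out_ch: int
    kernel: int
    out_h: int
    out_w: int
    groups: int = 1
    bias: bool = True

    def macs(self) -> int:
        # Each output element needs (in_ch / groups) * kernel^2 multiply-accumulates.
        per_output = (self.in_ch // self.groups) * self.kernel * self.kernel
        return self.out_h * self.out_w * self.out_ch * per_output

    def params(self) -> int:
        weights = self.out_ch * (self.in_ch // self.groups) * self.kernel * self.kernel
        return weights + (self.out_ch if self.bias else 0)


def total_gmacs(layers: Iterable[Conv2d]) -> float:
    """Total decoder cost in GMACs (1e9 multiply-accumulates)."""
    return sum(layer.macs() for layer in layers) / 1e9
```

For a 3×3, 192-to-192-channel convolution with a 64×64 output, the dense layer costs 1,358,954,496 MACs (about 1.36 GMACs), while the depthwise version (`groups=192`) costs 192 times less.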
ISBN: (print) 9789819650835
The proceedings contain 16 papers. The special focus in this conference is on Emotional Intelligence. The topics include: EIDA: Explicit and Implicit-Space Self-supervised Learning for Visual Emotion Adaptation; A Three Streams Convolutional Transformer Fusion Model for Facial Macro- and Micro-expressions Spotting; Facial Action Unit Recognition with Micro-Action-Aware Transformer; Local and Global Iterative Adaptation Based on Meta Learning for Source-Free Cross-Corpus Speech Emotion Recognition; Decoupled Representation with Multimodal Prompts for Emotion Recognition in Conversation; Generative Text Prompts for Image Aesthetic Quality Assessment; Large Language Model Enhanced Fuzzy Logic Fusion Framework for Stance Detection; Skeleton-Based Online Action Detection with Temporal Enhancement; Fine-Grained Spatial-Temporal Framework for Engagement Prediction; Multimodal Engagement Recognition by Fusing Transformer and Bi-LSTM; Emotional Interaction Hardware Design for Wrist Rehabilitation Based on Secondary Fuzzy Reasoning; Attention-Based Audio Depression Recognition Integrating Handcrafted and Deep Features; STC-ND: Leveraging Spatiotemporal Characteristics with NeXtVLAD for Depression Detection from Few-Channel EEG Signals; DepLLM: Fine-Tuning Large Language Models with a Chinese Dialogue Dataset for Depression Diagnosis via Mixture of Specialized Experts.
In the modern digital era, Information and Communication Technology (ICT) has become highly advanced and has reached every field, but there is still a disconnect between visually impaired persons and communicat...
ISBN: (print) 9798331529543; 9798331529550
In this paper, we introduce a novel approach to better utilize the intra block copy (IBC) prediction tool when encoding lenslet light field video (LFV) captured with plenoptic 2.0 cameras. Although the IBC tool has been recognized as promising for encoding LFV content, its original design, rooted in encoding conventional video, imposes a fundamental limit, which suggests that slight modifications could better suit the properties of LFV content. Observing the inherently large amount of repetitive image patterns caused by the microlens array (MLA) structure of plenoptic cameras, we propose several techniques that enhance the IBC coding tool itself to encode LFV content more efficiently. Our experimental results demonstrate that the proposed method significantly improves IBC coding performance on LFV content while also reducing encoding time.
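For readers unfamiliar with IBC, the core encoder step is a block-matching search within the same frame: the current block is predicted from an already reconstructed block, identified by a block vector, that minimizes a distortion measure such as the sum of absolute differences (SAD). The Python sketch below uses a full search with a simplified raster-order availability rule; real codecs such as VVC constrain the reference area more tightly, and none of the paper's LFV-specific enhancements are reproduced here. Function names are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # frame[y][x], reconstructed samples


def sad(frame: Frame, ax: int, ay: int, bx: int, by: int, bs: int) -> int:
    """Sum of absolute differences between two bs x bs blocks of the same frame."""
    return sum(
        abs(frame[ay + r][ax + c] - frame[by + r][bx + c])
        for r in range(bs)
        for c in range(bs)
    )


def is_available(ref_x: int, ref_y: int, cur_x: int, cur_y: int, bs: int) -> bool:
    """Simplified availability check, assuming raster coding on a bs-aligned block grid.

    The reference block must lie entirely above the current block, or entirely
    to its left without extending below it.
    """
    above = ref_y + bs <= cur_y
    left = ref_x + bs <= cur_x and ref_y <= cur_y
    return above or left


def ibc_search(frame: Frame, cur_x: int, cur_y: int, bs: int) -> Optional[Tuple[Tuple[int, int], int]]:
    """Full-search IBC: return ((dx, dy), sad) for the best block vector, or None."""
    height, width = len(frame), len(frame[0])
    best = None
    for ref_y in range(height - bs + 1):
        for ref_x in range(width - bs + 1):
            if not is_available(ref_x, ref_y, cur_x, cur_y, bs):
                continue
            cost = sad(frame, cur_x, cur_y, ref_x, ref_y, bs)
            if best is None or cost < best[1]:
                best = ((ref_x - cur_x, ref_y - cur_y), cost)
    return best
```

On an 8×8 frame whose columns repeat every 4 samples (loosely mimicking a periodic microlens pattern), the search for the 4×4 block at (4, 4) returns the block vector (-4, 0) with zero SAD, illustrating why repetitive MLA structure makes IBC attractive for LFV content.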
ISBN: (print) 9798331529543; 9798331529550
Previous Deepfake detection methods perform well within their training domains, but their effectiveness diminishes significantly on new synthesis techniques. Recent studies have revealed that detection models form decision boundaries based on facial identity rather than synthetic artifacts, leading to poor cross-domain performance. To address this issue, we propose FRIDAY, a novel training method that attenuates facial identity by utilizing a face recognizer. Specifically, we first train a face recognizer with the same backbone as the Deepfake detector. We then freeze the recognizer and use it during the detector's training to mitigate facial identity information. This is achieved by feeding input images into both the recognizer and the detector and minimizing the similarity of their feature embeddings with our Facial Identity Attenuating loss. This process encourages the detector to produce embeddings distinct from the recognizer's, effectively attenuating facial identity. Comprehensive experiments demonstrate that our approach significantly improves detection performance on both in-domain and cross-domain datasets.
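The abstract does not state which similarity measure the Facial Identity Attenuating loss uses, so the sketch below assumes cosine similarity averaged over a batch of paired embeddings. In a real training framework the recognizer embeddings would come from the frozen network without gradients, so only the detector is updated; plain Python lists stand in for tensors here, and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norm, eps)


def identity_attenuation_loss(
    detector_emb: Sequence[Sequence[float]],
    recognizer_emb: Sequence[Sequence[float]],
) -> float:
    """Batch-mean cosine similarity between paired detector and recognizer embeddings.

    Assumed form: recognizer_emb comes from the frozen face recognizer, and
    minimizing this value pushes the detector's embeddings away from identity.
    """
    if len(detector_emb) != len(recognizer_emb) or not detector_emb:
        raise ValueError("need equally sized, non-empty batches")
    sims = [cosine_similarity(d, r) for d, r in zip(detector_emb, recognizer_emb)]
    return sum(sims) / len(sims)
```

Minimizing the raw cosine also rewards anti-aligned embeddings; penalizing the absolute or squared cosine instead would push the two embeddings toward orthogonality. Which variant FRIDAY uses is not stated in the abstract.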
ISBN: (print) 9798331529543; 9798331529550
With the advancement of deep learning techniques, learned image compression (LIC) has surpassed traditional compression methods. However, these methods typically require training a separate model for each target rate to achieve optimal rate-distortion performance, increasing time and resource consumption. To tackle this challenge, we propose using pairs of multi-gain and inverse multi-gain units to enable variable-rate adaptation within a single model. Nevertheless, experiments show that rate-distortion performance may degrade at certain bitrates. We therefore introduce weighted probability assignment, in which different selection probabilities are assigned to lambda values during training, so that the model is trained more often under specific bitrate conditions. To validate our approach, extensive experiments were conducted on Transformer-based and CNN-based models. The experimental results validate the efficiency of the proposed method.
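Two mechanisms from this abstract can be sketched in a few lines of Python: a gain unit scales each latent channel by a per-rate gain vector (with an inverse gain unit applied on the decoder side), and weighted probability assignment samples the training rate point non-uniformly. The gain values, weights, and function names are hypothetical. In the actual method the gain and inverse-gain vectors are learned per lambda; an inverse gain that exactly undoes the gain is a simplifying assumption here, since learned inverse gains need not be reciprocal and quantization sits between the two units.

```python
import random
from typing import List, Sequence

Latent = List[List[float]]  # latent[c] holds the flattened values of channel c


def apply_gain(latent: Latent, gain: Sequence[float]) -> Latent:
    """Scale each latent channel by its own gain (used for both gain and inverse-gain units)."""
    if len(latent) != len(gain):
        raise ValueError("need exactly one gain per channel")
    return [[g * v for v in channel] for channel, g in zip(latent, gain)]


def sample_rate_index(weights: Sequence[float], rng: random.Random) -> int:
    """Weighted probability assignment: pick the lambda index for one training step.

    Rate points with larger weights are selected, and therefore trained, more often.
    """
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]
```

With weights `[1.0, 1.0, 2.0]`, the third rate point is drawn about half the time, so a bitrate where rate-distortion performance lags can be given extra training steps without a separate model.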
ISBN: (print) 9798331529543; 9798331529550
HTTP adaptive streaming (HAS) constructs bitrate ladders to deliver videos with the best possible quality under varying network conditions. Although per-shot content adaptive encoding (CAE) greatly improves compression efficiency by constructing the optimal bitrate ladder for each video shot, it suffers from excessive encoding complexity, because every point in the operating space (typically resolution × bitrate) must be encoded and compared. To address this issue, this paper proposes an efficient bitrate ladder construction method that encodes only a subset of operating points, then uses curve fitting and inter-curve prediction to estimate the RD performance of the remaining points. The proposed method enables low-complexity ladder construction even for high-dimensional operating spaces that include additional dimensions such as encoding presets. Experiments show that this method achieves RD performance comparable to the original per-shot CAE while encoding only 42% of the points. Even when the number of encoded points is reduced to 3.6% of that of the original CAE, it achieves a 15% BD-rate improvement over a fixed bitrate ladder.
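The abstract does not specify the fitted curve model or how inter-curve prediction works, so the sketch below assumes a common log-linear RD model, quality = a + b·ln(bitrate), fitted by least squares to the few encoded points of each resolution. The fitted curves are then used to pick, at each target bitrate, the resolution with the highest predicted quality. Inter-curve prediction is not modeled, and the resolution labels and function names are placeholders.

```python
import math
from typing import Dict, List, Sequence, Tuple

RDPoint = Tuple[float, float]  # (bitrate, quality)


def fit_log_curve(points: Sequence[RDPoint]) -> Tuple[float, float]:
    """Least-squares fit of quality = a + b * ln(bitrate); returns (a, b)."""
    xs = [math.log(rate) for rate, _ in points]
    ys = [quality for _, quality in points]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two encoded points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct bitrates")
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - b * mean_x, b


def predict(curve: Tuple[float, float], rate: float) -> float:
    """Estimated quality at an operating point that was never encoded."""
    a, b = curve
    return a + b * math.log(rate)


def build_ladder(encoded: Dict[str, List[RDPoint]], targets: Sequence[float]) -> List[Tuple[float, str]]:
    """For each target bitrate, choose the resolution with the highest predicted quality."""
    curves = {res: fit_log_curve(points) for res, points in encoded.items()}
    return [(rate, max(curves, key=lambda res: predict(curves[res], rate))) for rate in targets]
```

With synthetic curves 20 + 5 ln R for "1080p" and 28 + 3 ln R for "540p", the two cross at ln R = 4 (R ≈ 54.6), so the ladder picks 540p at bitrate 20 and 1080p at bitrate 200, using only two or three encodes per resolution.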