ISBN: (print) 9798350353013; 9798350353006
Image Quality Assessment (IQA) is a fundamental task in computer vision, yet it remains an unresolved challenge owing to intricate distortion conditions, diverse image contents, and the limited availability of data. Recently, the community has witnessed the emergence of numerous large-scale pretrained foundation models. However, it remains an open problem whether the scaling law observed in high-level tasks also applies to IQA tasks, which are closely related to low-level clues. In this paper, we demonstrate that, with a proper injection of local distortion features, a larger pretrained vision transformer (ViT) foundation model performs better in IQA tasks. Specifically, to compensate for the lack of local distortion structure and inductive bias in the large-scale pretrained ViT, we use a separate pretrained convolutional neural network (CNN), an architecture well known for capturing local structure, to extract multi-scale image features. Further, we propose a local distortion extractor to obtain local distortion features from the pretrained CNN and a local distortion injector to inject these features into the ViT. By training only the extractor and injector, our method benefits from the rich knowledge in powerful foundation models and achieves state-of-the-art performance on popular IQA datasets, indicating that IQA is not only a low-level problem but also benefits from stronger high-level features drawn from large-scale pretrained models. Code is publicly available at: https://***/NeosXu/LoDa.
ISBN: (print) 9798350353006
This work presents Adaptive Local-then-Global Merging (ALGM), a token reduction method for semantic segmentation networks that use plain Vision Transformers. ALGM merges tokens in two stages: (1) in the first network layer, it merges similar tokens within a small local window, and (2) halfway through the network, it merges similar tokens across the entire image. This is motivated by an analysis in which we found that, in those situations, tokens with a high cosine similarity can likely be merged without a drop in segmentation quality. With extensive experiments across multiple datasets and network configurations, we show that ALGM not only improves throughput by up to 100%, but can also enhance the mean IoU by up to +1.1, thereby achieving a better trade-off between segmentation quality and efficiency than existing methods. Moreover, our approach is adaptive during inference, meaning that the same model can be used for optimal efficiency or accuracy, depending on the application. Code is available at https://***/ALGM.
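The first (local) stage described in this abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the 2x2 window, the 0.9 threshold, and the rule that a window is merged only when every pair of tokens in it is similar are all assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_token(tokens):
    """Element-wise average of a list of token vectors."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def merge_local_windows(grid, window=2, threshold=0.9):
    """Merge the tokens of each window x window block when all pairs are similar.

    grid is an H x W nested list of token vectors. The result is a flat list of
    (token, covered_positions) pairs. Window size and threshold are
    illustrative assumptions, not values from the paper.
    """
    h, w = len(grid), len(grid[0])
    out = []
    for r0 in range(0, h, window):
        for c0 in range(0, w, window):
            pos = [(r, c)
                   for r in range(r0, min(r0 + window, h))
                   for c in range(c0, min(c0 + window, w))]
            toks = [grid[r][c] for r, c in pos]
            similar = all(cosine(toks[i], toks[j]) >= threshold
                          for i in range(len(toks))
                          for j in range(i + 1, len(toks)))
            if similar:
                # One averaged token stands in for the whole window.
                out.append((mean_token(toks), pos))
            else:
                # Keep every token of a dissimilar window unchanged.
                out.extend((t, [p]) for t, p in zip(toks, pos))
    return out


grid = [
    [[1, 0], [1, 0], [1, 0], [0, 1]],
    [[1, 0], [1, 0], [0, 1], [1, 0]],
]
merged = merge_local_windows(grid)
```

Keeping the covered positions next to each merged token is one way to let a later step restore a full-resolution token grid before the segmentation head; whether the original method does this in the same way is not stated in the abstract.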
ISBN: (print) 9798350318920; 9798350318937
Flow-based garment warping is an integral part of image-based virtual try-on networks. However, optimizing a single flow-predicting network for simultaneous global boundary alignment and local texture preservation results in sub-optimal flow fields. Moreover, dense flows are inherently unsuited to handling intricate conditions such as garment occlusion by body parts or by other garments. Forcing flows to handle these issues results in distortions such as texture squeezing and stretching. In this work, we propose a novel approach in which we disentangle the global boundary alignment and local texture preservation tasks via our GlobalNet and LocalNet modules. A consistency loss is then employed between the two modules, which harmonizes the local flows with the global boundary alignment. Additionally, we explicitly handle occlusions by predicting a body-part visibility mask, which is used to mask out the occluded regions in the warped garment. The masking prevents the LocalNet from predicting flows that distort texture to compensate for occlusions. We also introduce a novel regularization loss (NIPR) that defines a criterion for identifying the regions in the warped garment where texture integrity is violated (squeezed or stretched). NIPR subsequently penalizes the flow in those regions to ensure regular and coherent warps that preserve the texture in local neighborhoods. Evaluation on a widely used virtual try-on dataset demonstrates strong performance of our network compared to the current SOTA methods.
ISBN: (print) 9798350300741; 9798350300734
The IoT is vulnerable to network attacks, and Intrusion Detection Systems (IDS) can provide high attack detection accuracy and are easily installed in IoT Servers. However, IDS are seldom evaluated in operational conditions that are seriously impaired by attack overload. Thus, a Local Area Network test-bed is used to evaluate the impact of UDP Flood Attacks on an IoT Server whose first line of defence is an accurate IDS. We show that attacks overload the multi-core Server and paralyze its IDS. Thus, a mitigation scheme that detects attacks rapidly and drops packets within milliseconds after the attack begins is proposed and experimentally evaluated.
ISBN: (print) 9798350365474
Deepfake detection aims to counter the spread of deep-generated media that undermines trust in online content. While existing methods focus on large and complex models, the need for real-time detection demands greater efficiency. With this in mind, and unlike previous work, we introduce a novel deepfake detection approach for images that uses Binary Neural Networks (BNNs) for fast inference with minimal accuracy loss. Moreover, our method incorporates the Fast Fourier Transform (FFT) and Local Binary Pattern (LBP) as additional channel features to uncover manipulation traces in the frequency and texture domains. Evaluations on the COCOFake, DFFD, and CIFAKE datasets demonstrate our method's state-of-the-art performance in most scenarios, with a significant efficiency gain of up to a 20x reduction in FLOPs during inference. Finally, by exploring BNNs in deepfake detection to balance accuracy and efficiency, this work paves the way for future research on efficient deepfake detection.
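The idea of adding frequency and texture channels to an image can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's pipeline: it works on a single grayscale image given as a nested list, uses the basic 8-neighbour LBP, and computes the discrete Fourier transform naively (an FFT yields the same values faster). The function names and the log scaling of the spectrum are hypothetical choices.

```python
import cmath
import math


def lbp(img):
    """Basic 8-neighbour Local Binary Pattern; border pixels are left as 0."""
    h, w = len(img), len(img[0])
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the centre.
                if img[r + dr][c + dc] >= img[r][c]:
                    code |= 1 << bit
            out[r][c] = code
    return out


def dft_log_magnitude(img):
    """Log-magnitude of the 2D DFT, computed naively in O((H*W)^2) time."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            s = 0j
            for r in range(h):
                for c in range(w):
                    s += img[r][c] * cmath.exp(-2j * cmath.pi * (u * r / h + v * c / w))
            out[u][v] = math.log1p(abs(s))
    return out


def add_channels(img):
    """Stack the pixel, frequency, and texture views as three channels."""
    return [img, dft_log_magnitude(img), lbp(img)]


flat = [[1.0] * 4 for _ in range(4)]
channels = add_channels(flat)
```

For a constant image, all spectral energy sits in the zero-frequency bin and every interior LBP code is 255, which makes a convenient sanity check before feeding real images to a classifier.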
ISBN: (print) 9798350370287; 9798350370713
Pedestrian attribute recognition (PAR) poses a significant challenge but holds practical significance in various security applications, including surveillance. In the scope of the UPAR challenge, this paper introduces the Channel-Aware Cross-Fused Transformer-Style Networks (C2T-Net). This network integrates two powerful transformer-style networks, namely the Swin Transformer (SwinT) and a customized variant of the vanilla vision transformer (EVA ViT). The aim is to capture both local and global aspects of an individual for precise attribute recognition. To facilitate the understanding of intricate relationships among channels, a channel-aware self-attention mechanism is devised and integrated into each SwinT block. Furthermore, the features from the two transformer-style networks are fused through cross-fusion, enabling each network to amplify the textural nuances present in the other. The efficacy of the proposed model has been demonstrated through its performance on three PAR benchmarks: PA100K, PETA, and the UPAR2024 private test. On the PA100K benchmark, our approach achieves state-of-the-art results compared with models that do not employ any pre-training techniques. Our performance on the PETA dataset remains competitive, standing on par with other cutting-edge models. Notably, our model achieved runner-up performance on the UPAR2024-track-1 test set. Source code is available at https://***/caodoanh2001/upar_challenge.
ISBN: (print) 9798350300741; 9798350300734
IPFS is a content-addressed, decentralized peer-to-peer data network that uses the Bitswap protocol for exchanging data. The data exchange leaks information to all neighbors, compromising a user's privacy. This paper investigates the suitability of forwarding with source obfuscation techniques for improving the privacy of the Bitswap protocol. Forwarding can add plausible deniability, and source obfuscation provides additional protection against passive observers. First results showed that, through trickle-spreading, the source prediction could decrease to 40%, at the cost of an increased content fetching time. However, assuming short distances between content provider and consumer, content fetching can be faster even with the additional source obfuscation.
ISBN: (print) 9798350318920; 9798350318937
Face super-resolution (FSR) is a critical technique for enhancing low-resolution facial images and has significant implications for face-related tasks. However, existing FSR methods are limited by fixed up-sampling scales and sensitivity to input size variations. To address these limitations, this paper introduces an Arbitrary-Resolution and Arbitrary-Scale FSR method with implicit representation networks (ARASFSR), featuring three novel designs. First, ARASFSR employs 2D deep features, local relative coordinates, and up-sampling scale ratios to predict RGB values for each target pixel, allowing super-resolution at any up-sampling scale. Second, a local frequency estimation module captures high-frequency facial texture information to reduce the spectral bias effect. Lastly, a global coordinate modulation module guides FSR to leverage prior facial structure knowledge and achieve resolution adaptation effectively. Quantitative and qualitative evaluations demonstrate the robustness of ARASFSR over existing state-of-the-art methods while super-resolving facial images across various input sizes and up-sampling scales.
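The first design point, predicting each target pixel from a nearby deep feature, a local relative coordinate, and the up-sampling scale ratio, can be illustrated by showing how such query inputs might be built for an arbitrary scale. This is a sketch under assumptions and not the ARASFSR code: the nearest-cell lookup, the pixel-centre coordinate convention, and the name `query_inputs` are hypothetical, and the network that would consume these inputs is omitted.

```python
import math


def query_inputs(lr_h, lr_w, scale):
    """Build one query per target pixel for an lr_h x lr_w feature map.

    Each query is (nearest_lr_index, relative_coord, scale), where
    relative_coord is the offset of the target pixel centre from the centre
    of its nearest low-resolution cell, measured in low-resolution pixels.
    """
    hr_h, hr_w = round(lr_h * scale), round(lr_w * scale)
    queries = []
    for i in range(hr_h):
        for j in range(hr_w):
            # Target pixel centre expressed in low-resolution pixel units.
            y = (i + 0.5) * lr_h / hr_h - 0.5
            x = (j + 0.5) * lr_w / hr_w - 0.5
            # Nearest low-resolution cell, clamped to the feature map.
            r = min(max(math.floor(y + 0.5), 0), lr_h - 1)
            c = min(max(math.floor(x + 0.5), 0), lr_w - 1)
            queries.append(((r, c), (y - r, x - c), scale))
    return queries


qs_x2 = query_inputs(2, 2, 2.0)    # integer scale
qs_x15 = query_inputs(2, 2, 1.5)   # non-integer scale uses the same code path
```

Because the output grid size is derived from the scale rather than fixed in the architecture, the same function serves integer and non-integer scales, which is the property that an implicit representation exploits.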
ISBN: (print) 9798350300741; 9798350300734
TCP slow start begins at a conservative bitrate but quickly ramps up to the available bandwidth. Unfortunately, current TCP implementations can either: 1) exit from slow start prematurely, which is especially detrimental to utilization on satellite links, or 2) exit from slow start too late, causing unnecessary packet loss. We propose a novel technique to exit slow start while avoiding both premature and belated exits. We evaluate our approach over commercial satellite links, which are long, fat networks that make it challenging to determine the right slow start exit time. Preliminary results show a high success rate for picking appropriate exit points over satellite links, and the technique is potentially applicable to other types of networks more generally.
ISBN: (print) 9798350300741; 9798350300734
We present a novel multi-hop data dissemination protocol for wireless networks that minimizes the total energy consumption across an entire network by minimizing the transmission power at each hop. It is based on a game-theoretic model, constructs a spanning tree topology in a decentralized manner, and is usable in practice. We evaluate the protocol via simulation and a practical implementation on a testbed of 75 Raspberry Pis, demonstrating that a total energy reduction of up to 90% can be achieved compared to a simple broadcast protocol.