ISBN: 9798350301298 (print)
Machine learning is vulnerable to adversarial manipulation. Previous literature demonstrated that, at the training stage, attackers can manipulate data [14] and data sampling procedures [29] to control model behaviour. A common attack goal is to plant backdoors, i.e. to force the victim model to learn to recognise a trigger known only to the adversary. In this paper, we introduce a new class of backdoor attacks that hide inside model architectures, i.e. in the inductive bias of the functions used to train. These backdoors are simple to implement, for instance by publishing open-source code for a backdoored model architecture that others will reuse unknowingly. We demonstrate that model architectural backdoors represent a real threat and, unlike other approaches, can survive a complete re-training from scratch. We formalise the main construction principles behind architectural backdoors, such as a connection between the input and the output, and describe some possible protections against them. We evaluate our attacks on computer vision benchmarks of different scales and demonstrate that the underlying vulnerability is pervasive in a variety of common training settings.
ISBN: 9781665487399 (print)
Image alignment, also known as image registration, is a critical block used in many computer vision problems. One of the key factors in alignment is efficiency, as inefficient aligners can add significant overhead to the overall problem. The literature contains some blocks that appear to perform alignment, although most do not focus on efficiency. An image alignment block that works in time and/or space and runs on edge devices would therefore benefit almost all networks dealing with multiple images. Given its wide usage and importance, we propose an efficient, cross-attention-based, multi-purpose image alignment block (XABA) suitable for edge devices. Using cross-attention, we exploit the relationships between features extracted from images. To make cross-attention feasible for real-time image alignment problems and to handle large motions, we provide a pyramidal block-based cross-attention scheme. This also captures local relationships while reducing memory requirements and the number of operations. Efficient XABA models meet the real-time requirement of running above 20 FPS on an NVIDIA Jetson Xavier with 30 W power consumption, in contrast to more powerful computers. Used as a sub-block in a larger network, XABA also improves multi-image super-resolution network performance in comparison to other alignment methods.
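The cross-attention idea in this abstract can be illustrated with a minimal sketch. It assumes plain scaled dot-product attention in which queries come from a reference image's features and keys and values come from the image to be aligned. The function names, feature shapes, and the absence of learned projections or a pyramid are simplifications for illustration; this is not the XABA block itself.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(ref_feats, src_feats):
    """Resample source features onto the reference positions.

    ref_feats: (N, d) features of the reference image (used as queries).
    src_feats: (M, d) features of the image to align (used as keys and values).
    Returns the (N, d) aligned features and the (N, M) attention weights.
    """
    d = ref_feats.shape[-1]
    scores = ref_feats @ src_feats.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ src_feats, weights

# Hypothetical toy features: 16 positions with 8 channels each.
rng = np.random.default_rng(0)
ref = rng.normal(size=(16, 8))
src = rng.normal(size=(16, 8))
aligned, weights = cross_attention(ref, src)
```

The full attention matrix grows with N times M, which is one plausible reason a block-based, pyramidal scheme would reduce memory and operation counts.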
ISBN: 9798350301298 (print)
Computer vision applications have heavily relied on the linear combination of Lambertian diffuse and microfacet specular reflection models for representing reflected radiance, which turns out to be physically incompatible and limited in applicability. In this paper, we derive a novel analytical reflectance model, which we refer to as the Fresnel Microfacet BRDF (FMBRDF) model, that is physically accurate and generalizes to various real-world surfaces. Our key idea is to model the Fresnel reflection and transmission of the surface microgeometry with a collection of oriented mirror facets, both for body and surface reflections. We carefully derive the Fresnel reflection and transmission for each microfacet as well as the light transport between them in the subsurface. This physically grounded modeling also allows us to express the polarimetric behavior of reflected light in addition to its radiometric behavior. That is, FMBRDF unifies not only body and surface reflections but also light reflection in radiometry and polarization, and represents them in a single model. Experimental results demonstrate its effectiveness in accuracy, expressive power, image-based estimation, and geometry recovery.
ISBN: 9781665487399 (digital)
ISBN: 9781665487399 (print)
The growing abundance of multi-dimensional data creates a need for efficient data exploration and analysis. In this paper, we address this need by tackling the task of tensor dataset visualization and clustering, as tensors are a natural form of multi-dimensional data. Previous work has shown that representing individual tensor modes via respective linear subspaces and unifying them on the product Grassmann manifold (PGM) is an effective and memory-efficient way of representation. However, such a representation may lead to loss of valuable temporal information. To address this issue, we model temporal tensor modes with a Hankel-like matrix, preserving sequence information and encoding it with a linear subspace, fully compatible with PGM. Unifying regular tensor modes and the Hankel-like representation of regular tensor modes then enriches the representation on the PGM, with a minimal increase in computational complexity. By relying on geodesic distance on the manifold, we facilitate analysis of multi-dimensional datasets in two ways: 1) by enabling straightforward visualizations using algorithms such as t-SNE; and 2) by fostering clustering of data using distance- or similarity-based methods such as spectral clustering. We evaluate our approach on hand gesture and action recognition datasets as exemplars of temporal tensor datasets.
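As a rough illustration of the pipeline described above, the sketch below builds a Hankel-like matrix from a temporal sequence, encodes it as a linear subspace via SVD, and compares two subspaces with the Grassmann geodesic distance computed from principal angles. The window length, subspace dimension, and function names are assumptions chosen for illustration and are not taken from the paper.

```python
import numpy as np

def hankel_like(seq, window):
    """Stack sliding windows of a (T, d) sequence as columns.

    Returns a (window * d, T - window + 1) matrix that keeps temporal order.
    """
    T = seq.shape[0]
    cols = [seq[t:t + window].reshape(-1) for t in range(T - window + 1)]
    return np.stack(cols, axis=1)

def subspace_basis(mat, k):
    # Orthonormal basis of the k leading left singular vectors.
    u, _, _ = np.linalg.svd(mat, full_matrices=False)
    return u[:, :k]

def geodesic_distance(a, b):
    """Grassmann geodesic distance: the 2-norm of the principal angles."""
    s = np.linalg.svd(a.T @ b, compute_uv=False)
    angles = np.arccos(np.clip(s, -1.0, 1.0))
    return float(np.linalg.norm(angles))

# Hypothetical toy sequences: 20 time steps, 3 features per step.
rng = np.random.default_rng(0)
seq_a = rng.normal(size=(20, 3))
seq_b = rng.normal(size=(20, 3))
basis_a = subspace_basis(hankel_like(seq_a, window=4), k=2)
basis_b = subspace_basis(hankel_like(seq_b, window=4), k=2)
d_ab = geodesic_distance(basis_a, basis_b)
d_aa = geodesic_distance(basis_a, basis_a)
```

A matrix of such pairwise distances could then be passed to t-SNE or spectral clustering as a precomputed distance. On a product manifold, per-mode distances are typically combined as the square root of the sum of their squares.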
Author: Luo, Xiongbiao (Xiamen Univ, Dept Comp Sci & Technol, Xiamen, Peoples R China; Xiamen Univ, Natl Inst Data Sci Hlth & Med, Xiamen 361102, Peoples R China)
ISBN: 9798350301298 (print)
Stochastic filtering is widely used to deal with nonlinear optimization problems such as 3-D and visual tracking in various computer vision and augmented reality applications. Many current methods suffer from an imbalance between exploration and exploitation due to particle degeneracy and impoverishment, resulting in local optima. To address this imbalance, this work proposes a new constrained evolutionary diffusion filter for nonlinear optimization. Specifically, this filter develops spatial state constraints and adaptive history-recall differential evolution embedded evolutionary stochastic diffusion instead of sequential resampling to resolve the degeneracy and impoverishment problem. Applied to monocular endoscope 3-D tracking, the experimental results show that the proposed filtering significantly improves the balance between exploration and exploitation and works better than recent 3-D tracking methods. In particular, the surgical tracking error was reduced from 4.03 mm to 2.59 mm.
This study explores the psychological significance of the commonly used visual metaphor 'seeing the big picture' and examines whether and how it leads to positive experiences in real-life situations. To elucid...
ISBN: 9781665487399 (digital)
ISBN: 9781665487399 (print)
Measuring the perceptual quality of images automatically is an essential task in computer vision, as image quality can degrade in many processes, from image acquisition and transmission to enhancement. Many Image Quality Assessment (IQA) algorithms have been designed to tackle this problem. However, it remains unsettled due to the various types of image distortions and the lack of large-scale human-rated datasets. In this paper, we propose a novel algorithm based on the Swin Transformer [31] with fused features from multiple stages, which aggregates information from both local and global features to better predict the quality. To address the issue of small-scale datasets, relative rankings of images are taken into account together with a regression loss to simultaneously optimize the model. Furthermore, effective data augmentation strategies are used to improve the performance. For comparison with previous works, experiments are carried out on two standard IQA datasets and a challenge dataset. The results demonstrate the effectiveness of our work. The proposed method outperforms other methods on standard datasets and ranks 2nd in the no-reference track of the NTIRE 2022 Perceptual Image Quality Assessment Challenge [53]. This indicates that our method is promising for solving diverse IQA problems and can thus be applied to real-world applications.
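Combining a regression loss with relative rankings can be sketched as follows. The abstract does not specify the exact loss, so the mean squared error term, the pairwise hinge ranking term, the `margin`, and the `rank_weight` parameter are all assumptions made for illustration.

```python
import numpy as np

def regression_loss(pred, target):
    # Mean squared error between predicted and human-rated quality scores.
    return float(np.mean((pred - target) ** 2))

def pairwise_ranking_loss(pred, target, margin=0.0):
    """Hinge penalty on pairs whose predicted order contradicts the rated order."""
    diff_pred = pred[:, None] - pred[None, :]
    order = np.sign(target[:, None] - target[None, :])
    hinge = np.maximum(0.0, margin - order * diff_pred)
    mask = order != 0  # ignore self-pairs and ties in the ratings
    return float(hinge[mask].mean()) if mask.any() else 0.0

def combined_loss(pred, target, rank_weight=1.0):
    return regression_loss(pred, target) + rank_weight * pairwise_ranking_loss(pred, target)

# Hypothetical scores: same regression error scale, opposite orderings.
target = np.array([1.0, 2.0, 3.0])
pred_ordered = np.array([0.1, 0.2, 0.3])
pred_reversed = np.array([0.3, 0.2, 0.1])
rank_ordered = pairwise_ranking_loss(pred_ordered, target)
rank_reversed = pairwise_ranking_loss(pred_reversed, target)
```

The ranking term depends only on the order of predictions, so it can supply a training signal even when absolute scores are scarce or noisy, which is one reason it may help on small datasets.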
ISBN: 9798350307443 (print)
Deep learning has revolutionized artificial intelligence and enabled breakthroughs across various domains. However, as deep learning models continue to grow in scale and complexity, optimizing their hyperparameters for efficient resource utilization becomes a critical challenge. Traditional optimization techniques often assume smooth and continuous loss functions, limiting their effectiveness in this context. In this work, we propose a novel data-driven approach to hyperparameter optimization using a convex quadrature surrogate. By leveraging a set of sampled hyperparameters and their corresponding performance, our method fits a multivariate quadratic surrogate model to identify the optimal hyperparameters. We demonstrate the practicality and effectiveness of our approach by improving the efficiency and performance of various hyperparameter strategies on both closed and open set benchmarks across diverse vision and tabular datasets. Additionally, we showcase its applicability in automatic target recognition tasks. This research contributes to the broader objective of resource-efficient deep learning for computer vision, fostering advancements in model efficiency, computational memory constraints, and latency considerations. Code available here.
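The surrogate-fitting step described above can be sketched with ordinary least squares. The sketch assumes the surrogate has the form x^T A x + b^T x + c and that the fitted A is positive definite, so the stationary point is a minimum. It does not enforce convexity, and the helper names and toy objective are hypothetical rather than the authors' implementation.

```python
import numpy as np

def quadratic_features(X):
    # Columns: constant, linear terms, then all products x_i * x_j with i <= j.
    n, d = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(d)]
    cols += [X[:, i] * X[:, j] for i in range(d) for j in range(i, d)]
    return np.stack(cols, axis=1)

def fit_quadratic_surrogate(X, y):
    """Least-squares fit of y ~ x^T A x + b^T x + c with symmetric A."""
    d = X.shape[1]
    coef, *_ = np.linalg.lstsq(quadratic_features(X), y, rcond=None)
    c, b = coef[0], coef[1:1 + d]
    A = np.zeros((d, d))
    idx = 1 + d
    for i in range(d):
        for j in range(i, d):
            if i == j:
                A[i, i] = coef[idx]
            else:
                # The x_i * x_j coefficient is shared by A[i, j] and A[j, i].
                A[i, j] = A[j, i] = coef[idx] / 2.0
            idx += 1
    return A, b, c

def surrogate_minimiser(A, b):
    # Stationary point of x^T A x + b^T x + c, from 2 A x + b = 0.
    return np.linalg.solve(2.0 * A, -b)

# Hypothetical validation loss over two hyperparameters, minimised at (0.3, -0.5).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(30, 2))
y = (X[:, 0] - 0.3) ** 2 + 2.0 * (X[:, 1] + 0.5) ** 2
A, b, c = fit_quadratic_surrogate(X, y)
x_star = surrogate_minimiser(A, b)
```

If any eigenvalue of the fitted A is not positive, the stationary point is not a minimum, so a practical version would need to constrain or project A.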
ISBN: 9781665487399 (digital)
ISBN: 9781665487399 (print)
The purpose of image inpainting is to recover scratches and damaged areas using context information from the remaining parts. In recent years, thanks to the resurgence of convolutional neural networks (CNNs), the image inpainting task has seen great breakthroughs. However, most existing work considers too few types of mask, and performance drops dramatically when encountering unseen masks. To combat these challenges, we propose a simple yet general method based on the LaMa image inpainting framework [35], dubbed GLaMa. Our proposed GLaMa can better capture different types of missing information by using more types of masks. By incorporating more degraded images in the training phase, we can expect to enhance the robustness of the model with respect to various masks. To yield more reasonable results, we further introduce a frequency-based loss in addition to the traditional spatial reconstruction loss and adversarial loss. In particular, we introduce an effective reconstruction loss in both the spatial and frequency domains to reduce the chessboard effect and ripples in the reconstructed image. Extensive experiments demonstrate that our method can boost performance over the original LaMa method for each type of mask on the FFHQ [18], ImageNet [7], Places2 [42] and WikiArt [28] datasets. The proposed GLaMa was ranked first in terms of PSNR, LPIPS [39] and SSIM [34] in the NTIRE 2022 Image Inpainting Challenge Track 1 Unsupervised [27].
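A reconstruction loss acting in both the spatial and frequency domains might look like the sketch below. It assumes an L1 distance on pixels plus an L1 distance on 2-D FFT coefficients. The weighting parameter `freq_weight` and the function names are hypothetical, and the exact form used in GLaMa is not given in the abstract.

```python
import numpy as np

def spatial_l1(pred, target):
    return float(np.mean(np.abs(pred - target)))

def frequency_l1(pred, target):
    # Compare the complex 2-D spectra; "ortho" keeps the transform unitary.
    fp = np.fft.fft2(pred, norm="ortho")
    ft = np.fft.fft2(target, norm="ortho")
    return float(np.mean(np.abs(fp - ft)))

def reconstruction_loss(pred, target, freq_weight=1.0):
    return spatial_l1(pred, target) + freq_weight * frequency_l1(pred, target)

# Hypothetical 8x8 grayscale image with a faint checkerboard artefact added.
rng = np.random.default_rng(0)
target = rng.random((8, 8))
rows, cols = np.indices((8, 8))
checkerboard = 0.1 * ((rows + cols) % 2 * 2 - 1)
pred = target + checkerboard
loss_clean = reconstruction_loss(target, target)
loss_artefact = reconstruction_loss(pred, target)
```

A checkerboard pattern concentrates its energy in a single high-frequency bin, so a spectral term penalises it in a way that is easy to isolate from other errors.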
ISBN: 9798350301298 (print)
Recently, deep learning techniques have significantly advanced image super-resolution (SR). Because these deep SR networks are black boxes, quantifying reconstruction uncertainty is crucial when employing them. Previous approaches to SR uncertainty estimation mostly focus on capturing pixel-wise uncertainty in the spatial domain. SR uncertainty in the frequency domain, which is highly relevant to image SR, is seldom explored. In this paper, we propose to quantify spectral Bayesian uncertainty in image SR. To achieve this, a Dual-Domain Learning (DDL) framework is first proposed. Combined with Bayesian approaches, the DDL model is able to estimate spectral uncertainty accurately, enabling a reliability assessment for high-frequency reasoning from the frequency domain perspective. Extensive experiments under non-ideal premises are conducted and demonstrate the effectiveness of the proposed spectral uncertainty. Furthermore, we propose a novel Spectral Uncertainty based Decoupled Frequency (SUDF) training scheme for perceptual SR. Experimental results show that the proposed SUDF can noticeably boost the perceptual quality of SR results without sacrificing much pixel accuracy.