ISBN:
(digital) 9781665487399
ISBN:
(print) 9781665487399
The key challenge in neural architecture search (NAS) is deciding how to explore a huge search space efficiently. We propose a new NAS method called TNAS (NAS with trees), which improves search efficiency by exploring only a small number of architectures while also achieving higher search accuracy. TNAS introduces an architecture tree and a binary operation tree to factorize the search space and substantially reduce the exploration size. TNAS performs a modified bi-level breadth-first search in the proposed trees to discover a high-performance architecture. Notably, TNAS finds the globally optimal architecture on CIFAR-10 in NAS-Bench-201, with a test accuracy of 94.37%, in four GPU hours. The average test accuracy is 94.35%, which outperforms the state of the art. Code is available at: https://***/guochengqian/TNAS.
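The idea of descending a binary operation tree can be illustrated with a toy sketch. This is not the TNAS implementation: the quality values, the `toy_score` function, and the rule of keeping the better-scoring half are assumptions made for illustration, and the architecture tree and the bi-level search are not modeled. The operation names only imitate the style of a cell-based search space.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical per-operation quality values, invented for this illustration only.
TOY_QUALITY = {
    "none": 0.10,
    "skip_connect": 0.50,
    "conv_1x1": 0.80,
    "conv_3x3": 0.90,
    "avg_pool_3x3": 0.40,
}


def toy_score(group: Sequence[str]) -> float:
    """Stand-in for evaluating a group of operations (assumption: best member wins)."""
    return max(TOY_QUALITY[op] for op in group)


def binary_operation_search(
    ops: Sequence[str], score: Callable[[Sequence[str]], float]
) -> Tuple[str, int]:
    """Descend a binary tree whose root holds all candidate operations.

    At each level the current group is split into two children, both children
    are scored, and only the better child is expanded further.
    Returns the selected operation and the number of group evaluations made.
    """
    candidates = list(ops)
    evaluations = 0
    while len(candidates) > 1:
        mid = len(candidates) // 2
        left, right = candidates[:mid], candidates[mid:]
        evaluations += 2  # one evaluation per child group
        candidates = left if score(left) >= score(right) else right
    return candidates[0], evaluations


best_op, n_evals = binary_operation_search(list(TOY_QUALITY), toy_score)
print(best_op, n_evals)
```

In a real search, the score of a group would come from training and validating candidate networks rather than from a lookup table, so the greedy descent is not guaranteed to find the best operation.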
Transformer models have recently approached or even surpassed the performance of ConvNets on computer vision tasks such as classification and segmentation. To a large degree, these successes have been enabled by the use of large-scale labelled image datasets for supervised pre-training. This poses a significant challenge for the adaptation of vision Transformers to domains where datasets with millions of labelled samples are not available. In this work, we bridge the gap between ConvNets and Transformers for Earth observation by self-supervised pre-training on large-scale unlabelled remote sensing data. We show that self-supervised pre-training yields latent task-agnostic representations that can be used for both land cover classification and segmentation tasks, where they significantly outperform the fully supervised baselines. Additionally, we find that subsequent fine-tuning of Transformers for specific downstream tasks performs on par with commonly used ConvNet architectures. An ablation study further illustrates that the labelled dataset size can be reduced to one-tenth after self-supervised pre-training while still maintaining the performance of the fully supervised approach.
In many applications, such as burst photography and magnetic resonance imaging (MRI), multiple images are acquired to reduce the noise of the final reconstructed image. However, this leads to very high-dimensional datasets that contain redundant information across the acquired images. In MRI, multiple images are acquired via multiple RF coil arrays in the scanner. Afterwards, coil compression is performed to convert the original set of coil images into a smaller set of virtual coil images, which yields smaller datasets and faster computation. However, traditional iterative coil compression methods are lossy and time-consuming. In this work, we propose a novel neural network-based coil compression method in pursuit of higher reconstruction accuracy and faster coil compression. Our learned compression method achieves up to 1.5x lower NRMSE and up to 10x faster runtime than traditional methods on a benchmark test dataset.
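For context, the kind of linear coil compression that learned methods are compared against can be sketched with a singular value decomposition. This is a minimal illustration, not the method proposed in the abstract and not necessarily the baseline it used: the function names, the array layout (coils by samples), and the choice to normalize NRMSE by the norm of the reference are all assumptions.

```python
import numpy as np


def svd_coil_compress(coil_data: np.ndarray, n_virtual: int):
    """Project multi-coil data onto its top `n_virtual` left singular vectors.

    coil_data: complex array of shape (n_coils, n_samples).
    Returns the virtual-coil data (n_virtual, n_samples) and the
    compression matrix A of shape (n_virtual, n_coils).
    """
    u, _, _ = np.linalg.svd(coil_data, full_matrices=False)
    A = u[:, :n_virtual].conj().T
    return A @ coil_data, A


def nrmse(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Normalized root-mean-square error (assumed convention: divide by the reference norm)."""
    return float(np.linalg.norm(estimate - reference) / np.linalg.norm(reference))


# Synthetic example: 8 physical coils that observe only 4 independent signals.
rng = np.random.default_rng(0)
mixing = rng.standard_normal((8, 4)) + 1j * rng.standard_normal((8, 4))
sources = rng.standard_normal((4, 256)) + 1j * rng.standard_normal((4, 256))
coil_data = mixing @ sources

virtual, A = svd_coil_compress(coil_data, n_virtual=4)
restored = A.conj().T @ virtual  # map virtual coils back to the physical coil space
print(virtual.shape, nrmse(coil_data, restored))
```

Because the synthetic data has rank 4, four virtual coils reproduce it almost exactly; keeping fewer virtual coils discards signal and raises the NRMSE, which is the lossy behaviour the abstract refers to.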
Subspace clustering aims to find the underlying low-dimensional subspaces and cluster the data points correctly. In this paper, we propose a novel multi-view subspace clustering method. Most existing methods suffer from two critical issues. First, they usually adopt a two-stage framework and isolate the processes of affinity learning, multi-view information fusion and clustering. Second, they assume the data lies in a linear subspace, which may fail in practice because most real-world datasets have non-linear structures. To address these issues, we propose a novel Enriched Robust Multi-View Kernel Subspace Clustering framework in which the consensus affinity matrix is learned from both multi-view data and spectral clustering. Because the objective and its constraints are difficult to optimize, we propose an iterative optimization method that is easy to implement and yields a closed-form solution in each step. Extensive experiments have validated the superiority of our method over state-of-the-art clustering methods.
In this paper, we present our latest work on Action Unit Detection, which is a part of the Affective Behavior Analysis in-the-wild (ABAW) 2022 Competition [15]. Our proposed network is based on IResnet100 [6]. First, we utilize feature pyramid networks (FPN) [25] and single stage headless (SSH) [29] to enlarge the receptive field and extract more facial texture features. Then we employ ML-ROS data balancing [4] and the BCE Loss plus Multi-label Loss to address the multi-label imbalance problem. We also use three different models as the base model for fine-tuning on the Aff-Wild2 dataset. The pre-trained backbones are the AU detection model, the expression model and the face recognition model. Finally, we adopt an ensemble methodology to obtain the final result. Our F1 score reached 49.82 on the AU test set, ranking second in the challenge, narrowly behind the first-place team's 49.89.
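The binary cross-entropy term and an F1-style metric for multi-label prediction can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' code: the additional "Multi-label Loss" is not specified in the abstract and is omitted, the function names are hypothetical, and the metric is assumed to be per-label F1 averaged over labels.

```python
import numpy as np


def bce_with_logits(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean binary cross-entropy over all samples and labels.

    Uses the numerically stable form max(x, 0) - x * t + log(1 + exp(-|x|)).
    """
    loss = np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    return float(loss.mean())


def macro_f1(preds: np.ndarray, targets: np.ndarray) -> float:
    """Per-label F1 averaged over labels; inputs are 0/1 arrays of shape (n_samples, n_labels)."""
    tp = np.sum((preds == 1) & (targets == 1), axis=0)
    fp = np.sum((preds == 1) & (targets == 0), axis=0)
    fn = np.sum((preds == 0) & (targets == 1), axis=0)
    denom = 2 * tp + fp + fn
    # Labels with no positives and no predictions get an F1 of 0 in this sketch.
    f1 = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)
    return float(f1.mean())


# Toy batch: 2 samples, 3 labels (values are made up for illustration).
targets = np.array([[1, 0, 1], [0, 1, 1]])
logits = np.array([[2.0, -1.0, 0.5], [-0.5, 1.5, -2.0]])
preds = (logits > 0).astype(int)  # threshold sigmoid(logit) at 0.5
print(bce_with_logits(logits, targets), macro_f1(preds, targets))
```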
In class incremental learning, discriminative models are trained to classify images while adapting to new instances and classes incrementally. Training a model to adapt to new classes without full access to previous class data, however, leads to the known problem of catastrophic forgetting of the previously learnt classes. To alleviate this problem, we show how to build upon recent progress in contrastive learning methods. In particular, we develop an incremental learning approach for deep neural networks that operates at both the classification and representation levels, alleviates forgetting, and learns more general features for data classification. Experiments performed on several datasets demonstrate the superiority of the proposed method over well-known state-of-the-art methods.
Empirical robustness evaluation (RE) of deep learning models against adversarial perturbations involves solving non-trivial constrained optimization problems. Recent work has shown that these RE problems can be reliab...
A dynamic vision sensor (DVS) is an event sensor that asynchronously captures an event whenever there is a brightness change in the scene. However, due to the event sensor's high temporal resolution property, it is ...
In this paper, we present a super-resolution-based video coding scheme that compresses video data by combining traditional hybrid video coding and convolutional neural network-based video coding. During video encoding, downsampling reduces the resolution of the original video in both the horizontal and vertical directions to reduce the amount of video data, and convolutional neural network-based super-resolution is applied after decoding to restore the resolution of the reconstructed video. For the core encoding and decoding processes, the latest video coding standard (i.e., VVC/H.266) is used. The experimental results show that the proposed method can provide efficient coding performance while maintaining good visual quality.
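The overall data flow (downsample, encode and decode, then upsample) can be sketched on a single grayscale frame. Every component here is a placeholder chosen for illustration: 2x average pooling stands in for the downsampler, uniform quantization stands in for the VVC/H.266 codec, and nearest-neighbour repetition stands in for the CNN super-resolution network. None of these function names come from the paper.

```python
import numpy as np


def downsample2x(frame: np.ndarray) -> np.ndarray:
    """Halve the resolution in both directions by 2x2 average pooling."""
    h, w = frame.shape
    cropped = frame[: h - h % 2, : w - w % 2]
    return cropped.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def placeholder_codec(frame: np.ndarray, step: float = 8.0) -> np.ndarray:
    """Stand-in for encode + decode: uniform quantization (a real system would run a video codec)."""
    return np.round(frame / step) * step


def upsample2x(frame: np.ndarray) -> np.ndarray:
    """Stand-in for CNN super-resolution: nearest-neighbour 2x upsampling."""
    return np.repeat(np.repeat(frame, 2, axis=0), 2, axis=1)


def psnr(reference: np.ndarray, estimate: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite when the frames are identical."""
    mse = float(np.mean((reference - estimate) ** 2))
    return float("inf") if mse == 0 else 10 * np.log10(peak**2 / mse)


# Synthetic 64x64 frame with a horizontal intensity ramp.
frame = np.tile(np.linspace(0, 255, 64), (64, 1))
low_res = downsample2x(frame)             # encoder side: fewer pixels to code
decoded = placeholder_codec(low_res)      # core codec at reduced resolution
reconstructed = upsample2x(decoded)       # decoder side: restore the resolution
print(low_res.shape, reconstructed.shape, psnr(frame, reconstructed))
```

The design trade-off is that the codec processes a quarter of the pixels, while the quality of the final frame depends on how well the upsampling step recovers the detail lost to downsampling.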
We present a self-supervised approach to recolorization of images from design-oriented domains. Our approach can recolor images based on image exemplars or target color palettes provided by a user. In contrast with previous approaches, our method can reproduce color palettes with luminance distributions that differ significantly from those of the input. It is also the first palette-based approach to distinguish between recolorings that match reflectance and those that match illumination, making it particularly well-suited to visualizing different aesthetic decisions in design applications. The key to our approach is first to learn latent representations for texture and color in a setting where self-supervision is especially straightforward, and then to learn a mapping to our color representation from input color palettes and scene illumination, which offers a more intuitive space for controlling and exploring recolorization.