ISBN (print): 9781665448994
Continual learning (CL) has become one of the most active research areas within the artificial intelligence community in recent years. Given the significant amount of attention paid to continual learning, the need for a library that facilitates both research and development in this field is more visible than ever. However, code for CL algorithms is currently scattered across isolated repositories written with different frameworks, making it difficult for researchers and practitioners to work with various CL algorithms and benchmarks through the same interface. In this paper, we introduce CL-Gym, a full-featured continual learning library that overcomes this challenge and accelerates research and development. In addition to the necessary infrastructure for running end-to-end continual learning experiments, CL-Gym includes benchmarks for various CL scenarios and several state-of-the-art CL algorithms. We present the architecture, design philosophies, and technical details behind CL-Gym (1).
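As an illustration of the kind of workflow such a library standardizes, the following is a minimal task-incremental training loop in plain PyTorch. It is not CL-Gym's actual API; make_task_loaders is a hypothetical factory that yields per-task train/test loaders, and evaluating on all previously seen tasks after each task is what produces the usual forgetting curves.

import torch
import torch.nn.functional as F

def run_continual(model, make_task_loaders, num_tasks, epochs=1, lr=1e-3, device="cpu"):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    seen_test_loaders = []
    for task_id in range(num_tasks):
        train_loader, test_loader = make_task_loaders(task_id)  # hypothetical helper
        seen_test_loaders.append(test_loader)
        model.train()
        for _ in range(epochs):
            for x, y in train_loader:
                x, y = x.to(device), y.to(device)
                opt.zero_grad()
                F.cross_entropy(model(x), y).backward()
                opt.step()
        # After each task, evaluate on every task seen so far (forgetting curve).
        model.eval()
        accs = []
        with torch.no_grad():
            for loader in seen_test_loaders:
                correct = total = 0
                for x, y in loader:
                    pred = model(x.to(device)).argmax(dim=1).cpu()
                    correct += (pred == y).sum().item()
                    total += y.numel()
                accs.append(correct / max(total, 1))
        print(f"task {task_id}: accuracies on seen tasks = {accs}")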
ISBN (print): 9781665448994
Convolutional neural networks are able to learn realistic image priors from numerous training samples in low-level image generation and restoration [66]. We show that, for high-level image recognition tasks, we can further reconstruct "realistic" images of each category by leveraging intrinsic Batch Normalization (BN) statistics without any training data. Inspired by the popular VAE/GAN methods, we regard the zero-shot optimization of synthetic images as generative modeling that matches the distribution of BN statistics. The generated images then serve as a calibration set for the subsequent zero-shot network quantization. Our method meets the need to quantize models trained on sensitive information when, for example due to privacy concerns, no data is available. Extensive experiments on benchmark datasets show that, with the help of the generated data, our approach consistently outperforms existing data-free quantization methods.
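The core BN-statistics-matching step can be sketched as follows, assuming a PyTorch model with BatchNorm2d layers; this is a minimal sketch of the general recipe the abstract describes, not the authors' exact objective or hyper-parameters.

import torch
import torch.nn as nn

def synthesize(model, num_images=32, image_shape=(3, 224, 224), steps=500, lr=0.1):
    # Optimize random inputs so that the batch statistics they induce in each
    # BN layer match that layer's stored running statistics.
    model.eval()
    bn_losses = []

    def bn_hook(module, inputs, output):
        x = inputs[0]
        mean = x.mean(dim=[0, 2, 3])
        var = x.var(dim=[0, 2, 3], unbiased=False)
        bn_losses.append(((mean - module.running_mean) ** 2).sum()
                         + ((var - module.running_var) ** 2).sum())

    hooks = [m.register_forward_hook(bn_hook)
             for m in model.modules() if isinstance(m, nn.BatchNorm2d)]

    images = torch.randn(num_images, *image_shape, requires_grad=True)
    opt = torch.optim.Adam([images], lr=lr)
    for _ in range(steps):
        bn_losses.clear()
        opt.zero_grad()
        model(images)                      # hooks accumulate per-layer BN losses
        loss = torch.stack(bn_losses).sum()
        loss.backward()
        opt.step()

    for h in hooks:
        h.remove()
    return images.detach()                 # calibration set for quantization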
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Digital memes are widely used in people's daily lives on social media platforms. Composed of images and descriptive texts, memes are often distributed with a flair of sarcasm or humor, yet they can also spread harmful content or biases stemming from social and cultural factors. Aside from mainstream tasks such as meme generation and classification, generating explanations for memes has become more vital and poses the challenge of avoiding the propagation of already embedded biases. Our work studies whether recent vision-language models (VL models) can fairly explain meme content from different domains/topics, contributing a unified benchmark for meme explanation. With this dataset, we semi-automatically and manually evaluate the quality of VL-model-generated explanations, identifying the major categories of biases in meme explanations.
ISBN (print): 9781665448994
Event cameras are robust neuromorphic visual sensors that communicate transients in luminance as events. The current paradigm for image reconstruction from event data relies on direct optimization of artificial Convolutional Neural Networks (CNNs). Here we propose a two-phase neural network, comprising a CNN optimized for Laplacian prediction followed by a Spiking Neural Network (SNN) optimized for Poisson integration. By introducing Laplacian prediction into the pipeline, we provide image reconstruction with a network comprising only 200 parameters. We converted the CNN to an SNN, providing a fully neuromorphic implementation. We further optimized the network with the Mish activation and a novel convoluted CNN design, proposing a hybrid of spiking and artificial neural networks with < 100 parameters. Models were evaluated on both the N-MNIST and N-Caltech101 datasets.
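The Poisson-integration phase amounts to recovering an image (up to an additive constant) from its predicted Laplacian. Below is a minimal NumPy sketch of this classical step, assuming periodic boundary conditions and an FFT-based solver rather than the paper's spiking implementation.

import numpy as np

def integrate_laplacian(lap):
    # Solve the discrete Poisson equation in the Fourier domain.
    h, w = lap.shape
    ky = np.fft.fftfreq(h).reshape(-1, 1)
    kx = np.fft.fftfreq(w).reshape(1, -1)
    denom = 2.0 * np.cos(2 * np.pi * kx) + 2.0 * np.cos(2 * np.pi * ky) - 4.0
    denom[0, 0] = 1.0                       # avoid division by zero at DC
    img_hat = np.fft.fft2(lap) / denom
    img_hat[0, 0] = 0.0                     # the mean (DC term) is unrecoverable
    return np.real(np.fft.ifft2(img_hat))

# Round-trip check: reconstruct a random image from its discrete Laplacian.
img = np.random.rand(64, 64)
lap = (np.roll(img, 1, 0) + np.roll(img, -1, 0)
       + np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4 * img)
rec = integrate_laplacian(lap)
print(np.allclose(rec - rec.mean(), img - img.mean(), atol=1e-8))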
ISBN (print): 9781665448994
We address the problem of unsupervised classification of players in a team sport according to their team affiliation, when jersey colours and design are not known a priori. We adopt a contrastive learning approach in which an embedding network learns to maximize the distance between representations of players on different teams relative to players on the same team, in a purely unsupervised fashion, without any labelled data. We evaluate the approach using a new hockey dataset and find that it outperforms prior unsupervised approaches by a substantial margin, particularly for real-time application when only a small number of frames are available for unsupervised learning before team assignments must be made. Remarkably, we show that our contrastive method achieves 94% accuracy after unsupervised training on only a single frame, with accuracy rising to 97% within 500 frames (17 seconds of game time). We further demonstrate how accurate team classification allows accurate team-conditional heat maps of player positioning to be computed.
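A minimal sketch of the kind of contrastive objective described, assuming pairs of player embeddings and a binary same-team indicator; how positive and negative pairs are mined without labels is the paper's contribution and is not reproduced here.

import torch
import torch.nn.functional as F

def contrastive_loss(emb_a, emb_b, same_team, margin=1.0):
    # emb_a, emb_b: (N, D) embeddings of two player crops; same_team: (N,) in {0, 1}.
    dist = F.pairwise_distance(emb_a, emb_b)
    pos = same_team * dist.pow(2)                      # pull same-team pairs together
    neg = (1 - same_team) * F.relu(margin - dist).pow(2)  # push different teams apart
    return (pos + neg).mean()

# Toy usage with random embeddings.
emb_a = F.normalize(torch.randn(8, 32), dim=1)
emb_b = F.normalize(torch.randn(8, 32), dim=1)
same_team = torch.randint(0, 2, (8,)).float()
print(contrastive_loss(emb_a, emb_b, same_team).item())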
ISBN (print): 9781665448994
In this paper, we propose an efficient image compression framework that is optimized for subjective quality. Our framework is mainly based on the NLAIC (Non-Local Attention optimized Image Coding) model, which applies a Variational Autoencoder (VAE) and a non-local attention module to end-to-end image compression. This work makes two major contributions to the NLAIC framework. First, our models are optimized for loss functions aligned with subjective quality rather than the conventional MSE (Mean Squared Error) or MS-SSIM (Multiscale Structural Similarity) that were widely used in previous works. Second, we introduce a block-based inference mechanism to reduce the running memory consumption of the image compression network, and suggest a partial post-processing step that alleviates the block artifacts caused by block-based inference in a computationally lightweight fashion. Experiments show that images reconstructed by our method preserve more texture details than models trained for optimal MSE or MS-SSIM, while also supporting high-throughput decoding.
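A minimal sketch of block-based inference, assuming model is any image-to-image reconstruction network and that the image dimensions are divisible by the block size; the paper's partial post-processing of block seams is not shown.

import torch

def blockwise_inference(model, image, block=256):
    # image: (1, C, H, W); each block is processed independently to bound peak memory.
    _, c, h, w = image.shape
    out = torch.zeros_like(image)
    with torch.no_grad():
        for top in range(0, h, block):
            for left in range(0, w, block):
                patch = image[:, :, top:top + block, left:left + block]
                out[:, :, top:top + block, left:left + block] = model(patch)
    return out

# Toy usage with an identity "model".
img = torch.rand(1, 3, 512, 512)
print(blockwise_inference(torch.nn.Identity(), img).shape)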
ISBN (print): 9781665448994
Autonomous driving systems need to handle complex scenarios such as lane following, avoiding collisions, taking turns, and responding to traffic signals. In recent years, approaches based on end-to-end behavioral cloning have demonstrated remarkable performance in point-to-point navigational scenarios, using a realistic simulator and standard benchmarks. Offline imitation learning is readily applicable, as it does not require expensive hand annotation or interaction with the target environment, but it is difficult to obtain a reliable system this way. In addition, existing methods have not specifically addressed learning to react to traffic lights, which are a rare occurrence in the training datasets. Inspired by previous work on multi-task learning and attention modeling, we propose a novel multi-task attention-aware network within the conditional imitation learning (CIL) framework. This not only improves the success rate on standard benchmarks, but also the ability to react to traffic lights, which we demonstrate on the same benchmarks.
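For context, the conditional imitation learning framework the paper builds on routes a shared feature vector through command-specific control branches. The sketch below shows only that standard CIL head, not the authors' multi-task attention network; all dimensions are illustrative.

import torch
import torch.nn as nn

class CILHead(nn.Module):
    def __init__(self, feat_dim=512, num_commands=4, num_actions=3):
        super().__init__()
        # One small control branch per high-level navigation command.
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, num_actions))
            for _ in range(num_commands)
        ])

    def forward(self, features, command):
        # features: (B, feat_dim); command: (B,) integer index selecting the branch.
        all_out = torch.stack([b(features) for b in self.branches], dim=1)  # (B, K, A)
        idx = command.view(-1, 1, 1).expand(-1, 1, all_out.size(-1))
        return all_out.gather(1, idx).squeeze(1)                            # (B, A)

head = CILHead()
actions = head(torch.randn(2, 512), torch.tensor([0, 3]))
print(actions.shape)  # torch.Size([2, 3])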
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
We propose a pipeline that leverages Stable Diffusion to improve inpainting results in the context of defurnishing—the removal of furniture items from indoor panorama images. Specifically, we illustrate how increased context, domain-specific model fine-tuning, and improved image blending can produce high-fidelity inpaints that are geometrically plausible without needing to rely on room layout estimation. We demonstrate qualitative and quantitative improvements over other furniture removal techniques.
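A basic Stable Diffusion inpainting call of the kind such a pipeline builds on can be written with the off-the-shelf diffusers library as below; the checkpoint name, file names, and prompt are assumptions, and the paper's extended context, domain-specific fine-tuning, and blending steps are not shown.

import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

room = Image.open("room_crop.png").convert("RGB")               # crop of the panorama
furniture_mask = Image.open("furniture_mask.png").convert("L")  # white = region to remove

result = pipe(
    prompt="an empty room, bare floor and walls",
    image=room,
    mask_image=furniture_mask,
).images[0]
result.save("defurnished_crop.png")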
ISBN (print): 9781665448994
Existing deep-learning-based Versatile Video Coding (VVC) in-loop filtering (ILF) enhancement works mainly focus on learning a one-to-one mapping between the reconstructed and the original video frame, ignoring the resources potentially available at the encoder and decoder. This work proposes a deep-learning-based Spatial-Temporal In-Loop Filtering (STILF) method that takes advantage of coding information to improve VVC in-loop filtering. Each CTU is filtered by the VVC default in-loop filtering, a self-enhancement convolutional neural network (CNN) with the CU map (SEC), or a reference-based enhancement CNN with optical flow (REO). Bits indicating the ILF mode are encoded under the CABAC regular mode. Experimental results show that BD-rate reductions of 3.78%, 6.34%, 6%, and 4.64% are obtained under the All Intra, Low Delay P, Low Delay B, and Random Access configurations, respectively.
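A simplified sketch of the per-CTU mode decision, assuming the encoder has access to the original frame and three candidate filtered frames (VVC default, SEC, REO); actual rate-distortion optimization and CABAC signalling are not modelled.

import torch

def pick_ctu_modes(original, candidates, ctu=128):
    # original: (C, H, W); candidates: list of 3 filtered frames of the same shape.
    _, h, w = original.shape
    modes, output = [], original.clone()
    for top in range(0, h, ctu):
        for left in range(0, w, ctu):
            ref = original[:, top:top + ctu, left:left + ctu]
            errs = [((c[:, top:top + ctu, left:left + ctu] - ref) ** 2).mean()
                    for c in candidates]
            best = int(torch.stack(errs).argmin())
            modes.append(best)   # 0 = VVC default, 1 = SEC, 2 = REO (to be CABAC-coded)
            output[:, top:top + ctu, left:left + ctu] = \
                candidates[best][:, top:top + ctu, left:left + ctu]
    return output, modes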
ISBN (print): 9781665448994
Esports is a fast-growing new field with a largely online presence, and it is creating demand for automatic domain-specific captioning tools. However, at present there are few approaches that tackle the esports video description problem. In this work, we propose a large-scale dataset for esports video description, focusing on the popular game "League of Legends". The dataset, which we call LoL-V2T, is the largest video description dataset in the video game domain, and includes 9,723 clips with 62,677 captions. This new dataset presents multiple new video captioning challenges, such as large amounts of domain-specific vocabulary, subtle motions with large importance, and a temporal gap between most captions and the events that occurred. To tackle the vocabulary issue, we propose masking the domain-specific words and provide additional annotations for this masking. In our results, we show that the dataset poses a challenge to existing video captioning approaches, and that the masking can significantly improve performance. Our dataset and code are publicly available (1).
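A minimal sketch of the vocabulary-masking idea, using a hypothetical term list and mask token rather than the dataset's actual annotations: domain-specific terms in a caption are replaced by a placeholder so a generic captioning model does not have to memorize them.

import re

DOMAIN_TERMS = {"baron", "ezreal", "thresh", "nexus"}   # hypothetical term list
MASK_TOKEN = "[TERM]"

def mask_caption(caption):
    # Replace any word whose alphanumeric core matches a domain term.
    return " ".join(MASK_TOKEN if re.sub(r"\W", "", w).lower() in DOMAIN_TERMS else w
                    for w in caption.split())

print(mask_caption("Ezreal and Thresh take down the Baron."))
# -> "[TERM] and [TERM] take down the [TERM]"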