Handwritten digit recognition is a significant challenge in the field of machine learning, particularly for patternrecognition and computervision applications. It has found applications in various areas, such as ide...
详细信息
The existing deep learning-based Versatile Video Coding (VVC) in-loop filtering (ILF) enhancement works mainly focus on learning the one-to-one mapping between the reconstructed and the original video frame, ignoring ...
详细信息
ISBN:
(纸本)9781665448994
The existing deep learning-based Versatile Video Coding (VVC) in-loop filtering (ILF) enhancement works mainly focus on learning the one-to-one mapping between the reconstructed and the original video frame, ignoring the potential resources at encoder and decoder. This work proposes a deep learning-based Spatial-Temporal In-Loop filtering (STILF) that takes advantage of the coding information to improve VVC in-loop filtering. Each CTU is filtered by VVC default in-loop filtering, self-enhancement Convolutional neural network (CNN) with CU map (SEC), and the reference-based enhancement CNN with the optical flow (REO). Bits indicating ILF mode are encoded under CABAC regular mode. Experimental results show that 3.78%, 6.34%, 6%, and 4.64% BD-rate reductions are obtained under All Intra, Low Delay P, Low Delay B, and Random Access configurations, respectively.
Majority of the perception methods in robotics require depth information provided by RGB-D cameras. However, standard 3D sensors fail to capture depth of transparent objects due to refraction and absorption of light. ...
详细信息
ISBN:
(纸本)9781665445092
Majority of the perception methods in robotics require depth information provided by RGB-D cameras. However, standard 3D sensors fail to capture depth of transparent objects due to refraction and absorption of light. In this paper, we introduce a new approach for depth completion of transparent objects from a single RGB-D image. Key to our approach is a local implicit neural representation built on ray-voxel pairs that allows our method to generalize to unseen objects and achieve fast inference speed. Based on this representation, we present a novel framework that can complete missing depth given noisy RGBD input. We further improve the depth estimation iteratively using a self-correcting refinement model. To train the whole pipeline, we build a large scale synthetic dataset with transparent objects. Experiments demonstrate that our method performs significantly better than the current stateof-the-art methods on both synthetic and real world data. In addition, our approach improves the inference speed by a factor of 20 compared to the previous best method, ClearGrasp [43].
Car recognition systems have been gaining popularity in recent years due to their ability to identify the make, model, and year of cars from images. In this paper, we present the building of accurate car recognition s...
详细信息
Traditional evaluation metrics for learned models that report aggregate scores over a test set are insufficient for surfacing important and informative patterns of failure over features and instances. We introduce and...
详细信息
ISBN:
(纸本)9781665445092
Traditional evaluation metrics for learned models that report aggregate scores over a test set are insufficient for surfacing important and informative patterns of failure over features and instances. We introduce and study a method aimed at characterizing and explaining failures by identifying visual attributes whose presence or absence results in poor performance. In distinction to previous work that relies upon crowdsourced labels for visual attributes, we leverage the representation of a separate robust model to extract interpretable features and then harness these features to identify failure modes. We further propose a visualization method aimed at enabling humans to understand the meaning encoded in such features and we test the comprehensibility of the features. An evaluation of the methods on the ImageNet dataset demonstrates that: (i) the proposed workflow is effective for discovering important failure modes, (ii) the visualization techniques help humans to understand the extracted features, and (iii) the extracted insights can assist engineers with error analysis and debugging.
To calculate the model accuracy on a computervision task, e.g., object recognition, we usually require a test set composing of test samples and their ground truth labels. Whilst standard usage cases satisfy this requ...
详细信息
ISBN:
(纸本)9781665445092
To calculate the model accuracy on a computervision task, e.g., object recognition, we usually require a test set composing of test samples and their ground truth labels. Whilst standard usage cases satisfy this requirement, many real-world scenarios involve unlabeled test data, rendering common model evaluation methods infeasible. We investigate this important and under-explored problem, Automatic model Evaluation (AutoEval). Specifically, given a labeled training set and a classifier, we aim to estimate the classification accuracy on unlabeled test datasets. We construct a meta-dataset: a dataset comprised of datasets generated from the original images via various transformations such as rotation, background substitution, foreground scaling, etc. As the classification accuracy of the model on each sample (dataset) is known from the original dataset labels, our task can be solved via regression. Using the feature statistics to represent the distribution of a sample dataset, we can train regression models (e.g., a regression neural network) to predict model performance. Using synthetic meta-dataset and real-world datasets in training and testing, respectively, we report a reasonable and promising prediction of the model accuracy. We also provide insights into the application scope, limitation, and potential future direction of AutoEval.
Deepfakes raised serious concerns on the authenticity of visual contents. Prior works revealed the possibility to disrupt deepfakes by adding adversarial perturbations to the source data, but we argue that the threat ...
详细信息
ISBN:
(纸本)9781665445092
Deepfakes raised serious concerns on the authenticity of visual contents. Prior works revealed the possibility to disrupt deepfakes by adding adversarial perturbations to the source data, but we argue that the threat has not been eliminated yet. This paper presents MagDR, a mask-guided detection and reconstruction pipeline for defending deepfakes from adversarial attacks. MagDR starts with a detection module that defines a few criteria to judge the abnormality of the output of deepfakes, and then uses it to guide a learnable reconstruction procedure. Adaptive masks are extracted to capture the change in local facial regions. In experiments, MagDR defends three main tasks of deepfakes, and the learned reconstruction pipeline transfers across input data, showing promising performance in defending both black-box and white-box attacks.
Efficient 3D space sampling to represent an underlying 3D object/scene is essential for 3D vision, robotics, and beyond. A standard approach is to explicitly sample a dense collection of views and formulate it as a vi...
详细信息
ISBN:
(纸本)9781665445092
Efficient 3D space sampling to represent an underlying 3D object/scene is essential for 3D vision, robotics, and beyond. A standard approach is to explicitly sample a dense collection of views and formulate it as a view selection problem, or, more generally, a set cover problem. In this paper, we introduce a novel approach that avoids dense view sampling. The key idea is to learn a view prediction network and a trainable aggregation module that takes the predicted views as input and outputs an approximation of their generic scores (e.g., surface coverage, viewing angle from surface normals). This methodology allows us to turn the set cover problem (or multi-view representation optimization) into a continuous optimization problem. We then explain how to effectively solve the induced optimization problem using continuation, i.e., aggregating a hierarchy of smoothed scoring modules. Experimental results show that our approach arrives at similar or better solutions with about 10 x speed up in running time, comparing with the standard methods.
The recent deep learning techniques of the last decade are opening new applications and innovations for numerous diverse fields. In this research, we study the problem of wildlife species recognition based on camera t...
详细信息
Ear recognition is an example of a biometric system that uses human biological traits for recognition. This kind of recognition has been recently examined due to its distinctive properties such as its invariant shape....
详细信息
暂无评论