ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Image inpainting (a.k.a. image completion) allows us to remove unexpected foreground objects from an observed image and to restore the removed region with background pixels. The performance of image inpainting is improved by auxiliary cues such as edge boundaries and segmentation regions. As a new auxiliary cue, this paper focuses on a depth image that is estimated from an input RGB image by monocular depth estimation. In the depth image, boundaries between different objects with similar pixel values (e.g., objects located at different distances) might be available, while those boundaries are difficult to detect by edge detection and segmentation. Our proposed method employs those boundaries in the edge and depth images as auxiliary cues. Experiments demonstrate that our proposed method augmented by the depth image outperforms its baseline quantitatively (i.e., 1.17 dB and 0.74 dB PSNR gains on the Paris-StreetView and Places datasets, respectively) and qualitatively.
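For readers unfamiliar with the metric, PSNR can be sketched as follows. This is a minimal illustration of the standard definition, not the authors' evaluation code; the function names, the flat pixel lists, and the 8-bit peak value of 255 are assumptions made for the example. The second helper shows how a PSNR gain on a single image translates into a mean-squared-error ratio.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    Assumes flat sequences of pixel intensities and an 8-bit peak value by default.
    """
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """MSE ratio (new / old) implied by a PSNR gain of gain_db on one image."""
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    # Hypothetical 4-pixel example: every pixel is off by one intensity level.
    print(round(psnr([0, 0, 0, 0], [1, 1, 1, 1]), 2))
    # MSE ratio corresponding to a 1.17 dB gain on a single image.
    print(round(mse_ratio_for_gain(1.17), 3))
```

Under this definition, a 1.17 dB gain on a single image corresponds to roughly a 24% lower mean squared error. Gains averaged over a dataset do not map to an MSE ratio this directly.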
ISBN (print): 9780769549903
"Big Data" analysis is an emerging topic in computer vision and pattern recognition. As one example problem of big data, we study semantic age labels and facial aging pattern analysis on a large database. In aging analysis, one of the great challenges is the lack of a large number of face images with ground truth age labels. Unlike many other example-based recognition problems where human annotations can be used as the ground truth labels for both training and testing, it is quite difficult for human annotators to label the exact ages in face images. An alternative is to exploit face images with unlabeled ages to enhance the age estimation performance. However, it is unclear whether face images with unlabeled ages can be used for age estimation, and how to use the unlabeled data. In this paper, we study the two problems comprehensively under two paradigms: semi-supervised learning and unsupervised learning for aging pattern analysis. We emphasize the importance of using ground truth age labels and a large database in order to derive a meaningful measure in the context of big data. Our study can make an impact on the collection of aging patterns, which is very expensive and time-consuming in practice.
ISBN (print): 9781665448994
Brand logos are often rendered in a different style depending on the context, such as an event promotion. For example, Warner Bros. uses different variants of its brand logo for different movies, for promotion and aesthetic appeal. In this paper, we propose an automated method to render brand logos in the coloring style of branding material such as movie posters. For this, we adopt a photo-realistic neural style transfer method using movie posters as the style source. We propose a color-based image segmentation and matching method to assign style segments to logo segments. Using these, we render the well-known Warner Bros. logo in the coloring style of 141 movie posters. We also present survey results where 287 participants rate the machine-stylized logos for their representativeness and visual appeal.
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Reduced-precision hardware-based matrix multiplication accelerators are commonly employed to reduce the power consumption of neural network inference. Multiplier designs used in such accelerators possess an interesting property: when the same bit is 0 for two consecutive compute cycles, the multiplier consumes less power. In this paper, we show that this effect can be used to reduce the power consumption of neural networks by simulating low bit-width quantization on higher bit-width hardware. We show that simulating 4-bit quantization on 8-bit hardware can yield up to a 17% relative reduction in power consumption on commonly used networks. Furthermore, we show that in this context, bit operations (BOPs) are a good proxy for power efficiency, and that learning mixed-precision configurations that target lower BOPs can achieve better trade-offs between accuracy and power efficiency.
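The two ideas in this abstract can be illustrated with a small sketch. It is not the authors' method: the abstract does not define BOPs or say how low bit-widths are simulated. The sketch assumes the commonly used simplification BOPs = MACs × weight bits × activation bits, and it assumes that "simulating" 4 bits on 8-bit hardware can be pictured as keeping the high 4 bits of an unsigned 8-bit value and zeroing the rest, so that the low bits stay 0 across cycles. All function names and the MAC count are hypothetical.

```python
def bops(macs: int, weight_bits: int, activation_bits: int) -> int:
    """Bit operations under the assumed simplification BOPs = MACs * b_w * b_a."""
    return macs * weight_bits * activation_bits


def truncate_to_high_bits(value: int, keep_bits: int = 4, total_bits: int = 8) -> int:
    """Zero the low (total_bits - keep_bits) bits of an unsigned integer.

    Illustrative only: truncation is used here for simplicity, whereas a real
    quantizer would typically round and handle signed values.
    """
    if not 0 < keep_bits <= total_bits:
        raise ValueError("keep_bits must be in 1..total_bits")
    if not 0 <= value < (1 << total_bits):
        raise ValueError("value does not fit in total_bits")
    low = total_bits - keep_bits
    return (value >> low) << low


if __name__ == "__main__":
    macs = 1_000_000  # hypothetical multiply-accumulate count for one layer
    # Ratio of BOPs at 4x4 bits versus 8x8 bits.
    print(bops(macs, 4, 4) / bops(macs, 8, 8))  # 0.25
    # The low four bits of the 8-bit word are forced to zero.
    print(format(truncate_to_high_bits(0b10111011), "08b"))  # 10110000
```

The BOPs ratio (0.25 here) is not a power ratio. The abstract reports up to a 17% relative power reduction and describes BOPs only as a proxy.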
ISBN (print): 9781665448994
In this work, we address the missing-modality issues that arise in the Visual Question Answer-Difference prediction task and propose a novel method to solve the task. The missing modality is the ground truth answers, which are not present at test time, and we use a privileged knowledge distillation scheme to deal with it. To do so efficiently, we first introduce a model, the "Big" Teacher, that takes the image/question/answer triplet as its input and outperforms the baseline, then use a combination of models to distill knowledge to a target network (student) that only takes the image/question pair as its input. We evaluate our models on the VizWiz and VQA-V2 Answer Difference datasets and show through extensive experimentation and ablation the performance of our method and diverse possibilities for future research.
ISBN (print): 9781665487399
The accuracy of finger vein recognition systems degrades due to low and uneven contrast between veins and their surroundings, often resulting in poor detection of vein patterns. We propose a finger-vein enhancement technique, ResFPN (Residual Feature Pyramid Network), as a generic preprocessing method agnostic to the recognition pipeline. A bottom-up pyramidal architecture using the novel Structure Detection block (SDBlock) facilitates extraction of veins of varied widths. Using a feature aggregation module (FAM), we combine these vein structures and train the proposed ResFPN for detection of veins across scales. With enhanced presentations, our experiments indicate a reduction of up to 5% in the average recognition errors for commonly used recognition pipelines over two publicly available datasets. These improvements persist even in a cross-dataset scenario where the dataset used to train the ResFPN is different from the one used for recognition.
ISBN (digital): 9781728125060
ISBN (print): 9781728125060
With the emergence of fashion recommendation, many researchers have attempted to recommend fashion items that fit consumers' tastes. However, few have looked into fashion outfits as a whole when making recommendations. In this paper, we propose a neural network that learns one's fashion taste and predicts whether an individual likes a fashion outfit. To improve learning, we also develop a fashion outfit negative sampling scheme to sample fashion outfits that are different enough. With experiments on the collected Polyvore dataset, we find that using complete images of fashion outfits performs well when learning individuals' tastes toward fashion outfits. Our proposed negative sampling scheme also improves the model's performance significantly, compared to random negative sampling.
ISBN (print): 9781665448994
In this work, we propose MVFuseNet, a novel end-to-end method for joint object detection and motion forecasting from a temporal sequence of LiDAR data. Most existing methods operate in a single view by projecting data in either range view (RV) or bird's eye view (BEV). In contrast, we propose a method that effectively utilizes both RV and BEV for spatio-temporal feature learning as part of a temporal fusion network as well as for multi-scale feature learning in the backbone network. Further, we propose a novel sequential fusion approach that effectively utilizes multiple views in the temporal fusion network. We show the benefits of our multi-view approach for the tasks of detection and motion forecasting on two large-scale self-driving data sets, achieving state-of-the-art results. Furthermore, we show that MVFuseNet scales well to large operating ranges while maintaining real-time performance.
ISBN (print): 9781665487399
Due to the success of generative flows in modeling data distributions, they have been explored for inverse problems. Given a pre-trained generative flow, previous work proposed to minimize the 2-norm of the latent variables as a regularization term. The intuition behind it was to ensure high-likelihood latent variables that produce the closest restoration. However, high-likelihood latent variables may generate unrealistic samples, as we show in our experiments. We therefore propose a solver to directly produce high-likelihood reconstructions. We hypothesize that our approach could make generative flows a general-purpose solver for inverse problems. Furthermore, we propose 1x1 coupling functions to introduce permutations in a generative flow. They have the advantage that their inverse does not need to be computed in the generation process. Finally, we evaluate our method on denoising, deblurring, inpainting, and colorization. We observe a compelling improvement of our method over prior works.
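The latent-norm regularization attributed to previous work can be written out as a short sketch. The notation is assumed for illustration and does not come from the abstract: $y$ is the degraded observation, $A$ the known degradation operator, $G$ the pre-trained flow mapping a latent $z \in \mathbb{R}^d$ to an image, and $\lambda > 0$ a regularization weight. The second line assumes a standard Gaussian latent prior, under which a small 2-norm is equivalent to a high latent likelihood.

```latex
\begin{align}
  \hat{z} &= \arg\min_{z} \; \lVert y - A\,G(z) \rVert_2^2 + \lambda \lVert z \rVert_2^2,
  \qquad \hat{x} = G(\hat{z}), \\
  \log p(z) &= -\tfrac{1}{2} \lVert z \rVert_2^2 - \tfrac{d}{2} \log(2\pi)
  \quad \text{for } z \sim \mathcal{N}(0, I_d).
\end{align}
```

The second line concerns the likelihood of the latent $z$, not of the image $G(z)$, which additionally depends on the Jacobian of the flow. This is consistent with the abstract's remark that high-likelihood latents can still yield unrealistic samples.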