ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Cross-view image generation has recently been proposed to generate images of one view from another, dramatically different view. In this paper, we investigate third-person (exocentric) view to first-person (egocentric) view image generation. This is a challenging task since the egocentric view is sometimes remarkably different from the exocentric view. Thus, transforming the appearances across the two views is a non-trivial task. To this end, we propose a novel Parallel Generative Adversarial Network (P-GAN) with a novel cross-cycle loss to learn the shared information for generating egocentric images from the exocentric view. We also incorporate a novel contextual feature loss in the learning procedure to capture the contextual information in images. Extensive experiments on the Exo-Ego datasets [5] show that our model outperforms the state-of-the-art approaches.
ISBN (print): 9781665448994
We propose a flexible person generation framework called Dressing in Order (DiOr), which supports 2D pose transfer, virtual try-on, and several fashion editing tasks. The key to DiOr is a novel recurrent generation pipeline to sequentially put garments on a person, so that trying on the same garments in different orders will result in different looks. Our system can produce dressing effects not achievable by existing work, including different interactions of garments (e.g., wearing a top tucked into the bottom or over it), as well as layering of multiple garments of the same type (e.g., jacket over shirt over t-shirt). DiOr explicitly encodes the shape and texture of each garment, enabling these elements to be edited separately. Extensive evaluations show that DiOr outperforms other recent methods like ADGAN [18] in terms of output quality, and handles a wide range of editing functions for which there is no direct supervision.
ISBN (print): 9781728193601
We define a new representation for immersed surfaces in R^3 by combining the SRNF and the induced surface metric. Using the L^2 metric on the space of SRNFs and the DeWitt metric on the space of surface metrics, we obtain a 3-parameter family of metrics that corresponds to the family of "elastic metrics" proposed by Jermyn et al. in [19] on the space of immersed surfaces. Similar to the original SRNF representation, this new representation results in an extrinsic distance function on the space of immersed surfaces that is easy to compute, as it is given by an explicit formula. In addition to avoiding the degeneracy of the SRNF, it allows for a data-driven choice of the parameters of the metric, while still providing for fast and accurate registration of surfaces.
ISBN (print): 9780769549903
"Big Data" analysis is an emerging topic in computer vision and pattern recognition. As one example problem of big data, we study semantic age labels and facial aging pattern analysis on a large database. In aging analysis, one of the great challenges is the lack of a large number of face images with ground truth age labels. Unlike many other example-based recognition problems, where human annotations can be used as the ground truth labels for both training and testing, it is quite difficult for human annotators to label the exact ages in face images. An alternative is to exploit face images with unlabeled ages to enhance the age estimation performance. However, it is unclear whether face images with unlabeled ages can be used for age estimation, and how to use the unlabeled data. In this paper, we study these two problems comprehensively under two paradigms: semi-supervised learning and unsupervised learning for aging pattern analysis. We emphasize the importance of using ground truth age labels and a large database in order to derive a meaningful measure in the context of big data. Our study can make an impact on the collection of aging patterns, which is very expensive and time-consuming in practice.
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Image inpainting (a.k.a. image completion) allows us to remove unexpected foreground objects from an observed image and to restore the removed region with background pixels. The performance of image inpainting is improved by auxiliary cues such as edge boundaries and segmentation regions. As a new auxiliary cue, this paper focuses on a depth image that is estimated from an input RGB image by monocular depth estimation. In the depth image, boundaries between different objects with similar pixel values (e.g., objects located at different distances) might be available, while those boundaries are difficult to detect by edge detection and segmentation. Our proposed method employs those boundaries in the edge and depth images as auxiliary cues. Experiments demonstrate that our proposed method augmented by the depth image outperforms its baseline quantitatively (i.e., 1.17dB and 0.74dB PSNR gains on the Paris-StreetView and Places datasets, respectively) and qualitatively.
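For readers unfamiliar with the PSNR metric quoted above, the following is a minimal sketch of the standard definition, PSNR = 10 log10(MAX^2 / MSE). It is not the paper's evaluation code: the function name `psnr`, the flat-list image representation, and the 8-bit peak value of 255 are assumptions made for illustration.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are given as flat sequences of pixel intensities (an assumption
    of this sketch); max_value is the peak intensity, 255 for 8-bit data.
    """
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: every pixel is off by 10 intensity levels, so MSE = 100.
reference = [100, 100, 100, 100]
restored = [110, 90, 110, 90]
print(round(psnr(reference, restored), 2))
```

Because PSNR is logarithmic in the mean squared error, a gain of 1.17dB at a fixed peak value corresponds to the MSE shrinking by a factor of about 10^0.117 ≈ 1.31.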
ISBN (print): 9781665448994
Nowadays, video conference solutions are widely adopted by companies, education, and government. People segmentation is crucial for supporting virtual backgrounds, an essential video conference function to protect users' privacy. This paper demonstrates a people segmentation framework called CE-PeopleSeg, which employs an efficient segmentation method, structural pruning, and dynamic frame skipping techniques, leading to a fast inference speed on CPU. Our extensive experiments show that the proposed CE-PeopleSeg can achieve a high prediction mIoU of 87.9% on the Supervised People Dataset while reaching a real-time inference speed of 32.40 fps on CPU with very low usage of 10%. Our code will be released at https://***/geekJZY/***.
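The mIoU score quoted above is conventionally the intersection-over-union averaged over classes. The abstract does not state the exact evaluation protocol, so the sketch below only illustrates the common definition; the function name `mean_iou`, the flat label lists, and the rule of skipping classes absent from both masks are assumptions.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes for flat label sequences.

    pred and target hold one integer class label per pixel. Classes that
    appear in neither mask are skipped (an assumption of this sketch).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy two-class example: 0 = background, 1 = person.
pred = [1, 1, 0, 0]
target = [1, 0, 0, 0]
print(mean_iou(pred, target, num_classes=2))
```

In this toy case the background IoU is 2/3 and the person IoU is 1/2, so the mean is 7/12.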
ISBN (print): 9781665448994
Brand logos are often rendered in a different style based on a context such as an event promotion. For example, Warner Bros. uses a different variety of their brand logo for different movies for promotion and aesthetic appeal. In this paper, we propose an automated method to render brand logos in the coloring style of branding material such as movie posters. For this, we adopt a photo-realistic neural style transfer method using movie posters as the style source. We propose a color-based image segmentation and matching method to assign style segments to logo segments. Using these, we render the well-known Warner Bros. logo in the coloring style of 141 movie posters. We also present survey results where 287 participants rate the machine-stylized logos for their representativeness and visual appeal.
ISBN (print): 9781728125060
In this paper we present the Women in Computer Vision Workshop - WiCV 2019, organized in conjunction with CVPR 2019. This event is meant to increase the visibility and inclusion of women researchers in the computer vision field. Computer vision and machine learning have made incredible progress over the past years, but the number of female researchers is still low both in academia and in industry. WiCV is organized especially for this reason: to raise the visibility of female researchers, to increase collaborations between them, and to provide mentorship to female junior researchers in the field. In this paper, we present a report of trends over the past years, along with a summary of statistics regarding presenters, attendees, and sponsorship for the current workshop.
ISBN (print): 9781728193601
Many real-world machine learning systems require the ability to continually learn new knowledge. Class incremental learning has recently received increasing attention as a solution towards this goal. However, existing methods often introduce assumptions to simplify the problem setting, which rarely hold in real-world scenarios. In this paper, we formulate a Generalized Class Incremental Learning (GCIL) framework to systematically alleviate these restrictions, and introduce several novel realistic incremental learning scenarios. In addition, we propose a simple yet effective method, namely ReMix, which combines Exemplar Replay (ER) and Mixup to deal with different challenges in realistic GCIL setups. We demonstrate on CIFAR-100 that ReMix outperforms the state-of-the-art methods in different GCIL setups by significant margins without introducing additional computation cost.
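The abstract does not describe how ReMix combines its two ingredients, so the following is only a hedged sketch of the general idea of pairing exemplar replay with Mixup, not the authors' algorithm. The names `mixup_pair` and `replay_mixup_batch`, the list-based features and one-hot labels, the random partner pairing, and the Beta(alpha, alpha) mixing coefficient (the usual Mixup choice) are all assumptions.

```python
import random


def mixup_pair(x_a, y_a, x_b, y_b, lam):
    """Convex combination of two samples and their one-hot labels (Mixup)."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def replay_mixup_batch(new_batch, exemplar_memory, replay_size, alpha=1.0, rng=None):
    """Build a training batch from new samples plus replayed exemplars,
    then mix each sample with a randomly chosen partner from the batch.

    new_batch and exemplar_memory are lists of (features, one_hot_label)
    pairs; this data layout is hypothetical.
    """
    rng = rng or random.Random()
    # Exemplar replay: draw stored samples of previously seen classes.
    replayed = rng.sample(exemplar_memory, min(replay_size, len(exemplar_memory)))
    combined = list(new_batch) + replayed
    partners = combined[:]
    rng.shuffle(partners)
    mixed = []
    for (x_a, y_a), (x_b, y_b) in zip(combined, partners):
        lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
        mixed.append(mixup_pair(x_a, y_a, x_b, y_b, lam))
    return mixed


# Toy usage: two new-class samples and a memory of two old-class exemplars.
new_batch = [([1.0, 0.0], [0.0, 0.0, 1.0]), ([0.9, 0.1], [0.0, 0.0, 1.0])]
memory = [([0.0, 1.0], [1.0, 0.0, 0.0]), ([0.2, 0.8], [0.0, 1.0, 0.0])]
batch = replay_mixup_batch(new_batch, memory, replay_size=2, rng=random.Random(0))
print(len(batch))
```

Each mixed label remains a valid probability vector, since it is a convex combination of two one-hot vectors.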
ISBN (print): 9781665487399
The accuracy of finger vein recognition systems degrades due to low and uneven contrast between veins and their surroundings, often resulting in poor detection of vein patterns. We propose a finger-vein enhancement technique, ResFPN (Residual Feature Pyramid Network), as a generic preprocessing method agnostic to the recognition pipeline. A bottom-up pyramidal architecture using the novel Structure Detection block (SDBlock) facilitates extraction of veins of varied widths. Using a feature aggregation module (FAM), we combine these vein structures, and train the proposed ResFPN for detection of veins across scales. With enhanced presentations, our experiments indicate a reduction of up to 5% in the average recognition errors for commonly used recognition pipelines over two publicly available datasets. These improvements persist even in the cross-dataset scenario, where the dataset used to train the ResFPN is different from the one used for recognition.