Nowadays, video conference solutions are widely adopted for companies, education, and government. People segmentation is crucial for supporting virtual background, an essential video conference function to protect use...
详细信息
ISBN:
(纸本)9781665448994
Nowadays, video conference solutions are widely adopted for companies, education, and government. People segmentation is crucial for supporting virtual background, an essential video conference function to protect users' privacy. This paper demonstrated a people segmentation framework called CE-PeopleSeg, which employed an efficient segmentation method, structural pruning, and dynamic frame skipping techniques, leading to a fast inference speed on CPU. Our extensive experiments show that the proposed CE-PeopleSeg can achieve a high prediction mIoU of 87.9% on Supervised People Dataset while reaching a real-time inference speed of 32.40 fps on CPU with very low usage of 10%. Our code would be released at https://***/geekJZY/***.
Digital art restoration has benefited from inpainting models to correct the degradation or missing sections of a painting. This work compares three current state-of-the art models for inpainting of large missing regio...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
Digital art restoration has benefited from inpainting models to correct the degradation or missing sections of a painting. This work compares three current state-of-the art models for inpainting of large missing regions. We provide qualitative and quantitative comparison of the performance by CoModGANs, LaMa and GLIDE in inpainting of blurry and missing sections of images. We use Escher's incomplete painting Print Gallery as our test study since it presents several of the challenges commonly present in restorative inpainting.
Training GANs in low-data regimes remains a challenge, as overfitting often leads to memorization or training divergence. In this work, we introduce One-Shot GAN that can learn to generate samples from a training set ...
详细信息
ISBN:
(纸本)9781665448994
Training GANs in low-data regimes remains a challenge, as overfitting often leads to memorization or training divergence. In this work, we introduce One-Shot GAN that can learn to generate samples from a training set as little as one image or one video. We propose a two-branch discriminator, with content and layout branches designed to judge the internal content separately from the scene layout realism. This allows synthesis of visually plausible, novel compositions of a scene, with varying content and layout, while preserving the context of the original sample. Compared to previous single-image GAN models, One-Shot GAN achieves higher diversity and quality of synthesis. It is also not restricted to the single image setting, successfully learning in the introduced setting of a single video.
The task of referring relationships is to localize subject and object entities in an image satisfying a relationship query, which is given in the form of . This requires simultaneous localization of the subject and ob...
详细信息
ISBN:
(纸本)9781728193601
The task of referring relationships is to localize subject and object entities in an image satisfying a relationship query, which is given in the form of . This requires simultaneous localization of the subject and object entities in a specified relationship. We introduce a simple yet effective proposal-based method for referring relationships. Different from the existing methods such as SSAS, our method can generate a high-resolution result while reducing its complexity and ambiguity. Our method is composed of two modules: a category-based proposal generation module to select the proposals related to the entities and a predicate analysis module to score the compatibility of pairs of selected proposals. We show state-of-the-art performance on the referring relationship task on two public datasets: Visual Relationship Detection and Visual Genome.
In this paper a novel bottom-up video event recognition approach is proposed, ObjectGraphs, which utilizes a rich frame representation and the relations between objects within each frame. Following the application of ...
详细信息
ISBN:
(纸本)9781665448994
In this paper a novel bottom-up video event recognition approach is proposed, ObjectGraphs, which utilizes a rich frame representation and the relations between objects within each frame. Following the application of an object detector (OD) on the frames, graphs are used to model the object relations and a graph convolutional network (GCN) is utilized to perform reasoning on the graphs. The resulting object-based frame-level features are then forwarded to a long short-term memory (LSTM) network for video event recognition. Moreover, the weighted in-degrees (WiDs) derived from the graph's adjacency matrix at frame level are used for identifying the objects that were considered most (or least) salient for event recognition and contributed the most (or least) to the final event recognition decision, thus providing an explanation for the latter. The experimental results show that the proposed method achieves state-of-the-art performance on the publicly available FCVID and YLI-MED datasets(1).
Few-shot learning features the capability of generalizing from a few examples. In this paper, we first identify that a discriminative feature space, namely a rectified metric space, that is learned to maintain the met...
详细信息
ISBN:
(纸本)9781665448994
Few-shot learning features the capability of generalizing from a few examples. In this paper, we first identify that a discriminative feature space, namely a rectified metric space, that is learned to maintain the metric consistency from training to testing, is an essential component to the success of metric-based few-shot learning. Numerous analyses indicate that a simple modification of the objective can yield substantial performance gains. The resulting approach, called rectified metric propagation (ReMP), further optimizes an attentive prototype propagation network, and applies a repulsive force to make confident predictions. Extensive experiments demonstrate that the proposed ReMP is effective and efficient, and outperforms the state of the arts on various standard few-shot learning datasets.
Deep learning-based approaches have gained popularity for environment perception tasks such as semantic segmentation and object detection from images. However, the different nature of a data-driven deep neural nets (D...
详细信息
ISBN:
(纸本)9781728193601
Deep learning-based approaches have gained popularity for environment perception tasks such as semantic segmentation and object detection from images. However, the different nature of a data-driven deep neural nets (DNN) to conventional software is a challenge for practical software verification. In this work, we show how existing methods from software engineering provide benefits for the development of a DNN and in particular for dataset design and analysis. We show how combinatorial testing based on a domain model can be leveraged for generating test sets providing coverage guarantees with respect to important environmental features and their interaction. Additionally, we show how our approach can be used for growing a dataset, i.e. to identify where data is missing and should be collected next. We evaluate our approach on an internal use case and two public datasets.
Brand logos are often rendered in a different style based on a context such as an event promotion. For example, Warner Bros. uses a different variety of their brand logo for different movies for promotion and aestheti...
详细信息
ISBN:
(纸本)9781665448994
Brand logos are often rendered in a different style based on a context such as an event promotion. For example, Warner Bros. uses a different variety of their brand logo for different movies for promotion and aesthetic appeal. In this paper, we propose an automated method to render brand logos in the coloring style of branding material such as movie posters. For this, we adopt a photo-realistic neural style transfer method using movie posters as the style source. We propose a color-based image segmentation and matching method to assign style segments to logo segments. Using these, we render the well-known Warner Bros. logo in the coloring style of 141 movie posters. We also present survey results where 287 participants rate the machine-stylized logos for their representativeness and visual appeal.
With one in four individuals afflicted with malnutrition, computervision may provide a way of introducing a new level of automation in the nutrition field to reliably monitor food and nutrient intake. In this study, ...
详细信息
ISBN:
(纸本)9781728125060
With one in four individuals afflicted with malnutrition, computervision may provide a way of introducing a new level of automation in the nutrition field to reliably monitor food and nutrient intake. In this study, we present a novel approach to modeling the link between color and vitamin A content using transmittance imaging of a pureed foods dilution series in a computervision powered nutrient sensing system via a fine-tuned deep autoencoder network, which in this case was trained to predict the relative concentration of sweet potato purees. Experimental results show the deep autoencoder network can achieve an accuracy of 80% across beginner (6 month) and intermediate (8 month) commercially prepared pureed sweet potato samples. Prediction errors may be explained by fundamental differences in optical properties which are further discussed.
Neural Architecture Search (NAS) can automatically design model architecture with better performance. Current researchers have searched for local architecture similar to block, then stacked to construct entire models,...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
Neural Architecture Search (NAS) can automatically design model architecture with better performance. Current researchers have searched for local architecture similar to block, then stacked to construct entire models, or searched the entire model based on a manually designed benchmark module. There is no method to directly search the architecture of the global(entire) model at the operation level. The purpose of this article is to search the entire model directly in the operation level search space. We analyzed the search space of past methods which searching for local architectures, then a working mode for global model architecture search named CAM is proposed. Proposed CAM decouples the architectural parameters of the entire model which can complete the entire model architecture search with few architecture parameters. In the experiment, the test error 2.68 % in CIFAR-10 is obtained by the proposed method at the global architecture level, which can compare with the stage-of-art local architecture search methods.
暂无评论