ISBN (digital): 9781665487399
ISBN (print): 9781665487399
We propose a simple yet effective proposal-free architecture for lidar panoptic segmentation. We jointly optimize both semantic segmentation and class-agnostic instance classification in a single network using a pillar-based bird's-eye-view representation. The instance classification head learns pairwise affinities between pillars to determine whether two pillars belong to the same instance. We further propose a local clustering algorithm that propagates instance IDs by merging the semantic segmentation and affinity predictions. Our experiments on the nuScenes dataset show that our approach outperforms previous proposal-free methods and is comparable to proposal-based methods, which require extra annotations from object detection.
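The abstract's clustering step can be pictured as merging same-class pillars whose predicted pairwise affinity is high. A minimal sketch, not the paper's exact algorithm (function names, the union-find merging, and the threshold are illustrative assumptions):

```python
# Hypothetical sketch of affinity-based instance grouping: pillars of the
# same semantic class are merged with union-find whenever their predicted
# pairwise affinity exceeds a threshold.

def cluster_instances(sem_labels, affinity, threshold=0.5):
    """sem_labels: per-pillar class id; affinity: dict[(i, j)] -> score."""
    parent = list(range(len(sem_labels)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (i, j), score in affinity.items():
        # Merge only pillars of the same semantic class with high affinity.
        if sem_labels[i] == sem_labels[j] and score > threshold:
            parent[find(i)] = find(j)

    # Relabel the union-find roots as compact instance IDs.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(sem_labels))]

labels = cluster_instances(
    sem_labels=[1, 1, 1, 2],
    affinity={(0, 1): 0.9, (1, 2): 0.2, (2, 3): 0.8},
)
# pillars 0 and 1 merge; pillar 2 stays separate; pillar 3 has another class
```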
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
While virtual try-on has progressed rapidly in recent years, existing virtual try-on methods still struggle to faithfully represent various details of the clothes when worn. In this paper, we propose a simple yet effective method to better preserve details of the clothing and person by introducing an additional fitting step after geometric warping. This minimal modification disentangles the representation of the clothing from that of the wearer, allowing us to preserve the wearer-agnostic structure and details of the clothing and to fit a garment naturally to a variety of poses and body shapes. Moreover, we propose a novel evaluation framework, applicable to any metric, that better reflects the semantics of clothes fitting. Through extensive experiments, we empirically verify that the proposed method not only learns to disentangle clothing from the wearer, but also preserves details of the clothing in the try-on results.
ISBN (print): 9781479943098
Egocentric vision provides a unique perspective of the visual world that is inherently human-centric. Since egocentric cameras are mounted on the user (typically on the user's head), they are naturally primed to gather visual information from our everyday interactions, and can even act on that information in real time (e.g., for a vision aid). We believe that this human-centric characteristic of egocentric vision can have a large impact on the way we approach central computer vision tasks such as visual detection, recognition, prediction, and socio-behavioral analysis. By taking advantage of the first-person point-of-view paradigm, there have been recent advances in areas such as personalized video summarization, understanding concepts of social saliency, activity analysis with inside-out cameras (a camera to capture eye gaze and an outward-looking camera), recognizing human interactions, and modeling focus of attention. However, in many ways people are only beginning to understand the full potential (and limitations) of the first-person paradigm. In the 3rd Workshop on Egocentric (First-Person) Vision, we bring together researchers to discuss emerging topics such as: personalization of visual analysis; socio-behavioral modeling; understanding group dynamics and interactions; egocentric video as big data; first-person vision for robotics; and Egographical User Interfaces (EUIs).
ISBN (print): 9781665448994
In this work, we provide a detailed description of our methods ANTxNN and ANTxNN SSIM, submitted to the Workshop and Challenge on Learned Image Compression (CLIC) 2021. We propose to incorporate Relativistic average Least Squares GANs (RaLSGANs) into rate-distortion optimization for end-to-end training, to achieve perceptual image compression. We also compare two types of discriminator networks and visualize their reconstructed images. Experimental results validate that our method, optimized with RaLSGANs, achieves higher subjective quality than PSNR-, MS-SSIM-, or LPIPS-optimized models.
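For reference, the RaLSGAN objective (Jolicoeur-Martineau, 2018) compares each sample's discriminator output to the average output on the opposing batch. A minimal NumPy sketch of the standard formulation, not necessarily the exact variant used in the paper (`d_real`/`d_fake` are illustrative names for discriminator logits on original and reconstructed images):

```python
import numpy as np

# Relativistic average least-squares GAN losses: each logit is measured
# against the mean logit of the other class before the squared penalty.

def ralsgan_d_loss(d_real, d_fake):
    return (np.mean((d_real - d_fake.mean() - 1) ** 2)
            + np.mean((d_fake - d_real.mean() + 1) ** 2)) / 2

def ralsgan_g_loss(d_real, d_fake):
    return (np.mean((d_fake - d_real.mean() - 1) ** 2)
            + np.mean((d_real - d_fake.mean() + 1) ** 2)) / 2

d_real = np.array([1.0, 1.0])    # discriminator favours real images
d_fake = np.array([-1.0, -1.0])
# Here the discriminator loss is small (the logits are already separated
# in the relativistic sense), while the generator loss is large, pushing
# the generator to close the gap.
```

In a learned-compression setting this adversarial term would be added to the usual rate-distortion objective with a weighting factor.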
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Fashion-on-demand is becoming an important concept for the fashion industry. Many attempts have been made to leverage machine learning methods to generate fashion designs tailored to customers' tastes. However, how to assemble items together (i.e., their compatibility) is crucial to designing high-quality outfits in synthesized images. Here we propose a fashion generation model, named OutfitGAN, which contains two core modules: a Generative Adversarial Network and a Compatibility Network. The generative module generates new, realistic, high-quality fashion items from a specific category, while the compatibility network ensures reasonable compatibility among all items. The experimental results show the superiority of our OutfitGAN.
ISBN (print): 9781665487399
Remarkable strides are being made toward a future with autonomous vehicles on our roads. While the perception systems of autonomous vehicles perform well under closed-set conditions, they still struggle to handle the unexpected. This survey provides an extensive overview of anomaly detection techniques based on camera, lidar, radar, multimodal, and abstract object-level data. We provide a systematization covering the detection approach, corner-case level, suitability for online application, and further attributes. We outline the state of the art and point out current research gaps.
ISBN (print): 9781665448994
Learned lossy image compression has demonstrated impressive progress via end-to-end neural network training. However, this end-to-end training belies the fact that lossy compression is inherently not differentiable, due to the necessity of quantisation. To overcome this difficulty in training, researchers have used various approximations to the quantisation step. However, little work has studied the mechanism of quantisation approximation itself. We address this issue, identifying three gaps that arise in the quantisation approximation problem. We visualise these gaps and show the effect of applying different quantisation approximation methods. Following this analysis, we propose a Soft-STE quantisation approximation method, which closes these gaps and demonstrates better performance than other quantisation approaches on the Kodak dataset.
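For context, the baseline that Soft-STE refines is the plain straight-through estimator: round in the forward pass, pretend the rounding is the identity in the backward pass. A minimal sketch of that baseline (the abstract does not spell out Soft-STE itself, so only the standard STE is shown; function names are illustrative):

```python
import numpy as np

# Straight-through estimator (STE) for quantisation:
# forward pass applies hard rounding; the backward pass treats
# d round(x) / dx as 1, so gradients flow through unchanged.

def ste_quantize(x):
    """Forward: hard rounding to the nearest integer."""
    return np.round(x)

def ste_grad(upstream):
    """Backward: pass the upstream gradient through untouched."""
    return upstream

x = np.array([0.2, 0.7, 1.4])
y = ste_quantize(x)            # quantised latents
g = ste_grad(np.ones_like(x))  # gradient identical to upstream
```

The mismatch between the hard forward rounding and the identity backward pass is exactly the kind of approximation gap the paper analyses.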
ISBN (print): 9781665448994
Deep learning and pattern recognition in smart farming have seen rapid growth, building a bridge between crop science and computer vision. One important application is anomaly segmentation in agriculture, covering weeds, standing water, cloud shadow, etc. Our research focuses on the aerial farmland image dataset known as Agriculture-Vision. We propose to fuse the R, G, B, and NIR modalities to enhance feature extraction, and we propose the Efficient Fused Pyramid Network (Fuse-PN) for anomaly pattern segmentation. The proposed encoder module is a bottom-up pathway with a compound-scaled network, and the decoder module is a top-down pyramid network that enhances features at different scales, combining rich semantic features with lateral connections to low-level features. This approach achieves a mean Dice similarity score of 0.8271 for six agricultural anomaly patterns of the Agriculture-Vision dataset and outperforms various approaches in the literature.
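One plausible reading of the data-fusion step is early fusion: stacking the NIR band onto the RGB channels to form a 4-channel network input. A minimal sketch under that assumption (the function name and shapes are illustrative, not the paper's implementation):

```python
import numpy as np

# Early fusion of R, G, B and NIR modalities into a single
# 4-channel input tensor for the segmentation network.

def fuse_rgbn(rgb, nir):
    """rgb: (H, W, 3) array, nir: (H, W) array -> fused (H, W, 4) input."""
    return np.concatenate([rgb, nir[..., None]], axis=-1)

fused = fuse_rgbn(np.zeros((64, 64, 3)), np.ones((64, 64)))
# fused.shape == (64, 64, 4)
```

The network's first convolution then simply takes 4 input channels instead of 3.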
ISBN (print): 9781665448994
The goal of this paper is to model the fashion compatibility of an outfit and provide explanations. We first extract features of all attributes of all items via convolutional neural networks, and then train a bidirectional Long Short-Term Memory (Bi-LSTM) model to learn the compatibility of an outfit by treating these attribute features as a sequence. Gradient-penalty regularization is exploited to train the inter-factor compatibility net, which computes the loss for the judgment and provides an explanation generated from the recognized reasons related to the judgment. To train and evaluate the proposed approach, we expanded the EVALUATION3 dataset in terms of the number of items and attributes. Experimental results show that our approach can successfully evaluate compatibility with reasons.
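The abstract does not specify the exact form of the gradient-penalty term; a common formulation (in the WGAN-GP style) penalises the deviation of the network's input-gradient norm from 1. A minimal sketch under that assumption, with an illustrative function name:

```python
import numpy as np

# Generic gradient-penalty regulariser: push the per-sample norm of the
# network's gradients w.r.t. its inputs toward a target value (usually 1).

def gradient_penalty(grads, target=1.0):
    """grads: (batch, dim) gradients of the net w.r.t. its inputs."""
    norms = np.linalg.norm(grads, axis=1)
    return np.mean((norms - target) ** 2)

gp = gradient_penalty(np.array([[3.0, 4.0], [0.0, 1.0]]))
# per-sample norms are 5 and 1, so the penalty is mean((5-1)^2, (1-1)^2) = 8
```

In training, this scalar would be added to the compatibility loss with a weighting coefficient.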
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
In this paper, we present an approach for conditioned and composed image retrieval based on CLIP features. In this extension of content-based image retrieval (CBIR), an image is combined with a text that provides information regarding user intent; this is relevant for application domains such as e-commerce. The proposed method is based on an initial training stage in which a simple combination of visual and textual features is used to fine-tune the CLIP text encoder. In a second training stage, we learn a more complex combiner network that merges visual and textual features. Contrastive learning is used in both stages. The proposed approach obtains state-of-the-art performance for conditioned CBIR on the FashionIQ dataset and for composed CBIR on the more recent CIRR dataset.
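The first-stage "simple combination" can be pictured as summing the image and text embeddings into a single normalised query vector, then ranking index images by cosine similarity. A minimal sketch of that idea (the sum-then-normalise rule and the function names are assumptions; the paper's second-stage combiner network is not shown):

```python
import numpy as np

# Simple combination of CLIP-style image and text embeddings into one query.

def combine(img_feat, txt_feat):
    q = img_feat + txt_feat
    return q / np.linalg.norm(q)  # L2-normalise for cosine similarity

def retrieve(query, index):
    """index: (N, D) L2-normalised gallery features; returns best match row."""
    return int(np.argmax(index @ query))

# Toy 2-D example: the combined query is closest to the gallery item that
# reflects both the reference image and the text modification.
q = combine(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
index = np.array([[1.0, 0.0],
                  [0.70710678, 0.70710678],
                  [0.0, 1.0]])
best = retrieve(q, index)  # -> 1, the item between both directions
```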