ISBN (print): 9781665448994
Image Search is a fundamental task playing a significant role in the success of a wide variety of frameworks and applications. However, with the increasing size of product catalogues and the number of attributes per product, it has become difficult for users to express their needs effectively. We therefore focus on the problem of Image Retrieval with Text Feedback, which involves retrieving modified images according to natural language feedback provided by users. In this work, we hypothesise that since an image can be delineated by its content and style features, modifications to the image can also take place in these two subspaces respectively. Hence, we decompose an input image into its corresponding style and content features, apply the modification expressed by the text feedback individually in both the style and content spaces, and finally fuse them for retrieval. Our experiments show that our approach outperforms a recent state-of-the-art method for this task, TIRG, which applies the text modification to a single vector rather than leveraging it over the style and content spaces separately.
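As an illustration of the idea described above, the sketch below (not the authors' code; all module names, dimensions, and the residual-style modification are assumptions) projects a precomputed image feature into separate content and style subspaces, applies a text-conditioned modification in each, and fuses the result into a single query embedding for retrieval.

```python
# Hedged sketch of text-guided modification applied separately in content and
# style subspaces, then fused for retrieval. Names and dims are illustrative.
import torch
import torch.nn as nn

class ContentStyleModifier(nn.Module):
    def __init__(self, img_dim=512, txt_dim=512, sub_dim=256):
        super().__init__()
        # project a precomputed image feature into two subspaces
        self.to_content = nn.Linear(img_dim, sub_dim)
        self.to_style = nn.Linear(img_dim, sub_dim)
        # text-conditioned residual modifiers, one per subspace
        self.mod_content = nn.Sequential(
            nn.Linear(sub_dim + txt_dim, sub_dim), nn.ReLU(), nn.Linear(sub_dim, sub_dim))
        self.mod_style = nn.Sequential(
            nn.Linear(sub_dim + txt_dim, sub_dim), nn.ReLU(), nn.Linear(sub_dim, sub_dim))
        self.fuse = nn.Linear(2 * sub_dim, img_dim)  # fused query embedding

    def forward(self, img_feat, txt_feat):
        c, s = self.to_content(img_feat), self.to_style(img_feat)
        c = c + self.mod_content(torch.cat([c, txt_feat], dim=-1))  # modify content
        s = s + self.mod_style(torch.cat([s, txt_feat], dim=-1))    # modify style
        return self.fuse(torch.cat([c, s], dim=-1))

# toy usage: cosine-similarity retrieval against a random candidate gallery
model = ContentStyleModifier()
query = model(torch.randn(1, 512), torch.randn(1, 512))
gallery = torch.randn(100, 512)
print(torch.cosine_similarity(query, gallery).topk(5).indices)
```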
ISBN (print): 9781665445092
We study the problem of unsupervised discovery and segmentation of object parts, which, as an intermediate local representation, are capable of finding intrinsic object structure and providing more explainable recognition results. Recent unsupervised methods have greatly relaxed the dependency on annotated data, which are costly to obtain, but still rely on additional information such as object segmentation masks or saliency maps. To remove such dependency and further improve part segmentation performance, we develop a novel approach that disentangles the appearance and shape representations of object parts, followed by reconstruction losses, without using additional object mask information. To avoid degenerate solutions, a bottleneck block is designed to squeeze and expand the appearance representation, leading to a more effective disentanglement between geometry and appearance. Combined with a self-supervised part classification loss and an improved geometry concentration constraint, we can segment more consistent parts with semantic meaning. Comprehensive experiments on a wide variety of objects such as faces, birds, and PASCAL VOC objects demonstrate the effectiveness of the proposed method.
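The following minimal sketch illustrates the kind of squeeze-and-expand bottleneck described above; the layer types and dimensions are assumptions for illustration, not the paper's architecture.

```python
# Illustrative sketch only: an appearance bottleneck that squeezes a per-part
# appearance vector to a very low dimension and expands it back, limiting how
# much spatial (geometry) detail can leak through. Dimensions are assumed.
import torch
import torch.nn as nn

class AppearanceBottleneck(nn.Module):
    def __init__(self, app_dim=256, bottleneck_dim=8):
        super().__init__()
        self.squeeze = nn.Linear(app_dim, bottleneck_dim)  # discard fine detail
        self.expand = nn.Linear(bottleneck_dim, app_dim)   # recover appearance code

    def forward(self, app_feat):
        return self.expand(torch.relu(self.squeeze(app_feat)))

# four hypothetical part appearance vectors, squeezed and re-expanded
parts = torch.randn(4, 256)
recon_app = AppearanceBottleneck()(parts)
print(recon_app.shape)  # torch.Size([4, 256])
```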
ISBN (print): 9781665445092
Prior work in scene graph generation requires categorical supervision at the level of triplets: subjects and objects, and the predicates that relate them, either with or without bounding box information. However, scene graph generation is a holistic task: thus holistic, contextual supervision should intuitively improve performance. In this work, we explore how linguistic structures in captions can benefit scene graph generation. Our method captures the information provided in captions about relations between individual triplets, and context for subjects and objects (e.g. which visual properties are mentioned). Captions are a weaker type of supervision than triplets, since the alignment between the exhaustive list of human-annotated subjects and objects in triplets and the nouns in captions is weak. However, given the large and diverse sources of multimodal data on the web (e.g. blog posts with images and captions), linguistic supervision is more scalable than crowdsourced triplets. We show extensive experimental comparisons against prior methods that leverage instance- and image-level supervision, and ablate our method to show the impact of leveraging phrasal and sequential context and of techniques to improve the localization of subjects and objects.
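To make the weak-alignment point concrete, here is a toy illustration (not the paper's pipeline) of extracting object supervision from a caption by naive noun matching; the class list and matching rule are invented for the example, and the objects the caption never mentions are exactly the kind of noise the text describes.

```python
# Toy sketch: derive weak object labels from a caption by matching tokens
# against a hypothetical class vocabulary. Unmentioned objects are missed.
CLASSES = {"person", "dog", "frisbee", "bench", "tree"}

def weak_object_labels(caption: str) -> set:
    tokens = {t.strip(".,!?").lower() for t in caption.split()}
    # naive singularisation so "dogs" still matches "dog"
    tokens |= {t[:-1] for t in tokens if t.endswith("s")}
    return CLASSES & tokens

print(weak_object_labels("Two dogs chase a frisbee near a bench"))
# e.g. {'dog', 'frisbee', 'bench'} -- a person or tree in the image is missed
```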
ISBN (print): 9781665445092
We present DeepSurfels, a novel hybrid scene representation for geometry and appearance information. DeepSurfels combines explicit and neural building blocks to jointly encode geometry and appearance information. In contrast to established representations, DeepSurfels better represents high-frequency textures, is well-suited for online updates of appearance information, and can be easily combined with machine learning methods. We further present an end-to-end trainable online appearance fusion pipeline that fuses information from RGB images into the proposed scene representation and is trained using self-supervision imposed by the reprojection error with respect to the input images. Our method compares favorably to classical texture mapping approaches as well as recent learning-based techniques. Moreover, we demonstrate lower runtime, improved generalization capabilities, and better scalability to larger scenes compared to existing methods.
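The self-supervised appearance fusion can be illustrated with a reprojection-style loss of the kind mentioned above; the sketch below uses a plain tensor as a stand-in for the rendered DeepSurfels appearance, and the loss form and mask handling are assumptions for illustration.

```python
# Hedged sketch of a reprojection-error self-supervision signal: appearance
# stored in the scene representation is rendered into the input view and
# compared against the observed RGB frame. The "renderer" here is a stand-in.
import torch

def reprojection_loss(rendered_rgb, observed_rgb, valid_mask):
    # penalise only pixels the representation actually covers
    diff = (rendered_rgb - observed_rgb).abs() * valid_mask
    return diff.sum() / valid_mask.sum().clamp(min=1)

rendered = torch.rand(1, 3, 64, 64, requires_grad=True)  # stand-in render
observed = torch.rand(1, 3, 64, 64)                      # input RGB frame
mask = torch.ones(1, 1, 64, 64)                          # coverage mask
loss = reprojection_loss(rendered, observed, mask)
loss.backward()                                          # drives appearance updates
```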
Novelty recognition is essential for exploring and collecting unknown but interesting information, a process in which multiple brain regions related to cognitive memory are involved. The dorsal hippocampus (dHPC) has been revealed to be ...
3D object detection is an essential perception task in autonomous driving for understanding the environment. Bird's-Eye-View (BEV) representations have significantly improved the performance of 3D detectors with ...
This work endeavors to address pressing societal concerns by introducing innovative solutions. It confronts the challenges of communication barriers encountered by the hearing-impaired, the excessive reliance on smartphone...
ISBN (print): 9781665448994
Sketch-based image retrieval (SBIR) has attracted increasing interest in the computer vision community, bringing high impact in real applications. For instance, SBIR brings an increased benefit to eCommerce search engines because it allows users to formulate a query just by drawing what they need to buy. However, current methods that show high retrieval precision work in a high-dimensional space, which negatively affects aspects like memory consumption and processing time. Although some authors have also proposed compact representations, these drastically degrade performance at low dimensions. Therefore, in this work, we present results of evaluating methods for producing compact embeddings in the context of sketch-based image retrieval. Our main interest is in strategies aiming to keep the local structure of the original space. The recent unsupervised local-topology-preserving dimension reduction method UMAP fits our requirements and shows outstanding performance, improving even the precision achieved by SOTA methods. We evaluate six methods on two different datasets, Flickr15K and an eCommerce dataset; the latter is another contribution of this work. We show that UMAP allows us to use feature vectors of 16 bytes while improving precision by more than 35%.
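The UMAP result can be illustrated with the publicly available umap-learn package; the sketch below reduces stand-in SBIR features to 16 dimensions and quantizes each to one byte to obtain 16-byte codes, an interpretation assumed here for illustration rather than taken from the paper.

```python
# Minimal sketch of producing compact retrieval codes with UMAP (umap-learn).
# The 16-byte encoding (16 dimensions, each quantized to uint8) is an assumption.
import numpy as np
import umap  # pip install umap-learn

high_dim = np.random.rand(1000, 512).astype(np.float32)  # stand-in SBIR features

reducer = umap.UMAP(n_components=16, metric="cosine", random_state=42)
low_dim = reducer.fit_transform(high_dim)                # shape (1000, 16)

# quantize each dimension to uint8 -> 16 bytes per image
mins, maxs = low_dim.min(0), low_dim.max(0)
codes = np.round(255 * (low_dim - mins) / (maxs - mins + 1e-8)).astype(np.uint8)
print(codes.shape, codes.dtype, codes.nbytes // len(codes), "bytes per item")
```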
Semantic dictionaries are the basis of natural language processing tasks. With the construction of WordNet, CCD, HowNet and other semantic dictionaries, the semantic resources of Chinese and English have been greatly ...
Estimating the 6D pose of objects accurately, quickly, and robustly remains a difficult task. However, recent methods for directly regressing poses from RGB images using dense features have achieved state-of...