ISBN: 9781450347532 (print)
User-given tags or labels are valuable resources for semantic understanding of visual media such as images and videos. Recently, a new type of labeling mechanism known as hash-tags has become increasingly popular on social media sites. In this paper, we study the problem of generating relevant and useful hash-tags for short video clips. Traditional data-driven approaches for tag enrichment and recommendation use direct visual similarity for label transfer and propagation. We attempt to learn a direct low-cost mapping from video to hash-tags using a two-step training process. We first employ a natural language processing (NLP) technique, skip-gram models with neural network training, to learn a low-dimensional vector representation of hash-tags (Tag2Vec) using a corpus of approximately 10 million hash-tags. We then train an embedding function to map video features to the low-dimensional Tag2Vec space. We learn this embedding for 29 categories of short video clips with hash-tags. A query video without any tag information can then be mapped directly to the vector space of tags using the learned embedding, and relevant tags can be found by performing a simple nearest-neighbor retrieval in the Tag2Vec space. We validate the relevance of the tags suggested by our system qualitatively and quantitatively with a user study.
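The final retrieval step described in this abstract can be illustrated with a minimal sketch. It assumes the video has already been mapped into the tag space; the three-dimensional vectors, the tag names, and the choice of cosine similarity are all hypothetical placeholders, since the abstract only states that a nearest-neighbor search is used.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_tags(video_vec, tag_vectors, k=3):
    """Return the k tags whose vectors are closest to video_vec.

    Cosine similarity is an assumption; the abstract does not name a metric.
    """
    ranked = sorted(
        tag_vectors,
        key=lambda tag: cosine_similarity(video_vec, tag_vectors[tag]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical tag embeddings standing in for a learned Tag2Vec space.
TAG_VECTORS = {
    "#skate": [0.9, 0.1, 0.0],
    "#surf": [0.7, 0.3, 0.1],
    "#cooking": [0.0, 0.2, 0.9],
    "#baking": [0.1, 0.1, 0.8],
}

# Hypothetical output of the video-to-tag-space embedding function.
query_video = [0.8, 0.2, 0.05]
top_tags = suggest_tags(query_video, TAG_VECTORS, k=2)
print(top_tags)
```

A brute-force scan like this is adequate for a toy vocabulary; a vocabulary of millions of tags would more plausibly use an approximate nearest-neighbor index.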
ISBN: 9781728186290 (print)
In recent years, three-dimensional measurement techniques have been widely used in medical sciences, and thus, depth detection in an image plays an important role in computer vision applications. In this paper, we discuss the estimation of the distance between the head of an endoscope and the small intestine septum, along with the associated problems. The main objective is to detect the depth of the small intestine in order to estimate this distance. Images were collected through video sampling and then preprocessed. Morphological reconstruction, bounding box, convex hull, and Euclidean distance are employed to update the distance estimate. At the end of this process, the outputs are simulated, and the distance is reported in centimeters. This method will help the endoscope move inside the small intestine without causing injuries.
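The abstract does not say how these operations are combined, so the following only illustrates two of the named building blocks: a convex hull (here Andrew's monotone chain) and a Euclidean distance. The sample region, the use of the hull's widest extent, and the `PIXELS_PER_CM` scale are hypothetical and are not taken from the paper.

```python
import math


def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def widest_extent(hull):
    """Largest Euclidean distance between any two hull vertices, in pixels."""
    return max(math.dist(p, q) for p in hull for q in hull)


# Hypothetical pixel coordinates of a segmented region.
region = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]
hull = convex_hull(region)

# Hypothetical calibration; a real system would derive this from the optics.
PIXELS_PER_CM = 10.0
extent_cm = widest_extent(hull) / PIXELS_PER_CM
print(hull, extent_cm)
```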
ISBN: 9781665414906 (print)
With the increasing complexity of machine vision algorithms and growing applications of image processing, how do computers without a dedicated graphics processor perform? This research evaluates the computational abilities of two low-cost single-board computers (SBCs) by subjecting them to various Visual Inertial Odometry (VIO) algorithms. The end goal of this research is to identify an SBC that meets the requirements for deployment on an Unmanned Aerial System for autonomous navigation.
Images have a very special role for humans in interpreting our world. Developments in image acquisition technology are enabling us to extend our visual faculties beyond the limitations of our physical presence and the resolving power of our eyes. This tutorial paper focuses on image acquisition technologies that give us insights that cannot be achieved at the wavelengths we can perceive unaided. It aims to introduce two of the most rapidly developing and exciting areas in this field, synthetic aperture radar imaging and tomographic medical imaging, each of which has the capacity to provide significant, but widely differing, benefits to mankind.
ISBN: 9781450347532 (print)
Unsupervised domain adaptation (DA) techniques inherently assume the presence of an ample number of source domain training samples in addition to the target domain test data. The domains are characterized by domain-specific probability distributions governing the data, which are substantially different from each other. The goal is to build a task-oriented classifier model that performs comparably in both domains. In contrast to the standard unsupervised DA setup, we propose a maximum-margin clustering (MMC) based framework for this task that does not consider source domain labeled samples. Instead, we formulate it as a joint clustering problem over all the samples from both domains in a common feature subspace. The Geodesic Flow Kernel (GFK) based subspace projection technique on the Grassmannian manifold is adopted to cast the samples into a domain-invariant space. The MMC stage then groups the data by maximizing margins while simultaneously learning a classifier for each group. The data overlapping problem is handled by specifically learning an SVM-KNN classifier for the potentially unreliable samples in each group. We validate the framework on a pair of remote sensing images of different modalities for the purpose of land-cover classification and on a generic object dataset for recognition. We observe that the proposed method performs on par with the fully supervised case for both tasks, but without the requirement of costly annotations.
Although Deep Convolutional Neural Networks trained with strong pixel-level annotations have significantly pushed the performance in semantic segmentation, annotation efforts required for the creation of training data...
ISBN: 9781509048748 (print)
In this paper, we propose a simple system model to generate a temporal sequence of biospeckle patterns, based on speckle dynamics produced by clockwise rotation of a diffuser plate in a coherent illumination field. These generated patterns are applied to two popular biospeckle quantification strategies, namely inertia moment and absolute value of differences. The performance of both algorithms is assessed and compared using a number of fixed correlated synthetic speckle sequences.
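The two quantification strategies named in this abstract are commonly computed from a co-occurrence matrix of successive intensities in a time history of the speckle pattern (THSP). The sketch below follows that common formulation and is not taken from the paper itself: the row-wise normalization, the four intensity levels, and the toy THSP arrays are assumptions made for illustration.

```python
def cooccurrence(thsp, levels):
    """Count how often intensity a is immediately followed by intensity b.

    thsp is a list of rows; each row is one pixel's intensity over time.
    """
    com = [[0] * levels for _ in range(levels)]
    for row in thsp:
        for a, b in zip(row, row[1:]):
            com[a][b] += 1
    return com


def normalize_rows(com):
    """Divide each row by its sum (assumed normalization); empty rows stay 0."""
    normalized = []
    for row in com:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


def inertia_moment(m):
    """IM: sum of M[i][j] * (i - j)^2 over all intensity pairs."""
    n = len(m)
    return sum(m[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def absolute_value_of_differences(m):
    """AVD: sum of M[i][j] * |i - j| over all intensity pairs."""
    n = len(m)
    return sum(m[i][j] * abs(i - j) for i in range(n) for j in range(n))


LEVELS = 4  # hypothetical number of quantized intensity levels

# Toy THSPs: one with no temporal change, one that fluctuates every frame.
static_thsp = [[2, 2, 2, 2], [1, 1, 1, 1]]
active_thsp = [[0, 3, 0, 3], [1, 2, 1, 2]]

for name, thsp in (("static", static_thsp), ("active", active_thsp)):
    m = normalize_rows(cooccurrence(thsp, LEVELS))
    print(name, inertia_moment(m), absolute_value_of_differences(m))
```

Both measures are zero when intensities never change and grow with the size of frame-to-frame jumps; IM weights large jumps quadratically, while AVD weights them linearly.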
ISBN: 9798400710759 (print)
Despite significant advancements in large-scale text-to-image generation and text-conditioned image editing, appearance transfer remains relatively unexplored. Appearance transfer aims to transfer an object's appearance in an appearance image to an object in the structure image, so that background details are preserved and the result accurately reflects the transferred object's characteristics. It has practical applications in areas like virtual try-on and e-commerce product placement. Existing methods often require fine-tuning text-to-image diffusion models or are not applicable to virtual try-on and e-commerce scenarios. In this paper, we introduce a Mask-Guided attention mechanism that replaces the existing self-attention in the U-Net architecture of Stable Diffusion [29]. This approach can be easily integrated into the MasaCtrl [6] framework, enabling appearance transfer without model fine-tuning and making it suitable for a wide range of applications. Our method uses masks of objects in images to guide the appearance transfer process, with these masks obtained from the Segment Anything Model (SAM) [17]. This integration of SAM-generated masks allows for precise object localization and more accurate appearance transfer. We have conducted comprehensive experiments on transferring various clothing items (shirts, jeans, t-shirts) onto people, as well as transferring sofas into living spaces.
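The abstract does not describe how the Mask-Guided attention is wired into Stable Diffusion or MasaCtrl, so the sketch below shows only the generic idea of restricting attention with an object mask: scaled dot-product attention in which each query may attend only to keys whose position falls inside the mask. The function name, the toy vectors, and the boolean per-key mask layout are hypothetical.

```python
import math


def masked_attention(queries, keys, values, key_mask):
    """Scaled dot-product attention restricted to keys where key_mask is True.

    queries, keys, values are lists of equal-dimension vectors; key_mask has
    one boolean per key (for example, True for pixels inside an object mask).
    """
    if not any(key_mask):
        raise ValueError("key_mask must keep at least one key")
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # Masked-out keys get a score of -inf, so their weight becomes 0.
        scores = [
            scale * sum(qi * ki for qi, ki in zip(q, k)) if keep else -math.inf
            for k, keep in zip(keys, key_mask)
        ]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values))
             for d in range(len(values[0]))]
        )
    return outputs


# Toy example: three key positions, of which only the first is "on the object".
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
object_only = masked_attention([[0.3, 0.7]], keys, values, [True, False, False])
print(object_only)
```

With a single unmasked key, every query simply copies that key's value, which is the limiting case of attending only to the masked object.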
In this paper, a novel algorithm for image encryption based on SHA-512 is proposed. The main idea of the algorithm is to use one half of image data for encryption of the other half of the image reciprocally. Distinct ...
This paper describes a computer-aided system for analyzing immunohistochemically stained meningioma cancer cell images. Accurate segmentation of cells in such images plays a critical role in diagnosing different type o...