ISBN: (print) 9781538637883
In this work, we introduce a cross-modal image retrieval system that accepts both text and sketches as query modalities. A cross-modal deep network architecture is formulated to jointly model the sketch and text input modalities as well as the image output modality, learning a common embedding between text and images and between sketches and images. In addition, an attention model selectively focuses on the different objects in the image, enabling retrieval with multiple objects in the query. Experiments on standard datasets show that the proposed method performs best in both single-object and multi-object image retrieval.
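Once text, sketches and images share a common embedding, retrieval reduces to ranking gallery images by their similarity to the query embedding. The following is a minimal sketch that assumes the embeddings have already been produced by the encoders. The function names, the toy vectors, and the use of cosine similarity as the metric are all hypothetical choices for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors in the shared embedding space."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: Sequence[float], gallery: Dict[str, Sequence[float]], k: int = 3) -> List[str]:
    """Return the k gallery images most similar to a text or sketch query embedding."""
    ranked = sorted(gallery, key=lambda name: cosine(query, gallery[name]), reverse=True)
    return ranked[:k]


# Hypothetical pre-computed image embeddings (in practice, output of the image branch).
gallery = {
    "img_cat": [0.9, 0.1, 0.0],
    "img_dog": [0.1, 0.9, 0.0],
    "img_car": [0.0, 0.1, 0.9],
}
text_query = [1.0, 0.2, 0.0]    # hypothetical embedding of the text "cat"
sketch_query = [0.0, 0.0, 1.0]  # hypothetical embedding of a car sketch

print(retrieve(text_query, gallery, k=1))    # ['img_cat']
print(retrieve(sketch_query, gallery, k=1))  # ['img_car']
```

Because both query types land in the same space, one ranking function serves text and sketch queries alike. The attention model for multi-object queries is not modelled here.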
ISBN: (print) 9781538637883
Facial Attractiveness Prediction (FAP) is a useful yet challenging problem in computer vision. In this paper, we propose a deep-learning-based approach. Unlike existing deep methods, the proposed one models both texture and shape cues within a multi-task learning framework that combines attractiveness score prediction and fiducial landmark localization, thereby highlighting the role of each in assessing facial attractiveness. Because the training data are limited, a lightweight CNN is designed to jointly learn the facial representation, landmark locations, and facial attractiveness score. The proposed method is evaluated on the SCUT-FBP database and achieves a prediction correlation of 0.92, which shows its effectiveness. Furthermore, two additional experiments comparing facial images before and after make-up or beautification are conducted, and their results also demonstrate the advantage of the proposed method.
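The multi-task objective and the reported correlation metric can be illustrated with a small sketch. The abstract does not give the exact loss, so this sketch assumes a weighted sum of two terms: a squared-error term for the attractiveness score and a mean squared landmark-distance term. The weight `lam` and both function names are hypothetical. The correlation is assumed to be Pearson's coefficient between predicted and human-rated scores.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def multitask_loss(pred_score: float, true_score: float,
                   pred_landmarks: Sequence[Point], true_landmarks: Sequence[Point],
                   lam: float = 0.5) -> float:
    """Hypothetical joint loss: score regression plus weighted landmark localization."""
    score_term = (pred_score - true_score) ** 2
    # Mean squared Euclidean distance between predicted and ground-truth landmarks.
    lm_term = sum((px - tx) ** 2 + (py - ty) ** 2
                  for (px, py), (tx, ty) in zip(pred_landmarks, true_landmarks))
    lm_term /= len(true_landmarks)
    return score_term + lam * lm_term


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between predicted and human-rated attractiveness scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


loss = multitask_loss(3.2, 3.0, [(10.0, 12.0), (20.0, 22.0)], [(10.0, 13.0), (21.0, 22.0)])
print(round(loss, 3))  # 0.54
```

Sharing one backbone between the two heads is what lets the landmark term act as a shape prior for the score head. How the paper weights the terms is not stated in the abstract.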
Image-based insect species identification applies computer vision, image processing and pattern recognition technologies to identify insect species. It is of...
ISBN: (print) 9798350353013; 9798350353006
Recent years have witnessed rapid progress in monocular human mesh recovery. Despite their impressive performance on public benchmarks, existing methods are vulnerable to unusual poses, which prevents their deployment in challenging scenarios such as dance and martial arts. This issue is mainly attributed to the domain gap induced by data scarcity in such cases: most existing datasets are captured in constrained scenarios and lack samples of such complex movements. We therefore propose a data collection pipeline comprising automatic crawling, precise annotation, and hard-case mining. Based on this pipeline, we build a large dataset in a short time. The dataset, named HardMo, contains 7M images with precise annotations covering 15 categories of dance and 14 categories of martial arts. Empirically, we find that prediction failures in dance and martial arts are mainly characterized by misalignment of the hand-wrist and foot-ankle. To dig deeper into these two hard cases, we use the proposed automatic pipeline to filter the collected data and construct two subsets, named HardMo-Hand and HardMo-Foot. Extensive experiments demonstrate the effectiveness of the annotation pipeline and of the data-driven solution to failure cases. Specifically, after being trained on HardMo, HMR, an early pioneering method, can even outperform the current state of the art, 4DHumans, on our benchmarks. The dataset will be publicly available at https://***/ ***.
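Filtering data into hand and foot hard-case subsets can be pictured as thresholding the 2D misalignment between a model's projected joints and the annotated keypoints. The sketch below works that way, but it is not the authors' pipeline. The keypoint names, the 15-pixel threshold and the function names are all hypothetical.

```python
import math
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float]
# Hypothetical keypoint groups for the two hard cases (hand-wrist and foot-ankle).
HAND_KEYS = ("left_wrist", "right_wrist", "left_hand", "right_hand")
FOOT_KEYS = ("left_ankle", "right_ankle", "left_foot", "right_foot")


def max_error(pred: Dict[str, Point], gt: Dict[str, Point], keys: Iterable[str]) -> float:
    """Largest 2D distance (pixels) between predicted and annotated keypoints in a group."""
    errors = [math.dist(pred[k], gt[k]) for k in keys if k in pred and k in gt]
    return max(errors, default=0.0)


def mine_hard_cases(samples: List[dict], threshold: float = 15.0) -> Tuple[List[str], List[str]]:
    """Split sample ids into hypothetical hand and foot hard-case subsets."""
    hand, foot = [], []
    for s in samples:
        if max_error(s["pred"], s["gt"], HAND_KEYS) > threshold:
            hand.append(s["id"])
        if max_error(s["pred"], s["gt"], FOOT_KEYS) > threshold:
            foot.append(s["id"])
    return hand, foot


samples = [
    {"id": "dance_001",
     "pred": {"left_wrist": (100.0, 100.0), "left_ankle": (50.0, 300.0)},
     "gt": {"left_wrist": (130.0, 100.0), "left_ankle": (52.0, 301.0)}},
    {"id": "kungfu_002",
     "pred": {"left_wrist": (80.0, 90.0), "left_ankle": (60.0, 310.0)},
     "gt": {"left_wrist": (82.0, 91.0), "left_ankle": (60.0, 340.0)}},
]
hand_subset, foot_subset = mine_hard_cases(samples)
print(hand_subset, foot_subset)  # ['dance_001'] ['kungfu_002']
```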
ISBN: (print) 9781479921867
Image edge detection is a fundamental process in computer vision. Image edges carry a major fraction of the information in an image. Traditional edge-detection techniques focus on how the gradient is calculated. In this paper, a statistical pattern recognition method is used for the first time to detect edges after the real-time image has been processed by median filtering, and the method is implemented on an FPGA. Compared with the Sobel algorithm, the proposed method has superior noise immunity.
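The abstract does not specify the statistical pattern recognition detector, so it is not reproduced here. The sketch below shows only the two pieces the abstract names around it: the median-filtering pre-processing step and the Sobel baseline used for comparison. It is written in plain Python on a list-of-lists grayscale image for readability, not as an FPGA design. The function names and the toy image are hypothetical.

```python
from typing import List

Image = List[List[int]]


def median3x3(img: Image) -> Image:
    """3x3 median filter for impulse-noise removal; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]  # the middle of the 9 sorted values
    return out


def sobel_magnitude(img: Image) -> Image:
    """Sobel gradient magnitude approximated as |Gx| + |Gy|; border pixels are set to 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            tl, t, tr = img[y - 1][x - 1], img[y - 1][x], img[y - 1][x + 1]
            ml, mr = img[y][x - 1], img[y][x + 1]
            bl, b, br = img[y + 1][x - 1], img[y + 1][x], img[y + 1][x + 1]
            gx = (tr + 2 * mr + br) - (tl + 2 * ml + bl)
            gy = (bl + 2 * b + br) - (tl + 2 * t + tr)
            out[y][x] = abs(gx) + abs(gy)
    return out


# Vertical step edge (0 -> 100) corrupted by one salt-noise pixel.
noisy = [[0, 0, 100, 100, 100] for _ in range(5)]
noisy[1][1] = 255
clean = median3x3(noisy)
print(clean[1][1])                # 0: the impulse is removed
print(sobel_magnitude(clean)[2])  # [0, 400, 400, 0, 0]
```

The |Gx| + |Gy| form avoids a square root, which is why it is often preferred in hardware implementations.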
ISBN: (print) 9789819984688; 9789819984695
Video face recognition (VFR) has gained significant attention as a promising field combining computer vision and artificial intelligence, revolutionizing identity authentication and verification. Unlike traditional image-based methods, VFR exploits the temporal dimension of video footage to extract more comprehensive and accurate facial information. However, VFR relies heavily on substantial computing power and advanced noise-processing capabilities to achieve good recognition performance. This paper introduces a novel length-adaptive VFR framework based on a recurrent-mechanism-driven Vision Transformer, termed TempoViT. TempoViT efficiently captures spatial and temporal information from face videos, enabling accurate and reliable face recognition while mitigating the high GPU memory requirements of video processing. By reusing hidden states from previous frames, the framework establishes recurrent links between frames, allowing it to model long-term dependencies. Experimental results validate the effectiveness of TempoViT, demonstrating state-of-the-art performance in video face recognition on benchmark datasets including iQIYI-ViD, YTF, IJB-C, and Honda/UCSD.
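Reusing hidden states from earlier frames resembles segment-level recurrence with a memory cache. The sketch below is a toy under several assumptions:

- each frame is already encoded as a feature vector;
- the Vision Transformer is replaced by a simple averaging stand-in for attention over the current frame and the cached states;
- the class name, `mem_len`, and the update rule are hypothetical.

It illustrates why memory stays bounded for videos of any length: only the last `mem_len` states are ever kept.

```python
from collections import deque
from typing import Deque, List, Sequence

Vector = List[float]


class RecurrentFrameEncoder:
    """Toy length-adaptive video encoder with a bounded cache of past hidden states."""

    def __init__(self, mem_len: int = 4) -> None:
        # deque(maxlen=...) silently drops the oldest state once the cache is full.
        self.memory: Deque[Vector] = deque(maxlen=mem_len)

    def step(self, frame_feat: Sequence[float]) -> Vector:
        """Encode one frame using the cached states of earlier frames as context."""
        # Stand-in for self-attention: average the current frame with the cached states.
        context = list(self.memory) + [list(frame_feat)]
        hidden = [sum(vals) / len(context) for vals in zip(*context)]
        self.memory.append(hidden)  # reused when encoding later frames
        return hidden

    def encode_video(self, frames: Sequence[Sequence[float]]) -> Vector:
        """Encode a video of any length into one embedding (mean of per-frame states)."""
        self.memory.clear()
        hiddens = [self.step(f) for f in frames]
        return [sum(vals) / len(hiddens) for vals in zip(*hiddens)]


encoder = RecurrentFrameEncoder(mem_len=2)
embedding = encoder.encode_video([[1.0, 0.0]] * 10)
print(embedding, len(encoder.memory))  # [1.0, 0.0] 2
```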
ISBN: (print) 9781538637883
Key structure extraction and matching are essential steps in computer vision. Many applications require the acquisition of large images and fast extraction of fine structures. In this study, we focus on situations where existing local feature extractors do not give satisfactory results in terms of both accuracy and processing time. A good illustration is short-line extraction in locally low-contrast images. We propose a new method, Fast Local Analysis by threSHolding (FLASH), designed to process large images under hard time constraints. We use "micro-line" points as key features; these are used for shape reconstruction (such as lines) and local signature design. We apply FLASH to concrete infrastructure monitoring, where robots and UAVs are increasingly used for automated defect detection (such as cracks). For large concrete surfaces, there are several hard constraints, such as computation time and reliability. Results show that FLASH computes faster than several existing image-matching algorithms and is invariant to rotation, partial occlusion, and scale changes from 0.7 to 1.4 without scale-space exploration.
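The abstract does not detail FLASH's internals. As a loose illustration of "local analysis by thresholding" only, the sketch below flags a pixel as a candidate thin dark structure (such as a crack) when it is darker than the mean of its local neighbourhood by more than a fixed margin. The function name, window radius and margin are hypothetical. This is a generic local-threshold detector, not FLASH's micro-line test, and it does not show the grouping of points into lines.

```python
from typing import List, Tuple

Image = List[List[int]]


def local_dark_points(img: Image, radius: int = 1, margin: float = 20.0) -> List[Tuple[int, int]]:
    """Return (row, col) of pixels darker than their neighbourhood mean by more than margin."""
    h, w = len(img), len(img[0])
    points = []
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            # Neighbours in a (2*radius+1)^2 window, excluding the centre pixel.
            window = [img[y + dy][x + dx]
                      for dy in range(-radius, radius + 1)
                      for dx in range(-radius, radius + 1)
                      if (dy, dx) != (0, 0)]
            if sum(window) / len(window) - img[y][x] > margin:
                points.append((y, x))
    return points


# Weakly contrasted vertical "crack" (value 90) on a 120-valued background.
surface = [[120, 120, 90, 120, 120] for _ in range(5)]
print(local_dark_points(surface))  # [(1, 2), (2, 2), (3, 2)]
```

Because only a small neighbourhood is read per pixel, the cost is linear in image size, which matches the hard time constraints the abstract emphasizes.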
ISBN: (print) 9781665467346
The proceedings contain 153 papers. The topics discussed include: robust facial expression recognition based on dual-branch multi-feature learning; review of human violence recognition algorithms; interlaced perception for person re-identification based on Swin transformer; skeleton-based dumbbell fitness action recognition using two-stream LSTM network; human action recognition based on three-stream network with frame sequence features; combining attention mechanism and dual-stream 3D convolutional neural network for micro-expression recognition; a robust approach for smile recognition via deep convolutional neural networks; a named entity recognition method based on deep learning for Chinese legal documents; modality-independent regression and training for improving multispectral pedestrian detection; textile defect detection algorithm based on unsupervised learning; an optimization scheme of object detection model based on CNN feature visualization method; and a crosswalk stripe detection model based on gradient similarity tags.
ISBN: (print) 9781538637883
Many computer vision and image processing applications rely on local features. It is well known that motion blur degrades the performance of traditional feature detectors and descriptors. We propose an inertial-based deblurring method that improves the robustness of existing feature detectors and descriptors against motion blur. Unlike most deblurring algorithms, the method can handle spatially variant blur and rolling shutter distortion. Furthermore, unlike state-of-the-art algorithms, it can run in real time. The limitations of inertial-based blur estimation are taken into account by validating the blur estimates using image data. The evaluation shows that, when used with a traditional feature detector and descriptor, the method increases the number of detected keypoints, provides higher repeatability, and improves localization accuracy. We also demonstrate that such features lead to more accurate and complete reconstructions when used for 3D visual reconstruction.
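Inertial blur estimation can be sketched with a small-angle pinhole approximation. For a rotation of theta radians about an axis perpendicular to the optical axis, image points shift by roughly f * theta pixels, where f is the focal length in pixels. With a rolling shutter, each image row is exposed over a different time window, so the blur length varies from row to row.

The sketch below integrates gyroscope samples over each row's exposure window. Several parts are assumptions, including the parameter names, the piecewise-constant gyro model and the numbers. It covers only the blur-extent estimate, not the paper's deblurring or its image-based validation.

```python
from typing import List, Sequence


def integrate_gyro(times: Sequence[float], omega: Sequence[float],
                   t_start: float, t_end: float) -> float:
    """Integrate angular velocity (rad/s), held constant after each sample, over [t_start, t_end]."""
    angle = 0.0
    for i in range(len(times)):
        seg_start = times[i]
        seg_end = times[i + 1] if i + 1 < len(times) else float("inf")
        lo, hi = max(seg_start, t_start), min(seg_end, t_end)
        if hi > lo:
            angle += omega[i] * (hi - lo)
    return angle


def row_blur_lengths(times: Sequence[float], omega: Sequence[float], focal_px: float,
                     t_frame: float, t_exp: float, t_line: float, n_rows: int) -> List[float]:
    """Approximate blur length in pixels for each row under a rolling shutter."""
    lengths = []
    for r in range(n_rows):
        start = t_frame + r * t_line  # row r starts exposing r * t_line after row 0
        theta = integrate_gyro(times, omega, start, start + t_exp)
        lengths.append(abs(focal_px * theta))  # small-angle approximation: shift ~ f * theta
    return lengths


# Camera at rest until t = 0.01 s, then rotating at 0.5 rad/s (exaggerated line delay).
blur = row_blur_lengths([0.0, 0.01], [0.0, 0.5], focal_px=1000.0,
                        t_frame=0.0, t_exp=0.01, t_line=0.005, n_rows=3)
print([round(b, 2) for b in blur])  # [0.0, 2.5, 5.0]
```

The differing per-row values show why a single global blur kernel cannot describe rolling-shutter images, motivating a spatially variant model.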
ISBN: (print) 9781538637883
Multi-label image annotation is one of the most important open problems in machine learning and computer vision. In this paper, we propose a novel model for image annotation. Unlike existing works, which usually annotate images with conventional visual features, this paper adopts features based on convolutional neural networks (CNNs), which have shown the potential to achieve outstanding performance. In particular, we use a CNN to extract image features with richer semantic meaning and apply them to an existing image annotation method, Tag Propagation (TagProp). Experimental results on four challenging datasets show that our model achieves a marked improvement over the current state of the art.
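TagProp is at its core a weighted nearest-neighbour vote. A tag's predicted probability for a test image is a weighted sum over its K nearest training images of whether each neighbour carries that tag, with weights that decay with distance. The sketch below assumes the CNN features are already extracted and uses fixed distance-based softmax weights. The scale `theta`, the smoothing `eps` and the toy data are hypothetical. TagProp's learned metric and its per-tag sigmoid modulation are omitted.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def tagprop_predict(query: Sequence[float],
                    train: List[Tuple[Sequence[float], Set[str]]],
                    vocab: Sequence[str], k: int = 3, theta: float = 1.0,
                    eps: float = 1e-5) -> Dict[str, float]:
    """Tag probabilities from distance-weighted nearest neighbours (TagProp-style)."""
    neighbours = sorted(train, key=lambda item: squared_distance(query, item[0]))[:k]
    dists = [squared_distance(query, feat) for feat, _ in neighbours]
    d_min = min(dists)
    # Softmax over -theta * distance, shifted by d_min for numerical stability.
    scores = [math.exp(-theta * (d - d_min)) for d in dists]
    total = sum(scores)
    weights = [s / total for s in scores]
    probs = {}
    for tag in vocab:
        # p(tag | neighbour j) is 1 - eps if j carries the tag, else eps.
        probs[tag] = sum(w * ((1 - eps) if tag in tags else eps)
                         for w, (_, tags) in zip(weights, neighbours))
    return probs


# Hypothetical 2-D "CNN features" with their annotated tags.
train = [
    ([0.0, 0.0], {"sky", "sea"}),
    ([0.1, 0.0], {"sky", "beach"}),
    ([5.0, 5.0], {"car", "road"}),
    ([5.1, 5.0], {"car", "street"}),
]
probs = tagprop_predict([0.05, 0.0], train, ["sky", "sea", "car"], k=3)
print(max(probs, key=probs.get))  # sky
```

Better features make the nearest neighbours more semantically coherent, which is why swapping in CNN features can improve TagProp without changing the propagation rule.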