ISBN: (print) 9781665448994
Traffic event retrieval is one of the important tasks for intelligent traffic system management. To find accurate candidate events in traffic videos corresponding to a specific text query, it is necessary to understand the text query's attributes, represent the visual and motion attributes of vehicles in videos, and measure the similarity between them. Thus we propose a promising method for vehicle event retrieval from a natural-language-based specification. We utilize both appearance and motion attributes of a vehicle and adapt the COOT model to evaluate the semantic relationship between a query and a video track. Experiments with the test dataset of Track 5 in AI City Challenge 2021 show that our method is among the top 6 with a score of 0.1560.
ISBN: (print) 9781665448994
Fashion retrieval methods aim at learning a clothing-specific embedding space where images are ranked based on their global visual similarity with a given query. However, global embeddings struggle to capture localized fine-grained similarities between images, because of aggregation operations. Our work deals with this problem by learning localized representations for fashion retrieval based on local interest points of prominent visual features specified by a user. We introduce a localized triplet loss function that compares samples based on corresponding patterns. We incorporate random local perturbation on the interest point as a key regularization technique to enforce local invariance of visual representations. Due to the absence of existing fashion datasets to train on localized representations, we introduce FashionLocalTriplets, a new high-quality dataset annotated by fashion specialists that contains triplets of women's dresses and interest points. The proposed model outperforms state-of-the-art global representations on FashionLocalTriplets.
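The abstract's two key ingredients, a triplet loss and random perturbation of the interest point, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the margin value, the Euclidean distance, and the plain-list embeddings are all assumptions made for illustration, and the standard triplet formulation stands in for the paper's localized variant.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: pull the positive closer than the negative.

    The margin of 0.2 is an assumed value, not taken from the paper.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def perturb_point(point, max_shift, rng):
    """Randomly shift an (x, y) interest point by at most max_shift pixels.

    Hypothetical helper: during training, the local embedding would be
    extracted at the perturbed location to encourage local invariance.
    """
    x, y = point
    return (x + rng.randint(-max_shift, max_shift),
            y + rng.randint(-max_shift, max_shift))
```

In a training loop under these assumptions, each interest point would be passed through `perturb_point` before its local embedding is extracted, and the three resulting embeddings would be fed to `triplet_loss`.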
ISBN: (print) 9781665448994
Motion segmentation is a technique to detect and localize class-agnostic motion in videos. This motion is assumed to be relative to a stationary background and usually originates from objects such as vehicles or humans. When the camera also moves, frame-differencing approaches, which do not have to model the stationary background over minutes, hours, or even days, are more promising than background subtraction methods. In this paper, we propose a Deep Convolutional Neural Network (DCNN) for multi-modal motion segmentation: the current image contributes appearance information to distinguish between relevant and irrelevant motion, and frame differencing captures the temporal information, which is the scene's motion independent of the camera motion. We fuse this information to obtain an effective and efficient approach for robust motion segmentation. The effectiveness is demonstrated using the multi-spectral CDNet-2014 dataset that we re-labeled for motion segmentation. We specifically show that we can detect tiny moving objects significantly better than methods based on optical flow.
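For readers unfamiliar with frame differencing, the following is a minimal sketch of the classical thresholded version only. It does not reproduce the paper's DCNN or its fusion with appearance information; the function name, the nested-list grayscale frame format, and the threshold of 25 are assumptions chosen for illustration.

```python
def frame_difference_mask(prev_frame, curr_frame, threshold=25):
    """Return a binary motion mask from two equally sized grayscale frames.

    Frames are nested lists of pixel intensities (rows of ints). A pixel is
    marked as moving (1) when its absolute intensity change between the two
    frames exceeds the assumed threshold, otherwise it is marked static (0).
    """
    return [
        [1 if abs(curr - prev) > threshold else 0
         for prev, curr in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]
```

Such a mask reacts to any intensity change, including noise and lighting flicker, which is one reason the abstract pairs the temporal signal with appearance information from the current image.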
The stomatopod (mantis shrimp) visual system has recently provided a blueprint for the design of paradigm-shifting polarization and multispectral imaging sensors, enabling solutions to challenging medical and remote s...
ISBN: (print) 9781665445092
We introduce WyPR, a Weakly-supervised framework for Point cloud recognition, requiring only scene-level class tags as supervision. WyPR jointly addresses three core 3D recognition tasks: point-level semantic segmentation, 3D proposal generation, and 3D object detection, coupling their predictions through self and cross-task consistency losses. We show that in conjunction with standard multiple-instance learning objectives, WyPR can detect and segment objects in point cloud data without access to any spatial labels at training time. We demonstrate its efficacy using the ScanNet and S3DIS datasets, outperforming prior state of the art on weakly-supervised segmentation by more than 6% mIoU. In addition, we set up the first benchmark for weakly-supervised 3D object detection on both datasets, where WyPR outperforms standard approaches and establishes strong baselines for future work.
ISBN: (print) 9789819743865; 9789819743872
Universities are concentrating on building industry characteristics to strengthen and expand their influence in the information age. Big data on intellectual output is a key representation of discipline construction. We created an algorithm to identify development features of disciplines, including temporal trends, research hotspots, and mutation characteristics. Using CNKI as the data source, and with the aid of scientific knowledge graphs and social network analysis, we analyzed 10,786 core journal theses published between 2012 and 2021 to reveal co-occurrence, keyword development trajectories, and mutation word development paths. According to the data mining results, high-level intellectual output that is relevant to industry increased in quantity and proportion, and intimacy also improved. High levels of interdisciplinary interaction, a variety of disciplinary innovations, and in-depth disciplinary culture formation should characterize the pattern of disciplinary development. This study is an attempt at recognizing the development patterns of specialized disciplines through big data intelligence, and the recognition algorithms can be used for feature recognition in multidisciplinary fields.
In today's rapidly evolving technological landscape, object recognition is a critical component of computer vision, impacting numerous fields such as robotics, surveillance, and augmented reality. Object recogniti...
Vehicle identification and recognition are essential computer vision tasks with important applications in autonomous driving, traffic management, and surveillance systems. The Indian Driving Dataset (IDD) used...
ISBN: (print) 9781665445092
Image-based methods for indoor lighting estimation suffer from the problem of intensity-distance ambiguity. This paper introduces a novel setup based on the event camera to help alleviate the ambiguity. We further demonstrate that estimating the distance of a light source becomes a well-posed problem under this setup, based on which an optimization-based method and a learning-based method are proposed. Our experimental results validate that our approaches not only achieve superior performance for indoor lighting estimation (especially for nearby light sources) but also significantly alleviate the intensity-distance ambiguity.
ISBN: (print) 9781665445092
We propose a new flash technique for low-light imaging, using deep-red light as an illuminating source. Our main observation is that in a dim environment, the human eye mainly uses rods for the perception of light, which are not sensitive to wavelengths longer than 620 nm, yet the camera sensor still has a spectral response. We propose a novel modulation strategy when training a modern CNN model for guided image filtering, fusing a noisy RGB frame and a flash frame. This fusion network is further extended for video reconstruction. We have built a prototype with minor hardware adjustments and tested the new flash technique on a variety of static and dynamic scenes. The experimental results demonstrate that our method produces compelling reconstructions, even in extra dim conditions.