The automated recognition and identification of license plates is an essential element of intelligent transportation systems that enable effective traffic management, security measures, and the development of efficien...
ISBN: 9781665445092 (print)
Our world offers a never-ending stream of visual stimuli, yet today's vision systems only accurately recognize patterns within a few seconds. These systems understand the present, but fail to contextualize it in past or future events. In this paper, we study long-form video understanding. We introduce a framework for modeling long-form videos and develop evaluation protocols on large-scale datasets. We show that existing state-of-the-art short-term models are limited for long-form tasks. A novel object-centric transformer-based video recognition architecture performs significantly better on 7 diverse tasks. It also outperforms comparable state-of-the-art on the AVA dataset.
ISBN: 9781665448994 (print)
We develop a deep convolutional neural network (CNN) to remove the blur artifacts caused by camera defocus, using dual-pixel images. Specifically, we develop a double attention network consisting of attentional encoders, triple locals, and global local modules, which extracts useful information from each image of the dual-pixel pair, selects among it, and synthesizes the final output image. We demonstrate the effectiveness of the proposed deblurring algorithm both qualitatively and quantitatively by evaluating on the test set of the NTIRE 2021 Defocus Deblurring using Dual-pixel Images Challenge [1] [4].
Binocular vision-based target detection is one of the hot topics in computer vision, where the technique aims to detect and localize target objects in images. The technology has applications in fields such as autonomo...
ISBN: 9781665448994 (print)
As the demand for deep learning solutions increases, the need for explainability becomes even more fundamental. In this setting, particular attention has been given to visualization techniques that try to attribute the right relevance to each input pixel with respect to the output of the network. In this paper, we focus on Class Activation Mapping (CAM) approaches, which provide an effective visualization by taking weighted averages of the activation maps. To enhance the evaluation and the reproducibility of such approaches, we propose a novel set of metrics to quantify explanation maps, which show better effectiveness and simplify comparisons between approaches. To evaluate the appropriateness of the proposal, we compare different CAM-based visualization methods on the entire ImageNet validation set, fostering proper comparisons and reproducibility.
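The weighted combination of activation maps that CAM approaches rely on can be illustrated with a minimal sketch. It is not taken from the paper: the function name `class_activation_map`, the toy inputs, and the choice of a plain weighted sum followed by ReLU and max normalization (as commonly done in Grad-CAM-style methods) are all assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def class_activation_map(activations: List[Matrix], weights: List[float]) -> Matrix:
    """Combine K activation maps (each H x W) into one explanation map.

    Assumption: weighted sum over channels, then ReLU, then division by the
    maximum so that values fall in [0, 1]. Individual CAM variants differ
    mainly in how the per-channel weights are obtained.
    """
    if not activations or len(activations) != len(weights):
        raise ValueError("need exactly one weight per activation map")
    height, width = len(activations[0]), len(activations[0][0])

    # Weighted sum of the activation maps, pixel by pixel.
    cam = [[0.0] * width for _ in range(height)]
    for amap, w in zip(activations, weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += w * amap[i][j]

    # ReLU keeps only the evidence in favor of the class.
    cam = [[max(0.0, v) for v in row] for row in cam]

    # Normalize by the maximum (skipped if the map is all zeros).
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Hypothetical toy input: two 2x2 activation maps and their weights.
toy_maps = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 4.0], [2.0, 0.0]],
]
toy_weights = [1.0, 0.5]
toy_cam = class_activation_map(toy_maps, toy_weights)
```

In a real network the activation maps would come from a convolutional layer and the weights from the classifier or from gradients; the sketch only shows the combination step.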
In various fields such as medical imaging, object detection, and video surveillance, multi-view natural language query systems utilize image data to provide a more comprehensive perspective, allowing users to intuitiv...
ISBN: 9781665448994 (print)
In this paper, a novel bottom-up video event recognition approach, ObjectGraphs, is proposed, which utilizes a rich frame representation and the relations between objects within each frame. Following the application of an object detector (OD) on the frames, graphs are used to model the object relations and a graph convolutional network (GCN) is utilized to perform reasoning on the graphs. The resulting object-based frame-level features are then forwarded to a long short-term memory (LSTM) network for video event recognition. Moreover, the weighted in-degrees (WiDs) derived from the graph's adjacency matrix at frame level are used to identify the objects that were considered most (or least) salient for event recognition and contributed the most (or least) to the final event recognition decision, thus providing an explanation for the latter. The experimental results show that the proposed method achieves state-of-the-art performance on the publicly available FCVID and YLI-MED datasets.
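How weighted in-degrees can be read off an adjacency matrix is sketched below. This is an illustration only, not the authors' implementation: it assumes that entry `adjacency[i][j]` holds the weight of the edge from object `i` to object `j`, so that the weighted in-degree of object `j` is the sum of column `j`. The function names, object labels, and weights are hypothetical.

```python
from typing import List, Tuple


def weighted_in_degrees(adjacency: List[List[float]]) -> List[float]:
    """Return the weighted in-degree of every node.

    Assumption: adjacency[i][j] is the weight of the edge i -> j, so the
    in-degree of node j is the sum over column j.
    """
    n = len(adjacency)
    if any(len(row) != n for row in adjacency):
        raise ValueError("adjacency matrix must be square")
    return [sum(adjacency[i][j] for i in range(n)) for j in range(n)]


def rank_objects(adjacency: List[List[float]], labels: List[str]) -> List[Tuple[str, float]]:
    """Sort detected objects from most to least salient by weighted in-degree."""
    wids = weighted_in_degrees(adjacency)
    order = sorted(range(len(labels)), key=lambda j: wids[j], reverse=True)
    return [(labels[j], wids[j]) for j in order]


# Hypothetical frame with three detected objects.
frame_labels = ["person", "ball", "goal"]
frame_adjacency = [
    [0.0, 0.5, 0.5],
    [0.25, 0.0, 0.75],
    [0.25, 0.5, 0.0],
]
frame_ranking = rank_objects(frame_adjacency, frame_labels)
```

Under this convention, the object at the top of the ranking receives the most incoming edge weight in the frame, and the one at the bottom the least.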
In recent years, computer vision technology has made significant progress, expanding its application from simple image recognition tasks to complex real-world problems. One such area where computer vision promises to ...
ISBN: 9781665448994 (print)
In this paper, we propose an online movement-specific vehicle counting system to realize robust traffic flow analysis at crowded intersections. Our proposed framework adopts PP-YOLO as the vehicle detector and adapts the Deep-Sort algorithm to perform multi-object tracking. To realize online and robust vehicle counting, we further adopt a shape-based movement assignment strategy to differentiate movements, together with carefully designed spatial constraints that effectively reduce false-positive counts. Our proposed framework achieves an overall S1-score of 0.9467, ranking first in the AICITY2021-track1 challenge.
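The abstract does not spell out the assignment strategy or the spatial constraints, so the following is only one plausible sketch of the general idea, not the authors' method. It assumes that each movement has a hand-drawn template trajectory, that shape similarity is the mean distance between arc-length-resampled points, and that the spatial constraint is a rectangular exit region per movement. All names, templates, regions, and thresholds are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def resample(points: List[Point], n: int = 8) -> List[Point]:
    """Resample a polyline to n points spaced evenly along its arc length."""
    if len(points) < 2 or n < 2:
        raise ValueError("need at least two points and n >= 2")
    dists = [0.0]  # cumulative arc length at each vertex
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dists.append(dists[-1] + math.hypot(x1 - x0, y1 - y0))
    total = dists[-1]
    if total == 0:
        return [points[0]] * n
    out, seg = [], 0
    for k in range(n):
        target = total * k / (n - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and dists[seg + 1] < target:
            seg += 1
        span = dists[seg + 1] - dists[seg]
        t = 0.0 if span == 0 else (target - dists[seg]) / span
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def shape_distance(a: List[Point], b: List[Point], n: int = 8) -> float:
    """Mean distance between corresponding resampled points of two tracks."""
    ra, rb = resample(a, n), resample(b, n)
    return sum(math.hypot(p[0] - q[0], p[1] - q[1]) for p, q in zip(ra, rb)) / n


def assign_movement(track: List[Point], templates: Dict[str, List[Point]],
                    exit_regions: Dict[str, Box], max_distance: float) -> Optional[str]:
    """Return the best-matching movement, or None if the track is rejected."""
    best = min(templates, key=lambda name: shape_distance(track, templates[name]))
    if shape_distance(track, templates[best]) > max_distance:
        return None  # shape matches no known movement well enough
    x, y = track[-1]
    xmin, ymin, xmax, ymax = exit_regions[best]
    if not (xmin <= x <= xmax and ymin <= y <= ymax):
        return None  # spatial constraint: track must end in the exit region
    return best


# Hypothetical intersection with two movements.
templates = {"straight": [(0, 0), (10, 0)], "left_turn": [(0, 0), (5, 0), (5, 5)]}
exit_regions = {"straight": (8, -2, 12, 2), "left_turn": (3, 3, 7, 7)}

through_track = [(0, 0.5), (4, 0.5), (10, 0.5)]
turning_track = [(0, 0), (5, 0.2), (5.2, 5)]
stalled_track = [(0, 0), (4, 0)]  # vehicle that has not crossed yet

through_result = assign_movement(through_track, templates, exit_regions, max_distance=2.0)
turning_result = assign_movement(turning_track, templates, exit_regions, max_distance=2.0)
stalled_result = assign_movement(stalled_track, templates, exit_regions, max_distance=2.0)
```

A counter would increment the tally of the returned movement and ignore rejected tracks, which is how a constraint of this kind can suppress false-positive counts from partial or spurious tracks.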
ISBN: 9781665448994 (print)
Shadow removal is an important computer vision task that aims to detect and remove the shadow produced by an occluded light source and to restore the image contents photorealistically. Decades of research have produced a multitude of hand-crafted restoration techniques and, more recently, solutions learned from pairs of shadowed and shadow-free training images. In this work, we propose a single-image shadow removal solution via self-supervised learning using a conditioned mask. We rely on self-supervision and jointly learn deep models to remove and add shadows to images. We derive two variants for learning from paired images and unpaired images, respectively. Our validation on the recently introduced ISTD and USR datasets demonstrates large quantitative and qualitative improvements over the state of the art for both paired and unpaired learning settings.