ISBN (digital): 9798350365474
ISBN (print): 9798350365481
We propose a graph-based representation learning framework for video summarization. First, we convert an input video into a graph whose nodes correspond to the video frames. Then, we impose sparsity on the graph by connecting only those pairs of nodes that lie within a specified temporal distance. We then formulate video summarization as a binary node classification problem: each frame is classified according to whether it should belong to the output summary video. A graph constructed this way aims to capture long-range interactions among video frames, and the sparsity lets the model train without hitting memory and compute bottlenecks. Experiments on two datasets (SumMe and TVSum) demonstrate the effectiveness of the proposed nimble model compared to existing state-of-the-art summarization approaches, while being one order of magnitude more efficient in compute time and memory.
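The temporal sparsity rule described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `temporal_edges` and the parameter `max_dist` are hypothetical, and the abstract does not say whether edges are directed, weighted, or include self-loops, so undirected, unweighted edges without self-loops are assumed.

```python
def temporal_edges(num_frames: int, max_dist: int) -> list[tuple[int, int]]:
    """Return undirected edges (i, j) with i < j and j - i <= max_dist.

    Each frame index is a node. Assumption: unweighted edges, no self-loops.
    """
    if num_frames < 0 or max_dist < 0:
        raise ValueError("num_frames and max_dist must be non-negative")
    edges = []
    for i in range(num_frames):
        # Connect frame i only to the frames that follow it within the window.
        last = min(i + max_dist, num_frames - 1)
        for j in range(i + 1, last + 1):
            edges.append((i, j))
    return edges


def dense_edge_count(num_frames: int) -> int:
    """Number of undirected edges in a fully connected graph, for comparison."""
    return num_frames * (num_frames - 1) // 2


if __name__ == "__main__":
    sparse = temporal_edges(num_frames=5, max_dist=2)
    print(sparse)
    print(len(sparse), "sparse edges vs", dense_edge_count(5), "dense edges")
```

With a fixed window, the edge count grows linearly in the number of frames rather than quadratically, which is consistent with the memory and compute savings the abstract attributes to sparsity.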
ISBN (print): 9781665448994
Whenever a visual perception system is employed in safety-critical applications such as automated driving, a thorough, task-oriented experimental evaluation is necessary to guarantee safe system behavior. While most standard evaluation methods in computer vision provide good comparability on benchmarks, they tend to fall short in assessing the system performance that is actually relevant for the given task. In our work, we consider pedestrian detection as a highly relevant perception task, and we argue that standard measures such as Intersection over Union (IoU) give insufficient results, mainly because they are insensitive to important physical cues including distance, speed, and direction of motion. Therefore, we investigate so-called relevance metrics, where specific domain knowledge is exploited to obtain a task-oriented performance measure, focusing on distance in this initial work. Our experimental setup is based on the CARLA simulator and allows a controlled evaluation of the impact of that domain knowledge. Our first results indicate a linear decrease of the IoU with the pedestrians' distance, leading to the proposal of a first relevance metric that is also conditioned on the distance.
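For reference, the standard IoU measure that this abstract argues is insufficient can be computed as in the sketch below. The box format `(x_min, y_min, x_max, y_max)` and the function name `box_iou` are assumptions for illustration. The distance-conditioned relevance metric itself is not specified in the abstract and is deliberately not reproduced here.

```python
Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed format


def box_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Width and height of the overlap rectangle, clamped at zero when disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap area 1, union area 7
```

Note that the value depends only on box geometry in the image plane, which illustrates the abstract's point that IoU by itself carries no information about distance, speed, or direction of motion.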
Current technology utilizes a sophisticated standard room measurement system to analyze the average roughness of Stainless Steel Bearing Surface. However, this process relies on random sampling at infrequent intervals...
ISBN (print): 9781665448994
Neuromorphic vision sensors are biologically inspired devices which differ fundamentally from well-known frame-based sensors. Even though developments in this research area are increasing, applications that rely entirely on event cameras are still relatively rare. This becomes particularly clear when considering real outdoor scenarios apart from laboratory conditions. One obstacle to the development of event-based vision applications in this context may be the lack of labeled datasets for algorithm development and evaluation. Therefore, we describe a recording setting for DVS-based long-term monitoring of an urban public area and provide labeled DVS data that also contain the effects of outdoor environmental influences recorded in this process. We also describe the processing chain used for label generation, as well as results from a denoising benchmark utilizing various spatio-temporal event stream filters. The dataset contains almost 7 hours of real-world outdoor event data with approximately 47k labeled regions of interest and can be downloaded at http://***/DVS-OUTLAB/
ISBN (print): 9781665448994
Melanoma is the third most common type of skin cancer and is responsible for the most skin cancer deaths. A diagnosis of melanoma is made by the visual interpretation of tissue sections by a pathologist, a challenging task given the complexity and breadth of melanocytic lesions and the subjective nature of biopsy interpretation. We leverage advances in computer vision to aid melanoma diagnosis by segmenting potential regions of lesions on digital images of whole slide skin biopsies. In this study, we demonstrate a Mask-R-CNN-based segmentation framework for this purpose. To alleviate the cost of data annotation, we leverage a sparse annotation pipeline. Our model can be trained on sparse and noisy labels and achieves state-of-the-art performance in identifying melanocytic proliferations, producing a segmentation with a Dice score of 0.719, an mIoU of 0.740, and an overall pixel accuracy of 0.927.
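The three segmentation measures named in this abstract can be illustrated for a single foreground class with the sketch below. It assumes flattened binary masks given as equal-length sequences of 0/1 values, and all function names are hypothetical. It shows the standard definitions only; how the paper averages over classes to obtain its mIoU is not stated in the abstract.

```python
from typing import Sequence


def confusion_counts(pred: Sequence[int], truth: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = len(pred) - tp - fp - fn
    return tp, fp, fn, tn


def dice_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice = 2*TP / (2*TP + FP + FN); defined as 1.0 when both masks are empty."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def iou_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """IoU = TP / (TP + FP + FN); defined as 1.0 when both masks are empty."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    denom = tp + fp + fn
    return tp / denom if denom else 1.0


def pixel_accuracy(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of pixels whose predicted label matches the ground truth."""
    tp, _, _, tn = confusion_counts(pred, truth)
    return (tp + tn) / len(pred) if len(pred) else 1.0


if __name__ == "__main__":
    pred, truth = [1, 1, 0, 0], [1, 0, 1, 0]
    print(dice_score(pred, truth), iou_score(pred, truth), pixel_accuracy(pred, truth))
```

For a single class the two overlap measures are related by Dice = 2·IoU / (1 + IoU), so they rank predictions identically even though their numeric values differ.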
In this paper, we propose IRTR-DETR, an Interactive and Real-Time Rotated DEtection TRansformer that extends IRT-DETR to predict rotated bounding boxes. IRTR-DETR maintains the Human-In-The-Loop (HIL) workflow of IRTD...
ISBN (print): 9781665448994
Autonomous spacecraft critically depend on on-orbit inspection (i.e., relative navigation and inertial properties estimation) to intercept tumbling debris objects or defunct satellites. This work presents a practical method for on-orbit inspection and demonstrates its performance in simulation using NASA's Astrobee robotic free-flyers. The problem is formulated as a simultaneous localization and mapping task, utilizing IMU data from an observing "chaser" spacecraft and point clouds of the observed "target" spacecraft obtained via a 3D time-of-flight camera. The relative navigation between the chaser and target is solved via a factor graph-based approach. The target's principal axes of inertia are then estimated via a conic fit optimization procedure using a polhode analysis. Simulation results indicate the accuracy of the proposed method in preparation for hardware experiments on the International Space Station.
ISBN (print): 9781665448994
To unlock video chat for hundreds of millions of people hindered by poor connectivity or unaffordable data costs, we propose to authentically reconstruct faces on the receiver's device using facial landmarks extracted at the sender's side and transmitted over the network. In this context, we discuss and evaluate the benefits and disadvantages of several deep adversarial approaches. In particular, we explore quality and bandwidth trade-offs for approaches based on static landmarks, dynamic landmarks, or segmentation maps. We design a mobile-compatible architecture based on the first order animation model of Siarohin et al. In addition, we leverage SPADE blocks to refine results in important areas such as the eyes and lips. We compress the networks down to about 3 MB, allowing models to run in real time on an iPhone 8 (CPU). This approach enables video calling at a few kbits per second, an order of magnitude lower than currently available alternatives.
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Neuromorphic cameras feature asynchronous event-based pixel-level processing and are particularly useful for object tracking in dynamic environments. Current approaches for feature extraction and optical flow with high-performing hybrid RGB-events vision systems require large computational models and supervised learning, which impose challenges for embedded vision and require annotated datasets. In this work, we propose ED-DCFNet, a small and efficient (< 72k) unsupervised multi-domain learning framework, which extracts events-frames shared features without requiring annotations, with comparable performance. Furthermore, we introduce an open-sourced event and frame-based dataset that captures indoor scenes with various lighting and motion-type conditions in realistic scenarios, which can be used for model building and evaluation. The dataset is available at https://***/NBELab/UnsupervisedTracking.
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Low-rank adaptation (LoRA) and its variants are widely employed in fine-tuning large models, including large language models for natural language processing and diffusion models for computer vision. This paper proposes a generalized framework called SuperLoRA that unifies and extends different LoRA variants, which can be realized under different hyper-parameter settings. Introducing new options with grouping, folding, shuffling, projection, and tensor decomposition, SuperLoRA offers high flexibility and demonstrates superior performance, with up to 10-fold gain in parameter efficiency for transfer learning tasks.
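As background for the parameter-efficiency claim, the sketch below illustrates plain LoRA only: a frozen weight matrix of shape d_out × d_in receives a trainable low-rank update B·A with rank r. It does not implement SuperLoRA's grouping, folding, shuffling, projection, or tensor decomposition, which the abstract names but does not specify. The function names and the example dimensions are assumptions chosen for illustration.

```python
Matrix = list[list[float]]


def lora_delta(B: Matrix, A: Matrix) -> Matrix:
    """Low-rank weight update: B (d_out x r) times A (r x d_in)."""
    rank = len(A)
    if any(len(row) != rank for row in B):
        raise ValueError("inner dimensions of B and A must match")
    d_in = len(A[0])
    return [
        [sum(B[i][k] * A[k][j] for k in range(rank)) for j in range(d_in)]
        for i in range(len(B))
    ]


def full_param_count(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is fine-tuned."""
    return d_out * d_in


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters when only B and A are trained."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    # Rank-1 update of a 2 x 2 weight matrix.
    print(lora_delta([[1.0], [2.0]], [[3.0, 4.0]]))
    # Hypothetical layer size, not taken from the paper.
    print(full_param_count(1024, 1024), lora_param_count(1024, 1024, rank=8))
```

In this hypothetical 1024 × 1024 layer, rank 8 trains 16,384 parameters instead of 1,048,576. Generalizations of LoRA aim to reduce or reallocate this count further.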