Temporal action segmentation in untrimmed videos has gained increased attention recently. However, annotating action classes and frame-wise boundaries is extremely time consuming and cost intensive, especially on larg...
Recently, large-scale pre-trained vision and language (VL) models have set a new state-of-the-art (SOTA) in zero-shot visual classification, enabling open-vocabulary recognition of a potentially unlimited set of categories defined as simple language prompts. However, despite these great advances, the performance of these zero-shot classifiers still falls short of the results of dedicated (closed category set) classifiers trained with supervised fine-tuning. In this paper, we show, for the first time, how to reduce this gap without any labels and without any paired VL data, using an unlabeled image collection and a set of texts auto-generated using a Large Language Model (LLM) describing the categories of interest and effectively substituting labeled visual instances of those categories. Using our label-free approach, we are able to attain significant performance improvements over the zero-shot performance of the base VL model and other contemporary methods and baselines on a wide variety of datasets, demonstrating an absolute improvement of up to 11.7% (3.8% on average) in the label-free setting. Moreover, despite our approach being label-free, we observe 1.3% average gains over leading few-shot prompting baselines that do use 5-shot supervision.
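The general idea of adapting a zero-shot classifier with LLM-generated class descriptions instead of labels can be illustrated with a minimal sketch: build text prototypes from several descriptions per class, pseudo-label unlabeled images with them, and self-train on the confident pseudo-labels. This is not the paper's exact method; the stub encoders stand in for a real VL model such as CLIP, and the description strings and confidence threshold are illustrative assumptions.

```python
# Minimal sketch of label-free adaptation with LLM-generated class descriptions.
# The encoders are random stand-ins for a real VL model (e.g. CLIP); the
# descriptions and threshold are illustrative assumptions, not the paper's setup.
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
EMB = 64  # embedding dimension of the (stub) VL model

# Stub encoders: in practice these are the frozen text tower and a lightly
# tuned image tower / adapter of a pre-trained VL model.
text_encoder = nn.Embedding(1000, EMB)                 # fake "tokenized text" -> embedding
image_encoder = nn.Sequential(nn.Linear(3 * 32 * 32, EMB))

# One list of LLM-generated descriptions per class (hashed to fake token ids).
class_descriptions = {
    "cat": ["a small furry pet with whiskers", "an animal that purrs"],
    "dog": ["a loyal pet that barks", "a four-legged companion animal"],
}
classes = list(class_descriptions)

def class_prototypes():
    """Average the normalized text embeddings of each class's descriptions."""
    protos = []
    for c in classes:
        ids = torch.tensor([hash(d) % 1000 for d in class_descriptions[c]])
        emb = F.normalize(text_encoder(ids), dim=-1).mean(0)
        protos.append(F.normalize(emb, dim=-1))
    return torch.stack(protos)                         # (num_classes, EMB)

# Unlabeled image collection (random tensors stand in for real images).
unlabeled_images = torch.randn(256, 3 * 32 * 32)

optimizer = torch.optim.Adam(image_encoder.parameters(), lr=1e-4)
for step in range(10):
    idx = torch.randint(0, len(unlabeled_images), (32,))
    feats = F.normalize(image_encoder(unlabeled_images[idx]), dim=-1)
    logits = 100.0 * feats @ class_prototypes().T      # zero-shot similarity scores
    conf, pseudo = logits.softmax(dim=-1).max(dim=-1)
    keep = conf > 0.6                                  # assumed confidence threshold
    if keep.any():
        loss = F.cross_entropy(logits[keep], pseudo[keep])  # self-training on pseudo-labels
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```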
The aorta is the largest vessel of the human body and its pathological degenerations, such as dissections and aneurysms, can be life threatening. An automatic and fast segmentation of the aorta can therefore be a help...
Since its inception, the CUDA programming model has been continuously evolving. Because the CUDA toolkit aims to consistently expose cutting-edge capabilities for general-purpose compute jobs to its users, the added f...
Test-Time-Training (TTT) is an approach to cope with out-of-distribution (OOD) data by adapting a trained model to distribution shifts occurring at test-time. We propose to perform this adaptation via Activation Matching (ActMAD): We analyze activations of the model and align activation statistics of the OOD test data to those of the training data. In contrast to existing methods, which model the distribution of entire channels in the ultimate layer of the feature extractor, we model the distribution of each feature in multiple layers across the network. This results in more fine-grained supervision and makes ActMAD attain state-of-the-art performance on CIFAR-100C and ImageNet-C. ActMAD is also architecture- and task-agnostic, which lets us go beyond image classification and score a 15.4% improvement over previous approaches when evaluating a KITTI-trained object detector on KITTI-Fog. Our experiments highlight that ActMAD can be applied to online adaptation in realistic scenarios, requiring little data to attain its full performance.
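A minimal sketch of the activation-matching idea described above: per-feature means and variances of activations in several layers are pushed, via an L1 loss, towards statistics pre-computed on the training data. The tiny CNN, the random "training" statistics, and the optimizer settings are placeholders rather than the paper's actual models or hyper-parameters.

```python
# Sketch of activation matching for test-time adaptation: match element-wise
# activation means/variances of multiple layers to stored training statistics.
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
)
hooked = [model[0], model[2]]                     # layers whose activations we match

# Capture activations of the hooked layers on each forward pass.
acts = {}
for i, layer in enumerate(hooked):
    layer.register_forward_hook(lambda m, inp, out, i=i: acts.__setitem__(i, out))

# Per-feature training statistics; in practice these come from one pass over
# the clean training data, here they are random placeholders.
train_mean = {0: torch.randn(16, 32, 32), 1: torch.randn(32, 32, 32)}
train_var = {0: torch.rand(16, 32, 32), 1: torch.rand(32, 32, 32)}

optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for step in range(5):
    test_batch = torch.randn(16, 3, 32, 32)       # stands in for OOD test images
    model(test_batch)                             # populates `acts` via the hooks
    loss = 0.0
    for i in acts:
        mean = acts[i].mean(dim=0)                # element-wise mean over the batch
        var = acts[i].var(dim=0)                  # element-wise variance over the batch
        loss = loss + F.l1_loss(mean, train_mean[i]) + F.l1_loss(var, train_var[i])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```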
Mobile edge computing (MEC) is a newly emerging concept that provides significant local computing power and reduces end-to-end latency. In MEC environments, caching frequently accessed services on edge servers effecti...
ISBN (electronic): 9781665463829
ISBN (print): 9781665463836
Pedestrians and cyclists suffer the most serious injuries in traffic accidents. Existing Pedestrian Protection Systems and Road Safety Systems rely on an ideal model of pedestrian behavior and do not consider that people tend to take shortcuts, appear in unexpected places, or can be distracted on the road, for example by using a smartphone or wearing headphones. Collecting and analyzing realistic road user behavior is a crucial component of improving pedestrian and cyclist safety. However, such real-world data is still missing. To address this, we propose a visual surveillance system with two perpendicular, partially overlapping fields of view, combined with a fully automated deep learning-based pipeline to process and collect video observations, detect and extract road user trajectories in real-world coordinates, and estimate human attributes such as age, gender, and smartphone usage. We demonstrate our prototype by deploying it at two locations in a European city.
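One stage of such a pipeline, extracting trajectories in real-world coordinates, can be sketched as a homography-based mapping from a detection's foot point in image pixels to metric ground-plane coordinates. The pixel/world correspondences and detection boxes below are made-up example values, not the paper's calibration or detector output; a real deployment would calibrate them per camera.

```python
# Sketch: project the foot point of pedestrian detections from image pixels to
# real-world ground-plane coordinates using a calibrated homography.
import cv2
import numpy as np

# Four (or more) ground control points: image pixels -> metric world coordinates.
image_pts = np.float32([[100, 700], [1800, 720], [950, 400], [300, 380]])
world_pts = np.float32([[0.0, 0.0], [12.5, 0.0], [6.0, 20.0], [1.5, 21.0]])  # metres
H, _ = cv2.findHomography(image_pts, world_pts)

def footpoint_to_world(box, homography):
    """Project the bottom-centre of a detection box (x1, y1, x2, y2) to world coords."""
    x1, y1, x2, y2 = box
    foot = np.float32([[[(x1 + x2) / 2.0, y2]]])          # shape (1, 1, 2) for OpenCV
    return cv2.perspectiveTransform(foot, homography)[0, 0]

# Detections from an (assumed) upstream detector, one per frame for one track.
track_boxes = [(400, 300, 460, 520), (420, 305, 482, 530), (445, 310, 508, 540)]
trajectory = [footpoint_to_world(b, H) for b in track_boxes]
for t, (x, y) in enumerate(trajectory):
    print(f"frame {t}: x = {x:.2f} m, y = {y:.2f} m")
```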
In the field of autonomous driving, self-training is widely applied to mitigate distribution shifts in LiDAR-based 3D object detectors. This eliminates the need for expensive, high-quality labels whenever the environm...
Although action recognition systems can achieve top performance when evaluated on in-distribution test points, they are vulnerable to unanticipated distribution shifts in test data. However, test-time adaptation of video action recognition models against common distribution shifts has so far not been demonstrated. We propose to address this problem with an approach tailored to spatio-temporal models that is capable of adaptation on a single video sample at each step. It consists of a feature distribution alignment technique that aligns online estimates of test set statistics towards the training statistics. We further enforce prediction consistency over temporally augmented views of the same test video sample. Evaluations on three benchmark action recognition datasets show that our proposed technique is architecture-agnostic and able to significantly boost the performance of both the state-of-the-art convolutional architecture TANet and the Video Swin Transformer. Our proposed method demonstrates a substantial performance gain over existing test-time adaptation approaches in both the evaluation of a single distribution shift and the challenging case of random distribution shifts. Code will be available at https://***/wlin-at/ViTTA.
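The two ingredients described above can be sketched for a video model as follows: (1) aligning online (exponential-moving-average) estimates of test-time feature statistics to stored training statistics, and (2) a consistency loss between predictions for two temporally augmented views of the same clip. The tiny 3D CNN, the random "training" statistics, the temporal-reversal augmentation, and all hyper-parameters are placeholder assumptions, not the paper's exact recipe.

```python
# Sketch of test-time adaptation for a video model: statistics alignment plus
# prediction consistency over two temporal views of the same test clip.
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
model = nn.Sequential(
    nn.Conv3d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(8, 5),
)
feat_layer = model[0]
acts = {}
feat_layer.register_forward_hook(lambda m, i, o: acts.__setitem__("f", o))

train_mean, train_var = torch.zeros(8), torch.ones(8)   # placeholder training stats
ema_mean, ema_var, momentum = None, None, 0.1           # online test-time estimates

optimizer = torch.optim.SGD(model.parameters(), lr=1e-5)
for step in range(5):
    clip = torch.randn(1, 3, 16, 32, 32)                # one test video (B, C, T, H, W)
    views = torch.cat([clip, clip.flip(dims=[2])])      # second view: reversed in time
    logits = model(views)
    feats = acts["f"].transpose(0, 1).reshape(8, -1)    # channel-wise statistics
    mean, var = feats.mean(dim=1), feats.var(dim=1)
    if ema_mean is None:
        ema_mean, ema_var = mean, var
    else:                                               # online EMA of test statistics
        ema_mean = (1 - momentum) * ema_mean.detach() + momentum * mean
        ema_var = (1 - momentum) * ema_var.detach() + momentum * var
    align = F.l1_loss(ema_mean, train_mean) + F.l1_loss(ema_var, train_var)
    p = logits.softmax(dim=-1)
    consistency = F.mse_loss(p[0], p[1])                # the two views should agree
    loss = align + consistency
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```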