Search Results - Inner Mongolia University Library

22nd Scandinavian Conference on Image Analysis, SCIA 2023

Authors: Cevikalp, Hakan; Saribas, Hasan. Affiliations: Machine Learning and Computer Vision Laboratory, Eskisehir Osmangazi University, Eskisehir, Turkey; Huawei Turkey R&D Center, Istanbul, Turkey

ISBN: (print) 9783031314377

The classification loss functions used in deep neural network classifiers can be grouped into two categories based on maximizing the margin in either Euclidean or angular spaces. Euclidean distances between sample vectors are used during classification for the methods maximizing the margin in Euclidean spaces whereas the Cosine similarity distance is used during the testing stage for the methods maximizing margin in the angular spaces. This paper introduces a novel classification loss that maximizes the margin in both the Euclidean and angular spaces at the same time. This way, the Euclidean and Cosine distances will produce similar and consistent results and complement each other, which will in turn improve the accuracies. The proposed loss function enforces the samples of classes to cluster around the centers that represent them. The centers approximating classes are chosen from the boundary of a hypersphere, and the pairwise distances between class centers are always equivalent. This restriction corresponds to choosing centers from the vertices of a regular simplex. There is not any hyperparameter that must be set by the user in the proposed loss function, therefore the use of the proposed method is extremely easy for classical classification problems. Moreover, since the class samples are compactly clustered around their corresponding means, the proposed classifier is also very suitable for open set recognition problems where test samples can come from the unknown classes that are not seen in the training phase. Experimental studies show that the proposed method achieves the state-of-the-art accuracies on open set recognition despite its simplicity. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
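The geometric constraint described in the abstract above (class centers on a hypersphere with equal pairwise distances, i.e. the vertices of a regular simplex) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `simplex_centers` and `center_loss` are hypothetical, the construction from a centered identity matrix is one standard way to obtain a regular simplex, and the mean squared distance to the class center is an assumed stand-in for the loss, which the abstract does not spell out.

```python
import numpy as np


def simplex_centers(num_classes: int, radius: float = 1.0) -> np.ndarray:
    """Return num_classes points on a hypersphere that form a regular simplex.

    Assumed construction: center the standard basis vectors at the origin,
    then rescale each one to the requested radius.
    """
    eye = np.eye(num_classes)
    centered = eye - eye.mean(axis=0, keepdims=True)
    unit = centered / np.linalg.norm(centered, axis=1, keepdims=True)
    return radius * unit


def center_loss(features: np.ndarray, labels: np.ndarray, centers: np.ndarray) -> float:
    """Mean squared Euclidean distance of each sample to its class center.

    Illustrative stand-in only; the paper's exact loss is not given in the abstract.
    """
    diffs = features - centers[labels]
    return float(np.mean(np.sum(diffs ** 2, axis=1)))


if __name__ == "__main__":
    centers = simplex_centers(4)
    # All centers lie on the unit hypersphere.
    print(np.linalg.norm(centers, axis=1))
    # All pairwise cosine similarities are equal (-1 / (num_classes - 1)).
    print(np.round(centers @ centers.T, 3))
```

Because every pair of centers has the same Euclidean distance and the same cosine similarity, a sample pulled toward its own center moves away from all other centers in both metrics at once, which is the consistency the abstract refers to.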

Keywords: computer vision


Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection

arXiv, 2024

Authors: Fruhwirth-Reisinger, Christian; Lin, Wei; Malić, Dušan; Bischof, Horst; Possegger, Horst. Affiliations: Christian Doppler Laboratory for Embedded Machine Learning, Austria; Institute of Computer Graphics and Vision, Graz University of Technology, Austria; Institute for Machine Learning, Johannes Kepler University Linz, Austria

Accurate 3D object detection in LiDAR point clouds is crucial for autonomous driving systems. To achieve state-of-the-art performance, the supervised training of detectors requires large amounts of human-annotated data, which is expensive to obtain and restricted to predefined object categories. To mitigate manual labeling efforts, recent unsupervised object detection approaches generate class-agnostic pseudo-labels for moving objects, subsequently serving as supervision signal to bootstrap a detector. Despite promising results, these approaches do not provide class labels or generalize well to static objects. Furthermore, they are mostly restricted to data containing multiple drives from the same scene or images from a precisely calibrated and synchronized camera setup. To overcome these limitations, we propose a vision-language-guided unsupervised 3D detection approach that operates exclusively on LiDAR point clouds. We transfer CLIP knowledge to classify point clusters of static and moving objects, which we discover by exploiting the inherent spatio-temporal information of LiDAR point clouds for clustering, tracking, as well as box and label refinement. Our approach outperforms state-of-the-art unsupervised 3D object detectors on the Waymo Open Dataset (+23 AP3D) and Argoverse 2 (+7.9 AP3D) and provides class labels not solely based on object size assumptions, marking a significant advancement in the field. Code will be available at https://***/chreisinger/ViLGOD. © 2024, CC BY-NC-SA.

Keywords: Visual languages


End-Edge-Cloud Collaborative Offloading of Splittable Tasks in Internet of Vehicles: A Multi-Agent Reinforcement Learning Approach

SSRN, 2024

Authors: Fan, Weiwei; Gao, Zhenguo; Zhang, Jiahui; Jiang, Yang. Affiliations: College of Computer Science and Technology, Huaqiao University, Fujian, Xiamen, China; Key Laboratory of Computer Vision Machine Learning of Fujian Province University, Fujian, Xiamen, China

The rapid development of intelligent transportation and the exponential growth of data traffic drive the emergence of more computation-intensive, latency-critical tasks in vehicles, bringing challenges to task offloading research in the Internet of Vehicles (IoV), which aims to provide vehicles with ultralow-latency task processing services by offloading tasks to Edge Servers (ESs) and Cloud Servers (CSs). Focusing on offloading splittable tasks, an end-edge-cloud cooperative splittable task offloading framework is presented for IoV where vehicles with idle resources are regarded as temporary ESs for complementing the computing services of ESs and CSs. Then, we propose a Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm to minimize task completion latency by making comprehensive decisions jointly involving task splitting, communication and computation resource allocation. Furthermore, we propose a Group-based Random Updating Strategy (GRUS) for multi-agent deep reinforcement model training to promote training efficiency. To incentivize High-performance Vehicles (HVs) to offer edge computing services via sharing their spare computation resources, we construct a Multi-Leader Multi-Follower Stackelberg (MLMFS) incentive game model where CSs act as leaders and HVs act as followers. We prove the existence of a Stackelberg Equilibrium (SE) point, and propose an Optimal Dynamic Response (ODR) algorithm to drive the CSs' resource renting price decisions and the HVs' resource sharing amount decisions to arrive at the SE point via multi-round negotiations. Simulation results demonstrate the superiority of the proposed algorithm over some selected benchmark algorithms. © 2024, The Authors. All rights reserved.

Keywords: Reinforcement learning


Diffusion Posterior Proximal Sampling for Image Restoration

32nd ACM International Conference on Multimedia, MM 2024

Authors: Wu, Hongjie; He, Linchao; Zhang, Mingqin; Chen, Dongdong; Luo, Kunming; Luo, Mengting; Zhou, Ji-Zhe; Chen, Hu; Lv, Jiancheng. Affiliations: College of Computer Science, Sichuan University, Chengdu, China; National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu, China; Heriot-Watt University, Edinburgh, United Kingdom; Hong Kong University of Science and Technology, Hong Kong; Engineering Research Center of Machine Learning and Industry Intelligence, Ministry of Education, China; College of Computer Science, Sichuan University, China

ISBN: (print) 9798400706868

Diffusion models have demonstrated remarkable efficacy in generating high-quality samples. Existing diffusion-based image restoration algorithms exploit pre-trained diffusion models to leverage data priors, yet they still preserve elements inherited from the unconditional generation paradigm. These strategies initiate the denoising process with pure white noise and incorporate random noise at each generative step, leading to over-smoothed results. In this paper, we present a refined paradigm for diffusion-based image restoration. Specifically, we opt for a sample consistent with the measurement identity at each generative step, exploiting the sampling selection as an avenue for output stability and enhancement. The number of candidate samples used for selection is adaptively determined based on the signal-to-noise ratio of the timestep. Additionally, we start the restoration process with an initialization combined with the measurement signal, providing supplementary information to better align the generative process. Extensive experimental results and analyses validate that our proposed method significantly enhances image restoration performance while consuming negligible additional computational resources. © 2024 ACM.

Keywords: Image reconstruction


ActMAD: Activation Matching to Align Distributions for Test-Time-Training

Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: M. Jehanzeb Mirza; Pol Jané Soneira; Wei Lin; Mateusz Kozinski; Horst Possegger; Horst Bischof. Affiliations: Institute for Computer Graphics and Vision, TU Graz, Austria; Christian Doppler Laboratory for Embedded Machine Learning; Institute of Control Systems, KIT, Germany; Christian Doppler Laboratory for Semantic 3D Computer Vision

Test-Time-Training (TTT) is an approach to cope with out-of-distribution (OOD) data by adapting a trained model to distribution shifts occurring at test-time. We propose to perform this adaptation via Activation Matching (ActMAD): We analyze activations of the model and align activation statistics of the OOD test data to those of the training data. In contrast to existing methods, which model the distribution of entire channels in the ultimate layer of the feature extractor, we model the distribution of each feature in multiple layers across the network. This results in a more fine-grained supervision and makes ActMAD attain state-of-the-art performance on CIFAR-100C and ImageNet-C. ActMAD is also architecture- and task-agnostic, which lets us go beyond image classification, and score 15.4% improvement over previous approaches when evaluating a KITTI-trained object detector on KITTI-Fog. Our experiments highlight that ActMAD can be applied to online adaptation in realistic scenarios, requiring little data to attain its full performance.
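The statistic-matching idea in the abstract above can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, and both the choice of per-feature mean and variance as the statistics and the L1 discrepancy are assumptions, since the abstract only states that activation statistics of the test data are aligned to those of the training data for each feature in multiple layers. The sketch only computes the discrepancy; in an actual test-time-training loop it would be minimized with respect to the model parameters.

```python
import numpy as np


def activation_stats(acts: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-feature mean and variance over the batch axis.

    acts has shape (batch, features); each feature is modelled separately.
    """
    return acts.mean(axis=0), acts.var(axis=0)


def alignment_loss(test_acts_per_layer, train_stats_per_layer) -> float:
    """Sum over layers of the L1 gap between test and training statistics.

    Assumed form of the discrepancy (mean absolute difference of means plus
    mean absolute difference of variances); not taken from the abstract.
    """
    total = 0.0
    for acts, (mu_train, var_train) in zip(test_acts_per_layer, train_stats_per_layer):
        mu_test, var_test = activation_stats(acts)
        total += np.abs(mu_test - mu_train).mean()
        total += np.abs(var_test - var_train).mean()
    return float(total)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for activations collected from two layers on training data.
    train_acts = [rng.normal(size=(64, 8)), rng.normal(size=(64, 4))]
    train_stats = [activation_stats(a) for a in train_acts]
    # A synthetic "shifted" test batch: same activations with an added offset.
    test_acts = [a + 1.0 for a in train_acts]
    print(alignment_loss(test_acts, train_stats))
```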


FAST3D: Flow-Aware Self-Training for 3D Object Detectors

32nd British Machine Vision Conference, BMVC 2021

Authors: Fruhwirth-Reisinger, Christian; Opitz, Michael; Possegger, Horst; Bischof, Horst. Affiliations: Christian Doppler Laboratory for Embedded Machine Learning, Austria; Institute of Computer Graphics and Vision, Graz University of Technology, Austria; Amazon

In the field of autonomous driving, self-training is widely applied to mitigate distribution shifts in LiDAR-based 3D object detectors. This eliminates the need for expensive, high-quality labels whenever the environment changes (e.g. geographic location, sensor setup, weather condition). State-of-the-art self-training approaches, however, mostly ignore the temporal nature of autonomous driving data. To address this issue, we propose a flow-aware self-training method that enables unsupervised domain adaptation for 3D object detectors on continuous LiDAR point clouds. In order to get reliable pseudo-labels, we leverage scene flow to propagate detections through time. In particular, we introduce a flow-based multi-target tracker that exploits flow consistency to filter and refine resulting tracks. The emerged precise pseudo-labels then serve as a basis for model re-training. Starting with a pre-trained KITTI model, we conduct experiments on the challenging Waymo Open Dataset to demonstrate the effectiveness of our approach. Without any prior target domain knowledge, our results show a significant improvement over the state-of-the-art. © 2021. The copyright of this document resides with its authors.

Keywords: Autonomous vehicles


LaFTer: label-free tuning of zero-shot classifier using language and unlabeled image collections

Proceedings of the 37th International Conference on Neural Information Processing Systems

Authors: M. Jehanzeb Mirza; Leonid Karlinsky; Wei Lin; Mateusz Kozinski; Horst Possegger; Rogerio Feris; Horst Bischof. Affiliations: Institute of Computer Graphics and Vision, TU Graz, Austria and Christian Doppler Laboratory for Embedded Machine Learning; MIT-IBM Watson AI Lab; Institute of Computer Graphics and Vision, TU Graz, Austria

Recently, large-scale pre-trained Vision and Language (VL) models have set a new state-of-the-art (SOTA) in zero-shot visual classification, enabling open-vocabulary recognition of a potentially unlimited set of categories defined as simple language prompts. However, despite these great advances, the performance of these zero-shot classifiers still falls short of the results of dedicated (closed category set) classifiers trained with supervised fine-tuning. In this paper we show, for the first time, how to reduce this gap without any labels and without any paired VL data, using an unlabeled image collection and a set of texts auto-generated using a Large Language Model (LLM) describing the categories of interest and effectively substituting labeled visual instances of those categories. Using our label-free approach, we are able to attain significant performance improvements over the zero-shot performance of the base VL model and other contemporary methods and baselines on a wide variety of datasets, demonstrating absolute improvement of up to 11.7% (3.8% on average) in the label-free setting. Moreover, despite our approach being label-free, we observe 1.3% average gains over leading few-shot prompting baselines that do use 5-shot supervision.


EARP: Integration with Entity Attribute and Relation Path for Event Knowledge Graph Representation Learning

International Joint Conference on Neural Networks (IJCNN)

Authors: Ze Xu; Hao Zhou; Ting He; Huazhen Wang. Affiliations: College of Computer Science and Technology, Huaqiao University, Xiamen, China; Key Laboratory of Computer Vision and Machine Learning, Huaqiao University, Fujian Province University, Xiamen, China

The event knowledge graph (EKG), a special case of the knowledge graph (KG), can realize the goal of event prediction and has proved useful in medical diagnosis and intelligent recommendation. To successfully build an EKG, knowledge representation learning is often required to compute the semantic links of entities and relationships in a low-dimensional space and solve the data sparsity issue in knowledge acquisition, fusion and reasoning. This paper proposes a new EKG representation learning model featuring the integration of event entity attributes and relation paths. The model draws on knowledge of entity attributes, which comprise entity type and entity description, and on knowledge of relation paths. The entity initial vector is obtained by multiplying the entity semantic vector, the entity description representation vector and the entity type representation vector, and the representation of a relation path is obtained according to the relation between event pairs. A translation-based model framework is then used to integrate and train all vectors to obtain the entity learning vector and the relation learning vector. Our method can generate more expressive learning representations and, consequently, enhance the inference performance of EKG. Experiments on publicly available real-world EKG datasets show that our method achieves better performance than the state-of-the-art models on two typical tasks.


3D Human Pose Estimation from Video via Multi-Scale Multi-Level Spatial Temporal Features

SSRN, 2023

Authors: Fan, Liling; Jiang, Kunliang; Zhou, Weixue; Gao, Zhenguo; Luo, Yanmin. Affiliations: The College of Computer Science and Technology in Huaqiao University, Fujian, Xiamen, China; Key Laboratory of Computer Vision Machine Learning of Fujian Province University, Fujian, Xiamen, China

In this paper, a novel framework for 2D-to-3D human pose estimation from video is proposed by exploiting multi-scale multi-level spatial temporal features. To extract and exploit the rich features, the framework consists of three branch networks: a temporal feature core network for extracting temporal coherence among frames, a multi-scale feature branch network for extracting multi-scale features using multiple receptive fields with various sizes, and a multi-level feature branch network for extracting multilevel features from layers at different depths. In the framework, the features are consolidated to capture various spatial and temporal relationships associated with the human body, and are exploited to resolve depth ambiguity and self-occlusions, leading to more accurate estimations. Extensive experiments on Human3.6M and HumanEva-I show that our framework achieves competitive performance on 2D-to-3D human pose estimation in video. Code is available at: https://***/fll123/3Dhumanpose. © 2023, The Authors. All rights reserved.


Determining mice sex from chest X-rays using deep learning

2nd IEEE International Conference on Cyberspace, CYBER NIGERIA 2020

Authors: Ajiboye, Abiodun; Babalola, Kola. Affiliations: Institute of Computer Vision and Machine Learning, Lagos, Nigeria; European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridgeshire, United Kingdom

ISBN: (print) 9781665444095

Following on from work by Babalola et al., it is shown that the sex of mice can be determined from x-ray images of the chest region alone using convolutional neural networks. The anatomical differences that may be responsible for this are further investigated, as they may be useful in determining phenotype changes caused by knocking out genes, and hence in understanding genotype-phenotype effects. Our results indicate that the cervical vertebrae may play an important role in the ability of our convolutional neural network to classify the sex of mice correctly using only x-rays of the chest region. © 2021 IEEE.

Keywords: Convolutional neural networks

