Search results - Inner Mongolia University Library

arXiv, 2024

Authors: Seth, Siddharth; Sonth, Akash; Chakraborty, Anirban. Visual Computing Lab, Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, India

Person re-identification (re-ID) aims to tackle the problem of matching identities across non-overlapping cameras. Supervised approaches require identity information that may be difficult to obtain and are inherently biased towards the dataset they are trained on, making them unscalable across domains. To overcome these challenges, we propose an unsupervised approach to the person re-ID setup. Having zero knowledge of true labels, our proposed method enhances the discriminating ability of the learned features via a novel two-stage training strategy. The first stage involves training a deep network on an expertly designed pose-transformed dataset obtained by generating multiple perturbations for each original image in the pose space. Next, the network learns to map similar features closer in the feature space using the proposed discriminative clustering algorithm. We introduce a novel radial distance loss that attends to the fundamental aspects of feature learning: compact clusters with low intra-cluster and high inter-cluster variation. Extensive experiments on several large-scale re-ID datasets demonstrate the superiority of our method compared to state-of-the-art approaches. © 2024, CC BY.

Keywords: Clustering algorithms

Advances in wireless sensor networks under AI-5G for augmented reality

Virtual Reality & Intelligent Hardware, 2022, vol. 4, no. 3, pp. I0001-I0003

Authors: Muhammad KHAN. Visual Analytics for Knowledge Laboratory (VIS2KNOW Lab), Department of Applied Artificial Intelligence, School of Convergence, College of Computing and Informatics, Sungkyunkwan University, Seoul 03063, Republic of Korea

Due to the recent rapid development in the 5G technology, the usage of sensor networks, especially wireless sensor networks (WSNs), has boosted advances in the augmented reality (AR), supporting decision making in AR *** decision-making needs support and consideration of artificial intelligence (AI) techniques capable of adapting to changes in AR environments for creating systems that evolve autonomously over ***, it is important to apply new information fusion techniques that allow for the processing of information at low and high levels to improve the accuracy of such systems.

Keywords: networks; evolve; sensor

GenRec: Unifying Video Generation and Recognition with Diffusion Models

38th Conference on Neural Information Processing Systems, NeurIPS 2024

Authors: Weng, Zejia; Yang, Xitong; Xing, Zhen; Wu, Zuxuan; Jiang, Yu-Gang. Shanghai Key Lab of Intell. Info. Processing, School of CS, Fudan University, China; Shanghai Collaborative Innovation Center of Intelligent Visual Computing, China; Department of Computer Science, University of Maryland, United States

Video diffusion models are able to generate high-quality videos by learning strong spatial-temporal priors on large-scale datasets. In this paper, we aim to investigate whether such priors derived from a generative process are suitable for video recognition, and eventually joint optimization of generation and recognition. Building upon Stable Video Diffusion, we introduce GenRec, the first unified framework trained with a random-frame conditioning process so as to learn generalized spatial-temporal representations. The resulting framework naturally supports generation and recognition, and more importantly is robust even when visual inputs contain limited information. Extensive experiments demonstrate the efficacy of GenRec for both recognition and generation. In particular, GenRec achieves competitive recognition performance, offering 75.8% and 87.2% accuracy on SSV2 and K400, respectively. GenRec also performs the best on class-conditioned image-to-video generation, achieving 46.5 and 49.3 FVD scores on SSV2 and EK-100 datasets. Furthermore, GenRec demonstrates extraordinary robustness in scenarios where only limited frames can be observed. Code will be available at https://***/wengzejia1/GenRec. © 2024 Neural information processing systems foundation. All rights reserved.

Patch-based CNN Model for 360 Image Quality Assessment with Adaptive Pooling Strategies

IS&T International Symposium on Electronic Imaging: 19th Image Quality and System Performance, IQSP 2022

Authors: Sendjasni, Abderrezzaq; Larabi, Mohamed-Chaker; Cheikh, Faouzi Alaya. CNRS, Xlim UMR 7252, Université de Poitiers, France; NTNU, Norwegian Colour and Visual Computing Lab, Gjøvik, Norway

Patch-based training for 360-degree images significantly reduces complexity compared to multichannel models while maintaining good performance. Unlike multichannel models, where multiple neural networks are trained in parallel to predict the score of the whole 360-degree image, a pooling stage is required to map local qualities to the global one. This step is often neglected by using a simple arithmetic mean, which does not account for (i) the non-uniform distribution of quality and (ii) the variability among local qualities. In this paper, we analyze several pooling strategies, including basic statistical methods and adaptive pooling ones. Additionally, we propose a pooling strategy based on scene exploration behavior relying on visual scan-path. The performance analysis showed the benefit of using adaptive pooling over arithmetic mean, as well as the incorporation of perceptual properties during the pooling stage. Besides, the comparison with state-of-the-art multichannel models asserts the effectiveness of patch-based training compared to multichannel models. © 2022, Society for Imaging Science and Technology.

Keywords: Image quality
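
As a rough illustration of the pooling stage this abstract describes, the sketch below contrasts an arithmetic mean with two alternatives over per-patch quality scores. It is not the authors' method: the function names, the sample scores, and the softmax-style weighting that emphasizes low-quality patches are all assumptions made for illustration.

```python
import math
from statistics import mean, median


def mean_pool(scores):
    """Arithmetic mean of local (per-patch) quality scores."""
    return mean(scores)


def median_pool(scores):
    """Median of local quality scores, less sensitive to outlier patches."""
    return median(scores)


def worst_weighted_pool(scores, temperature=1.0):
    """Weighted mean that gives more weight to low-quality patches.

    Hypothetical weighting: w_i = exp(-(s_i - min(s)) / temperature).
    A smaller temperature pulls the result closer to the worst patch.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    lowest = min(scores)
    weights = [math.exp(-(s - lowest) / temperature) for s in scores]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total


# Hypothetical patch scores: four good patches and one badly distorted one.
patch_scores = [4.1, 4.3, 3.9, 1.2, 4.0]

global_mean = mean_pool(patch_scores)
global_median = median_pool(patch_scores)
global_weighted = worst_weighted_pool(patch_scores)
```

With these scores the weighted variant lands well below the plain mean, which is the kind of sensitivity to non-uniform local quality that a simple average lacks.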

3D Pose Estimation Using a Global and Local Cross-Attention Mechanism

IEEE International Workshop on Imaging Systems and Techniques (IST)

Authors: Theocharis Chatzis, Dimitrios Konstantinidis, Kosmas Dimitropoulos, Petros Daras. Visual Computing Lab, Information Technologies Institute, Centre for Research and Technology Hellas

The task of 3D human pose and shape estimation involves the accurate prediction of 3D joint coordinates using a single image or a video sequence and it is crucial in several computer vision fields, such as sign language recognition, human-computer interaction and autonomous vehicles. Existing methodologies typically rely on modeling global and local temporal relationships among image frames without paying much attention to the interaction between these relationships and the modeling of the input space in other manifolds that possess important statistical and geometrical properties. This work proposes a novel multi-stage 3D pose estimation method that seamlessly combines global and local temporal modeling through self-attention mechanisms operating on multiple manifolds, thus leveraging the ability of different manifolds to model complementary features of the input space. Through the extraction of global and local attention maps and the fusion of these maps using a novel cross-attention mechanism, the proposed method aims to enhance the contextual understanding and improve the capacity of the model to capture the intricate human motion dynamics present in a video sequence. The effectiveness of the proposed method in achieving precise 3D pose and shape across successive frames is confirmed by the experimental results on two challenging datasets, namely 3DPW and MPI-INF-3DHP.

Federated Learning Aggregation based on Weight Distribution Analysis

IEEE International Workshop on Imaging Systems and Techniques (IST)

Authors: Christos Chatzikonstantinou, Dimitrios Konstantinidis, Kosmas Dimitropoulos, Petros Daras. Visual Computing Lab, Information Technologies Institute, Centre for Research and Technology Hellas

Federated learning has recently been proposed as a solution to the problem of using private or sensitive data for training a central deep model, without exchanging the local data. In federated learning, local models are trained on the client side using the available data, while a server is responsible for aggregating the weights of these models into a global model. However, the traditional weight averaging approach does not take into consideration the importance of the different weights for the performance of a model. To this end, this work proposes a novel federated learning weight aggregation method that estimates the statistical distance of each client's parameters from Gaussianity, and weighs the contribution of each client to the global model accordingly so that the most significant information is retained and enhanced. To create an accurate global model, a complex weighted averaging of the parameters of clients' models at the layer level is performed, treating parameters that follow a Gaussian distribution as low quality. The proposed method can be applied to both convolutional and linear layers and it is based on the notion that parameters following a Gaussian distribution do not significantly affect the output of a model. Experiments with different network architectures and a comparison with a plethora of state-of-the-art approaches on three well-known image classification datasets demonstrate the superiority of the proposed method for federated learning weight aggregation.
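
The layer-level weighting idea in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the paper's algorithm: the abstract does not name the statistical distance, so absolute excess kurtosis is used here as a stand-in measure of departure from Gaussianity, and `excess_kurtosis`, `aggregate_layer`, and the sample parameters are hypothetical.

```python
from statistics import fmean


def excess_kurtosis(values):
    """Excess kurtosis of a flat parameter list (0 for a Gaussian)."""
    m = fmean(values)
    var = fmean([(v - m) ** 2 for v in values])
    if var == 0:
        return 0.0
    fourth = fmean([(v - m) ** 4 for v in values])
    return fourth / var ** 2 - 3.0


def aggregate_layer(client_layers, eps=1e-8):
    """Weighted average of one layer across clients.

    client_layers: list of flat parameter lists, one per client, equal length.
    Assumption: a client's coefficient is proportional to |excess kurtosis|,
    so parameters that look Gaussian contribute less to the global layer.
    """
    scores = [abs(excess_kurtosis(p)) + eps for p in client_layers]
    total = sum(scores)
    coeffs = [s / total for s in scores]
    size = len(client_layers[0])
    merged = [
        sum(c * p[i] for c, p in zip(coeffs, client_layers))
        for i in range(size)
    ]
    return merged, coeffs


# Hypothetical flattened layer parameters from two clients.
client_a = [-1.0, 1.0, -1.0, 1.0, 0.0]
client_b = [-2.0, -1.0, 0.0, 1.0, 2.0]
global_layer, coefficients = aggregate_layer([client_a, client_b])
```

Plain federated averaging corresponds to giving every client the same coefficient; here the coefficients differ per layer according to the chosen non-Gaussianity score.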

Development and Evaluation of a Prototype VR Application for the Elderly, that can Help to Prevent Effects Related to Social Isolation

2nd International Conference on Interactive Media, Smart Systems and Emerging Technologies, IMET 2022

Authors: Anastasiadou, Zoe; Lanitis, Andreas. Cyprus University of Technology, Visual Media Computing Lab, Department of Multimedia and Graphic Arts, Limassol, Cyprus; Cyens Centre of Excellence, Cyprus

ISBN (digital): 9781665470162

ISBN (print): 9781665470162

The elderly need to communicate with their loved ones but they also need to get engaged in activities that require mental awareness as a means of preventing negative side-effects related to brain inactivity. This area of research is becoming increasingly important during periods of social isolation caused either by external factors, such as a pandemic, or by factors associated with reduced mobility in the elderly. In this paper, a prototype Virtual Reality application that will allow elderly users to deal with the problems of social isolation, while providing an entertaining brain-triggering activity, is presented. While this is work in progress, the results of an initial user evaluation provide insights related to the strengths and limitations of the prototype application, allowing the derivation of conclusions that can guide further development of the final application. © 2022 IEEE.

Keywords: Brain

Rapidvol: Rapid Reconstruction of 3D Ultrasound Volumes from Sensorless 2D Scans

22nd IEEE International Symposium on Biomedical Imaging, ISBI 2025

Authors: Eid, Mark C.; Yeung, Pak-Hei; Wyburd, Madeleine K.; Henriques, João F.; Namburete, Ana I.L. University of Oxford, Visual Geometry Group, United Kingdom; University of Oxford, Oxford Machine Learning in Neuroimaging Lab, United Kingdom; College of Computing and Data Science, Nanyang Technological University, Singapore

ISBN (print): 9798331520526

Two-dimensional (2D) freehand ultrasonography is a widely used medical imaging modality, particularly in obstetrics and gynaecology. However, it only captures 2D cross-sectional views of inherently 3D anatomies, losing valuable contextual information. As an alternative to costly 3D ultrasound (US) scanners, 3D volumes can be artificially reconstructed from 2D scans, but this is usually prohibitively slow. Hence, we propose RapidVol: a neural representation framework to speed up slice-to-volume US reconstruction. We use tensor-rank decomposition to decompose the typical 3D volume into tri-planes, which are stored alongside a small neural network. With a set of 2D US scans and their estimated 3D orientation, RapidVol can achieve complete 3D reconstruction. To evaluate our method, we form reconstructions from real fetal brain scans, and then request novel cross-sectional views. Compared to prior fully implicit (e.g. neural radiance field) approaches, our method is over 3x quicker, 46% more accurate, and more robust to errors in pose estimation. We also demonstrate that further speed-up is achievable by reconstructing from a structural prior rather than from random initialisation. © 2025 IEEE.

Keywords: 3D Reconstruction; NeRF; Ultrasound

INPC: Implicit Neural Point Clouds for Radiance Field Rendering

arXiv, 2024

Authors: Hahlbohm, Florian; Franke, Linus; Kappel, Moritz; Castillo, Susana; Eisemann, Martin; Stamminger, Marc; Magnor, Marcus. Computer Graphics Lab, TU Braunschweig, Germany; Visual Computing Erlangen, FAU Erlangen-Nürnberg, Germany

We introduce a new approach for reconstruction and novel view synthesis of unbounded real-world scenes. In contrast to previous methods using either volumetric fields, grid-based models, or discrete point cloud proxies, we propose a hybrid scene representation, which implicitly encodes the geometry in a continuous octree-based probability field and view-dependent appearance in a multi-resolution hash grid. This allows for extraction of arbitrary explicit point clouds, which can be rendered using rasterization. In doing so, we combine the benefits of both worlds and retain favorable behavior during optimization: Our novel implicit point cloud representation and differentiable bilinear rasterizer enable fast rendering while preserving the fine geometric detail captured by volumetric neural fields. Furthermore, this representation does not depend on priors like structure-from-motion point clouds. Our method achieves state-of-the-art image quality on common benchmarks. Furthermore, we achieve fast inference at interactive frame rates, and can convert our trained model into a large, explicit point cloud to further enhance performance. Copyright © 2024, The Authors. All rights reserved.

Keywords: Rasterization

ANNE: Adaptive Nearest Neighbors and Eigenvector-based Sample Selection for Robust Learning with Noisy labels

arXiv, 2024

Authors: Cordeiro, Filipe R.; Carneiro, Gustavo. Visual Computing Lab, Department of Computing, Universidade Federal Rural de Pernambuco, Brazil; Centre for Vision, Speech and Signal Processing, University of Surrey, United Kingdom

An important stage of most state-of-the-art (SOTA) noisy-label learning methods consists of a sample selection procedure that classifies samples from the noisy-label training set into noisy-label or clean-label subsets. The process of sample selection typically follows one of two approaches: loss-based sampling, where high-loss samples are considered to have noisy labels, or feature-based sampling, where samples from the same class tend to cluster together in the feature space and noisy-label samples are identified as anomalies within those clusters. Empirically, loss-based sampling is robust to a wide range of noise rates, while feature-based sampling tends to work effectively in particular scenarios, e.g., the filtering of noisy instances via their eigenvectors (FINE) sampling exhibits greater robustness in scenarios with low noise rates, and the K nearest neighbor (KNN) sampling better mitigates high noise-rate problems. This paper introduces the Adaptive Nearest Neighbors and Eigenvector-based (ANNE) sample selection methodology, a novel approach that integrates loss-based sampling with the feature-based sampling methods FINE and Adaptive KNN to optimize performance across a wide range of noise rate scenarios. ANNE achieves this integration by first partitioning the training set into high-loss and low-loss sub-groups using loss-based sampling. Subsequently, within the low-loss subset, sample selection is performed using FINE, while the high-loss subset employs Adaptive KNN for effective sample selection. We integrate ANNE into the noisy-label learning SOTA method SSR+, and test it on CIFAR-10/-100 (with symmetric, asymmetric and instance-dependent noise), Webvision and ANIMAL-10, where our method shows better accuracy than the SOTA in most experiments, with a competitive training time. The code is available at https://***/filipe-research/anne. © 2024, CC BY-NC-ND.

Keywords: Adversarial machine learning
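
The two-step routing described in this abstract (split by loss, then apply a different feature-based selector to each subset) can be sketched as below. This is a simplified illustration, not the ANNE implementation: FINE is replaced by a caller-supplied predicate (`low_selector`), Adaptive KNN is replaced by a fixed-k majority vote (`knn_agrees`), and the loss threshold and toy data are hypothetical.

```python
import math


def split_by_loss(losses, threshold):
    """Partition sample indices into low-loss and high-loss groups."""
    low = [i for i, loss in enumerate(losses) if loss <= threshold]
    high = [i for i, loss in enumerate(losses) if loss > threshold]
    return low, high


def knn_agrees(index, features, labels, k=3):
    """True if the sample's label matches the majority of its k nearest neighbours.

    Fixed-k stand-in for Adaptive KNN; ties between labels are not handled.
    """
    dists = sorted(
        (math.dist(features[index], features[j]), j)
        for j in range(len(features))
        if j != index
    )
    neighbour_labels = [labels[j] for _, j in dists[:k]]
    majority = max(set(neighbour_labels), key=neighbour_labels.count)
    return labels[index] == majority


def select_clean(losses, features, labels, threshold, low_selector, k=3):
    """Route low-loss samples to low_selector and high-loss samples to k-NN."""
    low, high = split_by_loss(losses, threshold)
    clean = [i for i in low if low_selector(i)]
    clean += [i for i in high if knn_agrees(i, features, labels, k)]
    return sorted(clean)


# Toy data: sample 3 sits in the class-0 cluster but carries label 1.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = [0, 0, 0, 1, 1, 1, 1]
losses = [0.1, 0.2, 0.1, 2.5, 0.2, 0.1, 1.8]

# Placeholder for FINE: accept every low-loss sample.
clean = select_clean(losses, features, labels, threshold=1.0,
                     low_selector=lambda i: True)
```

In this toy run the mislabelled high-loss sample is rejected by the neighbour vote, while the correctly labelled high-loss sample is kept.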
