Person re-identification (re-ID) aims to tackle the problem of matching identities across non-overlapping cameras. Supervised approaches require identity information that may be difficult to obtain and are inherently ...
Due to the recent rapid development in 5G technology, the usage of sensor networks, especially wireless sensor networks (WSNs), has boosted advances in augmented reality (AR), supporting decision making in AR *** decision-making needs support and consideration of artificial intelligence (AI) techniques capable of adapting to changes in AR environments for creating systems that evolve autonomously over ***, it is important to apply new information fusion techniques that allow for the processing of information at low and high levels to improve the accuracy of such systems.
Video diffusion models are able to generate high-quality videos by learning strong spatial-temporal priors on large-scale datasets. In this paper, we aim to investigate whether such priors derived from a generative pr...
Patch-based training for 360-degree images significantly reduces complexity compared to multichannel models while maintaining good performance. Unlike multichannel models, where multi neural net...
The task of 3D human pose and shape estimation involves the accurate prediction of 3D joint coordinates using a single image or a video sequence and it is crucial in several computer vision fields, such as sign language recognition, human-computer interaction and autonomous vehicles. Existing methodologies typically rely on modeling global and local temporal relationships among image frames without paying much attention to the interaction between these relationships and the modeling of the input space in other manifolds that possess important statistical and geometrical properties. This work proposes a novel multi-stage 3D pose estimation method that seamlessly combines global and local temporal modeling through self-attention mechanisms operating on multiple manifolds, thus leveraging the ability of different manifolds to model complementary features of the input space. Through the extraction of global and local attention maps and the fusion of these maps using a novel cross-attention mechanism, the proposed method aims to enhance the contextual understanding and improve the capacity of the model to capture the intricate human motion dynamics present in a video sequence. The effectiveness of the proposed method in achieving precise 3D pose and shape across successive frames is confirmed by the experimental results on two challenging datasets, namely 3DPW and MPI-INF-3DHP.
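As a rough illustration of the fusion step mentioned above, the following is a minimal sketch of generic scaled dot-product cross-attention. It is not the paper's mechanism: the abstract does not specify how the global and local maps are fused or how the manifolds enter, so using global features as queries and local features as keys and values is an assumption, and the function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention.

    queries: list of vectors (assumed here to be global temporal features)
    keys, values: lists of vectors (assumed here to be local temporal features)
    Returns one fused vector per query.
    """
    dim = len(keys[0])
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        fused.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return fused
```

Each output row is a convex combination of the value vectors, so a query that is equally similar to all keys simply returns their mean.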
Federated learning has recently been proposed as a solution to the problem of using private or sensitive data for training a central deep model, without exchanging the local data. In federated learning, local models are trained on the client side using the available data, while a server is responsible for aggregating the weights of these models into a global model. However, the traditional weight averaging approach does not take into consideration the importance of the different weights for the performance of a model. To this end, this work proposes a novel federated learning weight aggregation method that estimates the statistical distance of each client’s parameters from Gaussianity, and weights the contribution of each client to the global model accordingly so that the most significant information is retained and enhanced. To create an accurate global model, a complex weighted averaging of the parameters of clients’ models is performed at the layer level, treating parameters that follow a Gaussian distribution as low quality. The proposed method can be applied to both convolutional and linear layers and is based on the notion that parameters following a Gaussian distribution do not significantly affect the output of a model. Experiments with different network architectures and a comparison with a plethora of state-of-the-art approaches on three well-known image classification datasets demonstrate the superiority of the proposed method for federated learning weight aggregation.
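The idea of weighting clients by how far their parameters are from Gaussian can be sketched per layer as follows. This is only an illustration under stated assumptions: the abstract does not name the statistical distance, so absolute excess kurtosis is used here as a stand-in, and a plain normalized weighted average replaces the paper's "complex weighted averaging". The function names and the `eps` parameter are hypothetical.

```python
def excess_kurtosis(values):
    """Excess kurtosis of a flat list of floats (0 for a Gaussian)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0.0:
        return 0.0
    fourth = sum((v - mean) ** 4 for v in values) / n
    return fourth / (var ** 2) - 3.0


def aggregate_layer(client_layers, eps=1e-8):
    """Aggregate one layer across clients.

    client_layers: one flat list of parameters per client, all the same length.
    Assumption: |excess kurtosis| serves as the non-Gaussianity score, so
    clients whose parameters look Gaussian receive a small coefficient.
    Returns (merged_parameters, per_client_coefficients).
    """
    scores = [abs(excess_kurtosis(layer)) + eps for layer in client_layers]
    total = sum(scores)
    coeffs = [s / total for s in scores]
    size = len(client_layers[0])
    merged = [
        sum(c * layer[i] for c, layer in zip(coeffs, client_layers))
        for i in range(size)
    ]
    return merged, coeffs
```

A server would call `aggregate_layer` once per layer, passing the flattened convolutional or linear weights received from each client. The `eps` term keeps the coefficients well defined when every client's score is zero, in which case the result falls back to a uniform average.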
The elderly need to communicate with their loved ones but they also need to get engaged in activities that require mental awareness as a means of preventing negative side-effects related to brain inactivity. This area...
Two-dimensional (2D) freehand ultrasonography is a widely used medical imaging modality, particularly in obstetrics and gynaecology. However, it only captures 2D cross-sectional views of inherently 3D anatomies, losin...
We introduce a new approach for reconstruction and novel view synthesis of unbounded real-world scenes. In contrast to previous methods using either volumetric fields, grid-based models, or discrete point cloud proxie...
An important stage of most state-of-the-art (SOTA) noisy-label learning methods consists of a sample selection procedure that classifies samples from the noisy-label training set into noisy-label or clean-label subset...