Point cloud completion aims to infer complete point clouds based on partial 3D point cloud *** previous methods apply coarseto-fine strategy networks for generating complete point ***,such methods are not only relativ...
详细信息
Point cloud completion aims to infer complete point clouds based on partial 3D point cloud *** previous methods apply coarseto-fine strategy networks for generating complete point ***,such methods are not only relatively time-consuming but also cannot provide representative complete shape features based on partial *** this paper,a novel feature alignment fast point cloud completion network(FACNet)is proposed to directly and efficiently generate the detailed shapes of *** aligns high-dimensional feature distributions of both partial and complete point clouds to maintain global information about the complete *** its decoding process,the local features from the partial point cloud are incorporated along with the maintained global information to ensure complete and time-saving generation of the complete point *** results show that FACNet outperforms the state-of-theart on PCN,Completion3D,and MVP datasets,and achieves competitive performance on ShapeNet-55 and KITTI ***,FACNet and a simplified version,FACNet-slight,achieve a significant speedup of 3–10 times over other state-of-the-art methods.
Anomaly detection(AD) has been extensively studied and applied across various scenarios in recent years. However, gaps remain between the current performance and the desired recognition accuracy required for practical...
详细信息
Anomaly detection(AD) has been extensively studied and applied across various scenarios in recent years. However, gaps remain between the current performance and the desired recognition accuracy required for practical *** paper analyzes two fundamental failure cases in the baseline AD model and identifies key reasons that limit the recognition accuracy of existing approaches. Specifically, by Case-1, we found that the main reason detrimental to current AD methods is that the inputs to the recovery model contain a large number of detailed features to be recovered, which leads to the normal/abnormal area has not/has been recovered into its original state. By Case-2, we surprisingly found that the abnormal area that cannot be recognized in image-level representations can be easily recognized in the feature-level representation. Based on the above observations, we propose a novel recover-then-discriminate(ReDi) framework for *** takes a self-generated feature map(e.g., histogram of oriented gradients) and a selected prompted image as explicit input information to address the identified in Case-1. Additionally, a feature-level discriminative network is introduced to amplify abnormal differences between the recovered and input representations. Extensive experiments on two widely used yet challenging AD datasets demonstrate that ReDi achieves state-of-the-art recognition accuracy.
In a crowd density estimation dataset,the annotation of crowd locations is an extremely laborious task,and they are not taken into the evaluation *** this paper,we aim to reduce the annotation cost of crowd datasets,a...
详细信息
In a crowd density estimation dataset,the annotation of crowd locations is an extremely laborious task,and they are not taken into the evaluation *** this paper,we aim to reduce the annotation cost of crowd datasets,and propose a crowd density estimation method based on weakly-supervised learning,in the absence of crowd position supervision information,which directly reduces the number of crowds by using the number of pedestrians in the image as the supervised *** this purpose,we design a new training method,which exploits the correlation between global and local image features by incremental learning to train the ***,we design a parent-child network(PC-Net)focusing on the global and local image respectively,and propose a linear feature calibration structure to train the PC-Net simultaneously,and the child network learns feature transfer factors and feature bias weights,and uses the transfer factors and bias weights to linearly feature calibrate the features extracted from the Parent network,to improve the convergence of the network by using local features hidden in the crowd *** addition,we use the pyramid vision transformer as the backbone of the PC-Net to extract crowd features at different levels,and design a global-local feature loss function(L2).We combine it with a crowd counting loss(LC)to enhance the sensitivity of the network to crowd features during the training process,which effectively improves the accuracy of crowd density *** experimental results show that the PC-Net significantly reduces the gap between fullysupervised and weakly-supervised crowd density estimation,and outperforms the comparison methods on five datasets of Shanghai Tech Part A,ShanghaiTech Part B,UCF_CC_50,UCF_QNRF and JHU-CROWD++.
Multiplex collaboration networks facilitate intricate connections among individuals, enabling multidimensional collaborations across various domains and fostering synergistic knowledge exchange. This study focuses on ...
详细信息
Rank aggregation is the combination of several ranked lists from a set of candidates to achieve a better ranking by combining information from different sources. In feature selection problem, due to the heterogeneity ...
详细信息
Matrix minimization techniques that employ the nuclear norm have gained recognition for their applicability in tasks like image inpainting, clustering, classification, and reconstruction. However, they come with inher...
详细信息
Matrix minimization techniques that employ the nuclear norm have gained recognition for their applicability in tasks like image inpainting, clustering, classification, and reconstruction. However, they come with inherent biases and computational burdens, especially when used to relax the rank function, making them less effective and efficient in real-world scenarios. To address these challenges, our research focuses on generalized nonconvex rank regularization problems in robust matrix completion, low-rank representation, and robust matrix regression. We introduce innovative approaches for effective and efficient low-rank matrix learning, grounded in generalized nonconvex rank relaxations inspired by various substitutes for the ?0-norm relaxed functions. These relaxations allow us to more accurately capture low-rank structures. Our optimization strategy employs a nonconvex and multi-variable alternating direction method of multipliers, backed by rigorous theoretical analysis for complexity and *** algorithm iteratively updates blocks of variables, ensuring efficient convergence. Additionally, we incorporate the randomized singular value decomposition technique and/or other acceleration strategies to enhance the computational efficiency of our approach, particularly for large-scale constrained minimization problems. In conclusion, our experimental results across a variety of image vision-related application tasks unequivocally demonstrate the superiority of our proposed methodologies in terms of both efficacy and efficiency when compared to most other related learning methods.
In this work, VoteDroid a novel fine-tuned deep learning models-based ensemble voting classifier has been proposed for detecting malicious behavior in Android applications. To this end, we proposed adopting the random...
详细信息
Constructing an effective common latent embedding by aligning the latent spaces of cross-modal variational autoencoders(VAEs) is a popular strategy for generalized zero-shot learning(GZSL). However, due to the lac...
详细信息
Constructing an effective common latent embedding by aligning the latent spaces of cross-modal variational autoencoders(VAEs) is a popular strategy for generalized zero-shot learning(GZSL). However, due to the lack of fine-grained instance-wise annotations, existing VAE methods can easily suffer from the posterior collapse problem. In this paper, we propose an innovative asymmetric VAE network by aligning enhanced feature representation(AEFR) for GZSL. Distinguished from general VAE structures, we designed two asymmetric encoders for visual and semantic observations and one decoder for visual reconstruction. Specifically, we propose a simple yet effective gated attention mechanism(GAM) in the visual encoder for enhancing the information interaction between observations and latent variables, alleviating the possible posterior collapse problem effectively. In addition, we propose a novel distributional decoupling-based contrastive learning(D2-CL) to guide learning classification-relevant information while aligning the representations at the taxonomy level in the latent representation space. Extensive experiments on publicly available datasets demonstrate the state-of-the-art performance of our method. The source code is available at https://***/seeyourmind/AEFR.
In the realm of medical datasets, particularly when considering diabetes, the occurrence of data incompleteness is a prevalent issue. Unveiling valuable patterns through medical data analysis is crucial for early and ...
详细信息
Thyroid nodules,a common disorder in the endocrine system,require accurate segmentation in ultrasound images for effective diagnosis and ***,achieving precise segmentation remains a challenge due to various factors,in...
详细信息
Thyroid nodules,a common disorder in the endocrine system,require accurate segmentation in ultrasound images for effective diagnosis and ***,achieving precise segmentation remains a challenge due to various factors,including scattering noise,low contrast,and limited resolution in ultrasound *** existing segmentation models have made progress,they still suffer from several limitations,such as high error rates,low generalizability,overfitting,limited feature learning capability,*** address these challenges,this paper proposes a Multi-level Relation Transformer-based U-Net(MLRT-UNet)to improve thyroid nodule *** MLRTUNet leverages a novel Relation Transformer,which processes images at multiple scales,overcoming the limitations of traditional encoding *** transformer integrates both local and global features effectively through selfattention and cross-attention units,capturing intricate relationships within the *** approach also introduces a Co-operative Transformer Fusion(CTF)module to combine multi-scale features from different encoding layers,enhancing the model’s ability to capture complex patterns in the ***,the Relation Transformer block enhances long-distance dependencies during the decoding process,improving segmentation *** results showthat the MLRT-UNet achieves high segmentation accuracy,reaching 98.2% on the Digital Database Thyroid Image(DDT)dataset,97.8% on the Thyroid Nodule 3493(TG3K)dataset,and 98.2% on the Thyroid Nodule3K(TN3K)*** findings demonstrate that the proposed method significantly enhances the accuracy of thyroid nodule segmentation,addressing the limitations of existing models.
暂无评论