Matrix minimization techniques that employ the nuclear norm have gained recognition for their applicability in tasks like image inpainting, clustering, classification, and reconstruction. However, they come with inher...
详细信息
Matrix minimization techniques that employ the nuclear norm have gained recognition for their applicability in tasks like image inpainting, clustering, classification, and reconstruction. However, they come with inherent biases and computational burdens, especially when used to relax the rank function, making them less effective and efficient in real-world scenarios. To address these challenges, our research focuses on generalized nonconvex rank regularization problems in robust matrix completion, low-rank representation, and robust matrix regression. We introduce innovative approaches for effective and efficient low-rank matrix learning, grounded in generalized nonconvex rank relaxations inspired by various substitutes for the ?0-norm relaxed functions. These relaxations allow us to more accurately capture low-rank structures. Our optimization strategy employs a nonconvex and multi-variable alternating direction method of multipliers, backed by rigorous theoretical analysis for complexity and *** algorithm iteratively updates blocks of variables, ensuring efficient convergence. Additionally, we incorporate the randomized singular value decomposition technique and/or other acceleration strategies to enhance the computational efficiency of our approach, particularly for large-scale constrained minimization problems. In conclusion, our experimental results across a variety of image vision-related application tasks unequivocally demonstrate the superiority of our proposed methodologies in terms of both efficacy and efficiency when compared to most other related learning methods.
The pixel-wise dense prediction tasks based on weakly supervisions currently use Class Attention Maps(CAMs)to generate pseudo masks as ***,existing methods often incorporate trainable modules to expand the immature cl...
详细信息
The pixel-wise dense prediction tasks based on weakly supervisions currently use Class Attention Maps(CAMs)to generate pseudo masks as ***,existing methods often incorporate trainable modules to expand the immature class activation maps,which can result in significant computational overhead and complicate the training *** this work,we investigate the semantic structure information concealed within the CNN network,and propose a semantic structure aware inference(SSA)method that utilizes this information to obtain high-quality CAM without any additional training ***,the semantic structure modeling module(SSM)is first proposed to generate the classagnostic semantic correlation representation,where each item denotes the affinity degree between one category of objects and all the ***,the immature CAM are refined through a dot product operation that utilizes semantic structure ***,the polished CAMs from different backbone stages are fused as the *** advantage of SSA lies in its parameter-free nature and the absence of additional training costs,which makes it suitable for various weakly supervised pixel-dense prediction *** conducted extensive experiments on weakly supervised object localization and weakly supervised semantic segmentation,and the results confirm the effectiveness of SSA.
Anomaly detection(AD) has been extensively studied and applied across various scenarios in recent years. However, gaps remain between the current performance and the desired recognition accuracy required for practical...
详细信息
Anomaly detection(AD) has been extensively studied and applied across various scenarios in recent years. However, gaps remain between the current performance and the desired recognition accuracy required for practical *** paper analyzes two fundamental failure cases in the baseline AD model and identifies key reasons that limit the recognition accuracy of existing approaches. Specifically, by Case-1, we found that the main reason detrimental to current AD methods is that the inputs to the recovery model contain a large number of detailed features to be recovered, which leads to the normal/abnormal area has not/has been recovered into its original state. By Case-2, we surprisingly found that the abnormal area that cannot be recognized in image-level representations can be easily recognized in the feature-level representation. Based on the above observations, we propose a novel recover-then-discriminate(ReDi) framework for *** takes a self-generated feature map(e.g., histogram of oriented gradients) and a selected prompted image as explicit input information to address the identified in Case-1. Additionally, a feature-level discriminative network is introduced to amplify abnormal differences between the recovered and input representations. Extensive experiments on two widely used yet challenging AD datasets demonstrate that ReDi achieves state-of-the-art recognition accuracy.
The use of generative adversarial network(GAN)-based models for the conditional generation of image semantic segmentation has shown promising results in recent ***,there are still some limitations,including limited di...
详细信息
The use of generative adversarial network(GAN)-based models for the conditional generation of image semantic segmentation has shown promising results in recent ***,there are still some limitations,including limited diversity of image style,distortion of detailed texture,unbalanced color tone,and lengthy training *** address these issues,we propose an asymmetric pre-training and fine-tuning(APF)-GAN model.
Recently,Generative Adversarial Networks(GANs)have become the mainstream text-to-image(T2I)***,a standard normal distribution noise of inputs cannot provide sufficient information to synthesize an image that approache...
详细信息
Recently,Generative Adversarial Networks(GANs)have become the mainstream text-to-image(T2I)***,a standard normal distribution noise of inputs cannot provide sufficient information to synthesize an image that approaches the ground-truth image ***,the multistage generation strategy results in complex T2I ***,this study proposes a novel feature-grounded single-stage T2I model,which considers the“real”distribution learned from training images as one input and introduces a worst-case-optimized similarity measure into the loss function to enhance the model's generation *** results on two benchmark datasets demonstrate the competitive performance of the proposed model in terms of the Frechet inception distance and inception score compared to those of some classical and state-of-the-art models,showing the improved similarities among the generated image,text,and ground truth.
The effectiveness of modeling contextual information has been empirically shown in numerous computer vision tasks. In this paper, we propose a simple yet efficient augmented fully convolutional network(AugFCN) by aggr...
详细信息
The effectiveness of modeling contextual information has been empirically shown in numerous computer vision tasks. In this paper, we propose a simple yet efficient augmented fully convolutional network(AugFCN) by aggregating content-and position-based object contexts for semantic ***, motivated because each deep feature map is a global, class-wise representation of the input,we first propose an augmented nonlocal interaction(AugNI) to aggregate the global content-based contexts through all feature map interactions. Compared to classical position-wise approaches, AugNI is more efficient. Moreover, to eliminate permutation equivariance and maintain translation equivariance, a learnable,relative position embedding branch is then supportably installed in AugNI to capture the global positionbased contexts. AugFCN is built on a fully convolutional network as the backbone by deploying AugNI before the segmentation head network. Experimental results on two challenging benchmarks verify that AugFCN can achieve a competitive 45.38% mIoU(standard mean intersection over union) and 81.9% mIoU on the ADE20K val set and Cityscapes test set, respectively, with little computational overhead. Additionally, the results of the joint implementation of AugNI and existing context modeling schemes show that AugFCN leads to continuous segmentation improvements in state-of-the-art context modeling. We finally achieve a top performance of 45.43% mIoU on the ADE20K val set and 83.0% mIoU on the Cityscapes test set.
Forest fires pose a serious threat to ecological balance, air quality, and the safety of both humans and wildlife. This paper presents an improved model based on You Only Look Once version 5 (YOLOv5), named YOLO Light...
详细信息
Forest fires pose a serious threat to ecological balance, air quality, and the safety of both humans and wildlife. This paper presents an improved model based on You Only Look Once version 5 (YOLOv5), named YOLO Lightweight Fire Detector (YOLO-LFD), to address the limitations of traditional sensor-based fire detection methods in terms of real-time performance and accuracy. The proposed model is designed to enhance inference speed while maintaining high detection accuracy on resource-constrained devices such as drones and embedded systems. Firstly, we introduce Depthwise Separable Convolutions (DSConv) to reduce the complexity of the feature extraction network. Secondly, we design and implement the Lightweight Faster Implementation of Cross Stage Partial (CSP) Bottleneck with 2 Convolutions (C2f-Light) and the CSP Structure with 3 Compact Inverted Blocks (C3CIB) modules to replace the traditional C3 modules. This optimization enhances deep feature extraction and semantic information processing, thereby significantly increasing inference speed. To enhance the detection capability for small fires, the model employs a Normalized Wasserstein Distance (NWD) loss function, which effectively reduces the missed detection rate and improves the accuracy of detecting small fire sources. Experimental results demonstrate that compared to the baseline YOLOv5s model, the YOLO-LFD model not only increases inference speed by 19.3% but also significantly improves the detection accuracy for small fire targets, with only a 1.6% reduction in overall mean average precision (mAP)@0.5. Through these innovative improvements to YOLOv5s, the YOLO-LFD model achieves a balance between speed and accuracy, making it particularly suitable for real-time detection tasks on mobile and embedded devices.
Exploration strategy design is a challenging problem in reinforcement learning(RL),especially when the environment contains a large state space or sparse *** exploration,the agent tries to discover unexplored(novel)ar...
详细信息
Exploration strategy design is a challenging problem in reinforcement learning(RL),especially when the environment contains a large state space or sparse *** exploration,the agent tries to discover unexplored(novel)areas or high reward(quality)*** existing methods perform exploration by only utilizing the novelty of *** novelty and quality in the neighboring area of the current state have not been well utilized to simultaneously guide the agent’s *** address this problem,this paper proposes a novel RL framework,called clustered reinforcement learning(CRL),for efficient exploration in *** adopts clustering to divide the collected states into several clusters,based on which a bonus reward reflecting both novelty and quality in the neighboring area(cluster)of the current state is given to the *** leverages these bonus rewards to guide the agent to perform efficient ***,CRL can be combined with existing exploration strategies to improve their performance,as the bonus rewards employed by these existing exploration strategies solely capture the novelty of *** on four continuous control tasks and six hard-exploration Atari-2600 games show that our method can outperform other state-of-the-art methods to achieve the best performance.
In a crowd density estimation dataset,the annotation of crowd locations is an extremely laborious task,and they are not taken into the evaluation *** this paper,we aim to reduce the annotation cost of crowd datasets,a...
详细信息
In a crowd density estimation dataset,the annotation of crowd locations is an extremely laborious task,and they are not taken into the evaluation *** this paper,we aim to reduce the annotation cost of crowd datasets,and propose a crowd density estimation method based on weakly-supervised learning,in the absence of crowd position supervision information,which directly reduces the number of crowds by using the number of pedestrians in the image as the supervised *** this purpose,we design a new training method,which exploits the correlation between global and local image features by incremental learning to train the ***,we design a parent-child network(PC-Net)focusing on the global and local image respectively,and propose a linear feature calibration structure to train the PC-Net simultaneously,and the child network learns feature transfer factors and feature bias weights,and uses the transfer factors and bias weights to linearly feature calibrate the features extracted from the Parent network,to improve the convergence of the network by using local features hidden in the crowd *** addition,we use the pyramid vision transformer as the backbone of the PC-Net to extract crowd features at different levels,and design a global-local feature loss function(L2).We combine it with a crowd counting loss(LC)to enhance the sensitivity of the network to crowd features during the training process,which effectively improves the accuracy of crowd density *** experimental results show that the PC-Net significantly reduces the gap between fullysupervised and weakly-supervised crowd density estimation,and outperforms the comparison methods on five datasets of Shanghai Tech Part A,ShanghaiTech Part B,UCF_CC_50,UCF_QNRF and JHU-CROWD++.
In foggy traffic scenarios, existing object detection algorithms face challenges such as low detection accuracy, poor robustness, occlusion, missed detections, and false detections. To address this issue, a multi-scal...
详细信息
In foggy traffic scenarios, existing object detection algorithms face challenges such as low detection accuracy, poor robustness, occlusion, missed detections, and false detections. To address this issue, a multi-scale object detection algorithm based on an improved YOLOv8 has been proposed. Firstly, a lightweight attention mechanism, Triplet Attention, is introduced to enhance the algorithm’s ability to extract multi-dimensional and multi-scale features, thereby improving the receptive capability of the feature maps. Secondly, the Diverse Branch Block (DBB) is integrated into the CSP Bottleneck with two Convolutions (C2F) module to strengthen the fusion of semantic information across different layers. Thirdly, a new decoupled detection head is proposed by redesigning the original network head based on the Diverse Branch Block module to improve detection accuracy and reduce missed and false detections. Finally, the Minimum Point Distance based Intersection-over-Union (MPDIoU) is used to replace the original YOLOv8 Complete Intersection-over-Union (CIoU) to accelerate the network’s training convergence. Comparative experiments and dehazing pre-processing tests were conducted on the RTTS and VOC-Fog datasets. Compared to the baseline YOLOv8 model, the improved algorithm achieved mean Average Precision (mAP) improvements of 4.6% and 3.8%, respectively. After defogging pre-processing, the mAP increased by 5.3% and 4.4%, respectively. The experimental results demonstrate that the improved algorithm exhibits high practicality and effectiveness in foggy traffic scenarios.
暂无评论