To mitigate the challenges posed by data uncertainty in Full-Self Driving (FSD) systems, this paper proposes a novel feature extraction learning model called Adaptive Region of Interest Optimized Pyramid Network (ARO)...
In the analysis of drone aerial images, object detection is particularly challenging: complex terrain structures, extreme differences in target sizes, suboptimal shooting angles, and varying lighting conditions all exacerbate the difficulty of recognition. In recent years, the DETR model based on the Transformer architecture has eliminated traditional post-processing steps such as NMS (Non-Maximum Suppression), thereby simplifying the object detection process and improving detection accuracy, which has garnered widespread attention in the academic community. However, DETR has limitations such as slow training convergence, difficulty in query optimization, and high computational costs, which hinder its application in practical fields. To address these issues, this paper proposes a new object detection model called OptiDETR. First, the model employs a more efficient hybrid encoder to replace the traditional Transformer encoder. The new encoder significantly enhances feature processing capabilities through internal and cross-scale feature interaction and fusion. Second, an IoU (Intersection over Union) aware query selection mechanism is introduced. This mechanism adds IoU constraints during the training phase to provide higher-quality initial object queries for the decoder, significantly improving the decoding performance. Additionally, the OptiDETR model integrates SW-Block into the DETR decoder, leveraging the advantages of Swin Transformer in global context modeling and feature representation to further enhance the performance and efficiency of object detection. To tackle the problem of small object detection, this study employs the SAHI algorithm for data augmentation. In a series of experiments, OptiDETR achieved an improvement of more than two percentage points in the mAP (mean Average Precision) metric compared to current mainstream object detection models. Furthermore, ther
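As a minimal sketch of the IoU quantity that the query selection mechanism above constrains, the snippet below computes the standard Intersection over Union of two axis-aligned boxes. The `(x1, y1, x2, y2)` corner format and the function name `iou` are assumptions made for illustration; this is not OptiDETR's implementation.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2).

    The corner format is an assumption for this sketch.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

Two 2×2 boxes offset by one unit in each direction overlap in a 1×1 square, so their IoU is 1/7; identical boxes give 1 and disjoint boxes give 0.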
Anomaly detection (AD) has been extensively studied and applied across various scenarios in recent years. However, gaps remain between the current performance and the recognition accuracy required for practical applications. This paper analyzes two fundamental failure cases in the baseline AD model and identifies key reasons that limit the recognition accuracy of existing approaches. Specifically, from Case-1, we found that the main factor hurting current AD methods is that the inputs to the recovery model contain a large number of detailed features to be recovered, so that normal areas are not recovered to their original state while abnormal areas are. From Case-2, we surprisingly found that abnormal areas that cannot be recognized in image-level representations can be easily recognized in feature-level representations. Based on these observations, we propose a novel recover-then-discriminate (ReDi) framework for AD. ReDi takes a self-generated feature map (e.g., histogram of oriented gradients) and a selected prompted image as explicit input information to address the problem identified in Case-1. Additionally, a feature-level discriminative network is introduced to amplify abnormal differences between the recovered and input representations. Extensive experiments on two widely used yet challenging AD datasets demonstrate that ReDi achieves state-of-the-art recognition accuracy.
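To make the "self-generated feature map" mentioned above more concrete, here is a rough sketch of the core of a histogram of oriented gradients for a single grayscale patch. It is a simplified illustration only: block normalization and bin interpolation are omitted, the function name and the 9-bin default are assumptions, and nothing here is taken from the ReDi code.

```python
import math


def orientation_histogram(image, n_bins=9):
    """Unsigned gradient-orientation histogram (0-180 degrees) of one patch.

    `image` is a list of rows of grayscale values; border pixels are skipped.
    """
    height, width = len(image), len(image[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate the horizontal and vertical gradients.
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Each pixel votes for its orientation bin with its gradient magnitude.
            hist[min(int(angle // bin_width), n_bins - 1)] += magnitude
    return hist
```

A patch whose intensity increases only from left to right puts all of its votes into the first bin, while a top-to-bottom ramp votes into the bin that contains 90 degrees.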
The integration of deep learning with conventional structured light center extraction techniques improves the accuracy of extracting structured light centers. The method is divided into three steps. The initial step in...
In this paper, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements. (1) Strong vision encoder: we explored a continuous learning strategy for the large-scale vision foundation model InternViT-6B, boosting its visual understanding capabilities and allowing it to be transferred and reused in different LLMs. (2) Dynamic high-resolution: we divide images into 1 to 40 tiles of 448×448 pixels according to the aspect ratio and resolution of the input images, which supports up to 4K resolution input. (3) High-quality bilingual dataset: we carefully collected a high-quality bilingual dataset that covers common scenes and document images, and annotated it with English and Chinese question-answer pairs, significantly enhancing performance in optical character recognition (OCR) and Chinese-related tasks. We evaluate InternVL 1.5 through a series of benchmarks and comparative studies. Compared to both open-source and proprietary commercial models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 multimodal benchmarks. Code and models are available at https://***/OpenGVLab/InternVL.
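The dynamic high-resolution step can be illustrated with a small sketch that picks a tile grid for an input image. The abstract only states the tile size (448×448), the range (1 to 40 tiles), and that aspect ratio and resolution drive the choice; the selection rule below (closest aspect ratio, ties broken by closest pixel area) and the names `choose_tile_grid` and `tile_boxes` are assumptions for illustration and may differ from the released InternVL code.

```python
TILE = 448  # tile side length in pixels, as stated in the abstract


def choose_tile_grid(width, height, max_tiles=40, tile=TILE):
    """Return (cols, rows) with cols * rows <= max_tiles.

    Assumed rule: the grid whose aspect ratio is closest to the image's wins;
    ties are broken by how closely the grid's pixel area matches the image's.
    """
    target_ratio = width / height
    image_area = width * height
    best_key, best_grid = None, None
    for cols in range(1, max_tiles + 1):
        for rows in range(1, max_tiles // cols + 1):
            ratio_gap = abs(cols / rows - target_ratio)
            area_gap = abs(cols * rows * tile * tile - image_area)
            key = (ratio_gap, area_gap)
            if best_key is None or key < best_key:
                best_key, best_grid = key, (cols, rows)
    return best_grid


def tile_boxes(cols, rows, tile=TILE):
    """Crop boxes (left, top, right, bottom) after resizing to the grid size."""
    return [
        (c * tile, r * tile, (c + 1) * tile, (r + 1) * tile)
        for r in range(rows)
        for c in range(cols)
    ]
```

Under this assumed rule, an 896×448 image maps to a 2×1 grid, and the image would be resized to `cols * 448` by `rows * 448` pixels before being cropped with `tile_boxes`.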
Reinforcement learning (RL) interacts with the environment to solve sequential decision-making problems via a trial-and-error approach. Errors are always undesirable in real-world applications, even though RL excels at playing complex video games that permit several trial-and-error attempts. To improve sample efficiency and thus reduce errors, model-based reinforcement learning (MBRL) is believed to be a promising direction, as it constructs environment models in which trial and error can occur without incurring actual costs. In this survey, we investigate MBRL with a particular focus on the recent advancements in deep RL. There is a generalization error between the learned model of a non-tabular environment and the actual environment. Consequently, it is crucial to analyze the disparity between policy training in the environment model and that in the actual environment, guiding algorithm design for improved model learning, model utilization, and policy training. In addition, we discuss the recent developments of model-based techniques in other forms of RL, such as offline RL, goal-conditioned RL, multi-agent RL, and meta-RL. Furthermore, we discuss the applicability and benefits of MBRL for real-world tasks. Finally, this survey concludes with a discussion of the promising future development prospects for MBRL. We believe that MBRL has great unrealized potential and benefits in real-world applications, and we hope this survey will encourage additional research on MBRL.
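The idea of moving trial and error into a learned model can be sketched on a toy problem. The five-state chain, the reward of 1 for reaching the last state, and all function names below are hypothetical choices for illustration, not an algorithm from the survey. Because the toy environment is tabular and deterministic, the learned model is exact here; the generalization error discussed above only appears in non-tabular settings.

```python
N_STATES = 5       # toy chain 0..4; state 4 is terminal
ACTIONS = (0, 1)   # 0 = move left, 1 = move right


def real_step(state, action):
    """The 'real' environment: each call stands for one costly interaction."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def learn_model(transitions):
    """Tabular model: store the observed outcome of each (state, action)."""
    return {(s, a): (s2, r) for s, a, s2, r in transitions}


def plan(model, gamma=0.9, sweeps=50):
    """Value iteration run entirely inside the learned model."""
    def q_value(s, a, values):
        nxt, reward = model[(s, a)]
        return reward + gamma * values[nxt]

    values = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES):
            known = [a for a in ACTIONS if (s, a) in model]
            if known:
                values[s] = max(q_value(s, a, values) for a in known)
    policy = {}
    for s in range(N_STATES):
        known = [a for a in ACTIONS if (s, a) in model]
        if known:
            policy[s] = max(known, key=lambda a: q_value(s, a, values))
    return values, policy


# Eight real interactions: every non-terminal state, both actions.
real_data = [(s, a, *real_step(s, a)) for s in range(N_STATES - 1) for a in ACTIONS]
# All further trial and error happens in the model, at no real-world cost.
values, policy = plan(learn_model(real_data))
```

The planner never calls `real_step`; the resulting policy moves right in every non-terminal state.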
We consider the problem of energy-efficiency-aware dynamic adaptation of the data transmission rate and transmission power of users in carrier-sensing-based Wireless Local Area Networks (WLANs) in the presence of path loss, Rayleigh fading, and log-normal shadowing. For a data packet transmission, we formulate an optimization problem, solve it, and propose a rate and transmission power adaptation scheme with a restriction methodology of data packet transmission for achieving the optimal energy efficiency. In the restriction methodology of data packet transmission, a user does not transmit a data packet if the instantaneous channel gain of the user is lower than a threshold. To evaluate the performance of the proposed scheme, we develop analytical models for computing the throughput and energy efficiency of WLANs under the proposed scheme considering a saturation traffic ***. We then validate the analytical models via simulation. We find that the proposed scheme provides better throughput and energy efficiency with acceptable throughput fairness if the restriction methodology of data packet transmission is ***. By means of the analytical models and simulations, we demonstrate that the proposed scheme provides significantly higher throughput, energy efficiency, and fairness index than a traditional non-adaptive scheme and the most relevant existing adaptive scheme. Throughput and energy efficiency gains obtained by the proposed scheme with respect to the existing adaptive scheme are about 75% and 103%, respectively, for a fairness index of ***. We also study the effect of various system parameters on throughput and energy efficiency and provide various engineering insights.
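The restriction rule described above (no transmission when the instantaneous channel gain is too low) can be sketched as follows. This is an illustration under simplifying assumptions, not the paper's analytical model: only Rayleigh fading is modeled, for which the power gain is exponentially distributed, while path loss and shadowing are ignored. The cutoff name `threshold`, the unit mean gain, and the function names are hypothetical.

```python
import math
import random


def should_transmit(channel_gain, threshold):
    """Restriction rule: stay silent when the instantaneous gain is too low."""
    return channel_gain >= threshold


def transmit_probability(threshold, mean_gain=1.0):
    """P(gain >= threshold) for an exponentially distributed power gain,
    which is the case under Rayleigh fading (path loss, shadowing ignored)."""
    return math.exp(-threshold / mean_gain)


def simulate_transmit_fraction(threshold, mean_gain=1.0, slots=100_000, seed=0):
    """Empirical fraction of slots in which the user is allowed to transmit."""
    rng = random.Random(seed)
    allowed = sum(
        should_transmit(rng.expovariate(1.0 / mean_gain), threshold)
        for _ in range(slots)
    )
    return allowed / slots
```

Comparing `simulate_transmit_fraction(0.5)` with `transmit_probability(0.5)` mirrors, in miniature, the validation of an analytical model by simulation: raising the cutoff saves energy on poor channels at the price of fewer transmission opportunities.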
Pneumonia is a dangerous respiratory disease that makes breathing incredibly difficult and painful; thus, catching it early is ***. Physicians' time is limited in outdoor situations due to many patients; therefore, automated systems can be a ***. The input images from the X-ray equipment are also highly unpredictable due to variances in radiologists' ***. Therefore, radiologists require an automated system that can swiftly and accurately detect pneumonic lungs from chest X-rays. In medical classifications, deep convolutional neural networks are commonly used. This research aims to use deep pretrained transfer learning models to accurately categorize CXR images into binary classes, i.e., Normal and Pneumonia. MDEV is a proposed novel ensemble approach that concatenates four heterogeneous transfer learning models: MobileNet, DenseNet-201, EfficientNet-B0, and VGG-16, which have been fine-tuned and trained on 5,856 CXR images. The evaluation metrics used in this research to contrast different deep transfer learning architectures include precision, accuracy, recall, AUC-ROC, and F-score. The model effectively decreases training loss while increasing ***. The findings conclude that the proposed MDEV model outperformed cutting-edge deep transfer learning models and obtains an overall precision of 92.26%, an accuracy of 92.15%, a recall of 90.90%, an AUC-ROC score of 90.9%, and an F-score of 91.49% with minimal data pre-processing, data augmentation, fine-tuning, and hyperparameter adjustment in classifying Normal and Pneumonia chests.
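For readers unfamiliar with the reported metrics, the sketch below shows how precision, recall, accuracy, and F-score follow from the four confusion-matrix counts of a binary Normal/Pneumonia classifier. The function name and the counts used in the example are made up for illustration and are not the paper's data; AUC-ROC is left out because it needs per-image scores rather than counts.

```python
def binary_metrics(tp, fp, tn, fn):
    """Precision, recall, accuracy, and F-score from confusion-matrix counts.

    tp/fp/tn/fn = true positives, false positives, true negatives,
    false negatives, with Pneumonia taken as the positive class.
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # F-score is the harmonic mean of precision and recall.
    f_score = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f_score": f_score,
    }


# Hypothetical counts, for illustration only.
example = binary_metrics(tp=90, fp=10, tn=80, fn=20)
```

With these made-up counts, precision is 90/100 and recall is 90/110, which shows why the two can differ noticeably even when accuracy looks reasonable.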
The human face displays expressions through the contraction of various facial muscles. The Facial Action Coding System (FACS) is a widely accepted taxonomy that describes all visible changes in the face in terms of ac...
Recommender systems are effective in mitigating information overload, yet the centralized storage of user data raises significant privacy concerns. Cross-user federated recommendation (CUFR) provides a promising distributed paradigm to address these concerns by enabling privacy-preserving recommendations directly on user devices. In this survey, we review and categorize current progress in CUFR, focusing on four key aspects: privacy, security, accuracy, and efficiency. First, we conduct an in-depth privacy analysis, discuss various cases of privacy leakage, and then review recent methods for privacy protection. Second, we analyze security concerns and review recent methods for untargeted and targeted attacks. For untargeted attack methods, we categorize them into data poisoning attack methods and parameter poisoning attack methods. For targeted attack methods, we categorize them into user-based methods and item-based methods. Third, we provide an overview of the federated variants of some representative methods, and then review the recent methods for improving accuracy from two categories: data heterogeneity and high-order information. Fourth, we review recent methods for improving training efficiency from two categories: client sampling and model compression. Finally, we conclude this survey and explore some potential future research topics in CUFR.
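As a rough illustration of the client-sampling idea mentioned under training efficiency, the sketch below selects a random subset of user devices for one round and combines their parameter updates with a FedAvg-style weighted average. It is a generic federated learning pattern, not a specific method from the survey; the function names, the uniform sampling rule, and the weighting by local example count are assumptions.

```python
import random


def sample_clients(client_ids, fraction, seed=None):
    """Client sampling: pick a random subset of clients for this round."""
    clients = list(client_ids)
    k = max(1, round(fraction * len(clients)))
    return random.Random(seed).sample(clients, k)


def weighted_average(updates, weights):
    """FedAvg-style aggregation: average parameter vectors, weighting each
    client by its number of local examples."""
    total = sum(weights)
    dim = len(updates[0])
    return [
        sum(w * u[i] for u, w in zip(updates, weights)) / total
        for i in range(dim)
    ]


# One round: only the sampled clients train locally and upload updates.
chosen = sample_clients(range(10), fraction=0.3, seed=1)
aggregated = weighted_average([[1.0, 2.0], [3.0, 4.0]], weights=[1, 3])
```

Sampling fewer clients per round reduces communication, and only parameter updates, never raw interaction data, leave the devices.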