Small object detection from a drone's perspective has always been a challenging problem in the field of object detection. To address the low recognition accuracy and information loss that affect small object detection, this article proposes the MIS-YOLOv8 algorithm, which primarily aims to resolve the loss of small objects during detection with the classic YOLOv8s algorithm. First, a multilevel feature extraction (MFE) module is designed to enrich feature representation and to extract objects at different scales. Second, a small object detection mechanism is incorporated to improve detection ability. Finally, depthwise atrous flexible convolutions are integrated, enabling a rich capture of information from the spatial to the depth dimensions and thereby reducing the loss of small objects. MIS-YOLOv8 was validated on the VisDrone2019 dataset, where it achieved increases of 9% and 6.2% in mAP@0.5 and mAP@0.5:0.95, respectively, compared with YOLOv8s. The experimental results indicate that the improved model performs better in small object detection for drones.
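The abstract does not define the "flexible" part of its convolution, so the following is only a minimal sketch of the two standard ingredients it names: an atrous (dilated) kernel that widens the receptive field without adding weights, applied depthwise so that each channel has its own filter. It is written in one dimension with plain Python lists for brevity; the function names and the toy input are illustrative assumptions, not the authors' implementation.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D atrous convolution (cross-correlation form, no kernel flip)."""
    # A kernel of size k with dilation d covers (k - 1) * d + 1 input samples.
    span = (len(kernel) - 1) * dilation + 1
    out = []
    for start in range(len(signal) - span + 1):
        out.append(
            sum(w * signal[start + j * dilation] for j, w in enumerate(kernel))
        )
    return out


def depthwise_dilated_conv1d(channels, kernels, dilation=1):
    """Depthwise: each channel is filtered by its own kernel, with no channel mixing."""
    return [dilated_conv1d(c, k, dilation) for c, k in zip(channels, kernels)]


# Toy input (hypothetical): two channels, seven samples each.
x = [[1, 2, 3, 4, 5, 6, 7], [0, 1, 0, 1, 0, 1, 0]]
k = [[1, 0, -1], [1, 1, 1]]
y = depthwise_dilated_conv1d(x, k, dilation=2)
```

With a dilation of 2, each three-tap kernel spans five input samples, which is why the seven-sample input yields three outputs per channel.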
The high reliability and maintainability of complex electromechanical systems make the collection of representative fault data difficult in routine operations. This issue poses significant challenges to data-driven models, hindering their ability to accurately capture system fault mechanisms. Generating simulated fault samples is a popular and effective approach to augment fault data. However, state-of-the-art data generation methods can only produce synthetic data with respect to the distribution of the collected data and may violate the actual fault mechanisms. To address this, this article proposes a counterfactual data generation method grounded in causality, aiming to simulate the intrinsic generation process of monitoring data and thereby obtain counterfactual data that conforms to the system's fault mechanisms. In the proposed method, a priori-constrained causal discovery method is designed to uncover the causalities among the monitoring variables in complex electromechanical systems. A graph decoupling network is designed to disentangle causal mechanisms and extract decoupled features representing different uncertainty sources of the monitoring data. Finally, a causal-based generative adversarial network (CGAN) is proposed to generate counterfactual monitoring data that satisfies the mined causalities. The experimental results show that the generalizability and stability of fault diagnosis models can be enhanced with the counterfactual data.
Multispectral pedestrian detection has gained significant attention in recent years, particularly in autonomous driving applications. To address the challenges posed by adverse illumination conditions, the combination of thermal and visible images has demonstrated its advantages. However, existing fusion methods rely on the critical assumption that the RGB-Thermal (RGB-T) image pairs are fully overlapping. This assumption often does not hold in real-world applications, where only partial overlap between images can occur due to sensor configuration. Moreover, sensor failure can cause loss of information in one modality. In this letter, we propose a novel module called the Hybrid Attention (HA) mechanism as our main contribution to mitigate performance degradation caused by partial overlap and sensor failure, i.e., when at least part of the scene is acquired by only one sensor. We propose an improved RGB-T fusion algorithm that is robust against partial overlap and sensor failure encountered during inference in real-world applications. We also leverage a mobile-friendly backbone to cope with resource constraints in embedded systems. We conducted experiments by simulating various partial overlap and sensor failure scenarios to evaluate the performance of our proposed method. The results demonstrate that our approach outperforms state-of-the-art methods in handling these real-world challenges.
High-performance vision-based decision-making networks are often limited by hardware capabilities in practical applications. To address this challenge, this study proposes lightweight optimization strategies for decision-making models with respect to parameter size, training memory usage, and inference speed. Specifically, to reduce the parameter count, the Video Swin Transformer is employed to simultaneously extract temporal and spatial features, with the network trained using a Prioritized Replay Deep Q-Network (PRDQN) that incorporates risk assessment. To further reduce training memory usage, the Q-target network in PRDQN is removed, and the mellowmax operator is integrated to enhance the training process, resulting in the PRDeepMellow Swin Transformer. After analyzing the inference speed problems encountered by the algorithm in practical applications, the vanilla self-attention is replaced by a linear self-attention based on double softmax, yielding the Double Softmax Linear Video Swin Transformer (DSLVS Transformer), which improves the inference speed for long sequences. The proposed methods were evaluated across three high-speed lane change scenarios (a static scenario, a dynamic scenario, and a randomly changing scenario). Experimental results demonstrate that the proposed methods maintain excellent decision performance after the corresponding lightweight optimizations.
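For readers unfamiliar with the mellowmax operator mentioned above, the sketch below implements its standard definition, mm_ω(x) = log((1/n) Σ exp(ω·x_i)) / ω, which interpolates smoothly between the mean (ω → 0) and the maximum (ω → ∞) of a set of action values. The function name, the log-sum-exp stabilization, and the sample Q-values are illustrative choices; how the operator is wired into the PRDeepMellow training loop is not described in the abstract and is not shown here.

```python
import math


def mellowmax(values, omega):
    """mm_omega(x) = log(mean(exp(omega * x_i))) / omega, computed stably."""
    if omega == 0:
        raise ValueError("omega must be non-zero")
    scaled = [omega * v for v in values]
    shift = max(scaled)  # log-sum-exp shift so exp() never overflows
    total = sum(math.exp(s - shift) for s in scaled)
    return (shift + math.log(total / len(values))) / omega


# Hypothetical Q-values for three actions in one state.
q_values = [1.0, 2.0, 4.0]
soft = mellowmax(q_values, omega=5.0)
```

For positive ω the result lies between the mean and the maximum of the inputs, so `soft` falls between 7/3 and 4 for the values above.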
Location-based mobile services, while improving users' daily lives, also raise significant privacy concerns in the sharing of location data. Location trajectories record users' travel behavior, and rich semantics can be attached to them from open-source information. Behavioral-semantic analysis reveals users' travel motivations and underlying behavioral patterns. This helps attackers launch inference attacks for behavior prediction, identification, or other privacy invasions, even when the location data is protected. Quantifying behavioral-semantic privacy risk and evaluating privacy protection remain open issues. This paper aims to reveal the semantic privacy risks to user behavior that arise from the publication of location trajectories in mobile scenarios. We formalize the user's semantic-mobility process to analyze their underlying behavioral patterns. Then, we design semantic inference algorithms, conditional on the released trajectory, to reason about the observation-based likelihood of the user's actual staying and transfer behaviors and to track behavioral traces. Extensive experiments with real-world data demonstrate the algorithms' performance in inference accuracy and semantic similarity, offering a quantification criterion for deploying mobile privacy protection.
As with classification models, object detection models are vulnerable to adversarial attacks. In particular, adversarial attacks on key components of object detection models, such as the Region Proposal Network (RPN) and Non-Maximum Suppression (NMS) algorithms, have recently been highlighted. As a representative adversarial attack method targeting the NMS algorithm, PhantomSponges disrupts the normal operation of the NMS algorithm by using universal perturbations. However, since PhantomSponges focuses solely on the attack success rate, the adversarial examples it generates are visually noticeable. In this paper, we propose a simple yet effective attack method called Attack method using Brightness Information for Visibility Improvement (AB-VIP) to address this limitation of adversarial attack methods targeting the NMS algorithm. To improve the invisibility of adversarial examples, the proposed method adjusts the universal perturbation using the brightness information of each target image. Experimental results under various conditions demonstrate that the proposed method improves the invisibility of the existing NMS-targeted attack method while maintaining a high attack success rate. When applying the proposed method, the SSIM metric improved by 13.23%, while the L2 metric decreased by 66.23%.
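The abstract states only that the universal perturbation is adjusted using each image's brightness; it does not give the rule. The sketch below is therefore one hypothetical reading, not AB-VIP itself: it computes a per-pixel luma with the standard Rec. 601 weights and scales the perturbation in proportion to it. The function names, the direction of the scaling (stronger in brighter pixels), and the `strength` parameter are all assumptions made for illustration.

```python
def luminance(pixel):
    """Rec. 601 luma of an (r, g, b) pixel with components in [0, 1]."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def apply_scaled_perturbation(image, perturbation, strength=1.0):
    """Add a universal perturbation, scaled per pixel by local brightness.

    ASSUMPTION: the scale grows with brightness. The source abstract does not
    specify the adjustment rule, so this direction is purely illustrative.
    """
    out = []
    for pix, delta in zip(image, perturbation):
        scale = strength * luminance(pix)
        # Clip to the valid [0, 1] range after adding the scaled perturbation.
        out.append(
            tuple(min(1.0, max(0.0, c + scale * d)) for c, d in zip(pix, delta))
        )
    return out


# Toy image (hypothetical): a flat list of three pixels, black, white, mid-gray.
image = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.5, 0.5, 0.5)]
universal = [(0.1, 0.1, 0.1)] * 3
adv = apply_scaled_perturbation(image, universal)
```

Under this assumed rule the black pixel is left untouched, while the gray pixel receives half of the nominal perturbation.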
Machine learning (ML) has been considered a promising approach for indoor localization. Nevertheless, existing ML approaches require a large amount of computational resources and data samples in new scenarios or for different types of mobile devices (MDs). To address these issues, we first propose a hierarchical framework that divides the deep neural network (DNN) for localization into two parts and deploys them on the server and the MD, respectively. The on-device part estimates a coarse location of the MD with local information, and the server-side part improves the accuracy with global information. Then, we develop a meta-learning-based training algorithm that trains the DNN in different scenarios or with different types of MDs. The trained DNN is then deployed in unseen scenarios or on new types of MDs, where only a small number of data samples are available for fine-tuning. By implementing a learning-based localization algorithm in the hierarchical framework, the root mean square error (RMSE) achieved by the on-device DNN is 1.18 m and the inference time is less than 20 ms. With the help of global information at the server, the RMSE is reduced to 0.6 m.
Digital contact tracing aims to curb epidemics by identifying and mitigating public health emergencies through technology. Backward contact tracing, which tracks the sources of infection, proved crucial in places like Japan for identifying COVID-19 infections from superspreading events. This paper presents a novel perspective on digital contact tracing by modeling it as an online graph exploration problem, framing forward and backward tracing strategies as maximum-likelihood estimation tasks that leverage iterative sampling of epidemic network data. The challenge lies in the combinatorial complexity and rapid spread of infections. We introduce DeepTrace, an algorithm based on a Graph Neural Network (GNN) that iteratively updates its estimates as new contact tracing data is collected, learning to optimize the maximum-likelihood estimation by utilizing topological features to accelerate learning and improve convergence. The contact tracing process uses either breadth-first search (BFS) or depth-first search (DFS) to expand the network and trace the infection source, ensuring efficient real-time exploration. Additionally, the GNN model is trained through a two-phase approach: pre-training with synthetic networks to approximate likelihood probabilities, followed by fine-tuning with high-quality data to refine the model. Using COVID-19 variant data, we illustrate that DeepTrace surpasses current methods in identifying superspreaders, providing a robust basis for a scalable digital contact tracing strategy.
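The graph-expansion step described above can be illustrated with a plain breadth-first search over a contact graph. This is a minimal sketch of generic BFS only: the dictionary-based graph, the `max_nodes` budget, and the function name are assumptions for illustration, and the GNN-based likelihood estimation that DeepTrace performs on the explored network is not shown.

```python
from collections import deque


def bfs_trace(contacts, index_case, max_nodes=None):
    """Expand the contact graph breadth-first from the index case.

    `contacts` maps a person to the people they report contact with.
    Returns people in the order they were reached.
    """
    visited = {index_case}
    order = [index_case]
    queue = deque([index_case])
    while queue:
        person = queue.popleft()
        for other in contacts.get(person, []):
            if other in visited:
                continue
            # Stop once the (hypothetical) tracing budget is exhausted.
            if max_nodes is not None and len(order) >= max_nodes:
                return order
            visited.add(other)
            order.append(other)
            queue.append(other)
    return order


# Toy contact graph (hypothetical data).
contacts = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["b", "e"],
    "e": ["d"],
}
order = bfs_trace(contacts, "a")
```

Breadth-first expansion visits all direct contacts of the index case before moving on to contacts of contacts; replacing the queue with a stack would give a depth-first-style ordering instead.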
This paper investigates a fully distributed federated learning (FL) problem, in which each device may use only its local dataset and the information received from its adjacent devices, as defined by a communication graph, to update its local model weights so as to minimize the global loss function. To incorporate the communication graph constraint into the joint posterior distribution, we exploit the fact that the model weights on each device are a function of its local likelihood and local prior, and we model the connectivity between adjacent devices by a Dirac delta distribution. In this way, the joint distribution can be factorized naturally by a factor graph. Based on this Dirac delta-based factor graph, we propose a novel distributed approximate Bayesian inference algorithm that combines loopy belief propagation (LBP) and variational Bayesian inference (VBI) for distributed FL. Specifically, VBI is used to approximate the non-Gaussian marginal posterior as a Gaussian distribution in the local training process, so that the global training process resembles Gaussian LBP, where only the mean and variance are passed among adjacent devices. Furthermore, we propose a new damping factor design based on the communication graph topology to mitigate potential divergence and achieve consensus convergence. Simulation results verify that the proposed solution converges faster and performs better than the baselines.
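Because only means and variances are exchanged, the message arithmetic reduces to operations on Gaussian parameters. The sketch below shows two generic building blocks under stated assumptions: fusing scalar Gaussian factors (precisions add, and the mean is precision-weighted), and a simple damped update that blends the previous and new messages in natural parameters. The fixed scalar `eta` and both function names are hypothetical; the paper's topology-dependent damping factor is not given in the abstract and is not reproduced here.

```python
def fuse_gaussians(messages):
    """Product of 1-D Gaussian factors given as (mean, variance) pairs.

    Precisions add; the fused mean is the precision-weighted average.
    """
    precision = sum(1.0 / var for _, var in messages)
    mean = sum(m / var for m, var in messages) / precision
    return mean, 1.0 / precision


def damp(old, new, eta):
    """Blend old and new messages in natural parameters.

    ASSUMPTION: a fixed scalar eta in [0, 1); eta = 0 means no damping.
    """
    (m_old, v_old), (m_new, v_new) = old, new
    prec = eta / v_old + (1.0 - eta) / v_new
    info = eta * m_old / v_old + (1.0 - eta) * m_new / v_new
    return info / prec, 1.0 / prec


# Two unit-variance messages about the same scalar weight (hypothetical values).
belief = fuse_gaussians([(0.0, 1.0), (2.0, 1.0)])
damped = damp((0.0, 1.0), belief, eta=0.5)
```

Damping slows how quickly a message moves toward its newly computed value, which is the usual way to suppress oscillation of belief propagation on graphs with cycles.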
This paper discusses the theory and algorithms for interacting large language model agents (LLMAs) using methods from statistical signal processing and microeconomics. While both fields are mature, their application to decision-making involving interacting LLMAs remains unexplored. Motivated by Bayesian sentiment analysis on online platforms, we construct interpretable models and stochastic control algorithms that enable LLMAs to interact and perform Bayesian inference. Because interacting LLMAs learn from both prior decisions and external inputs, they can exhibit bias and herding behavior. Thus, developing interpretable models and stochastic control algorithms is essential to understand and mitigate these behaviors. This paper has three main results. First, we show using Bayesian revealed preferences from microeconomics that an individual LLMA satisfies the necessary and sufficient conditions for rationally inattentive (bounded rationality) Bayesian utility maximization and, given an observation, the LLMA chooses an action that maximizes a regularized utility. Second, we utilize Bayesian social learning to construct interpretable models for LLMAs that interact sequentially with each other and the environment while performing Bayesian inference. Our proposed models capture the herding behavior exhibited by interacting LLMAs. Third, we propose a stochastic control framework to delay herding and improve state estimation accuracy under two settings: 1) centrally controlled LLMAs and 2) autonomous LLMAs with incentives. Throughout the paper, we numerically demonstrate the effectiveness of our methods on real datasets for hate speech classification and product quality assessment, using open-source models like LLaMA and Mistral and closed-source models like ChatGPT. The main takeaway of this paper, based on substantial empirical analysis and mathematical formalism, is that LLMAs act as rationally bounded Bayesian agents that exhibit social learning when interacting. Tradi