Vision transformers (ViTs) are increasingly utilized for HSI classification due to their outstanding performance. However, ViTs encounter challenges in capturing global dependencies among objects of varying sizes, and fail to effectively exploit the spatial-spectral information inherent in HSI. In response to this limitation, we propose a novel solution: the multi-scale spatial-spectral transformer (MSST). Within the MSST framework, we introduce a spatial-spectral token generator (SSTG) and a token fusion self-attention (TFSA) module. Serving as the feature extractor for the MSST, the SSTG incorporates a dual-branch multi-dimensional convolutional structure, enabling the extraction of semantic characteristics that encompass spatial-spectral information from HSI and subsequently tokenizing them. TFSA is a multi-head attention module with the ability to encode attention to features across various scales. We integrated TFSA with cross-covariance attention (CCA) to construct the transformer encoder (TE) for the MSST. Utilizing this TE to perform attention modeling on tokens derived from the SSTG, the network effectively simulates global dependencies among multi-scale features in the data, concurrently making optimal use of spatial-spectral information in HSI. Finally, the output of the TE is fed into a linear mapping layer to obtain the classification results. Experiments conducted on three popular public datasets demonstrate that the MSST method achieved higher classification accuracy compared to state-of-the-art (SOTA) methods.
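The abstract does not define CCA. As a rough illustration only, the sketch below assumes it refers to the cross-covariance attention formulation known from XCiT, where attention is computed between feature channels rather than between tokens. The function names, the channel-major layout, and the `scale` parameter are assumptions for this sketch, and the learned query/key/value projections and multi-head split are omitted.

```python
import math


def l2_normalize_rows(matrix):
    """Scale each row of a list-of-lists matrix to unit L2 norm."""
    normalized = []
    for row in matrix:
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        normalized.append([x / norm for x in row])
    return normalized


def softmax(row):
    """Numerically stable softmax over one row."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def cross_covariance_attention(q, k, v, scale=1.0):
    """Channel-wise attention; q, k, v are d x n (channels x tokens).

    Returns (output, attention) where output is d x n and attention is d x d.
    `scale` is a hypothetical stand-in for a learned temperature.
    """
    q_hat = l2_normalize_rows(q)                  # normalize each channel over tokens
    k_hat = l2_normalize_rows(k)
    k_t = [list(col) for col in zip(*k_hat)]      # n x d
    scores = matmul(q_hat, k_t)                   # d x d channel similarities
    attention = [softmax([s * scale for s in row]) for row in scores]
    return matmul(attention, v), attention


# Toy input: 2 channels, 3 tokens (values are made up).
features = [[1.0, 2.0, 3.0],
            [0.5, 0.1, 0.2]]
output, attention = cross_covariance_attention(features, features, features)
```

Because the attention map is d x d, its size depends on the number of channels and not on the number of tokens, which is the usual motivation for this form of attention.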
Facial attribute editing is a popular direction in face generation that aims to modify selected attributes of a face image while keeping the unedited attributes unchanged. However, generative models tend to alter unedited attributes when editing multiple facial attributes. Currently, concatenating prior knowledge with hidden features remains a data-driven approach. Because of feature coupling in data-driven models, the implicit semantics they generate are highly entangled and incomprehensible to humans. In addition, the boundaries between attributes in the implicit semantics are ambiguous, which makes the editing process difficult to control effectively. In this paper, we propose a knowledge-guided explicit feature disentanglement network that is compatible with human cognition and uses a classification method with prior knowledge to encode features. Specifically, we select 13 facial attribute labels for a comprehensive and explicit presentation of this task and design a knowledge-guided feature disentanglement module to transform the implicit feature representations into explicit feature semantics. We also construct a semantic space in which facial attributes can be manipulated independently. In addition, the proposed model can be combined with existing facial attribute editing models to obtain multiple variant models. The proposed model is validated by a range of experiments, and the variant model achieves better performance than the benchmark model in facial attribute editing.
Microservices, as an emerging architecture, are creating new opportunities to enable superior network services in Mobile Edge Computing (MEC). With huge numbers of user requests, the massive communication among microservices has become notoriously complicated. Because of the intricate data dependencies among microservices, the overall performance of large-scale MEC applications depends on both service deployment and request routing. However, most existing work ignores the interdependencies of microservices and studies deployment and routing as two isolated problems. To address this gap, this article investigates the joint optimization of service deployment and request routing in edge computing. We first formulate a delay minimization problem via mixed integer linear programming and queuing analysis, and then provide a hardness proof for the problem. In addition, this article presents a 2-approximation algorithm, followed by rigorous mathematical proofs of the approximation ratio. The proposed two-phase algorithm consists of rounding-based service deployment and adaptive-scaling-based request routing policies, which employ fine-grained joint optimization to minimize service response delay. Finally, we illustrate the near-optimal performance of the proposed algorithm via comprehensive experiments.
Facial paralysis refers to the abnormal behavior of facial muscles caused by a disorder of the facial nerve, mainly manifested as facial asymmetry. In recent years, deep learning has found extensive applications in facial paralysis detection research. However, most existing methods are constrained to assessing the severity of facial paralysis, thereby concealing crucial symptoms within black-box models. Compared to the severity of facial paralysis, the symptoms of facial paralysis are of greater significance to both physicians and patients. To address this issue, this paper proposes a facial paralysis symptom detection model based on facial action units (AUs). To enhance the accuracy of AU intensity prediction, a novel Difference Ensemble Method (DEM) is introduced. This method leverages differential information between frames within the same video to improve the accuracy of predictions for the current frame. Building upon the predicted AU intensity sequences for keyframes in a video, an interpretable model for detecting facial paralysis symptoms is designed. This model employs an active means to describe the asymmetry in facial muscle strength and utilizes co-occurrence matrices to detect synkinesis. It is noteworthy that DEM is exclusively trained on a dataset of normal faces but exhibits excellent performance when transferred to a facial paralysis dataset. Additionally, DEM exhibits higher accuracy in predicting AU intensity compared to existing methods. The F1 scores for detecting facial muscle function in the eyebrow, eye, and mouth regions with our proposed model are 80.0%, 79.23%, and 90.91%, respectively. To demonstrate the model's performance, a synkinesis detection experiment is conducted, further validating its applicability in facial paralysis detection.
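The abstract mentions co-occurrence matrices over predicted AU intensity sequences but gives no construction. One plausible reading, sketched here under assumptions, counts the keyframes in which two AUs are active at the same time. The activation `threshold`, the function name, and the toy intensities are hypothetical and not taken from the paper.

```python
def cooccurrence_matrix(frames, threshold=1.0):
    """Count, for every AU pair, the frames in which both AUs are active.

    frames: list of per-frame AU intensity lists, all of the same length.
    threshold: hypothetical intensity at or above which an AU counts as active.
    """
    n_aus = len(frames[0])
    counts = [[0] * n_aus for _ in range(n_aus)]
    for intensities in frames:
        active = [i for i, value in enumerate(intensities) if value >= threshold]
        for i in active:
            for j in active:
                counts[i][j] += 1
    return counts


# Three keyframes, three AUs (made-up intensities).
keyframes = [
    [2.0, 0.0, 1.5],
    [1.2, 0.3, 1.1],
    [0.0, 0.0, 2.0],
]
counts = cooccurrence_matrix(keyframes)
```

In this toy input the first and third AUs fire together in two of three keyframes. A large off-diagonal count between AUs from different facial regions is the kind of pattern a synkinesis check could look for.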
Semantic change detection (SCD) is a challenging task in remote sensing image (RSI) interpretation that uses multitemporal images to detect, locate, and analyze pixel-level land-cover "from-to" changes. In SCD, severe class imbalance and confusing categories are typical, which makes it difficult to distinguish easily confused categories accurately from limited semantic context information. However, previous works did not address these issues in depth. This article proposes a novel SCD method named semi-supervised contrastive learning (SSCLNet), in which a simple and effective SCD network is designed as a strong baseline, and a semi-supervised contrastive learning module for semantic segmentation (SS) is presented to enhance the distinguishability of categories. Our baseline extracts semantic context with a high-resolution network (HRNet), obtains change information through a simple absolute difference, and then performs SCD directly on the fusion of semantic context and change information. To exploit the semantic context information of the unlabeled non-changed regions, we employ a self-training (ST) method for semi-supervised SS. To learn distinguishable feature representations for easily confused categories, we present contrastive learning with an adaptive sampling strategy for SS. It selects challenging negative samples for each category from the other categories that exhibit similar features or attributes. The sampling space includes both the labeled changed samples and the non-changed samples predicted by ST. Comprehensive experiments on the SECOND and Landsat-SCD datasets demonstrate that the proposed SSCLNet achieves state-of-the-art (SOTA) performance, with significant improvements of 2.07% and 4.15% in the score value, respectively.
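The sampling idea, picking negatives from other categories that look similar to the anchor, can be sketched as follows. This is a minimal illustration and not the paper's adaptive strategy: ranking by cosine similarity, the fixed `k`, the function names, and the class labels are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_hard_negatives(anchor, anchor_label, samples, k=2):
    """Return the k samples from other classes that are most similar to anchor.

    samples: list of (feature_vector, label) pairs.
    """
    scored = [
        (cosine_similarity(anchor, feature), feature, label)
        for feature, label in samples
        if label != anchor_label
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(feature, label) for _, feature, label in scored[:k]]


# Toy 2-D features with hypothetical land-cover labels.
pool = [
    ([0.9, 0.1], "low_vegetation"),
    ([0.0, 1.0], "water"),
    ([1.0, 0.05], "tree"),
    ([0.5, 0.5], "building"),
]
hard = select_hard_negatives([1.0, 0.0], "tree", pool, k=2)
hard_labels = [label for _, label in hard]
```

Samples of the anchor's own class are excluded, so only the most similar other-class features are returned as negatives for the contrastive loss.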
Authors:
Luo, Hui; Feng, Xibo; Du, Bo; Zhang, Yuxiang
China Univ Geosci, Sch Comp Sci, Wuhan 430079, Peoples R China
Wuhan Univ, Inst Artificial Intelligence, Natl Engn Res Ctr Multimedia Software, Sch Comp Sci, Wuhan 430072, Peoples R China
Wuhan Univ, Hubei Key Lab Multimedia & Network Commun Engn, Wuhan 430072, Peoples R China
China Univ Geosci, Inst Geophys & Geomat, Wuhan 430079, Peoples R China
Building extraction from remote sensing images is extremely important for urban planning, land-cover change analysis, disaster monitoring, and similar applications. With the growing diversity of building features, shapes, and textures, coupled with frequent shadowing and occlusion, high-resolution remote sensing images (HRI) alone are of limited use for building extraction. Feature fusion using multisource data has therefore gradually become one of the most popular approaches. However, the distinct characteristics and noise of each data source make effective fusion and utilization difficult, so fully fusing multisource data to obtain complementary advantages remains very challenging. In this article, we propose an end-to-end multimodal feature fusion building extraction network based on SegFormer, which fuses HRI and light detection and ranging (LiDAR) data to extract buildings. First, we use the SegFormer encoder to overcome the restricted receptive field of the traditional convolutional neural network (CNN) and achieve effective feature extraction for complex buildings. In addition, we propose a cross-modal feature fusion (CMFF) method that uses the self-attention mechanism to fuse the multisource data. In the decoder, we propose a multiscale upsampling decoder (MSUD) strategy to achieve full fusion of multilevel features. As demonstrated by experiments on three datasets, our model performs better than several multisource building extraction and semantic segmentation models. The intersection over union (IoU) for buildings on the three datasets reaches 91.80%, 93.03%, and 84.59%. Subsequent ablation experiments further validate the effectiveness of each strategy.
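For readers unfamiliar with the reported metric, the building IoU is the number of pixels labeled building in both prediction and ground truth, divided by the number labeled building in either. The sketch below shows that standard definition on flat binary masks. The function name, the toy masks, and the convention for two empty masks are assumptions.

```python
def building_iou(pred, truth):
    """Intersection over union for the building class.

    pred, truth: flat lists of 0/1 labels of equal length, where 1 = building.
    """
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    union = tp + fp + fn
    # Convention chosen for this sketch: two empty masks count as a perfect match.
    return tp / union if union else 1.0


# Toy masks of five pixels.
predicted = [1, 1, 0, 0, 1]
reference = [1, 0, 0, 1, 1]
iou = building_iou(predicted, reference)
```

Here two pixels agree on building, one is a false positive, and one is a false negative, so the IoU is 2 / 4 = 0.5.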
The effective identification and mitigation of non-line-of-sight (NLOS) ranging errors are essential for achieving high-precision positioning and navigation with ultra-wideband (UWB) technology in harsh indoor environments. In this paper, we propose an efficient UWB ranging-error mitigation strategy for indoor localization that uses novel channel impulse response parameters and is based on the results of a two-step NLOS identification composed of a decision tree and a feedforward neural network. NLOS ranging errors are classified into three types, and corresponding mitigation strategies and recall mechanisms are developed, which are also extended to partial line-of-sight (LOS) errors. Extensive experiments involving three obstacles (humans, walls, and glass) and two sites show an average NLOS identification accuracy of 95.05%, with LOS/NLOS recall rates of 95.72%/94.15%. The mitigated LOS errors are reduced by 50.4%, while the average improvement in the accuracy of the three types of NLOS ranging errors is 61.8%, reaching up to 76.84%. Overall, this method achieves a reduction in LOS and NLOS ranging errors of 25.19% and 69.85%, respectively, resulting in a 54.46% enhancement in positioning accuracy. This performance surpasses that of state-of-the-art techniques, such as the convolutional neural network (CNN), long short-term memory-extended Kalman filter (LSTM-EKF), least-squares support vector machine (LS-SVM), and k-nearest neighbor (K-NN) algorithms.
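The identification figures above are an overall accuracy plus a recall for each class. A minimal sketch of how such numbers are computed from true and predicted labels follows. The function name and the toy labels are assumptions, and the example does not reproduce the paper's data.

```python
def identification_metrics(true_labels, predicted_labels):
    """Return (accuracy, recall_by_class) for a LOS/NLOS classifier.

    Recall for a class is the fraction of its true samples predicted correctly.
    """
    pairs = list(zip(true_labels, predicted_labels))
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    recall = {}
    for cls in sorted(set(true_labels)):
        total = sum(1 for t, _ in pairs if t == cls)
        hits = sum(1 for t, p in pairs if t == cls and p == cls)
        recall[cls] = hits / total
    return accuracy, recall


# Toy example: four LOS and four NLOS measurements.
truth = ["LOS"] * 4 + ["NLOS"] * 4
guess = ["LOS", "LOS", "LOS", "NLOS", "NLOS", "NLOS", "NLOS", "LOS"]
accuracy, recall = identification_metrics(truth, guess)
```

With one miss in each class, the toy example gives an accuracy of 0.75 and a recall of 0.75 for both LOS and NLOS.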
Tunneling magnetoresistance (TMR) sensors have shown the capability of operating in weak magnetic fields. However, the environmental magnetic noise limits their applications in open field detection. This article proposes a novel background noise cancellation method based on a backpropagation (BP) neural network for TMR sensor arrays. According to simulation results, the BP-based noise reduction method can eliminate background noise more effectively than the traditional coherence coefficient method. The signal-to-noise ratio (SNR) of the sensor can, thus, be improved by over 20 dB, especially when detecting extremely low SNR signals. This algorithm is demonstrated using a TMR sensor array, which shows a capability of greatly enhancing the sensor array's limit of detection in open field testing.
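To put the 20 dB figure in perspective, the sketch below converts between a power ratio and decibels, assuming SNR is defined as a ratio of signal power to noise power. The function names and the toy power values are assumptions for illustration.

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels, from powers in the same units."""
    return 10.0 * math.log10(signal_power / noise_power)


def power_ratio_from_db(gain_db):
    """Linear power ratio that corresponds to a gain in decibels."""
    return 10.0 ** (gain_db / 10.0)


# Toy case: the same signal before and after background noise is reduced.
before = snr_db(signal_power=1.0, noise_power=100.0)
after = snr_db(signal_power=1.0, noise_power=1.0)
improvement = after - before
noise_reduction = power_ratio_from_db(improvement)
```

Under this definition, a 20 dB improvement at fixed signal power corresponds to a hundredfold reduction in noise power.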
Learning-based methods have become mainstream for solving residential energy scheduling problems. To improve the learning efficiency of existing methods and increase the utilization of renewable energy, we propose the Dyna action-dependent heuristic dynamic programming (Dyna-ADHDP) method, which incorporates the ideas of learning and planning from the Dyna framework into action-dependent heuristic dynamic programming. This method defines a continuous action space for precise control of an energy storage system and allows online optimization of algorithm performance during the real-time operation of the residential energy model. In addition, a target network is introduced during training to make the training smoother and more efficient. We conducted experimental comparisons with the benchmark method using simulated and real data to verify its applicability and performance. The results confirm the method's strong performance and generalization capabilities, as well as its effectiveness in increasing renewable energy utilization and extending equipment life.
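The abstract does not say how the target network is updated. One common choice in reinforcement learning, shown here purely as an assumption, is a soft (Polyak) update in which the target weights slowly track the online weights. The function name, the flat weight lists, and the value of `tau` are hypothetical.

```python
def soft_update(target_weights, online_weights, tau=0.01):
    """Move each target weight a fraction tau toward the online weight."""
    return [
        (1.0 - tau) * target + tau * online
        for target, online in zip(target_weights, online_weights)
    ]


# Toy weights: the target starts away from the online network and drifts toward it.
target = [0.0, 1.0]
online = [1.0, 1.0]
for _ in range(3):
    target = soft_update(target, online, tau=0.1)
```

A small `tau` keeps the bootstrapped training targets from shifting abruptly between steps, which is the usual argument for why a target network smooths training.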
Methane (CH4) is a critical but overlooked component in the study of the deep carbon *** CH4 produced by serpentinization of ultramafic rocks has received extensive attention, but its formation and flux in mafic rocks during subduction remain poorly ***, we report massive CH4-rich fluid inclusions in well-zoned garnet from eclogites in Western Tianshan, *** characteristics and carbon-hydrogen isotopic compositions confirm the abiotic origin of this *** P-T-fO2-fluid trajectories and Deep Earth Water modeling imply that massive abiotic CH4 was generated during cold subduction at depths of 50-120 km, whereas CO2 was produced during *** massive production of abiotic CH4 in eclogites may result from multiple mechanisms during prograde high pressure-ultrahigh pressure *** flux calculation proposes that abiotic CH4 that has been formed in HP-UHP eclogites in cold subduction zones may represent one of the largest, yet overlooked, sources of abiotic CH4 on Earth.