Using skeletal information to model and recognize human actions is currently a hot research subject in the realm of Human Action Recognition (HAR). Graph Convolutional Networks (GCN) have gained popularity in this discipline due to their capacity to efficiently process graph-structured ***, it is challenging for current models to handle distant dependencies that commonly exist between human skeleton nodes, which hinders the development of algorithms in related *** solve these problems, the Lightweight Multiscale Spatio-Temporal Graph Convolutional Network (LMSTGCN) is ***, the Lightweight Multiscale Spatial Graph Convolutional Network (LMSGCN) is constructed to capture the information in various hierarchies, and multiple inner connections between skeleton joints are captured by dividing the input features into a number of subsets along the channel ***, dilated convolution is incorporated into the temporal convolution to construct the Lightweight Multiscale Temporal Convolutional Network (LMTCN), which obtains a wider receptive field while keeping the size of the convolution kernel ***, the Spatio-Temporal Location Attention (STLAtt) module is used to identify the most informative joints in the sequence of skeletal information at a specific frame, hence improving the model’s ability to extract features and recognize ***, a multi-stream data fusion input structure is used to enhance the input data and expand the feature *** on three public datasets illustrate the effectiveness of the proposed network.
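The effect of dilation on the temporal receptive field can be illustrated with a minimal sketch. This is not the LMTCN implementation: the function names, the 3-tap kernel and the dilation rates are assumptions chosen for illustration. It only shows that spacing the kernel taps apart widens the span of frames each output sees while the kernel size stays the same.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D dilated convolution (cross-correlation, as used in deep learning)."""
    span = (len(kernel) - 1) * dilation  # distance between the first and last tap
    out = []
    for t in range(len(signal) - span):
        # Each tap i reads the frame that lies i * dilation steps after t.
        out.append(sum(w * signal[t + i * dilation] for i, w in enumerate(kernel)))
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 layers that share one kernel size."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Toy per-joint feature over 10 frames (hypothetical values).
frames = [float(t) for t in range(10)]
print(dilated_conv1d(frames, [1.0, 1.0, 1.0], dilation=2))  # [6.0, 9.0, 12.0, 15.0, 18.0, 21.0]
print(receptive_field(3, [1, 2, 4]))  # 15 frames, versus 7 for dilations [1, 1, 1]
```

With the same 3-tap kernel, three stacked layers cover 15 frames when the dilation doubles per layer, compared with 7 frames without dilation.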
In response to a community-identified need for ground-based thermodynamic (TD) profiling of the troposphere, we present the further development and validation of a differential absorption LiDAR (DIAL) technique to retrieve temperature. This paper showcases the accuracy of temperature retrievals using a perturbative technique, combining a DIAL measurement of a temperature-sensitive oxygen (O2) absorption profile with a high spectral resolution LiDAR measurement of the backscatter ratio profile near 770 nm. This study introduces three key advancements. First, the spectroscopic model used to represent the absorption of light by O2 is enhanced via a more complete physical representation, improving measurement accuracy. Second, the error estimation and masking are developed using the bootstrapping technique. Third, we present a comparison of temperature profiles from our laboratory-based instrument with collocated radiosondes, evaluating the accuracy of our updated measurements. It is essential to clarify that the instrument described in this paper does not operate as a stand-alone TD profiler, as it is not capable of measuring water vapor (WV). Instead, we focus on demonstrating the perturbative retrieval technique with temperature profiles inferred using ancillary radiosonde WV profiles. Results from a full TD profiling instrument will be presented in a future publication. The laboratory-based LiDAR instrument was operated over a 6-month period between April 21, 2022, and September 22, 2022. During this time, we launched 40 radiosondes, providing reference data to validate the accuracy of the DIAL-based temperature profiles. The results indicate that DIAL-based temperature retrievals are within ±2.5 °C between 0.4 and 3 km (3.5 km) during daytime (nighttime) operation, using a 300-m range resolution and a 60-min time resolution.
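The idea of bootstrap-based error estimation and masking can be sketched generically as follows. This is an illustrative assumption, not the paper's processing chain: the function names, the sample values, the number of resamples and the 1.0 threshold are all hypothetical. Repeated estimates for a range bin are resampled with replacement, the spread of the resampled means serves as the error estimate, and bins whose error exceeds a threshold are masked.

```python
import random
import statistics


def bootstrap_std_error(samples, statistic=statistics.fmean, n_resamples=500, seed=0):
    """Estimate the standard error of `statistic` by resampling with replacement."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    estimates = []
    for _ in range(n_resamples):
        resample = [rng.choice(samples) for _ in samples]
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)


def mask_by_error(values, errors, max_error):
    """Replace values whose estimated error exceeds max_error with None."""
    return [v if e <= max_error else None for v, e in zip(values, errors)]


# Hypothetical repeated temperature estimates (K) for two range bins.
quiet_bin = [280.0, 280.1, 279.9, 280.05, 279.95]
noisy_bin = [270.0, 285.0, 262.0, 291.0, 277.0]

errors = [bootstrap_std_error(quiet_bin), bootstrap_std_error(noisy_bin)]
means = [statistics.fmean(quiet_bin), statistics.fmean(noisy_bin)]
masked = mask_by_error(means, errors, max_error=1.0)
print(masked)  # the noisy bin is masked out, the quiet bin is kept
```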
Current semantic change detection (SCD) methods face challenges in modeling temporal correlations (TCs) between bitemporal semantic features and difference features. These methods lead to inaccurate detection results, particularly for complex SCD scenarios. This paper presents a hierarchical semantic graph interaction network (HGINet) for SCD from high-resolution remote sensing images. This multitask neural network combines semantic segmentation and change detection tasks. For semantic segmentation, we construct a multilevel perceptual aggregation network with a pyramidal architecture. It extracts semantic features that discriminate between different categories at multiple levels. We model the correlations between bitemporal semantic features using a TC module that enhances the identification of unchanged areas. For change detection, we design a semantic difference interaction module based on a graph convolutional network. It measures the interactions among bitemporal semantic features, their corresponding difference features, and the combination of both. Extensive experiments on four datasets, namely SECOND, HRSCD, Fuzhou, and Xiamen, show that HGINet performs better in identifying changed areas and categories across various scenarios and regions than nine existing methods. Compared with the existing methods applied on the four datasets, it achieves the highest F1scd values of 59.48%, 64.12%, 64.45%, and 84.93%, and SeK values of 19.34%, 14.55%, 18.28%, and 51.12%, respectively. Moreover, HGINet mitigates the influence of fake changes caused by seasonal effects, producing results with well-delineated boundaries and shapes. Furthermore, HGINet trained on the Fuzhou dataset is successfully transferred to the Xiamen dataset, demonstrating its effectiveness and robustness in identifying changed areas and categories from high-resolution remote sensing images. The code of our paper is accessible at https://***/long123524/HGINet-torch.
ISBN: (print) 9798400711763
Building damage assessment is a critical subtask within GeoAI-driven remote sensing semantic segmentation, where deep neural networks have been widely applied. Most existing works use pre- and post-disaster images as input to a Siamese deep neural network under supervised learning, which requires a large amount of labeled data. However, in real-world scenarios, acquiring massive labeled datasets is often difficult, making fully supervised methods less practical. To overcome this, we propose a self-supervised pretraining framework based on pre-post remote sensing image pairs. In the first stage, a dual denoising autoencoder with a Vision Transformer backbone is proposed for image representation learning. In the second stage, two downstream tasks, building localization and building damage severity, are performed. Additionally, we incorporate an edge guidance module and an edge detection loss to further enhance performance in downstream tasks. On the xBD dataset, the largest building damage assessment dataset, the proposed method achieves an F1 score of 0.895 for building localization, outperforming state-of-the-art image segmentation techniques. It also achieves an F1 score of 0.704 for building damage severity, which is state-of-the-art among self-supervised learning methods.
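For readers unfamiliar with the metric, a pixel-wise F1 score for a binary building mask can be computed as in the sketch below. The function name and the toy masks are assumptions for illustration; the exact scoring protocol used on xBD is not described in the abstract and is not reproduced here.

```python
def f1_score(pred, truth):
    """F1 for binary labels: harmonic mean of precision and recall."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)      # building predicted, building present
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)  # building predicted, none present
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)  # building missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy flattened masks (1 = building pixel), hypothetical values.
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 1, 1, 0]
score = f1_score(pred, truth)
print(round(score, 3))  # 0.667
```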
Accurate traffic forecasting is one of the key applications within Internet of Things (IoT)-based intelligent transportation systems (ITS), playing a vital role in enhancing traffic quality, optimizing public transportation, and planning infrastructure. However, existing spatial-temporal methods encounter two primary limitations: 1) they have difficulty differentiating samples over time and often ignore dependencies among road network nodes at different time scales and 2) they are limited in capturing dynamic spatial correlations with predefined and adaptive graphs. To overcome these limitations, we introduce a novel temporal identity interaction dynamic graph convolutional network (TIIDGCN) for traffic forecasting. The central concept involves assigning temporal identity features to raw data and extracting distinctive, representative spatial-temporal features through multiscale interactive learning. Specifically, we design a multiscale interactive model incorporating both spatial and temporal components. This network aims to explore spatial-temporal patterns at various scales from macro to micro, facilitating their mutual enhancement through positive feedback mechanisms. For the spatial component, we design a new dynamic graph learning method to depict the changing dependencies among nodes. We conduct comprehensive experiments using four real-world traffic datasets (PeMS04/07/08 and NYCTaxi Drop-off/Pick-up). In particular, TIIDGCN achieves a 16.46% reduction in mean absolute error compared to the Spatial-Temporal Graph Attention Gated Recurrent Transformer Network model on the PeMS08 dataset.
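A relative reduction in mean absolute error (MAE) of the kind reported above is computed as sketched below. The traffic values and both sets of predictions are made-up toy numbers, not results from the paper, and the function names are assumptions.

```python
def mean_absolute_error(pred, truth):
    """Average absolute difference between predictions and observations."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


def relative_reduction(baseline, improved):
    """Percentage by which `improved` lowers the baseline error."""
    return 100.0 * (baseline - improved) / baseline


# Toy traffic flow observations and two sets of forecasts (hypothetical).
truth = [100, 120, 90, 110]
baseline_pred = [110, 110, 100, 100]
model_pred = [105, 115, 95, 105]

baseline_mae = mean_absolute_error(baseline_pred, truth)
model_mae = mean_absolute_error(model_pred, truth)
print(baseline_mae, model_mae)                      # 10.0 5.0
print(relative_reduction(baseline_mae, model_mae))  # 50.0
```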
We demonstrate thermodynamic profile estimation with data obtained using the MicroPulse DIAL such that the retrieval is entirely self-contained. The only external input is surface meteorological variables obtained from a weather station installed on the instrument. The estimator provides products of temperature, absolute humidity and backscatter ratio such that cross dependencies between the lidar data products and raw observations are accounted for and the final products are self-consistent. The method described here is applied to a combined oxygen DIAL, potassium HSRL, water vapor DIAL system operating at two pairs of wavelengths (nominally centered at 770 and 828 nm). We perform regularized maximum likelihood estimation through the Poisson Total Variation technique to suppress noise and improve the range of the observations. A comparison to 119 radiosondes indicates that this new processing method produces improved temperature retrievals, reducing total errors to less than 2 K below 3 km altitude and extending the maximum altitude of temperature retrievals to 5 km with less than 3 K error. The results of this work demonstrate the potential for measuring temperature through the oxygen DIAL technique and, furthermore, that this can be accomplished with low-power semiconductor-based lidar sensors.
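The general shape of a total-variation-regularized Poisson likelihood objective can be sketched as follows. This is a simplified illustration under stated assumptions: the candidate profile is treated directly as the expected photon count per range bin, whereas the actual estimator works through a lidar forward model that the abstract does not detail. The function names, counts and penalty weight are hypothetical.

```python
import math


def poisson_nll(rates, counts):
    """Poisson negative log-likelihood, dropping the constant log(k!) term."""
    return sum(r - k * math.log(r) for r, k in zip(rates, counts))


def total_variation(profile):
    """Sum of absolute differences between neighbouring range bins."""
    return sum(abs(b - a) for a, b in zip(profile, profile[1:]))


def ptv_objective(rates, counts, penalty):
    """Data fit plus a penalty that discourages bin-to-bin jumps."""
    return poisson_nll(rates, counts) + penalty * total_variation(rates)


# Hypothetical photon counts in five range bins.
counts = [9, 12, 8, 11, 10]
raw = [float(k) for k in counts]  # follows the noise exactly
flat = [10.0] * len(counts)       # smooth candidate profile

# Without regularization the noisy profile fits best; with it, the smooth one wins.
print(ptv_objective(raw, counts, 0.0) < ptv_objective(flat, counts, 0.0))  # True
print(ptv_objective(flat, counts, 1.0) < ptv_objective(raw, counts, 1.0))  # True
```

The penalty weight trades noise suppression against resolution; a full estimator would minimize this objective over candidate profiles rather than compare two fixed ones.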
Different modalities differ greatly in data distribution and feature representation, so retrieving data flexibly and accurately across modalities is a challenging problem. Mainstream common subspace methods focus only on the heterogeneity gap and use a unified method to jointly learn the common representation of different modalities, which makes a unified multi-modal fit difficult to achieve. First, we propose the concept of multi-modal information density discrepancy and a modality-specific adaptive scaling method incorporating prior knowledge, which can adaptively learn the most suitable network for different modalities. Second, for the problem of efficient semantic fusion and interference features, we propose a multi-level modal feature attention mechanism, which realizes the efficient fusion of text semantics through an attention mechanism and explicitly captures and shields the interference features at multiple scales. In addition, to address the bottleneck of the cross-modal retrieval task caused by the insufficient quality of the multimodal common subspace and the defects of the Transformer structure, we propose a cross-level interaction injection mechanism that fuses multi-level patch interactions without affecting the pre-trained model, in order to construct higher-quality latent representation spaces and multimodal common subspaces. Comprehensive experimental results on four widely used cross-modal retrieval datasets show that the proposed MASAN achieves state-of-the-art results and significantly outperforms other existing methods.
Authors: Guo, Wenzhong; Zhang, Kairui; Ke, Xiao
Fuzhou Univ, Coll Comp & Data Sci, Fujian Prov Key Lab Networking Comp & Intelligent, Fuzhou 350116, Peoples R China
Minist Educ, Key Lab Spatial Data Min & Informat Sharing, Fuzhou 350003, Peoples R China
While feature extraction employing pre-trained models proves effective and efficient for no-reference video tasks, it falls short of adequately accounting for the intricacies of the Human Visual System (HVS). In this study, we propose, for the first time, the Integration of spatio-temporal Visual Stimuli into Video Quality Assessment (IVS-VQA). Exploiting the heightened sensitivity of optic rod cells to edges and motion, along with the capability to track motion via conjugate gaze, our approach affords a distinctive perspective on video quality assessment. To capture significant changes at each timestamp, we incorporate edge information to enhance the feature extraction of the pre-trained model. To tackle pronounced motion across the timeline, we introduce an interactive temporal disparity query employing a dual-branch transformer architecture. This approach introduces feature biases and extracts comprehensive global attention, placing greater emphasis on non-continuous segments within the video. Additionally, we integrate low-level color texture information within the temporal domain to capture distortions at both higher and lower scales. Empirical results show that the proposed model attains state-of-the-art performance across all six benchmark databases, as well as on their weighted averages.
Simultaneous Localization and Mapping (SLAM) stands as one of the critical challenges in robot navigation. A SLAM system often consists of a front-end component for motion estimation and a back-end system for eliminating estimation drifts. Recent advancements suggest that data-driven methods are highly effective for front-end tasks, while geometry-based methods continue to be essential in the back-end processes. However, such a decoupled paradigm between the data-driven front-end and geometry-based back-end can lead to sub-optimal performance, consequently reducing the system's capabilities and generalization potential. To solve this problem, we propose a novel self-supervised imperative learning framework, named imperative SLAM (iSLAM), which fosters reciprocal correction between the front-end and back-end, thus enhancing performance without necessitating any external supervision. Specifically, we formulate the SLAM problem as a bilevel optimization so that the front-end and back-end are bidirectionally connected. As a result, the front-end model can learn global geometric knowledge obtained through pose graph optimization by back-propagating the residuals from the back-end component. We showcase the effectiveness of this new framework through an application of stereo-inertial SLAM. The experiments show that the iSLAM training strategy achieves an accuracy improvement of 22% on average over a baseline model. To the best of our knowledge, iSLAM is the first SLAM system showing that the front-end and back-end components can mutually correct each other in a self-supervised manner.
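The role of a geometry-based back-end in removing drift can be illustrated with a toy one-dimensional pose graph. This is not the iSLAM implementation and covers only the lower-level (back-end) optimization: the function names, the equal edge weights and the measurements are assumptions. According to the abstract, iSLAM additionally back-propagates the back-end residuals to train the front-end, which this sketch does not attempt.

```python
def optimize_chain(odometry, loop_closure):
    """Least-squares poses for a 1D chain x0..xn with one loop edge from x0 to xn.

    With equal weights on every edge, the minimizer spreads the loop residual
    evenly over the n odometry edges and the loop edge itself.
    """
    n = len(odometry)
    residual = loop_closure - sum(odometry)  # disagreement caused by drift
    correction = residual / (n + 1)
    poses = [0.0]                            # x0 is fixed as the origin
    for step in odometry:
        poses.append(poses[-1] + step + correction)
    return poses


def cost(poses, odometry, loop_closure):
    """Sum of squared edge residuals (odometry edges plus the loop edge)."""
    odo = sum((b - a - u) ** 2 for a, b, u in zip(poses, poses[1:], odometry))
    return odo + (poses[-1] - poses[0] - loop_closure) ** 2


# Hypothetical front-end steps that each overshoot by 0.1, and a loop measurement.
odometry = [1.1, 1.1, 1.1, 1.1]
loop_closure = 4.0

raw = [0.0]
for step in odometry:
    raw.append(raw[-1] + step)  # dead reckoning without correction

optimized = optimize_chain(odometry, loop_closure)
print(cost(raw, odometry, loop_closure) > cost(optimized, odometry, loop_closure))  # True
```

In this toy case the final pose moves from about 4.4 toward the loop measurement of 4.0, and the remaining residuals are the kind of signal a learned front-end could be trained against.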
Authors: Wang, An; Su, Hua
Fuzhou Univ, Coll Comp & Data Sci, Key Lab Spatial Data Min & Informat Sharing, Minist Educ, Fuzhou, Peoples R China
Fuzhou Univ, Acad Digital China, Key Lab Spatial Data Min & Informat Sharing, Minist Educ, Fuzhou, Peoples R China
Fuzhou Univ, Natl & Local Joint Engn Res Ctr Satellite Geospati, Fuzhou, Peoples R China
Univ Ghent, Dept Geog, Ghent, Belgium
Long time series of accurate subsurface temperature data in the global ocean are essential for ocean warming and climate change studies. The sparse in situ observations in the pre-Argo era hinder the reconstruction of long time series of observational data for the global ocean. This study proposes a novel Adaptive Spatio-TEmporal Neighbors with two-point differences (ASTEN) method for subsurface temperature reconstruction, which adaptively learns and adjusts spatio-temporal neighbors depending on the distribution of in situ observations to ensure robust gap-filling performance across four dimensions. By integrating geoscience domain knowledge and utilizing spatiotemporal autocorrelation, ASTEN simultaneously learns the spatial pattern and temporal variation of subsurface temperature, and significantly enhances the interpretability and accuracy of ocean temperature reconstructions over a long time series compared to DINCAE and DINEOF. The ASTEN-reconstructed temperature data for the upper 1000 m from 1960 to 2022 can effectively track the ocean warming process over more than six decades. This study demonstrates that the ASTEN method is well suited for subsurface temperature reconstruction and holds great potential for gap-filling of sparse ocean observations with high missing rates over a large scale. The new reconstruction of subsurface temperature can effectively reduce the uncertainty of subsurface ocean warming analysis.