Pedestrian trajectory prediction is a crucial and widely discussed topic in AI-driven traffic scenarios. It is constrained by two factors. First, pedestrians are not bound by the same traffic rules as vehicles. Second, the computational power of in-vehicle systems is limited. These constraints make traditional methods difficult to apply. Previous methods have also been observed to use redundant information, which can cause feature imbalance and model overfitting. In light of these limitations, we propose a lightweight conditional variational autoencoder model with post-process (L-CVAE-P) for pedestrian prediction scenarios. The L-CVAE-P focuses on the efficient interaction of multidimensional features to improve the model comprehensively for real-world use. The model is tested on two public datasets and achieves state-of-the-art performance while maintaining efficiency. The experimental results demonstrate that our work has developed and optimized a pedestrian trajectory prediction model for practical applications. (c) 2025 The Author(s). IEEJ Transactions on Electrical and Electronic Engineering published by Institute of Electrical Engineers of Japan and Wiley Periodicals LLC.
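For orientation, the generic CVAE training objective that models of this family usually build on can be written as below. This is the standard textbook form, not the L-CVAE-P loss, which the abstract does not give. The symbols are assumptions for illustration: x is the observed past trajectory, y the future trajectory, z the latent variable, and phi and theta the encoder and decoder parameters.

```latex
\mathcal{L}(\theta, \phi; x, y)
  = \mathbb{E}_{q_\phi(z \mid x, y)}\!\left[ \log p_\theta(y \mid x, z) \right]
  - D_{\mathrm{KL}}\!\left( q_\phi(z \mid x, y) \,\|\, p_\theta(z \mid x) \right)
```

The first term rewards accurate reconstruction of the future trajectory; the second keeps the approximate posterior close to the conditional prior, so that sampling z from the prior at test time yields plausible predictions.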
Predicting the trajectories of multiple agents in interactive driving scenarios such as intersections and roundabouts is challenging due to the high density of agents, varying speeds, and environmental obstacles. Existing approaches use relative distance and semantic maps of intersections to improve trajectory prediction. However, drivers base their driving decisions on the overall traffic state of the intersection and the surrounding vehicles. We therefore propose using traffic states, which denote the changing spatio-temporal interaction between neighboring vehicles, to improve trajectory prediction. An example of a traffic state is a clump state, which denotes that the vehicles are moving close to each other, i.e., congestion is forming. We develop three prediction models with different architectures, namely Transformer-based (TS-Transformer), Generative Adversarial Network-based (TS-GAN), and conditional variational autoencoder-based (TS-CVAE). We show that traffic state-based models consistently predict better future trajectories than the vanilla models. TS-Transformer produces state-of-the-art results on two challenging interactive trajectory prediction datasets, namely Eye-on-Traffic (EOT) and INTERACTION. Our qualitative analysis shows that traffic state-based models produce trajectories that align better with the ground truth.
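As a rough illustration of the clump state described above, the following Python sketch flags vehicles that have several close neighbors. The abstract does not define how traffic states are computed, so the function name `is_clump_state`, the `radius` threshold, and the `min_neighbors` count are all hypothetical choices made only for this example.

```python
import numpy as np

def is_clump_state(positions, radius=10.0, min_neighbors=2):
    """Flag each vehicle that has at least `min_neighbors` other vehicles
    within `radius` (same units as the positions).

    Hypothetical definition for illustration; not the paper's method.
    """
    positions = np.asarray(positions, dtype=float)
    # Pairwise Euclidean distances between all vehicles, shape (n, n).
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Count vehicles within the radius, excluding the vehicle itself.
    neighbor_counts = (dists < radius).sum(axis=1) - 1
    return neighbor_counts >= min_neighbors

# Three vehicles bunched together and one far away.
flags = is_clump_state([[0, 0], [1, 0], [0, 1], [100, 100]])
```

A per-vehicle flag like this could be appended to each agent's input features at every time step, which is one plausible way a prediction model might consume a traffic state.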
This paper presents a hybrid motion planning strategy that combines a deep generative network with a conventional motion planning method. Existing planning methods such as A* and Hybrid A* are widely used in path planning tasks because they can determine feasible paths even in complex environments; however, they have limitations in terms of efficiency. To overcome these limitations, a path planning algorithm based on a neural network, namely the neural Hybrid A*, is introduced. This paper proposes using a conditional variational autoencoder (CVAE) to guide the search algorithm, exploiting the ability of the CVAE to learn information about the planning space given information about the parking environment. An efficient expansion strategy is used, based on a distribution of feasible trajectories learned from demonstrations. The proposed method effectively learns representations of a given state and improves both computational time and the number of nodes expanded, two measures of algorithm performance.
Long sequence time-series forecasting (LSTF) problems, such as weather forecasting, stock market forecasting, and power resource management, are widespread in the real world. The LSTF problem requires a model with high prediction accuracy. Recent studies have shown that the transformer is the most promising model architecture for LSTF problems. The transformer has the property of permutation equivariance, which makes sequence position encoding an essential part of model training. Currently, continuous dynamics models constructed for position encoding using neural ordinary differential equations (neural ODEs) can model sequence position information well. However, we have found that neural ODEs have several limitations when applied to the LSTF problem, including the time cost problem, the baseline drift problem, and the information loss problem; thus, neural ODEs cannot be directly applied to the LSTF problem. To address this, we design a binary position encoding-based regularization model for long sequence time-series prediction, named Seformer, which has the following structure: 1) Binary position encoding mechanism, comprising intrablock and interblock position encoding. For intrablock position encoding, we design a simple ODE method by discretizing the continuous dynamics model, which reduces the time cost of computing neural ODEs while preserving their dynamic properties as far as possible. For interblock position encoding, a chunked recursive form is adopted to alleviate the baseline drift problem caused by eigenvalue explosion. 2) Information transfer regularization mechanism: by regularizing the model's intermediate hidden variables as well as the encoder-decoder connection variables, we reduce information loss during training while ensuring the smoothness of the position information. Extensive experimental results obtained
With the development of intelligent agents pursuing humanisation, artificial intelligence must consider emotion, the most basic spiritual need in human *** emotional dialogue systems usually use an external emotional dictionary to select appropriate emotional words to add to the response or concatenate emotional tags and semantic features in the decoding step to generate appropriate ***, selecting emotional words from a fixed emotional dictionary may result in loss of the diversity and consistency of the *** propose a semantic and emotion-based dual latent variable generation model (Dual-LVG) for dialogue systems, which is able to generate appropriate emotional responses without an emotional *** from previous work, the conditional variational autoencoder (CVAE) adopts the standard transformer ***, Dual-LVG regularises the CVAE latent space by introducing a dual latent space of semantics and *** content diversity and emotional accuracy of the generated responses are improved by learning emotion and semantic features ***, the average attention mechanism is adopted to better extract semantic features at the sequence level, and the semi-supervised attention mechanism is used in the decoding step to strengthen the fusion of emotional features of the *** results show that Dual-LVG can successfully achieve the effect of generating different content by controlling emotional factors.
Link prediction tasks differ in the degree of answer diversity they require. One task may expect only a couple of answers, while another may expect nearly a hundred. Consequently, the performance of a link prediction model can be estimated more accurately by evaluating a flexible number of returned answers rather than a predefined number. Motivated by this, in this article we analyze two evaluation criteria for link prediction tasks: the ranking-based protocol and the sampling-based protocol. Furthermore, we study two classes of link prediction models, direct models and latent-variable models, and demonstrate that latent-variable models perform better under the sampling-based protocol. We then propose a latent-variable model built on the conditional variational autoencoder (CVAE) framework. Experiments suggest that the proposed model performs comparably to the current state of the art even under the conventional ranking-based protocol. Under the sampling-based protocol, the proposed model outperforms various state-of-the-art models.
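To make the ranking-based protocol concrete, the sketch below computes two metrics commonly reported under it, mean reciprocal rank (MRR) and Hits@k. The abstract does not say which metrics the article uses, nor how the sampling-based protocol is defined, so this only illustrates the conventional ranking side; the function name `mrr_and_hits` is hypothetical.

```python
def mrr_and_hits(ranks, k=10):
    """Compute mean reciprocal rank and Hits@k.

    `ranks` holds the 1-based rank of the correct answer for each test query.
    """
    ranks = list(ranks)
    # Average of 1/rank over all queries.
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    # Fraction of queries whose correct answer is ranked within the top k.
    hits = sum(1 for r in ranks if r <= k) / len(ranks)
    return mrr, hits

mrr, hits_at_3 = mrr_and_hits([1, 2, 4], k=3)
```

Because both metrics fix a cutoff or score a single correct answer per query, they illustrate the "predefined number of answers" limitation that the abstract raises.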
The advent of automated vehicles (AVs) will provide opportunities for safer, smoother, and smarter road transportation. During the transition from the current human-driven vehicle (HV) environment to a fully AV traffic environment, there will be a mixed traffic flow including both HVs and AVs. The impact of introducing AVs into existing traffic, however, is not yet fully understood. In this paper, we advance this understanding by evaluating mixed traffic safety from the perspective of car-following behavior, using real-world AV operational data from mixed traffic. To understand how AVs affect other vehicles on the road, we analyze the operational behaviors of HV-following-HV, AV-following-HV, and HV-following-AV. A selected car-following model is calibrated, and the results show significant differences between HV-following-HV and the other two groups, indicating safe AV behavior and changes in HV behavior (i.e., less aggressive, safer) after the introduction of AVs into the traffic. Additionally, to understand AV behavioral safety, we investigate behavior predictions (one of the most critical inputs for AVs when making car-following decisions) of AVs and their surrounding vehicles using a mature baseline model and a new conditional variational autoencoder (CVAE) framework. The results show the potential risks of inaccurate predictions by the baseline model and the need to incorporate additional factors, such as vehicle interactions and driver behavior, into the prediction for risk mitigation. Arterial vehicle trajectory data from the Lyft Level 5 Dataset is used to test the proposed methodological framework and to understand the car-following safety risks of HVs and AVs in the mixed traffic stream.
ISBN (print): 9789887891819
This paper introduces a graph neural network-based model for predicting pedestrian trajectories in architectural spaces. Compared to traditional simulations built on physics-based models, this data-driven model has a stronger ability to learn and predict pedestrian behaviour patterns from real-world data. The model is pre-trained on the Hongqiao Railway Station Dataset, then trained and tested on the ETH Dataset and the Stanford Drone Dataset, enabling comparisons with other AI models. By creating a more intelligent model, we can establish a digital replica of the real world that predicts pedestrian flow with higher accuracy in daily life or in extreme situations such as sudden fires. Our results underscore the critical role of such models in understanding how architectural spaces are used, and thus in improving architectural design and urban planning.
Spherical images taken in all directions (360 degrees by 180 degrees) can represent an entire space including the subject, providing free-direction viewing and an immersive experience to viewers. Generating a spherical image from a few normal-field-of-view (NFOV) images, which are partial observations, is convenient and expands the usage scenarios. The primary challenge is generating a plausible image while controlling the high degree of freedom involved in generating a wide area that covers all directions. We focus on scene symmetry, a basic property of the global structure of spherical images that includes rotational and plane symmetries. We propose a method for generating a spherical image from a few NFOV images and controlling the generated regions using scene symmetry. We incorporate the intensity of the symmetry as a latent variable into conditional variational autoencoders to estimate the possible range of symmetry and decode a spherical image whose features are represented through a combination of symmetric transformations of the NFOV image features. Our experiments show that the proposed method can generate a variety of plausible spherical images, controlled from asymmetric to symmetric, and can reduce the reconstruction errors of the generated images based on the estimated symmetry.
Benefiting from the rapid evolution of artificial intelligence and wireless communication technology, diverse Internet of Things (IoT) devices with edge computing ability have penetrated every aspect of daily human life. However, the deviations among private datasets and the heterogeneity of local models, caused by differences in device composition and application scenarios, have hampered the aggregation of a global recognition model in the modulation classification task, severely constraining the classification performance of intelligent IoT-edge devices. To address this problem, we propose a heterogeneous Federated learning framework based on Bidirectional Knowledge Distillation (FedBKD) for IoT systems, which integrates knowledge distillation into the local model upload (client-to-cloud) and global model download (cloud-to-client) steps of federated learning. The client-to-cloud distillation is treated as a multi-teacher knowledge distillation process, in which the global network is a student network that unifies the heterogeneous knowledge from multiple local teacher networks. A public dataset is generated by a conditional variational autoencoder (CVAE) and stored on the cloud server, so that heterogeneous knowledge can be obtained without sharing the private data of IoT devices. The cloud-to-client distillation is a single-teacher, multiple-student process that distills knowledge from the single global model back to multiple heterogeneous local networks; partial knowledge distillation is used in this process. We implement FedBKD in the modulation classification task, and the simulation results demonstrate the effectiveness of the proposed method.
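The client-to-cloud step described above can be sketched as a multi-teacher distillation loss. The abstract does not specify how teacher outputs are combined or which loss is used, so this example assumes a plain average of temperature-softened teacher predictions and a KL divergence to the student; the function names and the `temperature` parameter are hypothetical.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_teacher_distill_loss(student_logits, teacher_logits_list, temperature=2.0):
    """Mean KL(teacher_avg || student) over a batch of public samples.

    Assumption for illustration: teachers are weighted equally.
    """
    # Average the softened predictions of all local (teacher) models.
    teacher_probs = np.mean(
        [softmax(t, temperature) for t in teacher_logits_list], axis=0
    )
    student_probs = softmax(student_logits, temperature)
    eps = 1e-12  # avoids log(0)
    kl = np.sum(
        teacher_probs * (np.log(teacher_probs + eps) - np.log(student_probs + eps)),
        axis=-1,
    )
    return float(kl.mean())

# One public sample, two local teachers, three modulation classes.
loss = multi_teacher_distill_loss(
    [[3.0, 2.0, 1.0]],
    [[[1.0, 2.0, 3.0]], [[1.0, 2.0, 3.0]]],
)
```

Because only logits on the shared public dataset are exchanged in this sketch, local models with different architectures can act as teachers without exposing their private data, which matches the motivation given in the abstract.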