Detecting surrounding situations and reacting accordingly to avoid collisions remains a challenging task for autonomous driving. This task requires predicting the trajectories of surrounding agents and assessing the potential risk of future situations, which can be difficult to achieve solely through onboard vehicle devices. Therefore, this paper proposes a cooperative architecture for trajectory prediction and risk assessment conducted on roadside devices (RSUs) to assist Connected and Autonomous Vehicles (CAVs). First, we develop a segment-based prediction model (SegNet) tailored to hub signalized intersections. Intersections are divided into multiple segments, and curvilinear coordinates are used to represent the geometric road features. The model leverages individual interaction cues in the ego segment and group features in the merging segments, while also incorporating traffic signal information to generate multimodal prediction results. For risk assessment, we use the prediction results to provide hierarchical assistance, such as risk values, risk maps, and reference trajectories. Offline experimental results demonstrate that our SegNet model achieves competitive and well-balanced performance compared to state-of-the-art methods on the CitySim Database, with more accurate and smoother predicted trajectories. In real-time CARLA and SUMO co-simulation, the assisted CAVs navigate safely and effectively with the support of the proposed architecture. IEEE
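The abstract above mentions curvilinear coordinates without giving details. As a rough illustration only, the sketch below projects a Cartesian point onto a polyline centerline to obtain an arc length `s` and a signed lateral offset `d`. The function name `to_curvilinear`, the polyline representation, and the sign convention are assumptions made for this example; they are not taken from the paper and do not describe SegNet's actual preprocessing.

```python
import math

def to_curvilinear(point, centerline):
    """Project an (x, y) point onto a polyline and return (s, d).

    s: arc length along the centerline to the closest point.
    d: signed lateral offset (positive to the left of the travel direction).
    Hypothetical helper for illustration; not the paper's implementation.
    """
    px, py = point
    best = None  # (squared distance, s, d) of the closest projection so far
    s_start = 0.0  # arc length accumulated before the current segment
    for (ax, ay), (bx, by) in zip(centerline, centerline[1:]):
        dx, dy = bx - ax, by - ay
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue  # skip degenerate (zero-length) segments
        # Fraction along the segment of the closest point, clamped to [0, 1].
        t = ((px - ax) * dx + (py - ay) * dy) / (seg_len ** 2)
        t = max(0.0, min(1.0, t))
        cx, cy = ax + t * dx, ay + t * dy
        dist_sq = (px - cx) ** 2 + (py - cy) ** 2
        # The 2D cross product of segment direction and offset gives the side.
        cross = dx * (py - ay) - dy * (px - ax)
        d = math.copysign(math.sqrt(dist_sq), cross)
        if best is None or dist_sq < best[0]:
            best = (dist_sq, s_start + t * seg_len, d)
        s_start += seg_len
    if best is None:
        raise ValueError("centerline needs at least one non-degenerate segment")
    return best[1], best[2]

# Toy centerline: 10 m east, then 10 m north (assumed data).
centerline = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
s1, d1 = to_curvilinear((4.0, 1.5), centerline)   # left of the first segment
s2, d2 = to_curvilinear((11.0, 5.0), centerline)  # right of the second segment
```

In such a frame, positions along differently shaped lanes become comparable, which is one plausible reason for using it at intersections.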
Video super-resolution is a pivotal task that involves the recovery of high-resolution video frames from their low-resolution counterparts, possessing a multitude of applications in real-world scenarios. Within the do...
ISBN (digital): 9798350394948
ISBN (print): 9798350394955
Cross-database facial expression recognition (CD-FER) has been widely studied due to its promising applicability in real-life situations, and generalization performance is the main concern in this task. To improve cross-database generalization, current works frequently resort to the masked autoencoder (MAE) to learn the expression representation in an unsupervised manner, and to the disentanglement of expression and domain features. (i) For MAE, current algorithms mainly employ random masking and leverage the reconstruction of the masked regions to enable networks to learn the expression representation. However, these masked regions are often expression-irrelevant and cannot reflect the characteristics of expression well, so they are not efficient enough for representation learning. To this end, we propose expression-aware masking in MAE to improve the learning efficiency of expression representation, by guiding MAE to mask out expression-aware regions during training. (ii) For the disentanglement of expression and domain features, current algorithms realize it mainly in the deep layers. However, the coupling of these features in the shallow layers is rarely considered, which may largely affect the disentanglement performance in deep layers. Thus, we propose a progressive decoupler that disentangles these features block by block, so that feature disentanglement in shallow layers facilitates that in deep layers. Extensive quantitative and qualitative results on multiple expression datasets show that our method largely outperforms the state of the art in terms of cross-database generalization performance.
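The abstract above contrasts random masking with masking of expression-aware regions but does not say how those regions are scored or sampled. The sketch below shows one simple way such a selection could look: given a per-patch relevance score, mask the highest-scoring patches. The function `expression_aware_mask`, the score values, and the deterministic top-k rule are all assumptions for illustration, not the paper's method.

```python
def expression_aware_mask(scores, mask_ratio=0.75):
    """Return a list of booleans; True marks a patch to mask out.

    scores: one hypothetical expression-relevance score per image patch
            (how the paper derives such scores is not stated in the abstract).
    mask_ratio: fraction of patches to mask, between 0 and 1.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    num_masked = int(len(scores) * mask_ratio)
    # Rank patch indices from most to least expression-relevant.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    masked = set(ranked[:num_masked])
    return [i in masked for i in range(len(scores))]

# Eight patches with made-up relevance scores (assumed data).
scores = [0.1, 0.9, 0.8, 0.2, 0.05, 0.7, 0.3, 0.0]
mask = expression_aware_mask(scores, mask_ratio=0.5)
```

A real pipeline would likely mix in some randomness so that the masked set varies between epochs; the deterministic rule here only makes the idea easy to inspect.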
The joint optimization of Neural Radiance Fields (NeRF) and camera trajectories has been widely applied in SLAM tasks due to its superior dense mapping quality and consistency. NeRF-based SLAM learns camera poses usin...
ISBN (digital): 9798350384574
ISBN (print): 9798350384581
The prospect of assistive robots aiding in object organization has always been compelling. In an image-goal setting, the robot rearranges the current scene to match a single image captured from the goal scene. The key to an image-goal rearrangement system is estimating the desired placement pose of each object based on the single goal image and observations from the current scene. To establish sufficient associations for accurate estimation, the system should observe an object from a viewpoint similar to that in the goal image. Existing image-goal rearrangement systems, due to their reliance on a fixed viewpoint for perception, often require redundant manipulations to randomly adjust an object’s pose for a better perspective. To address this inefficiency, we introduce a novel object rearrangement system that employs multi-view fusion. By observing the current scene from multiple viewpoints before manipulating objects, our approach can estimate a more accurate pose without redundant manipulations. A standard visual localization pipeline at the object level is developed to capitalize on the advantages of multi-view observations. Simulation results demonstrate that our system is more efficient than existing single-view systems. The effectiveness of our system is further validated in a physical experiment. For videos, please visit https: //***/view/multi-view-rearr.
Task-oriented grasping (TOG) is crucial for robots to accomplish manipulation tasks, requiring the determination of TOG positions and directions. Existing methods either rely on costly manual TOG annotations or only e...
In this paper, we present the design and development of a novel optical tactile sensor that uses a single-pixel color light-to-frequency converter (TCS3200) and spectral decoding to recognize presses at different posi...
While significant progress has been made in multi-modal learning driven by large-scale image-text datasets, there is still a noticeable gap in the availability of such datasets within the facial domain. To facilitate ...
Airway segmentation serves as an essential foundational process for both the diagnosis of lung conditions and the navigation of surgical interventions. Although numerous attempts have been proposed to address airway s...
In the specialized domain of brain tumor segmentation, supervised segmentation approaches are hindered by the limited availability of high-quality labeled data, a condition arising from data privacy concerns, signific...