Advances towards autonomous driving have increased the need for reference (ground truth) data for developing and validating various functionalities. Traditional data labelling methods are time consuming and skill intensive, and have many drawbacks. These challenges are addressed by ALiVA (automatic lidar, image & video annotator), a semi-automated framework that assists with event detection and the generation of reference data through annotation/labelling of video and point-cloud data. ALiVA is capable of processing large volumes of camera and lidar sensor data. The main pillars of the framework are object detection-classification models, object tracking algorithms, cognitive algorithms and annotation results review functionality. The automatic object detection functionality creates a precise bounding box around the area of interest and assigns class labels to annotated objects. The object tracking algorithms track detected objects across video frames, provide a unique object ID for each object and perform distance ranging. A unique feature of the cognitive algorithms is the elimination of non-realistic objects of interest that appear on billboards or in advertisements on buses/trucks. The framework also detects events such as overtaking scenarios or pedestrians/animals crossing the road. An annotation review functionality allows auto-annotated data to be assessed and corrected manually. The results can be saved in standard file formats such as txt, csv, JSON and open ASAM, ensuring compatibility across different systems. ALiVA replaces traditional annotation methods, thereby reducing the effort, the need for skilled resources and the time required to annotate large datasets. This eliminates human biases, manual errors and inconsistencies. ALiVA is validated for numerous customer requirements and offers a large amount and variety of data to quantify the benefits offered. Some of the distinguishing features are models and functionalities that are optimi
ISBN (print): 9781728190778
This paper presents a stereo object matching method that exploits both 2D contextual information from images and 3D object-level information. Unlike existing stereo matching methods that exclusively focus on the pixel-level correspondence between stereo images within a volumetric space (i.e., cost volume), we exploit this volumetric structure in a different manner. The cost volume explicitly encompasses 3D information along its disparity axis; therefore, it is a privileged structure that can encapsulate the 3D contextual information from objects. However, this is not straightforward, since the disparity values map the 3D metric space in a non-linear fashion. Thus, we present two novel strategies to handle 3D objectness in the cost volume space: selective sampling (RoISelect) and 2D-3D fusion (fusion-by-occupancy), which allow us to seamlessly incorporate 3D object-level information and achieve accurate depth performance near the object boundary regions. Our depth estimation achieves competitive performance on the KITTI dataset and the Virtual-KITTI 2.0 dataset.
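The non-linear relation between disparity and metric depth mentioned in this abstract can be illustrated with the standard rectified-stereo formula Z = f * B / d. The sketch below is not taken from the paper; the focal length and baseline are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth in metres for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical camera parameters, for illustration only.
    focal_px, baseline_m = 720.0, 0.5
    for d in (1, 2, 10, 11):
        z = disparity_to_depth(d, focal_px, baseline_m)
        print(f"disparity {d:2d} px -> depth {z:7.2f} m")
```

A one-pixel disparity step covers a very different depth range at small disparities (far objects) than at large disparities (near objects), which is why uniform sampling along the disparity axis does not sample the metric space uniformly.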
ISBN (digital): 9798350380323
ISBN (print): 9798350380330
As a crucial process in autonomous driving systems, path tracking control determines the safety and ride comfort of intelligent vehicles. However, the response delay of the steering actuator is commonly not considered in tracking control algorithms. The steering delay may directly impact the control accuracy, particularly when the target path curvature changes rapidly. This may lead to lower control precision and system oscillations, resulting in vehicle instability. Therefore, this paper proposes a Linear Quadratic Regulator (LQR) control algorithm that accounts for the steering delay characteristic and can avoid such problems. Firstly, the response delay characteristic of the steering actuator is modelled as a first-order delay model and the model parameter is identified. Secondly, the actuator model is integrated into the vehicle model, and the optimization objective function of the discrete LQR in the infinite time domain is constructed. Then the optimal control feedback is solved from the Riccati equation. Next, a control feedforward term is designed to eliminate the steady-state lateral error, realizing path tracking control that considers the steering delay characteristic. Finally, in a simulation environment, the control accuracy of the proposed steering-delay LQR method is compared with the original LQR and MPC algorithms under various test conditions to validate the effectiveness of the proposed approach.
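The idea of folding a first-order steering lag into the plant before solving a discrete LQR can be sketched as follows. This is a minimal illustration and not the paper's implementation: it assumes a simple kinematic lateral-error model (states: lateral error, heading error, actual steering angle), Euler discretization, hypothetical parameter values, no input saturation, and it omits the feedforward term described in the abstract. The helper names are made up for this example.

```python
def transpose(m):
    return [list(r) for r in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def build_model(v, wheelbase, tau, dt):
    """Euler-discretized error model augmented with the steering lag
    delta_dot = (u - delta) / tau, where u is the commanded angle."""
    A = [[1.0, v * dt, 0.0],
         [0.0, 1.0, v * dt / wheelbase],
         [0.0, 0.0, 1.0 - dt / tau]]
    B = [[0.0], [0.0], [dt / tau]]
    return A, B


def lqr_gain(A, B, Q, R, iterations=2000):
    """Iterate the discrete Riccati recursion for a single-input system
    and return the feedback gain K (1 x n) for u = -K x."""
    n = len(A)
    At, Bt = transpose(A), transpose(B)
    P = [row[:] for row in Q]
    K = [[0.0] * n]
    for _ in range(iterations):
        BtP = matmul(Bt, P)                       # 1 x n
        s = R + matmul(BtP, B)[0][0]              # scalar R + B'PB
        K = [[g / s for g in matmul(BtP, A)[0]]]  # (R + B'PB)^-1 B'PA
        AtP = matmul(At, P)
        AtPA = matmul(AtP, A)
        corr = matmul(matmul(AtP, B), K)          # A'PB K
        P = [[Q[i][j] + AtPA[i][j] - corr[i][j] for j in range(n)]
             for i in range(n)]
    return K


def simulate(A, B, K, x0, steps):
    """Run the closed loop x_{k+1} = A x_k + B u_k with u_k = -K x_k."""
    x = list(x0)
    for _ in range(steps):
        u = -sum(k * xi for k, xi in zip(K[0], x))
        x = [sum(a * xi for a, xi in zip(row, x)) + b[0] * u
             for row, b in zip(A, B)]
    return x


if __name__ == "__main__":
    # Hypothetical values: 10 m/s, 2.7 m wheelbase, 0.2 s lag, 20 ms step.
    A, B = build_model(v=10.0, wheelbase=2.7, tau=0.2, dt=0.02)
    Q = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
    K = lqr_gain(A, B, Q, R=1.0)
    print("gain:", K[0])
    print("state after 20 s:", simulate(A, B, K, [1.0, 0.0, 0.0], 1000))
```

Because the actual steering angle is a state, the resulting gain has a third entry that acts on the lag itself, which a delay-free LQR design would not have.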
ISBN (digital): 9798331541699
ISBN (print): 9798331541705
As an important assembly part of the automobile, the steering column module is mainly used to control the turn signal and the wiper switch. Therefore, it is necessary to check the integrity of its printed beacon pattern. However, the image quality of traditional image-processing vision systems is easily affected by the field environment, and because the images contain many reflective areas, image segmentation suffers from low precision, low adaptability and a high error rate. This paper therefore uses the K-Net semantic segmentation network to automatically remove image reflections. By using a Swin-T backbone network and introducing a self-attention mechanism, the segmentation performance on the beacon is further improved, and anti-reflective binary segmentation of the beacon is realized. Finally, the superiority of the model is verified on a production-line dataset of an enterprise's steering column products. The experimental results show that: (1) the K-Net network segments well, with an mIoU of 95.77%; (2) with the Swin-T backbone network, the mIoU of the K-Net model increases by 1.1% compared with the traditional CNN backbone network.
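For readers unfamiliar with the mIoU metric reported here, the following sketch shows how mean intersection-over-union is commonly computed from per-pixel class labels. It is a generic illustration on flattened label lists, not the authors' evaluation code; the function name and the tiny example masks are made up.

```python
def mean_iou(pred, target, num_classes):
    """Average the per-class intersection-over-union over every class
    that appears in the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class is present in pred or target")
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Binary example: 0 = background, 1 = beacon (flattened 2x2 masks).
    pred = [0, 0, 1, 1]
    target = [0, 1, 1, 1]
    print(mean_iou(pred, target, num_classes=2))
```

In the binary example, the background IoU is 1/2 and the beacon IoU is 2/3, so the mean is 7/12.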
ISBN (digital): 9798331531225
ISBN (print): 9798331531232
With the increasing complexity of the mission environment, the cooperative execution of tasks by multiple UAVs has become an inevitable trend. This paper focuses on the automatic allocation of reconnaissance tasks for multiple UAVs and proposes an optimized discrete particle swarm algorithm. Firstly, the process of collaborative task instruction requests is optimized to reduce the response time. Then, an automated task allocation model with various constraints and performance functions is constructed. Subsequently, the model is solved based on the optimized discrete particle swarm algorithm. Through comparative experiments with other common methods, the results show that this algorithm performs well in terms of task allocation efficiency, the optimal value of the objective function, and execution time. It effectively improves the automatic task allocation ability of multiple UAVs in reconnaissance tasks and provides a strong guarantee for their cooperative operations.
Estimating the 6D pose presents a formidable challenge due to the intricate nature of real-world objects and the many issues encountered when acquiring data from real-world scenes. This task is compounded by occlusions, fluctuating lighting conditions, and data noise. Conventional CNN or Transformer architectures primarily focus on either local or global information, and relying solely on RGB or depth maps proves inadequate for extracting the most robust features. To address these challenges, we introduce Hybird6D, a dual-stream fusion deep learning approach that leverages a Transformer-CNN structure and an attention mechanism to regress the 6D pose of objects using RGB-D images as input. Our methodology capitalizes on the strengths of both Transformers and CNNs, incorporating global attention and local attention, while simultaneously harnessing both appearance and geometric information. This comprehensive approach enhances robustness to noise and occlusions. Experimental evaluation on the YCB-Video dataset demonstrates that our method outperforms existing Transformer-based approaches and surpasses advanced CNN-based methods. Moreover, our approach exhibits superior performance on symmetric objects, textureless objects, and small objects.
In smart healthcare, binary classification is one of the most important tasks for disease or clinical outcome prediction. Machine learning (ML) methods have great potential to discover knowledge from data. While there...
ISBN (digital): 9798350380323
ISBN (print): 9798350380330
The health state of RV reducers plays a significant role in reliability and productivity of industrial robots. Due to the complicated structure and nonlinear dynamic behavior, it is difficult to construct a precise dynamic model of RV reducers. Aiming at the wear degradation monitoring of RV reducers, this paper develops a self-data-driven modeling method based on only proprioceptive sensor signals of robotic joints, including the torque, position and velocity. The baseline model of the observation trajectories is constructed by statistically analyzing the monitoring data under the healthy stage. A nonparametric modeling technique, i.e., functional principal component analysis (FPCA), is used to generate the principal components of the model automatically driven by the monitoring data. With the wear degradation of RV reducers, the observation trajectories will deviate far from the baseline model. Based on this phenomenon, the real-time observation trajectories are reconstructed on the principal components. The trajectory reconstruction error (TRE) signal is calculated to visualize the wear degradation of RV reducers. Through an experimental demonstration in a single-degree-of-freedom (SDOF) robot manipulator, it is found that the proposed method is more effective than commonly used momentum-based residual signals in the wear degradation monitoring of RV reducers.
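The baseline-and-reconstruction idea in this abstract can be sketched with ordinary PCA on uniformly sampled trajectories, which is a common discrete stand-in for FPCA. This is an assumption-laden illustration rather than the paper's method: the synthetic sine/cosine "trajectories", the function names and the number of components are all hypothetical, and NumPy is assumed to be installed.

```python
import numpy as np


def fit_baseline(healthy, n_components):
    """Return the mean trajectory and the leading principal components
    (rows) of the healthy-stage trajectories (one trajectory per row)."""
    mean = healthy.mean(axis=0)
    _, _, vt = np.linalg.svd(healthy - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_error(trajectory, mean, components):
    """Norm of the part of a trajectory that the baseline components
    cannot reconstruct."""
    centered = trajectory - mean
    recon = components.T @ (components @ centered)
    return float(np.linalg.norm(centered - recon))


if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 50)
    s, c = np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)
    # Synthetic healthy data: small variations in amplitude and phase.
    healthy = np.array([a * s + b * c
                        for a in (0.8, 1.0, 1.2)
                        for b in (-0.2, 0.0, 0.2)])
    mean, comps = fit_baseline(healthy, n_components=2)
    normal = 1.1 * s + 0.1 * c
    worn = normal + 0.5 * np.sin(6 * np.pi * t)  # extra harmonic as "wear"
    print(reconstruction_error(normal, mean, comps))
    print(reconstruction_error(worn, mean, comps))
```

A trajectory that stays within the healthy variation is reconstructed almost exactly, while one with a component outside the baseline subspace leaves a large residual, which is the signal tracked over time.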
ISBN (print): 9781728190778
Prior work on generating explanations in a planning context has focused on providing the rationale behind an AI agent's decision-making. While these methods offer the right explanations, they fail to heed the cognitive requirement of understanding an explanation from the perspective of the explainee (the human). In this work, we set out to address this issue by considering the order in which information in an explanation is communicated, or the progressiveness of making explanations. Progression is the notion of building complex concepts on simpler ones, which is known to benefit learning. We investigate a similar effect when an explanation is composed of multiple parts that are communicated sequentially. The challenge here lies in determining the order for receiving different parts of an explanation that would assist in understanding. Given the sequential nature, a formulation based on a goal-based MDP is presented. The reward function of this MDP is learned via inverse reinforcement learning based on training data. We evaluated our approach in an escape-room domain to demonstrate its effectiveness. Analysis of the results revealed that the desired order arises strongly from both domain-dependent and domain-independent features. This result confirmed our expectation that the process of understanding an explanation for planning tasks is progressive and context dependent. We also showed that the explanations generated using the learned rewards achieved better task performance and simultaneously reduced cognitive load. These results shed light on designing explainable robots across various domains.
ISBN (digital): 9798350384574
ISBN (print): 9798350384581
Existing nighttime unmanned aerial vehicle (UAV) trackers follow an "Enhance-then-Track" architecture - first using a light enhancer to brighten the nighttime video, then employing a daytime tracker to locate the object. This separate enhancement and tracking fails to build an end-to-end trainable vision system. To address this, we propose a novel architecture called Darkness Clue-Prompted Tracking (DCPT) that achieves robust UAV tracking at night by efficiently learning to generate darkness clue prompts. Without a separate enhancer, DCPT directly encodes anti-dark capabilities into prompts using a darkness clue prompter (DCP). Specifically, DCP iteratively learns emphasizing and undermining projections for darkness clues. It then injects these learned visual prompts into a daytime tracker with fixed parameters across transformer layers. Moreover, a gated feature aggregation mechanism enables adaptive fusion between prompts and between prompts and the base model. Extensive experiments show state-of-the-art performance for DCPT on multiple dark scenario benchmarks. The unified end-to-end learning of enhancement and tracking in DCPT enables a more trainable system. The darkness clue prompting efficiently injects anti-dark knowledge without extra modules. Code is available at https://***/bearyi26/DCPT.