In autonomous driving, the LiDAR sensor is essential for capturing 3D shapes and visualizing the environment around the vehicle. These vehicles also carry camera systems calibrated against the LiDAR sensors; the camera systems comprise multiple cameras with overlapping fields of view that together cover 360°. In this research, we focus on 3D object detection and semantic segmentation frameworks that adopt sensor-fusion technologies, specifically camera-LiDAR and LiDAR-only frameworks, and we discuss how the camera view can aid the LiDAR point cloud. An early-fusion approach, fusing the point cloud and pseudo-LiDAR in an early step before the object detection network, is experimented with in the implementation. The depth map is generated from a single camera view; the depth map from the front-view camera image, which carries forward-direction information, is adopted in this experiment, while merging multi-camera images and merging depth maps are discussed as well. Pseudo-LiDAR generated from a depth map suffers distortion due to an unknown depth shift. To overcome this problem and obtain an exact metric reconstruction, some possible solutions are offered. One possible solution is for the pseudo-LiDAR to use the real 3D point cloud to learn this unknown depth shift. To align the two 3D point clouds, one original and one virtual, into a common base, the ground is extracted from both point clouds. Pandaset, which provides 360° and forward-facing LiDAR point clouds together with multi-view images, is exploited as the autonomous driving dataset.
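The pseudo-LiDAR step described above can be sketched as a pinhole back-projection of the depth map into a point cloud. The intrinsics below (`fx`, `fy`, `cx`, `cy`) are illustrative placeholders, not Pandaset's actual calibration:

```python
import numpy as np

# Hypothetical pinhole intrinsics; a real setup would load the
# calibrated per-camera parameters from the dataset.
fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0

def depth_to_pseudo_lidar(depth):
    """Back-project a dense depth map (H, W) into an (N, 3) point cloud
    in the camera frame: x = (u - cx) * z / fx, y = (v - cy) * z / fy."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth.ravel()
    x = (u.ravel() - cx) * z / fx
    y = (v.ravel() - cy) * z / fy
    pts = np.stack([x, y, z], axis=1)
    return pts[z > 0]            # drop pixels without a depth estimate

depth = np.zeros((480, 640))
depth[240, 320] = 10.0           # one point on the optical axis, 10 m ahead
print(depth_to_pseudo_lidar(depth))   # → [[ 0.  0. 10.]]
```

Any global depth shift in the estimated map propagates directly to the `z` coordinates here, which is why the abstract's ground-plane alignment against the real LiDAR cloud is needed for metric reconstruction.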
The 3D point cloud, a common format of 3D data, finds extensive applications in fields such as remote sensing, surveying, robotics, and more. Addressing the challenges posed by insufficient weighted feature information and the offset between weighted results and task expectations in the attention mechanism, this paper proposes a Multi-scale Learnable Key-channel Attention Network (MLKNet). First, we introduce a feature feedback-repair module to mitigate the impact of information loss in the feature embedding process. This module aims to fully embed the original data into a high-dimensional feature space, ensuring a rich supply of feature information for subsequent transformer modules. Second, an efficient hierarchical local feature encoder extracts and aggregates local features from point clouds at various scales, thereby significantly enhancing the model's capability to represent geometric structures. Third, a novel learnable key-channel attention module allows tasks to influence the feature selection and weighting process directly, making the highlighted features as close to task expectations as possible and effectively enhancing the network's perception of global semantic information. Our method was benchmarked on various tasks, achieving an overall accuracy (OA) of 92.3% on the ModelNet40 classification task and an instance mean intersection over union (ins. mIoU) of 87.6% on the ShapeNet-part segmentation task. The results indicate the superior performance of our method.
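The core idea of channel attention with learnable key scores can be illustrated with a minimal sketch. This is a simplification for intuition only, not the MLKNet module itself: one learnable score per channel is softmax-normalized and used to reweight the feature channels.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def key_channel_attention(features, key_scores):
    """Toy channel attention: `key_scores` (one learnable scalar per
    channel in a real network, trained end-to-end so the task shapes the
    weighting) is softmax-normalized and reweights the channels."""
    w = softmax(key_scores)          # (C,) attention weights summing to 1
    return features * w              # (N, C), broadcast over all points

feats = np.ones((4, 3))              # 4 points, 3 feature channels
scores = np.array([0.0, 0.0, 0.0])   # uniform scores -> uniform weights
out = key_channel_attention(feats, scores)
print(out[0])                        # → [0.33333333 0.33333333 0.33333333]
```

Because the scores are parameters of the network, the task loss can push them toward channels that matter for classification or segmentation, which is the "tasks influence the weighting directly" property the abstract describes.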
LiDAR odometry estimation and 3D semantic segmentation are crucial for autonomous driving and have achieved remarkable advances recently. However, these tasks are challenging due to the imbalance of points across semantic categories for 3D semantic segmentation and the influence of dynamic objects for LiDAR odometry estimation, which increases the importance of using representative/salient landmarks as reference points for robust feature learning. To address these challenges, we propose a saliency-guided approach that leverages attention information to improve the performance of LiDAR odometry estimation and semantic segmentation models. Unlike in the image domain, only a few studies have addressed point cloud saliency information due to the lack of annotated training data. To alleviate this, we first present a universal framework to transfer saliency distribution knowledge from color images to point clouds, and use it to construct a pseudo-saliency dataset (i.e., FordSaliency) for point clouds. Then, we adopt point cloud based backbones to learn the saliency distribution from pseudo-saliency labels, followed by our proposed SalLiDAR module. SalLiDAR is a saliency-guided 3D semantic segmentation model that integrates saliency information to improve segmentation performance. Finally, we introduce SalLONet, a self-supervised saliency-guided LiDAR odometry network that uses the semantic and saliency predictions of SalLiDAR to achieve better odometry estimation. Our extensive experiments on benchmark datasets demonstrate that the proposed SalLiDAR and SalLONet models achieve state-of-the-art performance against existing methods, highlighting the effectiveness of image-to-LiDAR saliency knowledge transfer. Source code will be available at https://***/nevrez/SalLONet
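One simple way saliency predictions can guide downstream feature learning, sketched here as an assumption rather than the actual SalLiDAR design, is to use the per-point saliency as attention weights when pooling point features into a global descriptor:

```python
import numpy as np

def saliency_weighted_pool(point_feats, saliency):
    """Illustrative sketch (not the SalLiDAR architecture): normalize the
    predicted per-point saliency to sum to one and use it as attention
    weights when pooling point features into a single descriptor."""
    w = saliency / saliency.sum()
    return (point_feats * w[:, None]).sum(axis=0)

feats = np.array([[1.0, 0.0],
                  [0.0, 1.0]])
sal = np.array([3.0, 1.0])                  # first point is more salient
print(saliency_weighted_pool(feats, sal))   # → [0.75 0.25]
```

Weighting by saliency downplays points on dynamic objects or rare categories with low predicted saliency, which matches the motivation of using salient landmarks as stable reference points.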
Unmanned aerial vehicles (UAVs), in conjunction with computer vision techniques, have shown great potential for bridge inspections. Close-range images captured in proximity to the structural surface are generally required to detect damage, and they also need to be linked to the corresponding structural component to enable assessment of the health of the global structure. However, the lack of contextual information makes automated identification of bridge components in close-range images challenging. This study proposes a framework for automated bridge component recognition based on close-range images collected by UAVs. First, a 3D point cloud is generated from the UAV survey of the bridge and segmented into bridge components. The segmented point cloud is subsequently projected onto the camera coordinates to assign each image to its bridge component. The proposed approach is successfully validated using a local highway bridge, pointing the way for improved inspection of full-scale bridges.
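The projection step above can be sketched as follows: labeled 3D points in the camera frame are projected through the intrinsic matrix, and the image is assigned the component label visible in the most points. The intrinsic matrix `K` is a placeholder; a real survey would use the calibrated camera parameters per image.

```python
import numpy as np

# Hypothetical pinhole intrinsics, for illustration only.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project_labels(points_cam, labels, img_shape=(480, 640)):
    """Project labeled 3D points (camera frame) to pixel coordinates and
    return the component label with the most points inside the image,
    i.e. the category assigned to this close-range image."""
    uvw = points_cam @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]
    h, w = img_shape
    inside = ((uv[:, 0] >= 0) & (uv[:, 0] < w) &
              (uv[:, 1] >= 0) & (uv[:, 1] < h) &
              (points_cam[:, 2] > 0))       # keep points in front of camera
    vals, counts = np.unique(labels[inside], return_counts=True)
    return vals[counts.argmax()]

pts = np.array([[0.0, 0.0, 10.0],          # in front of the camera
                [0.1, 0.0, 10.0],
                [0.0, 0.0, -5.0]])         # behind the camera, ignored
labels = np.array(["girder", "girder", "pier"])
print(project_labels(pts, labels))         # → girder
```

In practice an extrinsic transform from the survey's world frame to each camera pose precedes this projection, and occlusion reasoning (e.g. z-buffering) may be needed when components overlap in view.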
Falls from scaffolds cause the majority of accidents and fatalities at construction sites. A deep learning-based 3D reconstruction technology could provide a solution to prevent such fatalities through automated scaffold monitoring. However, when the technology was used at a large-scale construction site, there were limitations such as the scarcity of point cloud data and the non-uniformity of points. To address this issue, this paper presents a large-scale scaffold reconstruction method using synthetic scaffold datasets and an upsampling adversarial network. The method consists of four steps: 1) data acquisition of scaffold point clouds through a mobile laser scanning (MLS) system, 2) 3D semantic segmentation using synthetic datasets, 3) upsampling of the segmented scaffold points, and 4) automatic generation of a 3D CAD model. The segmentation model trained with synthetic datasets achieved an 80.83% F1 score, which improved to 94.93% after upsampling.
Interior construction makes up a large portion of project budget and time and is more prone to schedule delays. Most research efforts on progress management focus on the exterior environment, while few address interior construction. Although progress monitoring methods based on laser point clouds and computer vision have been investigated before, the costly acquisition and creation of point clouds and images remain open problems that impede the study of progress evaluation, particularly in interior construction environments where clutter and occlusions are universal. This paper introduces a method based on 360° panoramic images and deep learning for fast end-to-end interior progress evaluation in room units. The method takes only one or two 360° panoramic images as input, estimates key corners, generates and registers room layouts, and semantically segments a sparse point cloud. With the extracted corners and segmentation results, the progress states of interior trades can be evaluated. The experimental results show that the proposed method achieves performance comparable to results on public data sets, with a 3D Intersection over Union (3D IoU) of 83.69% vs. 84.23%, a Corner Error (CE) of 0.4% vs. 0.69%, and a mean class Intersection over Union (mIoU) of 70.28% vs. 53.5%. A case study of an interior decoration project of a hotel demonstrates the feasibility and practical capabilities of the proposed method.
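The 3D IoU metric reported above can be computed, for the simple case of axis-aligned boxes (a common simplification for room layouts; the paper's exact evaluation protocol may differ), as intersection volume over union volume:

```python
def iou_3d(a, b):
    """3D Intersection over Union for two axis-aligned boxes given as
    (xmin, ymin, zmin, xmax, ymax, zmax)."""
    inter = 1.0
    for i in range(3):                       # overlap along x, y, z
        lo = max(a[i], b[i])
        hi = min(a[i + 3], b[i + 3])
        if hi <= lo:
            return 0.0                       # no overlap on this axis
        inter *= hi - lo
    vol = lambda box: ((box[3] - box[0]) *
                       (box[4] - box[1]) *
                       (box[5] - box[2]))
    return inter / (vol(a) + vol(b) - inter)

# Two unit cubes overlapping over half of the x axis:
print(iou_3d((0, 0, 0, 1, 1, 1), (0.5, 0, 0, 1.5, 1, 1)))  # → 0.3333333333333333
```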
Although a scaffold is an essential structure in the construction industry, it may also be a dangerous factor that causes fatalities. However, the process of monitoring the scaffold is labor-intensive because it relies on the subjective observation of safety managers. To address this issue, we propose an automatic scaffold 3D reconstruction method using 3D point cloud data acquired by a robot dog. The method consists of three steps: 1) data acquisition of scaffold point clouds through a robot dog scanning system, 2) deep learning-based 3D semantic segmentation, and 3) automatic formation of a 3D CAD model. We created 15 robot dog datasets for training the segmentation model. The proposed method was tested at a different site, where a scaffold with a representative structure was attached to the wall, and demonstrated excellent point cloud segmentation performance with a 90.84% F1 score.
ISBN (digital): 9781728147321
ISBN (print): 9781728147338
We propose a novel deep architecture for semantic labeling of 3D point clouds, referred to as the Global and Local Streams Network (GLSNet), which is designed to capture both global and local structures and contextual information for large-scale 3D point cloud classification. GLSNet tackles a hard problem, the large differences of object sizes in large-scale point cloud segmentation, including extremely large objects like water and small objects like buildings and trees. We design a two-branch deep network architecture that decomposes this complex problem into separate processing problems at global and local scales and then fuses their predictions. GLSNet combines the strength of the Submanifold Sparse Convolutional Network [1] for learning global structure with the strength of PointNet++ [2] for incorporating local information. The first branch of GLSNet processes the full point cloud in the global stream, capturing long-range information about the geometric structure with a U-Net-structured Submanifold Sparse Convolutional Network (SSCN-U) architecture. The second branch processes the point cloud in the local stream: it partitions the 3D points into slices and processes one slice at a time with the PointNet++ architecture. The two streams of information are fused by max pooling over their classification prediction vectors. Our results on the IEEE GRSS Data Fusion Contest Urban Semantic 3D, Track 4 (DFT4) [3] [4] [5] point cloud classification dataset show that GLSNet achieved performance gains of almost 4% in mIoU and 1% in overall accuracy over the individual streams on the held-back testing dataset.
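The fusion step named above, max pooling over the two streams' classification prediction vectors, can be sketched in a few lines (a minimal illustration, not the full GLSNet pipeline):

```python
import numpy as np

def fuse_predictions(global_logits, local_logits):
    """GLSNet-style late fusion: take the element-wise maximum of the two
    streams' per-point class prediction vectors, then argmax per point."""
    fused = np.maximum(global_logits, local_logits)
    return fused.argmax(axis=1)

# One point, three classes; each stream favors a different class.
g = np.array([[0.9, 0.1, 0.0]])      # global stream: confident in class 0
l = np.array([[0.2, 0.8, 0.3]])      # local stream: favors class 1
print(fuse_predictions(g, l))        # → [0]
```

Element-wise max lets whichever stream is more confident about a class dominate that class's score, so the global stream can win on large objects while the local stream wins on fine structures.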