ISBN (print): 9781665477291
Stereo image inputs provide higher object detection accuracy than monocular images because an object that is missed in one view can still be detected in the other. To use the additional information in the secondary image, the corresponding region must be found in the image from the other view. This is done by projecting the region with the depth information of the target object. However, most existing studies use highly complex computations to estimate depth for simple 2D object detection. This complexity limits the potential for deploying such methods on resource-constrained platforms such as unmanned aerial vehicles. In this paper, we introduce a simplified depth approximation that obtains depth information by quantizing depth values into a small number of representative values. With these values, the regions of interest are projected onto the secondary image so that information from the additional image can be concatenated. We validate our method on the KITTI dataset. Our method has very low complexity. Compared with monocular image input, it greatly improves object detection performance in two of the three difficulty groups of the dataset and achieves comparable performance in the remaining group.
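The sketch below shows the projection step described above in Python. It assumes a rectified stereo pair, in which a point at depth Z shifts horizontally by the disparity d = f·B/Z. The focal length, baseline and set of representative depths are hypothetical, roughly KITTI-like placeholders, not the paper's values. The subsequent feature concatenation is not shown.

```python
from dataclasses import dataclass

# Hypothetical, roughly KITTI-like calibration (not taken from the paper).
FOCAL_PX = 721.5     # focal length in pixels
BASELINE_M = 0.54    # stereo baseline in metres
# Hypothetical representative depths (metres) that every estimate is snapped to.
REPRESENTATIVE_DEPTHS = (5.0, 10.0, 20.0, 40.0, 80.0)


@dataclass(frozen=True)
class Box:
    x1: float
    y1: float
    x2: float
    y2: float


def quantize_depth(depth_m: float) -> float:
    """Snap a depth estimate to the nearest representative value."""
    return min(REPRESENTATIVE_DEPTHS, key=lambda z: abs(z - depth_m))


def project_to_right(box: Box, depth_m: float) -> Box:
    """Shift a left-image ROI into the right image using its quantized depth.

    In a rectified pair, a point at depth Z appears d = f * B / Z pixels
    further left in the right image, while its row (y) is unchanged.
    """
    disparity = FOCAL_PX * BASELINE_M / quantize_depth(depth_m)
    return Box(box.x1 - disparity, box.y1, box.x2 - disparity, box.y2)


if __name__ == "__main__":
    roi = Box(400.0, 150.0, 480.0, 220.0)
    print(project_to_right(roi, 18.3))  # depth snapped to 20 m before projecting
```

Only a handful of depth values are used, so the disparity for each bin could be precomputed. Projecting an ROI then costs one lookup and one subtraction.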
ISBN (print): 9781728192017
Joint 2D object detection and 3D reconstruction is an essential computer vision task for obtaining more accurate detections and representation models of the target object. We propose a novel joint 2D object detection and 3D reconstruction model, called Adversarial Fusion Mesh Region Convolutional Neural Networks (AFM R-CNN), that enhances both 2D object detection and 3D reconstruction. Our model introduces a Deep Convolutional Generative Adversarial Network (DCGAN) to generate adversarial images. Both the real and adversarial images are fed into the object detection module GA-RPN to determine the position and anchor box of the target object. Next, to make better use of the two-dimensional information in the image, the voxel conversion and fusion model Pix2Vox is introduced to fuse the two types of image features and generate coarse voxels. Afterwards, to differentiate the voxel information more efficiently, we use the Principal Neighborhood Aggregation (PNA) network in 3D model refinement. We compare AFM R-CNN with baseline models on the open-domain Pix3D dataset. The results demonstrate its effectiveness in the joint 2D object detection and 3D reconstruction task.
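The following sketch shows the core neighborhood aggregation of PNA for a single node. Four aggregators (mean, min, max, standard deviation) are combined with three degree scalers (identity, amplification, attenuation). The symbol δ is the average of log(d+1) over node degrees seen in training. This is a generic NumPy illustration based on the original PNA formulation, not AFM R-CNN's refinement module. It omits how voxels become a graph, the message function and the learned projection that follows, and the helper names are hypothetical.

```python
import numpy as np


def average_log_degree(train_degrees) -> float:
    """delta in PNA: mean of log(d + 1) over node degrees seen in training."""
    return float(np.mean(np.log(np.asarray(train_degrees, dtype=float) + 1.0)))


def pna_aggregate(messages: np.ndarray, delta: float, eps: float = 1e-5) -> np.ndarray:
    """Aggregate one node's incoming messages, shape (degree, F), into 12 * F values.

    Four aggregators (mean, min, max, std) are each multiplied by three degree
    scalers (identity, amplification, attenuation) and concatenated.
    """
    degree = messages.shape[0]
    if degree < 1:
        raise ValueError("PNA scalers are undefined for isolated nodes in this sketch")
    mean = messages.mean(axis=0)
    var = np.maximum((messages ** 2).mean(axis=0) - mean ** 2, 0.0)  # clamp rounding noise
    aggregators = [mean, messages.min(axis=0), messages.max(axis=0), np.sqrt(var + eps)]
    log_deg = np.log(degree + 1.0)
    scalers = [1.0, log_deg / delta, delta / log_deg]
    return np.concatenate([s * a for s in scalers for a in aggregators])


if __name__ == "__main__":
    msgs = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]])  # 3 neighbors, 2 features
    out = pna_aggregate(msgs, average_log_degree([1, 2, 3]))
    print(out.shape)  # (24,)
```

Concatenating all scaler and aggregator combinations lets a later learned layer tell apart neighborhoods that a single mean or max would collapse.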
The spread of Unmanned Aerial Vehicles (UAVs) in the last decade has revolutionized many application fields. The most investigated research topics focus on increasing autonomy during operational campaigns, environmental monitoring, surveillance, mapping, and labeling. To achieve such complex goals, a high-level module builds semantic knowledge from the outputs of a low-level module. The low-level module takes data acquired from multiple sensors and extracts information about what is sensed. Among low-level tasks, object detection is undoubtedly the most important. The sensors most widely used for it are by far RGB cameras, because of their cost, their size, and the extensive literature on RGB-based object detection. This survey presents recent advancements in 2D object detection for UAVs. It focuses on the differences, strategies, and trade-offs between the generic object detection problem and the adaptation of such solutions to UAV operations. Moreover, it proposes a new taxonomy that considers different height intervals. The taxonomy is driven by the methodological approaches of state-of-the-art works rather than by hardware, physical, and/or technological constraints.
With the rise in popularity of autonomous driving, the speed and accuracy with which in-vehicle sensing technology detects surrounding objects are becoming increasingly important for autonomous vehicles. Building on CenterNet, this paper proposes CenterNet-Auto, a new anchor-free detection network for driving scenes that meets detection speed requirements while maintaining detection accuracy. The network's backbone uses the RepVGG model transformed through structural re-parameterization. Features of different scales are fused, and feature pyramids and deformable convolution are added after the backbone to accurately detect objects of different sizes. To address occlusion in driving scenes, this paper proposes the Average Border Model, which locates objects using boundary feature information. The test results demonstrate that the proposed algorithm outperforms CenterNet in both speed and accuracy on the BDD dataset. Accuracy reaches 55.6% and speed reaches 30 FPS, meeting the speed and accuracy requirements of driving scenes.
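RepVGG-style structural re-parameterization trains a block with parallel 3×3, 1×1 and identity branches. For inference, it merges them into a single 3×3 convolution. The sketch below checks that equivalence numerically with a naive NumPy convolution. It assumes equal input and output channels, which the identity branch needs. It also omits the batch-normalization folding that RepVGG performs. The function names are hypothetical.

```python
import numpy as np


def conv2d(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Naive stride-1 cross-correlation with zero 'same' padding.

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k; b: (C_out,)
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    height, width = x.shape[1:]
    out = np.zeros((c_out, height, width))
    for i in range(height):
        for j in range(width):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) window
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out + b[:, None, None]


def fuse_repvgg_branches(w3, b3, w1, b1, channels: int):
    """Merge 3x3, 1x1 and identity branches into a single 3x3 kernel and bias.

    BN folding is omitted; with BN, each branch would first be folded into
    its own kernel and bias before this merge.
    """
    w1_as_3x3 = np.pad(w1, ((0, 0), (0, 0), (1, 1), (1, 1)))  # 1x1 weights at the centre tap
    w_id = np.zeros((channels, channels, 3, 3))
    w_id[np.arange(channels), np.arange(channels), 1, 1] = 1.0  # identity as a 3x3 kernel
    return w3 + w1_as_3x3 + w_id, b3 + b1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c = 3
    x = rng.standard_normal((c, 5, 6))
    w3, b3 = rng.standard_normal((c, c, 3, 3)), rng.standard_normal(c)
    w1, b1 = rng.standard_normal((c, c, 1, 1)), rng.standard_normal(c)
    train_time = conv2d(x, w3, b3) + conv2d(x, w1, b1) + x  # three-branch block
    w_fused, b_fused = fuse_repvgg_branches(w3, b3, w1, b1, c)
    print(np.allclose(conv2d(x, w_fused, b_fused), train_time))  # True
```

The fused kernel reproduces the three-branch output up to floating-point error. Inference can therefore run a plain stack of 3×3 convolutions, which is what makes such a backbone fast at test time.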
Comprehensive and accurate perception of the real 3D world is the basis of autonomous driving. However, many perception methods focus on a single task or object type, and existing multi-task or multi-object methods struggle to balance accuracy against real-time performance. This paper presents a unified framework for concurrent dynamic multi-object joint perception. The framework introduces a real-time monocular joint perception network termed MJPNet. In MJPNet, relative weightings are learned automatically by a series of purpose-built network branches. An end-to-end deep convolutional neural network is trained with a shared feature encoder and multiple proposed decoding sub-branches. In this way, the 2D category and the 3D position, pose, and size of an object are reconstructed simultaneously and accurately. Moreover, useful information is transferred among subtasks through multi-stream learning, guaranteeing the accuracy of each task. Comprehensive evaluations on a benchmark of challenging image sequences compare MJPNet with various state-of-the-art methods. They demonstrate superior performance in 2D detection and in 3D reconstruction of depth, lateral distance, orientation, and heading angle. Moreover, on the KITTI test set, MJPNet's real-time runtime (up to 15 fps) is significantly faster than that of public state-of-the-art visual detection methods. Accompanying video: https://***/Z-goToOlI94.
Deep neural networks play a crucial role in 2D object detection based on visual data, but they are also vulnerable to adversarial samples. Attackers manipulate low-resolution images to execute data poisoning attacks. ...
ISBN (print): 9781728188089
The notion of anchor plays a major role in modern detection algorithms such as Faster R-CNN and the SSD detector [2]. Anchors relate the features of the detector's last layers to bounding boxes containing objects in images. Despite their importance, the object detection literature has paid them little real attention. This paper is motivated by two observations: (i) each anchor learns to classify and regress candidate objects independently, and (ii) too few examples are available for each anchor in small-scale datasets. This paper addresses these issues by proposing a novel hierarchical head for the SSD detector. Compared with the original design, the new design requires no extra weights at inference time and improves detector performance for small training sets. Improved performance on PASCAL-VOC and state-of-the-art performance on FlickrLogos-47 validate the method. We also show when the proposed design gives no additional performance gain over the original design.
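The sketch below generates SSD-style default boxes (anchors) for one square feature map. It uses the linear scale schedule and aspect-ratio rule from the original SSD paper: w = s√a, h = s/√a, plus an extra square box of scale √(s_k·s_{k+1}). It shows the standard, non-hierarchical anchor layout that the proposed head builds on, not the hierarchical head itself. The default parameter values are common choices, assumed here.

```python
import math
from typing import List, Sequence, Tuple


def ssd_scales(num_maps: int, s_min: float = 0.2, s_max: float = 0.9) -> List[float]:
    """Linearly spaced scales s_1..s_m (fractions of the input size)."""
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


def default_boxes(fmap_size: int, scale: float, next_scale: float,
                  aspect_ratios: Sequence[float] = (1.0, 2.0, 0.5)
                  ) -> List[Tuple[float, float, float, float]]:
    """Normalised (cx, cy, w, h) default boxes for one square feature map."""
    boxes = []
    extra = math.sqrt(scale * next_scale)  # extra square box between s_k and s_{k+1}
    for i in range(fmap_size):
        for j in range(fmap_size):
            cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size  # cell centre
            for a in aspect_ratios:
                boxes.append((cx, cy, scale * math.sqrt(a), scale / math.sqrt(a)))
            boxes.append((cx, cy, extra, extra))
    return boxes


if __name__ == "__main__":
    scales = ssd_scales(6)
    anchors = default_boxes(fmap_size=2, scale=scales[0], next_scale=scales[1])
    print(len(anchors))  # 2 * 2 cells * 4 boxes per cell = 16
```

In the standard SSD head, each of these boxes has its own classification and regression outputs. A cell with four default boxes therefore carries four independent predictors, which is the independence described in observation (i).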
Object detection in uncrewed aerial vehicle (UAV) images has been a longstanding challenge in computer vision. In particular, object detection in drone images is complex because the objects span various scales, such as humans, buildings, water bodies, and hills. In this paper, we present an implementation of ensemble transfer learning to enhance the performance of base models for multiscale object detection in drone imagery. Combined with a test-time augmentation pipeline, the algorithm combines different models and applies voting strategies to detect objects of various scales in UAV images. The data augmentation also helps address the scarcity of drone image datasets. We experimented with two open-domain datasets: the VisDrone dataset and the AU-AIR dataset. Our approach is more practical and efficient because it uses transfer learning and a two-level voting-strategy ensemble instead of training custom models on entire datasets. The experiments show a significant improvement in mAP on both the VisDrone and AU-AIR datasets when the ensemble transfer learning method is used. Furthermore, the voting strategies increase the reliability of the ensemble, as the end user can select and trace the effects of the mechanism on bounding box predictions.
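The voting step can be illustrated with three strategies commonly used to ensemble object detectors. Affirmative keeps a box if any model detects it, consensus if a majority does, and unanimous if all do. It is an assumption that the paper's two-level ensemble uses exactly these rules, this IoU threshold or this coordinate averaging. The sketch greedily groups same-label boxes across models by IoU. It keeps a group only if enough distinct models voted for it.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]      # (x1, y1, x2, y2)
Detection = Tuple[Box, float, str]          # (box, score, label)

# Minimum number of distinct models that must agree, given m models.
MIN_VOTES = {
    "affirmative": lambda m: 1,
    "consensus": lambda m: m // 2 + 1,
    "unanimous": lambda m: m,
}


def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def vote(per_model: Sequence[List[Detection]], strategy: str = "consensus",
         iou_thr: float = 0.5) -> List[Detection]:
    """Fuse detections from several models by IoU grouping and voting."""
    groups: List[List[Tuple[int, Detection]]] = []
    for model_idx, detections in enumerate(per_model):
        for det in detections:
            for group in groups:
                ref = group[0][1]  # compare against the group's first box
                if ref[2] == det[2] and iou(ref[0], det[0]) >= iou_thr:
                    group.append((model_idx, det))
                    break
            else:  # no matching group: start a new one
                groups.append([(model_idx, det)])

    needed = MIN_VOTES[strategy](len(per_model))
    fused: List[Detection] = []
    for group in groups:
        if len({idx for idx, _ in group}) < needed:
            continue  # not enough distinct models agree
        boxes = [det[0] for _, det in group]
        merged = tuple(sum(coord) / len(boxes) for coord in zip(*boxes))
        score = sum(det[1] for _, det in group) / len(group)
        fused.append((merged, score, group[0][1][2]))
    return fused
```

With three models, `vote(models, "consensus")` keeps a box only if at least two models detected it. Switching to "affirmative" tends to favor recall, while "unanimous" tends to favor precision.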
ISBN (print): 9781728194097
An accurate and efficient environment perception system is crucial for intelligent vehicles. This study proposes an optimized 2D object detection method that uses multi-sensor fusion to improve the performance of the environment perception system. In the sensor fusion module, a depth completion network predicts a dense depth map, so that both dense and sparse RGB-D images can be obtained. Then, an efficient object detection baseline is optimized for intelligent vehicles. The method is verified on the KITTI 2D object detection dataset. The experimental results show that the proposed method can be more accurate than many recent methods on the KITTI leaderboard. It also requires less inference time, demonstrating its high efficiency.
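The sketch below covers the sparse half of the fusion module. Projected depth points form a sparse depth map, which is then stacked with the RGB channels. The abstract says only "multi-sensor fusion", so a LiDAR-like point source is an assumption. The sketch also assumes the points are already in the camera frame and uses a hypothetical pinhole intrinsic matrix K. The depth completion network that produces the dense map is not shown.

```python
import numpy as np


def sparse_depth_map(points_cam: np.ndarray, K: np.ndarray,
                     height: int, width: int) -> np.ndarray:
    """Project (N, 3) camera-frame points into an (H, W) sparse depth map.

    Pixels without a point stay 0; if several points land on one pixel,
    the nearest depth is kept.
    """
    pts = points_cam[points_cam[:, 2] > 0]            # keep points in front of the camera
    uvw = pts @ K.T                                   # homogeneous pixel coordinates
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth = np.full((height, width), np.inf)
    np.minimum.at(depth, (v[inside], u[inside]), pts[inside, 2])  # nearest point wins
    depth[np.isinf(depth)] = 0.0                      # mark empty pixels as 0
    return depth


def make_rgbd(rgb: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Stack an (H, W, 3) image with an (H, W) depth map into (H, W, 4)."""
    return np.concatenate([rgb.astype(np.float32), depth[..., None].astype(np.float32)], axis=-1)


if __name__ == "__main__":
    K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]])  # hypothetical intrinsics
    points = np.array([[0.0, 0.0, 5.0], [0.0, 0.0, 3.0], [0.0, 0.0, -1.0]])
    depth = sparse_depth_map(points, K, height=4, width=4)
    print(depth[2, 2])  # 3.0: the nearer of the two points hitting pixel (2, 2)
```

`np.minimum.at` is used instead of plain fancy-index assignment. NumPy does not guarantee which value wins when an index repeats in an assignment, whereas the unbuffered ufunc reliably keeps the nearest depth.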