ISBN: 9798350307184 (print)
Autonomous driving requires a comprehensive understanding of the surrounding environment for reliable trajectory planning. Previous works rely on dense rasterized scene representation (e.g., agent occupancy and semantic map) to perform planning, which is computationally intensive and misses the instance-level structure information. In this paper, we propose VAD, an end-to-end vectorized paradigm for autonomous driving, which models the driving scene as a fully vectorized representation. The proposed vectorized paradigm has two significant advantages. On one hand, VAD exploits the vectorized agent motion and map elements as explicit instance-level planning constraints which effectively improves planning safety. On the other hand, VAD runs much faster than previous end-to-end planning methods by getting rid of computation-intensive rasterized representation and hand-designed post-processing steps. VAD achieves state-of-the-art end-to-end planning performance on the nuScenes dataset, outperforming the previous best method by a large margin. Our base model, VAD-Base, greatly reduces the average collision rate by 29.0% and runs 2.5x faster. Besides, a lightweight variant, VAD-Tiny, greatly improves the inference speed (up to 9.3x) while achieving comparable planning performance. We believe the excellent performance and the high efficiency of VAD are critical for the real-world deployment of an autonomous driving system. Code and models are available at https://***/hustvl/VAD for facilitating future research.
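To make the idea of an instance-level planning constraint concrete, the following is a minimal sketch of how a vectorized map element (a boundary polyline) could penalize planned waypoints that come too close to it. It is an illustration under stated assumptions, not VAD's actual implementation: the function names, the hinge-style penalty, and the `safety_margin` parameter are all hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from 2D point p to the segment from a to b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def min_distance_to_polyline(p, polyline):
    """Smallest distance from p to any segment of a polyline (list of 2D points)."""
    return min(
        point_segment_distance(p, a, b)
        for a, b in zip(polyline, polyline[1:])
    )


def boundary_penalty(trajectory, boundary, safety_margin):
    """Hypothetical hinge penalty: sums how far each waypoint intrudes into the margin."""
    return sum(
        max(0.0, safety_margin - min_distance_to_polyline(p, boundary))
        for p in trajectory
    )


if __name__ == "__main__":
    boundary = [(0.0, 0.0), (10.0, 0.0)]   # one vectorized map element
    trajectory = [(1.0, 2.0), (5.0, 0.5)]  # planned ego waypoints
    print(boundary_penalty(trajectory, boundary, safety_margin=1.0))
```

Because the boundary is a short list of points rather than a dense raster, the constraint costs only a handful of arithmetic operations per waypoint, which is the kind of saving the abstract attributes to the vectorized representation.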
Given the current low level of intelligence in the main transportation process of coal mines, traditional protection devices can only shut down passively after accidents occur. Although they prevent further losses from expa...
ISBN: 9798400708428 (print)
In autonomous driving, it is crucial to capture the driving intentions of other vehicles on the road, which the autonomous vehicle can then use to plan a safe route. This study proposes a system to identify the driving intention of other vehicles from their taillight signals. To achieve this goal, both the positions of the taillights (i.e., spatial features) and the change in taillight status over time (i.e., temporal features) need to be properly extracted and recognized. In our system, a longer sequence of 32 frames is used as input to capture the complete change of the taillights. In addition, a transfer-learned classical convolutional neural network and a lightweight WaveNet are adopted to extract the spatial and temporal features of the input sequence, respectively. Moreover, the dataset is augmented to ensure the convergence of model training. The experimental results indicate that our system outperforms state-of-the-art approaches in taillight recognition.
ISBN: 9781665476874 (print)
Navigating a delivery robot along the sidewalk safely and robustly in a campus environment is extremely challenging due to the narrow motion space, appearance changes, and unstable GPS localization signals under tree canopies. To that end, we have completed a systematic implementation for delivery robot sidewalk navigation, in which a robust vision-based navigation algorithm is proposed. It consists of three main modules: sidewalk segmentation, costmap generation, and motion planning. More specifically, the first module finds the drivable area of the surrounding environment, using an image-based segmentation neural network developed to extract where the robot can traverse. Since it takes only immediate, local sensory data as input, it removes the heavy dependence on a prior map. Then, an inverse perspective mapping generates a bird's-eye view of the drivable area and constructs the local occupancy grid map. Next, two different motion planners, control-based primitives (Dynamic Window Approach) and state-based primitives (state lattice planner), are adopted to generate trajectory candidates for navigating the robot along the sidewalk. Both simulation and real-world sidewalk navigation experiments have been conducted to test and evaluate their performance. The results show that our algorithm can precisely extract the sidewalk area for traversing, and that the state-based primitive planner demonstrates superior performance in terms of trajectory length and time cost, achieving improvements of 14.3% and 18.7%, respectively, over the control-based primitive planner.
ISBN: 9781728198354 (print)
The high bandwidth required for gradient exchange is a bottleneck for the distributed training of large transformer models. Most sparsification approaches focus on gradient compression for convolutional neural networks (CNNs) optimized by SGD. In this work, we show that performing local gradient accumulation when using Adam to optimize transformers in a distributed fashion leads to a misled optimization direction, and we address this problem by accumulating the optimization direction locally. We also empirically demonstrate that most sparse gradients do not overlap, and thus show that sparsification is comparable to an asynchronous update. Our experiments with classification and segmentation tasks show that our method can still maintain the correct optimization direction in distributed training even under highly sparse updates.
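The general mechanism behind sparsified exchange with local accumulation can be sketched as follows. This is a generic top-k sparsifier with a locally kept residual, written as an assumption-laden illustration rather than the paper's method: the function name and the choice of top-k selection are hypothetical, and the `update` vector stands for whatever quantity a worker accumulates (raw gradients in the SGD setting, or an optimizer-produced direction as the abstract proposes for Adam).

```python
def sparsify_with_residual(update, residual, k):
    """Send the k largest-magnitude entries; keep the rest locally for later.

    update:   list of floats computed on this worker in the current step
    residual: list of floats left unsent from previous steps
    Returns (sparse_update, new_residual), both the same length as update.
    """
    # Add what was held back earlier to the current step's values.
    accumulated = [u + r for u, r in zip(update, residual)]

    # Indices of the k entries with the largest absolute value.
    order = sorted(
        range(len(accumulated)),
        key=lambda i: abs(accumulated[i]),
        reverse=True,
    )
    keep = set(order[:k])

    # Selected entries are transmitted; all others stay in the local residual.
    sparse_update = [v if i in keep else 0.0 for i, v in enumerate(accumulated)]
    new_residual = [0.0 if i in keep else v for i, v in enumerate(accumulated)]
    return sparse_update, new_residual


if __name__ == "__main__":
    residual = [0.0, 0.0, 0.0, 0.0]
    for step_update in ([0.1, -2.0, 0.5, 0.3], [0.1, 0.0, 0.2, 0.3]):
        sent, residual = sparsify_with_residual(step_update, residual, k=1)
        print("sent:", sent, "kept locally:", residual)
```

In the second step the entry at index 2 is sent only because its held-back value from the first step is added in, which shows how small components are delayed rather than discarded.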
ISBN: 9781728198354 (print)
The recognition of supermarket products on mobile devices is gaining importance as more and more consumers seek to make informed decisions about their purchases in real time. However, this is often difficult to realize due to the vast product assortments of modern supermarkets and the limited computational resources available on mobile devices. In this work, we propose a real-time on-device product recognition pipeline, based on the Global Trade Item Number (GTIN) system, that is both robust to dynamic changes in the product assortment and scalable to tens of thousands of products. We evaluate detection performance on the SKU110k and R6k datasets and demonstrate the scalability of our pipeline with 5974 different products, using synthetic data. Furthermore, the proposed product recognition pipeline is deployed on a Google Pixel 6 mobile phone, where it achieves an inference time of 121 ms (8.3 fps), demonstrating its real-time capabilities in practice.
ISBN: 9781728198354 (print)
Image stitching aims to combine two images with overlapping fields to expand the field-of-view (FoV). However, the stitched images produced by existing methods are irregular and need to be processed by rectangling methods, which is time-consuming and prone to producing unnatural results. In this paper, we propose the first end-to-end framework, Rectangular-output Deep Image Stitching Network (RDISNet), to directly stitch two images into a standard rectangular image while learning color consistency between image pairs and maintaining the authenticity of the content. To further preserve the structure of large objects in the stitched image, we design a dilated BN-RCU block to expand the receptive field of RDISNet for extracting enriched spatial context. Furthermore, we design a novel data synthesis pipeline and build the first rectangular-output deep image stitching dataset (RDIS-D) for joint image stitching and rectangling. Experimental results demonstrate that RDISNet performs favorably against state-of-the-art methods.
ISBN: 9798350379525 (digital)
ISBN: 9798350379532 (print)
The goal of a landmine detection robot is to cover as much ground as it can and to show the detected landmines on a visual map with millimeter precision. The prototype landmine detection robot described in this study offers a visual interface for mapping landmines, tuning PID parameters, and aligning cameras. It is also reasonably powerful, accurate, and easy to operate with other sensors. The manual, semi-automatic, and automatic modes of controlling the differential-drive robot are emphasized. Image processing determines the exact location of the robot and provides live feedback to its dead-reckoning servo control. A beat metal detector is a kind of quiet sensor used to locate landmines. The overall objective is to provide the user with a system that is robust, affordable, and easy to understand.
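The dead-reckoning step for a differential-drive robot, together with a correction from a camera-derived position, can be sketched as below. This uses the standard differential-drive odometry equations; it is a minimal sketch and not the paper's controller. The function names, the `wheel_base` parameter, and the simple blending `gain` used for the vision correction are assumptions made for illustration.

```python
import math


def dead_reckon(pose, d_left, d_right, wheel_base):
    """Advance (x, y, theta) given the distance travelled by each wheel.

    pose:       (x, y, theta), theta in radians
    d_left:     distance travelled by the left wheel since the last update
    d_right:    distance travelled by the right wheel since the last update
    wheel_base: distance between the two wheels (same unit as d_left/d_right)
    """
    x, y, theta = pose
    d_center = (d_left + d_right) / 2.0
    d_theta = (d_right - d_left) / wheel_base
    # Use the heading at the middle of the step for the translation.
    x += d_center * math.cos(theta + d_theta / 2.0)
    y += d_center * math.sin(theta + d_theta / 2.0)
    # Wrap the heading into [-pi, pi).
    theta = (theta + d_theta + math.pi) % (2.0 * math.pi) - math.pi
    return (x, y, theta)


def correct_with_vision(pose, vision_xy, gain):
    """Hypothetical correction: pull the estimated position toward a camera fix.

    gain = 0 ignores the camera, gain = 1 trusts it completely.
    """
    x, y, theta = pose
    vx, vy = vision_xy
    return (x + gain * (vx - x), y + gain * (vy - y), theta)


if __name__ == "__main__":
    pose = (0.0, 0.0, 0.0)
    pose = dead_reckon(pose, d_left=1.0, d_right=1.0, wheel_base=0.5)
    pose = correct_with_vision(pose, vision_xy=(1.1, 0.0), gain=0.5)
    print(pose)
```

Wheel odometry alone drifts over time, which is why a periodic position fix from image processing is useful as feedback to the dead-reckoning estimate.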
ISBN: 9798350327458 (print)
The social applications of robots pose intrinsic challenges with respect to social paradigms and the heterogeneity of different groups. These challenges can take the form of social acceptability, anthropomorphism, likeability, past experiences with robots, etc. In this paper, we consider a group of neurotypical adults to describe how different voices and motion types of the NAO robot affect the perceived safety, anthropomorphism, likeability, animacy, and perceived intelligence of the robot. In addition, prior robot experience is taken into consideration in this analysis, which uses a one-way Analysis of Variance (ANOVA). Further, we demonstrate that these different modalities instigate different physiological responses in the person. This classification is performed on the recorded Blood Volume Pulse (BVP) data using two different deep learning approaches: 1) a Convolutional Neural Network (CNN), and 2) Gramian Angular Fields. Both approaches achieve better-than-chance accuracy (>25%) for a 4-class classification.
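A Gramian Angular Field turns a one-dimensional signal such as a BVP trace into a two-dimensional matrix that an image classifier can consume. The sketch below computes the summation variant (often called GASF); the abstract does not say which variant the authors used, so that choice, the function name, and the min-max rescaling to [-1, 1] are assumptions.

```python
import math


def gramian_angular_summation_field(series):
    """Return the GASF matrix of a 1D series as a list of lists.

    Steps: rescale values to [-1, 1], map each value to an angle
    phi = arccos(value), then set G[i][j] = cos(phi_i + phi_j).
    """
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")
    scaled = [2.0 * (v - lo) / (hi - lo) - 1.0 for v in series]
    # Clamp to guard against rounding slightly outside [-1, 1].
    phi = [math.acos(max(-1.0, min(1.0, s))) for s in scaled]
    return [[math.cos(a + b) for b in phi] for a in phi]


if __name__ == "__main__":
    for row in gramian_angular_summation_field([0.0, 1.0, 2.0]):
        print([round(v, 3) for v in row])
```

For the 4-class task in the abstract, a classifier that guesses uniformly at random would be right 1 time in 4, which is where the 25% chance level comes from.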
ISBN: 9798350335170 (print)
This paper discusses the robustness of executing robot tasks in contact with the environment. For example, in assembly, even the slightest error in the initial pose of the assembled object or grasp uncertainties can lead to large contact forces and, consequently, failure of the assembly operation. Force control can help to improve the robustness only to a certain extent. In this work, we propose using a position- and orientation-invariant task representation to increase the robustness of assembly and other tasks in continuous contact with the environment. We developed a variable compliance controller which constantly adapts the policy to environmental changes, such as positional and rotational displacements and deviations in the geometry of the assembled part. In addition, we combined ergodic control and vision processing to improve the detection of the assembled object's initial pose. The proposed framework has been experimentally validated in two challenging tasks: the first is a mock-up of an assembly operation, where the object moves along a rigid wire, and the second is the insertion of a car light bayonet bulb into the housing.