Variations of L1-regularization including, in particular, total variation regularization, have hugely improved computational imaging. However, sharper edges and fewer staircase artifacts can be achieved with convex-co...
详细信息
Variations of L1-regularization including, in particular, total variation regularization, have hugely improved computational imaging. However, sharper edges and fewer staircase artifacts can be achieved with convex-concave regularizers. We present a new class of such regularizers using normal priors with unknown variance (NUV), which include smoothed versions of the logarithm function and smoothed versions of L-p norms with p <= 1. All NUV priors allow variational representations that lead to efficient algorithms for image reconstruction by iterative reweighted descent. A preferred such algorithm is iterative reweighted coordinate descent, which has no parameters (in particular, no step size to control) and is empirically robust and efficient. The proposed priors and algorithms are demonstrated with applications to tomography. We also note that the proposed priors come with built-in edge detection, which is demonstrated by an application to image segmentation.
Stereo matching in foggy scenes is challenging as the scattering effect of fog blurs the image and makes the matching ambiguous. Prior methods deem the fog as noise and discard it before matching. Different from them,...
详细信息
ISBN:
(数字)9781665469463
ISBN:
(纸本)9781665469463
Stereo matching in foggy scenes is challenging as the scattering effect of fog blurs the image and makes the matching ambiguous. Prior methods deem the fog as noise and discard it before matching. Different from them, we propose to explore depth hints from fog and improve stereo matching via these hints. The exploration of depth hints is designed from the perspective of rendering. The rendering is conducted by reversing the atmospheric scattering process and removing the fog within a selected depth range. The quality of the rendered image reflects the correctness of the selected depth, as the closer it is to the real depth, the clearer the rendered image is. We introduce a fog volume representation to collect these depth hints from the fog. We construct the fog volume by stacking images rendered with depths computed from disparity candidates that are also used to build the cost volume. We fuse the fog volume with cost volume to rectify the ambiguous matching caused by fog. Experiments show that our fog volume representation significantly promotes the SOTA result on foggy scenes by 10% similar to 30% while maintaining a comparable performance in clear scenes.
Trajectory prediction is a crucial step in the pipeline for autonomousdriving because it not only improves the planning of future routes, but also ensures vehicle safety. On the basis of deep neural networks, numerou...
详细信息
Trajectory prediction is a crucial step in the pipeline for autonomousdriving because it not only improves the planning of future routes, but also ensures vehicle safety. On the basis of deep neural networks, numerous trajectory prediction models have been proposed and have already achieved high performance on public datasets due to the well-designed model structure and complex optimization procedure. However, the majority of these methods overlook the fact that vehicles' limited computing resources can be utilized for online real-time inference. We proposed a Lane Transformer to achieve high accuracy and efficiency in trajectory prediction to tackle this problem. On the one hand, inspired by the well-known transformer, we use attention blocks to replace the commonly used Graph Convolution Network (GCN) in trajectory prediction models, thereby drastically reducing the time cost while maintaining the accuracy. In contrast, we construct our prediction model to be compatible with TensorRT, allowing it to be further optimized and easily transformed into a deployment-friendly form of TensorRT. Experiments demonstrate that our model outperforms the baseline LaneGCN model in quantitative prediction accuracy on the Argoverse dataset by a factor of 10x to 25x. Our 7ms inference time is the fastest among all open source methods currently available.
Stereo matching becomes computationally challenging when dealing with a large disparity range. Prior methods mainly alleviate the computation through dynamic cost volume by focusing on a local disparity space, but it ...
ISBN:
(纸本)9798350307184
Stereo matching becomes computationally challenging when dealing with a large disparity range. Prior methods mainly alleviate the computation through dynamic cost volume by focusing on a local disparity space, but it requires many iterations to get close to the ground truth due to the lack of a global view. We find that the dynamic cost volume approximately encodes the disparity space as a single Gaussian distribution with a fixed and small variance at each iteration, which results in an inadequate global view over disparity space and a small update step at every iteration. In this paper, we propose a parameterized cost volume to encode the entire disparity space using multi-Gaussian distribution. The disparity distribution of each pixel is parameterized by weights, means, and variances. The means and variances are used to sample disparity candidates for cost computation, while the weights and means are used to calculate the disparity output. The above parameters are computed through a JS-divergence-based optimization, which is realized as a gradient descent update in a feed-forward differential module. Experiments show that our method speeds up the runtime of RAFT-Stereo by 4 similar to 15 times, achieving real-time performance and comparable accuracy. The code is available at https://***/jiaxiZeng/Parameterized-Cost-Volume-for-Stereo-Matching.
3D lane detection usually builds a dense correspondence between the front-view space and the BEV space to estimate lane points in the 3D space. 3D lanes only occupy a small ratio of the dense correspondence, while mos...
ISBN:
(纸本)9798350307184
3D lane detection usually builds a dense correspondence between the front-view space and the BEV space to estimate lane points in the 3D space. 3D lanes only occupy a small ratio of the dense correspondence, while most correspondence belongs to the redundant background. This sparsity phenomenon bottlenecks valuable computation and raises the computation cost of building a high-resolution correspondence for accurate results. In this paper, we propose a sparse point-guided 3D lane detection, focusing on points related to 3D lanes. Our method runs in a coarse-to-fine manner, including coarse-level lane detection and iterative fine- level sparse point refinements. In coarse-level lane detection, we build a dense but efficient correspondence between the front view and BEV space at a very low resolution to compute coarse lanes. Then in fine-level sparse point refinement, we sample sparse points around coarse lanes to extract local features from the high-resolution front-view feature map. The high-resolution local information brought by sparse points refines 3D lanes in the BEV space hierarchically from low resolution to high resolution. The sparse point guides a more effective information flow and greatly promotes the SOTA result by 3 points on the overall F1-score and 6 points on several hard situations while reducing almost half memory cost and speeding up 2 times.
Stereo matching becomes computationally challenging when dealing with a large disparity range. Prior methods mainly alleviate the computation through dynamic cost volume by focusing on a local disparity space, but it ...
Stereo matching becomes computationally challenging when dealing with a large disparity range. Prior methods mainly alleviate the computation through dynamic cost volume by focusing on a local disparity space, but it requires many iterations to get close to the ground truth due to the lack of a global view. We find that the dynamic cost volume approximately encodes the disparity space as a single Gaussian distribution with a fixed and small variance at each iteration, which results in an inadequate global view over disparity space and a small update step at every iteration. In this paper, we propose a parameterized cost volume to encode the entire disparity space using multi-Gaussian distribution. The disparity distribution of each pixel is parameterized by weights, means, and variances. The means and variances are used to sample disparity candidates for cost computation, while the weights and means are used to calculate the disparity output. The above parameters are computed through a JS-divergence-based optimization, which is realized as a gradient descent update in a feed-forward differential module. Experiments show that our method speeds up the runtime of RAFT-Stereo by 4 ~ 15 times, achieving real-time performance and comparable accuracy. The code is available at https://***/jiaxiZeng/Parameterized-Cost-Volume-for-Stereo-Matching.
3D lane detection usually builds a dense correspondence between the front-view space and the BEV space to estimate lane points in the 3D space. 3D lanes only occupy a small ratio of the dense correspondence, while mos...
3D lane detection usually builds a dense correspondence between the front-view space and the BEV space to estimate lane points in the 3D space. 3D lanes only occupy a small ratio of the dense correspondence, while most correspondence belongs to the redundant background. This sparsity phenomenon bottlenecks valuable computation and raises the computation cost of building a high-resolution correspondence for accurate results. In this paper, we propose a sparse point-guided 3D lane detection, focusing on points related to 3D lanes. Our method runs in a coarse-to-fine manner, including coarse-level lane detection and iterative fine-level sparse point refinements. In coarse-level lane detection, we build a dense but efficient correspondence between the front view and BEV space at a very low resolution to compute coarse lanes. Then in fine-level sparse point refinement, we sample sparse points around coarse lanes to extract local features from the high-resolution front-view feature map. The high-resolution local information brought by sparse points refines 3D lanes in the BEV space hierarchically from low resolution to high resolution. The sparse point guides a more effective information flow and greatly promotes the SOTA result by 3 points on the overall F1-score and 6 points on several hard situations while reducing almost half memory cost and speeding up 2 times.
暂无评论