Fusing LiDAR and camera information is essential for accurate and reliable 3D object detection in autonomous driving systems. This is challenging because multi-granularity geometric and semantic features must be combined across two drastically different modalities. Recent approaches aim to exploit the semantic density of camera features by lifting points in 2D camera images (referred to as "seeds") into 3D space and then incorporating 2D semantics via cross-modal interaction or fusion techniques. However, depth information is under-investigated in these approaches when lifting points into 3D space, so 2D semantics cannot be reliably fused with 3D points. Moreover, their multi-modal fusion strategies, implemented as concatenation or attention, either cannot effectively fuse 2D and 3D information or are unable to perform fine-grained interactions in the voxel space. To this end, we propose a novel framework that makes better use of depth information and enables fine-grained cross-modal interaction between LiDAR and camera, consisting of two key components. First, a Multi-Depth Unprojection (MDU) method enhances the depth quality of the lifted points at each interaction level. Second, a Gated Modality-Aware Convolution (GMA-Conv) block modulates voxels involved with the camera modality in a fine-grained manner and then aggregates the multi-modal features into a unified space. Together they provide the detection head with more comprehensive features from LiDAR and camera. On the nuScenes test benchmark, our proposed method, abbreviated as MSMDFusion, achieves state-of-the-art results on both 3D object detection and tracking tasks without using test-time augmentation or ensemble techniques. The code is available at https://***/SxJyJay/MSMDFusion.
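To make the gating idea concrete, below is a minimal PyTorch sketch of a gated modality-aware fusion step on dense voxel features: a sigmoid gate predicted from both modalities modulates the camera voxels before the two streams are aggregated into one space. The module name, kernel choices, and the use of dense (rather than sparse) 3D convolutions are illustrative assumptions, not the authors' GMA-Conv implementation.

```python
import torch
import torch.nn as nn

class GatedModalityFusion(nn.Module):
    """Illustrative gated fusion of LiDAR and camera voxel features.

    A sigmoid gate predicted from both modalities modulates the camera
    branch in a fine-grained way before the two streams are aggregated.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Gate inferred from the concatenated modalities.
        self.gate = nn.Sequential(
            nn.Conv3d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Final aggregation back into a unified feature space.
        self.fuse = nn.Conv3d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, lidar_vox: torch.Tensor, cam_vox: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([lidar_vox, cam_vox], dim=1))
        cam_mod = g * cam_vox  # voxel-wise modulation of the camera branch
        return self.fuse(torch.cat([lidar_vox, cam_mod], dim=1))

# Toy usage on a small dense voxel grid (batch, channels, D, H, W).
lidar = torch.randn(1, 16, 8, 32, 32)
camera = torch.randn(1, 16, 8, 32, 32)
fused = GatedModalityFusion(16)(lidar, camera)
print(fused.shape)  # torch.Size([1, 16, 8, 32, 32])
```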
Neural networks for visual content understanding have recently evolved from convolutional networks to transformers. The former (CNN) relies on small-window kernels to capture regional clues, demonstrating strong local expressiveness. The latter (transformer) establishes long-range global connections between localities for holistic learning. Inspired by this complementary nature, there is growing interest in designing hybrid models that utilize both techniques. However, current hybrids merely use convolutions as simple approximations of linear projection, or juxtapose a convolution branch with attention, without considering the importance of local/global modeling. To tackle this, we propose a new hybrid named Adaptive Split-Fusion Transformer (ASF-former) that treats the convolutional and attention branches differently, with adaptive weights. Specifically, an ASF-former encoder splits the feature channels equally in half to feed the dual-path inputs. The outputs of the two paths are then fused with weights calculated from visual cues. We also design a compact convolutional path out of efficiency concerns. Extensive experiments on standard benchmarks show that our ASF-former outperforms its CNN, transformer, and hybrid counterparts in terms of accuracy (83.9% on ImageNet-1K) under similar conditions (12.9G MACs / 56.7M params, without large-scale pre-training). The code is available at: https://***/szx503045266/ASF-former.
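As a rough illustration of the split-fusion idea (not the authors' encoder), the PyTorch sketch below splits the channels in half, sends one half through a small convolutional path and the other through multi-head self-attention, and fuses the two outputs with adaptive weights predicted from globally pooled cues. All layer names and sizes are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class SplitFusionBlock(nn.Module):
    """Illustrative split-fusion encoder block: channels are split in half,
    processed by a convolutional path and an attention path, then fused with
    adaptive weights predicted from the input itself."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        half = dim // 2
        # Local path: depth-wise + point-wise convolution on half the channels.
        self.conv_path = nn.Sequential(
            nn.Conv2d(half, half, kernel_size=3, padding=1, groups=half),
            nn.Conv2d(half, half, kernel_size=1),
        )
        # Global path: multi-head self-attention over flattened tokens.
        self.attn = nn.MultiheadAttention(half, heads, batch_first=True)
        # Adaptive fusion weights from globally pooled visual cues.
        self.weights = nn.Sequential(nn.Linear(dim, 2), nn.Softmax(dim=-1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        x_conv, x_attn = torch.chunk(x, 2, dim=1)      # equal channel split
        y_conv = self.conv_path(x_conv)
        tokens = x_attn.flatten(2).transpose(1, 2)     # (B, HW, C/2)
        y_attn, _ = self.attn(tokens, tokens, tokens)
        y_attn = y_attn.transpose(1, 2).reshape(b, c // 2, h, w)
        w_paths = self.weights(x.mean(dim=(2, 3)))     # (B, 2) adaptive weights
        w_conv, w_attn = w_paths[:, :1, None, None], w_paths[:, 1:, None, None]
        return torch.cat([w_conv * y_conv, w_attn * y_attn], dim=1)

out = SplitFusionBlock(64)(torch.randn(2, 64, 14, 14))
print(out.shape)  # torch.Size([2, 64, 14, 14])
```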
Most large multimodal models (LMMs) are implemented by feeding visual tokens as a sequence into the first layer of a large language model (LLM). The resulting architecture is simple but significantly increases computa...
Event Extraction involves extracting event-related information such as event types and event arguments from context, which has long been tackled through well-designed neural networks or fine-tuned pre-trained language...
In this paper, a dual-band antenna for 5G communication based on an antenna design with self-decoupling properties is proposed. The antenna is composed of a self-decoupled antenna unit placed vertically on a 30 × 80 mm² ground plane and printed on a 0.8 mm thick FR-4 (εr = 4.4, tan δ = 0.02) substrate, with a pair of L-shaped coupled-feed structures added on this basis. Simulation shows that the antenna has two operating frequency bands, 3.3–4.2 GHz and 4.8–5 GHz, with good transmission and isolation in both bands. In addition, it has the advantages of small size, self-decoupling, high isolation, a simple structure, and easy fabrication. This antenna can be used as an antenna unit for 5G mobile phone communication.
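For a rough sense of the dimensions involved, the sketch below computes the free-space wavelength at the reported band edges and an approximate wavelength inside the FR-4 dielectric (εr = 4.4). This is generic back-of-the-envelope arithmetic, not the authors' design procedure.

```python
# Free-space and approximate in-substrate wavelengths for the reported 5G bands.
# Generic back-of-the-envelope arithmetic, not the authors' design equations.
C = 3e8        # speed of light, m/s
EPS_R = 4.4    # FR-4 relative permittivity quoted in the abstract

for f_ghz in (3.3, 4.2, 4.8, 5.0):
    f = f_ghz * 1e9
    lam0 = C / f                    # free-space wavelength
    lam_sub = lam0 / EPS_R ** 0.5   # rough wavelength inside the dielectric
    print(f"{f_ghz:4.1f} GHz: lambda0 = {lam0 * 1e3:5.1f} mm, "
          f"lambda_substrate ~ {lam_sub * 1e3:5.1f} mm, "
          f"quarter-wave ~ {lam_sub * 1e3 / 4:4.1f} mm")
```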
The study of electromagnetic scattering from Gaussian rough surface is of great significance in radar reconnaissance, target tracking and ocean remote sensing. The moment method (MOM) is a commonly used method with hi...
Self-dual codes have been studied actively because they are connected with mathematical structures including block designs and lattices and have practical applications in quantum error-correcting codes and secret shar...
In this paper, a RIS-assisted multiuser MIMO communication method based on deep reinforcement learning (RMMC-DRL) is proposed for multiuser scenarios. The objective is to find the optimal transmit beamforming matrix at the BS and the optimal phase-shift matrix of the reflective intelligent surface (RIS) that maximize the multiuser sum rate; this is formulated as a constrained optimization problem. Because the problem is non-convex, we solve it through deep reinforcement learning (DRL) and then use the result for communication. In the DRL design, a deep deterministic policy gradient (DDPG) framework that can handle continuous states and actions is adopted, the reward is set to the optimization objective, and the transmit beamforming matrix and the RIS phase-shift matrix are obtained through interaction with the environment. Unlike the alternating optimization (AO) method, which solves the transmit beamforming matrix and the RIS phase-shift matrix alternately, RMMC-DRL obtains both matrices simultaneously as the output of the DRL agent. Simulation results show that RMMC-DRL can learn and improve its behavior by interacting with the environment. Compared with the AO method, RMMC-DRL achieves a higher sum rate with lower computational complexity.
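A compact, illustrative DDPG skeleton in the spirit of the description above is sketched below: the actor outputs the BS beamforming entries and the RIS phase shifts jointly as a single continuous action, and the critic scores state-action pairs. The dimensions, network sizes, and the stand-in reward are placeholder assumptions; the actual method computes the multiuser sum rate from the channels, the beamformer, and the RIS phase-shift matrix.

```python
import torch
import torch.nn as nn

# Placeholder problem sizes (illustrative, not from the paper).
N_T, N_RIS, K = 4, 16, 2          # BS antennas, RIS elements, users
STATE_DIM = 2 * N_T * K           # e.g. a flattened (real/imag) channel observation
ACTION_DIM = 2 * N_T * K + N_RIS  # real/imag beamforming entries + RIS phases

class Actor(nn.Module):
    """Maps the state to a joint action: beamforming matrix + RIS phase shifts."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, ACTION_DIM), nn.Tanh(),  # bounded continuous action
        )
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Scores a (state, action) pair; drives the deterministic policy gradient."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def sum_rate_reward(action):
    """Stand-in reward so the loop runs end to end. In the actual method the
    reward is the multiuser sum rate computed from the channels, the
    beamformer, and the RIS phase-shift matrix."""
    return -(action ** 2).sum(dim=-1, keepdim=True)

actor, critic = Actor(), Critic()
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

for step in range(200):
    s = torch.randn(32, STATE_DIM)   # sampled environment states
    a = actor(s)
    r = sum_rate_reward(a)
    # Critic regression toward the observed reward (one step, no bootstrapping).
    critic_loss = ((critic(s, a.detach()) - r) ** 2).mean()
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()
    # Deterministic policy gradient: maximize the critic's value of the actor's action.
    actor_loss = -critic(s, actor(s)).mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

# One forward pass of the trained actor yields both quantities at once:
# the first 2*N_T*K entries are the (real, imag) beamforming weights and the
# last N_RIS entries map to RIS phase shifts (e.g. scaled by pi).
```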
In this letter, an ultra-broadband rectifier with expanded dynamic input power range (IPR) for both wireless power transfer (WPT) and radio frequency energy-harvesting (RFEH) is proposed and analyzed. Expanded dynamic...
A unified affine-projection-like adaptive (UAPLA) algorithm is devised and verified for system identification. The UAPLA algorithm uses a generalized cost function encompassing some data-reusing methods to cope with ...
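The abstract is cut off here; for context, a minimal NumPy sketch of the standard affine projection algorithm (APA) for FIR system identification is given below. The unified algorithm in the paper generalizes this kind of data-reusing update through its cost function; the projection order, step size, and regularization used here are illustrative assumptions.

```python
import numpy as np

def apa_identify(x, d, order=8, proj=4, mu=0.5, delta=1e-3):
    """Standard affine projection algorithm (APA) for FIR system identification.

    x     : input signal driving the unknown system
    d     : desired signal (unknown system output + noise)
    order : number of adaptive filter taps
    proj  : projection order (number of reused past regressors)
    mu    : step size, delta : regularization
    """
    w = np.zeros(order)
    for n in range(order + proj, len(x)):
        # Data-reuse matrix built from the last `proj` input regressors.
        A = np.stack([x[n - k - order + 1 : n - k + 1][::-1] for k in range(proj)], axis=1)
        e = d[n - proj + 1 : n + 1][::-1] - A.T @ w   # a-priori error vector
        w += mu * A @ np.linalg.solve(A.T @ A + delta * np.eye(proj), e)
    return w

# Toy identification of a known FIR system.
rng = np.random.default_rng(0)
h_true = rng.standard_normal(8)
x = rng.standard_normal(5000)
d = np.convolve(x, h_true)[: len(x)] + 0.01 * rng.standard_normal(len(x))
w_hat = apa_identify(x, d)
print(np.round(w_hat - h_true, 3))  # residual coefficient error, near zero
```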