ISBN: 9798350353006 (print)
This paper proposes ConsistDreamer, a novel framework that lifts 2D diffusion models with 3D awareness and 3D consistency, thus enabling high-fidelity instruction-guided scene editing. To overcome the fundamental limitation of missing 3D consistency in 2D diffusion models, our key insight is to introduce three synergistic strategies that augment the input of the 2D diffusion model to become 3D-aware and explicitly enforce 3D consistency during the training process. Specifically, we design surrounding views as context-rich input for the 2D diffusion model, and generate 3D-consistent structured noise instead of image-independent noise. Moreover, we introduce self-supervised consistency-enforcing training within the per-scene editing procedure. Extensive evaluation shows that ConsistDreamer achieves state-of-the-art performance for instruction-guided scene editing across various scenes and editing instructions, particularly in complicated large-scale indoor scenes from ScanNet++, with significantly improved sharpness and fine-grained textures. Notably, ConsistDreamer is the first work capable of successfully editing complex (e.g., plaid/checkered) patterns. Our project page is at ***/ConsistDreamer.
This paper introduces studies on applying reinforcement learning to MAC protocols in wireless networks to enhance network performance. As the number of services and users demanding improved network performance increases, using r...
Deep-red organic light-emitting diodes (OLEDs) exhibit significant potential for applications in infrared medical treatment and infrared imaging. However, the OLED performance drops significantly with increasing emiss...
ISBN: 9783031697654; 9783031697661 (print)
This paper introduces PEANUTS, a system designed to improve I/O performance for Single Shared File (SSF) access in high-performance computing scenarios. An I/O performance evaluation of SSF on 100 nodes demonstrated write speeds of 2.47 TB/s, remote read speeds of 2.39 TB/s, and local read speeds of 7.75 TB/s. These results are close to the hardware's performance limits and represent improvements of approximately 2.2 times, 2.3 times, and 7.5 times, respectively, over existing state-of-the-art systems. A major feature of PEANUTS is the integration of persistent memory with RDMA one-sided communication, supporting high-speed, low-latency data transfers without the need for separate storage servers on the compute nodes. This configuration leaves all compute node CPU cores fully available for application processing. Seamless integration with the MPI runtime enables rapid data sharing through the MPI-IO interface. The advances made by PEANUTS suggest that large-scale SSF could become established as the standard I/O framework in HPC, demonstrating a viable way to overcome traditional performance limitations.
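As a rough sanity check on the figures quoted above, the following sketch derives the average per-node throughput and the baseline throughput implied by each reported speedup. It assumes decimal units (1 TB = 1000 GB) and an even spread of load across the 100 nodes; the helper names are hypothetical, and because the abstract gives the speedups only approximately, the implied baselines are estimates rather than reported measurements.

```python
# Aggregate throughput (TB/s) and approximate speedup, as quoted in the abstract.
NODES = 100
REPORTED = {
    "write": (2.47, 2.2),
    "remote read": (2.39, 2.3),
    "local read": (7.75, 7.5),
}


def per_node_gb_s(aggregate_tb_s, nodes=NODES):
    """Average per-node throughput in GB/s (assumes 1 TB = 1000 GB, even load)."""
    return aggregate_tb_s * 1000 / nodes


def implied_baseline_tb_s(aggregate_tb_s, speedup):
    """Baseline aggregate throughput implied by an approximate speedup factor."""
    return aggregate_tb_s / speedup


if __name__ == "__main__":
    for name, (tb_s, speedup) in REPORTED.items():
        print(
            f"{name}: {per_node_gb_s(tb_s):.1f} GB/s per node, "
            f"implied baseline about {implied_baseline_tb_s(tb_s, speedup):.2f} TB/s"
        )
```

Under these assumptions, the write figure corresponds to roughly 24.7 GB/s per node, which helps put the "close to the hardware's performance limits" claim in per-node terms.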
Reconfigurable hardware accelerators, known for their high performance and power efficiency, have yet to be fully leveraged for clustering low-dimensional data at realistic scales. In this work, we identify and addres...
Current point cloud upsampling techniques can already attain arbitrary upsampling rates after a single training run. However, during the upsampling process, it may be difficult to preserve the fine ...
ISBN: 9798400709029 (print)
Human parsing is a fundamental task that segments human images into distinct body parts and has a wide range of potential applications. Advances in image-capturing devices have led to a growing number of high-resolution human images. In high-resolution scenarios, receptive field, detail loss, and memory usage form a three-way trade-off. Existing human parsing methods designed for low-resolution inputs struggle to process high-resolution images efficiently because of their massive computation and memory demands. Some methods save resources by aggressively downsampling or encoding high-resolution inputs, at the cost of poor performance on details. To resolve these issues, we propose the Bilateral Edge-Perceiving Network (BiEPNet), consisting of a resource-friendly semantic-perceiving branch that acquires sufficient global information and a simple yet effective edge-perceiving branch that refines details. An attention mechanism is used to simultaneously enhance the perception of context and details, leading to better performance on boundary regions. To verify the effectiveness of BiEPNet, we contribute a high-resolution human parsing dataset, Human4K, containing 4,000 images, each with more than five million pixels. Extensive experiments on Human4K demonstrate that our method outperforms state-of-the-art methods.
The PETSc (Portable, Extensible Toolkit for Scientific Computation) library is one of the fundamental general-purpose numerical libraries in high-performance computing environments. It is widely employed for solving p...
Graph analytics has become a major workload in recent years. The underlying core algorithms tend to be irregular and data dependent, making them challenging to parallelize. Yet, these algorithms can be implemented and...
ISBN: 9798350318920; 9798350318937 (print)
LiDAR-based 3D object detectors usually adopt grid-based approaches to handle sparse point clouds efficiently. However, during this process, the down-sampled features inevitably lose spatial information, which can hinder the detectors from accurately predicting the location and size of objects. To address this issue, previous research has proposed elaborately designed neck and head modules to compensate for the information loss. Inspired by the core insights of these studies, we propose a novel voxel-based 3D object detector, named Re-VoxelDet, which combines three distinct components to achieve both good detection capability and real-time performance. First, to learn features from diverse perspectives without additional computational cost during inference, we introduce the Multi-view Voxel Backbone (MVBackbone). Second, to recover rich spatial and strong semantic information, we design the Hierarchical Voxel-guided Auxiliary Neck (HVANeck), which attentively integrates hierarchically generated voxel-wise features with RPN blocks. Third, we present the Rotation-based Group Head (RGHead), a simple yet effective head module designed with two groups according to the heading direction and aspect ratio of the objects. Through extensive experiments on Argoverse2, Waymo Open Dataset, and nuScenes, we demonstrate the effectiveness of our approach. Our results significantly outperform existing state-of-the-art methods. We plan to release our model and code in the near future.