Neural volumetric representations such as Neural Radiance Fields (NeRF) have emerged as a compelling technique for learning to represent 3D scenes from images with the goal of rendering photorealistic images of the scene from unobserved viewpoints. However, NeRF's computational requirements are prohibitive for real-time applications: rendering views from a trained NeRF requires querying a multilayer perceptron (MLP) hundreds of times per ray. We present a method to train a NeRF, then precompute and store (i.e., "bake") it as a novel representation called a Sparse Neural Radiance Grid (SNeRG) that enables real-time rendering on commodity hardware. To achieve this, we introduce 1) a reformulation of NeRF's architecture and 2) a sparse voxel grid representation with learned feature vectors. The resulting scene representation retains NeRF's ability to render fine geometric details and view-dependent appearance, is compact (averaging less than 90 MB per scene), and can be rendered in real time (more than 30 frames per second on a laptop GPU). Actual screen captures are shown in our video.
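To make the per-ray cost concrete, the following is a minimal sketch of the standard NeRF volume-rendering quadrature for a single ray. It is not the SNeRG implementation: the function name `composite_ray` and its arguments are assumptions made for illustration. In a plain NeRF, each density and color in the inputs comes from one MLP query, which is why hundreds of samples per ray are expensive; a baked representation would instead read comparable values from a precomputed grid.

```python
import numpy as np

def composite_ray(sigmas, colors, deltas):
    """Alpha-composite samples along one ray (illustrative helper, not SNeRG code).

    sigmas: (N,) volume densities at the N samples along the ray
    colors: (N, 3) RGB values at those samples
    deltas: (N,) distances between consecutive samples
    """
    sigmas = np.asarray(sigmas, dtype=float)
    colors = np.asarray(colors, dtype=float)
    deltas = np.asarray(deltas, dtype=float)

    # Opacity contributed by each sample.
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance: fraction of light that reaches sample i unoccluded.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = trans * alphas
    # Weighted sum of sample colors gives the pixel color.
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights
```

With an opaque first sample, the weights collapse onto that sample and the ray returns its color; with zero density everywhere, the ray returns black.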
The integration of vision and language has propelled the advancement of artificial intelligence systems. Visual Question Answering (VQA) stands at the nexus of computer vision and natural language processing, enabling...
Image segmentation is a critical step in different imaging and especially optical inspection applications: detection and recognition of objects, classification, analysis, and identification. Also, image gradient, as...
The escalating production of counterfeit notes, facilitated by advancements in color printing and scanning, poses a significant global challenge impacting economies and security. This issue, prevalent in countries lik...
This research introduces "Jaddah," an innovative AI-based system for the automated detection of road infrastructure defects using advanced computer vision and machine learning techniques. The system addresses...
Machine vision applications are commonly utilised in manufacturing lines as low-cost, high-precision measuring devices. Output facilities can accomplish high production numbers without mistakes thanks to these solutio...
ISBN: (print) 9780791887806
The harsh marine atmospheric conditions, including high temperatures, humidity, and salt spray, prevalent in the coastal areas of Hainan, pose a significant challenge to the durability of ground facilities and equipment, often resulting in corrosion and functional degradation. Hence, accurate monitoring of corrosion is paramount for maintaining coastal engineering structures. This study leveraged data from the Chinese National Center for Materials Corrosion and Protection Science to extract corrosion morphology features using image analysis techniques. Subsequently, a corrosion identification model was developed using machine learning algorithms. The model's accuracy was validated through various evaluation metrics. The research outcomes have practical applications in the maintenance and management of coastal engineering structures by providing automatic corrosion morphology recognition. This enables maintenance personnel to promptly undertake repair measures, thereby reducing maintenance costs and enhancing structural sustainability.
Out-of-distribution (OOD) learning presents a major challenge in machine learning as models must effectively generalize to previously unseen data. This challenge is prevalent in deep learning models, which tend to focus on the most dominant features in images. This narrow focus impedes OOD learning, where critical features are concealed or absent during testing, leading to reduced prediction accuracy. To address this issue, we introduce a novel data augmentation approach termed Dominant Feature Masking (DFM), inspired by human visual holistic processing. DFM strategically conceals and reveals the most prominent features within images, allowing neural networks to simultaneously capture both dominant and non-dominant attributes, thereby enhancing adaptability to OOD data. We evaluated DFM using a novel set of learning challenges termed Versatile Evaluation Benchmark (VEB), which assesses model performance on three distinct tasks: (i) augmented MNIST images to test resilience against diverse transformations; (ii) a novel dataset of unseen image classes to examine performance on new instances within familiar categories; and (iii) a dataset created by DALL-E to challenge class differentiation with artificially mixed features. Our results demonstrate that DFM significantly improves OOD generalization compared to traditional augmentation techniques, achieving marked enhancements across various conditions without compromising in-distribution testing accuracy. These findings underscore the potential of DFM to improve the performance of computer vision systems in various real-world scenarios, making them more robust and adaptable to unexpected data variations. By leveraging VEB, researchers will gain a deeper understanding of their models' generalization performance, ensuring that CNNs are well-equipped to handle the complexities of real-world applications. The source code and VEB datasets are available at https://***/Deepvisionary/DFM.
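The abstract does not say how DFM locates dominant features, so the sketch below is only one plausible reading of "concealing the most prominent features": given a saliency map from any attribution method, it hides the non-overlapping patch with the highest total saliency. The function `mask_dominant_patch`, the patch size, and the fill value are all hypothetical and are not taken from the authors' released code.

```python
import numpy as np

def mask_dominant_patch(image, saliency, patch=8, fill=0.0):
    """Hide the most salient patch of an image (hypothetical helper, not the DFM code).

    image:    (H, W) or (H, W, C) array
    saliency: (H, W) array; larger values mark more dominant pixels (assumed given)
    Returns the masked copy and the (row, col) of the masked patch's top-left corner.
    """
    image = np.asarray(image, dtype=float)
    saliency = np.asarray(saliency, dtype=float)
    h, w = saliency.shape

    # Scan non-overlapping patches and keep the one with the largest total saliency.
    best_score, best_yx = -np.inf, (0, 0)
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            score = saliency[y:y + patch, x:x + patch].sum()
            if score > best_score:
                best_score, best_yx = score, (y, x)

    # Conceal that patch in a copy so the original image stays available.
    y0, x0 = best_yx
    out = image.copy()
    out[y0:y0 + patch, x0:x0 + patch] = fill
    return out, best_yx
```

Training on a mix of original and masked copies would be one way to expose a network to both dominant and non-dominant attributes, in the spirit of the "conceals and reveals" description.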
Generative adversarial networks (GANs) have recently become a hot research topic; however, they have been studied since 2014, and a large number of algorithms have been proposed. Nevertheless, few comprehensive studies explain the connections among different GAN variants and how they have evolved. In this paper, we attempt to provide a review of the various GAN methods from the perspectives of algorithms, theory, and applications. First, the motivations, mathematical representations, and structures of most GAN algorithms are introduced in detail, and we compare their commonalities and differences. Second, theoretical issues related to GANs are investigated. Finally, typical applications of GANs in image processing and computer vision, natural language processing, music, speech and audio, the medical field, and data science are discussed.
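For orientation, the mathematical representation that most GAN variants start from is the minimax objective of the original 2014 formulation, reproduced here as a reference point rather than as a summary of the review's own notation. The symbols are the conventional ones: $G$ is the generator, $D$ the discriminator, $p_{\text{data}}$ the data distribution, and $p_z$ the noise prior.

```latex
\min_{G} \max_{D} V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)} \bigl[ \log D(x) \bigr]
  + \mathbb{E}_{z \sim p_{z}(z)} \bigl[ \log \bigl( 1 - D(G(z)) \bigr) \bigr]
```

The discriminator is trained to raise both terms by scoring real samples near 1 and generated samples near 0, while the generator is trained to lower the second term by producing samples the discriminator accepts.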
Biological vision systems inspire processing methods in computer vision applications. This paper employs the insights of vision systems in hardware and presents a pixel-parallel, reconfigurable, and layer-based hierarchical architecture for smart image sensors. The architecture aims to bring computation close to the sensor to achieve high acceleration for different machine vision applications while consuming low power. We logically divide the image into multiple regions and perform pixel-level and region-level processing after removing spatiotemporal redundancy. The processors use bio-inspired algorithms to activate the regions that contain a region of interest of the scene. The hierarchical processing breaks with traditional sequential image processing and introduces parallelism for machine vision applications. Also, we make the hardware design reconfigurable even after fabrication so that the hardware is reusable for different applications. Simulation results show that the area overhead and power penalty for adding reconfigurable features stay in an acceptable range. We emphasize maximizing the operating speed and obtain 800 MHz. Besides, the design saves 84.01% and 96.91% dynamic power at the first and second stages of the hierarchy by removing redundant information. Furthermore, deploying high-level reasoning sequentially only on the selected regions of the image makes it computationally inexpensive to execute a complex task in real time.
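The region-activation idea can be illustrated in software with a simple frame-difference test. This is a rough sketch under stated assumptions, not the paper's hardware design or its bio-inspired algorithm: the function `active_regions`, the square region size, and the threshold are hypothetical. A region is marked active only when its mean absolute change from the previous frame exceeds the threshold, so temporally redundant regions can be skipped by later stages.

```python
import numpy as np

def active_regions(prev, curr, region=4, threshold=10.0):
    """Flag image regions that changed between two frames (illustrative only).

    prev, curr: (H, W) grayscale frames
    region:     side length of each square region, in pixels (assumed value)
    threshold:  mean absolute difference above which a region is active (assumed value)
    Returns a boolean array of shape (H // region, W // region).
    """
    prev = np.asarray(prev, dtype=float)
    curr = np.asarray(curr, dtype=float)
    h, w = curr.shape
    rows, cols = h // region, w // region

    # Per-pixel temporal difference, cropped to a whole number of regions.
    diff = np.abs(curr - prev)[:rows * region, :cols * region]
    # Average the difference inside each region.
    per_region = diff.reshape(rows, region, cols, region).mean(axis=(1, 3))
    return per_region > threshold
```

Only the regions flagged `True` would be passed to more expensive processing, which mirrors the abstract's point that high-level reasoning runs only on selected regions.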