ISBN: 9798331529543; 9798331529550 (print)
Neural Radiance Fields (NeRF) have demonstrated exceptional performance in generating novel views of scenes by learning implicit volumetric representations from calibrated RGB images, without depth information. A major limitation of neural network-based view synthesis frameworks is their need for large training datasets, and effective data augmentation for view synthesis remains an unresolved challenge. NeRF models require extensive scene coverage from multiple views to accurately estimate radiance and density. Insufficient coverage reduces the model's ability to interpolate or extrapolate unseen parts of the scene. In this paper, we propose a novel pipeline that addresses this data augmentation issue using depth map information. We use depth image-based rendering (DIBR) to compensate for the shortage of views available for training NeRF. Experimental results indicate that our approach enhances the quality of images rendered with the NeRF framework, achieving an average peak signal-to-noise ratio (PSNR) increase of 7.2 dB, with a maximum improvement of 12 dB.
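The general idea of DIBR can be illustrated with a minimal forward-warping sketch: each pixel is back-projected to 3D using its depth and re-projected into a virtual camera. This is not the paper's pipeline; the function name, the shared intrinsic matrix `K`, the pose convention for `R` and `t`, and the simple far-to-near painting order are all assumptions made for illustration, and disocclusion holes are left unfilled.

```python
import numpy as np


def dibr_forward_warp(image, depth, K, R, t):
    """Warp `image` (H, W, C) with per-pixel `depth` (H, W) into a virtual view.

    Assumptions: both cameras share the intrinsic matrix K (3x3), and
    R (3x3), t (3,) map source-camera coordinates to virtual-camera coordinates.
    Returns the warped image and a boolean mask of the pixels that were filled.
    """
    h, w = depth.shape
    # Homogeneous pixel coordinates, shape (3, H*W), in row-major pixel order.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])
    # Back-project every pixel to a 3D point in the source camera frame.
    points = (np.linalg.inv(K) @ pix) * depth.ravel()
    # Move the points into the virtual camera frame and project them.
    points_virtual = R @ points + np.asarray(t, dtype=float).reshape(3, 1)
    z = points_virtual[2]
    proj = K @ points_virtual
    with np.errstate(divide="ignore", invalid="ignore"):
        x = np.round(proj[0] / z)
        y = np.round(proj[1] / z)
        valid = (z > 1e-6) & (x >= 0) & (x < w) & (y >= 0) & (y < h)
    # Paint far-to-near so that nearer points overwrite farther ones.
    idx = np.flatnonzero(valid)
    idx = idx[np.argsort(-z[idx])]
    src = image.reshape(h * w, -1)
    out = np.zeros_like(image)
    filled = np.zeros((h, w), dtype=bool)
    for i in idx:
        out[int(y[i]), int(x[i])] = src[i]
        filled[int(y[i]), int(x[i])] = True
    return out, filled
```

Pixels where `filled` is False are disocclusions; a practical augmentation pipeline would need to inpaint or mask them before using the warped view for training.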
ISBN: 9789612480363 (print)
We are living in the Information Age, and information has become a critically important component of our lives. Due to the success of the Internet, the amount of available information, including immense volumes of visual information, is growing explosively. Therefore, means for its faultless circulation and handling are urgently required. Considerable research efforts are dedicated today to addressing this necessity, but they are seriously hampered by the lack of a common agreement about "What actually is visual information?" Without an answer to this question, all our remarkable efforts inevitably end up as plain alchemy. I am trying to find a remedy for this bizarre and absurd situation. I propose my own definition of information (derived from Kolmogorov's complexity theory) and, from this point of view, attempt to revise the state of the art of contemporary image processing convention.
ISBN: 9781424448562 (print)
In the traditional asymmetric stereo video encoding scheme, the view for one eye is represented by a high-quality sequence and the view for the other eye by a lower-quality one. However, if the low-quality view is presented to the observer's dominant eye, the masking effect does not work. Based on this characteristic of human vision, this paper proposes a GOP-based resolution cross-switching asymmetric encoding scheme. By allocating degradation to both views in a balanced way over time, the scheme achieves, in our experiments, better compression efficiency than the JMVM reference software and better subjective visual quality than the traditional asymmetric stereo video encoding scheme. Our stereo video coding scheme thus offers a trade-off between compression performance and subjective visual quality.
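One way to picture GOP-based cross-switching is a schedule that alternates, once per group of pictures, which view is coded at reduced resolution. The abstract does not give the actual switching rule, so the strict alternation, the choice of the left view for the first GOP, and the function names below are assumptions for illustration only.

```python
def low_resolution_view(frame_index, gop_size):
    """Return which view ("left" or "right") is coded at low resolution.

    Assumption: the reduced-resolution view alternates every GOP,
    starting with the left view in GOP 0.
    """
    if gop_size <= 0:
        raise ValueError("gop_size must be positive")
    if frame_index < 0:
        raise ValueError("frame_index must be non-negative")
    gop_number = frame_index // gop_size
    return "left" if gop_number % 2 == 0 else "right"


def switching_plan(num_frames, gop_size):
    """List the low-resolution view for every frame of a sequence."""
    return [low_resolution_view(i, gop_size) for i in range(num_frames)]
```

With this schedule, each eye receives the degraded view for half of the GOPs, so neither eye (dominant or not) sees low quality for the whole sequence.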
ISBN: 9781479961399 (print)
Advances in image quality assessment have shown the potential added value of including visual attention aspects in objective quality metrics. Numerous models of visual saliency have been implemented and integrated in different quality metrics; however, their ability to improve a metric's performance in predicting perceived image quality has not been fully investigated. In this paper, we conduct an exhaustive comparison of 20 state-of-the-art saliency models in the context of image quality assessment. Experimental results show that adding computational saliency is, in general, beneficial to quality prediction. However, the performance gain obtained by adding saliency to a quality metric depends heavily on both the saliency model and the metric.
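A common way to add saliency to a quality metric is to weight a local quality (or distortion) map by the saliency map before pooling it into a single score. The sketch below shows that weighted pooling; it is a generic illustration rather than the integration scheme evaluated in the paper, and the function name and the fallback to a plain mean for an all-zero saliency map are assumptions.

```python
import numpy as np


def saliency_weighted_score(quality_map, saliency_map):
    """Pool a local quality map into one score, weighting by saliency.

    Both inputs are 2-D arrays of the same shape; saliency values are
    assumed to be non-negative. Falls back to the unweighted mean when
    the saliency map is all zeros.
    """
    q = np.asarray(quality_map, dtype=float)
    s = np.asarray(saliency_map, dtype=float)
    if q.shape != s.shape:
        raise ValueError("quality_map and saliency_map must have the same shape")
    total = s.sum()
    if total == 0:
        return float(q.mean())
    return float((q * s).sum() / total)
```

A uniform saliency map reproduces ordinary mean pooling, which makes it easy to measure how much a given saliency model changes the metric's output.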
ISBN: 0780370805 (print)
For several years we have been investigating sensors that measure traffic flow using image processing technology (we call them "image sensors"). From this investigation we conclude that, in order to introduce such sensors, those engaged in road management should evaluate the validity of the measured values and clarify the purposes they are to serve. In this paper we show some examples of that work using car velocities measured by image sensors.
ISBN: 0780367251 (print)
We present a robust and portable vision-based skin and face detection system developed for use in a multiple-speaker teleconferencing system that employs both audio and video cues. An omni-directional video sensor is used to provide a view of the entire visual hemisphere, thereby allowing for multiple dynamic views of all the participants. Regions of skin are detected using simple statistical methods, along with histogram color models for both the skin and non-skin color classes. Regions of skin belonging to the same person are grouped together, and the position of each person's face is inferred from simple spatial properties. Preliminary results suggest the system is capable of detecting human faces present in an omni-directional image despite the poor resolution inherent in such a sensor.
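Histogram color models for two classes are often combined through a per-pixel likelihood ratio: a pixel is labelled skin when its color is more probable under the skin histogram than under the non-skin histogram. The abstract does not say which statistical rule the system uses, so the likelihood-ratio test, the RGB binning, the Laplace smoothing, and all names below are assumptions for illustration.

```python
import numpy as np


def color_histogram(pixels, bins=8):
    """Build a smoothed, normalised 3-D RGB histogram from (N, 3) uint8 pixels.

    `bins` must divide 256. One pseudo-count is added to every bin so that
    unseen colors never get zero probability.
    """
    idx = np.asarray(pixels).astype(int) // (256 // bins)
    hist = np.zeros((bins, bins, bins))
    # np.add.at accumulates correctly even when the same bin appears many times.
    np.add.at(hist, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
    return (hist + 1) / (hist.sum() + bins ** 3)


def skin_mask(image, skin_hist, nonskin_hist, threshold=1.0):
    """Label each pixel of an (H, W, 3) uint8 image as skin (True) or not.

    A pixel is skin when P(color | skin) / P(color | non-skin) > threshold.
    """
    bins = skin_hist.shape[0]
    idx = np.asarray(image).astype(int) // (256 // bins)
    r, g, b = idx[..., 0], idx[..., 1], idx[..., 2]
    ratio = skin_hist[r, g, b] / nonskin_hist[r, g, b]
    return ratio > threshold
```

Connected regions of the resulting mask could then be grouped per person, as the abstract describes, before inferring face positions.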
ISBN: 9781728185514 (print)
In this paper, we propose a novel algorithm for summarization-based image resizing. Previous approaches required the precise locations of repeating patterns to be detected before the pattern removal step in resizing. However, it is difficult to find repeating patterns that are illuminated under different lighting conditions and viewed from different perspectives. To solve this problem, we first identify the regularity unit of the repeating patterns statistically. We then use the regularity unit in shift-map optimization to obtain a better resized image. The experimental results show that our method is competitive with other well-known methods.
ISBN: 9781728180687 (print)
A Class Activation Map (CAM) is a visualization of the target regions generated from a classification network. However, a classification network trained with class-level labels responds strongly to only a few features of an object and therefore cannot discriminate the whole target. We believe that the original labels used in classification tasks are not enough to describe all features of the objects. If each image is annotated with more detailed labels, such as class-agnostic attribute labels, the network may be able to mine a larger CAM. Motivated by this idea, we propose and design common attribute labels, which are lower-level labels summarized from the original image-level categories to describe more details of the target. Moreover, our proposed labels generalize well to unknown categories, since attributes (such as head and body) are common to several categories (such as dog and cat) and are class-agnostic. This is why we call them common attribute labels: they are lower-level and more general than traditional labels. We complete the annotation work on the PASCAL VOC2012 dataset and design a new architecture that successfully classifies these common attribute labels. After the features of the attribute labels are fused into the original categories, our network can mine larger CAMs of objects. Our method achieves visually better CAM results and higher evaluation scores than traditional methods.
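For background, the commonly used CAM formulation computes the map for class c as a weighted sum of the last convolutional feature maps, M_c(x, y) = sum_k w_k^c f_k(x, y), where w_k^c are the classifier weights that follow global average pooling. The sketch below implements only this baseline formula, not the attribute-label fusion proposed in the paper; the function name and the min-max normalisation for display are assumptions.

```python
import numpy as np


def class_activation_map(features, fc_weights, class_idx):
    """Compute a CAM normalised to [0, 1].

    features:   (K, H, W) feature maps from the last convolutional layer.
    fc_weights: (C, K) weights of the linear classifier applied after
                global average pooling.
    class_idx:  index of the class to visualise.
    """
    features = np.asarray(features, dtype=float)
    weights = np.asarray(fc_weights, dtype=float)[class_idx]
    # Weighted sum over the K channels gives an (H, W) map.
    cam = np.tensordot(weights, features, axes=1)
    # Min-max normalisation, for visualisation only.
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

Because the weights come from class-level supervision alone, the map tends to highlight only the most discriminative parts, which is the limitation the abstract sets out to address.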
ISBN: 9798350343557 (print)
Printed circuit board (PCB) assemblies in everyday electronic devices are mass-produced. Because of this production volume, a fast method of visual inspection is necessary. An integral part of visual inspection systems is PCB component classification. In this paper, we explore the use of the Vision Transformer (ViT), a recent state-of-the-art image classification approach, for PCB component classification. We employ several ViT models available in the literature and also propose a new compact, efficient, and high-performing ViT model, named ViT-Mini. We conduct extensive experiments on the FICS-PCB dataset to compare the performance of the ViT models. The highest accuracy achieved is 99.46% for capacitor and resistor classification and 96.52% for classification of capacitor, resistor, inductor, transistor, diode, and IC. The performance of the proposed compact model is comparable to that of larger models, which indicates its suitability for real-time applications.
ISBN: 9781665475921 (print)
Compared to RGB images, which are non-linearly mapped from RAW data by the Image Signal Processor (ISP), RAW data are linear in scene radiance and contain more native information, which makes them better suited to modeling in many vision tasks. This work proposes to enhance low-light images in the RAW domain via a cross-scale framework using paired Fast Fourier Convolution (FFC) and Transformer modules, driving the network to characterize images effectively. The framework has three scales that abstract low-level, mid-level, and high-level representations of the input image. We embed a paired FFC and Transformer in each scale to extract and aggregate spatial-spectral information. Specifically, by transforming features from the spatial domain into the spectral domain with FFC, pixel correlations can be exploited both locally and globally, generating representative features for the input image. The Transformer, which uses a multi-head self-attention mechanism, is then applied to aggregate and embed important features. Experimental results demonstrate that our method significantly outperforms state-of-the-art low-light enhancement methods in both full-reference assessment metrics, including PSNR, MPSNR, and SSIM, and no-reference metrics, such as NIMA. The results of the proposed method are also more visually pleasing than those of other methods.
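The reason a Fourier-domain convolution captures global correlations is that every frequency coefficient depends on all pixels, so even a pointwise (1x1) operation on the spectrum mixes information across the whole image. The sketch below shows that spectral step in isolation; it is a simplified illustration, not the paper's network. The function name, the `relu` switch, and the omission of normalisation layers and of the local spatial branch are assumptions.

```python
import numpy as np


def spectral_transform(x, weight, relu=True):
    """Apply a pointwise convolution to a feature map in the Fourier domain.

    x:      (C, H, W) real-valued feature map.
    weight: (2C, 2C) matrix acting as a 1x1 convolution over the stacked
            real and imaginary parts of the spectrum.
    relu:   apply a ReLU to the mixed spectrum (assumed activation).
    Returns a real (C, H, W) feature map.
    """
    c, h, w = x.shape
    # Real 2-D FFT over the last two axes: shape (C, H, W // 2 + 1), complex.
    spec = np.fft.rfft2(x, norm="ortho")
    # Stack real and imaginary parts along the channel axis: (2C, H, W // 2 + 1).
    stacked = np.concatenate([spec.real, spec.imag], axis=0)
    # 1x1 convolution = the same channel-mixing matrix at every frequency.
    mixed = np.einsum("oc,chw->ohw", weight, stacked)
    if relu:
        mixed = np.maximum(mixed, 0.0)
    out_spec = mixed[:c] + 1j * mixed[c:]
    # Back to the spatial domain at the original resolution.
    return np.fft.irfft2(out_spec, s=(h, w), norm="ortho")
```

With an identity `weight` and `relu=False` the function returns its input unchanged, which is a convenient sanity check before plugging in learned weights.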