ISBN: (Print) 9781728185514
Non-Lambertian objects present an appearance that depends on the viewer's position relative to the surrounding scene. Unlike diffuse objects, their features move non-linearly with the camera, which prevents rendering them with existing Depth Image-Based Rendering (DIBR) approaches or triangulating their surface with Structure-from-Motion (SfM). In this paper, we propose an extension of the DIBR paradigm that describes these non-linearities by replacing depth maps with more complete multi-channel "non-Lambertian maps", without attempting a 3D reconstruction of the scene. We study the importance of each coefficient of the proposed map, measuring the trade-off between visual quality and data volume needed to optimally render non-Lambertian objects. We compare our method to other state-of-the-art image-based rendering methods and outperform them, with promising subjective and objective results on a challenging dataset.
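As a rough illustration of how such a map could generalize DIBR warping, the sketch below treats each pixel's C map channels as coefficients of a polynomial in the camera offset, so a single-channel map degenerates to the linear, disparity-style displacement of a classic depth map. The layout and the polynomial model are assumptions for illustration, not the paper's actual parameterization.

```python
# Hypothetical sketch: warping with a multi-channel "non-Lambertian map".
# Assumption: the C channels act as per-pixel polynomial coefficients in
# the camera offset t, so C = 1 reduces to the linear (disparity-like)
# displacement of ordinary depth-map DIBR, while C > 1 captures the
# non-linear feature motion of specular/refractive objects.
import numpy as np

def warp_displacement(nl_map, t):
    """nl_map: (H, W, C) per-pixel coefficients; t: scalar camera offset.
    Returns the per-pixel horizontal displacement, shape (H, W)."""
    powers = float(t) ** np.arange(1, nl_map.shape[-1] + 1)  # [t, t^2, ...]
    return (nl_map * powers).sum(axis=-1)

# Usage: a 1-channel map behaves like a disparity map; a 3-channel map
# lets the displacement bend as the camera moves.
disp_lin = warp_displacement(np.random.rand(4, 6, 1), t=0.1)
disp_nl = warp_displacement(np.random.rand(4, 6, 3), t=0.1)
```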
ISBN: (Print) 9781728185514
Advances in media compression indicate significant potential to drive future media coding standards, e.g., the Joint Photographic Experts Group's learning-based image coding technologies (JPEG AI) and the Joint Video Experts Team's (JVET) deep neural network (DNN) based video coding. These codecs in fact represent a new type of media format. As a dire consequence, traditional media security and forensic techniques will no longer be of use. This paper presents an initial study of the effectiveness of traditional watermarking on two state-of-the-art learning-based image codecs. Results indicate that traditional watermarking methods are no longer effective. We also examine the forensic traces of various DNN architectures in the learning-based codecs by proposing a residual-noise-based source identification algorithm, which achieves 79% accuracy.
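The residual-noise idea lends itself to a compact sketch: estimate the codec's noise pattern as the difference between the decoded image and a denoised copy, summarize it with a few statistics, and train an off-the-shelf classifier on those features. The Gaussian denoiser and the statistics below are illustrative assumptions; they are not the paper's exact pipeline.

```python
# Hedged sketch of residual-noise source identification: each DNN codec
# leaves a characteristic noise pattern, so the residual between the
# decoded image and a denoised copy of it can act as a fingerprint.
# Denoiser choice and features are assumptions, not the paper's method.
import numpy as np
from scipy.ndimage import gaussian_filter

def residual_features(img):
    """img: (H, W) grayscale decoded image, values in [0, 1]."""
    residual = img - gaussian_filter(img, sigma=1.0)  # noise estimate
    var = residual.var() + 1e-12
    kurt = ((residual - residual.mean()) ** 4).mean() / var ** 2
    return np.array([residual.std(), np.abs(residual).mean(), kurt])

# Features from images decoded by known codecs can then train a standard
# classifier (e.g. sklearn's LogisticRegression) to attribute an unseen
# image to the codec that produced it.
```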
ISBN: (Print) 9781728185514
Simultaneously increasing the spatial resolution and frame rate of a video has attracted attention in recent years. Current one-stage space-time video super-resolution (STVSR) methods struggle with large motion and complex scenes, and are time-consuming and memory-intensive. We propose an efficient STVSR framework that correctly handles complicated scenes, such as occlusion and large motion, and generates results with clearer texture. On the REDS dataset, our method outperforms all existing one-stage methods. Our method is lightweight and can generate 720p frames at 16 fps on an NVIDIA GTX 1080 Ti GPU.
ISBN: (Print) 9781728185514
Multi-modal image registration has received increasing attention in computer vision and computational photography. However, nonlinear intensity variations prevent accurate feature point matching between image pairs of different modalities. We therefore propose a robust image descriptor for multi-modal image registration, named the shearlet-based modality robust descriptor (SMRD). Based on the discrete shearlet transform, anisotropic edge and texture information at multiple scales is encoded to describe the region around a point of interest. We conducted experiments comparing the proposed SMRD with several state-of-the-art multi-modal/multispectral descriptors on four different multi-modal datasets. The experimental results show that SMRD achieves superior performance in terms of precision, recall, and F1-score.
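To make the descriptor idea concrete, the stand-in below pools multi-scale, orientation-binned gradient energy around a keypoint; the actual SMRD encodes anisotropic edge/texture responses from the discrete shearlet transform instead, for which no standard Python API can be assumed here.

```python
# Illustrative stand-in for SMRD: pool orientation-binned edge energy
# over several scales around a keypoint. The real descriptor uses
# discrete shearlet coefficients rather than these Sobel gradients.
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def smrd_like_descriptor(img, y, x, radius=8, scales=(1, 2, 4), n_bins=8):
    """img: 2-D float array; (y, x) must lie >= radius from the border."""
    patch = img[y - radius:y + radius, x - radius:x + radius].astype(float)
    desc = []
    for s in scales:
        sm = gaussian_filter(patch, sigma=s)
        gy, gx = sobel(sm, axis=0), sobel(sm, axis=1)
        mag = np.hypot(gx, gy)
        ang = np.arctan2(gy, gx) % np.pi  # fold: robust to contrast inversion
        hist, _ = np.histogram(ang, bins=n_bins, range=(0, np.pi), weights=mag)
        desc.append(hist / (hist.sum() + 1e-9))  # per-scale normalization
    return np.concatenate(desc)
```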
ISBN: (Print) 9781728185514
Learning-based image codecs produce compression artifacts that differ from the blocking and blurring degradations introduced by conventional image codecs such as JPEG, JPEG 2000, and HEIC. In this paper, a crowdsourcing-based subjective quality evaluation procedure was used to benchmark a representative set of end-to-end deep learning-based image codecs submitted to the MMSP'2020 Grand Challenge on Learning-Based Image Coding and the JPEG AI Call for Evidence. For the first time, a double-stimulus methodology with a continuous quality scale was applied to evaluate this type of image codec. The subjective experiment is one of the largest ever reported, with more than 240 pairwise comparisons evaluated by 118 naive subjects. The results of benchmarking learning-based image coding solutions against conventional codecs are organized into a dataset of differential mean opinion scores, which is made publicly available along with the stimuli.
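A minimal sketch of the score such a dataset reports: in a double-stimulus test each subject rates both the hidden reference and the processed stimulus on a continuous 0-100 scale, and the differential mean opinion score is the mean per-subject difference. Subject screening and scale-realignment steps of BT.500-style protocols are omitted below.

```python
# Minimal DMOS computation for a double-stimulus, continuous-scale test.
# Each subject scores the hidden reference and the processed image;
# DMOS is the mean difference, reported with a 95% confidence interval.
import numpy as np

def dmos(ref_scores, test_scores):
    diffs = np.asarray(ref_scores, float) - np.asarray(test_scores, float)
    ci95 = 1.96 * diffs.std(ddof=1) / np.sqrt(len(diffs))
    return diffs.mean(), ci95

# e.g. dmos([82, 75, 90], [60, 58, 71]) -> (19.33..., ci)
```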
ISBN: (Print) 9781665416474
Many point-of-care tests rely on visual changes in color, shape, and size to convey results that can be read by the naked eye. One category of such tests is the agglutination test (AT), which relies on the clumping of micro-particles or cells in the presence of a target analyte. Although visual inspection is convenient and fast, it is subjective, prone to errors, and limits decision-making to coarse-grained results. We present an open-source software framework designed to facilitate the development and interpretation of ATs. The framework includes a web-based annotation interface for curating new image datasets, a computer vision pipeline that extracts informative AT features, and a machine learning module that lets AT developers study how an AT agglutinates over time in future experiments. We present two case studies of our framework being used to develop and interpret tests.
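One plausible example of an "informative AT feature" such a pipeline could extract is the size distribution of clumps, sketched below with OpenCV; the thresholding recipe and the three summary features are illustrative assumptions, not the framework's documented feature set.

```python
# Hedged sketch: segment dark clumps against a light background and
# summarize their size distribution. Otsu thresholding and these three
# features are illustrative; the actual framework extracts richer ones.
import cv2
import numpy as np

def clump_features(bgr_image, min_area=20):
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 0, 255,
                            cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    n, _, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    areas = stats[1:, cv2.CC_STAT_AREA]          # label 0 is the background
    areas = areas[areas >= min_area]             # drop speckle noise
    return {
        "clump_count": int(len(areas)),
        "mean_clump_area": float(areas.mean()) if len(areas) else 0.0,
        "agglutinated_fraction": float(areas.sum()) / mask.size,
    }
```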
ISBN: (Print) 9781728185514
In video coding, compressing high-frequency components, including noise and visually imperceptible content, is a long-standing problem: they consume a large amount of bandwidth while providing limited quality improvement. Directly applying denoising methods degrades coding performance and is hence not suitable for the video coding scenario. In this work, we propose a video pre-processing approach that leverages an edge-preserving filter designed specifically for video coding, whose filter parameters are optimized in the rate-distortion (R-D) sense. The proposed pre-processing method removes components with poor R-D cost-effectiveness while keeping important structural components, leading to higher coding efficiency and better subjective quality. Compared with conventional denoising filters, our pre-processing method using the R-D optimized edge-preserving filter improves coding efficiency by up to -5.2% BD-rate with low computational complexity.
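The parameter selection can be sketched as a plain Lagrangian search: filter the frame at several strengths, encode each candidate, and keep the one minimizing D + λR. The bilateral filter and the placeholder encode function below are stand-ins; the paper designs its own edge-preserving filter rather than reusing OpenCV's.

```python
# Hedged sketch of R-D-optimized pre-filtering: sweep the filter
# strength and keep the candidate with the lowest Lagrangian cost.
# `encode` is a placeholder returning (bits, reconstruction); the
# bilateral filter stands in for the paper's edge-preserving filter.
import cv2
import numpy as np

def rd_optimal_prefilter(frame, encode, lam, sigmas=(10, 25, 50, 75)):
    best_cost, best_frame = np.inf, frame
    for sigma in sigmas:
        cand = cv2.bilateralFilter(frame, d=9, sigmaColor=sigma, sigmaSpace=sigma)
        bits, recon = encode(cand)  # placeholder for a real encoder call
        # Distortion vs. the ORIGINAL frame, so structure must survive.
        dist = np.mean((frame.astype(float) - recon.astype(float)) ** 2)
        cost = dist + lam * bits
        if cost < best_cost:
            best_cost, best_frame = cost, cand
    return best_frame
```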
ISBN: (Print) 9781728185514
Existing cross-component video coding technologies have shown great potential for improving coding efficiency. The fundamental insight of cross-component coding is to exploit the statistical correlations among different color components. In this paper, a Cross-Component Sample Offset (CCSO) approach for image and video coding is proposed, inspired by the observation that the luma component tends to contain more texture while the chroma components are relatively smoother. The key component of CCSO is a nonlinear offset mapping implemented as a look-up table (LUT). The input of the mapping is the co-located reconstructed luma samples, and the output is offset values applied to the chroma components. The proposed method has been implemented on top of a recent version of libaom. Experimental results show that the proposed approach achieves 1.16% Random Access (RA) BD-rate savings over AV1 with a marginal encoding/decoding time increase.
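A decoder-side sketch of the mechanism: quantize the differences between a co-located luma sample and its neighbors into a small class index, look up the signalled offset for that class, and add it to the chroma sample. The 3-level delta quantizer and 9-entry table below illustrate the LUT idea but do not reproduce CCSO's exact classifier or filter shape.

```python
# Hedged sketch of a cross-component sample offset. Two luma deltas are
# quantized to {-1, 0, +1}, giving 9 classes; each class maps to a
# signalled chroma offset. Real CCSO's taps/bands differ in detail.
import numpy as np

def apply_ccso(chroma, luma_co, lut, step=8):
    """chroma, luma_co: (H, W) reconstructed planes at the same
    resolution; lut: sequence of 9 integer offsets, one per class."""
    out = chroma.astype(np.int32).copy()
    H, W = chroma.shape
    for y in range(H):
        for x in range(1, W - 1):
            d0 = int(luma_co[y, x - 1]) - int(luma_co[y, x])
            d1 = int(luma_co[y, x + 1]) - int(luma_co[y, x])
            q0 = min(max(d0 // step, -1), 1) + 1   # 0, 1 or 2
            q1 = min(max(d1 // step, -1), 1) + 1
            out[y, x] += lut[q0 * 3 + q1]          # nonlinear LUT offset
    return np.clip(out, 0, 255).astype(chroma.dtype)
```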
To address the issues that the center-free fuzzy c-means (CFFCM) clustering algorithm neglects the texture features and spatial information of pixels and has excessively high time complexity, a center-fre...
ISBN: (Print) 9781728185514
Recently, network-based image Compressive Sensing (ICS) algorithms have shown superior performance in reconstruction quality and speed, yet they are not interpretable. Here, we propose an Adaptive Threshold-based Sparse Representation Reconstruction Network (ATSR-Net), composed of Convolutional Sparse Representation subnets (CSR-subnets) and an Adaptive Threshold Generation subnet (ATG-subnet). The traditional iterations are unfolded into several CSR-subnets, which fully exploit local and nonlocal similarities. The ATG-subnet automatically determines a threshold map from the image's intrinsic characteristics for flexible feature selection. Moreover, we present a three-level consistency loss, operating at the pixel, measurement, and feature levels, to accelerate network convergence. Extensive experimental results demonstrate the superiority of the proposed network over existing state-of-the-art methods by large margins, both quantitatively and qualitatively.
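The core operation of each unrolled stage can be sketched as soft-thresholding with a learned, spatially varying threshold map, which is the role the ATG-subnet plays. The PyTorch module below is a minimal sketch with illustrative layer sizes, not the paper's architecture.

```python
# Minimal PyTorch sketch of adaptive soft-thresholding: a tiny subnet
# predicts a non-negative threshold per feature-map position (the
# ATG-subnet role), and features are shrunk accordingly inside each
# unrolled CSR stage. Sizes are illustrative only.
import torch
import torch.nn as nn

class AdaptiveSoftThreshold(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.atg = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Softplus(),  # keeps thresholds non-negative
        )

    def forward(self, feats):
        tau = self.atg(feats)  # (N, C, H, W) threshold map
        return torch.sign(feats) * torch.relu(feats.abs() - tau)
```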