ISBN: 9781728185514 (print)
Multimodal image registration has received increasing attention in computer vision and computational photography. However, nonlinear intensity variations prevent accurate feature point matching between image pairs of different modalities. We therefore propose a robust image descriptor for multi-modal image registration, named the shearlet-based modality robust descriptor (SMRD). Based on the discrete shearlet transform, SMRD encodes the anisotropic, multi-scale edge and texture information to describe the region around a point of interest. We conducted experiments comparing SMRD with several state-of-the-art multi-modal/multispectral descriptors on four different multi-modal datasets. The results show that SMRD outperforms the other methods in terms of precision, recall, and F1-score.
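The abstract reports precision, recall, and F1-score but does not define them. The following is a minimal sketch using the standard definitions for descriptor matching. It assumes that the number of correct matches, the number of returned matches, and the number of ground-truth correspondences are already known. The function name `matching_scores` and its parameters are hypothetical and are not taken from the paper.

```python
def matching_scores(correct_matches, total_matches, total_correspondences):
    """Return (precision, recall, F1) for a set of descriptor matches.

    Assumed definitions (not stated in the abstract):
      precision = correct matches / all returned matches
      recall    = correct matches / all ground-truth correspondences
      F1        = harmonic mean of precision and recall
    """
    precision = correct_matches / total_matches if total_matches else 0.0
    recall = correct_matches / total_correspondences if total_correspondences else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(matching_scores(80, 100, 160))
```

The guards against zero denominators return 0.0 rather than raising, which is one common convention when an image pair yields no matches.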
ISBN: 9781728185514 (print)
Learning-based image codecs produce compression artifacts that differ from the blocking and blurring introduced by conventional image codecs such as JPEG, JPEG 2000, and HEIC. In this paper, a crowdsourcing-based subjective quality evaluation procedure is used to benchmark a representative set of end-to-end deep learning-based image codecs submitted to the MMSP'2020 Grand Challenge on Learning-Based Image Coding and the JPEG AI Call for Evidence. For the first time, a double stimulus methodology with a continuous quality scale was applied to evaluate this type of image codec. The subjective experiment is one of the largest ever reported, including more than 240 pair comparisons evaluated by 118 naive subjects. The results of benchmarking learning-based image coding solutions against conventional codecs are organized into a dataset of differential mean opinion scores, which is made publicly available along with the stimuli.
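A differential mean opinion score (DMOS) can be illustrated with a small sketch. It assumes that each subject rates both the reference and the coded image on the same continuous scale, and that the differential score is the reference rating minus the test rating. The abstract does not describe the paper's actual processing (for example outlier screening or score normalization), so none is attempted here. The function name `dmos` is hypothetical.

```python
from statistics import mean


def dmos(reference_scores, test_scores):
    """Differential mean opinion score for one stimulus pair.

    reference_scores[i] and test_scores[i] are the ratings given by
    subject i to the reference and to the coded image, respectively.
    Assumption: differential score = reference rating - test rating.
    """
    if len(reference_scores) != len(test_scores):
        raise ValueError("each subject must rate both stimuli")
    if not reference_scores:
        raise ValueError("at least one subject is required")
    return mean(r - t for r, t in zip(reference_scores, test_scores))


if __name__ == "__main__":
    # Hypothetical ratings from three subjects on a 0-100 scale.
    print(dmos([80.0, 90.0, 70.0], [60.0, 75.0, 65.0]))
```

Under this sign convention, a larger DMOS means the coded image was judged further below its reference.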
ISBN: 9781665416474 (print)
Many point-of-care tests rely on visual changes in color, shape, and size to convey results that can be read by the naked eye. One category of such tests is the agglutination test (AT), which relies on the clumping of micro-particles or cells in the presence of a target analyte. Although visual inspection is convenient and fast, it is subjective, prone to errors, and limits decision-making to coarse-grained results. We present an open-source software framework designed to facilitate the development and interpretation of ATs. This framework includes a web-based annotation interface for curating new image datasets, a computer vision pipeline that extracts informative AT features, and a machine learning module that allows AT developers to study how an AT agglutinates over time in future experiments. We present two case studies in which our framework is used to develop and interpret tests.
ISBN: 9781728185514 (print)
In video coding, compressing high-frequency components is a persistent problem: noise and visually imperceptible content consume a large amount of bandwidth while providing limited quality improvement. Directly applying denoising methods degrades coding performance and is therefore unsuitable for video coding. In this work, we propose a video pre-processing approach that uses an edge-preserving filter specifically designed for video coding, whose parameters are optimized for rate-distortion (R-D) performance. The proposed pre-processing method removes components with low R-D cost-effectiveness before encoding while keeping important structural components, leading to higher coding efficiency and better subjective quality. Compared with conventional denoising filters, the proposed pre-processing method using the R-D optimized edge-preserving filter achieves a BD-rate of up to -5.2% (a bitrate reduction) with low computational complexity.
ISBN: 9781728185514 (print)
Existing cross-component video coding technologies have shown great potential for improving coding efficiency. The fundamental insight of cross-component coding is to exploit the statistical correlations among different color components. In this paper, a Cross-Component Sample Offset (CCSO) approach for image and video coding is proposed, inspired by the observation that the luma component tends to contain more texture while the chroma component is relatively smoother. The key component of CCSO is a nonlinear offset mapping mechanism implemented as a look-up table (LUT). The input of the mapping is the co-located reconstructed samples of the luma component, and the output is the offset values applied to the chroma component. The proposed method has been implemented on top of a recent version of libaom. Experimental results show that the proposed approach brings a 1.16% Random Access (RA) BD-rate saving on top of AV1 with a marginal increase in encoding/decoding time.
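The LUT-based mapping from co-located luma samples to chroma offsets can be sketched as follows. This is a simplified illustration, not the libaom implementation. It makes several assumptions that are not in the abstract: luma and chroma have the same resolution, only the left and right luma neighbors are used, each luma difference is quantized into three levels with a hypothetical `threshold`, and the LUT has nine entries. All function names are hypothetical.

```python
def quantize(delta, threshold):
    """Map a luma difference to one of three classes: 0, 1, or 2."""
    if delta < -threshold:
        return 0
    if delta > threshold:
        return 2
    return 1


def ccso_apply(luma, chroma, lut, threshold=8, bit_depth=8):
    """Add a LUT-derived offset to every chroma sample.

    luma, chroma: 2-D lists of equal size (assumed 4:4:4 sampling).
    lut: list of 9 integer offsets, indexed by the two quantized
         luma differences (left and right neighbor minus center).
    """
    height, width = len(chroma), len(chroma[0])
    max_val = (1 << bit_depth) - 1
    out = [row[:] for row in chroma]
    for y in range(height):
        for x in range(width):
            center = luma[y][x]
            left = luma[y][max(x - 1, 0)]            # clamp at picture edge
            right = luma[y][min(x + 1, width - 1)]   # clamp at picture edge
            idx = 3 * quantize(left - center, threshold) \
                + quantize(right - center, threshold)
            # Apply the offset and clip to the valid sample range.
            out[y][x] = min(max(chroma[y][x] + lut[idx], 0), max_val)
    return out


if __name__ == "__main__":
    # Hypothetical one-row picture and LUT, for illustration only.
    luma = [[100, 100, 150]]
    chroma = [[128, 128, 128]]
    lut = [0, -2, 0, 0, 0, 3, 0, 0, 0]
    print(ccso_apply(luma, chroma, lut))
```

In this sketch the offset depends only on the local luma pattern, so a flat luma neighborhood and an edge can receive different chroma corrections from the same table.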
In order to address the issue that the center-free fuzzy c-means (CFFCM) clustering algorithm does not consider the texture features and spatial information of pixels, and the time complexity is too high, a center-fre...
ISBN: 9781728185514 (print)
Recently, network-based image Compressive Sensing (ICS) algorithms have shown superior reconstruction quality and speed, yet they are not interpretable. Herein, we propose an Adaptive Threshold-based Sparse Representation Reconstruction Network (ATSR-Net), composed of the Convolutional Sparse Representation subnet (CSR-subnet) and the truly Adaptive Threshold Generation subnet (ATG-subnet). The traditional iterations are unfolded into several CSR-subnets, which can fully exploit local and nonlocal similarities. The ATG-subnet automatically determines a threshold map based on the intrinsic characteristics of the image for flexible feature selection. Moreover, we present a three-level consistency loss at the pixel, measurement, and feature levels to accelerate network convergence. Extensive experimental results demonstrate that the proposed network outperforms existing state-of-the-art methods by large margins, both quantitatively and qualitatively.
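The idea of selecting features with a spatially varying threshold map can be illustrated with element-wise soft thresholding. This is an assumption: the abstract does not state which thresholding operator ATSR-Net uses, and soft thresholding is simply the operator commonly found in unfolded sparse-coding networks. The function name `soft_threshold` is hypothetical, and the threshold values, which the ATG-subnet would produce, are supplied by hand here.

```python
def soft_threshold(features, thresholds):
    """Element-wise soft thresholding with a per-element threshold.

    Each feature x with threshold t becomes sign(x) * max(|x| - t, 0),
    so features whose magnitude is below their threshold are zeroed.
    """
    if len(features) != len(thresholds):
        raise ValueError("features and thresholds must have equal length")
    out = []
    for x, t in zip(features, thresholds):
        magnitude = max(abs(x) - t, 0.0)
        out.append(magnitude if x >= 0 else -magnitude)
    return out


if __name__ == "__main__":
    # Hypothetical feature values and threshold map, for illustration only.
    print(soft_threshold([3.0, -1.0, 0.5, -4.0], [1.0, 2.0, 0.25, 1.5]))
```

With a single global threshold every position is shrunk equally; a threshold map lets smooth regions be suppressed more aggressively than textured ones.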
ISBN: 9781728185514 (print)
With the rise of deep learning in computer vision, integrating deep learning with traditional video coding has yielded significant improvements, especially when a super-resolution neural network is applied as the post-processing module in a down-sampling-based video compression framework. However, because the traditional video codec is non-differentiable, the pre-processing module receives no back-propagated gradients, so down-sampling and up-sampling cannot be considered jointly. In this paper, we propose an end-to-end down-sampling-based video compression framework that applies convolutional neural networks for both down-sampling and up-sampling. We use a virtual codec neural network to approximate the actual video codec so that the gradient can be effectively back-propagated for joint training. Experimental results show that the proposed framework outperforms video compression with predefined down-sampling as well as various other joint training methods.
ISBN: 9781728185514 (print)
Image-to-image translation tasks, which have been widely investigated with generative adversarial networks (GANs), aim to map an image from the source domain to the target domain. The translated image can be inversely mapped to a reconstructed source image. However, existing GAN-based schemes lack the ability to accomplish reversible translation. To remedy this drawback, this paper proposes a nearly reversible image-to-image translation scheme in which the reconstructed source image is approximately distortion-free compared with the corresponding source image. The proposed scheme jointly considers inter-frame coding and embedding. First, we organize the GAN-generated reconstructed source image and the source image into a pseudo video. Then, the bitstream obtained by inter-frame coding is reversibly embedded in the translated image for nearly lossless source image reconstruction. Extensive experimental results and analysis demonstrate that the proposed scheme achieves a high level of performance in image quality and security.
ISBN: 9781728185514 (print)
As an emerging media format, virtual reality (VR) has attracted the attention of researchers. 6-DoF VR can reconstruct the surrounding environment with the help of the depth information of the scene, so as to provide users with an immersive experience. However, because panoramic images lack depth information, converting a panorama to 6-DoF VR remains a challenge. In this paper, we propose SPCNet, a new depth estimation method based on spherical convolution, to recover the depth information of panoramic images. In particular, spherical convolution is introduced to improve depth estimation accuracy by reducing the distortion caused by equirectangular projection (ERP). The experimental results show that SPCNet outperforms other advanced networks on many indicators. For example, its RMSE is 0.419 lower than that of UResNet. Moreover, the threshold accuracy of depth estimation is also improved.
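The two indicators named in the abstract, RMSE and threshold accuracy, can be computed as in the sketch below. The definitions are the ones conventionally used in depth estimation and are assumptions here, since the abstract does not give them. In particular, the default threshold of 1.25 is a common convention, not a value taken from the paper. Depth maps are flattened to plain lists of positive values, and the function names are hypothetical.

```python
import math


def rmse(pred, gt):
    """Root-mean-square error between predicted and ground-truth depths."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))


def threshold_accuracy(pred, gt, threshold=1.25):
    """Fraction of pixels whose depth ratio max(p/g, g/p) is below threshold.

    Assumes all depths are strictly positive.
    """
    if len(pred) != len(gt) or not gt:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, g in zip(pred, gt) if max(p / g, g / p) < threshold)
    return hits / len(gt)


if __name__ == "__main__":
    # Hypothetical depths in meters, for illustration only.
    predicted = [1.0, 2.0, 4.0]
    ground_truth = [1.0, 2.2, 2.0]
    print(rmse(predicted, ground_truth))
    print(threshold_accuracy(predicted, ground_truth))
```

A lower RMSE and a higher threshold accuracy both indicate better depth estimates, which matches the direction of improvement reported in the abstract.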