ISBN: 9781728173221 (print)
Multimodal image registration has received increasing attention in computer vision and computational photography. However, non-linear intensity variations prevent accurate feature point matching between image pairs of different modalities. To address this, a robust image descriptor for multimodal image registration is proposed, named the shearlet-based modality robust descriptor (SMRD). Based on the discrete shearlet transform, the anisotropic features of edge and texture information at multiple scales are encoded to describe the region around a point of interest. We conducted experiments to compare the proposed SMRD with several state-of-the-art multi-modal/multispectral descriptors on four different multi-modal datasets. The experimental results show that SMRD outperforms the other methods in terms of precision, recall and F1-score.
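The three reported metrics can be illustrated with a minimal sketch. It assumes the conventional definitions for descriptor matching (precision as correct matches over all returned matches, recall as correct matches over all ground-truth correspondences); the abstract does not state the exact evaluation protocol, and the function name `match_scores` and the counts below are hypothetical.

```python
def match_scores(num_correct, num_matches, num_ground_truth):
    """Return (precision, recall, F1) for a set of feature matches.

    Assumed definitions (not taken from the paper):
      precision = correct matches / matches returned by the descriptor
      recall    = correct matches / ground-truth correspondences
    """
    precision = num_correct / num_matches if num_matches else 0.0
    recall = num_correct / num_ground_truth if num_ground_truth else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: 40 matches returned, 30 of them correct,
# 60 ground-truth correspondences in the image pair.
precision, recall, f1 = match_scores(30, 40, 60)
```

Guarding the divisions keeps the function defined when a descriptor returns no matches at all, which can happen on difficult modality pairs.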
ISBN: 9781728173221 (print)
Learning-based image codecs produce compression artifacts that differ from the blocking and blurring degradation introduced by conventional image codecs such as JPEG, JPEG 2000 and HEIC. In this paper, a crowdsourcing-based subjective quality evaluation procedure is used to benchmark a representative set of end-to-end deep learning-based image codecs submitted to the MMSP'2020 Grand Challenge on Learning-Based Image Coding and the JPEG AI Call for Evidence. For the first time, a double stimulus methodology with a continuous quality scale was applied to evaluate this type of image codec. The subjective experiment is one of the largest ever reported, including more than 240 pair comparisons evaluated by 118 naïve subjects. The results of benchmarking learning-based image coding solutions against conventional codecs are organized in a dataset of differential mean opinion scores, which is made publicly available along with the stimuli.
ISBN: 9781728173221 (print)
Learning-based image deraining methods have achieved remarkable success in the past few decades. Currently, most deraining architectures are developed by human experts, which is a laborious and error-prone process. In this paper, we present a study on employing neural architecture search (NAS) to automatically design deraining architectures, dubbed AutoDerain. Specifically, we first propose a U-shaped deraining architecture, which mainly consists of residual squeeze-and-excitation blocks (RSEBs). Then, we define a search space in which we search over the convolution types and the use of the squeeze-and-excitation block. Because differentiable architecture search is memory-intensive, we propose a memory-efficient differentiable architecture search scheme (MDARTS). Inspired by the success of training binary neural networks, MDARTS optimizes the architecture parameters through the proximal gradient, which consumes only as much GPU memory as training a single deraining model. Experimental results demonstrate that the architecture designed by MDARTS is superior to manually designed derainers.
ISBN: 9781728173221 (print)
Plenoptic 2.0 videos, which record time-varying light fields with focused plenoptic cameras, are promising for immersive visual applications because they capture densely sampled light fields with high spatial resolution in the rendered sub-apertures. In this paper, an intra prediction method is proposed for efficiently compressing multi-focus plenoptic 2.0 videos. Based on an estimate of the zooming factor, novel gradient-feature-based zooming, adaptive-bilinear-interpolation-based tailoring and inverse-gradient-based boundary filtering are proposed and executed sequentially to generate accurate prediction candidates for weighted prediction, which works together with an adaptive skipping strategy. Experimental results demonstrate the superior performance of the proposed method relative to HEVC and state-of-the-art methods.
ISBN: 9781728173221 (print)
Learning-based visual data compression and analysis have recently attracted great interest from both academia and industry. More training and testing datasets, especially good-quality video datasets, are highly desirable for related research and standardization activities. A UHD video dataset, referred to as the Tencent Video Dataset (TVD), is established to serve various purposes such as training neural network-based coding tools and testing machine vision tasks including object detection and segmentation. The dataset contains 86 video sequences covering a variety of content. Each video sequence consists of 65 frames at 4K (3840x2160) spatial resolution. In this paper, the details of this dataset, as well as its performance when compressed by VVC and HEVC video codecs, are introduced.
ISBN: 9781728173221 (print)
Image-to-image translation tasks, which have been widely investigated with generative adversarial networks (GANs), aim to map an image from the source domain to the target domain. The translated image can be inversely mapped to a reconstructed source image. However, existing GAN-based schemes lack the ability to accomplish reversible translation. To remedy this drawback, this paper proposes a nearly reversible image-to-image translation scheme in which the reconstructed source image is approximately distortion-free compared with the corresponding source image. The proposed scheme jointly considers inter-frame coding and embedding. First, we organize the GAN-generated reconstructed source image and the source image into a pseudo video. Then, the bitstream obtained by inter-frame coding is reversibly embedded in the translated image for nearly lossless source image reconstruction. Extensive experimental results and analysis demonstrate that the proposed scheme can achieve a high level of performance in image quality and security.
ISBN: 9781728173221 (print)
Light field displays project hundreds of micro-parallax views so that users can perceive 3D without wearing glasses. Transmitting all of these views would require enormous bandwidth, even with conventional per-view video compression. MPEG Immersive Video (MIV) follows a smarter strategy by transmitting only key images and some metadata from which all the missing views are synthesized. We developed (and will demonstrate) real-time Depth Image Based Rendering software that follows this approach to synthesize all light field micro-parallax views from a couple of RGBD input views.
ISBN: 9781728173221 (print)
In many image processing tasks, pixels or blocks of pixels are missing or lost in only some channels. For example, during defective transmissions of RGB images, one or more blocks in one color channel may be lost. Nearly all modern applications in image processing and transmission use at least three color channels, and some applications employ even more bands, for example in the infrared and ultraviolet regions of the light spectrum. Typically, only some pixels and blocks in a subset of color channels are distorted. Thus, the other channels can be used to reconstruct the missing pixels, which is called spatio-spectral reconstruction. Current state-of-the-art methods rely purely on the local neighborhood, which works well for homogeneous regions. However, in high-frequency regions such as edges or textures, these methods fail to properly model the relationship between color bands. Hence, this paper introduces non-local filtering for building a linear regression model that describes the inter-band relationship and is used to reconstruct the missing pixels. Our novel method increases the PSNR by 2 dB on average and yields visually much more appealing images in high-frequency regions.
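The idea of reconstructing one channel from another through a linear inter-band model can be sketched as follows. This is a simplified illustration, not the paper's method: it fits a single least-squares line over all known pixels, whereas the abstract describes selecting the regression support by non-local filtering. The function names and sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ a * xs + b; returns (a, b)."""
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one known pixel to fit the model")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # Flat reference channel: fall back to the mean of the known pixels.
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov / var_x
    return a, mean_y - a * mean_x


def reconstruct(reference, damaged):
    """Fill None entries of `damaged` using a linear model of `reference`."""
    known = [(r, d) for r, d in zip(reference, damaged) if d is not None]
    a, b = fit_line([r for r, _ in known], [d for _, d in known])
    return [d if d is not None else a * r + b
            for r, d in zip(reference, damaged)]


# Hypothetical pixel row: the red channel lost two samples (None),
# the green channel arrived intact.
green = [10.0, 20.0, 30.0, 40.0, 50.0]
red = [25.0, 45.0, None, 85.0, None]
restored = reconstruct(green, red)
```

In this toy row the known red values follow red = 2 * green + 5 exactly, so the fitted model recovers the missing samples; on real images the relationship only holds locally, which is why the choice of regression support matters.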
ISBN: 9789813340572 (print)
The proceedings contain 36 papers. The special focus in this conference is on Communications, Signal Processing and VLSI. The topics include: Design and Analysis of Wearable Step-Shaped Sierpinski Fractal Antenna for WBAN Applications; Evaluation of Real-Time Embedded Systems in HILS and Delay Issues; Acoustic Feedback Cancellation Using Optimal Step-Size Control of the Partition Block Frequency-Domain Adaptive Filter; A Compact IPD Based on-Chip Bandpass Filter for 5G Radio Applications; An 18-Bit Incremental Zoom ADC with Reduced Delay Overhead Data Weighted Averaging for Battery Measurement Systems; A Non-Uniform Digital Filter Bank Hearing Aid for Aged People with Hearing Disability; Optimal Resource Allocation Based on Particle Swarm Optimization; Design and Simulation of a W-Band FMCW Radar for Cloud Profiling Application; Single Layer Asymmetric Slit Circularly Polarized Patch Antenna for WLAN Application; Recognition of Natural and Computer-Generated Images Using Convolutional Neural Network; Contour and Texture-Based Approaches for Dental Radiographic and Photographic Images in Forensic Identification; An Efficient Low Latency Router Architecture for Mesh-Based NoC; High-Performance Image Visual Feature Detection and Matching; Systolic-Architecture-Based Matrix Multiplications and Its Realization for Multi-Sensor Bias Estimation Algorithms; Fast Encoding Using X-Search Pattern and Coded Block Flag Fast Method; Energy Efficient and Accurate Hybrid CSA-CIA Adders; Machine Learning Oriented Dynamic Cost Factors-Based Routing in Communication Networks; Design of Look-Up Table with Low Leakage Power Enhanced Stability 10 T SRAM Cell; Conformal Omni Directional Antenna for GPS Applications; Total Internal Reflection Quasi-Phase-Matching-Based Broadband Second Harmonic Generation in Isotropic Semiconductor: A Comparative Analysis; Automated Bayesian Drowsiness Detection System Using Recurrent Convolutional Neural Networks.
ISBN: 9781728173221 (print)
In edge-cloud collaborative intelligence (CI), an unreliable transmission channel exists in the information path of the AI model performing the inference. It is important to be able to simulate the performance of the CI system across an imperfect channel in order to understand system behavior and develop appropriate error control strategies. In this paper we present a simulation framework called DFTS2, which enables researchers to define the components of the CI system in TensorFlow 2, select a packet-based channel model with various parameters, and simulate system behavior under various channel conditions and error/loss control strategies. Using DFTS2, we also present the most comprehensive study to date of the packet loss concealment methods for collaborative image classification models.
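The kind of experiment described here, sending intermediate features over a lossy packet channel and then concealing the losses, can be sketched in plain Python. This is not the DFTS2 API: the function names, the independent (Bernoulli) loss model and the copy-previous-packet concealment are all assumptions chosen for illustration.

```python
import random


def packetize(values, packet_size):
    """Split a flat feature vector into consecutive packets."""
    return [values[i:i + packet_size]
            for i in range(0, len(values), packet_size)]


def transmit(packets, loss_prob, rng):
    """Drop each packet independently with probability loss_prob.

    A lost packet is represented by None.
    """
    return [None if rng.random() < loss_prob else p for p in packets]


def conceal(received, lengths):
    """Replace each lost packet with the last received one, else zeros."""
    out, last = [], None
    for pkt, n in zip(received, lengths):
        if pkt is None:
            filler = last[:n] if last is not None else []
            pkt = filler + [0.0] * (n - len(filler))
        else:
            last = pkt
        out.append(pkt)
    return out


# Hypothetical feature vector produced by the edge part of a model.
features = [float(i) for i in range(10)]
packets = packetize(features, 4)
lengths = [len(p) for p in packets]

rng = random.Random(0)  # fixed seed so a run can be repeated
received = transmit(packets, loss_prob=0.3, rng=rng)
recovered = conceal(received, lengths)
```

Sweeping `loss_prob` and comparing downstream accuracy on `recovered` against the loss-free features is the kind of study such a simulator is meant to support.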