ISBN (print): 9798331529543; 9798331529550
This paper focuses on the Referring Image Segmentation (RIS) task, which aims to segment objects from an image based on a given language description and has significant potential in practical applications such as food safety detection. Recent advances using the attention mechanism for cross-modal interaction have achieved excellent progress. However, current methods tend to lack explicit interaction-design principles as guidelines, leading to inadequate cross-modal comprehension. Additionally, most previous works use a single-modal mask decoder for prediction, losing the advantage of full cross-modal alignment. To address these challenges, we present a Fully Aligned Network (FAN) that follows four cross-modal interaction principles. Guided by these principles, FAN achieves state-of-the-art performance on the prevalent RIS benchmarks (RefCOCO, RefCOCO+, G-Ref) with a simple architecture.
ISBN (print): 9781728185514
Light field displays project hundreds of micro-parallax views so that users perceive 3D without wearing glasses. Transmitting all views would require enormous bandwidth, even with conventional per-view video compression. MPEG Immersive Video (MIV) follows a smarter strategy by transmitting only key images and some metadata, from which all the missing views are synthesized. We developed (and will demonstrate) real-time Depth Image Based Rendering software that follows this approach to synthesize all light field micro-parallax views from a couple of RGBD input views.
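As a rough illustration of the depth-image-based rendering idea in this abstract, the sketch below forward-warps a single scanline of an RGBD view to a horizontally shifted virtual camera. It is not the authors' software. The pinhole disparity relation (disparity = focal length × baseline / depth), the function name `warp_scanline`, and all parameters are assumptions made for this example.

```python
def warp_scanline(colors, depths, focal_px, baseline, hole=None):
    """Forward-warp one scanline to a virtual camera shifted right by `baseline`.

    colors: list of pixel values; depths: list of positive depths in the same
    units as `baseline`. Returns a list of the same length; target pixels that
    receive no source pixel keep the `hole` value.
    """
    width = len(colors)
    out = [hole] * width
    zbuf = [float("inf")] * width  # nearest depth written so far per target pixel
    for x, (c, z) in enumerate(zip(colors, depths)):
        disparity = focal_px * baseline / z  # in pixels; nearer points shift more
        xt = int(round(x - disparity))       # camera moves right, content moves left
        # Keep the nearest surface when several source pixels land on one target.
        if 0 <= xt < width and z < zbuf[xt]:
            out[xt] = c
            zbuf[xt] = z
    return out


if __name__ == "__main__":
    row = warp_scanline(["bg", "bg", "fg", "bg"], [10, 10, 5, 10],
                        focal_px=10, baseline=1)
    print(row)
```

Positions left at `hole` are disocclusions. A full renderer would fill them from a second input view or by inpainting, which this sketch does not attempt.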
ISBN (print): 9781728185514
Observers differ in their visual attention when viewing the same scene. Inter-observer visual congruency (IOVC) describes the dispersion between different people's visual attention areas when they observe the same stimulus. Research on the IOVC of video is interesting but lacking. In this paper, we first introduce a measure for calculating the IOVC of video. An eye-tracking experiment is then conducted in a realistic movie-watching environment to establish a movie scene dataset. We then propose a method to predict the IOVC of video, which employs a dual-channel network to extract and integrate content and optical flow features. The effectiveness of the proposed prediction model is validated on our dataset, and the correlation between inter-observer congruency and video emotion is analyzed.
ISBN (print): 0819452114
In this paper we propose a novel post-processing system for compressed video sources. The proposed system explores the interaction between artifact reduction and sharpness/resolution enhancement to achieve optimal video quality for compressed (e.g. MPEG-2) sources. It is based on the Unified Metric for Digital Video Processing (UMDVP), which adaptively controls the post-processing algorithms according to the coding characteristics of the decoded video. Experiments carried out on several MPEG-2 encoded video sequences have shown significant improvement in picture quality compared to a system without UMDVP control and to a system that does not exploit the interaction between artifact reduction and video enhancement. The UMDVP, as well as the proposed post-processing system, can be easily adapted to different coding standards, such as MPEG-4 and H.26x.
ISBN (print): 0819452114
Computer analysis of magnetic resonance images is a useful aid to the diagnosis of disease. We present in this paper an automatic segmentation method for the principal brain tissues. It is based on the possibilistic clustering approach, which is an improved fuzzy c-means clustering method. In order to improve the efficiency of the clustering process, the initial value problem is discussed and solved by combining the clustering with a histogram analysis method. Our method can automatically determine the number of classes to cluster and the initial values for each class. It has been tested on a set of forty MR brain images with or without the presence of a tumor. The experimental results show that it is simple, rapid and robust in segmenting the principal brain tissues.
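To make the idea of histogram-based initialization concrete, here is a minimal sketch that picks histogram peaks as initial class centers and then refines them with standard fuzzy c-means on 1D intensities. Note the substitution: the abstract describes a possibilistic variant, whereas this sketch uses plain fuzzy c-means. The peak-picking rule and the names `histogram_peaks` and `fuzzy_c_means_1d` are assumptions for illustration, not the authors' procedure.

```python
def histogram_peaks(values, bins, lo, hi):
    """Return bin-center intensities of the local maxima of a simple histogram."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / width), bins - 1)
        counts[i] += 1
    peaks = []
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else 0
        right = counts[i + 1] if i < bins - 1 else 0
        # Strictly above the left neighbour so a flat plateau yields one peak.
        if c > 0 and c > left and c >= right:
            peaks.append(lo + (i + 0.5) * width)
    return peaks


def fuzzy_c_means_1d(values, centers, m=2.0, iters=50, eps=1e-9):
    """Refine initial centers with standard fuzzy c-means (fuzzifier m > 1)."""
    centers = list(centers)
    for _ in range(iters):
        # Membership of each sample in each cluster, from relative distances.
        memberships = []
        for v in values:
            dists = [abs(v - c) + eps for c in centers]
            row = [1.0 / sum((dj / dl) ** (2.0 / (m - 1.0)) for dl in dists)
                   for dj in dists]
            memberships.append(row)
        # Each center becomes the membership-weighted mean of the samples.
        for j in range(len(centers)):
            weights = [row[j] ** m for row in memberships]
            centers[j] = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return centers


if __name__ == "__main__":
    intensities = [10, 11, 12, 50, 51, 52, 90, 91, 92]
    init = histogram_peaks(intensities, bins=10, lo=0, hi=100)
    print(init, fuzzy_c_means_1d(intensities, init))
```

In this sketch the number of classes is simply the number of histogram peaks found. Real MR histograms are noisy, so some smoothing before peak detection would likely be needed.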
ISBN (print): 9781728185514
Recently, network-based Image Compressive Sensing (ICS) algorithms have shown superior performance in reconstruction quality and speed, yet they are not interpretable. Herein, we propose an Adaptive Threshold-based Sparse Representation Reconstruction Network (ATSR-Net), composed of the Convolutional Sparse Representation subnet (CSR-subnet) and the truly Adaptive Threshold Generation subnet (ATG-subnet). The traditional iterations are unfolded into several CSR-subnets, which can fully exploit local and nonlocal similarities. The ATG-subnet automatically determines a threshold map based on the intrinsic characteristics of the image for flexible feature selection. Moreover, we present a three-level consistency loss at the pixel, measurement, and feature levels to accelerate network convergence. Extensive experimental results demonstrate the superiority of the proposed network over existing state-of-the-art methods by large margins, both quantitatively and qualitatively.
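To illustrate what a spatially varying threshold map can do in a sparse-representation step, the sketch below applies element-wise soft-thresholding with one threshold per coefficient. The abstract does not state which shrinkage function ATSR-Net uses, so soft-thresholding is an assumption here (it is a common choice in unfolded sparse-coding networks), and `soft_threshold` is a hypothetical helper name.

```python
def soft_threshold(coeffs, thresholds):
    """Element-wise shrinkage sign(x) * max(|x| - t, 0) with a per-element t."""
    out = []
    for x, t in zip(coeffs, thresholds):
        mag = max(abs(x) - t, 0.0)  # coefficients with |x| <= t are zeroed
        out.append(mag if x >= 0 else -mag)
    return out


if __name__ == "__main__":
    # A larger threshold suppresses a coefficient more strongly.
    print(soft_threshold([3.0, -2.0, 0.5], [1.0, 0.5, 1.0]))
```

With a single global threshold every coefficient is shrunk by the same amount. A per-element map lets the shrinkage vary with local image content, which is the kind of flexibility the abstract attributes to the ATG-subnet.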
ISBN (print): 9781728185514
With the rise of deep learning in computer vision, the integration of deep learning with traditional video coding has brought significant improvements, especially by applying a super-resolution neural network as the post-processing module in the down-sampling-based video compression framework. However, the pre-processing module lacks back-propagated gradients for jointly optimizing down-sampling and up-sampling, because the traditional video codec is non-differentiable. In this paper, we propose an end-to-end down-sampling-based video compression framework that applies convolutional neural networks for both down-sampling and up-sampling. We use a virtual codec neural network to approximate the actual video codec so that the gradient can be effectively back-propagated for joint training. Experimental results show the superiority of our proposed framework over predefined down-sampling-based video compression and various joint training methods.
ISBN (print): 9781728185514
Non-Lambertian objects have an appearance that depends on the viewer's position relative to the surrounding scene. Contrary to diffuse objects, their features move non-linearly with the camera, which prevents rendering them with existing Depth Image-Based Rendering (DIBR) approaches or triangulating their surface with Structure-from-Motion (SfM). In this paper, we propose an extension of the DIBR paradigm to describe these non-linearities by replacing the depth maps with more complete multi-channel "non-Lambertian maps", without attempting a 3D reconstruction of the scene. We provide a study of the importance of each coefficient of the proposed map, measuring the trade-off between visual quality and data volume to optimally render non-Lambertian objects. We compare our method to other state-of-the-art image-based rendering methods and outperform them, with promising subjective and objective results on a challenging dataset.
ISBN (print): 9781728185514
Image-to-image translation tasks, which have been widely investigated with generative adversarial networks (GANs), aim to map an image from the source domain to the target domain. The translated image can be inversely mapped to a reconstructed source image. However, existing GAN-based schemes lack the ability to accomplish reversible translation. To remedy this drawback, this paper proposes a nearly reversible image-to-image translation scheme in which the reconstructed source image is approximately distortion-free compared with the corresponding source image. The proposed scheme jointly considers inter-frame coding and embedding. First, we organize the GAN-generated reconstructed source image and the source image into a pseudo video. Then, the bitstream obtained by inter-frame coding is reversibly embedded in the translated image for nearly lossless source image reconstruction. Extensive experimental results and analysis demonstrate that the proposed scheme can achieve a high level of performance in image quality and security.
ISBN (print): 9781665424257
Typefaces play an increasingly important role in dynamic digital interfaces, but there is still little direct evaluation of visual image perception related to typeface design, especially for use in interface typography. Based on an analysis of display screens, this research elaborates on the connection between display resolution and typeface design, and on the relationship between display polarity and the principles of vision optics. Furthermore, the essential attributes and requirements of the two genres of interface font are examined from the perspective of human visual image perception. Additionally, the visual processing of text information and the visual characteristics of the scanning state are elaborated, and the visual angle and spatial frequency of visual perception are identified as the cornerstones influencing the design of a typeface for user interfaces. The methodology of visual perception can be adapted to investigate questions relevant to typographic and typeface design.