ISBN: 9781424448562 (print)
In the traditional asymmetric stereo video encoding scheme, one eye's view is represented by a high-quality sequence and the other by a lower-quality one. However, if the low-quality view is presented to the observer's dominant eye, the masking effect does not work. Based on this characteristic of human vision, this paper proposes a GOP-based resolution cross-switching asymmetric encoding scheme. By allocating degradation to both views in a balanced way over time, the scheme achieves, in our experiments, better compression efficiency than the JMVM reference software and better subjective visual quality than the traditional asymmetric stereo video encoding scheme. It therefore offers a trade-off between compression performance and subjective visual quality.
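The cross-switching idea can be sketched as a per-GOP schedule. This is only an illustration: the function name `low_res_view_schedule`, the view labels "L" and "R", and the strict alternation at every GOP boundary are assumptions, since the abstract does not state the exact switching rule.

```python
def low_res_view_schedule(num_frames, gop_size):
    """Return, per frame, which view ("L" or "R") is coded at reduced resolution.

    Assumption: the reduced-resolution view alternates at every GOP boundary,
    so both views carry the degradation for an equal share of the time.
    """
    if gop_size <= 0:
        raise ValueError("gop_size must be positive")
    schedule = []
    for frame in range(num_frames):
        gop_index = frame // gop_size  # index of the GOP this frame belongs to
        schedule.append("L" if gop_index % 2 == 0 else "R")
    return schedule
```

Under these assumptions, `low_res_view_schedule(6, 2)` assigns the first GOP to the left view, the second to the right view, and so on, so neither eye receives the degraded view for long.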
ISBN: 0819407437 (print)
An audio-graphic teleconferencing system has been developed that uses ordinary personal computers (PCs) interconnected over a basic rate (2B+D) ISDN line. The system supports high-speed transmission of 200-dpi resolution documents, which are read by an optical scanner and presented on the displays of the conference participants. While looking at the same material, the conferees can converse interactively and make handwritten notations on the document via an LCD tablet for all participants to see. This paper describes the configuration and performance of the system, focusing mainly on the ISDN-based multimedia transmission method and the method of reducing and enlarging binary images.
ISBN: 9781479961399 (print)
Transform coding based on the discrete cosine transform (DCT) has been widely used in image coding standards. However, the coded images often suffer from severe visual distortions such as blocking artifacts. In this paper, we propose a novel patch-based image deblocking method to reduce blocking artifacts. Image patches are clustered and reconstructed by a low-rank approximation that is weighted by geodesic distance. Experimental results show that the proposed method achieves higher PSNR than state-of-the-art deblocking and denoising methods and that the processed images have good visual quality.
ISBN: 9798350343557 (print)
Printed circuit board (PCB) assemblies in everyday electronic devices are mass-produced. Because of this production volume, fast visual inspection is necessary. An integral part of visual inspection systems is PCB component classification. In this paper, we explore the use of the Vision Transformer (ViT), a recent state-of-the-art image classification approach, for PCB component classification. We employ several ViT models available in the literature and also propose a new compact, efficient, and high-performing ViT model named ViT-Mini. We conduct extensive experiments on the FICS-PCB dataset to compare the performance of the ViT models. The highest accuracy achieved is 99.46% for capacitor and resistor classification and 96.52% for classification of capacitors, resistors, inductors, transistors, diodes, and ICs. The performance of the proposed compact model is comparable to that of larger models, which indicates its suitability for real-time applications.
ISBN: 9781665475921 (print)
Compared to RGB images, which are non-linearly mapped from RAW data by the Image Signal Processor (ISP), RAW data are linear in scene radiance and contain more native information, which makes them better suited to modeling in many vision tasks. This work proposes to enhance low-light images in the RAW domain via a cross-scale framework that pairs Fast Fourier Convolution (FFC) with a Transformer, driving the network to characterize images effectively. The framework has three scales that abstract low-level, mid-level, and high-level representations of the input images. We embed a paired FFC and Transformer in each scale to extract and aggregate spatial-spectral information. Specifically, by transforming features from the spatial domain into the spectral domain with FFC, pixel correlations can be exploited both locally and globally, generating representative features for the input image. A Transformer with a multi-head self-attention mechanism is then applied to aggregate and embed important features. Experimental results demonstrate that our method significantly outperforms state-of-the-art low-light enhancement methods on both full-reference metrics, including PSNR, MPSNR, and SSIM, and no-reference metrics, such as NIMA. The results of the proposed method are also more visually pleasing than those of other methods.
ISBN: 9781728185514 (print)
In this paper, we propose a novel algorithm for summarization-based image resizing. Previous approaches require detecting the precise locations of repeating patterns before the pattern removal step in resizing. However, it is difficult to find repeating patterns that are illuminated under different lighting conditions and viewed from different perspectives. To solve this problem, we first identify the regularity unit of the repeating patterns statistically. We then use the regularity unit in shift-map optimization to obtain a better resized image. The experimental results show that our method is competitive with other well-known methods.
ISBN: 9781479902880 (print)
Depth estimation from a single image is an active problem in computer vision and is very important for 2D/3D image conversion. In general, the depth of an object in the scene varies with the amount of blur in the defocused image, so depth can be recovered by measuring the blur. In this paper, a new method for depth estimation based on the focus/defocus cue is presented, in which the entropy of the high-frequency subband of a wavelet decomposition is used as the measure of blur. The proposed method does not require selecting a threshold and can provide a pixel-level depth map. The experimental results show that the method is effective and reliable.
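As a rough illustration of using high-frequency wavelet entropy as a blur measure, the sketch below computes a one-level Haar diagonal-detail subband for a small patch and the Shannon entropy of its coefficient magnitudes. The Haar wavelet, the choice of the diagonal subband, the fixed histogram `bin_width`, and both function names are assumptions for illustration; the abstract does not specify these details.

```python
import math


def haar_diagonal_detail(patch):
    """One-level 2D Haar transform of `patch` (list of rows); return the
    diagonal (HH) detail coefficients, one per non-overlapping 2x2 block."""
    coeffs = []
    for r in range(0, len(patch) - 1, 2):
        for c in range(0, len(patch[0]) - 1, 2):
            a, b = patch[r][c], patch[r][c + 1]
            d, e = patch[r + 1][c], patch[r + 1][c + 1]
            coeffs.append((a - b - d + e) / 2.0)
    return coeffs


def subband_entropy(coeffs, bin_width=8.0):
    """Shannon entropy (in bits) of a histogram of coefficient magnitudes.

    Assumption: a fixed bin width, so a flat (fully blurred) patch maps to
    a single bin and gets entropy 0.
    """
    counts = {}
    for value in coeffs:
        key = int(abs(value) / bin_width)
        counts[key] = counts.get(key, 0) + 1
    total = len(coeffs)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

In this sketch a uniform patch yields zero entropy, while a patch with varied fine detail yields a higher value, which is the kind of ordering a blur measure needs before it is mapped to depth.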
ISBN: 9781665475921 (print)
With the rapid development of computer graphics and generative models, computers are capable of generating images containing non-existent objects and scenes. Moreover, computer-generated (CG) images may be indistinguishable from photographic (PG) images because of the strong representation ability of neural networks and major advances in 3D rendering technologies. The abuse of such CG images may pose risks to personal property and social stability. Therefore, in this paper, we propose a dual-stream neural network that extracts features enhanced by texture information for the CG and PG image classification task. First, the input images are converted to texture maps using rotation-invariant uniform local binary patterns. Then we employ an attention-based texture-aware feature enhancement module to fuse the features extracted from each stage of the dual-stream neural network. Finally, the features are pooled and regressed into the predicted results by fully connected layers. The experimental results show that the proposed method achieves the best performance on all three popular CG and PG classification databases. The ablation study and cross-database validation experiments further confirm the effectiveness and generalization ability of the proposed algorithm.
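The texture-map step can be illustrated with a small sketch of the rotation-invariant uniform local binary pattern. It assumes 8 neighbours on the 3x3 square around each pixel (no circular interpolation) and skips border pixels; the function names are hypothetical and the paper's actual neighbourhood settings are not given in the abstract.

```python
# Offsets of the 8 neighbours, listed in circular order around the centre.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_riu2(image, r, c):
    """Rotation-invariant uniform LBP code of pixel (r, c), in 0..9.

    Uniform patterns (at most 2 circular 0/1 transitions) are coded by the
    number of neighbours >= the centre; all other patterns share code 9.
    """
    centre = image[r][c]
    bits = [1 if image[r + dr][c + dc] >= centre else 0
            for dr, dc in NEIGHBOURS]
    transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
    return sum(bits) if transitions <= 2 else 9


def texture_map(image):
    """LBP code for every interior pixel of `image` (a list of rows)."""
    rows, cols = len(image), len(image[0])
    return [[lbp_riu2(image, r, c) for c in range(1, cols - 1)]
            for r in range(1, rows - 1)]
```

Because the code depends only on how many neighbours exceed the centre and on the number of transitions, an edge and its 90-degree rotation receive the same value.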
ISBN: 9781728180687 (print)
The unmixing of hyperspectral data is an active topic in the field of remote sensing. However, in the presence of various types of noise, especially noisy channels, the performance of unmixing approaches deteriorates seriously. Enhancing the robustness of unmixing methods is therefore worth studying. This paper presents a robust unmixing method based on the recently proposed multilinear mixing model, where the l(2,1) norm is adopted in the loss function to suppress the influence of noise. The sparseness of the abundances is also considered to improve parameter estimation. The resulting optimization problem is solved by the alternating direction method of multipliers (ADMM). Experiments on both synthetic and real images demonstrate the performance of the proposed unmixing strategy.
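To make the role of the l(2,1) norm concrete, the sketch below computes it as the sum of row-wise Euclidean norms and applies the standard row-wise shrinkage that serves as its proximal operator in ADMM-style solvers. Treating each row as one spectral band of the residual is an assumption, as are the function names; this is not the paper's actual update rule.

```python
import math


def l21_norm(matrix):
    """Sum of the Euclidean norms of the rows of `matrix`.

    Assumption: each row holds one spectral band, so a noisy band adds to
    the loss linearly rather than quadratically.
    """
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


def row_shrink(matrix, tau):
    """Proximal operator of tau * l21_norm: shrink each row toward zero,
    zeroing rows whose norm is at most tau."""
    shrunk = []
    for row in matrix:
        norm = math.sqrt(sum(v * v for v in row))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        shrunk.append([scale * v for v in row])
    return shrunk
```

Rows with small norm are set to zero entirely while large rows are only scaled, which is why this norm is commonly used to isolate a few corrupted bands.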
ISBN: 9781665475921 (print)
Computer vision tasks suffer from the high cost of collecting large amounts of labeled data. Few-shot learning (FSL) is a dominant approach to this problem because it offers a way to learn novel categories from few training samples. In FSL tasks, meta-learning and metric learning have achieved impressive results. However, performance is still limited by the large intra-class variance and small inter-class distance caused by the limited number of samples. To solve this problem, we propose a new method that integrates meta-learning and metric learning techniques. Specifically, we first propose a feature representation (FR) module to construct representative support class prototypes and query features. Then, we design a bias loss to minimize the bias between support and query samples. Furthermore, we design an intra-class loss to minimize the distance between the query class prototype and each query sample. We denote this model as ML-FDA and validate it on standard few-shot classification benchmark datasets (MiniImageNet, CIFAR-FS, FC100). The results show that our method improves on other methods of the same paradigm and achieves the best performance on most benchmarks. The ablation study and visualization analysis also demonstrate the effectiveness of our method.
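The intra-class loss described above can be sketched as follows. The mean-of-features prototype, the squared Euclidean distance, and the function names are assumptions chosen for illustration; the abstract does not define the exact form used in ML-FDA.

```python
def class_prototype(features):
    """Element-wise mean of the feature vectors of one class."""
    count = len(features)
    dim = len(features[0])
    return [sum(f[i] for f in features) / count for i in range(dim)]


def intra_class_loss(query_features):
    """Mean squared Euclidean distance between the query class prototype
    and each query sample of that class (assumed form of the loss)."""
    proto = class_prototype(query_features)
    distances = [sum((x - p) ** 2 for x, p in zip(f, proto))
                 for f in query_features]
    return sum(distances) / len(distances)
```

Minimizing this quantity pulls the query samples of a class toward their prototype, which directly targets the intra-class variance the abstract identifies as a limitation.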