ISBN (print): 9781728173221
In recent years, with the popularization of 3D technology, stereoscopic image quality assessment (SIQA) has attracted extensive attention. In this paper, we propose a two-stage binocular fusion network for SIQA that takes binocular fusion, binocular rivalry, and binocular suppression into account to imitate the complex binocular visual mechanism of the human brain. In addition, saliency generating layers (SGLs) are applied in the network to extract spatial saliency features from the left view, the right view, and the fusion view. Each SGL applies multi-scale dilated convolution to emphasize the essential spatial information in its input features. Experimental results on four public stereoscopic image databases demonstrate that the proposed method outperforms state-of-the-art SIQA methods on both symmetrically and asymmetrically distorted stereoscopic images.
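The abstract does not specify how an SGL is built internally, so the following is only a minimal sketch of the general idea of multi-scale dilated convolution used to produce a saliency map. It makes several assumptions:

- The 3x3 kernels and dilation rates 1, 2 and 3 are hypothetical.
- The responses at different dilation rates are summed.
- A sigmoid turns the summed response into a (0, 1) saliency map that reweights the input.

The function names `dilated_conv2d` and `saliency_generating_layer` are illustrative and do not come from the paper.

```python
import math

Grid = list[list[float]]


def dilated_conv2d(x: Grid, kernel: Grid, dilation: int) -> Grid:
    """3x3 cross-correlation with the given dilation; zero padding keeps the output size."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for ki in range(3):
                for kj in range(3):
                    # Kernel taps are spaced `dilation` pixels apart around (i, j).
                    r, c = i + (ki - 1) * dilation, j + (kj - 1) * dilation
                    if 0 <= r < h and 0 <= c < w:
                        acc += x[r][c] * kernel[ki][kj]
            out[i][j] = acc
    return out


def saliency_generating_layer(x: Grid, kernels: dict[int, Grid]) -> tuple[Grid, Grid]:
    """Hypothetical SGL: sum the responses at several dilation rates, squash them to (0, 1)
    with a sigmoid, and use the result to reweight the input feature map."""
    h, w = len(x), len(x[0])
    total = [[0.0] * w for _ in range(h)]
    for dilation, kernel in kernels.items():
        response = dilated_conv2d(x, kernel, dilation)
        for i in range(h):
            for j in range(w):
                total[i][j] += response[i][j]
    saliency = [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in total]
    weighted = [[x[i][j] * saliency[i][j] for j in range(w)] for i in range(h)]
    return saliency, weighted


# Illustrative (not learned) contrast kernel shared by dilation rates 1, 2 and 3.
contrast = [[-1.0, -1.0, -1.0], [-1.0, 8.0, -1.0], [-1.0, -1.0, -1.0]]
feature = [[0.0] * 7 for _ in range(7)]
feature[3][3] = 1.0  # a single strong activation
sal, out = saliency_generating_layer(feature, {1: contrast, 2: contrast, 3: contrast})
```

Larger dilation rates widen the receptive field without adding kernel weights. Summing several rates therefore mixes local and wider context, which is the usual motivation for multi-scale dilated convolution. In a trained network, the kernels would be learned rather than fixed.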
ISBN (print): 9781728173221
In this paper, we propose a new fast CU partition method for VVC intra coding based on the cross-block difference. This difference is measured from the gradients and the content of the sub-blocks produced by a partition, and it is used to decide which horizontal and vertical partition modes can be skipped as unnecessary. This guidance allows block partitions to be determined quickly. Compared with VVC, the proposed method saves 41.64% of the encoding time on average, with an average BD-rate increase of only 0.97%.
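The exact difference measure and thresholds are not given in the abstract. The following is a minimal sketch of the idea under these assumptions:

- "Gradient" is the mean absolute first difference inside a sub-block.
- "Content" is the sub-block's mean sample value.
- The cross-block difference is the sum of the absolute differences of those two quantities between the halves a binary split would create.
- The `threshold` value is hypothetical.

If the two halves of a split look alike, that split direction is treated as unnecessary and skipped.

```python
def avg_gradient(block: list[list[float]]) -> float:
    """Mean absolute horizontal plus vertical first difference within a block."""
    h, w = len(block), len(block[0])
    total = 0.0
    for i in range(h):
        for j in range(w):
            if j + 1 < w:
                total += abs(block[i][j + 1] - block[i][j])
            if i + 1 < h:
                total += abs(block[i + 1][j] - block[i][j])
    return total / (h * w)


def mean_content(block: list[list[float]]) -> float:
    """Mean sample value of a block."""
    values = [v for row in block for v in row]
    return sum(values) / len(values)


def cross_block_difference(a: list[list[float]], b: list[list[float]]) -> float:
    """Hypothetical measure: gradient difference plus content difference of two sub-blocks."""
    return abs(avg_gradient(a) - avg_gradient(b)) + abs(mean_content(a) - mean_content(b))


def modes_to_skip(cu: list[list[float]], threshold: float) -> set[str]:
    """Return the binary split directions whose two halves are too similar to be worth testing."""
    h, w = len(cu), len(cu[0])
    top, bottom = cu[: h // 2], cu[h // 2:]
    left = [row[: w // 2] for row in cu]
    right = [row[w // 2:] for row in cu]
    skip = set()
    if cross_block_difference(top, bottom) < threshold:
        skip.add("horizontal")
    if cross_block_difference(left, right) < threshold:
        skip.add("vertical")
    return skip


# An 8x8 CU with a vertical edge: the left half is dark and the right half is bright.
edge_cu = [[0.0] * 4 + [100.0] * 4 for _ in range(8)]
print(modes_to_skip(edge_cu, threshold=10.0))  # {'horizontal'}
```

Here the top and bottom halves are identical, so the horizontal split is skipped. The left and right halves differ strongly, so the vertical split is kept for rate-distortion checking.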
The scene graph generation (SGG) task has attracted increasing attention in recent years. The goal of SGG is to predict the relations between pairs of objects within an image. Because the dataset annotations follow a long-tailed distribution, SGG performance is still far from satisfactory. To address the long-tailed problem, existing methods pursue unbiased learning in various ways. However, we argue that the essence of the long-tailed problem in SGG is that the classifier is severely affected by the long-tailed data. To handle this issue, we propose a novel network named ERBNet, which contains a relation feature fusion (RFF) encoder that constructs effective representations of the relations between objects, and a nearest class mean (NCM) classifier that predicts relations based on relation feature similarities. Extensive experimental results show that ERBNet outperforms several state-of-the-art methods on the challenging Visual Genome dataset.
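A nearest class mean classifier is simple enough to sketch directly. Each relation class is represented by the mean of its training features, and a query is assigned to the class whose mean is most similar. Because every class contributes exactly one mean regardless of how many samples it has, rare classes are not outvoted by frequent ones.

The choice of cosine similarity and the toy 2-D features and predicate names below are assumptions, not details from the paper.

```python
import math


def class_means(features: list[list[float]], labels: list[str]) -> dict[str, list[float]]:
    """Average the feature vectors of each class."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y], counts[y] = [0.0] * len(f), 0
        sums[y] = [s + v for s, v in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm or 1e-12)  # guard against zero vectors


def ncm_predict(feature: list[float], means: dict[str, list[float]]) -> str:
    """Predict the class whose mean feature is most similar to the query."""
    return max(means, key=lambda y: cosine(feature, means[y]))


# A frequent ("head") predicate with four samples and a rare ("tail") one with a single sample.
on_feats = [[1.0, 0.1], [0.9, 0.0], [1.0, -0.1], [1.1, 0.0]]
riding_feats = [[0.0, 1.0]]
means = class_means(on_feats + riding_feats, ["on"] * 4 + ["riding"])
print(ncm_predict([0.1, 0.9], means))  # riding
```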
ISBN (print): 9781728173221
Here we propose a novel affine registration method for planar curves. It is based on a pseudo-inverse algorithm applied to multi-scale versions of the source and target curves. The registration system selects the relevant scales based on optimized L2 distances, and the smoothing parameters are retrieved with the Gaussian Expectation-Maximization (EM) algorithm. Finally, we solve the global system formed by the equations corresponding to the EM-selected scales.
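The following minimal sketch shows the core of such a system under these assumptions:

- Source and target points are sampled in correspondence, with the same index on both curves.
- The smoothing scales are given directly. The EM-based scale selection is not reproduced.

At each scale, both curves are smoothed with a normalized circular Gaussian. The [x, y, 1] equations from all scales are stacked into one global system. That system is solved in the least-squares (pseudo-inverse) sense through the normal equations, which coincide with the pseudo-inverse solution when the system has full column rank.

```python
import math

Point = tuple[float, float]


def smooth_closed_curve(points: list[Point], sigma: float) -> list[Point]:
    """Circular Gaussian smoothing of a closed curve; sigma == 0 returns the curve unchanged."""
    if sigma == 0:
        return list(points)
    n = len(points)
    radius = int(3 * sigma)
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-k * k / (2 * sigma * sigma)) for k in offsets]
    z = sum(weights)  # normalized weights make smoothing commute with affine maps
    return [
        (sum(wt * points[(i + k) % n][0] for k, wt in zip(offsets, weights)) / z,
         sum(wt * points[(i + k) % n][1] for k, wt in zip(offsets, weights)) / z)
        for i in range(n)
    ]


def solve(m: list[list[float]], v: list[float]) -> list[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(v)
    a = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def register_affine(source: list[Point], target: list[Point], sigmas: list[float]):
    """Stack [x, y, 1] rows from every scale and solve the least-squares system
    (M^T M) p = M^T b once per output coordinate."""
    rows, bu, bv = [], [], []
    for s in sigmas:
        for (x, y), (u, v) in zip(smooth_closed_curve(source, s), smooth_closed_curve(target, s)):
            rows.append((x, y, 1.0))
            bu.append(u)
            bv.append(v)
    mtm = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    a, b, tx = solve(mtm, [sum(r[i] * u for r, u in zip(rows, bu)) for i in range(3)])
    c, d, ty = solve(mtm, [sum(r[i] * v for r, v in zip(rows, bv)) for i in range(3)])
    return ((a, b), (c, d)), (tx, ty)


# Synthetic test: warp a closed curve with a known affine map, then recover it.
source = [(10 * math.cos(t) + 2 * math.cos(3 * t), 6 * math.sin(t))
          for t in (2 * math.pi * k / 32 for k in range(32))]
A_true, t_true = ((1.2, 0.3), (-0.2, 0.9)), (5.0, -2.0)
target = [(A_true[0][0] * x + A_true[0][1] * y + t_true[0],
           A_true[1][0] * x + A_true[1][1] * y + t_true[1]) for x, y in source]
A_est, t_est = register_affine(source, target, sigmas=[0.0, 1.0, 2.0])
```

Normalizing the Gaussian weights is what lets equations from different scales share one affine model. Smoothing an affinely transformed curve gives the same result as transforming the smoothed curve.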
ISBN (print): 9781728173221
RDPlot is an open-source GUI application for plotting rate-distortion (RD) curves and calculating Bjøntegaard Delta (BD) statistics [1]. It can parse the output of commonly used reference software packages as well as *.csv- and *.xml-formatted files. Once the data are parsed, RDPlot lets users evaluate video coding results interactively: several measures can be plotted against the bitrate, and BD measurements can be computed accordingly. Plots and the corresponding BD statistics can also be exported and integrated directly into LaTeX documents.
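The following sketch shows the classic Bjøntegaard BD-rate calculation. It fits a cubic polynomial of log-rate as a function of PSNR for each curve, integrates both fits over the overlapping PSNR range, and converts the average log-rate difference into a percentage. The abstract does not say which interpolation RDPlot itself uses, so treat this as the textbook formulation rather than RDPlot's code. The function names are hypothetical.

```python
import math


def _solve(m: list[list[float]], v: list[float]) -> list[float]:
    """Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def _fit_cubic(xs: list[float], ys: list[float], center: float) -> list[float]:
    """Least-squares cubic in (x - center); centering keeps the normal equations well conditioned."""
    rows = [[(x - center) ** k for k in range(4)] for x in xs]
    mtm = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    mty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(4)]
    return _solve(mtm, mty)


def _integral(coeffs: list[float], lo: float, hi: float) -> float:
    """Definite integral of sum(c_k * t^k) from lo to hi."""
    return sum(c * (hi ** (k + 1) - lo ** (k + 1)) / (k + 1) for k, c in enumerate(coeffs))


def bd_rate(anchor_rates, anchor_psnr, test_rates, test_psnr) -> float:
    """Average bitrate difference (%) of the test curve relative to the anchor at equal PSNR."""
    lo = max(min(anchor_psnr), min(test_psnr))
    hi = min(max(anchor_psnr), max(test_psnr))
    if hi <= lo:
        raise ValueError("RD curves do not overlap in PSNR")
    center = 0.5 * (lo + hi)
    ca = _fit_cubic(anchor_psnr, [math.log(r) for r in anchor_rates], center)
    ct = _fit_cubic(test_psnr, [math.log(r) for r in test_rates], center)
    avg_log_diff = (_integral(ct, lo - center, hi - center)
                    - _integral(ca, lo - center, hi - center)) / (hi - lo)
    return (math.exp(avg_log_diff) - 1.0) * 100.0


anchor_r, anchor_q = [1000.0, 1800.0, 3200.0, 6000.0], [32.0, 34.5, 36.8, 39.0]
test_r = [0.9 * r for r in anchor_r]  # 10% fewer bits at every quality point
print(round(bd_rate(anchor_r, anchor_q, test_r, anchor_q), 2))  # -10.0
```

A negative BD-rate means that the test configuration needs fewer bits than the anchor for the same quality. A positive value, such as the 0.97% BD-rate increase in the VVC abstract above, means it needs more.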
Computer vision is an emerging area of image processing that is widely used in Intelligent Transport Systems (ITS). Various techniques are used to increase the efficiency of ITS, such as GPS, radar-base...
Existing Masked Image Modeling methods apply fixed mask patterns to guide self-supervised training. Because those patterns rely on different criteria to mask local regions, sticking to a single fixed pattern limits the capability to model visual cues. This paper proposes evolved part-based masking to pursue more general visual cue modeling in self-supervised learning. Our method is based on an adaptive part partition module, which uses the vision model being trained to construct a part graph and partitions parts with a graph cut. The accuracy of the partitioned parts follows the capability of the model being pretrained, which yields mask patterns that evolve across training stages. Simple patterns at the initial stage help the model learn low-level visual cues. The patterns then evolve to mask accurate object parts, reinforcing the learning of object semantics and contexts. Our method requires no extra pretrained models or annotations, and it maintains training efficiency by evolving the training difficulty. Experimental results show that it substantially boosts performance on various tasks, including image classification, object detection, and semantic segmentation. For example, with the same number of training epochs, it outperforms the recent MAE by 0.69% on ImageNet-1K classification and by 1.61% on ADE20K segmentation.
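The following minimal sketch illustrates part-based masking on a grid of patch features. It makes these assumptions:

- Each patch already has an embedding from the model being trained.
- Neighboring patches are merged into one part when their cosine similarity reaches a hypothetical `threshold`. This union-find merge is a deliberately simplified stand-in for the paper's graph cut, not that algorithm.
- Whole parts are then masked in random order until the mask ratio is reached.

As the model's features become more semantic during training, the same procedure yields parts that align better with objects, which is the "evolving" behavior the abstract describes.

```python
import math
import random


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm or 1e-12)


def partition_parts(features: list[list[list[float]]], threshold: float) -> list[list[int]]:
    """Merge 4-connected patches whose embeddings are similar; returns a part id per patch."""
    h, w = len(features), len(features[0])
    parent = list(range(h * w))

    def find(u: int) -> int:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for i in range(h):
        for j in range(w):
            for di, dj in ((0, 1), (1, 0)):
                ni, nj = i + di, j + dj
                if ni < h and nj < w and cosine(features[i][j], features[ni][nj]) >= threshold:
                    parent[find(i * w + j)] = find(ni * w + nj)
    return [[find(i * w + j) for j in range(w)] for i in range(h)]


def part_mask(parts: list[list[int]], mask_ratio: float, rng: random.Random) -> list[list[bool]]:
    """Mask whole parts in random order until at least mask_ratio of the patches are masked."""
    h, w = len(parts), len(parts[0])
    members: dict[int, list[tuple[int, int]]] = {}
    for i in range(h):
        for j in range(w):
            members.setdefault(parts[i][j], []).append((i, j))
    order = list(members)
    rng.shuffle(order)
    budget = int(round(mask_ratio * h * w))
    mask = [[False] * w for _ in range(h)]
    masked = 0
    for pid in order:
        if masked >= budget:
            break
        for i, j in members[pid]:
            mask[i][j] = True
        masked += len(members[pid])
    return mask


# 4x4 patch grid: the left half belongs to one "object", the right half to another.
feats = [[[1.0, 0.0] if j < 2 else [0.0, 1.0] for j in range(4)] for _ in range(4)]
parts = partition_parts(feats, threshold=0.9)
mask = part_mask(parts, mask_ratio=0.5, rng=random.Random(0))
```

Because whole parts are masked at once, the actual ratio can overshoot the target when parts are large. A real implementation would need a policy for that case.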
ISBN (digital): 9798331505790
ISBN (print): 9798331505806
Glaucoma is an ocular disorder that can damage the optic nerve, which carries visual information from the eyes to the brain. Elevated intraocular pressure (IOP), the build-up of pressure within the eye, can ultimately harm the optic nerve. Failing to recognize glaucoma in its earliest stages can lead to visual impairment or blindness. The proposed study uses fundus images to automate glaucoma diagnosis. The article describes the development of a hybrid model that combines the VGG16 (Visual Geometry Group) and convolutional neural network (CNN) architectures for glaucoma detection. The model is trained on fundus photographs. Once trained, it distinguishes healthy eyes from glaucomatous eyes by analyzing features extracted from the input images. A total of 520 images were used to improve the accuracy and computational performance of the deep learning approach. The CNN and VGG16 models are evaluated in terms of precision, robustness, accuracy, and F1 score. Color fundus images are commonly used for glaucoma detection. The CNN and VGG16 models achieved a success rate of 76%, a loss of 4.83, a sensitivity of 78%, a precision of 93%, and an F1 score of 1.03. The findings demonstrate the efficacy of the proposed technique for fast execution in real-world scenarios.
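The evaluation metrics named in the abstract have standard definitions in terms of a binary confusion matrix. Under these definitions, F1 is the harmonic mean of precision and recall (sensitivity), so it always lies between 0 and 1. The following sketch computes the metrics; the function names are illustrative. As a worked check, it also plugs in the reported precision (0.93) and sensitivity (0.78).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; bounded by 0 and 1."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard binary metrics, treating glaucoma as the positive class."""
    sensitivity = tp / (tp + fn)  # also called recall
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1_score(precision, sensitivity),
    }


print(round(f1_score(0.93, 0.78), 3))  # 0.848
```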
Visually impaired people struggle daily to recognize and distinguish the objects around them, so they depend mainly on assistance from other people. Since smartphones have become a necessity in the modern world, the researchers developed a solution to help visually impaired people: a machine learning-based mobile application for object recognition. Thanks to machine learning techniques and algorithms, software applications can now deliver accurate results in image classification and processing. In this study, the researchers use a convolutional neural network (CNN)-based system on TensorFlow Lite to build a mobile visual information system that employs a machine learning strategy and a deep learning framework. The main objectives of the smartphone application, EyeRis, are to recognize and categorize items in real time and to separate photographs according to user-selected scenarios. The results were analyzed and compared using the recognition accuracy obtained by the app, and they demonstrate the utility of CNNs as image recognition models.
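The abstract does not describe how EyeRis turns model outputs into announcements, so the following is only a generic sketch of a common post-processing step for a CNN classifier. It assumes the model returns one raw score (logit) per class. A numerically stable softmax converts the scores to probabilities, and the top-k labels above a hypothetical `min_confidence` are reported. The labels are made up for illustration.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def top_k(scores: list[float], labels: list[str], k: int = 3,
          min_confidence: float = 0.0) -> list[tuple[str, float]]:
    """Return up to k (label, probability) pairs, most likely first, above a confidence floor."""
    probs = softmax(scores)
    ranked = sorted(zip(labels, probs), key=lambda p: p[1], reverse=True)
    return [(label, p) for label, p in ranked[:k] if p >= min_confidence]


# Hypothetical raw scores for three object classes.
result = top_k([2.0, 0.5, 0.1], ["chair", "cup", "door"], k=2)
print(result[0][0])  # chair
```

A confidence floor lets an assistive app stay silent rather than announce a low-confidence guess.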
Deep learning models operating in the complex domain are used because of their rich representational capacity. However, most of these models are either restricted to the first quadrant of the complex plane or project complex-valued data into the real domain, causing a loss of information. This paper argues that operating entirely in the complex domain improves the overall performance of complex-valued models. A novel, fully complex-valued learning scheme is proposed to train a Fully Complex-valued Convolutional Neural Network (FC-CNN) using a newly proposed complex-valued loss function and training strategy. Benchmarked on CIFAR-10, SVHN, and CIFAR-100, FC-CNN achieves a 4-10% gain over its real-valued counterpart with the same number of parameters. It achieves performance comparable to state-of-the-art complex-valued models on CIFAR-10 and SVHN with fewer parameters, and on CIFAR-100 it achieves state-of-the-art performance with 25% fewer parameters. FC-CNN also shows better training efficiency and much faster convergence than all the other models compared.
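The following sketch shows what a fully complex-valued convolution computes, using Python's built-in `complex` type. It also checks the result against the equivalent decomposition into four real-valued convolutions:

(W_r + iW_i) * (x_r + ix_i) = (W_r*x_r - W_i*x_i) + i(W_r*x_i + W_i*x_r)

Because the real and imaginary parts are never collapsed into a magnitude, phase information is preserved through the layer. The FC-CNN's own loss function and training strategy are not reproduced here, and the function names are illustrative.

```python
def complex_conv1d(x: list[complex], w: list[complex]) -> list[complex]:
    """'Valid' 1D cross-correlation computed entirely in complex arithmetic."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


def real_conv1d(a: list[float], b: list[float]) -> list[float]:
    """'Valid' 1D cross-correlation on real inputs."""
    k = len(b)
    return [sum(b[j] * a[i + j] for j in range(k)) for i in range(len(a) - k + 1)]


def decomposed_conv1d(x: list[complex], w: list[complex]) -> list[complex]:
    """Same result from four real convolutions, as a real-valued framework would compute it."""
    xr, xi = [z.real for z in x], [z.imag for z in x]
    wr, wi = [z.real for z in w], [z.imag for z in w]
    rr, ii = real_conv1d(xr, wr), real_conv1d(xi, wi)
    ri, ir = real_conv1d(xi, wr), real_conv1d(xr, wi)
    return [complex(a - b, c + d) for a, b, c, d in zip(rr, ii, ri, ir)]


x = [1 + 2j, -0.5 + 1j, 3 - 1j, 0.25 + 0.5j]
w = [0.5 - 1j, 2 + 0.5j]
direct, split = complex_conv1d(x, w), decomposed_conv1d(x, w)
```

The identity above is also how complex convolutions are typically built on top of real-valued convolution operators in existing frameworks.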