ISBN (print): 9798331529543; 9798331529550
While saliency detection for images has been studied extensively over the past decades, little work has explored how different viewing devices (e.g., tablet computers, mobile phones) influence human visual attention behavior. This gap hinders progress in cross-device image saliency detection. In this paper, we first establish a novel cross-device saliency detection (CDSD) database based on eye-tracking experiments and investigate subjects' visual attention behavior when using different viewing devices. Then, we evaluate several classic saliency detection models on the CDSD database; the results indicate that the cross-device performance of these models needs further improvement. Finally, we provide some discussion that may inform the design of cross-device saliency detection models. The proposed CDSD database will be made publicly available.
ISBN (print): 9781665475921
Applying encryption technology to image retrieval can ensure the security and privacy of personal images. Related research in this field has focused on combining encryption algorithms with handcrafted feature extraction. Many existing encrypted image retrieval schemes cannot prevent feature leakage and file size increase, or cannot achieve satisfactory retrieval performance. In this paper, a new end-to-end encrypted image retrieval scheme is presented. First, images are encrypted using block rotation, new orthogonal transforms and block permutation during the JPEG compression process. Second, we combine the triplet loss and the cross-entropy loss to train a network model, which contains gMLP modules, by end-to-end learning to extract features from cipher-images. Compared with manual feature extraction, such as extracting color histograms, the end-to-end mechanism saves manual effort. Experimental results show that our scheme achieves good retrieval performance while remaining compression-friendly and leaking no features.
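As an illustration of the combined training objective mentioned in this abstract, the following is a minimal sketch of a triplet loss added to a cross-entropy loss for a single training sample. The abstract does not give the margin, the distance measure, or how the two terms are weighted, so `margin`, `weight`, the squared Euclidean distance, and all function names here are assumptions made for illustration only.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on squared Euclidean distances (assumed form)."""
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_pos - d_neg + margin)


def cross_entropy(logits, label):
    """Softmax cross entropy for one sample, using a stable log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def combined_loss(anchor, positive, negative, logits, label,
                  weight=1.0, margin=1.0):
    """Triplet term plus a weighted classification term (weight is hypothetical)."""
    return (triplet_loss(anchor, positive, negative, margin)
            + weight * cross_entropy(logits, label))


if __name__ == "__main__":
    # Toy 2-D embeddings and 2-class logits, not data from the paper.
    print(combined_loss([0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 0.0], 0))
```

In a scheme of this kind the triplet term would shape the embedding space used for retrieval, while the classification term supplies label supervision; the relative weighting would need to be tuned.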
ISBN (print): 9781728180687
Near-infrared (NIR) images are robust to ambient light and contain clear textures in low-light conditions. In this paper, we propose NIR image colorization using a spatially adaptive denormalization (SPADE) generator and grayscale-approximated self-reconstruction. Compared with traditional image-to-image translation methods, the proposed NIR colorization pursues photorealism rather than generative diversity. The challenge of this task is NIR-RGB mis-registration in the training data. We address this problem by separately extracting NIR texture and RGB color with an end-to-end SPADE-based model. Moreover, the proposed method enables more precise synthesis when a low-light RGB reference image is given. Experiments on an open NIR-RGB dataset verify that the proposed method effectively preserves NIR textures and RGB colors in the synthesized results and outperforms the baselines in terms of visual quality and quantitative assessments.
ISBN (print): 9781728185514
With the development of airplane platforms, aerial image classification plays an important role in a wide range of remote sensing applications. The number of samples in most aerial image datasets is very limited compared with other computer vision datasets. Unlike many works that use data augmentation to solve this problem, we adopt a novel strategy, called label splitting, to deal with limited samples. Specifically, each sample has its original semantic label, and label splitting assigns it a new appearance label via unsupervised clustering. Then an optimized triplet loss learning is applied to distill domain-specific knowledge. This is achieved through binary tree forest partitioning and a triplet selection and optimization scheme that controls the triplet quality. Simulation results on the NWPU, UCM and AID datasets demonstrate that the proposed solution achieves state-of-the-art performance in aerial image classification.
ISBN (print): 9780819466211
The direction-adaptive discrete wavelet transform (DA-DWT) locally adapts the filtering direction to the geometric flow in the image. DA-DWT image coders have been shown to achieve a rate-distortion performance superior to non-adaptive wavelet coders. However, since the direction information must always be signalled regardless of the total bit rate, performance at very low bit rates might be worse. In this paper, we propose two scalable direction representations: the layered scheme, which is similar to the scalable motion vector representation in scalable video coding, and the level-unit scheme, which builds on the layered scheme to provide finer granularity. Experimental results indicate that the proposed level-unit scheme achieves the desired performance at both low and high bit rates. A significant improvement in image quality (about 3-5 dB) is observed at very low bit rates, relative to non-scalable coding of the direction information.
ISBN (print): 9781665475921
Image aesthetics assessment (IAA) measures the perceived beauty of images using a computational approach. People usually assess the aesthetics of an image according to semantic attributes, e.g., lighting, color and object emphasis. However, state-of-the-art IAA approaches usually follow a data-driven framework without considering the rich attributes contained in images. With this motivation, this paper presents a new semantic attribute guided IAA model, where the attention maps of semantic attributes are employed to enhance the representation ability of general aesthetic features for more effective aesthetics assessment. Specifically, we first design an attribute attention generation network to obtain the attention maps for different semantic attributes, which are used to weight the general aesthetic features, producing semantic attribute-enhanced feature representations. Then, a Graph Convolutional Network (GCN) is employed to further investigate the inherent relationships among the enhanced aesthetic features, producing the final image aesthetics prediction. Extensive experiments and comparisons on three public IAA databases demonstrate the effectiveness of the proposed method.
ISBN (print): 9781424412358
This paper deals with the insertion of a chaotic signature in an image by exploiting the analogy between the wavelet transform and the human visual system (HVS) model, in order to modulate and adapt the signature according to the local characteristics of the image. The blind detection process consists of computing the correlation between the marked DWT coefficients and the watermarking sequence. To address the problem of geometrical de-synchronization, a differential motion estimation technique is employed to compensate for the possible geometrical deformations undergone by the watermarked image. The method has been shown to be robust to various image processing operations, such as compression, filtering and noise addition, and to various geometrical attacks, such as translation, rotation, scaling, cropping and resizing.
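The correlation-based blind detection described in this abstract can be sketched as follows. This is only an illustrative toy, not the authors' method: the logistic map as the chaotic generator, the constant embedding strength `alpha` (the paper adapts the signature with an HVS model), the detection threshold, and all function names are assumptions, and plain lists stand in for DWT coefficients.

```python
def logistic_sequence(length, x0=0.3):
    """Bipolar (+1/-1) sequence from the logistic map x -> 4x(1 - x)."""
    seq, x = [], x0
    for _ in range(length):
        x = 4.0 * x * (1.0 - x)
        seq.append(1.0 if x > 0.5 else -1.0)
    return seq


def embed(coeffs, sequence, alpha):
    """Additive embedding with a constant strength (hypothetical simplification)."""
    return [c + alpha * w for c, w in zip(coeffs, sequence)]


def correlation(coeffs, sequence):
    """Mean of the element-wise product of coefficients and watermark."""
    return sum(c * w for c, w in zip(coeffs, sequence)) / len(sequence)


def detect(coeffs, sequence, threshold):
    """Blind detection: only the received coefficients and the sequence are used."""
    return correlation(coeffs, sequence) > threshold


if __name__ == "__main__":
    w = logistic_sequence(256)
    host = [float(i % 7) - 3.0 for i in range(256)]  # stand-in coefficients
    marked = embed(host, w, alpha=2.0)
    print(correlation(host, w), correlation(marked, w))
```

Because the sequence is bipolar, additive embedding raises the correlation by exactly `alpha`, which is what lets a threshold separate marked from unmarked coefficients without access to the original image.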
ISBN (print): 9781728180687
End-to-end optimized image compression has emerged as a disruptive technique for reducing spatial redundancies with improved reconstruction quality. However, existing entropy models for latent representations cannot sufficiently exploit their spatial and channel-wise correlations. In this paper, we propose a novel entropy model based on spatial-channel contexts for end-to-end optimized image compression. The proposed model jointly leverages spatial structural dependencies and channel-wise correlations to improve the probabilistic estimation of latent representations. Instead of a complex autoregressive hyperprior network, shallow artificial neural networks (ANNs) incorporating 3-D masks are developed to efficiently realize the entropy model with a guarantee of causality. Experimental results demonstrate that the proposed model achieves competitive rate-distortion performance and reduces model complexity in comparison to recent end-to-end optimized methods for image compression.
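To make the idea of a causal 3-D mask concrete, here is a minimal sketch. The abstract does not specify the mask layout, so this assumes a PixelCNN-style mask extended to three dimensions: with latents decoded in raster order over (channel, row, column), a kernel position is enabled only if it strictly precedes the kernel centre. The function name and the scan order are assumptions.

```python
def causal_mask_3d(depth, height, width):
    """Return a nested list mask[d][h][w] with 1 for already-decoded positions.

    Positions are compared with the kernel centre in raster order over
    (channel, row, column); the centre itself and everything after it are 0.
    """
    center = (depth // 2, height // 2, width // 2)
    return [
        [
            # Tuple comparison is lexicographic, which matches raster order.
            [1 if (d, h, w) < center else 0 for w in range(width)]
            for h in range(height)
        ]
        for d in range(depth)
    ]


if __name__ == "__main__":
    mask = causal_mask_3d(3, 3, 3)
    for channel_slice in mask:
        print(channel_slice)
```

Multiplying a 3-D convolution kernel element-wise by such a mask means each prediction depends only on previously decoded latents, which is the causality property the abstract refers to.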
ISBN (print): 9781665475921
Human-object interaction (HOI) detection is a meaningful research topic in human activity understanding. Recent works have made significant progress by focusing on efficient triplet matching and leveraging image-wide features based on an encoder-decoder architecture. However, their ability to gather relevant contextual information about humans is limited, and previous methods do not explicitly decouple the different sub-tasks in HOI detection. To this end, we propose a new transformer-based method for HOI detection, namely the Mask-Guided Transformer (MGT). Our model, which is composed of five parallel decoders with a shared encoder, not only emphasizes interactive regions by applying body features, but also disentangles the prediction of instances and interactions. We achieve a favorable result of 63.3 mAP on the well-known HOI detection dataset V-COCO.
ISBN (print): 9798331529543; 9798331529550
In this paper, we introduce a novel approach to better utilize the intra block copy (IBC) prediction tool in encoding lenslet light field video (LFV) captured using plenoptic 2.0 cameras. Although the IBC tool has been recognized as promising for encoding LFV content, it was originally designed for conventional video, which imposes a fundamental limit and suggests that slight modifications could better suit the properties of LFV content. Observing the inherently large amount of repetitive image patterns caused by the microlens array (MLA) structure of plenoptic cameras, we suggest several techniques to enhance the IBC coding tool itself so that it encodes LFV content more efficiently. Our experimental results demonstrate that the proposed method significantly enhances IBC coding performance on LFV content while also reducing encoding time.