ISBN (Print): 9798350349405; 9798350349399
With the advance of deep learning in the big-data era, image/video coding for machines (VCM), for which the Moving Picture Experts Group (MPEG) has issued a call for proposals, has become a pivotal technique for a wide range of intelligent vision tasks. However, existing VCM methods typically compress the features at each scale independently, ignoring the redundancy among features across scales. This paper therefore introduces a simple yet effective architecture, hybrid single input and multiple output (H-SIMO), for VCM, which significantly reduces the cross-scale redundancy of features. More specifically, since a pyramid structure is commonly employed for localizing multi-scale objects, our H-SIMO method compresses all features by encoding only a single-scale feature while retaining the ability to decompress the features at every scale. Moreover, an entropy model is seamlessly integrated into the training process to efficiently reduce the statistical redundancy of the features. During testing, a hybrid coding method built on versatile video coding (VVC) is employed to compress the features of both images and videos. We comprehensively evaluate our H-SIMO method on two standard machine vision tasks, object detection and instance segmentation, and the experimental results verify its superior performance.
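The single-input, multiple-output idea above can be illustrated with a small sketch: only one pyramid level is transmitted, and the remaining scales are re-synthesized at the decoder. The layer names, channel sizes, and number of scales below are illustrative assumptions, not the authors' exact architecture.

```python
# Hypothetical sketch of a single-input, multiple-output feature decoder:
# a single decoded base feature is resampled and refined into a full pyramid.
import torch
import torch.nn as nn

class SIMODecoder(nn.Module):
    def __init__(self, channels: int = 256, num_scales: int = 4):
        super().__init__()
        # One lightweight refinement head per pyramid level.
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1),
            )
            for _ in range(num_scales)
        )

    def forward(self, base_feat: torch.Tensor) -> list:
        outputs = []
        for level, head in enumerate(self.heads):
            if level == 0:
                resized = base_feat                      # transmitted scale
            else:
                # Assumed dyadic pyramid: each level is half the resolution.
                resized = nn.functional.interpolate(
                    base_feat, scale_factor=0.5 ** level,
                    mode="bilinear", align_corners=False)
            outputs.append(head(resized))
        return outputs

# Usage: a decoded base feature map is expanded back into a feature pyramid.
pyramid = SIMODecoder()(torch.randn(1, 256, 64, 64))
print([f.shape for f in pyramid])
```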
ISBN (Print): 9781577358800
Autoregressive language modeling (ALM) has been successfully used for self-supervised pre-training in natural language processing (NLP). However, this paradigm has not achieved results comparable to other self-supervised approaches in computer vision (e.g., contrastive learning and masked image modeling). In this paper, we investigate why autoregressive modeling does not work well on vision tasks. We analyze the limitations of visual autoregressive methods and propose a novel stochastic autoregressive image modeling method (named SAIM) built on two simple designs. First, we serialize the image into patches. Second, we employ a stochastic permutation strategy to generate an effective and robust image context, which is critical for vision tasks. To realize this, we create a parallel encoder-decoder training process in which the encoder plays a role similar to the standard vision transformer, focusing on learning the whole contextual information, while the decoder predicts the content of the current position, so that the encoder and decoder can reinforce each other. Our method significantly improves the performance of autoregressive image modeling and achieves the best accuracy (83.9%) on the vanilla ViT-Base model among methods using only ImageNet-1K data. Transfer performance on downstream tasks also shows that our model achieves competitive performance. Code is available at https://***/qiy20/SAIM.
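A minimal sketch of the stochastic-permutation idea: the image is serialized into patches and a fresh autoregressive order is drawn at random, so every patch learns to be predicted from many different contexts. The patch size and the simple context/target split below are simplifying assumptions, not the paper's full encoder-decoder pipeline.

```python
import torch

def patchify(images: torch.Tensor, patch: int = 16) -> torch.Tensor:
    """(B, C, H, W) -> (B, N, patch*patch*C) sequence of flattened patches."""
    b, c, h, w = images.shape
    x = images.unfold(2, patch, patch).unfold(3, patch, patch)  # B,C,H/p,W/p,p,p
    return x.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * patch * patch)

def stochastic_ar_batch(images: torch.Tensor, patch: int = 16):
    """Return (context, target) pairs under a fresh random patch order."""
    seq = patchify(images, patch)              # (B, N, D)
    order = torch.randperm(seq.size(1))        # stochastic permutation of patches
    permuted = seq[:, order]
    # Predict patch t from patches 0..t-1 in the permuted order.
    context, target = permuted[:, :-1], permuted[:, 1:]
    return context, target, order

ctx, tgt, order = stochastic_ar_batch(torch.randn(2, 3, 224, 224))
print(ctx.shape, tgt.shape)   # (2, 195, 768) (2, 195, 768)
```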
ISBN (Print): 9798350350920
In recent years, light field imaging has gained significant attention in the scientific community due to its ability to provide a more immersive representation of the 3D world. However, ensuring the quality of light field images is crucial for their subsequent processing and applications. Deep learning methods, leveraging neural networks, have shown promising performance in image quality assessment (IQA). However, the unique characteristics of light field data pose a challenge for existing IQA methods. To address this challenge, we propose a Robust Large-Scale Dataset for assessing light field image quality, named RLSD, specifically designed for evaluating the quality of light field images. The dataset comprises both real and synthetic scenes, covering a wide range of key low-level attributes and including three representative distortions: compression, noise, and blur. To obtain subjective evaluations, we adopt the single stimulus continuous quality evaluation (SSCQE) method and compute the mean opinion score (MOS). Statistical analysis of the dataset and experimental results indicate that the proposed RLSD dataset covers a variety of common scenes and distortion levels, making it suitable for designing and evaluating LF-IQA algorithms. The dataset is publicly available at the following link: "https://***/s/1kJmx4qsy8ywLPba-HwGCEg" (password: XY28).
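For reference, a minimal sketch of how a mean opinion score could be computed from SSCQE-style ratings; the rating scale, the outlier-screening rule, and the array layout are assumptions, not the authors' exact subjective-test protocol.

```python
import numpy as np

def mean_opinion_scores(ratings: np.ndarray, z_thresh: float = 2.0) -> np.ndarray:
    """ratings: (num_subjects, num_stimuli) array of raw subjective scores."""
    # Simple stand-in for subject screening: drop raters whose mean score
    # deviates strongly from the panel mean.
    subject_means = ratings.mean(axis=1)
    z = (subject_means - subject_means.mean()) / (subject_means.std() + 1e-8)
    kept = ratings[np.abs(z) < z_thresh]
    # MOS is the per-stimulus average over the retained subjects.
    return kept.mean(axis=0)

mos = mean_opinion_scores(np.random.uniform(1, 5, size=(20, 30)))
print(mos.shape)   # (30,)
```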
This study aims to improve the performance of graph data anomaly detection. To address the limitations of traditional methods in terms of accuracy and robustness of anomaly detection, this paper proposes a new method ...
This paper proposes a stochastic human action prediction method based on denoising diffusion probability model to address the shortcomings of current motion prediction methods, such as insufficient diversity and devia...
ISBN (Print): 9798350351491; 9798350351484
This paper presents an image encryption scheme that introduces a novel permutation technique named orbital-extraction permutation. The proposed encryption scheme contains three important modules: the key generation module, the orbital-extraction permutation module, and the dynamic chaotic substitution module. The key generation module utilizes the 2D Henon map, a highly non-linear and unpredictable chaotic map, to generate cryptographic keys. The orbital-extraction permutation module reshuffles the pixels of the plaintext image in a complex manner, disrupting the inherent correlation among neighboring pixels of the input image; the proposed permutation technique thus serves as a strong diffusion stage. For the confusion part of the scheme, bit-XOR operations and chaotic substitution methods are employed. The proposed scheme has been evaluated against key statistical security parameters. Results indicate the enhanced security and robustness of the proposed scheme, with an information entropy of 7.9974 and a correlation coefficient of 0.007.
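An illustrative sketch of using the 2D Henon map as a chaotic key-stream generator, in the spirit of the key generation module described above. The parameter values, burn-in length, and byte-mapping are standard textbook choices and only assumptions about the paper's exact construction.

```python
import numpy as np

def henon_keystream(x0: float, y0: float, length: int,
                    a: float = 1.4, b: float = 0.3,
                    burn_in: int = 1000) -> np.ndarray:
    """Iterate x_{n+1} = 1 - a*x_n^2 + y_n, y_{n+1} = b*x_n and map to bytes."""
    x, y = x0, y0
    stream = np.empty(length, dtype=np.uint8)
    for i in range(burn_in + length):
        x, y = 1.0 - a * x * x + y, b * x        # simultaneous Henon update
        if i >= burn_in:
            # Fold the chaotic orbit value into a byte for the key stream.
            stream[i - burn_in] = int(abs(x) * 1e6) % 256
    return stream

# Usage: such a key stream could drive the bit-XOR confusion stage.
key = henon_keystream(0.1, 0.3, length=16)
print(key)
```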
ISBN (Print): 9798350374520; 9798350374513
Deep steganalyzers combined with neural networks have achieved great success in image classification in recent years. However, they suffer from the following persistent challenges: i) a deep steganalyzer is extremely vulnerable and risks being attacked via adversarial steganography when performing image classification tasks; ii) pre-processing based methods that aim to remove adversarial perturbations from cover images jeopardize accuracy, since the embedded steganographic signal is wiped off as well. In this context, to defend against adversarial attacks, we propose an adversarial steganography detection scheme based on pre-processing and feature migration. In brief, sub-images are sampled to expand the dimensionality of the extracted features while reducing the effect brought by adversarial perturbations. In particular, by computing statistical features and normalizing them, our approach improves the classification accuracy on these samples. Experimental results show that the proposed approach is capable of detecting adversarial steganographic images with an accuracy gain of up to 35.9% over state-of-the-art methods.
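A hedged sketch of the pre-processing idea described above: random sub-images are sampled from the input, simple statistical features are computed per sub-image, and the feature vector is normalized before classification. The feature set, crop size, and sampling parameters are illustrative assumptions, not the paper's exact feature-migration pipeline.

```python
import numpy as np

def sampled_statistical_features(image: np.ndarray, num_crops: int = 8,
                                 crop: int = 64, seed: int = 0) -> np.ndarray:
    """image: (H, W) grayscale array -> normalized feature vector."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    feats = []
    for _ in range(num_crops):
        top = rng.integers(0, h - crop + 1)
        left = rng.integers(0, w - crop + 1)
        sub = image[top:top + crop, left:left + crop].astype(np.float64)
        # Per-sub-image statistics; averaging over crops dampens the effect
        # of localized adversarial perturbations.
        residual = sub - np.median(sub)
        feats.append([sub.mean(), sub.std(),
                      np.abs(residual).mean(), (residual ** 2).mean()])
    feats = np.asarray(feats).mean(axis=0)
    # Normalize the features to zero mean and unit scale.
    return (feats - feats.mean()) / (feats.std() + 1e-8)

vec = sampled_statistical_features(np.random.randint(0, 256, (256, 256)))
print(vec)
```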
ISBN (Print): 9798350371635; 9798350371628
Assistive driving systems such as Lane Keeping Assist predominantly rely on image processing for lane marker detection to localize the vehicle; however, image data faces challenges such as sensitivity to weather and road conditions and a low update rate due to computational demands. To address these issues, this study proposes a novel approach for local positioning that fuses image data with dashboard speed, IMU (accelerometer and gyroscope) measurements, and GPS data. An Extended Kalman Filter is used as a stochastic estimator to estimate road-vehicle states and enhance lane marker detection accuracy under various conditions. The algorithm's performance was tested across diverse driving scenarios, including different speeds, road curvatures, and poor visibility conditions simulating Canada's winter weather. Results indicate that the proposed sensor fusion algorithm significantly reduces the RMS and maximum errors in estimating lane lateral offset, relative heading angle, and velocity, especially where image-based methods falter due to noise or temporary loss of functionality.
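A minimal, generic Extended Kalman Filter step illustrating how camera-based lane measurements could be fused with IMU and speed inputs, in the spirit of the estimator described above. The state layout (lateral offset, heading error, speed), the motion model, and the noise values are illustrative assumptions, not the study's actual formulation.

```python
import numpy as np

def ekf_step(x, P, u, z, Q, R, dt=0.05):
    """One predict/update cycle.
    x = [lateral_offset, heading_error, speed]; u = [yaw_rate, accel];
    z = [measured_offset, measured_heading] from the lane-detection camera."""
    u = np.asarray(u, dtype=float)
    z = np.asarray(z, dtype=float)

    # --- Predict: simple kinematic model driven by IMU inputs ---
    offset, heading, speed = x
    x_pred = np.array([offset + speed * np.sin(heading) * dt,
                       heading + u[0] * dt,
                       speed + u[1] * dt])
    # Jacobian of the motion model w.r.t. the state.
    F = np.array([[1.0, speed * np.cos(heading) * dt, np.sin(heading) * dt],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
    P_pred = F @ P @ F.T + Q

    # --- Update: camera observes offset and heading directly (linear H) ---
    H = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
    y = z - H @ x_pred                          # innovation
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)         # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(3) - K @ H) @ P_pred
    return x_new, P_new

x, P = np.zeros(3), np.eye(3)
Q, R = np.eye(3) * 1e-3, np.eye(2) * 1e-2
x, P = ekf_step(x, P, u=[0.01, 0.1], z=[0.2, 0.05], Q=Q, R=R)
print(x)
```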
Unmanned aerial vehicle equipment is inevitably affected by environmental noise during image acquisition. Suppressing noise to enhance images is an active research topic. The stochastic ...
A single image cannot fully describe the information of a target, and its practical application value is limited. Fusing multi-source data to generate images with richer information and higher quality has become a t...