This paper comprehensively overviews image and signal processing, including their fundamentals, advanced techniques, and applications. Image processing involves analyzing and manipulating digital images, while signal ...
ISBN:
(Print) 9798350349405; 9798350349399
Image attribution seeks to reveal the importance of image regions in the classifier's final decision. Of the various ways to tackle this problem, the optimization-based perspective is particularly intuitive: it applies the attribution as a mask on the image and reduces the attribution task to a loss that can be optimized using gradient descent. Previous work has treated the goal as searching for the single best mask. Under this setup, however, there is a tendency towards trivial solutions of large masks with reduced discernment of the relative importance of regions. This has typically required auxiliary loss terms to control the area of the mask, but their strength relative to the primary loss needs to be tuned. We challenge this necessity by re-imagining attribution as an ordering of pixels according to importance. This ordering may be interpreted as a schedule that determines which locations get seen earlier and which later, allowing us to create a trajectory of masks from completely OFF to completely ON. We optimize through this sequence of masks over all areas, not just a single mask as in previous methods. We explore this setting, which we dub Saliency-as-Schedule (SaS), and demonstrate its effectiveness through experiments in a variety of settings involving multiple datasets and CNN architectures. Further, we propose a novel attribution task, feature saliency, where we use SaS to rate the influence of image regions on the intermediate feature maps of a CNN, and not just the class logit. Our findings suggest that SaS is a promising direction for the attribution problem. Our code will be available at https://***/tumble-weed/SaliencyasSchedule
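The core idea of the abstract, turning a saliency map into a schedule of masks from completely OFF to completely ON, can be illustrated with a minimal sketch. This is not the authors' implementation; the function name and step count are illustrative, and the masking of actual images and the loss over the trajectory are omitted:

```python
import numpy as np

def masks_from_saliency(saliency, num_steps):
    """Turn a saliency map into a schedule: pixels are ranked by
    importance, then revealed in that order across `num_steps`
    masks, going from completely OFF to completely ON."""
    flat = saliency.ravel()
    order = np.argsort(-flat)          # most important pixels first
    masks = []
    for step in range(num_steps + 1):
        k = round(step * flat.size / num_steps)  # pixels ON at this step
        mask = np.zeros(flat.size)
        mask[order[:k]] = 1.0
        masks.append(mask.reshape(saliency.shape))
    return masks

# toy 2x2 saliency map: the bottom-right pixel is most important
sal = np.array([[0.1, 0.4],
                [0.2, 0.9]])
trajectory = masks_from_saliency(sal, num_steps=4)
```

In the paper's setting, each mask in the trajectory would be applied to the image and the classifier's response accumulated into a single loss, so the optimization shapes the entire ordering rather than one mask.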
ISBN:
(Print) 9781665475921
In this paper, we propose a convolutional neural network (CNN)-based post-processing filter for video compression with multi-scale feature representation. The discrete wavelet transform (DWT) decomposes an image into multi-frequency and multi-directional sub-bands, and can thus expose artifacts caused by video compression at multiple scales. We therefore combine the DWT with a CNN and construct two sub-networks: a step-like sub-band network (SLSB) and a mixed enhancement network (ME). SLSB takes the wavelet sub-bands as input and feeds them into the Res2Net group (R2NG) from high frequency to low frequency. R2NG consists of Res2Net modules and adopts spatial and channel attention to adaptively enhance features. We combine the high-frequency sub-band output with the low-frequency sub-band in R2NG to capture multi-scale features. ME uses mixed convolution, composed of dilated convolution and standard convolution, as its basic block to expand the receptive field without the blind spots of dilated convolution and further improve reconstruction quality. Experimental results demonstrate that the proposed CNN filter achieves average BD-rate reductions of 2.13%, 2.63%, 2.99%, 4.8%, 3.72%, and 4.5% over the VTM 11.0-NNVC anchor for the Y channel on classes A1, A2, B, C, D, and E, respectively, of the common test conditions (CTC) in the AI, RA, and LDP configurations.
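The DWT decomposition that SLSB consumes can be sketched with a single-level 2-D Haar transform, the simplest wavelet. This is only an illustration of the sub-band structure (one low-frequency band plus three direction-sensitive high-frequency bands), not the filter bank the paper actually uses:

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2-D Haar DWT: splits an image (H and W even)
    into a low-frequency sub-band (LL) and three high-frequency,
    direction-sensitive sub-bands (LH, HL, HH), each at half the
    original resolution."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # low-pass in both directions
    lh = (a - b + c - d) / 2.0   # horizontal detail
    hl = (a + b - c - d) / 2.0   # vertical detail
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, lh, hl, hh

img = np.arange(16, dtype=float).reshape(4, 4)
ll, lh, hl, hh = haar_dwt2(img)
```

Compression artifacts such as ringing and blocking concentrate in the high-frequency sub-bands, which is why processing them separately, from high frequency to low, can help the CNN target them.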
Visual impairment is one of the major setbacks for people suffering from it in society. The age of modernism has created an opportunity for the technological world to make this challenge easier for people suffering fr...
ISBN:
(Print) 9798350349405; 9798350349399
Point clouds in 3D applications frequently experience quality degradation during processing, e.g., scanning and compression. Reliable point cloud quality assessment (PCQA) is important for developing compression algorithms with good bitrate-quality trade-offs and techniques for quality improvement (e.g., denoising). This paper introduces a full-reference (FR) PCQA method utilizing spectral graph wavelets (SGWs). First, we propose novel SGW-based PCQA metrics that compare SGW coefficients of coordinate and color signals between reference and distorted point clouds. Second, we achieve accurate PCQA by integrating several conventional FR metrics and our SGW-based metrics using support vector regression. To our knowledge, this is the first study to introduce SGWs for PCQA. Experimental results demonstrate that the proposed PCQA metric correlates more accurately with subjective quality scores than conventional PCQA metrics.
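The SGW coefficients the metric compares can be sketched in a toy form: build a weighted graph over the points, eigendecompose its Laplacian, and filter a signal with a band-pass kernel in the graph spectral domain. The graph construction (fully connected Gaussian weights) and the kernel g(sλ) = sλ·e^(−sλ) are illustrative choices, not the paper's exact design:

```python
import numpy as np

def sgw_coefficients(points, signal, scale=1.0):
    """Toy spectral-graph-wavelet transform on a point cloud:
    filter `signal` (e.g. a color channel, one value per point)
    in the spectral domain of the graph Laplacian."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2)                          # Gaussian adjacency weights
    np.fill_diagonal(w, 0.0)
    lap = np.diag(w.sum(1)) - w              # combinatorial Laplacian
    lam, u = np.linalg.eigh(lap)             # graph spectrum
    g = scale * lam * np.exp(-scale * lam)   # band-pass wavelet kernel
    return u @ (g * (u.T @ signal))          # filter in spectral domain

rng = np.random.default_rng(0)
pts = rng.random((6, 3))                     # 6 points in 3D
coeffs = sgw_coefficients(pts, np.ones(6))   # constant signal -> ~zero response
```

Because g(0) = 0, a constant signal (which lies entirely in the Laplacian's zero-frequency component) produces near-zero coefficients; the wavelet responds only to variation across the cloud, which is what distortions introduce. A full-reference metric would then compare such coefficients between reference and distorted clouds.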
ISBN:
(Print) 9798350350463; 9798350350456
In this paper, we present HyperSpectraNet, a specialized convolutional neural network model developed for the reconstruction of hyperspectral images (HSI). Containing rich spectral information, HSIs are widely used in fields such as environmental monitoring, agriculture, and medical imaging, offering detailed insights beyond the capabilities of standard imaging. The proposed model combines spectral and spatial attention mechanisms with Fourier transform interactions, addressing the complex demands of HSI reconstruction. This combination enhances the model's ability to identify and highlight detailed spectral features, which are essential for accurate HSI representation. We evaluated the model on the NTIRE 2022 hyperspectral dataset, where it provides considerable improvements in image quality and in the accuracy of spectral details, achieving 31.6 dB PSNR and 0.9442 SSIM. These results highlight the potential of the model in advancing HSI reconstruction technology.
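The two ingredients named in the abstract, spectral attention and Fourier-domain interaction, can be sketched in a toy form. The pooling-plus-softmax gating and the identity frequency filter below are stand-ins for the learned components, not HyperSpectraNet's actual layers:

```python
import numpy as np

def spectral_attention(cube):
    """Toy spectral (band-wise) attention for an HSI cube of shape
    (bands, H, W): each band is re-weighted by a score derived from
    its global average response, so informative bands are emphasized.
    A softmax stands in for the learned gating."""
    pooled = cube.mean(axis=(1, 2))                  # (bands,) descriptor
    scores = np.exp(pooled) / np.exp(pooled).sum()   # softmax over bands
    return cube * scores[:, None, None]

def fourier_interaction(cube):
    """Toy Fourier interaction: each band is taken to the 2-D
    frequency domain, where a learned filter would mix spatial
    information globally; here the filter is the identity."""
    spec = np.fft.fft2(cube, axes=(1, 2))
    return np.fft.ifft2(spec, axes=(1, 2)).real

rng = np.random.default_rng(0)
cube = rng.random((5, 4, 4))        # 5 spectral bands, 4x4 spatial
att = spectral_attention(cube)
mixed = fourier_interaction(cube)
```

The appeal of the frequency-domain path is that a pointwise multiplication there corresponds to a global convolution in the spatial domain, giving every output pixel access to the whole scene at low cost.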
ISBN:
(Print) 9798350343557
In recent years, convolutional neural networks have shown significant success and are frequently used in medical image analysis applications. However, the convolution operation limits the learning of long-range pixel dependencies to the local receptive field. Inspired by the success of transformer architectures in encoding long-term dependencies and learning more efficient feature representations in natural language processing, this study classifies publicly available color fundus retina, skin lesion, chest X-ray, and breast histology images using the Vision Transformer (ViT), Data-Efficient Transformer (DeiT), Swin Transformer, and Pyramid Vision Transformer v2 (PVTv2) models, and compares their classification performance. The results show that the highest accuracy values are obtained with the DeiT model at 96.5% on the chest X-ray dataset, the PVTv2 model at 91.6% on the breast histology dataset, the PVTv2 model at 91.3% on the retina fundus dataset, and the Swin model at 91.0% on the skin lesion dataset.
ISBN:
(Print) 9798350350920
With the development of V2X technology, efficient spectrum resource management is critical to ensure the reliability and overall system performance of vehicle-to-vehicle communications. Traditional spectrum allocation methods often do not take into account inter-vehicle interference. In this paper, we introduce an innovative approach to eliminating interference in vehicle-to-vehicle communication, the MAS-EGNN framework. Initially, an Equivariant Graph Neural Network (EGNN) is used to dynamically update the graph representation through node and edge conditions, effectively capturing the relationships and dependencies between vehicles. Subsequently, multi-agent reinforcement learning techniques allow multiple agents to interact simultaneously within the environment, with each independently adapting to changes in its surroundings to optimize overall network performance. The effectiveness of the approach in improving communication quality and system throughput is verified through the simulation of V2X communication scenarios and the implementation of corresponding optimization strategies. The experimental results show that the method significantly reduces interference and optimizes V2X spectrum allocation compared with traditional spectrum allocation strategies.
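The defining property of the EGNN component is E(n)-equivariance: rotating the vehicle positions rotates the updated positions identically, while the learned features stay invariant. A minimal sketch in the spirit of EGNN (Satorras et al.), with the learned MLPs replaced by fixed toy maps, shows the structure; it is not the MAS-EGNN implementation:

```python
import numpy as np

def egnn_layer(h, x):
    """Minimal E(n)-equivariant graph layer: messages depend only on
    node features and pairwise squared distances (both rotation-
    invariant), and coordinates are updated along relative direction
    vectors (which rotate with the input)."""
    n = len(x)
    d2 = ((x[:, None] - x[None, :]) ** 2).sum(-1)   # invariant distances
    m = np.tanh(h[:, None] + h[None, :] - d2)       # messages, (n, n)
    np.fill_diagonal(m, 0.0)
    x_new = x + 0.1 * ((x[:, None] - x[None, :]) * m[..., None]).sum(1) / (n - 1)
    h_new = np.tanh(h + m.sum(1))                   # invariant features
    return h_new, x_new

h = np.array([0.5, -0.2, 0.1])                      # scalar feature per vehicle
x = np.array([[0., 0.], [1., 0.], [0., 1.]])        # 2-D positions
h1, x1 = egnn_layer(h, x)

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
h2, x2 = egnn_layer(h, x @ R.T)                     # rotate inputs
```

Because interference geometry between vehicles does not depend on the global orientation of the road layout, this symmetry is a natural fit for V2X graphs.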
ISBN:
(Print) 9798350379808; 9798350379792
Natural language processing is a crucial and beneficial task in various multimedia processing applications. The text-based person search (TBPS) application involves finding person images in an image gallery using a text sentence as input. Previous studies have shown that noun phrases are among the most important components of person description sentences, as users employ noun phrases to describe a person's characteristics. However, in recent TBPS systems, noun phrase chunking is mainly based on a set of rules, and low-quality noun phrase chunking may lead to irrelevant text-based person search results. In this paper, a method for Vietnamese noun phrase chunking named VNPC (Vietnamese Noun Phrase Chunking) is proposed. The method is based on a Conditional Random Fields (CRFs) model, improved with task-dependent post-processing rules, and is integrated into a Vietnamese text-based person image search framework. Compared to Vietnamese noun phrase chunking based on Conditional Random Fields [14] using only simple CRFs, which achieved an average recall and precision of 82.72% and 82.62% respectively, we achieved better chunking results (with the same dataset splitting ratio). Experimental results show that the proposed method achieves high-quality noun phrase chunking with 88.02% precision, 90.09% recall, and 89.01% F1 score, and it improves the person search results in TBPS by 1.325%, 0.675%, and 0.25% in top-1, top-5, and top-10 accuracy, respectively.
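A CRF chunker of this kind typically emits one BIO tag per token, which a decoding step then groups into noun-phrase spans. A minimal sketch of that decoding (the CRF itself and the paper's post-processing rules are omitted; the example sentence is illustrative):

```python
def bio_to_chunks(tokens, tags):
    """Decode BIO tags, as a CRF chunker would emit them, into
    noun-phrase spans: 'B-NP' opens a chunk, 'I-NP' extends it,
    anything else closes it."""
    chunks, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B-NP":
            if current:
                chunks.append(" ".join(current))
            current = [tok]
        elif tag == "I-NP" and current:
            current.append(tok)
        else:
            if current:
                chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks

toks = ["the", "tall", "man", "wears", "a", "red", "shirt"]
tags = ["B-NP", "I-NP", "I-NP", "O", "B-NP", "I-NP", "I-NP"]
chunks = bio_to_chunks(toks, tags)  # ['the tall man', 'a red shirt']
```

In a TBPS pipeline, the extracted noun phrases would then be matched against person images, which is why chunking errors propagate directly into irrelevant retrieval results.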