A longstanding challenge in Super-Resolution (SR) is how to efficiently enhance high-frequency details in Low-Resolution (LR) images while maintaining semantic coherence. This is particularly crucial in practical applications where SR models are often deployed on low-power devices. To address this issue, we propose an innovative asymmetric SR architecture featuring the Multi-Depth Branch Module (MDBM). These MDBMs contain branches of different depths, designed to capture high- and low-frequency information simultaneously and efficiently. The hierarchical structure of the MDBM allows the deeper branch to gradually accumulate fine-grained local details under the contextual guidance of the shallower branch. We visualize this process using feature maps, and further demonstrate the rationality and effectiveness of this design using proposed novel Fourier spectral analysis methods. Moreover, our model exhibits more significant spectral differentiation between branches than existing branch networks, suggesting that the MDBM reduces feature redundancy and offers a more effective way to integrate high- and low-frequency information. Extensive qualitative and quantitative evaluations on various datasets show that our model generates structurally consistent and visually realistic HR images, achieving state-of-the-art (SOTA) results at a very fast inference speed. Our code is available at https://***/thy960112/MDBN.
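The multi-depth branch idea can be sketched as a toy 1-D example (this is an illustrative sketch, not the paper's implementation: the fixed low-pass and high-pass kernels, the branch depths, and the additive fusion are all assumptions standing in for learned components):

```python
import numpy as np

def conv1d_same(x, kernel):
    """1-D convolution with 'same' padding (toy stand-in for a conv layer)."""
    return np.convolve(x, kernel, mode="same")

def mdbm(x):
    """Toy two-branch module: a shallow low-frequency branch and a
    deeper branch whose stacked convolutions refine local detail."""
    smooth = np.array([0.25, 0.5, 0.25])    # low-pass kernel (assumed)
    sharpen = np.array([-0.5, 2.0, -0.5])   # high-pass-leaning kernel (assumed)

    # Shallow branch: one conv -> coarse, low-frequency context.
    shallow = conv1d_same(x, smooth)

    # Deep branch: several stacked convs -> progressively refined
    # high-frequency detail.
    deep = x
    for _ in range(3):
        deep = conv1d_same(deep, sharpen)

    # Fuse both branches (the paper's exact fusion scheme is not
    # specified in the abstract; simple addition is assumed here).
    return shallow + deep

signal = np.sin(np.linspace(0, 4 * np.pi, 64)) + 0.1 * np.random.randn(64)
out = mdbm(signal)
print(out.shape)  # (64,)
```

In the actual model the branch kernels are learned and operate on 2-D feature maps; the sketch only shows how branches of different depths can specialize in different frequency bands before fusion.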
The Swin Transformer is a variant of the Vision Transformer that constructs a hierarchical Transformer computing representations with shifted windows and window-based multi-head self-attention. This method handles the scale-invariance problem and performs well in many computer vision tasks. In image retrieval, high-quality feature descriptors are necessary to improve retrieval accuracy. This paper proposes a self-ensemble Swin Transformer network structure that fuses the features of different layers of the Swin Transformer network, eliminating noise points present in a single layer and improving retrieval performance. Two experiments were conducted: one on the In-shop Clothes Retrieval dataset and another on the Stanford Online Products dataset. The experiments showed that the proposed method significantly improves the retrieval performance of features extracted with the Vision Transformer, surpassing previous state-of-the-art image retrieval methods. In the second experiment, the feature maps of the trained model were visualized, revealing that the improved network focuses significantly less on noise points and more on salient image features than the original network. To effectively integrate consistent information across the multiple layers of the Swin Transformer, the model performs parameter self-ensembling of the internal blocks of the Swin ***
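A minimal sketch of the layer-fusion idea for retrieval descriptors (this is not the paper's exact procedure; the L2 normalization and uniform averaging are assumptions, and real per-layer features would come from Swin Transformer stages rather than random vectors):

```python
import numpy as np

def fuse_layer_features(layer_feats):
    """Average L2-normalized per-layer descriptors into one embedding:
    information consistent across layers is reinforced, while
    layer-specific noise tends to cancel out."""
    normed = [f / (np.linalg.norm(f) + 1e-12) for f in layer_feats]
    fused = np.mean(normed, axis=0)
    return fused / (np.linalg.norm(fused) + 1e-12)  # unit-length descriptor

# Three hypothetical per-layer feature vectors for one image.
rng = np.random.default_rng(0)
feats = [rng.normal(size=256) for _ in range(3)]
desc = fuse_layer_features(feats)
print(desc.shape)  # (256,)
```

The resulting unit-length descriptor can be compared with cosine similarity, the standard choice in retrieval pipelines.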
Some recent studies show that filters in convolutional neural networks (CNNs) have low color selectivity on datasets of natural scenes such as ImageNet. CNNs, bio-inspired by the visual cortex, are characterized by a hierarchical learning structure that appears to gradually transform the representation space. Inspired by the direct connection between the LGN and V4, which allows V4 to handle low-level information closer to the trichromatic input in addition to processed information coming from V2/V3, we propose adding a long skip connection (LSC) between the first and last blocks of the feature-extraction stage, allowing deeper parts of the network to receive information from shallower layers. This type of connection improves classification accuracy by combining simple visual and complex abstract features to create more color-selective ones. We apply this strategy to classic CNN architectures and quantitatively and qualitatively analyze the improvement in accuracy, focusing on color selectivity. The results show that, in general, skip connections improve accuracy, but the LSC improves it even more and enhances the color selectivity of the original CNN architectures. As a side result, we propose a new color-representation procedure for organizing and filtering feature maps, making their visualization more manageable for qualitative color-selectivity analysis.
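The long skip connection can be sketched abstractly as follows (an illustrative sketch only: the toy elementwise "blocks" stand in for real conv blocks, and additive fusion is an assumption, since the abstract does not specify how shallow and deep features are combined):

```python
import numpy as np

def forward_with_lsc(x, blocks, fuse=np.add):
    """Run x through a stack of blocks, then fuse the output of the
    FIRST block (shallow, near-input features) with the output of the
    LAST block via a long skip connection."""
    shallow = blocks[0](x)       # features close to the trichromatic input
    h = shallow
    for block in blocks[1:]:
        h = block(h)             # deep, abstract features
    return fuse(h, shallow)      # long skip: combine shallow + deep

# Hypothetical "blocks": simple elementwise maps standing in for conv blocks.
blocks = [lambda t: t * 2.0, lambda t: t + 1.0, lambda t: t * 0.5]
x = np.ones(4)
print(forward_with_lsc(x, blocks))  # [3.5 3.5 3.5 3.5]
```

Unlike the short residual skips of ResNet-style blocks, this single connection spans the whole feature-extraction stage, which is what lets the last block see low-level (e.g. color) information directly.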
ISBN:
(Print) 9783030814625; 9783030814618
In light of recent advancements in Artificial Intelligence (AI), the application of Machine Learning in the domains of Natural Language Processing (NLP) and Computer Vision is increasing by leaps and bounds. Machine Learning models are widely deployed in applications (apps) aiming at automation, mainly involving textual and image data: textual data brings Text Analytics (also known as NLP) into action, and image data brings Computer Vision into play. However, the performance of Machine Learning models in Text Analytics or Vision must be judged before deployment. Performance analysis of ML models is done with performance metrics, most importantly the AUC score in classification problems, but justification by numerical scores alone cannot establish the relevance of model performance to domain knowledge. In this paper, one standard NLP use-case and four Computer Vision use-cases are considered for ML model interpretability enhancement, shedding light on their relevance to the domain knowledge each use-case deals with.