Search Results - Inner Mongolia University Library

IEEE Transactions on Artificial Intelligence, 2024, Vol. 5, No. 4, pp. 1613-1623

Authors: Manna, Siladittya; Bhattacharya, Saumik; Pal, Umapada. Affiliations: Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata 700108, India; Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology Kharagpur, Kharagpur 721302, India

In medical image analysis, the cost of acquiring high-quality data and expert annotations is a barrier in many applications. Most existing techniques are based on a supervised learning framework and require a large amount of annotated data to achieve satisfactory performance. As an alternative, in this article we propose a self-supervised learning approach that learns spatial anatomical representations from the frames of magnetic resonance (MR) video clips for the diagnosis of knee conditions. The pretext model learns meaningful context-invariant spatial representations. The downstream task in our article is class-imbalanced multilabel classification. Experiments show that the features learned by the pretext model provide competitive performance on the downstream task. Moreover, the results show that the proposed pretext model learns representations of minority classes efficiently and reliably without any strategy for handling the class imbalance in the dataset. To the best of our knowledge, this work is the first to show the effectiveness and reliability of self-supervised learning algorithms in imbalanced multilabel classification tasks on MR scans. © 2020 IEEE.

Keywords: Supervised learning

Blind Image Inpainting via Omni-dimensional Gated Attention and Wavelet Queries

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2023

Authors: Phutke, Shruti S.; Kulkarni, Ashutosh; Vipparthi, Santosh Kumar; Murala, Subrahmanyam. Affiliation: Computer Vision and Pattern Recognition Lab, Indian Institute of Technology Ropar, Rupnagar, Punjab, India

ISBN (print): 9798350302493

Blind image inpainting is a crucial restoration task that does not require additional mask information to restore the corrupted regions. Yet it remains a little-explored research area because it is difficult to discriminate between corrupted and valid regions. The few existing approaches to blind image inpainting sometimes fail to produce plausible inpainted images, since they follow the common practice of first predicting the corrupted regions and then inpainting them. To skip the corrupted-region prediction step and obtain better results, in this work we propose a novel end-to-end architecture for blind image inpainting consisting of a wavelet query multi-head attention transformer block and an omni-dimensional gated attention module. The proposed wavelet query multi-head attention in the transformer block provides encoder features, via processed wavelet coefficients, as the query to the multi-head attention. Further, the proposed omni-dimensional gated attention effectively passes attentive features along all dimensions from the encoder to the respective decoder. Our proposed approach is compared numerically and visually with existing state-of-the-art methods for blind image inpainting on different standard datasets. The comparative and ablation studies demonstrate the effectiveness of the proposed approach for blind image inpainting. The testing code is available at: https://***/shrutiphutke/Blind-Omni-Wav-Net © 2023 IEEE.
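
The abstract does not specify which wavelet the query is built from. As a rough illustration only, the sketch below assumes a single-level 2D Haar transform: it splits each channel of a feature map into four subbands (LL, LH, HL, HH) and flattens them into per-position tokens that a hypothetical query projection `w_q` could consume. The names `haar_dwt2`, `wavelet_tokens` and `w_q` are illustrative assumptions, not taken from the authors' code.

```python
import numpy as np


def haar_dwt2(x: np.ndarray):
    """Single-level orthonormal 2D Haar DWT of an (H, W) array with even H and W.

    Returns (LL, LH, HL, HH), each of shape (H/2, W/2). Subband naming
    conventions differ between libraries; here LH holds row differences.
    """
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")
    a, b = x[0::2, 0::2], x[0::2, 1::2]  # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]  # bottom-left, bottom-right
    ll = (a + b + c + d) / 2  # local average (approximation)
    lh = (a + b - c - d) / 2  # top row minus bottom row
    hl = (a - b + c - d) / 2  # left column minus right column
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, lh, hl, hh


def wavelet_tokens(feat: np.ndarray) -> np.ndarray:
    """Turn a (C, H, W) feature map into (H/2 * W/2, 4C) tokens of Haar coefficients."""
    per_channel = [haar_dwt2(ch) for ch in feat]  # C tuples of 4 subbands
    subbands = np.concatenate(
        [np.stack([bands[k] for bands in per_channel]) for k in range(4)], axis=0
    )  # (4C, H/2, W/2)
    return subbands.reshape(subbands.shape[0], -1).T  # one token per spatial position


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))  # hypothetical encoder features
w_q = rng.standard_normal((4 * 8, 32))   # hypothetical query projection
query = wavelet_tokens(feat) @ w_q       # queries for a multi-head attention layer
print(query.shape)  # (64, 32)
```

Because the Haar transform used here is orthonormal, the four subbands together carry exactly the energy of the input, so building the query from them discards no information; it only reorganizes it into low- and high-frequency parts.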

Keywords: Restoration

A new multimodal sentiment analysis for images containing textual information

Multimedia Tools and Applications, 2024, pp. 1-30

Authors: Ahuja, Garvit; Alaei, Alireza; Pal, Umapada. Affiliations: School of Computer Science and Engineering, Manipal University Jaipur, Jaipur, India; Faculty of Science and Engineering, Southern Cross University, Gold Coast, Australia; Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India

Multimodal sentiment analysis on images with textual content is a research area that aims to understand the sentiment conveyed by the visual and textual elements in images. While multimodal sentiment analysis on images and text (reviews) has its own challenges, the combination of textual and visual content within images presents new challenges as well as opportunities. In this work, we propose a multimodal sentiment analysis method for images that incorporate textual elements. For the textual sentiment analysis model, we first employ a text recognition system to extract textual data from the input images. Our multimodal method is based on transfer learning, using two pre-trained deep learning models, Xception and RoBERTa, to extract features from the visual and textual content of multimedia images, respectively. We then apply a fusion strategy to combine the two modalities (Visual Sentiment Analysis (VSA) and Textual Sentiment Analysis (TSA)) in order to enhance the accuracy of the method and to provide a more comprehensive understanding of sentiment in multimedia content. In addition, we curated a custom dataset of images with associated text labels and sentiments. To ensure accurate labels, we conducted human evaluations involving thirty annotators. The dataset includes images labeled with negative, neutral, and positive sentiments. Experimental results demonstrate the effectiveness of combining visual and textual features for sentiment analysis. The findings have promising implications for real-world applications such as sentiment analysis of social media, product reviews, and marketing campaigns, where both images and text play a significant role in conveying emotional context. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
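
The abstract does not say how the VSA and TSA outputs are fused. One common option, shown here purely as an assumed example, is weighted late fusion: each branch (for instance, an Xception-based image classifier and a RoBERTa-based text classifier) produces logits over the three sentiment classes, and the class probabilities are averaged with a weight `w_visual`. The logits are made-up numbers, and `w_visual` is a hypothetical hyperparameter, not a value from the paper.

```python
import numpy as np

LABELS = ("negative", "neutral", "positive")


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def late_fusion(visual_logits, textual_logits, w_visual: float = 0.5) -> np.ndarray:
    """Weighted average of per-modality class probabilities (an assumed fusion rule)."""
    if not 0.0 <= w_visual <= 1.0:
        raise ValueError("w_visual must lie in [0, 1]")
    p_vis = softmax(np.asarray(visual_logits, dtype=float))
    p_txt = softmax(np.asarray(textual_logits, dtype=float))
    return w_visual * p_vis + (1.0 - w_visual) * p_txt


# Made-up logits: the image looks mildly positive, the recognized text strongly negative.
fused = late_fusion([0.1, 0.2, 0.9], [2.5, 0.3, -1.0], w_visual=0.4)
print(LABELS[int(fused.argmax())])  # negative
```

Averaging probabilities rather than raw logits keeps each branch's output on the same scale, so `w_visual` can be read directly as the trust placed in the visual branch.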

Keywords: Contrastive Learning

Low Resource Degraded Quality Document Image Binarization – Domain Adaptation is the Way

Proceedings of the Thirteenth Indian Conference on Computer Vision, Graphics and Image Processing

Authors: Ahana Kundu; Ujjwal Bhattacharya. Affiliation: Computer Vision and Pattern Recognition, Indian Statistical Institute (ISI), Kolkata, India

ISBN (print): 9781450398220

Image binarization usually plays a crucial role in the automatic analysis of degraded documents from their captured images. However, this binarization task is often difficult for a number of reasons, including the high similarity between noisy background pixels and faded foreground pixels. The study presented here focuses on binarizing images of low-resource, degraded-quality documents, based on a recently collected set of image samples of several rare, ancient, and severely degraded printed documents in Bangla, the 2nd most popular script in India and the 5th most popular in the world. This new collection of degraded document image samples will henceforth be referred to as 'ISIDDI2'; it consists of 139 images of old Bangla document pages. Samples of 'ISIDDI', another existing database of degraded Bangla document images, have also been used in the present study. A novel deep architecture based on attention UNET++ with dilated convolutions is proposed for this binarization task. The model is optimized using a distance reciprocal distortion (DRD) loss, which reflects human visual perception. Since binarization ground truth is not available for either the 'ISIDDI2' or the 'ISIDDI' samples, the proposed network has been trained on samples from the DIBCO and H-DIBCO datasets, and an unsupervised domain adaptation (DA) module is employed to adapt the proposed architecture to the degradation patterns of the 'ISIDDI2' or 'ISIDDI' samples. The proposed binarization strategy includes a post-processing operation based on a modified k-neighbourhood approach for recovering broken characters. Results of our extensive experimentation show that the proposed binarization strategy improves on the binarization output of state-of-the-art methods on both the ISIDDI2 and ISIDDI datasets. Its performance on the well-known DIBCO samples is also satisfactory.
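
DRD, the measure used in the DIBCO evaluations, penalizes every flipped pixel by a distance-weighted count of disagreeing ground-truth pixels in its 5x5 neighbourhood, and normalizes the total by the number of non-uniform 8x8 blocks in the ground truth. The sketch below computes this metric for binary images (1 = foreground, 0 = background, an assumed convention). It is not the paper's differentiable training loss, and two details are simplifying assumptions: neighbours outside the image are ignored, and only complete 8x8 blocks are counted.

```python
import numpy as np


def drd_weights(m: int = 5) -> np.ndarray:
    """Normalized reciprocal-distance weights; the centre pixel gets weight 0."""
    c = m // 2
    i, j = np.mgrid[0:m, 0:m]
    dist = np.hypot(i - c, j - c)
    w = np.zeros((m, m))
    w[dist > 0] = 1.0 / dist[dist > 0]
    return w / w.sum()


def count_nubn(gt: np.ndarray, block: int = 8) -> int:
    """Number of complete block x block tiles of the ground truth that are not uniform."""
    h, w = gt.shape
    count = 0
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            tile = gt[y:y + block, x:x + block]
            count += int(tile.min() != tile.max())
    return count


def drd(binarized: np.ndarray, gt: np.ndarray, m: int = 5, block: int = 8) -> float:
    """Distance Reciprocal Distortion between a binarized image and its ground truth."""
    binarized = np.asarray(binarized, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if binarized.shape != gt.shape:
        raise ValueError("images must have the same shape")
    wn, r = drd_weights(m), m // 2
    h, w = gt.shape
    total = 0.0
    for y, x in zip(*np.nonzero(binarized != gt)):  # flipped pixels only
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        x0, x1 = max(0, x - r), min(w, x + r + 1)
        # crop the weight matrix the same way the neighbourhood was cropped
        wk = wn[y0 - y + r:y1 - y + r, x0 - x + r:x1 - x + r]
        total += np.sum(np.abs(gt[y0:y1, x0:x1] - binarized[y, x]) * wk)
    nubn = count_nubn(gt, block)
    if nubn == 0:
        raise ValueError("ground truth has no non-uniform blocks; DRD is undefined")
    return float(total / nubn)


gt = np.zeros((8, 8))
gt[0, 0] = 1                  # one non-uniform 8x8 block
pred = gt.copy()
pred[4, 4] = 1                # one false-positive pixel in a clean background area
print(drd(pred, gt))          # approximately 1.0
```

An isolated false positive in clean background costs the full weight sum of 1, whereas a flipped pixel on a stroke edge, where some neighbours already agree with it, costs less; this is what makes DRD track perceived distortion better than a plain pixel count.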

Keywords: UNET++; degraded document image; domain adaptation; image binarization; attention

Low Resource Degraded Quality Document Image Binarization - Domain Adaptation is the Way

13th Indian Conference on Computer Vision, Graphics, and Image Processing, ICVGIP 2022

Authors: Kundu, Ahana; Bhattacharya, Ujjwal. Affiliation: Computer Vision and Pattern Recognition, Indian Statistical Institute, Kolkata, West Bengal, India

ISBN (print): 9781450398237

Image binarization usually plays a crucial role in the automatic analysis of degraded documents from their captured images. However, this binarization task is often difficult for a number of reasons, including the high similarity between noisy background pixels and faded foreground pixels. The study presented here focuses on binarizing images of low-resource, degraded-quality documents, based on a recently collected set of image samples of several rare, ancient, and severely degraded printed documents in Bangla, the 2nd most popular script in India and the 5th most popular in the world. This new collection of degraded document image samples will henceforth be referred to as 'ISIDDI2'; it consists of 139 images of old Bangla document pages. Samples of 'ISIDDI', another existing database of degraded Bangla document images, have also been used in the present study. A novel deep architecture based on attention UNET++ with dilated convolutions is proposed for this binarization task. The model is optimized using a distance reciprocal distortion (DRD) loss, which reflects human visual perception. Since binarization ground truth is not available for either the 'ISIDDI2' or the 'ISIDDI' samples, the proposed network has been trained on samples from the DIBCO and H-DIBCO datasets, and an unsupervised domain adaptation (DA) module is employed to adapt the proposed architecture to the degradation patterns of the 'ISIDDI2' or 'ISIDDI' samples. The proposed binarization strategy includes a post-processing operation based on a modified k-neighbourhood approach for recovering broken characters. Results of our extensive experimentation show that the proposed binarization strategy improves on the binarization output of state-of-the-art methods on both the ISIDDI2 and ISIDDI datasets. Its performance on the well-known DIBCO samples is also satisfactory. © 2022 ACM.

Keywords: Network architecture

Bootstrap Diffusion Model Curve Estimation for High Resolution Low-Light Image Enhancement

20th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2023

Authors: Huang, Jiancheng; Liu, Yifan; Chen, Shifeng. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China

ISBN (print): 9789819970247

Learning-based methods have attracted considerable research attention and led to significant improvements in low-light image enhancement. However, most of them still suffer from two main problems: high computational cost on high-resolution images and unsatisfactory performance when enhancement and denoising are performed simultaneously. To address these problems, we propose BDCE, a bootstrap diffusion model that learns the distribution of the curve parameters instead of the normal-light image itself. Specifically, we adopt the curve estimation method to handle high-resolution images, where the curve parameters are estimated by our bootstrap diffusion model. In addition, a denoising module is applied in each iteration of curve adjustment to denoise the intermediate enhanced result. We evaluate BDCE on commonly used benchmark datasets, and extensive experiments show that it achieves state-of-the-art qualitative and quantitative performance. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd 2024.
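
The abstract does not give the curve family. The sketch below assumes the quadratic light-enhancement curve popularized by Zero-DCE, LE(x) = x + alpha * x * (1 - x), applied iteratively with per-pixel parameter maps alpha_n. In BDCE those maps would come from the diffusion model; here they are simply given. A 3x3 mean filter stands in for the learned denoising module applied after each curve iteration. All function names are hypothetical.

```python
import numpy as np


def apply_curve(x, alpha):
    """One quadratic curve step; maps [0, 1] into [0, 1] whenever alpha is in [-1, 1]."""
    return x + alpha * x * (1.0 - x)


def mean_filter3(x: np.ndarray) -> np.ndarray:
    """3x3 mean filter with edge padding (a stand-in for a learned denoiser)."""
    h, w = x.shape
    p = np.pad(x, 1, mode="edge")
    return sum(p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)) / 9.0


def enhance(image: np.ndarray, alpha_maps, denoise=mean_filter3) -> np.ndarray:
    """Iteratively apply curve steps, denoising the intermediate result after each one."""
    x = np.clip(image.astype(float), 0.0, 1.0)
    for alpha in alpha_maps:
        x = apply_curve(x, np.clip(alpha, -1.0, 1.0))
        if denoise is not None:
            x = denoise(x)
    return x


rng = np.random.default_rng(0)
dark = np.clip(0.15 + 0.05 * rng.standard_normal((32, 32)), 0.0, 1.0)  # noisy low-light frame
alphas = [np.full(dark.shape, 0.8)] * 4  # hypothetical per-pixel parameter maps
bright = enhance(dark, alphas)
print(round(dark.mean(), 2), round(bright.mean(), 2))
```

Because the curve only needs one parameter per pixel per iteration, a model can predict the parameter maps instead of the enhanced image itself; this is plausibly why curve estimation suits high-resolution inputs, though the exact mechanism in BDCE is not described in the abstract.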

Keywords: Image enhancement

PCGAUNet: Pixel Correlation and Gaussian Attention Driven Network for Text Segmentation

27th International Conference on Pattern Recognition, ICPR 2024

Authors: Roy, Ayush; Palaiahnakote, Shivakumara; Pal, Umapada; Antonacopoulos, Apostolos; Ramachandra, Raghavendra. Affiliations: Computer Vision and Pattern Recognition, Indian Statistical Institute, Kolkata, India; School of Science, Engineering and Environment, University of Salford, Manchester, United Kingdom; Norwegian University of Science and Technology, Trondheim, Norway

ISBN (print): 9783031784460

Text-line segmentation is still considered challenging in scene images with complex backgrounds. The success of text detection and recognition depends on the success of text segmentation. To facilitate reliable detection and recognition, this study introduces a new text segmentation model called the Pixel Correlation and Gaussian Attention Driven Network (PCGAUNet). To extract pixel correlation, we modified the MultiResUnet architecture, which leverages pixel-wise correlation to effectively highlight foreground pixels. In addition, the proposed model uses the prior spatial statistics of bottleneck features to create a learnable Gaussian distribution, which guides the decoder toward accurate text segmentation. Experimental results on three standard scene text segmentation datasets, ICDAR13 FST, Total Text, and COCO-TS, show that the proposed model outperforms existing methods. Furthermore, the results on the underwater dataset UTS-55 show that our model is robust and generic. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
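
The abstract states only that spatial statistics of the bottleneck features parameterize a learnable Gaussian that guides the decoder. One possible reading, sketched below as an assumption, treats the channel-averaged activation map as a spatial distribution, takes its mean and variance, and builds a 2D Gaussian attention map from them; `sigma_scale` stands in for a hypothetical learnable parameter. This is not the published PCGAUNet layer.

```python
import numpy as np


def gaussian_attention(feat: np.ndarray, sigma_scale: float = 1.0, eps: float = 1e-6):
    """Reweight (C, H, W) non-negative features by a Gaussian fitted to their spatial mass."""
    _, h, w = feat.shape
    mass = feat.mean(axis=0)                 # (H, W) channel-averaged activation
    p = mass / (mass.sum() + eps)            # normalize to a spatial distribution
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    mu_y, mu_x = (p * ys).sum(), (p * xs).sum()
    # spatial variances, scaled by the (hypothetically learnable) sigma_scale
    var_y = (p * (ys - mu_y) ** 2).sum() * sigma_scale ** 2 + eps
    var_x = (p * (xs - mu_x) ** 2).sum() * sigma_scale ** 2 + eps
    g = np.exp(-0.5 * ((ys - mu_y) ** 2 / var_y + (xs - mu_x) ** 2 / var_x))
    return feat * g[None], g                 # attended features and the attention map


feat = np.zeros((4, 9, 9))
feat[:, 3:6, 4:7] = 1.0                      # a text-like blob centred at row 4, column 5
attended, g = gaussian_attention(feat)
peak = tuple(int(i) for i in np.unravel_index(g.argmax(), g.shape))
print(peak)  # (4, 5)
```

A larger `sigma_scale` widens the map and lets more context through to the decoder, while a smaller one concentrates attention on the densest activation region.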

Keywords: Image segmentation

XLSI: A New Xception and Log Polar Transform Based Approach for Scene Text Script Identification

27th International Conference on Pattern Recognition, ICPR 2024

Authors: Roy, Ayush; Palaiahnakote, Shivakumara; Pal, Umapada; Antonacopoulos, Apostolos; Blumenstein, Michael. Affiliations: Computer Vision and Pattern Recognition, Indian Statistical Institute, Kolkata, India; School of Science, Engineering and Environment, University of Salford, Manchester, United Kingdom; University of Technology Sydney, Sydney, Australia

ISBN (print): 9783031784941

Script identification of text in natural scene images is challenging due to complex backgrounds, arbitrary orientations, characters of different sizes, varying fonts, and multiple styles. Most existing methods are not effective in the presence of these challenges. This paper introduces a new approach based on the Xception architecture that uses the log-polar transform of the original image as an additional input, enabling the extraction of cues that are invariant to rotation and scaling but sensitive to script. The rationale behind the proposed work is that combining global features with text style features makes a significant difference in discriminating between scripts. To combine the features that Xception extracts from the input image and from its log-polar transform, the proposed method introduces a style-enhanced fusion block. In addition, to further improve script identification performance, the proposed approach uses a new receptive channel selective focal attention module. Comparative evaluation results on three benchmark datasets, namely CVSI 2015, SIW-13, and MLe2e, show that the proposed method outperforms the state of the art in terms of classification rate. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
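
Why a log-polar input helps with rotation and scale can be seen from a small sketch: resampling the image on a grid of log-spaced radii and evenly spaced angles around the centre turns a rotation about that centre into a cyclic shift along the angle axis (and a scaling into a shift along the log-radius axis). The nearest-neighbour version below is an illustrative assumption, not the paper's preprocessing; OpenCV's `warpPolar` with the `WARP_POLAR_LOG` flag provides an interpolated variant.

```python
import numpy as np


def log_polar(img: np.ndarray, n_rho: int = 32, n_theta: int = 64) -> np.ndarray:
    """Nearest-neighbour log-polar resampling of a 2D image about its centre.

    Rows index log-spaced radii, columns index angles in [0, 2*pi).
    Samples that fall outside the image are set to 0.
    """
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    max_r = np.hypot(cy, cx)
    rho = np.exp(np.linspace(0.0, np.log(max_r), n_rho))  # radii from 1 px to the corner
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    r, t = np.meshgrid(rho, theta, indexing="ij")          # (n_rho, n_theta)
    ys = np.rint(cy + r * np.sin(t)).astype(int)
    xs = np.rint(cx + r * np.cos(t)).astype(int)
    inside = (ys >= 0) & (ys < h) & (xs >= 0) & (xs < w)
    out = np.zeros((n_rho, n_theta), dtype=float)
    out[inside] = img[ys[inside], xs[inside]]
    return out


rng = np.random.default_rng(0)
img = rng.random((33, 33))               # odd size so the centre falls on a pixel
lp = log_polar(img)
lp_rot = log_polar(np.rot90(img))        # the same image rotated by 90 degrees
# 90 degrees is a quarter of the 64 angle bins, i.e. a cyclic shift of 16 columns.
match = np.mean(lp_rot == np.roll(lp, -16, axis=1))
print(f"{match:.2f}")
```

Up to rounding at pixel boundaries, the rotated image's log-polar map is just a column-rolled copy of the original's, which a convolutional backbone can handle far more easily than an arbitrary rotation.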

Keywords: Deep learning

Can an Image Tell the Tale: Looking Beyond the Haze to Determine PM2.5 Concentration

International Joint Conference on Neural Networks (IJCNN)

Authors: Sagarnil Chakraborty; Sarbani Palit; Harsh Bhandari. Affiliation: Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India

ISBN (electronic): 9798350359312

ISBN (print): 9798350359329

In the past few decades, rapid industrialization has caused a steady decline in air quality along with an increase in the concentration of PM2.5. It is well known that a high PM2.5 concentration adversely affects the environment and has a hazardous impact on public health. It is therefore important to monitor PM2.5 concentration at geographic locations where air quality monitoring stations are currently unavailable, especially in remote areas. Unfortunately, installing such monitoring stations requires expensive instruments and constant maintenance. This paper presents a novel, low-cost, and portable alternative to such measurement apparatus, in which PM2.5 concentration is estimated from images captured by a camera. The novelty of the present work lies in its unique attempt to capture information about PM2.5 content from the visibility degradation caused by the pollutant, supplemented by knowledge of the pollutant's seasonal and diurnal variation. The latter plays a crucial role in preventing confounding effects arising from other weather and atmospheric elements. Another important highlight is the use of a full-reference image metric as a feature, for which a powerful dehazing algorithm has been employed. The results are very promising, providing near-accurate estimates of PM2.5 concentration with R² values far higher than those reported in the literature. In summary, the construction of a unique feature set, together with an appropriate machine learning algorithm, leads to a highly reliable, stand-alone approach that can be deployed on a hand-held device such as a mobile phone, which is a significant contribution of the proposed approach.
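
The abstract reports accuracy as R² values. For reference, the coefficient of determination compares the residual sum of squares with the total variation of the measured values, R² = 1 - SS_res / SS_tot, so 1 means perfect estimates and 0 means no better than always predicting the mean. The readings below are invented numbers for illustration only, not data from the paper.

```python
import numpy as np


def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # variation left unexplained
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return float(1.0 - ss_res / ss_tot)


# Invented PM2.5 readings (ug/m^3) from a monitor and image-based estimates.
measured = [10, 20, 30, 40]
estimated = [12, 18, 33, 37]
print(round(r_squared(measured, estimated), 3))  # 0.948
```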

Keywords: Machine learning algorithms; Prevention and mitigation; Instruments; Neural networks; Air quality; Pollution measurement; Maintenance

Aquaformer: Underwater Image Enhancement via Adaptive Transformer

International Joint Conference on Neural Networks (IJCNN)

Authors: Harsh Bhandari; Sarbani Palit. Affiliation: Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India

ISBN (electronic): 9798350359312

ISBN (print): 9798350359329

Water degrades the quality of optical images captured underwater because of its physical properties of absorption and scattering. This degradation is further aggravated by increasing water depth and the presence of contaminated water. Transformers have made a quantum leap in many vision tasks, such as detection and segmentation, but have yet to make any progress in enhancing degraded underwater images. We propose a transformer-based model named “Aquaformer” that makes four major contributions: an adaptive layer normalization, the replacement of the masked cyclic shift with symmetric padding in window partitioning, a novel aggregation mechanism, and an adjustable fusion approach. Together, these make the model very powerful, producing significantly better performance than the latest state-of-the-art methods. Testing on multiple benchmark datasets, with both quantitative and qualitative metrics, establishes its superiority.
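
The abstract names the replacement of the masked cyclic shift with symmetric padding but gives no details. The sketch below contrasts the two ideas under one assumed reading. Swin-style shifted windows roll the feature map by half a window, so some windows mix pixels from opposite borders and need an attention mask. Padding the map by half a window with mirrored content on every side yields the same offset window grid using only contiguous or mirrored pixels, at the cost of one extra row and column of windows. The function names are hypothetical.

```python
import numpy as np


def partition_windows(x: np.ndarray, win: int) -> np.ndarray:
    """Split an (H, W, C) map with H and W divisible by win into (N, win, win, C) windows."""
    h, w, c = x.shape
    if h % win or w % win:
        raise ValueError("height and width must be divisible by the window size")
    x = x.reshape(h // win, win, w // win, win, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win, win, c)


def shifted_windows_cyclic(x: np.ndarray, win: int) -> np.ndarray:
    """Swin-style shift: roll by win // 2, then partition (wrapped windows need a mask)."""
    s = win // 2
    return partition_windows(np.roll(x, shift=(-s, -s), axis=(0, 1)), win)


def shifted_windows_symmetric(x: np.ndarray, win: int) -> np.ndarray:
    """Pad by win // 2 on every side with mirrored content, then partition (no mask)."""
    if win % 2:
        raise ValueError("this sketch assumes an even window size")
    s = win // 2
    padded = np.pad(x, ((s, s), (s, s), (0, 0)), mode="symmetric")
    return partition_windows(padded, win)


x = np.arange(16, dtype=float).reshape(4, 4, 1)  # toy 4x4 single-channel feature map
print(shifted_windows_cyclic(x, 2).shape)        # (4, 2, 2, 1)
print(shifted_windows_symmetric(x, 2).shape)     # (9, 2, 2, 1)
```

In the cyclic version the first window already contains pixels from rows 1 and 2 but the last windows wrap around the border; in the padded version every window holds either real neighbours or their mirror images, which is the property that removes the need for a mask.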

Keywords: Degradation; Water; Adaptation models; Scattering; Benchmark testing; Transformers; Water pollution
