Search Results - Inner Mongolia University Library

IEEE Transactions on Artificial Intelligence, 2024, Vol. 5, No. 4, pp. 1613-1623

Authors: Manna, Siladittya; Bhattacharya, Saumik; Pal, Umapada. Affiliations: Indian Statistical Institute, Computer Vision and Pattern Recognition Unit, Kolkata 700108, India; Indian Institute of Technology Kharagpur, Department of Electronics and Electrical Communication Engineering, Kharagpur 721302, India

In medical image analysis, the cost of acquiring high-quality data and annotation by experts is a barrier in many medical applications. Most of the techniques used are based on a supervised learning framework and require a large amount of annotated data to achieve satisfactory performance. As an alternative, in this article, we propose a self-supervised learning approach for learning the spatial anatomical representations from the frames of magnetic resonance (MR) video clips for the diagnosis of knee medical conditions. The pretext model learns meaningful context-invariant spatial representations. The downstream task in our article is a class-imbalanced multilabel classification. Different experiments show that the features learned by the pretext model provide competitive performance in the downstream task. Moreover, the efficiency and reliability of the proposed pretext model in learning representations of minority classes without applying any strategy toward imbalance in the dataset can be seen from the results. To the best of our knowledge, this work is the first of its kind in showing the effectiveness and reliability of self-supervised learning algorithms in imbalanced multilabel classification tasks on MR scans. © 2020 IEEE.

Keywords: Supervised learning

A new multimodal sentiment analysis for images containing textual information

Multimedia Tools and Applications, 2024, pp. 1-30

Authors: Ahuja, Garvit; Alaei, Alireza; Pal, Umapada. Affiliations: School of Computer Science and Engineering, Manipal University Jaipur, Jaipur, India; Faculty of Science and Engineering, Southern Cross University, Gold Coast, Australia; Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India

Multimodal sentiment analysis on images with textual content is a research area aiming to understand the sentiment conveyed by visual and textual elements in the images. While multimodal sentiment analysis on images and text (reviews) has its own challenges, the combination of textual and visual content in the form of images presents new challenges as well as opportunities. In this research work, we proposed a multimodal sentiment analysis method that works on images incorporating textual elements. In the textual sentiment analysis model, we initially employed a recognition system to extract textual data from input images. Our proposed multimodal method is based on transfer learning, considering two pre-trained deep learning models, Xception and RoBERTa, to extract features from both visual and textual content from multimedia images. We then implemented a fusion strategy to combine these two modalities (Visual Sentiment Analysis (VSA) and Textual Sentiment Analysis (TSA)) to enhance the accuracy of the proposed method and to provide a more comprehensive understanding of sentiment in multimedia content. In addition, we curated a custom dataset comprising images with associated text labels and sentiments. To ensure accurate labels, we conducted human evaluations involving thirty annotators. Our dataset includes images labeled with negative, neutral, and positive sentiments. Experimental results demonstrated the effectiveness of combining visual and textual features for sentiment analysis. The findings from this research hold promising implications for real-world applications, such as sentiment analysis in social media, product reviews, and marketing campaigns, where both images and text play a significant role in conveying emotional context. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.

Keywords: Contrastive Learning

Can an Image Tell the Tale: Looking Beyond the Haze to Determine PM2.5 Concentration

International Joint Conference on Neural Networks (IJCNN)

Authors: Sagarnil Chakraborty; Sarbani Palit; Harsh Bhandari. Affiliation: Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India

ISBN (digital): 9798350359312

ISBN (print): 9798350359329

In the past few decades, rapid industrialization has brought a steady decline in air quality along with an increase in the concentration of PM2.5. It is well known that a high PM2.5 concentration adversely affects the environment and has a hazardous impact on public health. Therefore, it is important to monitor the PM2.5 concentration at geographic locations where air quality monitoring stations are presently unavailable, especially in remote areas. Unfortunately, installation of such monitoring stations requires expensive instruments and constant maintenance. This paper presents a novel, low-cost and portable alternative to such measurement apparatus, where PM2.5 concentration is estimated based on image input obtained from a camera. The novelty of the present work lies in its attempt to capture information regarding PM2.5 content from the visibility degradation caused by the pollutant, supplemented by knowledge of its seasonal and diurnal variation. The latter has a crucial role in preventing confounding effects arising from the presence of other weather and atmospheric elements. Another important highlight is the use of a full-reference image metric as a feature, for which a powerful dehazing algorithm has been employed. The results obtained are extremely promising, providing a close to accurate estimation of PM2.5 concentration with R² values far higher than reported in the literature. To summarize, the construction of a unique feature set, together with an appropriate machine learning algorithm, leads to an extremely reliable, stand-alone approach that is deployable on a hand-held device such as a mobile phone, which is a significant contribution of the proposed approach.

Keywords: Machine learning algorithms; Prevention and mitigation; Instruments; Neural networks; Air quality; Pollution measurement; Maintenance

Aquaformer: Underwater Image Enhancement via Adaptive Transformer

International Joint Conference on Neural Networks (IJCNN)

Authors: Harsh Bhandari; Sarbani Palit. Affiliation: Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India

ISBN (digital): 9798350359312

ISBN (print): 9798350359329

Water degrades the quality of optical images captured underwater through its physical properties of absorption and scattering. This degradation is further aggravated by increasing water depth and the presence of contaminated water. Transformers in the vision domain have made a quantum leap in many vision tasks such as detection and segmentation but have yet to make any progress in enhancing degraded underwater images. We propose a transformer-based model named “Aquaformer” which makes four major contributions: an adaptive layer normalization, replacement of masked cyclic shift with symmetric padding in window partitioning, a novel aggregation mechanism, and an adjustable fusion approach. Together, these make the model considerably stronger, producing significantly better performance than the latest state-of-the-art methods. Testing on multiple benchmark datasets, employing both quantitative and qualitative metrics, establishes its superiority.

Keywords: Degradation; Water; Adaptation models; Scattering; Benchmark testing; Transformers; Water pollution

A New Contrastive Learning Based Model for Estimating Degree of Multiple Personality Traits Using Social Media Posts

7th Asian Conference on Pattern Recognition, ACPR 2023

Authors: Biswas, Kunal; Shivakumara, Palaiahnakote; Pal, Umapada; Sarkar, Ram. Affiliations: Jadavpur University, Kolkata, India; Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia; Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India

ISBN (digital): 9783031476372

ISBN (print): 9783031476365

Estimating the degree of multiple personality traits in a single image is challenging due to the presence of multiple people, occlusion, poor quality etc. Unlike existing methods which focus on the classification of a single personality using images, this work focuses on estimating different personality traits using a single image. We believe that when the image contains multiple persons and modalities, one can expect multiple emotions and expressions. This work separates given input images into different faces of people, recognized text, meta-text and background information using face segmentation, text recognition and scene detection techniques. Contrastive learning is explored to extract features from each segmented region based on clustering. The proposed work fuses textual and visual features extracted from the image for estimating the degree of multiple personality traits. Experimental results on our benchmark datasets show that the proposed model is effective and outperforms the existing methods. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.

Keywords: Image segmentation

A New HourGlass Network for Detecting Text in Shaky and Non-shaky Video Frames

27th International Conference on Pattern Recognition, ICPR 2024

Authors: Halder, Arnab; Palaiahnakote, Shivakumara; Pal, Umapada; Blumenstein, Michael; Gornale, Shivanand S. Affiliations: Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India; School of Science, Engineering and Environment, University of Salford, Manchester, United Kingdom; University of Technology Sydney, Sydney, Australia; Department of Computer Science, Rani Channamma University, Belagavi, India

ISBN (print): 9783031784972

Shaky and non-shaky videos are quite common in real-time applications such as surveillance and monitoring vehicles and human movements in protected areas. Text detection in such videos is a formidable challenge due to motion blur, noise, shaky cameras, poor quality and poor visibility. In contrast to existing text detection methods, which focus on text detection in scene images or specific types of images, the present work focuses on text detection in both shaky and non-shaky video frames. Inspired by the impressive performance of the HourGlass network in adverse situations, we explore the HourGlass network for text detection in shaky and non-shaky video frames and natural scene images. To improve the performance of the HourGlass network, we employ the Real-Time Model (RTMHead) for predicting text precisely and the Cross Stage Partial Network (CSPNet), which is a neck architecture for robust feature fusion. In addition, the integration of the SiLU activation function with the HourGlass network improves its discriminative power. To test the efficacy of the proposed method, we conducted experiments on shaky and non-shaky video frames, as well as ICDAR 2015 video frames. Furthermore, to show the effectiveness of the proposed method, we used Total-Text scene images for experimentation. The results on different datasets and a comparative study with the state-of-the-art models show that the proposed model outperforms the existing methods. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Image enhancement

A CNN Based Framework for Unistroke Numeral Recognition in Air-Writing

arXiv, 2023

Authors: Roy, Prasun; Ghosh, Subhankar; Pal, Umapada. Affiliation: Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India

Air-writing refers to virtually writing linguistic characters through hand gestures in three-dimensional space with six degrees of freedom. This paper proposes a generic video camera-aided convolutional neural network (CNN) based air-writing framework. Gestures are performed using a marker of fixed color in front of a generic video camera, followed by color-based segmentation to identify the marker and track the trajectory of the marker tip. A pre-trained CNN is then used to classify the gesture. The recognition accuracy is further improved using transfer learning with the newly acquired data. Because of the color-based segmentation, the performance of the system varies significantly with the illumination conditions. Under less fluctuating illumination, the system is able to recognize isolated unistroke numerals of multiple languages. The proposed framework has achieved 97.7%, 95.4% and 93.7% recognition rates in person-independent evaluations on English, Bengali and Devanagari numerals, respectively. © 2023, CC BY.

Keywords: Convolutional neural networks

DATR: Domain Agnostic Text Recognizer

27th International Conference on Pattern Recognition, ICPR 2024

Authors: Purkayastha, Kunal; Sarkar, Shashwat; Palaiahnakote, Shivakumara; Pal, Umapada; Ghosal, Palash. Affiliations: Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India; School of Science, Engineering and Environment, University of Salford, Manchester, United Kingdom; Department of Information Technology, Sikkim Manipal Institute of Technology, Sikkim Manipal University, Sikkim, Gangtok, India

ISBN (print): 9783031784460

Recognizing text extracted from multiple domains is complex and challenging because complexities vary from one domain to another. Most existing methods focus either on natural scene text or specific text type but not text of multiple domains, namely, scene, underwater, and drone texts. In addition, the state-of-the-art models ignore the vital cues that exist in multiple instances of the text. This paper presents a new method called the Student-Teacher-Assistant (STA) network, which involves dual CLIP models to exploit cues in multiple text instances. The model that uses ResNet50 in its image encoder is called helper CLIP, while the model that uses ViT in its image encoder is called primary CLIP. The proposed work processes both models simultaneously to extract visual and textual features through image and text encoders. Our work uses cosine similarity for the randomly chosen input image to detect instances similar to the input image. The input and similar instances are supplied to primary and helper CLIPs for visual and textual feature extraction. The outputs of dual CLIPs are fused in a different way through the alignment step for recognizing text accurately, irrespective of domains. To demonstrate the proposed model’s significance, experiments are conducted on a set of standard natural scene text datasets (regular and irregular), underwater images, and drone images. The results on three different domains show that the proposed model outperforms the state-of-the-art recognition models. The datasets and code for public use in training and testing shall be made available on GitHub. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
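The abstract mentions using cosine similarity to find instances similar to a chosen input image. As a rough illustration only, the sketch below ranks candidate feature vectors by cosine similarity to a query vector. The toy vectors, the function names `cosine_similarity` and `most_similar`, and the `top_k` parameter are all assumptions made for this example; the record does not say how the authors compute or threshold the similarity, and real inputs would be image embeddings rather than two-dimensional lists.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, candidates, top_k=2):
    """Return indices of the top_k candidates most similar to the query (hypothetical helper)."""
    scored = [(cosine_similarity(query, vec), idx) for idx, vec in enumerate(candidates)]
    # Sort by similarity, highest first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:top_k]]


# Toy feature vectors standing in for image embeddings (illustrative values only).
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
print(most_similar(query, candidates))
```

Cosine similarity ignores vector length, which is why the candidate `[2.0, 0.0]` counts as a perfect match for `[1.0, 0.0]` here.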

Keywords: Underwater imaging

A New Impressive and Expressive Features Based Model for Personality Traits Identification

27th International Conference on Pattern Recognition, ICPR 2024

Authors: Biswas, Kunal; Palaiahnakote, Shivakumara; Pal, Umapada; Chanda, Sukalpa; Wu, Xiao-Jun. Affiliations: Jadavpur University, Kolkata, India; School of Science, Engineering and Environment, University of Salford, Manchester, United Kingdom; Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India; Østfold University College, Halden, Norway; Jiangnan University, Wuxi, China

ISBN (print): 9783031781858

Personality traits identification is challenging due to unpredictable changes in the foreground and background of images. In this work, we propose a new deep learning model for personality traits image classification by exploiting visual and textual information from dating images. We believe that there is a strong correlation between the features (visual + textual) of dating images and personality traits images. The features which draw attention are defined as impressive features of dating images and features that convey emotions are expressive features of personality traits images. This observation motivated us to combine the features of dating images to improve the performance of personality traits images classification. To affirm the above observation, we propose multiple convolutional layers followed by max pool layers which extract features from dating and personality traits images simultaneously. To integrate the strengths of impressive and expressive features, the proposed work introduces a dual fusion approach, which fuses features and modalities at different levels. The experiments are conducted on different standard datasets of personality traits images to demonstrate the effectiveness of impressive features in terms of personality trait image classification. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Contrastive Learning

A New Attention Based UNet and Gated Edge Attention Network for Retinal Vessel Segmentation

27th International Conference on Pattern Recognition, ICPR 2024

Authors: Roy, Ayush; Palaiahnakote, Shivakumara; Pal, Umapada; Chanda, Sukalpa. Affiliations: Department of Electrical Engineering, Jadavpur University, Kolkata, India; School of Science, Engineering and Environment, University of Salford, Salford, United Kingdom; Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India; Østfold University College, Halden, Norway

ISBN (print): 9783031781032

Early diagnosis of retinal diseases is crucial for preventing blindness. However, due to background variations and degradation in the images, retinal vessel segmentation has become challenging. As a result, accurate segmentation of the retinal vessels is essential for enhancing diagnosis to identify the disease. To achieve this, inspired by the special ability of the attention mechanism, which detects vital regions, and UNet, which segments the vital region, we propose a combination of a new attention mechanism and modified UNet for segmenting vessels in retina images. In the proposed segmentation model, the convolutional blocks have been modified to capture multiscale spatial information by varying the convolution dilation rates. Similarly, the Trainable tanh activation (T-Tanh) is adapted in a new way to identify changes in the flow of the feature gradients to differentiate between the retinal vessel pixels and the background. Furthermore, to make the segmentation robust, the Gated Edge Attention (GEA) network is proposed. The effectiveness of the segmentation is demonstrated by testing on two benchmark datasets, namely, STARE and CHASE. The results show that the performance of the proposed method is superior to the state-of-the-art methods. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
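The abstract says the convolutional blocks capture multiscale spatial information by varying the convolution dilation rates. The sketch below is a small illustration of why that works: for a kernel of size k and dilation d, the span covered along one axis is k + (k - 1)(d - 1), so larger dilation rates see a wider context with the same number of weights. The function name and the specific rates (1, 2, 4) are assumptions for this example; the record does not state which rates the authors use.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered along one axis by a dilated convolution kernel.

    A dilation of d inserts (d - 1) gaps between neighbouring kernel taps,
    giving k + (k - 1) * (d - 1) positions from the first tap to the last.
    """
    if kernel_size < 1 or dilation < 1:
        raise ValueError("kernel_size and dilation must be positive integers")
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Illustrative dilation rates only; not taken from the paper.
for rate in (1, 2, 4):
    span = effective_kernel_size(3, rate)
    print(f"3x3 kernel, dilation {rate}: covers a {span}x{span} window")
```

Running parallel branches with different rates and combining their outputs is one common way to mix fine and coarse context, though the abstract does not specify how the modified blocks are arranged.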

Keywords: Ophthalmology
