Search Results - Inner Mongolia University Library

3rd International Conference on Pattern Recognition and Artificial Intelligence, ICPRAI 2022

Authors: Kumar, Amish; Shivakumara, Palaiahnakote; Pal, Umapada. Affiliations: Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India; Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia

ISBN: (print) 9783031090363

Accurate multiple license plate detection without affecting speed, occlusion, low contrast and resolution, uneven illumination effect and poor quality is an open challenge. This study presents a new Robust Deep Model for Multiple License Plate Number Detection (RDMMLND). To cope with the above-mentioned challenges, the proposed work explores YOLOv5 for detecting vehicles irrespective of type to reduce background complexity in the images. For detected vehicle regions, we propose a new combination of Wavelet Decomposition and Phase Congruency Model (WD-PCM), which enhances the license plate number region such that the license plate number detection step fixes correct bounding boxes for each vehicle of the input images. The proposed model is tested on our own dataset containing video images and standard dataset of license plate number detection to show that the proposed model is useful and effective for multiple license plate number detection. Furthermore, the proposed method is tested on natural scene text datasets to show that the proposed method can be extended to address the challenges of natural scene text detection. © 2022, Springer Nature Switzerland AG.

Keywords: Wavelet decomposition

A New Transformer-Based Approach for Text Detection in Shaky and Non-shaky Day-Night Video

7th Asian Conference on Pattern Recognition, ACPR 2023

Authors: Halder, Arnab; Shivakumara, Palaiahnakote; Pal, Umapada; Lu, Tong; Blumenstein, Michael. Affiliations: Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India; Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia; Nanjing University, Nanjing, China; University of Technology Sydney, Sydney, Australia

ISBN: (electronic) 9783031476372

ISBN: (print) 9783031476365

Text detection in shaky and non-shaky videos is challenging because of variations caused by day and night videos. In addition, moving objects, vehicles, and humans in the video make the text detection problems more challenging in contrast to text detection in normal natural scene images. Motivated by the capacity of the transformer, we propose a new transformer-based approach for detecting text in both shaky and non-shaky day-night videos. To reduce the effect of object movement, poor quality, and other challenges mentioned above, the proposed work explores temporal frames for obtaining activation frames based on similarity and dissimilarity measures. For estimating similarity and dissimilarity, our method extracts luminance, contrast, and structural features. The activation frames are fed to the transformer which comprises an encoder, decoder, and feed-forward network for text detection in shaky and non-shaky day-night video. Since it is the first work, we create our own dataset for experimentation. To show the effectiveness of the proposed method, experiments are conducted on a standard dataset called the ICDAR-2015 video dataset. The results on our dataset and standard dataset show that the proposed model is superior to state-of-the-art methods in terms of recall, precision, and F-measure. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.

Keywords: Chemical activation

A New StyleGAN Latent Space Based Model for Image Style Transfer

27th International Conference on Pattern Recognition, ICPR 2024

Authors: Dey, Rakesh; Palaiahnakote, Shivakumara; Bhattacharya, Saumik; Chanda, Sukalpa; Pal, Umapada. Affiliations: Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, Baranagar, India; School of Science, Engineering and Environment, University of Salford, Salford, United Kingdom; Department of Electrical and Electronics Communication, IIT-Kharagpur, Kharagpur, India; Østfold University College, Halden, Norway

ISBN: (print) 9783031781940

Cross-domain image style transfer task is an attractive topic for several applications, such as image-to-image style transfer, text-to-image style transfer, artistic image generation, etc. In cross-domain image style transfer tasks (e.g., image-to-image style transfer, artistic image-to-image style transfer, text-to-image style transfer, etc.), training becomes cumbersome due to differences in data distribution across domains and complex model architectures. Unlike existing domain adaptation and domain-independent methods that focus on robust and sufficient feature extraction, this work focuses on disentangling the latent space through latent optimization. For this purpose here we propose a new idea of styled image generation from the latent space of StyleGAN which works well for image-to-image and text-to-image style transfer. We critically analyzed the low-dimensional latent structure and its effect on cross-domain image style transfer tasks and finally proposed a method along with a latent optimizing procedure to overcome the problem of style transfer. The experimental results on different standard datasets show that the proposed model is robust, effective, and generic compared to the state-of-the-art models. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Generative adversarial networks

Diving into the Depths of Spotting Text in Multi-Domain Noisy Scenes

IEEE International Conference on Robotics and Automation (ICRA)

Authors: Alloy Das; Sanket Biswas; Umapada Pal; Josep Lladós. Affiliations: Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India; Computer Science Department, Computer Vision Center, Universitat Autònoma de Barcelona, Barcelona, Spain

ISBN: (electronic) 9798350384574

ISBN: (print) 9798350384581

When used in a real-world noisy environment, the capacity to generalize to multiple domains is essential for any autonomous scene text spotting system. However, existing state-of-the-art methods employ pretraining and fine-tuning strategies on natural scene datasets, which do not exploit the feature interaction across other complex domains. In this work, we explore and investigate the problem of domain-agnostic scene text spotting, i.e., training a model on multi-domain source data such that it can directly generalize to target domains rather than being specialized for a specific domain or scenario. In this regard, we present the community a text spotting validation benchmark called Under-Water Text (UWT) for noisy underwater scenes to establish an important case study. Moreover, we also design an efficient super-resolution based end-to-end transformer baseline called DA-TextSpotter which achieves comparable or superior performance over existing text spotting architectures for both regular and arbitrary-shaped scene text spotting benchmarks in terms of both accuracy and model efficiency. The dataset, code and pre-trained models have been released in our Github.

Keywords: Training; Codes; Accuracy; Superresolution; Benchmark testing; Transformers; Data models

A New Lightweight Attention-Based Model for Emotion Recognition on Distorted Social Media Face Images

7th Asian Conference on Pattern Recognition, ACPR 2023

Authors: Roy, Ayush; Shivakumara, Palaiahnakote; Pal, Umapada; Gornale, Shivanand S.; Liu, Cheng-Lin. Affiliations: Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India; Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia; Department of Computer Science, Rani Channamma University, Belagavi, India; Institute of Automation of Chinese Academy of Sciences, Beijing, China

ISBN: (electronic) 9783031476372

ISBN: (print) 9783031476365

The recognition of human emotions remains a challenging task for social media images. This is because the distortions introduced by different social media platforms conflict with the minute changes in facial expression. This study presents a new model called the Global Spectral-Spatial Attention Network (GSSAN), which leverages both local and global information simultaneously. The proposed model comprises a shallow Convolutional Neural Network (CNN) with an MBResNext block, which integrates the features extracted from MobileNet, ResNet, and DenseNet for extracting local features. In addition, to strengthen the discriminating power of the features, GSSAN incorporates Fourier features, which provide essential cues for minute changes in the face images. To test the proposed model for emotion recognition using social media images, we conduct experiments on two widely used datasets: FER-2013 and AffectNet. The same benchmark datasets are uploaded and downloaded to create a distorted social media image dataset to test the proposed model. Experiments on the distorted social media image dataset show that the model surpasses the accuracy of SOTA models by 0.69% for FER-2013 and 0.51% for AffectNet social media datasets. The same inference can be drawn from the experiments on standard datasets. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.

Keywords: Face recognition

PulmoNetX: A Hybrid Vision Transformer Approach for Multi-scale Spatial Feature Reduction in Pneumonia Classification

27th International Conference on Pattern Recognition, ICPR 2024

Authors: Lasker, Asifuzzaman; Ghosh, Mridul; Obaidullah, Sk Md; Chakraborty, Chandan; Roy, Kaushik; Pal, Umapada. Affiliations: Department of Computer Science and Engineering, Aliah University, Kolkata 700160, India; Department of Computer Science, Shyampur Siddheswari Mahavidyalaya, Howrah 711312, India; Department of Computer Science and Engineering, National Institute of Technical Teachers’ Training and Research, Kolkata 700106, India; Department of Computer Science, West Bengal State University, Barasat 700126, India; Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India

ISBN: (print) 9783031781650

An innovative deep learning structure, PulmoNetX, integrates the capabilities of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) to enhance pneumonia detection in chest X-ray imagery. During preprocessing, images are normalized in size, converted to grayscale, and subjected to contrast amplification to emphasize essential features. PulmoNetX employs a hybrid methodology to capture both the local and global characteristics of images, leading to significant advancements in diagnosing different pneumonia types, such as COVID-19-induced, viral, and bacterial pneumonia. Comparative studies reveal that PulmoNetX surpasses leading Vision Transformer models in terms of precision, recall, F1-score, and overall accuracy, highlighting its advanced processing abilities and its promise as an effective diagnostic tool in X-ray lung disease detection. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Convolutional neural networks

DITS: A New Domain Independent Text Spotter

27th International Conference on Pattern Recognition, ICPR 2024

Authors: Purkayastha, Kunal; Sarkar, Shashwat; Shivakumara, Palaiahnakote; Pal, Umapada; Ghosal, Palash; Wu, Xiao-Jun. Affiliations: Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, Baranagar, India; School of Science, Engineering and Environment, University of Salford, Manchester, United Kingdom; Department of Information Technology, Sikkim Manipal Institute of Technology, Sikkim Manipal University, Sikkim, Gangtok, India; Jiangnan University, Wuxi, China

ISBN: (print) 9783031784941

Text spotting in diverse domains, such as drone-captured images, underwater scenes, and natural scene images, presents unique challenges due to variations in image quality, contrast, text appearance, background complexity, and external factors like water surface reflections and weather conditions. While most existing approaches focus on text spotting in natural scene images, we propose a Domain-Independent Text Spotter (DITS) that effectively handles multiple domains. We innovatively combine the Real-ESRGAN, developed for regular image enhancement, with the DeepSolo, developed for scene text spotting, in an end-to-end fashion for text detection and spotting on images of different domains. The key idea behind our approach is that improving image quality and text-spotting accuracy are complementary goals. Real-ESRGAN enhances image quality, making the text more discernible, while DeepSolo, a state-of-the-art text spotting model, accurately localizes and recognizes text in the enhanced images. We validate the superiority of our proposed model by evaluating it on datasets from drone, underwater, and scene domains (ICDAR 2015, CTW1500, and Total-Text). Furthermore, we demonstrate the domain independence of our model through cross-domain validation, where we train on one domain and test on others. Our dataset and code will be publicly available on GitHub. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Image quality

Compound attention embedded dual channel encoder-decoder for MS lesion segmentation from brain MRI

Multimedia Tools and Applications, 2024, pp. 1-33

Authors: Ghosal, Palash; Roy, Abhijit; Agarwal, Rohit; Purkayastha, Kunal; Sharma, Aaditya Lochan; Kumar, Amish. Affiliations: Department of Information Technology, Sikkim Manipal Institute of Technology, Sikkim Manipal University, Majhitar, Sikkim, Rangpo 737136, India; Department of Computer Science and Engineering, National Institute of Technology, West Bengal, Durgapur 713209, India; Department of Computer Science and Engineering, Sikkim Manipal Institute of Technology, Sikkim Manipal University, Majhitar, Sikkim, Rangpo 737136, India; Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, West Bengal, Kolkata 700108, India

Multiple Sclerosis (MS) lesions’ segmentation is difficult due to their variegated sizes, shapes, and intensity levels. Besides this, the class imbalance problem and the availability of limited annotated data samples obstruct the building of highly efficient deep learning-based models. Though researchers have made many attempts to design efficient deep learning-based models, the maximum Dice Coefficient achieved by their models is fairly below the acceptable level of 0.70. The possible reason may be due to the inability to capture sufficient local and global features of the lesions required for accurate segmentation. In this paper, we present a new deep-learning architecture based on compound attention for MS lesion segmentation from magnetic resonance images that handles the challenges of capturing the local and global variable features of the MS lesions. The proposed model is equipped with a dual-channel CNN encoder-decoder structure employing residual connections in one channel and residual channel and spatial attention in the other. The residual connections alleviate the vanishing gradient problem and pass the fine-grained information through the channels, which is crucial for pixel-wise prediction. The attention mechanism used in a channel helps to capture long-range dependencies. Thus, the complete model leverages rich global and local information through the two channels for lesion segmentation. The problem of data imbalance is handled by using the Focal Tversky loss function. Through rigorous evaluation using 3-fold cross-validation on the MICCAI 2016 challenge dataset, our model demonstrates superior performance, achieving a Dice Coefficient of 0.73, surpassing state-of-the-art models in both qualitative and quantitative assessments. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.

BYOLMED3D: SELF-SUPERVISED REPRESENTATION LEARNING OF MEDICAL VIDEOS USING GRADIENT ACCUMULATION ASSISTED 3D BYOL FRAMEWORK

arXiv, 2022

Authors: Manna, Siladittya; Dey, Rakesh; Chakraborty, Souvik. Affiliations: Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India; Computer Vision Researcher

Applications in medical image analysis suffer from an acute shortage of large volumes of data properly annotated by medical experts. Supervised learning algorithms require large volumes of balanced data to learn robust representations. Often supervised learning algorithms require various techniques to deal with imbalanced data. Self-supervised learning algorithms, on the other hand, are robust to imbalance in the data and are capable of learning robust representations. In this work, we train a 3D BYOL self-supervised model using the gradient accumulation technique to deal with the large number of samples in a batch generally required in a self-supervised algorithm. To the best of our knowledge, this work is one of the first of its kind in this domain. We compare the results obtained through our experiments in the downstream task of ACL Tear Injury detection with the contemporary self-supervised pre-training methods and also with ResNet3D-18 initialized with the Kinetics-400 pre-trained weights. From the downstream task experiments, it is evident that the proposed framework outperforms the existing baselines. © 2022, CC0.

Keywords: Supervised learning

Diving into the Depths of Spotting Text in Multi-Domain Noisy Scenes

arXiv, 2023

Authors: Das, Alloy; Biswas, Sanket; Pal, Umapada; Lladós, Josep. Affiliations: Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India; The Computer Vision Center, Computer Science Department, Universitat Autònoma de Barcelona, Barcelona, Spain

Keywords: Benchmarking
