Search Results - Inner Mongolia University Library

International Conference on Pattern Recognition

Authors: Lokesh Nandanwar; Palaiahnakote Shivakumara; Ramachandra Raghavendra; Tong Lu; Umapada Pal; Daniel Lopresti; Nor Badrul Anuar. Affiliations: Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia; Faculty of Information Technology and Electrical Engineering, IIK, NTNU, Norway; National Key Lab for Novel Software Technology, Nanjing University, Nanjing, China; Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India; Computer Science & Engineering, Lehigh University, Bethlehem, PA, USA

Methods developed for normal 2D text detection do not work well for text rendered with decorative effects, 3D effects, etc. This paper proposes a new method for classifying 2D and 3D natural scene text images so that an appropriate recognition method can be chosen based on the classification results for better performance. The proposed method explores local gradient differences to obtain candidate pixels, which represent a stroke. To study the spatial distribution of candidate pixels, we propose a measure, called COLD, which is denser for pixels toward the center of strokes and scattered for non-stroke pixels. This observation leads us to introduce mass features for extracting the regular spatial pattern of COLD, which indicates a 2D text image. The extracted features are fed into a Neural Network (NN) for classification. The proposed method is tested on (i) a new dataset introduced in this work, (ii) a second dataset assembled from standard natural scene datasets, and (iii) non-text image datasets, which contain objects rather than text. Experimental results of the proposed method on images with text and non-text show that the proposed method is independent of text. The proposed approach improves text detection and recognition performance significantly after classification.

Keywords: Three-dimensional displays; Image recognition; Graphical models; Text recognition; Artificial neural networks; Feature extraction; Standards

Exploring emotion features and fusion strategies for audio-video emotion recognition

arXiv, 2020

Authors: Zhou, Hengshun; Meng, Debin; Zhang, Yuanyuan; Peng, Xiaojiang; Du, Jun; Wang, Kai; Qiao, Yu. Affiliations: University of Science and Technology of China, China; ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China

Audio-video based emotion recognition aims to classify a given video into basic emotions. In this paper, we describe our approaches in EmotiW 2019, which mainly explore emotion features and feature fusion strategies for the audio and visual modalities. For emotion features, we explore audio features with both speech-spectrogram and Log Mel-spectrogram and evaluate several facial features with different CNN models and different emotion pretrained strategies. For fusion strategies, we explore intra-modal and cross-modal fusion methods, such as designing attention mechanisms to highlight important emotion features, and exploring feature concatenation and factorized bilinear pooling (FBP) for cross-modal feature fusion. With careful evaluation, we obtain 65.5% on the AFEW validation set and 62.48% on the test set and rank second in the challenge. © 2020, CC BY.

Keywords: Speech recognition

FD-GAN: Generative adversarial networks with fusion-discriminator for single image dehazing

arXiv, 2020

Authors: Dong, Yu; Liu, Yihao; Zhang, He; Chen, Shifeng; Qiao, Yu. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, China; Adobe Inc.

Recently, convolutional neural networks (CNNs) have achieved great improvements in single image dehazing and attracted much attention in research. Most existing learning-based dehazing methods are not fully end-to-end and still follow the traditional dehazing procedure: first estimate the medium transmission and the atmospheric light, then recover the haze-free image based on the atmospheric scattering model. However, in practice, due to a lack of priors and constraints, it is hard to precisely estimate these intermediate parameters. Inaccurate estimation further degrades the performance of dehazing, resulting in artifacts, color distortion and insufficient haze removal. To address this, we propose a fully end-to-end Generative Adversarial Networks with Fusion-discriminator (FD-GAN) for image dehazing. With the proposed Fusion-discriminator, which takes frequency information as additional priors, our model can generate more natural and realistic dehazed images with less color distortion and fewer artifacts. Moreover, we synthesize a large-scale training dataset including various indoor and outdoor hazy images to boost the performance, and we reveal that for learning-based dehazing methods, the performance is strictly influenced by the training data. Experiments have shown that our method reaches state-of-the-art performance on both public synthetic datasets and real-world images with more visually pleasing dehazed results. Copyright © 2020, The Authors. All rights reserved.

Keywords: Image enhancement

Partial Differential Equations is All You Need for Generating Neural Architectures - A Theory for Physical Artificial Intelligence Systems

arXiv, 2021

Authors: Guo, Ping; Huang, Kaizhu; Xu, Zenglin. Affiliations: Image Processing & Pattern Recognition Lab., Beijing Normal University, Beijing 100875, China; Data Science Research Center, Duke Kunshan University, Jiangsu, Kunshan 215316, China; School of Computer Science and Technology, Harbin Institute of Technology at ShenZhen, Peng Cheng National Lab, Guangdong, Shenzhen 510855, China

In this work, we generalize the reaction-diffusion equation in statistical physics, the Schrödinger equation in quantum mechanics, and the Helmholtz equation in paraxial optics into the neural partial differential equations (NPDE), which can be considered as the fundamental equations in the field of artificial intelligence research. We use the finite difference method to discretize NPDEs and find the numerical solution. Moreover, the basic building blocks of deep neural network architectures, including multi-layer perceptrons, convolutional neural networks, and recurrent neural networks, are generated. The learning strategies, such as Adaptive moment estimation, L-BFGS, pseudoinverse learning algorithms, and partial differential equation constrained optimization, are also developed. This work is significant in that it presents a clear physical view of interpretable deep neural networks, has strong potential for application to analog computing device design, and paves the way to physical artificial intelligence. © 2021, CC BY-NC-SA.

Keywords: Schrodinger equation
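
The abstract above mentions discretizing the equations with the finite difference method. As a rough illustration of that general technique only (this is not the NPDE formulation from the paper), the sketch below performs one explicit forward-Euler step of the plain 1D diffusion equation u_t = d * u_xx on a periodic grid. The function name `diffusion_step`, the periodic boundary, and the toy grid are assumptions made for this example.

```python
def diffusion_step(u, d, dt, dx):
    """One explicit (forward Euler) finite-difference step of u_t = d * u_xx.

    u  : list of values on a uniform 1D grid (periodic boundary assumed)
    d  : diffusion coefficient
    dt : time step
    dx : grid spacing
    """
    n = len(u)
    r = d * dt / dx ** 2  # the explicit scheme is stable for r <= 0.5
    return [
        # central second difference: u[i-1] - 2*u[i] + u[i+1]
        u[i] + r * (u[(i - 1) % n] - 2 * u[i] + u[(i + 1) % n])
        for i in range(n)
    ]


# Hypothetical toy input: a single spike that spreads to its neighbours.
u0 = [0.0, 0.0, 1.0, 0.0, 0.0]
u1 = diffusion_step(u0, d=1.0, dt=0.25, dx=1.0)
print(u1)
```

Each new grid value is a fixed weighted sum of its neighbours, which is the same local, shared-weight pattern as a 1D convolution with a three-tap kernel.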

Non-deterministic Behavior of Ranking-Based Metrics When Evaluating Embeddings

2nd International Workshop on Reproducible Research in Pattern Recognition, RRPR 2018

Authors: Nicolaou, Anguelos; Dey, Sounak; Christlein, Vincent; Maier, Andreas; Karatzas, Dimosthenis. Affiliations: Computer Vision Center, Edificio O, Campus UAB, Bellaterra 08193, Spain; Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany

ISBN: (print) 9783030239862

Embedding data into vector spaces is a very popular strategy of pattern recognition methods. When distances between embeddings are quantized, performance metrics become ambiguous. In this paper, we present an analysis of the ambiguity quantized distances introduce and provide bounds on the effect. We demonstrate that it can have a measurable effect in empirical data in state-of-the-art systems. We also approach the phenomenon from a computer security perspective and demonstrate how someone being evaluated by a third party can exploit this ambiguity and greatly outperform a random predictor without even access to the input data. We also suggest a simple solution making the performance metrics, which rely on ranking, totally deterministic and impervious to such exploits. © Springer Nature Switzerland AG 2019.

Keywords: Embeddings
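
The ambiguity described in the abstract above can be seen on a toy example. The sketch below is an illustration written for this listing, not the authors' analysis or their proposed fix: it enumerates every ordering that is consistent with a list of quantized distances and reports the lowest and highest average precision (AP) those orderings can produce. The function names and the toy data are assumptions, and the brute-force enumeration is only practical for very short lists.

```python
from itertools import permutations


def average_precision(relevance):
    """AP of a ranked list of 0/1 relevance flags (best rank first)."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def ap_range_under_ties(distances, relevance):
    """Min and max AP over all rankings that sort the distances ascending.

    Tied distances may be ordered arbitrarily, so several rankings
    (and therefore several AP values) are equally valid.
    """
    items = list(zip(distances, relevance))
    scores = set()
    for perm in permutations(items):
        ds = [d for d, _ in perm]
        if ds == sorted(ds):  # keep only orderings consistent with the distances
            scores.add(average_precision([r for _, r in perm]))
    return min(scores), max(scores)


# Hypothetical toy data: a three-way tie at distance 1, one tied item relevant.
distances = [0, 1, 1, 1, 2]
relevance = [1, 0, 1, 0, 0]
print(ap_range_under_ties(distances, relevance))
```

On this toy input the same set of distances supports AP values ranging from 0.75 to 1.0, depending only on how the sort happens to break the tie.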

A Survey of the Self Supervised Learning Mechanisms for Vision Transformers

arXiv, 2024

Authors: Khan, Asifullah; Sohail, Anabia; Fiaz, Mustansar; Hassan, Mehdi; Afridi, Tariq Habib; Marwat, Sibghat Ullah; Munir, Farzeen; Ali, Safdar; Naseem, Hannan; Zaheer, Muhammad Zaigham; Ali, Kamran; Sultana, Tangina; Tanoli, Ziaurrehman; Akhter, Naeem. Affiliations: Pattern Recognition Lab, DCIS, PIEAS, Nilore, Islamabad 45650, Pakistan; PIEAS, Nilore, Islamabad 45650, Pakistan; Deep Learning Lab, Center for Mathematical Sciences, PIEAS, Nilore, Islamabad 45650, Pakistan; Center of Secure Cyber-Physical Security Systems, Khalifa University, Abu Dhabi, United Arab Emirates; IBM Research, United States; Department of Computer Science, Air University, Islamabad, Pakistan; Department of Computer Science and Engineering, Kyung Hee University, Global Campus, 1732 Gyeonggi-do, Yongin 17104, Korea, Republic of; Department of Electrical Engineering and Automation, Aalto University, Finland; Finnish Center of Artificial Center, Finland; Faculty of Engineering and Green Technology, Universiti Tunku Abdul Rahman, Malaysia; Computer Vision Department, Mohamed Bin Zayed University of Artificial Intelligence, United Arab Emirates; Karachi, Pakistan; Department of Electronics and Communication Engineering, Hajee Mohammad Danesh Science and Technology University, Bangladesh; HiLIFE, University of Helsinki, Finland

Vision Transformers (ViTs) have recently demonstrated remarkable performance in computer vision tasks. However, their parameter-intensive nature and reliance on large amounts of data for effective performance have shifted the focus from traditional human-annotated labels to unsupervised learning and pretraining strategies that uncover hidden structures within the data. In response to this challenge, self-supervised learning (SSL) has emerged as a promising paradigm. SSL leverages inherent relationships within the data itself as a form of supervision, eliminating the need for manual labeling and offering a more scalable and resource-efficient alternative for model training. Given these advantages, it is imperative to explore the integration of SSL techniques with ViTs, particularly in scenarios with limited labeled data. Inspired by this evolving trend, this survey aims to systematically review SSL mechanisms tailored for ViTs. We propose a comprehensive taxonomy to classify SSL techniques based on their representations and pre-training tasks. Additionally, we discuss the motivations behind SSL, review prominent pre-training tasks, and highlight advancements and challenges in this field. Furthermore, we conduct a comparative analysis of various SSL methods designed for ViTs, evaluating their strengths, limitations, and applicability to different scenarios. Copyright © 2024, The Authors. All rights reserved.

Keywords: Self-supervised learning

Exploring Multi-Scale Feature Propagation and Communication for Image Super Resolution

arXiv, 2020

Authors: Feng, Ruicheng; Guan, Weipeng; Qiao, Yu; Dong, Chao. Affiliations: Shenzhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China; Chinese University of Hong Kong, Hong Kong

Multi-scale techniques have achieved great success in a wide range of computer vision tasks. However, while these techniques are incorporated in existing works, a comprehensive investigation of the variants of multi-scale convolution in image super resolution is still lacking. In this work, we present a unified formulation over widely-used multi-scale structures. With this framework, we systematically explore the two factors of multi-scale convolution: feature propagation and cross-scale communication. Based on the investigation, we propose a generic and efficient multi-scale convolution unit: Multi-Scale cross-Scale Share-weights convolution (MS3-Conv). Extensive experiments demonstrate that the proposed MS3-Conv can achieve better SR performance than the standard convolution with fewer parameters and lower computational cost. Beyond quantitative analysis, we comprehensively study the visual quality, which shows that MS3-Conv recovers high-frequency details better. Copyright © 2020, The Authors. All rights reserved.

Keywords: Optical resolving power

A New U-Net Based License Plate Enhancement Model in Night and Day Images

5th Asian Conference on Pattern Recognition, ACPR 2019

Authors: Chowdhury, Pinaki Nath; Shivakumara, Palaiahnakote; Raghavendra, Ramachandra; Pal, Umapada; Lu, Tong; Blumenstein, Michael. Affiliations: Computer Vision and Pattern Recognition Unit, Indian Statistical Institute Kolkata, Kolkata, India; Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia; Faculty of Information Technology and Electrical Engineering, IIK, NTNU, Gjøvik, Norway; National Key Lab for Novel Software Technology, Nanjing University, Nanjing, China; Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, Australia

ISBN: (print) 9783030414030

A new trend of smart city development opens up many challenges. One such issue is automatic vehicle driving and detection for toll fee payment in night or limited-light environments. This paper presents a new work for enhancing license plates captured in limited or low light conditions such that license plate detection methods can be extended to images captured at night. Due to the popularity of Convolutional Neural Networks (CNNs) in solving complex issues, we explore U-Net-CNN for enhancing the contrast of license plate pixels. Since the difference between pixels that represent license plates and pixels that represent background is too small due to the low-light effect, we use the special property of U-Net, which extracts the context and symmetry of license plate pixels, to separate them from background pixels irrespective of content. This process results in image enhancement. To validate the enhancement results, we apply text detection methods and assess the proposed system based on their detection results. Experimental results on our newly constructed dataset, which includes images captured in night/low light/limited light conditions, and the benchmark dataset, namely UCSD, which includes very poor quality and high quality images captured in the day, show that the proposed method outperforms the existing methods. In addition, the results on text detection by different methods show that the proposed enhancement is effective and robust for license plate detection. © Springer Nature Switzerland AG 2020.

Keywords: Pixels

Structure Function Based Transform Features for Behavior-Oriented Social Media Image Classification

5th Asian Conference on Pattern Recognition, ACPR 2019

Authors: Krishnani, Divya; Shivakumara, Palaiahnakote; Lu, Tong; Pal, Umapada; Ramachandra, Raghavendra. Affiliations: International Institute of Information Technology, Naya Raipur, Naya Raipur, Chhattisgarh, India; Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia; National Key Lab for Novel Software Technology, Nanjing University, Nanjing, China; Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India; Faculty of Information Technology and Electrical Engineering, Norwegian University of Science and Technology, Trondheim, Norway

ISBN: (print) 9783030414030

Social media has become an essential part of people's lives for reflecting their day-to-day activities, including emotions, feelings, threats and so on. This paper presents a new method for the automatic classification of behavior-oriented images such as Bullying, Threatening, Neuroticism-Depression, Neuroticism-Sarcastic, Psychopath and Extraversion of a person from social media images. The proposed method first finds facial key points for extracting features based on a face detection algorithm. Then the proposed method labels face regions as foreground and all other regions as background to define context between foreground and background information. To extract context, the proposed method explores Structural Function based Transform (SFBT) features, which study variations in pixel values. To increase the discriminating power of the context features, the proposed method performs clustering to integrate the strength of the features. The extracted features are then fed to Support Vector Machines (SVM) for classification. Experimental results on a dataset of six classes show that the proposed method outperforms the existing methods in terms of confusion matrix and classification rate. © Springer Nature Switzerland AG 2020.

Keywords: Classification (of information)

AngularGrad: A New Optimization Technique for Angular Convergence of Convolutional Neural Networks

arXiv, 2021

Authors: Roy, Swalpa Kumar; Paoletti, Mercedes E.; Haut, Juan M.; Dubey, Shiv Ram; Kar, Purbayan; Plaza, Antonio; Chaudhuri, Bidyut B. Affiliations: The Computer Science and Engineering, Alipurduar Government Engineering and Management College, 736206, India; The Hyperspectral Computing Laboratory, Department of Technology of Computers and Communications, University of Extremadura, Cáceres 10003, Spain; The Computer Vision and Biometrics Lab, Indian Institute of Information Technology, Prayagraj, Uttar Pradesh, Allahabad 211015, India; The Media Analysis Group, Sony Research India Private Limited, Karnataka, Bangalore 560103, India; The Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata 700108, India

Convolutional neural networks (CNNs) are trained using stochastic gradient descent (SGD)-based optimizers. Recently, the adaptive moment estimation (Adam) optimizer has become very popular due to its adaptive momentum, which tackles the dying gradient problem of SGD. Nevertheless, existing optimizers are still unable to exploit the optimization curvature information efficiently. This paper proposes a new AngularGrad optimizer that considers the behavior of the direction/angle of consecutive gradients. This is the first attempt in the literature to exploit the gradient angular information apart from its magnitude. The proposed AngularGrad generates a score to control the step size based on the gradient angular information of previous iterations. Thus, the optimization steps become smoother as a more accurate step size of immediate past gradients is captured through the angular information. Two variants of AngularGrad are developed based on the use of Tangent or Cosine functions for computing the gradient angular information. Theoretically, AngularGrad exhibits the same regret bound as Adam for convergence purposes. Nevertheless, extensive experiments conducted on benchmark data sets against state-of-the-art methods reveal a superior performance of AngularGrad. Source code: https://***/mhaut/AngularGrad. Copyright © 2021, The Authors. All rights reserved.

Keywords: Optimization
