Search Results - Inner Mongolia University Library

2019 IEEE International Conference on Robotics and Biomimetics, ROBIO 2019

Authors: Chen, Xiaolong Zhang, Zhengfu Qiao, Yu Zhang, Pu Guo, Lanqing Chen, Wenrui Chen, Chen Fu, Bin Guangzhou Power Supply Bureau Co. Ltd. Guangzhou China Shenzhen Institutes of Advanced Technology Chinese Academy of Sciences ShenZhen Key Lab of Computer Vision and Pattern Recognition SIAT-SenseTime Joint Lab China SIAT Branch Shenzhen Institute of Artificial Intelligence and Robotics for Society China

ISBN: (print) 9781728163215

In this paper, we introduce the Equipment Nameplate Dataset, a large dataset for scene text detection and recognition. The natural images in this dataset are taken in the wild, so the dataset includes various intra-class inconsistencies, such as poor illumination and partial occlusion, which make it more challenging than other datasets. So that detection and recognition models can be trained separately, we annotate not only word instances but also text regions, using rectangular bounding boxes. We provide detailed statistics about the dataset so that others can use them to analyse and develop their own models. Moreover, we test several well-known detection and recognition models on our dataset and present the corresponding results so that researchers can compare them with their own models. The dataset will be made publicly available on the website. © 2019 IEEE.

Keywords: computer vision

Orientation robust scene text recognition in natural scene

2019 IEEE International Conference on Robotics and Biomimetics, ROBIO 2019

Authors: Chen, Xiaolong Zhang, Zhengfu Qiao, Yu Lai, Jiangyu Jiang, Jian Zhang, Zeyu Fu, Bin Guangzhou Power Supply Bureau Co. Ltd. Guangzhou China ShenZhen Key Lab of Computer Vision and Pattern Recognition SIAT-SenseTime Joint Lab Shenzhen Institutes of Advanced Technology Chinese Academy of Sciences China SIAT Branch Shenzhen Institute of Artificial Intelligence and Robotics for Society China

ISBN: (print) 9781728163215

In recent years, scene text recognition has improved significantly and various state-of-the-art recognition approaches have been proposed. This paper focuses on recognizing text in natural photos of equipment nameplates, which has wide applications in industrial automation. This task has received little attention in previous work. The challenge comes from multi-oriented, curved, noisy and blurry text patches in equipment nameplates. To address this problem, we propose a deep model for text recognition in multi-oriented nameplates, namely Orientation Robust Scene Text Recognition (ORSTR). Specifically, our model employs a carefully designed rectification module to transform curved, distorted or multi-oriented text into near-horizontal text. Once the near-horizontal text has been generated, the recognition network outputs the predictions for the text patches. Our model achieves 90.8% recognition accuracy on the equipment nameplate dataset, outperforming a previous scene text recognition model (CRNN) by about 0.8%. Extensive experiments have been conducted to verify the effectiveness of our model. © 2019 IEEE.

Keywords: Nameplates

Multi-dimension modulation for image restoration with dynamic controllable residual Learning

arXiv, 2019

Authors: He, Jingwen Dong, Chao Qiao, Yu ShenZhen Key Lab of Computer Vision and Pattern Recognition SIAT-SenseTime Joint Lab Shenzhen Institutes of Advanced Technology Chinese Academy of Sciences China

Building on the great success of deterministic learning, interactive control of the output effects has attracted increasing attention in the image restoration field. The goal is to generate continuous restored images by adjusting a controlling coefficient. Existing methods are limited to a smooth transition between two objectives, while real input images may contain different kinds of degradations. To make a step forward, we present a new problem called multi-dimension (MD) modulation, which aims at modulating output effects across multiple degradation types and levels. Compared with the previous single-dimension (SD) modulation, the MD task has three distinct properties, namely joint modulation, zero starting point and unbalanced learning. These obstacles motivate us to propose the first MD modulation framework, CResMD, with newly introduced controllable residual connections. Specifically, we add a controlling variable on the conventional residual connection to allow a weighted summation of input and residual. The exact values of these weights are generated by a condition network. We further propose a new data sampling strategy based on the beta distribution to balance different degradation types and levels. With the corrupted image and the degradation information as inputs, the network outputs the corresponding restored image. By tweaking the condition vector, users are free to control the output effects in MD space at test time. Extensive experiments demonstrate that the proposed CResMD achieves excellent performance on both SD and MD modulation tasks. Copyright © 2019, The Authors. All rights reserved.
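The weighted summation of input and residual described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the residual branch is a toy linear map with ReLU, and `residual_branch`, `condition_network`, `run_blocks` and the projection `proj` are hypothetical names and shapes chosen only for illustration.

```python
import numpy as np

def residual_branch(x, weight):
    """Stand-in for a learned residual function F(x): one linear map plus ReLU."""
    return np.maximum(weight @ x, 0.0)

def controllable_residual(x, weight, alpha):
    """Weighted summation of input and residual: y = x + alpha * F(x)."""
    return x + alpha * residual_branch(x, weight)

def condition_network(condition, proj):
    """Hypothetical condition network: maps the degradation vector to one alpha per block."""
    return proj @ condition

def run_blocks(x, weight, alphas):
    """Apply one controllable residual block per entry in alphas."""
    y = x
    for alpha in alphas:
        y = controllable_residual(y, weight, alpha)
    return y

rng = np.random.default_rng(0)
x = rng.normal(size=4)                # toy input features
weight = rng.normal(size=(4, 4))      # toy residual weights (shared by both blocks)
proj = np.eye(2)                      # assumed: two blocks, two degradation dimensions

restored = run_blocks(x, weight, condition_network(np.array([0.3, 0.8]), proj))
unchanged = run_blocks(x, weight, condition_network(np.zeros(2), proj))
```

In this sketch, setting every weight to zero turns each block into the identity, so the input passes through unchanged, while a weight of one recovers the conventional residual connection.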

Keywords: Modulation

A New Forged Handwriting Detection Method Based on Fourier Spectral Density and Variation

5th Asian Conference on Pattern Recognition, ACPR 2019

Authors: Kundu, Sayani Shivakumara, Palaiahnakote Grouver, Anaica Pal, Umapada Lu, Tong Blumenstein, Michael Computer Vision and Pattern Recognition Unit Indian Statistical Institute Kolkata Kolkata India Faculty of Computer Science and Information Technology University of Malaya Kuala Lumpur Malaysia National Key Lab for Novel Software Technology Nanjing University Nanjing China Faculty of Engineering and Information Technology University of Technology Sydney Ultimo Australia

ISBN: (print) 9783030414030

The use of handwritten words for person identification, in contrast to biometric features, is gaining importance in forensic applications. As a result, forged handwriting has become a part of criminal activity, and detecting it is challenging for researchers. This paper presents a new method for detecting forged handwritten words, motivated by the observation that the width and amplitude of spectral distributions exhibit unique properties for forged handwritten words compared to blurred, noisy and normal handwritten words. The proposed method studies the spectral density and variation of input handwriting images through clustering of high and low frequency coefficients. The extracted features, which are invariant to rotation and scaling, are passed to a neural network classifier to separate forged handwritten words from other types of handwritten words (blurred, noisy and normal). Experimental results on our own dataset, which consists of four handwritten word classes, and on two benchmark datasets, namely the caption and scene text classification dataset and the forged IMEI number dataset, show that the proposed method outperforms the existing methods in terms of classification rate. © Springer Nature Switzerland AG 2020.

Keywords: Spectral density

A Spatial Density and Phase Angle Based Correlation for Multi-type Family Photo Identification

5th Asian Conference on Pattern Recognition, ACPR 2019

Authors: Grouver, Anaica Shivakumara, Palaiahnakote Kaljahi, Maryam Asadzadeh Chetty, Bhaarat Pal, Umapada Lu, Tong Hemantha Kumar, G. Faculty of Computer Science and Information Technology University of Malaya Kuala Lumpur Malaysia Google Developers Group NASDAQ Bangalore India Computer Vision and Pattern Recognition Unit Indian Statistical Institute Kolkata India National Key Lab for Novel Software Technology Nanjing University Nanjing China University of Mysore Mysore Karnataka India

ISBN: (print) 9783030412982

Due to changes in people's mindset and lifestyle, the number of diversified marriages is increasing all around the world, irrespective of race, color, religion and culture. As a result, it is challenging for the research community to distinguish multi-type family photos, namely normal family (family of the same race, religion or culture) and multi-culture family (family of different culture, religion or race), from one another and from non-family photos (images with friends, colleagues, etc.). In this work, we present a new method that combines spatial density information with phase angle for multi-type family photo classification. The proposed method uses three facial key points, namely left-eye, right-eye and nose, for the features, which are based on color, roughness and wrinkleless of faces and are prominent for extracting unique cues for classification. The correlations between features of Left & Right Eyes, Left Eye & Nose and Right Eye & Nose are computed for all the faces in an image. This results in feature vectors for the respective spatial density and phase angle information. Furthermore, the proposed method fuses the feature vectors and feeds them to a Convolutional Neural Network (CNN) for classification of the above three-class problem. Experiments conducted on our database, which contains three classes, namely multi-cultural, normal and non-family images, and on the benchmark databases (due to Maryam et al. and Wang et al.), which contain two classes, family and non-family images, show that the proposed method outperforms the existing methods in terms of classification rate on all three databases. © 2020, Springer Nature Switzerland AG.

Keywords: Database systems

Neural Transformation Fields for Arbitrary-Styled Font Generation

Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Bin Fu Junjun He Jianjun Wang Yu Qiao ShenZhen Key Lab of Computer Vision and Pattern Recognition Shenzhen Institute of Advanced Technology Chinese Academy of Sciences Shanghai Artificial Intelligence Laboratory

Few-shot font generation (FFG), which aims at generating font images from a few samples, has become an emerging topic in recent years due to its academic and commercial value. Typically, FFG approaches follow the style-content disentanglement paradigm, which transfers the target font styles to characters by combining the content representations of source characters and the style codes of reference samples. Most existing methods attempt to increase font generation ability by exploring powerful style representations, which may be a sub-optimal solution for the FFG task because they do not model spatial transformation when transferring font styles. In this paper, we model font generation as a continuous transformation process from the source character image to the target font image via the creation and dissipation of font pixels, and embed the corresponding transformations into a neural transformation field. With the estimated transformation path, the neural transformation field generates a set of intermediate transformation results via the sampling process, and a font rendering formula is developed to accumulate them into the target font image. Extensive experiments show that our method achieves state-of-the-art performance on the few-shot font generation task, which demonstrates the effectiveness of our proposed model. Our implementation is available at: https://***/fubinfb/NTF.

Complex 3D General Object Reconstruction from Line Drawings

International Conference on Computer Vision (ICCV)

Authors: Linjie Yang Jianzhuang Liu Xiaoou Tang Department of Information Engineering Chinese University of Hong Kong Shenzhen Key Lab of Computer Vision and Pattern Recognition Chinese Academy of Sciences China

ISBN: (print) 9781479928415

An important topic in computer vision is 3D object reconstruction from line drawings. Previous algorithms either deal with simple general objects or are limited to only manifolds (a subset of solids). In this paper, we propose a novel approach to 3D reconstruction of complex general objects, including manifolds, non-manifold solids, and nonsolids. Through developing some 3D object properties, we use the degree of freedom of objects to decompose a complex line drawing into multiple simpler line drawings that represent meaningful building blocks of a complex object. After 3D objects are reconstructed from the decomposed line drawings, they are merged to form a complex object from their touching faces, edges, and vertices. Our experiments show a number of reconstruction examples from both complex line drawings and images with line drawings superimposed. Comparisons are also given to indicate that our algorithm can deal with much more complex line drawings of general objects than previous algorithms.

Keywords: Solids Three-dimensional displays Manifolds Image reconstruction computer vision Search problems Image edge detection

Finding discriminative filters for specific degradations in blind super-resolution

Proceedings of the 35th International Conference on Neural Information Processing Systems

Authors: Liangbin Xie Xintao Wang Chao Dong Zhongang Qi Ying Shan Shenzhen Key Lab of Computer Vision and Pattern Recognition Shenzhen Institute of Advanced Technology Chinese Academy of Sciences and University of Chinese Academy of Sciences and ARC Lab Tencent PCG ARC Lab Tencent PCG Shenzhen Key Lab of Computer Vision and Pattern Recognition Shenzhen Institute of Advanced Technology Chinese Academy of Sciences and Shanghai AI Laboratory Shanghai China

ISBN: (print) 9781713845393

Recent blind super-resolution (SR) methods typically consist of two branches, one for degradation prediction and the other for conditional restoration. However, our experiments show that a one-branch network can achieve comparable performance to the two-branch scheme. This raises the question: how can one-branch networks automatically learn to distinguish degradations? To find the answer, we propose a new diagnostic tool, the Filter Attribution method based on Integral Gradient (FAIG). Unlike previous integral gradient methods, FAIG aims at finding the most discriminative filters, instead of input pixels/features, for degradation removal in blind SR networks. With the discovered filters, we further develop a simple yet effective method to predict the degradation of an input image. Based on FAIG, we show that, in one-branch blind SR networks: 1) we are able to find a very small number (1%) of discriminative filters for each specific degradation; 2) the weights, locations and connections of the discovered filters are all important in determining the specific network function; 3) the task of degradation prediction can be implicitly realized by these discriminative filters without explicit supervised learning. Our findings not only help us better understand network behaviors inside one-branch blind SR networks, but also provide guidance on designing more efficient architectures and diagnosing networks for blind SR.
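The idea of attributing a change in network behavior to parameters rather than input pixels can be illustrated with a toy integrated-gradients computation over a parameter path. This is a sketch under strong assumptions, not the FAIG procedure itself: a three-parameter linear model stands in for the SR network, each parameter stands in for a filter, the path between the two parameter sets is a straight line, and ranking by absolute attribution is an assumption. All names and values are hypothetical.

```python
import numpy as np

def loss(theta, x, y):
    """Squared error of a linear model; stands in for a reconstruction loss."""
    return float((theta @ x - y) ** 2)

def grad_loss(theta, x, y):
    """Analytic gradient of the loss with respect to the parameters."""
    return 2.0 * (theta @ x - y) * x

def parameter_attribution(theta_base, theta_target, x, y, steps=200):
    """Integrated gradients along the straight path from theta_base to theta_target."""
    alphas = (np.arange(steps) + 0.5) / steps          # midpoint rule on [0, 1]
    delta = theta_target - theta_base
    grads = np.array([grad_loss(theta_base + a * delta, x, y) for a in alphas])
    return delta * grads.mean(axis=0)                  # one score per parameter

theta_base = np.array([0.5, -1.0, 2.0])     # toy "baseline" parameters
theta_target = np.array([0.5, 1.0, 2.5])    # toy "target" parameters
x = np.array([1.0, 2.0, 0.1])               # toy input
y = 3.0                                     # toy ground truth

attr = parameter_attribution(theta_base, theta_target, x, y)
ranking = np.argsort(-np.abs(attr))         # most influential parameters first
```

The attributions sum to the difference in loss between the two parameter sets, which is the completeness property of integrated gradients; a parameter that does not change between the two sets receives zero attribution.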

A New DCT-FFT Fusion Based Method for Caption and Scene Text Classification in Action Video Images

2nd International Conference on Pattern Recognition and Artificial Intelligence, ICPRAI 2020

Authors: Nandanwar, Lokesh Shivakumara, Palaiahnakote Manna, Suvojit Pal, Umapada Lu, Tong Blumenstein, Michael Faculty of Computer Science and Information Technology University of Malaya Kuala Lumpur Malaysia Department of Computer Science and Engineering Jalpaiguri Government Engineering College Jalpaiguri India Computer Vision and Pattern Recognition Unit Indian Statistical Institute Kolkata India National Key Lab for Novel Software Technology Nanjing University Nanjing China University of Technology Sydney Ultimo Australia

ISBN: (print) 9783030598297

Achieving a better recognition rate for text in video action images is challenging due to multi-type texts with unpredictable backgrounds. We propose a new method for the classification of captions (which are edited text) and scene texts (which are part of an image) in video images of the Yoga, Concert, Teleshopping, Craft, and Recipe classes. The proposed method introduces a new fusion criterion based on DCT and Fourier coefficients to extract features that represent the good clarity and visibility of captions and so separate them from scene texts. The variances of the coefficients of corresponding pixels in the DCT and Fourier images are computed to derive the respective weights. The weights and coefficients are then used to generate a fused image. Furthermore, the proposed method estimates sparsity in the Canny edge image of each fused image to derive rules for classifying caption and scene texts. Lastly, the proposed method is evaluated on images of the five above-mentioned action image classes to validate the derived rules. Comparative studies with state-of-the-art methods on standard databases show that the proposed method outperforms the existing methods in terms of classification. The recognition experiments before and after classification show that the recognition rate improves significantly after classification. © 2020, Springer Nature Switzerland AG.

Keywords: Classification (of information)

DF2Net: A Dense-Fine-Finer Network for Detailed 3D Face Reconstruction

International Conference on Computer Vision (ICCV)

Authors: Xiaoxing Zeng Xiaojiang Peng Yu Qiao ShenZhen Key Lab of Computer Vision and Pattern Recognition SIAT-SenseTime Joint Lab Shenzhen Institutes of Advanced Technology University of Chinese Academy of Sciences China

ISBN: (digital) 9781728148038

ISBN: (print) 9781728148045

Reconstructing the detailed geometric structure from a single face image is a challenging problem due to its ill-posed nature and the fine 3D structures to be recovered. This paper proposes a deep Dense-Fine-Finer Network (DF2Net) to address this challenging problem. DF2Net decomposes the reconstruction process into three stages, each of which is processed by an elaborately-designed network, namely D-Net, F-Net, and Fr-Net. D-Net exploits a U-net architecture to map the input image to a dense depth image. F-Net refines the output of D-Net by integrating features from depth and RGB domains, whose output is further enhanced by Fr-Net with a novel multi-resolution hypercolumn architecture. In addition, we introduce three types of data to train these networks, including 3D model synthetic data, 2D image reconstructed data, and fine facial images. We elaborately exploit different datasets (or combination) together with well-designed losses to train different networks. Qualitative evaluation indicates that our DF2Net can effectively reconstruct subtle facial details such as small crow's feet and wrinkles. Our DF2Net achieves performance superior or comparable to state-of-the-art algorithms in qualitative and quantitative analyses on real-world images and the BU-3DFE dataset. Code and the collected 70K image-depth data will be publicly available.

Keywords: Three-dimensional displays Face Image reconstruction Shape Solid modeling Two dimensional displays Training data
