Search Results - Inner Mongolia University Library

The equipment nameplate dataset for scene text detection and recognition

2019 IEEE International Conference on Robotics and Biomimetics, ROBIO 2019

Authors: Chen, Xiaolong; Zhang, Zhengfu; Qiao, Yu; Zhang, Pu; Guo, Lanqing; Chen, Wenrui; Chen, Chen; Fu, Bin. Affiliations: Guangzhou Power Supply Bureau Co. Ltd., Guangzhou, China; Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, China; SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society, China

ISBN: (print) 9781728163215

In this paper, we introduce the Equipment Nameplate Dataset, a large dataset for scene text detection and recognition. The natural images in this dataset are taken in the wild, so the dataset includes various intra-class inconsistencies, such as poor illumination and partial occlusion, which make it more challenging than other datasets. So that detection and recognition models can be trained separately, we annotate both word instances and text regions with rectangular bounding boxes. We provide detailed statistics about the dataset so that others can use them to analyse and develop their own models. Moreover, we evaluate several well-known detection and recognition models on our dataset and present the results so that researchers can compare them with their own models. The dataset will be publicly available on the website. © 2019 IEEE.

Keywords: computer vision

Orientation robust scene text recognition in natural scene

2019 IEEE International Conference on Robotics and Biomimetics, ROBIO 2019

Authors: Chen, Xiaolong; Zhang, Zhengfu; Qiao, Yu; Lai, Jiangyu; Jiang, Jian; Zhang, Zeyu; Fu, Bin. Affiliations: Guangzhou Power Supply Bureau Co. Ltd., Guangzhou, China; ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China; SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society, China

ISBN: (print) 9781728163215

In recent years, scene text recognition has improved significantly, and various state-of-the-art recognition approaches have been proposed. This paper focuses on recognizing text in natural photos of equipment nameplates, which has wide applications in industrial automation. This task has received little attention in previous work. The challenge comes from multi-oriented, curved, noisy and blurry text patches in equipment nameplates. To address this problem, we propose a deep model for text recognition in multi-oriented nameplates, namely Orientation Robust Scene Text Recognition (ORSTR). Specifically, our model employs a carefully designed rectification module to transform curved, distorted or multi-oriented text into near-horizontal text. Once the near-horizontal text has been generated, the recognition network outputs the predictions for the text patches. Our scene text recognition model achieves 90.8% recognition accuracy on the equipment nameplate dataset, outperforming a previous scene text recognition model (CRNN) by about 0.8%. Extensive experiments have been conducted to verify the effectiveness of our model. © 2019 IEEE.

Keywords: Nameplates

Self-grouping convolutional neural networks

arXiv, 2020

Authors: Guo, Qingbei; Wu, Xiao-Jun; Kittler, Josef; Feng, Zhiquan. Affiliations: Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi 214122, China; Shandong Provincial Key Laboratory of Network based Intelligent Computing, University of Jinan, Jinan 250022, China; Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, United Kingdom

Although group convolution operators are increasingly used in deep convolutional neural networks to improve computational efficiency and to reduce the number of parameters, most existing methods construct their group convolution architectures by a predefined, data-independent partitioning of the filters of each convolutional layer into multiple regular filter groups of equal spatial group size, which prevents a full exploitation of their potential. To tackle this issue, we propose a novel method of designing self-grouping convolutional neural networks, called SG-CNN, in which the filters of each convolutional layer group themselves based on the similarity of their importance vectors. Concretely, for each filter, we first evaluate the importance values of its input channels to identify the importance vectors, and then group these vectors by clustering. Using the resulting data-dependent centroids, we prune the less important connections, which implicitly minimizes the accuracy loss of the pruning, thus yielding a set of diverse group convolution filters. Subsequently, we develop two fine-tuning schemes, i.e. (1) both local and global fine-tuning and (2) global-only fine-tuning, which experimentally deliver comparable results, to recover the recognition capacity of the pruned network. Comprehensive experiments carried out on the CIFAR-10/100 and ImageNet datasets demonstrate that our self-grouping convolution method adapts to various state-of-the-art CNN architectures, such as ResNet and DenseNet, and delivers superior performance in terms of compression ratio, speedup and recognition accuracy. We demonstrate the ability of SG-CNN ...

Figure 1: Evolution of group convolutions. (a) Regular convolution. (b) Regular group convolution. (c) Permuting group convolution. (d) Learned group convolution. (e) Self-grouping convolution. Note that white channels represent the ignored input channels, and gray channels indicate the reused input channels.
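The grouping-then-pruning idea described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the importance measure (L1 norm of each kernel slice), the plain k-means clustering with farthest-point initialisation, the "keep the top channels per centroid" rule, and all function names are assumptions chosen for illustration only.

```python
# Illustrative sketch only. The L1-norm importance, the k-means variant and the
# top-`keep` pruning rule are assumptions, not the published SG-CNN procedure.

def importance_vectors(weights):
    """weights[f][c] is the flattened kernel of filter f for input channel c."""
    return [[sum(abs(v) for v in kernel) for kernel in filt] for filt in weights]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=10):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assign every importance vector to its nearest centroid.
        assign = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                  for v in vectors]
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids

def channel_masks(centroids, keep):
    """Per group, keep the `keep` input channels with the largest centroid value."""
    masks = []
    for c in centroids:
        top = set(sorted(range(len(c)), key=lambda ch: c[ch], reverse=True)[:keep])
        masks.append([1 if ch in top else 0 for ch in range(len(c))])
    return masks

def prune(weights, assign, masks):
    """Zero out the connections that the filter's group mask drops."""
    return [[kernel if masks[g][c] else [0.0] * len(kernel)
             for c, kernel in enumerate(filt)]
            for filt, g in zip(weights, assign)]

# Toy layer: 4 filters, 4 input channels, 1x1 kernels.
weights = [
    [[0.9], [0.8], [0.01], [0.02]],
    [[-0.7], [1.0], [0.03], [0.0]],
    [[0.02], [0.01], [0.9], [-0.8]],
    [[0.0], [0.05], [-1.0], [0.7]],
]
assign, centroids = kmeans(importance_vectors(weights), k=2)
masks = channel_masks(centroids, keep=2)
pruned = prune(weights, assign, masks)
print(assign, masks)
```

In this toy layer the first two filters end up sharing input channels 0 and 1, and the last two share channels 2 and 3, so the groups follow the data instead of a fixed partition. The fine-tuning schemes mentioned in the abstract are not sketched here.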

Keywords: Convolution

PIPAL: a Large-Scale Image Quality Assessment Dataset for Perceptual Image Restoration

arXiv, 2020

Authors: Gu, Jinjin; Cai, Haoming; Chen, Haoyu; Ye, Xiaoxing; Ren, Jimmy S.; Dong, Chao. Affiliations: School of Data Science, Chinese University of Hong Kong, Shenzhen, China; ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China; SenseTime Research; SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society, China

Image quality assessment (IQA) is the key factor for the fast development of image restoration (IR) algorithms. The most recent IR methods based on Generative Adversarial Networks (GANs) have achieved significant improvements in visual performance, but have also presented great challenges for quantitative evaluation. Notably, we observe an increasing inconsistency between perceptual quality and the evaluation results. This raises two questions: (1) Can existing IQA methods objectively evaluate recent IR algorithms? (2) When we focus on beating current benchmarks, are we getting better IR algorithms? To answer these questions and promote the development of IQA methods, we contribute a large-scale IQA dataset, called the Perceptual Image Processing Algorithms (PIPAL) dataset. In particular, this dataset includes the results of GAN-based methods, which are missing in previous datasets. We collect more than 1.13 million human judgments to assign subjective scores to PIPAL images using the more reliable "Elo system". Based on PIPAL, we present new benchmarks for both IQA and super-resolution methods. Our results indicate that existing IQA methods cannot fairly evaluate GAN-based IR algorithms. While using appropriate evaluation methods is important, IQA methods should also be updated along with the development of IR algorithms. Finally, we improve the performance of IQA networks on GAN-based distortions by introducing anti-aliasing pooling. Experiments show the effectiveness of the proposed method. Copyright © 2020, The Authors. All rights reserved.
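The abstract states that subjective scores are derived from pairwise human judgments with an Elo system. As a rough illustration of how such a rating scheme works in general, the sketch below applies the standard Elo update to a few hypothetical judgments. The starting rating of 1400, the step size `k=16`, and the image names are assumptions for this example; the abstract does not give the settings used for PIPAL.

```python
# Standard Elo update applied to pairwise preference judgments.
# Starting rating, k, and image names are hypothetical, not PIPAL's settings.

def expected_score(r_a, r_b):
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, a_wins, k=16.0):
    """Return the new ratings of A and B after one judgment."""
    s_a = 1.0 if a_wins else 0.0
    delta = k * (s_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta

ratings = {"img_a": 1400.0, "img_b": 1400.0, "img_c": 1400.0}

# Each tuple is (preferred image, other image) from one human judgment.
judgments = [("img_a", "img_b"), ("img_a", "img_c"), ("img_b", "img_c")]

for winner, loser in judgments:
    ratings[winner], ratings[loser] = elo_update(
        ratings[winner], ratings[loser], a_wins=True
    )

ranking = sorted(ratings, key=ratings.get, reverse=True)
print(ranking)
```

Each judgment moves rating from the less preferred image to the preferred one, and the transfer is larger when the outcome is more surprising under the current ratings, so the total rating is conserved.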

Keywords: Image reconstruction

Multi-View Partial (MVP) Point Cloud Challenge 2021 on Completion and Registration: Methods and Results

arXiv, 2021

Authors: Pan, Liang; Wu, Tong; Cai, Zhongang; Liu, Ziwei; Yu, Xumin; Rao, Yongming; Lu, Jiwen; Zhou, Jie; Xu, Mingye; Luo, Xiaoyuan; Fu, Kexue; Gao, Peng; Wang, Manning; Wang, Yali; Qiao, Yu; Zhou, Junsheng; Wen, Xin; Xiang, Peng; Liu, Yu-Shen; Han, Zhizhong; Yan, Yuanjie; An, Junyi; Zhu, Lifa; Lin, Changwei; Liu, Dongrui; Li, Xin; Gómez-Fernández, Francisco; Wang, Qinlong; Yang, Yang. Affiliations: S-Lab, Nanyang Technological University, Singapore; SenseTime-CUHK Joint Lab, The Chinese University of Hong Kong, Hong Kong; Sensetime Research; Shanghai AI Laboratory, China; Department of Automation, Tsinghua University, China; University of Chinese Academy of Sciences, China; ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China; Digital Medical Research Center, School of Basic Medical Science, Fudan University, China; School of Software, BNRist, Tsinghua University, China; ***; Wayne State University; State Key Laboratory for Novel Software Technology, Nanjing University, China; DeepGlint; Shanghai Jiao Tong University, China; Sichuan University, China; Xi'an Jiaotong University, China

As real-scanned point clouds are mostly partial due to occlusions and viewpoints, reconstructing complete 3D shapes based on incomplete observations becomes a fundamental problem for computer vision. With a single incomplete point cloud, it becomes the partial point cloud completion problem. Given multiple different observations, 3D reconstruction can be addressed by performing partial-to-partial point cloud registration. Recently, a large-scale Multi-View Partial (MVP) point cloud dataset has been released, which consists of over 100,000 high-quality virtual-scanned partial point clouds. Based on the MVP dataset, this paper reports methods and results in the Multi-View Partial Point Cloud Challenge 2021 on Completion and Registration. In total, 128 participants registered for the competition, and 31 teams made valid submissions. The top-ranked solutions will be analyzed, and then we will discuss future research directions. Copyright © 2021, The Authors. All rights reserved.

Keywords: Surface measurement

Orientation Robust Scene Text Recognition in Natural Scene

IEEE International Conference on Robotics and Biomimetics

Authors: Xiaolong Chen, Zhengfu Zhang, Yu Qiao, Jiangyu Lai, Jian Jiang, Zeyu Zhang, Bin Fu. Affiliations: Guangzhou Power Supply Bureau Co. Ltd., Guangzhou, China; ShenZhen Key Lab of Computer Vision and Pattern Recognition, Chinese Academy of Sciences

In recent years, scene text recognition has improved significantly, and various state-of-the-art recognition approaches have been proposed. This paper focuses on recognizing text in natural photos of equipment nameplates, which has wide applications in industrial automation. This task has received little attention in previous work. The challenge comes from multi-oriented, curved, noisy and blurry text patches in equipment nameplates. To address this problem, we propose a deep model for text recognition in multi-oriented nameplates, namely Orientation Robust Scene Text Recognition (ORSTR). Specifically, our model employs a carefully designed rectification module to transform curved, distorted or multi-oriented text into near-horizontal text. Once the near-horizontal text has been generated, the recognition network outputs the predictions for the text patches. Our scene text recognition model achieves 90.8% recognition accuracy on the equipment nameplate dataset, outperforming a previous scene text recognition model (CRNN) by about 0.8%. Extensive experiments have been conducted to verify the effectiveness of our model.

Keywords:

DF2Net: A Dense-Fine-Finer Network for Detailed 3D Face Reconstruction

International Conference on Computer Vision (ICCV)

Authors: Xiaoxing Zeng, Xiaojiang Peng, Yu Qiao. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, University of Chinese Academy of Sciences, China

ISBN: (digital) 9781728148038

ISBN: (print) 9781728148045

Reconstructing the detailed geometric structure from a single face image is a challenging problem due to its ill-posed nature and the fine 3D structures to be recovered. This paper proposes a deep Dense-Fine-Finer Network (DF2Net) to address this problem. DF2Net decomposes the reconstruction process into three stages, each of which is processed by an elaborately designed network, namely D-Net, F-Net, and Fr-Net. D-Net exploits a U-net architecture to map the input image to a dense depth image. F-Net refines the output of D-Net by integrating features from the depth and RGB domains, and its output is further enhanced by Fr-Net with a novel multi-resolution hypercolumn architecture. In addition, we introduce three types of data to train these networks: 3D model synthetic data, 2D image reconstructed data, and fine facial images. We exploit different datasets (or combinations of them) together with well-designed losses to train the different networks. Qualitative evaluation indicates that our DF2Net can effectively reconstruct subtle facial details such as small crow's feet and wrinkles. Our DF2Net achieves performance superior or comparable to state-of-the-art algorithms in qualitative and quantitative analyses on real-world images and the BU-3DFE dataset. Code and the collected 70K image-depth data will be publicly available.

Keywords: Three-dimensional displays; Face; Image reconstruction; Shape; Solid modeling; Two dimensional displays; Training data

TTPP: Temporal transformer with progressive prediction for efficient action anticipation

arXiv, 2020

Authors: Wang, Wen; Peng, Xiaojiang; Su, Yanzhou; Qiao, Yu; Cheng, Jian. Affiliations: School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China; ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab., Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society

Video action anticipation aims to predict future action categories from observed frames. Current state-of-the-art approaches mainly resort to recurrent neural networks to encode history information into hidden states, and predict future actions from the hidden representations. It is well known that the recurrent pipeline is inefficient at capturing long-term information, which may limit its performance in prediction tasks. To address this problem, this paper proposes a simple yet efficient Temporal Transformer with Progressive Prediction (TTPP) framework, which repurposes a Transformer-style architecture to aggregate observed features, and then leverages a light-weight network to progressively predict future features and actions. Specifically, predicted features along with predicted probabilities are accumulated into the inputs of subsequent predictions. We evaluate our approach on three action datasets, namely TVSeries, THUMOS-14, and TV-Human-Interaction. We also conduct a comprehensive study of several popular aggregation and prediction strategies. Extensive results show that TTPP not only outperforms the state-of-the-art methods but is also more efficient. Copyright © 2020, The Authors. All rights reserved.

Keywords: Forecasting

Learning to learn a cold-start sequential recommender

arXiv, 2021

Authors: Huang, Xiaowen; Sang, Jitao; Yu, Jian; Xu, Changsheng. Affiliations: School of Computer and Information Technology, Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Haidian Qu Shi, Beijing, China; National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun Rd, Haidian Qu Shi, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, 80 Zhongguancun Rd, Haidian Qu Shi, Beijing, China; Peng Cheng Laboratory, Shenzhen, China

Cold-start recommendation is an urgent problem in contemporary online applications. It aims to provide users whose behaviors are sparse with recommendations that are as accurate as possible. Many data-driven algorithms, such as the widely used matrix factorization, underperform because of data sparseness. This work adopts the idea of meta-learning to solve the user cold-start recommendation problem. We propose a meta-learning based cold-start sequential recommendation framework called metaCSR, which includes three main components: Diffusion Representer, for learning better user/item embeddings through information diffusion on the interaction graph; Sequential Recommender, for capturing temporal dependencies of behavior sequences; and Meta Learner, for extracting and propagating transferable knowledge of prior users and learning a good initialization for new users. metaCSR can learn the common patterns from regular users' behaviors and optimize the initialization so that the model can quickly adapt to new users after one or a few gradient updates to achieve optimal performance. Extensive quantitative experiments on three widely used datasets show the remarkable performance of metaCSR in dealing with the user cold-start problem. Meanwhile, a series of qualitative analyses demonstrates that the proposed metaCSR generalizes well. Copyright © 2021, The Authors. All rights reserved.
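The idea of learning an initialization that adapts to a new user in one or a few gradient updates can be shown on a toy problem. The sketch below is not metaCSR: it uses a first-order, MAML-style update on a one-parameter regression model, where each hypothetical "user" is a slope to be fitted. The model, the learning rates, the reuse of the same data for adaptation and meta-update, and all names are assumptions for illustration.

```python
# Toy first-order MAML-style meta-learning on scalar regression y = w * x.
# Everything here (model, learning rates, tasks) is a hypothetical stand-in
# for the recommender described in the abstract.

def loss(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w, xs, ys):
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def adapt(w, xs, ys, inner_lr=0.05, steps=1):
    """Inner loop: a few gradient steps on one user's data."""
    for _ in range(steps):
        w = w - inner_lr * grad(w, xs, ys)
    return w

def meta_train(tasks, w0=0.0, inner_lr=0.05, outer_lr=0.05, epochs=100):
    """Outer loop: move the shared initialization using the gradient
    evaluated at each user's adapted parameter (first-order approximation)."""
    w = w0
    for _ in range(epochs):
        meta_grad = 0.0
        for xs, ys in tasks:
            w_adapted = adapt(w, xs, ys, inner_lr)
            meta_grad += grad(w_adapted, xs, ys)
        w -= outer_lr * meta_grad / len(tasks)
    return w

XS = [1.0, 2.0]

def make_task(slope):
    return XS, [slope * x for x in XS]

# "Regular users" with slopes 1, 2 and 3; the new user has slope 2.5.
tasks = [make_task(s) for s in (1.0, 2.0, 3.0)]
w_meta = meta_train(tasks)

new_xs, new_ys = make_task(2.5)
loss_meta = loss(adapt(w_meta, new_xs, new_ys), new_xs, new_ys)
loss_cold = loss(adapt(0.0, new_xs, new_ys), new_xs, new_ys)
print(w_meta, loss_meta, loss_cold)
```

After a single adaptation step, the meta-learned initialization gives a much lower loss on the unseen user than the untrained initialization, which is the behaviour the abstract attributes to the Meta Learner component.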

Keywords: Factorization