Search results - Inner Mongolia University Library

A new unified method for detecting text from marathon runners and sports players in video

arXiv 2020

Authors: Nag, Sauradip; Shivakumara, Palaiahnakote; Pal, Umapada; Lu, Tong; Blumenstein, Michael

Affiliations: Kalyani Government Engineering College, Kalyani, Kolkata, India; Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia; Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India; National Key Lab for Novel Software Technology, Nanjing University, Nanjing, China; Sydney, Australia

Detecting text located on the torsos of marathon runners and sports players in video is challenging because of poor video quality and the adverse effects of flexible, colorful clothing and of varying body structures and actions. This paper presents a new unified method to tackle these challenges. The proposed method fuses the gradient magnitude and direction coherence of text pixels in a new way to detect candidate regions. The candidate regions are used to determine the number of temporal frame clusters obtained by K-means clustering on frame differences, which in turn detects key frames. The proposed method then uses Bayesian probability over color values, at both the pixel and component levels of the temporal frames, to identify skin portions, yielding fused images with skin components. Based on this skin information, the method detects faces and torsos by finding structural and spatial coherences between them. We further propose an adaptive pixel-linking deep learning model for detecting text in torso regions. To evaluate performance, the proposed method is tested on our own dataset collected from marathon/sports video and on three standard marathon image datasets, namely RBNR, MMM and R-ID. In addition, it is tested on the standard natural scene text datasets CTW1500 and MS-COCO Text to show its generality. A comparative study with state-of-the-art bib number/text detection methods on these datasets shows that the proposed method outperforms the existing methods. Copyright © 2020, The Authors. All rights reserved.

Keywords: Pixels
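
The abstract above says skin regions are found with Bayesian probability over color values at the pixel and component levels, but gives no formula. The sketch below shows one common way to do this, a histogram-based Bayes classifier over quantized RGB colors. The bin count, the Laplace smoothing, the `SkinBayes` class and the "mean posterior" component rule are all assumptions for illustration, not the authors' implementation.

```python
from collections import Counter


def quantize(rgb, bins=8):
    """Map an (R, G, B) triple with 0-255 channels to a coarse histogram cell."""
    step = 256 // bins
    return tuple(channel // step for channel in rgb)


class SkinBayes:
    """Histogram-based Bayes classifier: P(skin | color) from labeled training pixels."""

    def __init__(self, skin_pixels, nonskin_pixels, bins=8):
        self.bins = bins
        self.skin_counts = Counter(quantize(p, bins) for p in skin_pixels)
        self.nonskin_counts = Counter(quantize(p, bins) for p in nonskin_pixels)
        self.n_skin = len(skin_pixels)
        self.n_nonskin = len(nonskin_pixels)

    def posterior(self, rgb):
        """Pixel level: Bayes' rule with Laplace-smoothed class-conditional histograms."""
        cell = quantize(rgb, self.bins)
        n_cells = self.bins ** 3
        p_skin = self.n_skin / (self.n_skin + self.n_nonskin)  # prior from class frequencies
        like_skin = (self.skin_counts[cell] + 1) / (self.n_skin + n_cells)
        like_nonskin = (self.nonskin_counts[cell] + 1) / (self.n_nonskin + n_cells)
        evidence = like_skin * p_skin + like_nonskin * (1 - p_skin)
        return like_skin * p_skin / evidence

    def component_score(self, pixels):
        """Component level (hypothetical rule): mean pixel posterior over one component."""
        return sum(self.posterior(p) for p in pixels) / len(pixels)
```

With 50 skin and 50 non-skin training pixels of one color each, a nearby skin tone that falls in the same histogram cell gets a posterior of 51/52, about 0.98, and the non-skin color gets 1/52. Smoothing keeps unseen colors at the prior instead of dividing by zero.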

Inconsistency Distillation For Consistency: Enhancing Multi-View Clustering via Mutual Contrastive Teacher-Student Learning

IEEE International Conference on Data Mining (ICDM)

Authors: Dunqiang Liu; Shu-Juan Peng; Xin Liu; Lei Zhu; Zhen Cui; Taihao Li

Affiliations: Dept. of Comput. Sci. & Fujian Key Lab. of Big Data Intelligence and Security, Huaqiao University, Xiamen, China; Zhejiang Lab, Hangzhou, China; Xiamen Key Lab. of Computer Vision and Pattern Recognition, Huaqiao University, Xiamen, China; Key Lab. of Computer Vision and Machine Learning (Huaqiao University), Fujian Province University, Xiamen, China; School of Information Sci. and Eng., Shandong Normal University, Jinan, China; School of Computer Sci. and Eng., Nanjing University of Science and Technology, Nanjing, China

Multi-view clustering has attracted increasing attention because many real-world data consist of different representations, or views. Recent multi-view clustering works mainly exploit instance consistency to obtain representations shared across views, and then apply a single-view clustering method to partition the data. However, these methods often ignore the inconsistency of instance associations within each view, which may enlarge the intra-class diversity among views and therefore degrade clustering performance. To address this issue, this paper proposes an efficient mutual contrastive teacher-student learning (MC-TSL) model to enhance multi-view clustering, which is the first attempt to study inconsistency distillation for consistency learning. MC-TSL uses a view-specific encoder with two heads, an instance encoding head and a semantic distillation head, to capture consistent and discriminative feature representations. Specifically, the former head uses a cross-view contrastive learning method to obtain a redundancy-free consistent representation at the instance level, while the latter head uses a mutual teacher-student learning module to capture intra-view information at the semantic level. By training these two heads end to end, the discriminative multi-view embeddings are obtained and refined by minimizing the weighted sum of the reconstruction loss, the contrastive loss and the contrast distillation loss. Extensive experiments verify the superiority of the proposed MC-TSL framework and show its competitive clustering performance.

Keywords: Training; Learning systems; Clustering methods; Semantics; Encoding; Data mining
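
The instance encoding head in the abstract above relies on cross-view contrastive learning, but the exact loss is not given. As a stand-in, the sketch below uses a standard InfoNCE-style loss: the embeddings of the same instance in two views form the positive pair, and all other instances in the second view act as negatives. It is computed in one direction only (view A to view B); a symmetric variant would average both directions. The temperature and the `total_loss` weights are hypothetical placeholders, and the actual MC-TSL contrastive and distillation losses may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def cross_view_info_nce(view_a, view_b, temperature=0.5):
    """InfoNCE between two views: row i of view_a and row i of view_b embed the same
    instance (positive pair); the other rows of view_b serve as negatives."""
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive
    return total / n


def total_loss(rec, con, dis, w_con=1.0, w_dis=1.0):
    """Weighted sum of reconstruction, contrastive and distillation terms (placeholder weights)."""
    return rec + w_con * con + w_dis * dis
```

For two orthonormal embeddings, aligned views give a loss of log(1 + e^-2), about 0.127, while swapping the instances in one view raises it to about 2.127, which is the signal that pulls matching instances together across views.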

A new COLD feature based handwriting analysis for ethnicity/nationality identification

arXiv 2018

Authors: Nag, Sauradip; Shivakumara, Palaiahnakote; Wu, Yirui; Pal, Umapada; Lu, Tong

Affiliations: Kalyani Government Engineering College, Kalyani, Kolkata, India; Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia; College of Computer and Information, Hohai University, Nanjing, China; Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India; National Key Lab for Novel Software Technology, Nanjing University, Nanjing, China

Identifying crimes that involve people of different nationalities is challenging for forensic investigation teams. This paper proposes a new method for ethnicity (nationality) identification based on Cloud of Line Distribution (COLD) features of handwriting components. The proposed method first uses the tangent angles of the contour pixels in each row and the mean intensity value of each row in an image to segment text lines. For the segmented text lines, the tangent angle and baseline direction are used to remove ruled lines from the image. Polygonal approximation is then used to find dominant points on the contours of edge components. Next, the proposed method connects every dominant point to its nearest dominant points, which yields line segments between dominant-point pairs. The angle and length of each line segment give a point in the polar domain, and the points from all line segments form a dense cloud, the COLD distribution. Because the shapes of character components change with nationality, the shape of the distribution changes as well. This variation is captured by features based on the distances from the pixels of the distribution to its principal axis. The features are then fed to an SVM classifier to identify nationality. Experiments on a complex dataset show that the proposed method is effective and outperforms the existing method. Copyright © 2018, The Authors. All rights reserved.

Keywords: Crime
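
The COLD construction described in the abstract above can be sketched in a few lines. This is a minimal illustration, assuming the dominant points have already been found by polygonal approximation. The neighbor count `k` and the mean/standard-deviation summary of distances to the principal axis are hypothetical choices, since the abstract does not state them, and the SVM stage is not shown.

```python
import math


def cold_cloud(dominant_points, k=2):
    """Connect each dominant point to its k nearest dominant points and map every
    segment (angle, length) to a polar point, returned as Cartesian (x, y)."""
    cloud = []
    for i, (px, py) in enumerate(dominant_points):
        neighbors = sorted(
            (math.hypot(qx - px, qy - py), qx, qy)
            for j, (qx, qy) in enumerate(dominant_points)
            if j != i
        )
        for length, qx, qy in neighbors[:k]:
            angle = math.atan2(qy - py, qx - px)  # segment direction in radians
            cloud.append((length * math.cos(angle), length * math.sin(angle)))
    return cloud


def principal_axis(cloud):
    """Mean and orientation (radians) of the major axis of a 2D point cloud."""
    n = len(cloud)
    mx = sum(x for x, _ in cloud) / n
    my = sum(y for _, y in cloud) / n
    sxx = sum((x - mx) ** 2 for x, _ in cloud) / n
    syy = sum((y - my) ** 2 for _, y in cloud) / n
    sxy = sum((x - mx) * (y - my) for x, y in cloud) / n
    return (mx, my), 0.5 * math.atan2(2 * sxy, sxx - syy)


def axis_distance_features(cloud):
    """Hypothetical feature vector: mean and std of point distances to the principal axis."""
    (mx, my), phi = principal_axis(cloud)
    dists = [abs(-(x - mx) * math.sin(phi) + (y - my) * math.cos(phi)) for x, y in cloud]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    return [mean, std]
```

For the corners of a 2 × 1 rectangle with k = 2, the cloud has four points on the horizontal axis (length 2) and four on the vertical one (length 1), the principal axis is horizontal, and the features come out as [0.5, 0.5]. Differently shaped handwriting components would stretch the cloud differently and shift these statistics.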

Group shift pointwise convolution for volumetric medical image segmentation

arXiv 2021

Authors: He, Junjun; Ye, Jin; Li, Cheng; Song, Diping; Chen, Wanli; Wang, Shanshan; Gu, Lixu; Qiao, Yu

Affiliations: School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China; Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China; Shenzhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China; Shanghai AI Lab, Shanghai, China; Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China; The Chinese University of Hong Kong, Hong Kong; Peng Cheng Laboratory, Shenzhen, Guangdong, China; Pazhou Lab, Guangzhou, Guangdong, China

Recent studies have demonstrated the effectiveness of 3D convolutions for segmenting volumetric medical images. Compared with their 2D counterparts, 3D convolutions can capture spatial context in three dimensions. Nevertheless, models employing 3D convolutions have more trainable parameters and are more computationally complex, which can easily lead to overfitting, especially in medical applications with limited training data. This paper aims to improve the effectiveness and efficiency of 3D convolutions by introducing a novel Group Shift Pointwise Convolution (GSP-Conv). GSP-Conv simplifies 3D convolutions into pointwise ones with 1 × 1 × 1 kernels, which dramatically reduces the number of model parameters and FLOPs (e.g., 27× fewer than 3D convolutions with 3 × 3 × 3 kernels). Naïve pointwise convolutions, with their limited receptive fields, cannot make full use of the spatial image context. To address this problem, we propose a parameter-free operation, Group Shift (GS), which shifts the feature maps along different spatial directions in an elegant way. With GS, pointwise convolutions can access features from different spatial locations, compensating for their limited receptive fields. We evaluate the proposed methods on two datasets, PROMISE12 and BraTS18. Results show that our method, with substantially decreased model complexity, achieves comparable or even better performance than models employing 3D convolutions. Copyright © 2021, The Authors. All rights reserved.

Keywords: Convolution
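
A minimal, dependency-free sketch of the Group Shift idea from the abstract above follows. It assumes the channels are split into six groups, each shifted one voxel along plus or minus depth, height or width with zero fill, followed by an ordinary 1 × 1 × 1 convolution. The direction set and group assignment are assumptions for illustration; the abstract does not specify them. The helper `conv3d_weight_count` reproduces the 27× figure: a 3 × 3 × 3 kernel has 27 weights per input/output channel pair, a 1 × 1 × 1 kernel has one.

```python
from itertools import product

# Hypothetical shift directions (depth, height, width): one unit step each way.
DIRECTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def shift_volume(vol, offset):
    """Shift a D x H x W nested list by `offset` voxels; vacated voxels become 0."""
    d, h, w = len(vol), len(vol[0]), len(vol[0][0])
    dz, dy, dx = offset
    out = [[[0.0] * w for _ in range(h)] for _ in range(d)]
    for z, y, x in product(range(d), range(h), range(w)):
        sz, sy, sx = z - dz, y - dy, x - dx  # source voxel that moves to (z, y, x)
        if 0 <= sz < d and 0 <= sy < h and 0 <= sx < w:
            out[z][y][x] = vol[sz][sy][sx]
    return out


def group_shift(features, directions=DIRECTIONS):
    """Parameter-free: assign C channels to len(directions) groups, shift each group."""
    c, g = len(features), len(directions)
    return [shift_volume(features[ch], directions[ch * g // c]) for ch in range(c)]


def pointwise_conv(features, weights):
    """1x1x1 convolution: out[o] = sum_i weights[o][i] * features[i], voxel by voxel."""
    d, h, w = len(features[0]), len(features[0][0]), len(features[0][0][0])
    return [
        [
            [
                [sum(row[i] * features[i][z][y][x] for i in range(len(features))) for x in range(w)]
                for y in range(h)
            ]
            for z in range(d)
        ]
        for row in weights
    ]


def conv3d_weight_count(c_in, c_out, k):
    """Number of weights (bias ignored) in a 3D convolution with a k x k x k kernel."""
    return c_in * c_out * k ** 3
```

For a 3 × 3 × 3 volume with a single nonzero center voxel copied into six channels, `group_shift` moves the value to the six face-adjacent voxels, so a subsequent `pointwise_conv` produces nonzero outputs at voxels whose own input was zero: the pointwise kernel now sees neighboring context without any extra weights.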

New COLD Feature Based Handwriting Analysis for Ethnicity/Nationality Identification

International Workshop on Frontiers in Handwriting Recognition

Authors: Sauradip Nag; Palaiahnakote Shivakumara; Yirui Wu; Umapada Pal; Tong Lu

Affiliations: Kalyani Government Engineering College, Kalyani, Kolkata, India; Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia; College of Computer and Information, Hohai University, Nanjing, China; Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India; National Key Lab for Novel Software Technology, Nanjing University, Nanjing, China

Identifying crimes that involve people of different nationalities is challenging for forensic investigation teams. This paper proposes a new method for ethnicity (nationality) identification based on Cloud of Line Distribution (COLD) features of handwriting components. The proposed method first uses the tangent angles of the contour pixels in each row and the mean intensity value of each row to segment text lines. For the segmented text lines, the tangent angle and baseline direction are used to remove ruled lines from the image. Polygonal approximation is then used to find dominant points on the contours of edge components. Next, the proposed method connects every dominant point to its nearest dominant points, which yields line segments between dominant-point pairs. The angle and length of each line segment give a point in the polar domain, and the points from all line segments form a dense cloud, the COLD distribution. Because the shapes of character components change with nationality, the shape of the distribution changes as well. This variation is captured by features based on the distances from the pixels of the distribution to its principal axis. The features are then fed to an SVM classifier to identify nationality. Experiments on a complex dataset show that the proposed method is effective and outperforms the existing method.

Keywords: Feature extraction; Shape; Image edge detection; Writing; Image segmentation; Forensics; Support vector machines

A novel hybrid convolutional neural network for accurate organ segmentation in 3D head and neck CT images

arXiv 2021

Authors: Chen, Zijie; Li, Cheng; He, Junjun; Ye, Jin; Song, Diping; Wang, Shanshan; Gu, Lixu; Qiao, Yu

Affiliations: Shenzhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China; Shanghai AI Lab, Shanghai, China; Shenzhen Yino Intelligence Technology Co. Ltd., Shenzhen, Guangdong, China; Co. Ltd., Shenzhen, Guangdong, China; Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China; School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China; Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China; Peng Cheng Laboratory, Shenzhen, Guangdong, China; Pazhou Lab, Guangzhou, Guangdong, China

Radiation therapy (RT) is widely employed in the clinic to treat head and neck (HaN) cancers. An essential step of RT planning is the accurate segmentation of various organs-at-risk (OARs) in HaN CT images. Nevertheless, segmenting OARs manually is time-consuming, tedious and error-prone, since typical HaN CT images contain tens to hundreds of slices. Automated segmentation algorithms are therefore urgently required. Recently, convolutional neural networks (CNNs) have been extensively investigated for this task; in particular, 3D CNNs are frequently adopted to process 3D HaN CT images. Naïve 3D CNNs have two issues. First, the depth resolution of 3D CT images is usually several times lower than the in-plane resolution. Applying 3D CNNs directly, without accounting for this difference, can extract distorted image features and degrade the final segmentation performance. Second, there is a severe class imbalance problem: large organs can be orders of magnitude larger than small organs, which makes it difficult to segment all organs accurately at the same time. To address these issues, we propose a novel hybrid CNN that fuses 2D and 3D convolutions to handle the different spatial resolutions and extract effective edge and semantic features from 3D HaN CT images. To accommodate both large and small organs, our final model, named OrganNet2.5D, uses only two downsampling operations instead of the classic four, and hybrid dilated convolutions are introduced to maintain the receptive field. Experiments on the MICCAI 2015 challenge dataset demonstrate that OrganNet2.5D achieves promising performance compared to state-of-the-art methods. Copyright © 2021, The Authors. All rights reserved.

Keywords: Convolution
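
Hybrid dilated convolutions, mentioned in the abstract above, enlarge the receptive field without extra downsampling. The sketch below computes, along one axis, the receptive field of stacked stride-1 convolutions, 1 + sum over layers of (k - 1) * d, and the set of input offsets that actually reach one output position. The dilation rates (1, 2, 5) are an assumed example, not taken from OrganNet2.5D; they cover every offset inside their receptive field, whereas repeating one rate such as (2, 2, 2) leaves gaps (the so-called gridding effect).

```python
def receptive_field(kernel, dilations):
    """Receptive field along one axis of stacked stride-1 convolutions."""
    return 1 + sum((kernel - 1) * d for d in dilations)


def covered_offsets(kernel, dilations):
    """Input offsets (one axis) that actually reach a single output position."""
    half = kernel // 2
    offsets = {0}
    for d in dilations:
        # Each layer adds t * d for every kernel tap t in [-half, half].
        offsets = {o + t * d for o in offsets for t in range(-half, half + 1)}
    return offsets
```

With 3-wide kernels, `receptive_field(3, (1, 2, 5))` is 17 and `covered_offsets` reaches all 17 offsets from -8 to 8; with `(2, 2, 2)` the receptive field is 13 but only the 7 even offsets are reached.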

Local Gradient Difference Features for Classification of 2D-3D Natural Scene Text Images

International Conference on Pattern Recognition

Authors: Lokesh Nandanwar; Palaiahnakote Shivakumara; Ramachandra Raghavendra; Tong Lu; Umapada Pal; Daniel Lopresti; Nor Badrul Anuar

Affiliations: Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia; Faculty of Information Technology and Electrical Engineering, IIK, NTNU, Norway; National Key Lab for Novel Software Technology, Nanjing University, Nanjing, China; Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India; Computer Science & Engineering, Lehigh University, Bethlehem, PA, USA

Methods developed for ordinary 2D text detection do not work well for text rendered with decorative or 3D effects. This paper proposes a new method for classifying 2D and 3D natural scene text images so that an appropriate recognition method can be chosen based on the classification result, improving performance. The proposed method uses local gradient differences to obtain candidate pixels, which represent strokes. To study the spatial distribution of candidate pixels, we propose a measure, called COLD, which is denser for pixels toward the center of strokes and scattered for non-stroke pixels. This observation leads us to introduce mass features that extract the regular spatial pattern of COLD, which indicates a 2D text image. The extracted features are fed into a Neural Network (NN) for classification. The proposed method is tested on (i) a new dataset introduced in this work, (ii) a second dataset assembled from standard natural scene datasets, and (iii) non-text image datasets, which contain objects rather than text. Experimental results on images with and without text show that the proposed method is independent of text. The proposed approach significantly improves text detection and recognition performance after classification.

Keywords: Three-dimensional displays; Image recognition; Graphical models; Text recognition; Artificial neural networks; Feature extraction; Standards

CRNN Based Jersey-Bib Number/Text Recognition in Sports and Marathon Images

International Conference on Document Analysis and Recognition

Authors: Sauradip Nag; Raghavendra Ramachandra; Palaiahnakote Shivakumara; Umapada Pal; Tong Lu; Mohan Kankanhalli

Affiliations: Department of Computer Science & Engineering, Kalyani Government Engineering College, Kalyani, India; Faculty of Information Technology and Electrical Engineering, Norwegian University of Science and Technology, Norway; Faculty of Computer System and Information Technology, University of Malaya, Malaysia; Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India; National Key Lab for Novel Software Technology, Nanjing University, China; Department of Computer Science, National University of Singapore, Singapore

The primary challenge in tracing participants in sports and marathon videos or images is to detect and localize the jersey/bib number, which may be present in different regions of their outfits and is often captured in cluttered environments. In this work, we propose a new framework based on detecting human body parts so that both jersey/bib numbers and text are localized reliably. To achieve this, the proposed method first detects and localizes the human in a given image using the Single Shot MultiBox Detector (SSD). Next, the body parts that generally contain a bib number or text region, namely the torso, left thigh and right thigh, are automatically extracted. Each detected part is processed individually to detect the jersey/bib number or text using a deep two-channel CNN trained with a novel adaptive weighting loss function. Finally, the detected text is cropped out and fed to a CNN-RNN based deep model (CRNN) for recognizing the jersey/bib number or text. Extensive experiments are carried out on four datasets, including benchmark datasets and a new dataset. Comparisons with state-of-the-art methods show that the proposed method performs better on all four datasets.

Keywords: Text recognition; Sports; Image recognition; Hip; Thigh; Torso; Biological system modeling
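
CRNN models are usually trained with a CTC (connectionist temporal classification) output layer; the abstract above does not say so explicitly, so treat this as an assumption. Under that assumption, the simplest way to turn per-frame predictions into a bib number is best-path decoding: take the most likely symbol per frame, merge consecutive repeats, then drop blanks. The blank symbol `-`, the toy alphabet and the score layout are hypothetical.

```python
BLANK = "-"  # hypothetical blank symbol


def best_path(frame_scores, alphabet):
    """Most likely symbol per frame; frame_scores[t][i] is the score of alphabet[i]."""
    return [alphabet[max(range(len(s)), key=s.__getitem__)] for s in frame_scores]


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Best-path CTC decoding: merge consecutive repeats, then drop blanks."""
    decoded = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return "".join(decoded)
```

`ctc_greedy_decode("--11-2233-")` returns `"123"`, while `ctc_greedy_decode("1-1-22-3")` returns `"1123"`: the blank between the two 1s is what lets a bib number contain a repeated digit.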

Clustering-based Discriminative Locality Alignment for Face Gender Recognition

IEEE/RSJ International Conference on Intelligent Robots and Systems

Authors: Duo Chen; Jun Cheng; Dacheng Tao

Affiliations: College of Communication Engineering, Chongqing University, Chongqing 400044, China (also with the Shenzhen Key Laboratory of Computer Vision and Pattern Recognition, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences); Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China (also with the Chinese University of Hong Kong and the Guangdong Provincial Key Laboratory of Robotics and Intelligent System); Center for Quantum Computation and Intelligent System, Faculty of Engineering and Information Technology, University of Technology Sydney, New South Wales 2007, Australia

ISBN (electronic): 9781467317368

ISBN (print): 9781467317375

Human gender information is very important for facilitating human-robot interaction. Motivated by the success of manifold learning for visual recognition, we present a novel clustering-based discriminative locality alignment (CDLA) algorithm that discovers the low-dimensional intrinsic submanifold embedded in the high-dimensional ambient space, improving face gender recognition performance. In particular, CDLA exploits the global geometry through k-means clustering, extracts discriminative information through margin maximization, and explores the local geometry through intra-cluster sample concentration. These three properties uniquely characterize CDLA for face gender recognition. Experimental results on the FERET datasets, compared with several representative methods, suggest that the proposed method is superior in both recognition speed and accuracy.

Keywords: FERET; CDLA; comparing; recognition (Psychology); Gender; Global geometry; Compare; Face