Search results - Inner Mongolia University Library

Deep zero-shot learning for scene sketch

arXiv, 2019

Authors: Xie, Yao; Xu, Peng; Ma, Zhanyu. Affiliation: Pattern Recognition and Intelligent System Lab., Beijing University of Posts and Telecommunications

We introduce a novel problem, scene sketch zero-shot learning (SSZSL), which is challenging because (i) unlike with photos, the gap between the common semantic domain (e.g., word vectors) and sketches is too large to exploit common semantic knowledge as the bridge for knowledge transfer, and (ii) compared with single-object sketches, scene sketches require a more expressive feature representation to accommodate their high level of abstraction and complexity. To overcome these challenges, we propose a deep embedding model for scene sketch zero-shot learning. In particular, we propose an augmented semantic vector that performs domain alignment by fusing multi-modal semantic knowledge (e.g., cartoon images, natural images, text descriptions), and we adopt an attention-based network for scene sketch feature learning. Moreover, we propose a novel distance metric to improve the similarity measure during testing. Extensive experiments and ablation studies demonstrate the benefit of our sketch-specific design. Copyright © 2019, The Authors. All rights reserved.

Keywords: Embeddings

Symmetry-Enhanced Attention Network for Acute Ischemic Infarct Segmentation with Non–Contrast CT Images

arXiv, 2021

Authors: Liang, Kongming; Han, Kai; Li, Xiuli; Cheng, Xiaoqing; Li, Yiming; Wang, Yizhou; Yu, Yizhou. Affiliations: Pattern Recognition and Intelligent System Laboratory, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China; Deepwise AI Lab, Beijing, China; Department of Medical Imaging, Jinling Hospital, Nanjing University School of Medicine, Nanjing, Jiangsu, China; Department of Computer Science and Technology, Peking University, Beijing, China; The University of Hong Kong, Pokfulam, Hong Kong

Quantitative estimation of the acute ischemic infarct is crucial to improve neurological outcomes of the patients with stroke symptoms. Since the density of lesions is subtle and can be confounded by normal physiologic changes, anatomical asymmetry provides useful information to differentiate the ischemic and healthy brain tissue. In this paper, we propose a symmetry enhanced attention network (SEAN) for acute ischemic infarct segmentation. Our proposed network automatically transforms an input CT image into the standard space where the brain tissue is bilaterally symmetric. The transformed image is further processed by a U-shape network integrated with the proposed symmetry enhanced attention for pixel-wise labelling. The symmetry enhanced attention can efficiently capture context information from the opposite side of the image by estimating long-range dependencies. Experimental results show that the proposed SEAN outperforms some symmetry-based state-of-the-art methods in terms of both dice coefficient and infarct localization. Copyright © 2021, The Authors. All rights reserved.

Keywords: Computerized tomography
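
The abstract above reports segmentation quality in terms of the Dice coefficient. As a reminder of what that metric measures, here is a minimal sketch of the standard definition, 2|A∩B| / (|A| + |B|), for two binary masks. The function name, the flat-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    # Pixels labelled as lesion in both masks.
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total lesion pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * overlap / total


# Toy example: one of two predicted lesion pixels matches the reference.
predicted = [1, 1, 0, 0]
reference = [1, 0, 1, 0]
score = dice_coefficient(predicted, reference)
```

A score of 1.0 means the masks coincide exactly, and 0.0 means they share no lesion pixels.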

NiuEM: A Nested-iterative Unsupervised Learning Model for Single-particle Cryo-EM Image Processing

2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020

Authors: Hu, Rui; Cai, Jiaming; Zheng, Wangjie; Yang, Yang; Shen, Hong-Bin. Affiliations: Shanghai Jiao Tong University, Key Lab. of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Department of Computer Science and Engineering, Shanghai 200240, China; Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China; Shanghai Jiao Tong University, Department of Bioinformatics and Biostatistics, Shanghai 200240, China

ISBN: (print) 9781728162157

Cryo-electron microscopy (cryo-EM) has become a mainstream technology for solving spatial structures of biomacromolecules, but the processing of cryo-EM images is a very challenging task. One of the great challenges is the high noise in the images. A common method is to cluster the images with close projection angles to get mean images, which are used for 3D reconstruction. However, due to the extremely low signal-to-noise ratio, common clustering methods often fail to obtain high-quality mean images, leading to poorly reconstructed structures. In this study, we present a new unsupervised learning framework, called NiuEM, to discriminate images captured from different angles and yield cluster-mean images. NiuEM first generates pseudo-labels and then exploits both contrastive loss and cross-entropy loss for training convolutional layers to learn feature representations. Moreover, the pseudo-labels are updated iteratively to enhance the reliability of the labels. We assess the performance of NiuEM on four data sets via both visualized and quantitative experiments. In particular, two kinds of metrics are adopted to measure the performance, regarding the clustering quality and the resolution of reconstructed 3D models, respectively. The experimental results show that NiuEM achieves very competitive clustering accuracy in comparison with state-of-the-art image clustering methods. Moreover, the cluster-mean images yielded by NiuEM lead to better initial 3D models compared with the mainstream reconstruction tools. © 2020 IEEE.

Keywords: Image reconstruction
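
The abstract above describes clustering images with similar projection angles and averaging each cluster into a mean image. The averaging step alone can be sketched as follows. This is an illustration under stated assumptions, not NiuEM itself: the function name `cluster_mean_images`, the nested-list image representation, and the idea that cluster labels are already available are all hypothetical, and the pseudo-label generation and network training are not shown.

```python
from typing import Dict, List, Sequence

Image = List[List[float]]  # assumed representation: rows of pixel intensities


def cluster_mean_images(images: Sequence[Image],
                        labels: Sequence[int]) -> Dict[int, Image]:
    """Average all images that share a cluster label, pixel by pixel."""
    if len(images) != len(labels):
        raise ValueError("need exactly one label per image")
    sums: Dict[int, Image] = {}
    counts: Dict[int, int] = {}
    for image, label in zip(images, labels):
        if label not in sums:
            # Start a zero accumulator with the same shape as the image.
            sums[label] = [[0.0] * len(row) for row in image]
            counts[label] = 0
        accumulator = sums[label]
        for r, row in enumerate(image):
            for c, value in enumerate(row):
                accumulator[r][c] += value
        counts[label] += 1
    # Divide each accumulated pixel by the number of images in its cluster.
    return {
        label: [[value / counts[label] for value in row] for row in accumulator]
        for label, accumulator in sums.items()
    }


# Toy example: three 1x2 "images", the first two in cluster 0.
toy_images = [[[1.0, 3.0]], [[3.0, 5.0]], [[10.0, 10.0]]]
toy_labels = [0, 0, 1]
means = cluster_mean_images(toy_images, toy_labels)
```

Averaging suppresses independent noise across the images in a cluster, which is why the quality of the cluster assignment matters so much for the resulting mean images.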

Research on Real-time Video Action Classification Based on Three-Dimensional Convolutional Neural Network

IEEE International Conference on Network Infrastructure and Digital Content (IC-NIDC)

Authors: Mingli Yu; Chuang Zhang; Ming Wu. Affiliation: Pattern Recognition and Intelligent System Lab, BUPT, Beijing, China

This paper describes a model for performing action classification on real-time video streams. The model can simultaneously analyze the spatio-temporal information of video under a low-delay constraint. In addition, to prevent the model from classifying motionless segments of the video as motion, it is equipped with the ability to distinguish segments containing motion from stationary ones. The experimental results show that the model can complete the action classification task with little delay, which ensures that the classification result can be output in real time as video frames are continuously input.

Keywords: Streaming media; Optical flow; Real-time systems; Optical buffering; Convolutional neural networks; Image segmentation

Histogram transform-based speaker identification

arXiv, 2018

Authors: Ma, Zhanyu; Yu, Hong. Affiliation: Pattern Recognition and Intelligent System Lab, Beijing University of Posts and Telecommunications, Beijing, China

A novel text-independent speaker identification (SI) method is proposed. This method uses the Mel-frequency cepstral coefficients (MFCCs) and the dynamic information among adjacent frames as feature sets to capture speaker characteristics. To utilize the dynamic information, we design super-MFCC features by cascading three neighboring MFCC frames together. The probability density function (PDF) of these super-MFCC features is estimated by the recently proposed histogram transform (HT) method, which generates more training data by random transforms to realize the histogram PDF estimation and alleviates the discontinuity problem that commonly occurs in multivariate histogram computation. Compared to conventional PDF estimation methods, such as Gaussian mixture models, the HT model shows promising improvement in SI performance. Copyright © 2018, The Authors. All rights reserved.

Keywords: Gaussian distribution
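
The abstract above builds super-MFCC features by cascading three neighboring MFCC frames. A minimal sketch of that stacking step is shown below. The function name `stack_super_frames`, the list-of-lists frame representation, and the choice to drop the first and last frame (which lack a neighbor on one side) are assumptions for illustration; the abstract does not say how boundary frames are handled, and the MFCC extraction and histogram transform are not shown.

```python
from typing import List, Sequence


def stack_super_frames(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate each interior frame with its left and right neighbors.

    With T input frames of dimension D, this returns T - 2 vectors of
    dimension 3 * D (assumed boundary handling: edge frames are dropped).
    """
    if len(frames) < 3:
        return []
    return [
        list(frames[t - 1]) + list(frames[t]) + list(frames[t + 1])
        for t in range(1, len(frames) - 1)
    ]


# Toy example: four 2-dimensional "MFCC" frames.
mfcc_frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
super_mfcc = stack_super_frames(mfcc_frames)
```

Each stacked vector carries the current frame together with its immediate context, which is one simple way to expose frame-to-frame dynamics to a density estimator.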

Language identification with deep bottleneck features

arXiv, 2018

Authors: Ma, Zhanyu; Yu, Hong. Affiliation: Pattern Recognition and Intelligent System Lab, Beijing University of Posts and Telecommunications, Beijing, China

In this paper we propose an end-to-end short-utterance speech language identification (SLD) approach based on a Long Short-Term Memory (LSTM) neural network, which is especially suitable for SLD applications in intelligent vehicles. Features used for LSTM learning are generated by a transfer learning method. Bottleneck features of a deep neural network (DNN) trained for Mandarin acoustic-phonetic classification are used for LSTM training. To improve the SLD accuracy on short utterances, a phase-vocoder-based time-scale modification (TSM) method is used to reduce and increase the speech rate of the test utterance. By splicing the normal, rate-reduced, and rate-increased utterances, we can extend the length of test utterances so as to improve the performance of the SLD system. The experimental results on the AP17-OLR database show that the proposed methods can improve the performance of SLD, especially on short utterances with 1 s and 3 s durations. Copyright © 2018, The Authors. All rights reserved.

Keywords: Deep neural networks

TA4REC: Recurrent Neural Networks with Time Attention Factors for Session-based Recommendations

International Joint Conference on Neural Networks

Authors: Yu Sun; Peize Zhao; Honggang Zhang. Affiliation: Pattern Recognition and Intelligent System Lab, Beijing University of Posts and Telecommunications, Beijing, China

Recommender systems are increasingly important with the development of e-commerce, news, and multimedia applications. Traditional recommendation algorithms, such as collaborative-filtering-based methods and graph-based methods, mainly use items' original attributes and the relationships between items and users, ignoring the chronological order of items in browsing sessions. In recent years, RNN-based methods have shown their superiority in dealing with sequential data, and some modified RNN models have been proposed. However, these RNN models only use the sequence order of items and neglect the items' browsing time information. It is widely accepted that users tend to spend more time on items that interest them, and that these items are closely related to the users' current target. On this view, an item's browsing time is an important feature for recommendation. In this paper, we propose a modified RNN-based recommender system called TA4Rec, which recommends the item most likely to be clicked next. Our main contribution is a method to calculate time-attention factors from the duration of browsing each item and to add these factors to the RNN-based model. We conduct experiments on the RecSys Challenge 2015 dataset, and the results show that the TA4Rec model clearly improves on the classic session-based recommender method for session-based recommendation.

Keywords: Logic gates; Training; Recommender systems; Standards; Recurrent neural networks; Logistics

Image Caption via Visual Attention Switch on DenseNet

IEEE International Conference on Network Infrastructure and Digital Content (IC-NIDC)

Authors: Yanlong Hao; Jiyang Xie; Zhiqing Lin. Affiliation: Pattern Recognition and Intelligent System Lab., Beijing University of Posts and Telecommunications, Beijing, China

We introduce a novel approach for converting images into corresponding language descriptions. The method follows the popular encoder-decoder architecture. The encoder uses the recently proposed densely connected convolutional network (DenseNet) to extract feature maps. The decoder uses a long short-term memory (LSTM) network to parse the feature maps into descriptions. We predict the next word of a description by effectively combining the feature maps with the word embedding of the current input word through a "visual attention switch". Finally, we compare the performance of the proposed model with other baseline models and achieve good results.

Keywords: Feature extraction; Decoding; Visualization; Switches; Training; Dictionaries; Data models

Real-time Lesion Detection of Cardiac Coronary Artery Using Deep Neural Networks

IEEE International Conference on Network Infrastructure and Digital Content (IC-NIDC)

Authors: Tianming Du; Xuqing Liu; Honggang Zhang; Bo Xu. Affiliations: Pattern Recognition and Intelligent System Lab, Beijing University of Posts and Telecommunications, Beijing, China; CIT, Chinese Academy of Medical Sciences

In the field of cardiac arterial interventional therapy, coronary angiography imaging provides key information to physicians for treatment strategy selection, but the lesion identification process is time-consuming and error-prone even for experienced doctors. This paper proposes, for the first time, a method for the automatic detection of lesions in cardiac coronary angiography based on deep learning and convolutional neural networks. We used 2925 medical images to build the model. Several lesions exist on the vessels in each image. We regard these lesion areas as objects that differ from the background areas. We designed a model based on the convolutional neural network, applying advanced building blocks including CReLU and Inception, as well as other techniques such as batch normalization, residual connections, and skip-layer connections. After training, the network model can distinguish a lesion area from a normal vessel area (background) and can detect the location of coronary artery lesions in real time without any manual intervention. For stenosis lesions, the detection recall reaches 0.88.

Keywords: Lesions; Arteries; Convolution; Training; Task analysis; Biomedical imaging